Don't Be Overwhelmed by NLP Research

Natural Language Processing: What is going on?

NLP is the new Computer Vision

With an enormous amount of textual data available, giants like Google, Microsoft, and Facebook have diverted their focus towards NLP.

These models are trained on thousands of super-costly TPUs/GPUs, making them infeasible for most.

This gave me anxiety! (we’ll come back to that)

Let these tweets put things into perspective:

Tweet 1: [embedded tweet image]

Tweet 2: [embedded tweet image] (read the trailing tweet)

Consequences?

In roughly the last year, the following knowledge became mainstream:

  • The Transformer was followed by the Reformer, Longformer, GTrXL, Linformer, and others.
  • BERT was followed by XLNet, RoBERTa, ALBERT, ELECTRA, BART, T5, Big Bird, and others.
  • Model compression was extended by DistilBERT, TinyBERT, BERT-of-Theseus, Huffman coding, movement pruning, PruneBERT, MobileBERT, and others.
  • Even new tokenizations were introduced: Byte-Pair Encoding (BPE), WordPiece Encoding (WPE), SentencePiece Encoding (SPE), and others (see the tokenizer sketch after this list).
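
To make the tokenization differences concrete, here is a minimal sketch (assuming the Hugging Face transformers library and its public checkpoints, which are my choice of illustration rather than anything prescribed by this article). GPT-2 uses byte-level BPE, BERT uses WordPiece, and XLNet uses SentencePiece, so the same sentence splits differently under each:

```python
# Minimal sketch: comparing tokenizations (assumes `pip install transformers`
# and internet access to download the pretrained tokenizers).
from transformers import AutoTokenizer

text = "Don't be overwhelmed by NLP research."

# gpt2 -> byte-level BPE, bert-base-uncased -> WordPiece,
# xlnet-base-cased -> SentencePiece.
for checkpoint in ["gpt2", "bert-base-uncased", "xlnet-base-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(checkpoint, tokenizer.tokenize(text))
```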

This is barely the tip of the iceberg.

So while you were still trying to understand and implement one model, a bunch of newer, lighter, and faster models were already available.

How to Cope with it?

The answer is short:

You don't need to know it all; know only what is necessary and use what is available.

Reason

I read them all, only to realize that most of the research is a re-iteration of similar concepts.

At the end of the day (vaguely speaking):

  • the Reformer is a hashed version of the Transformer, and the Longformer is a convolution-based counterpart of the Transformer
  • all compression techniques are trying to consolidate information
  • everything from BERT to GPT-3 is just a language model (see the sketch after this list)
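
To make the "just a language model" point concrete, here is a minimal sketch (assuming the Hugging Face transformers library, which is my choice, not the author's): a BERT-style model simply assigns probabilities to tokens given their surrounding context.

```python
# Minimal sketch: a masked language model filling in a blank
# (assumes `pip install transformers`).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks candidate tokens for the [MASK] position by probability.
for prediction in fill_mask("NLP is the new computer [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```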

Priorities -> Pipeline over Accuracy

Learn to use what's available, efficiently, before jumping on to what else could be used.

In practice, these models are a small part of a much bigger pipeline.

Your first instinct should not be to compete with the tech giants in terms of training a better model.

Instead, your first instinct should be to use the available models to build an end-to-end application which solves a practical problem.

Now, if you feel that the model is the performance bottleneck of your application, re-train that model or switch to another one.
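
For instance, "using what is available" can be as simple as wrapping an off-the-shelf model behind your application logic. A minimal sketch, assuming the Hugging Face transformers library and a public SQuAD-tuned checkpoint (both my choices for illustration):

```python
# Minimal sketch: an extractive Q&A step built from an available model
# (assumes `pip install transformers`).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

result = qa(
    question="What should I build first?",
    context=(
        "Use the available models to build an end-to-end application "
        "which solves a practical problem."
    ),
)
print(result["answer"], result["score"])

# Only if this step turns out to be the bottleneck of the whole application
# would you re-train the model or swap in another checkpoint.
```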

Consider the following:

  • Huge deep learning models usually take thousands of GPU hours just to train.
  • This increases 10x when you consider hyper-parameter tuning (HP tuning).
  • HP tuning even something as efficient as an ELECTRA model can take a week or two.

Practical Scenario -> The Real Speedup

Take Q&A systems as an example. Given millions of documents, something like Elasticsearch is (comparatively) far more essential to the pipeline than a new Q&A model.
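
A rough sketch of that division of labour (assuming a recent version of the official elasticsearch Python client; the index name, field name, and local cluster URL are hypothetical): Elasticsearch narrows millions of documents down to a handful of candidate passages, and only those ever reach the much slower Q&A model.

```python
# Minimal sketch: retrieval-before-reading in a Q&A pipeline
# (assumes `pip install elasticsearch` and a running local cluster;
# the "documents" index and "text" field are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

question = "When was the company founded?"
response = es.search(
    index="documents",
    query={"match": {"text": question}},
    size=5,  # only the top few hits ever reach the Q&A model
)
candidate_passages = [hit["_source"]["text"] for hit in response["hits"]["hits"]]
# candidate_passages would then be fed to the reader model one by one.
```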

In production, the success of your pipeline will be determined not (only) by how awesome your deep learning models are, but also by:

  • the inference-time latency
  • the predictability of results and boundary cases
  • the ease of fine-tuning
  • the ease of reproducing the model on a similar dataset

Something like DistilBERT can be scaled to handle a billion queries, as beautifully described in this blog by Roblox.

While new models can decrease inference time by 2x-5x, techniques like quantization, pruning, and using ONNX can decrease it by 10x-40x!
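
As a hedged illustration of where such speedups come from, here is a minimal sketch of post-training dynamic quantization in PyTorch (the checkpoint below is a public sentiment model picked purely for illustration; the article names the techniques but not this exact recipe):

```python
# Minimal sketch: dynamic quantization with PyTorch
# (assumes `pip install torch transformers`).
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
model.eval()

# Convert the linear layers to int8; this typically shrinks the model and
# speeds up CPU inference, on top of any gains from a smaller architecture.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized_model.state_dict(), "distilbert_sst2_int8.pt")
```

Exporting the same model to ONNX Runtime follows a similar pattern; the point is that these post-processing steps are usually far cheaper wins than training a new model.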

Personal Experience

I was working on an Event Extraction pipeline, which used:

  • 4 different transformer-based models
  • 1 RNN-based model

But. At the heart of the entire pipeline were the following (a small sketch follows the list):

  • WordNet
  • FrameNet
  • Word2Vec
  • Regular-Expressions
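
As a toy sketch of what that "non-deep-learning heart" can look like (my own illustrative example, assuming NLTK with the WordNet corpus downloaded; the article does not show the actual pipeline code):

```python
# Minimal sketch: regex extraction plus WordNet-based trigger expansion
# (assumes `pip install nltk` and `nltk.download("wordnet")`).
import re
from nltk.corpus import wordnet

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def expand_trigger(word):
    """Collect WordNet synonyms so keyword matching covers more surface forms."""
    synonyms = {word}
    for synset in wordnet.synsets(word):
        synonyms.update(lemma.name() for lemma in synset.lemmas())
    return synonyms

print(DATE_PATTERN.findall("The acquisition was announced on 2020-08-14."))
print(expand_trigger("acquire"))
```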

And. Most of my team’s focus was on:

  • Extraction of text from PPTs, images & tables
  • Cleaning & preprocessing text
  • Visualization of results
  • Optimization of Elasticsearch
  • Formatting information for Neo4j

Conclusion

It is more important to have an average-performing pipeline than a non-functional pipeline with a few brilliant modules.

Neither Christopher Manning nor Andrew Ng knows it all. They just know what is required and when it is required, well enough.

So, have realistic expectations of yourself.

Thank you!

Translated from: https://medium.com/towards-artificial-intelligence/dont-be-overwhelmed-by-nlp-c174a8b673cb
