GPT-3: The Latest in the NLP Town


What is GPT-3?

The launch of Open AI’s 3rd-generation pre-trained language model, GPT-3 (Generative Pre-trained Transformer), has the data science fraternity buzzing with excitement!


The world of Language Models (LM) is quite fascinating. To give a brief introduction, these models learn the probabilities of a sequence of words that occur in a commonly spoken language (say, English) and predict the next possible word in that sequence. They are essential for numerous NLP tasks like:


  • Language Translation
  • Text Classification
  • Sentiment Extraction
  • Reading Comprehension
  • Named Entity Recognition
  • Question Answering Systems
  • News Article Generation, etc.
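To make the next-word-prediction idea concrete, here is a minimal sketch of querying a small pre-trained LM for next-token probabilities. It assumes the Hugging Face transformers package and the publicly released GPT-2 weights (GPT-3 itself is only reachable through an API), so treat it as an illustration rather than anything GPT-3-specific:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>10}  {prob.item():.3f}")
```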

They’ve become immensely popular since the release of BERT by Google, with a host of companies competing to build the next big thing in the NLP domain!


Open AI’s GPT-3 is the largest Language Model having 175 BN parameters, 10x more than that of Microsoft’s Turing NLG


Open AI has been in the race for a long time now. The capabilities, features and limitations of their latest edition, GPT-3, have been described in a detailed research paper. Its predecessor GPT-2 (released in Feb 2019) was trained on 40GB of text data and had 1.5 BN parameters. In comparison, GPT-3 has a whopping 175 BN parameters, 10 times more than the next largest LM, the Turing NLG, developed by Microsoft with 17 BN parameters!


Fig-1: Parameter-wise comparison of all available language models (LMs)

GPT-3 is based on the same transformer and attention concepts as GPT-2. It has been trained on a large variety of data, like Common Crawl, web texts, books and Wikipedia, with each source weighted by its token count. Prior to training the model, the average quality of the datasets was improved in 3 steps: filtering Common Crawl based on similarity to high-quality reference corpora, fuzzy deduplication at the document level, and adding known high-quality corpora to the training mix.


The following table shows the training corpus of GPT-3:


[Table: GPT-3 training corpus (datasets, token counts and sampling weights)]
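As a rough sketch of how token-weighted sampling over such a mixture could work, here is a minimal illustration. The weights below are the approximate sampling proportions reported in the GPT-3 paper; treat them as illustrative rather than exact:

```python
import random

# Approximate sampling weights from the GPT-3 paper (illustrative only).
mixture = {
    "Common Crawl (filtered)": 0.60,
    "WebText2": 0.22,
    "Books1": 0.08,
    "Books2": 0.08,
    "Wikipedia": 0.03,
}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document comes from."""
    sources, weights = zip(*mixture.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(42)
draws = [sample_source(rng) for _ in range(10_000)]
for source in mixture:
    print(f"{source:28s} {draws.count(source) / len(draws):.2%}")
```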

GPT-3 was trained in eight variants, ranging from 125 MN to 175 BN parameters, which differ in

  • Sizes (Parameters and Layers)
  • Architectures
  • Learning hyper-parameters (batch size in tokens and learning rate)

“The largest version of GPT-3 has 175 BN Parameters, 96 Attention Layers and 3.2 MN Batch Size”


Here are the details of the different variants of GPT-3 model:


Fig-2: Details of the variants of the GPT-3 model
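As a sanity check on the 175 BN figure, the parameter count of a decoder-only transformer can be approximated from the hidden size and layer count alone. The sketch below uses the widely quoted 12 * n_layers * d_model**2 rule of thumb, which ignores embeddings, biases and layer norms:

```python
# Back-of-the-envelope parameter count for the largest GPT-3 variant.
n_layers = 96      # attention layers in GPT-3 175B
d_model = 12288    # hidden size reported in the paper

# ~12 * L * d^2 approximates attention + feed-forward weights per layer.
approx_params = 12 * n_layers * d_model ** 2
print(f"{approx_params / 1e9:.1f} BN parameters")  # -> ~173.9 BN
```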

What can it do?

Many of the NLP tasks discussed in this blog can be performed by GPT-3 without any gradient or parameter updates, or fine-tuning. This makes it a Task-Agnostic Model, as it can perform tasks with very few, or even no, prompts, examples or demonstrations, called shots.


The following image displays a Zero / One / Few-Shot task accuracy comparison across model sizes (in terms of parameters) for a simple task of removing random symbols from a word, with the number of in-context examples ranging from 10 to 100.


Fig-3: Zero / One / Few-Shot task accuracy comparison for models of different sizes
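To show what “few-shot” means in practice, here is a sketch of how a prompt for the symbol-removal task above might be laid out. The exact formatting is an assumption; the point is that the examples live entirely in the prompt, with no gradient updates:

```python
def build_prompt(examples, query):
    """Lay out in-context examples followed by the unanswered query."""
    lines = ["Remove the extra symbols from the word.", ""]
    for corrupted, clean in examples:
        lines.append(f"Input: {corrupted}")
        lines.append(f"Output: {clean}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# Hypothetical shots in the spirit of the paper's symbol-insertion task.
shots = [("s.u!c/c!e.s s i/o/n", "succession"), ("inc!en!se", "incense")]
print(build_prompt(shots, "c o r p o r a t e"))
```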

The “Fake News” Conundrum

Earlier, the release of the largest model of GPT-2 was briefly stalled due to a controversial debate over it being capable of generating fake news. It was later published on Colab notebooks. In recent times, however, fake news has become quite common, and the real news itself has been hard to believe!


The fake news generated by GPT-3 is so difficult to distinguish from the real thing that, in one of the experiments, only about 50% of the fake articles could actually be detected, which is barely better than random guessing!


Fig-4: Accuracy of manual fake-news detection for models of different sizes

In a task to predict the last word of a sentence, GPT-3 outperformed the current SOTA (state-of-the-art) algorithm by 8%, scoring 76% accuracy in the zero-shot setting. In the few-shot setting, it achieved an accuracy of 86.4%!


In closed-book question-answering tasks, GPT-3 outperformed a fine-tuned SOTA model that uses an Information Retrieval component, in both the one-shot and few-shot settings.


Fig-5: Performance of GPT-3 on TriviaQA for models of different sizes

Access to the GPT-3 API is wait-listed, but all the folks who got a chance to try it have shared their interesting findings and the amazing results of this powerful model. Here are a few things observed while experimenting with the API’s interface, called the Playground.


Summary of the Open AI GPT-3 API Playground:

Settings and Presets: Upon clicking the settings icon, one can configure various parameters like the text length, temperature (from low/boring through standard to chaotic/creative), start and stop sequences for the generated text, etc. There are also multiple presets to choose from and play around with, like Chat, Q&A, Parsing Unstructured Data, and Summarize for a 2nd Grader; a sketch of what the temperature knob does follows the preset list below.


  • Chat:

    The Chat preset looks more like a chatbot. You can set the character of the AI as friendly, creative, clever and helpful, in which case it provides informative answers in a very polite manner; whereas if you set the character of the AI to brutal, it responds exactly as that character suggests!

  • Q&A:

    Question answering needs some training (a few example questions and answers) before it starts answering your own questions, and people did not have any complaints about the kind of answers they received.

  • Parsing Unstructured Data:

    This is an interesting preset of the model, which can comprehend unstructured text and extract structured information from it.

  • Summarize for 2nd Grader:

    This preset shows another level of text compression, rephrasing difficult sentences and concepts into simpler words and sentences that can be easily understood by a kid.
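As promised above, here is a sketch of what the temperature setting does under the hood: it rescales the model’s output logits before sampling, so low values concentrate probability on the top token (boring/deterministic) and high values flatten the distribution (chaotic/creative). This is a generic illustration of temperature sampling, not Open AI’s actual implementation:

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float,
                            rng: np.random.Generator) -> int:
    """Sample a token index after rescaling logits by 1/temperature."""
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([3.0, 2.0, 1.0, 0.5])  # toy 4-token vocabulary
for t in (0.2, 1.0, 2.0):
    draws = [sample_with_temperature(logits, t, rng) for _ in range(1000)]
    print(t, np.bincount(draws, minlength=4) / 1000)
```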

Multilingual text processing: GPT-3 can handle languages other than English better than GPT-2. People have tried tasks in various languages, including German, Russian and Japanese; it performed well and looks very much ready for multilingual text processing.


Text Generation: It can generate poems on demand, in a particular style if required, and can write stories and essays, with some fine-tuning, even in other languages.


Code Generation: People have claimed that this API can generate code with minimal prompts.


Here is an article which showcases all its capabilities and excerpts from social media.


And this is what the AI interface looks like (the image below shows the Q&A preset):


Fig-6: Preview of the AI Playground page for a Q&A preset

How can we use it?

Unlike a lot of language models, GPT-3 does not need transfer learning, where the model is fine-tuned on task-specific datasets for specific tasks. The authors of the GPT-3 research paper mention the following advantages of a task-agnostic model:


  • Collecting task-specific data is difficult
  • Fine-tuning might yield out-of-distribution performance
  • There is a need for an adaptable NLP system, similar to humans, that can understand natural language (English) and perform tasks with few or no prompts

GPT-3 is applied through in-context learning: the model is fed a task description, a prompt, or a few examples (shots), and it responds on the basis of the skills and pattern-recognition abilities learnt during training, adapting them to the specific task at hand.

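As a hedged sketch of what such an in-context call looked like through the beta API, here is an example using the openai Python client’s Completion endpoint as it existed at the time. The engine name and parameter values are assumptions; adapt them to whatever access you have:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # obtained through the wait-listed beta

# The two translation pairs below are the only "training" the model sees.
prompt = (
    "English: Hello, how are you?\n"
    "French: Bonjour, comment allez-vous ?\n"
    "English: Where is the library?\n"
    "French:"
)

response = openai.Completion.create(
    engine="davinci",   # assumed engine name from the beta
    prompt=prompt,
    max_tokens=32,
    temperature=0.3,
    stop="\n",
)
print(response.choices[0].text.strip())
```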

Despite its tremendous usability, the huge model size is the biggest factor hindering usage for most people, except those with the available resources. However, there are discussions in the community that distillation might come to the rescue!


What are the limitations?

The Open AI founder himself said that “GPT-3 has weaknesses and it makes silly mistakes”. It is weak at sentence-comparison tasks, where it has to judge the usage of a word in 2 different sentences.


As per the researchers, it still faces some problems in the following tasks:


  • Repetitions
  • Coherence loss
  • Contradictions
  • Drawing real conclusions
  • Multi-digit addition and subtraction
Fig-7: Results of different arithmetic tasks in a few-shot setting for models of different sizes
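One way to probe the arithmetic weakness yourself is to build few-shot addition prompts and score the completions by exact match. The query_model function below is a hypothetical stand-in for whatever completion call you have access to:

```python
import random

def build_addition_prompt(shots, a, b):
    """Few-shot addition prompt: worked examples, then the open question."""
    lines = [f"Q: What is {x} plus {y}?\nA: {x + y}" for x, y in shots]
    lines.append(f"Q: What is {a} plus {b}?\nA:")
    return "\n".join(lines)

def exact_match(answer: str, a: int, b: int) -> bool:
    return answer.strip() == str(a + b)

rng = random.Random(0)
shots = [(rng.randint(100, 999), rng.randint(100, 999)) for _ in range(3)]
a, b = 482, 519
print(build_addition_prompt(shots, a, b))
# answer = query_model(prompt)      # hypothetical completion call
# print(exact_match(answer, a, b))
```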

Conclusion

It is great to have an NLP system that doesn’t require large task-specific datasets and custom model architectures to solve specific NLP tasks. The experiments conducted show its power, its potential and its impact on the future of NLP advancement.


Though GPT-3 doesn’t do well on everything, and its size makes it difficult for everyone to use, this is just the threshold of a lot of new improvements to come in the field of NLP!


Translated from: https://medium.com/quick-bites/gpt-3-the-latest-in-the-nlp-town-961259a0930f
