Instability in Training Text GANs


Introduction

In text generation, a model is conventionally trained with maximum likelihood estimation (MLE) to generate text one token at a time. Each generated token is compared against the ground-truth token, and any mismatch is used to update the model. However, such training tends to make the generation generic or repetitive.
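
To make the token-by-token objective concrete, here is a minimal sketch of the per-token MLE loss under teacher forcing; the tiny vocabulary and probability values are made up purely for illustration:

```python
import math

def mle_loss(step_probs, target_ids):
    """Average negative log-likelihood under teacher forcing: every
    step is scored against the ground-truth token, regardless of
    what the model actually sampled."""
    nll = -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))
    return nll / len(target_ids)

# Toy example: vocabulary of 3 tokens, a 2-step target sequence.
step_probs = [[0.7, 0.2, 0.1],   # model's distribution at step 1
              [0.1, 0.8, 0.1]]   # model's distribution at step 2
loss = mle_loss(step_probs, target_ids=[0, 1])  # ~0.29
```

Because every step is clamped to the ground truth, a single wrong token yields a clear, local training signal; as discussed below, this is exactly the kind of signal a GAN generator lacks.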

Generative Adversarial Networks (GANs) tackle this problem by introducing two models: a generator and a discriminator. The discriminator's goal is to determine whether a sentence x is real or fake (fake meaning generated by the model), whereas the generator attempts to produce a sentence that can fool the discriminator. The two models compete against each other, which improves both networks until the generator can produce human-like sentences.
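
The two objectives can be sketched with plain numbers. Below is a minimal sketch of the standard (non-saturating) GAN losses, assuming the discriminator outputs probabilities in (0, 1):

```python
import math

def d_loss(real_scores, fake_scores):
    """Discriminator: push scores on real sentences toward 1
    and scores on generated ones toward 0."""
    loss = -sum(math.log(s) for s in real_scores)
    loss += -sum(math.log(1.0 - s) for s in fake_scores)
    return loss / (len(real_scores) + len(fake_scores))

def g_loss(fake_scores):
    """Generator (non-saturating form): push the discriminator's
    scores on generated sentences toward 1."""
    return -sum(math.log(s) for s in fake_scores) / len(fake_scores)

# A confident discriminator means a low loss for D but a high loss for G:
print(d_loss([0.9], [0.1]))  # ~0.105
print(g_loss([0.1]))         # ~2.303
```

Note that the generator's only feedback is the discriminator's scalar score per sentence, not a per-token target.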

Although the computer vision and text generation communities have shown some promising results, getting hands-on with this type of modeling is difficult.

Problems with GANs

  1. Mode Collapse (Lack of Diversity). This is a common problem in GAN training. Mode collapse occurs when the model ignores the input random noise and keeps generating the same sentence regardless of the input. Since the model's only objective is to fool the discriminator, finding a single such output is sufficient.
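
One cheap way to spot collapse is to measure diversity over a batch of generations, for example with a distinct-n ratio (unique n-grams over total n-grams); a value near zero means the generator keeps emitting the same phrases. A minimal sketch:

```python
def distinct_n(generations, n=2):
    """Ratio of unique n-grams to total n-grams over a set of
    generated token lists; a value near 0 suggests mode collapse."""
    ngrams = [tuple(toks[i:i + n])
              for toks in generations
              for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

collapsed = [["the", "cat", "sat"]] * 10   # same sentence every time
diverse = [["the", "cat"], ["a", "dog"], ["my", "fish"]]
print(distinct_n(collapsed))  # 0.1 (2 unique bigrams out of 20)
print(distinct_n(diverse))    # 1.0
```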

  2. Unstable Training. The most important problem is keeping the generator and the discriminator on par with each other. If either one outperforms the other, the whole training becomes unstable and no useful information is learned. For example, when the generator's loss is slowly decreasing, the generator has started to find a way to fool the discriminator even though its generations are still immature. On the other hand, when the discriminator is overpowered, there is no new information for the generator to learn: every generation is evaluated as fake, so the generator has to rely on randomly changing words in search of a sentence that might fool the discriminator.

  3. Intuition is NOT Enough. Sometimes your intended modeling is correct, but it does not work the way you want it to; it may require more than that. Frequently, you need to tune hyperparameters by tweaking the learning rate, trying different loss functions, using batch normalization, or trying different activation functions.

  4. Lots of Training Time. Some works report training for up to 400 epochs. That is tremendous compared with a Seq2Seq model, which might take only 50 epochs or so to reach well-structured generations. The cause of the slowness is exploration: G does not receive an explicit signal about which token is bad; rather, it receives a single signal for the whole generation. To produce a natural sentence, G needs to explore various combinations of words to get there. How often do you think G can accidentally produce <eos> out of nowhere? With MLE, the signal is clear: there should be an <eos>, with <pad> tokens right after it.

Potential Solutions

Many approaches have been attempted to handle this type of training.


  1. Use the Adam Optimizer. Some suggest using Adam for the generator and SGD for the discriminator. Most importantly, some papers tweak Adam's betas, e.g. betas=(0.5, 0.999).
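
To see what the beta tweak changes, here is a minimal pure-Python sketch of one Adam update on a scalar parameter (the learning rate is illustrative). betas[0] is the momentum decay, so lowering it from the default 0.9 to 0.5 shortens the gradient memory, which is the stabilization these papers are after; in PyTorch this is just `torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.5, 0.999))`.

```python
def adam_step(param, grad, state, lr=2e-4, betas=(0.5, 0.999), eps=1e-8):
    """One Adam update on a scalar parameter, with betas[0]
    (momentum decay) lowered from the default 0.9 to 0.5."""
    m, v, t = state
    t += 1
    m = betas[0] * m + (1 - betas[0]) * grad          # first moment
    v = betas[1] * v + (1 - betas[1]) * grad * grad   # second moment
    m_hat = m / (1 - betas[0] ** t)                   # bias correction
    v_hat = v / (1 - betas[1] ** t)
    return param - lr * m_hat / (v_hat ** 0.5 + eps), (m, v, t)

p, s = 0.0, (0.0, 0.0, 0)
p, s = adam_step(p, grad=1.0, state=s)  # first step moves by ~lr
```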

  2. Wasserstein GAN. Some works report that using WGAN stabilizes training greatly. From our experiments, however, WGAN could not even reach the quality of a regular GAN. Perhaps we are missing something. (See? It's quite difficult.)
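
For reference, the WGAN recipe replaces the log-loss with a difference of critic scores and, in the original formulation, clips the critic's weights. A minimal sketch with scores as plain floats:

```python
def critic_loss(real_scores, fake_scores):
    """WGAN critic objective (to minimize): the negated difference of
    mean scores. Scores are unbounded reals, not probabilities."""
    mean = lambda xs: sum(xs) / len(xs)
    return -(mean(real_scores) - mean(fake_scores))

def clip_weights(weights, c=0.01):
    """Original WGAN's crude Lipschitz constraint: clamp every
    critic weight to [-c, c] after each update."""
    return [max(-c, min(c, w)) for w in weights]

print(critic_loss([1.0, 3.0], [0.0, 2.0]))   # -1.0
print(clip_weights([0.5, -0.5, 0.005]))      # [0.01, -0.01, 0.005]
```

The generator's loss is simply the negated mean critic score on fakes; the later WGAN-GP variant replaces weight clipping with a gradient penalty.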

  3. GAN Variations. Some suggest trying KL-GAN or VAE-GAN. These can make the models easier to train.

  4. Input Noise to the Discriminator. To keep the discriminator's learning on par with the generator, which generally has a harder time than the discriminator, we add some noise to the discriminator's input and use dropout to make things easier for the generator.
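
A minimal sketch of this trick as instance noise on the discriminator's input features (the sigma value is an assumption you would tune):

```python
import random

def noisy(batch, sigma=0.1, seed=None):
    """Add Gaussian noise to each feature the discriminator sees, so the
    real and generated distributions overlap and D's job gets harder."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in sample] for sample in batch]

batch = [[1.0, 2.0], [3.0, 4.0]]
blurred = noisy(batch, seed=0)  # same shape, slightly perturbed values
```

For text, where tokens are discrete, this noise is usually applied to the embeddings fed to the discriminator (or approximated with dropout) rather than to the tokens themselves.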

  5. DCGAN (Deep Convolutional GAN). This is only for computer vision tasks, but the model is known to avoid unstable training. Its key guidelines are to replace ReLU with LeakyReLU in the discriminator, use BatchNorm, and use strided convolutions instead of pooling.

  6. Ensemble of Discriminators. Instead of a single discriminator, multiple discriminators are trained on different batches to capture different aspects of the data. The generator then cannot just fool a single D; it has to generalize enough to fool all of them. This is also related to Dropout GAN (many Ds, with some dropped out during training).
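
A minimal sketch of the aggregation step: the generator's feedback on each sample is averaged over all discriminators (dropping a random subset of Ds each step would give the Dropout-GAN variant):

```python
def ensemble_scores(scores_per_d):
    """Average per-sample scores across discriminators, so a generation
    only looks 'good' if it fools every D on average, not one weak D."""
    n_d = len(scores_per_d)
    return [sum(scores[i] for scores in scores_per_d) / n_d
            for i in range(len(scores_per_d[0]))]

# Two discriminators scoring the same batch of two generations:
print(ensemble_scores([[0.2, 0.8], [0.6, 0.4]]))  # ~[0.4, 0.6]
```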

  7. Parameter Tuning. This covers the learning rate, dropout ratio, batch size, and so on. It is difficult to determine how much better one model is than another, so some test multiple parameter settings and keep whichever works best. One bottleneck is that there is no standard evaluation metric for GANs, which means a lot of manual checking to judge quality.

  8. Scheduling G and D. Training G five times followed by D once is reported to be useless in many works. If you want to try scheduling, do something more meaningful.

For reference, a scheduling loop of this kind (here the 5:1 ratio mentioned above, where train_G and train_D each stand for one update of that model) looks like:

    while not converged:
        for _ in range(5):
            train_G()
        train_D()

Conclusion

Adversarial text generation opens a new avenue for how a model is trained. Instead of relying on MLE, one or more discriminators signal whether the generation is correct. However, such training has the downside of being quite hard to carry out. Many studies suggest tips for avoiding the problems described above; still, you need to try a variety of settings (or parameters) to make sure your generative model can learn properly.

Further Reading

Translated from: https://towardsdatascience.com/instability-in-training-text-gan-20273d6a859a
