Instability in Training Text GAN
Introduction
In text generation, models are conventionally trained with maximum likelihood estimation (MLE) to generate text one token at a time. Each generated token is compared against the ground-truth token, and any mismatch is used to update the model. However, such training tends to produce generations that are generic or repetitive.
Generative Adversarial Network (GAN) training tackles this problem by introducing two models: a generator and a discriminator. The goal of the discriminator is to determine whether a sentence x is real or fake (fake meaning generated by a model), whereas the generator attempts to produce a sentence that can fool the discriminator. These two models compete against each other, which improves both networks until the generator can produce a human-like sentence.
Although the computer vision and text generation communities have reported some promising results, getting hands-on with this type of modeling is difficult.
Problems with GANs
Mode Collapse (Lack of Diversity). This is a common problem in GAN training. Mode collapse occurs when the model ignores the input random noise and keeps generating the same sentence regardless of the input. In this sense, the model is still trying to fool the discriminator, but finding a single point that does so is sufficient.
Unstable Training. The most important problem is to ensure that the generator and the discriminator stay on par with each other. If either outperforms the other, the whole training becomes unstable and no useful information is learned. For example, when the generator's loss is steadily decreasing, that means the generator has started to find a way to fool the discriminator even though its generations are still immature. On the other hand, when the discriminator is overpowered, there is no new information for the generator to learn. Every generation is evaluated as fake; therefore, the generator has to rely on randomly changing words in search of a sentence that may fool the D.
Intuition is NOT Enough. Sometimes your intended modeling is correct, but it may not work as you want it to; getting it to work may require more than intuition. Frequently, you need to do hyperparameter tuning by tweaking the learning rate, trying different loss functions, using batch norm, or trying different activation functions.
Lots of Training Time. Some work reports training for up to 400 epochs. That is tremendous compared with Seq2Seq, which might take only 50 epochs or so to reach well-structured generations. The reason training is slow is the exploration the generator must do. G does not receive an explicit signal about which token is bad; rather, it receives one signal for the whole generation. To be able to produce a natural sentence, G needs to explore various combinations of words to get there. How often do you think G can accidentally produce <eos> out of nowhere? With MLE, the signal is quite clear: there should be an <eos>, with <pad> tokens right after it.
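The contrast can be made concrete with a small sketch (the probabilities below are made-up numbers for illustration): per-token MLE loss pinpoints exactly which position is wrong, while the GAN generator only receives one scalar from the discriminator for the whole sentence.

```python
import math

def mle_token_losses(probs_of_target, eps=1e-12):
    """Per-token negative log-likelihood: every position gets its own signal."""
    return [-math.log(p + eps) for p in probs_of_target]

# Model's probability of the ground-truth token at each position
# (illustrative numbers; the low value at position 2 yields a large loss there).
probs = [0.9, 0.7, 0.05, 0.8]
losses = mle_token_losses(probs)
worst = max(range(len(losses)), key=lambda i: losses[i])
print(worst)  # → 2: MLE singles out the bad position for correction

# A GAN generator instead gets one scalar for the whole sample, e.g.:
d_score = 0.31                   # discriminator's "realness" of the full sentence
gan_signal = -math.log(d_score)  # no hint about *which* token was bad
```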
Potential Solutions
Many approaches have been attempted to handle this type of training.
Use the ADAM Optimizer. Some suggest using ADAM for the generator and SGD for the discriminator. Most importantly, some papers have started tweaking the betas for ADAM: betas=(0.5, 0.999)
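In a framework you would just pass the tuple to the optimizer (e.g. betas=(0.5, 0.999) in PyTorch's Adam). To show what the lowered beta1 actually changes, here is a minimal pure-Python Adam step for a scalar parameter; the learning rate and gradient values are illustrative:

```python
def adam_step(param, grad, m, v, t, lr=2e-4, betas=(0.5, 0.999), eps=1e-8):
    """One Adam update for a scalar parameter.

    beta1=0.5 (instead of the default 0.9) shortens the momentum memory,
    which some GAN papers report helps stabilize adversarial training.
    """
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adam_step(p, grad=0.5, m=m, v=v, t=1)
```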
Wasserstein GAN. Some work reports that using WGAN stabilizes training greatly. In our experiments, however, WGAN could not even reach the quality of a regular GAN. Perhaps we are missing something. (See? It's quite difficult.)
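For reference, the WGAN critic objective and its weight-clipping trick are simple to state; this toy sketch operates on plain lists of scalar critic scores rather than real network outputs:

```python
def critic_loss(d_real_scores, d_fake_scores):
    """WGAN critic objective: maximize E[D(real)] - E[D(fake)],
    i.e. minimize its negation. Note: no sigmoid, no log."""
    mean = lambda xs: sum(xs) / len(xs)
    return -(mean(d_real_scores) - mean(d_fake_scores))

def clip_weights(weights, c=0.01):
    """After each critic update, clip weights to [-c, c] as a crude
    way of enforcing the Lipschitz constraint (per the original WGAN)."""
    return [max(-c, min(c, w)) for w in weights]

loss = critic_loss([1.2, 0.8], [-0.5, -0.1])
clipped = clip_weights([0.5, -0.02, 0.004])
```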
GAN Variations. Some suggest trying KL-GAN or VAE-GAN. These can make the models easier to train.
Input Noise to the Discriminator. To keep the discriminator's learning on par with the generator, which in general has a harder time than the D, we add some noise to the discriminator's input and apply dropout to make things easier for the generator.
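A minimal sketch of this corruption step, operating on a flat list of embedding values (the sigma and dropout rate are illustrative choices, not values from any paper):

```python
import random

def corrupt_discriminator_input(embedded, sigma=0.1, p_drop=0.3, seed=0):
    """Handicap the discriminator: add Gaussian noise to each embedding
    value, then randomly zero some of them (dropout)."""
    rng = random.Random(seed)
    out = []
    for x in embedded:
        x = x + rng.gauss(0.0, sigma)   # instance noise
        if rng.random() < p_drop:       # dropout mask
            x = 0.0
        out.append(x)
    return out

corrupted = corrupt_discriminator_input([0.5, -1.0, 2.0])
```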
DCGAN (Deep Convolutional GAN). This applies only to computer vision tasks. However, this model is known to avoid unstable training. The keys in this model are to use LeakyReLU rather than plain ReLU in the discriminator, to use BatchNorm, and to use strided convolutions instead of pooling.
Ensemble of Discriminators. Instead of a single discriminator, multiple discriminators are trained on different batches to capture different aspects of the data. Thus, the generator cannot just fool a single D; it has to generalize so that it fools all of them. This is also related to Dropout-GAN (many Ds, with some dropped out during training).
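One simple way to set this up is to average the generator's loss over all (or a random subset of) discriminators; here is a toy sketch where each discriminator is represented only by its "realness" score for a fake sample:

```python
import math
import random

def ensemble_generator_loss(d_scores, drop_prob=0.0, rng=None):
    """Generator loss against several discriminators: average the
    per-discriminator losses so G must fool all of them; optionally
    drop some discriminators each step (Dropout-GAN style).

    d_scores: each D's probability that the fake sample is real.
    """
    rng = rng or random.Random(0)
    kept = [s for s in d_scores if rng.random() >= drop_prob] or d_scores
    return -sum(math.log(s + 1e-12) for s in kept) / len(kept)

# Fooling one D (score 0.9) but not the other two (0.1) still costs a lot.
loss_all = ensemble_generator_loss([0.9, 0.1, 0.1])
loss_single = ensemble_generator_loss([0.9])
```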
Parameter Tuning. Tune the learning rate, dropout ratio, batch size, and so on. It is difficult to determine how much better one model is than another; therefore, some would test multiple parameter settings and see which works best. One bottleneck is that there is no standard evaluation metric for GANs, which results in a lot of manual checking to determine quality.
Scheduling G and D. Fixed schedules, such as training G 5 times followed by D once, are reported to be useless in many works. If you want to try scheduling, do something more meaningful, such as adapting to each model's current performance:
    # reconstructed sketch: train whichever model is currently behind
    while discriminator_is_too_strong():
        train_G()
    while discriminator_is_too_weak():
        train_D()
Conclusion
Adversarial text generation opens a new avenue for how a model is trained. Instead of relying on MLE, one or more discriminators are used to signal whether the generation is correct. However, such training has a downside: it is quite hard to get right. Many studies suggest tips for avoiding the problems described above; still, you need to try a variety of settings (or parameters) to ensure your generative model can learn properly.
Further Reading
Translated from: https://towardsdatascience.com/instability-in-training-text-gan-20273d6a859a