I Tried (and Failed) to Use GANs to Create Art, but It Was Still Worth It


This work borrows heavily from the PyTorch DCGAN tutorial and the NVIDIA paper on progressive GANs.


One area of computer vision I’ve been wanting to explore is GANs. So when my wife and I moved into a home that had some extra wall space, I realized I could create a network to make some wall art and avoid a trip to Bed Bath & Beyond (two birds with one code!).


What are GANs?

GANs (Generative Adversarial Networks) work using two synergistic neural networks: one that creates forged images (the generator), and another that takes in the forgeries along with real examples of art and attempts to classify each as real or fake (the discriminator). The two networks then iterate, the generator getting better at making fakes and the discriminator getting better at detecting them. At the end of the process, you hopefully have a generator that can randomly create authentic-looking art. This method can be applied to generate more than images. In her book You Look Like a Thing and I Love You, Janelle Shane discusses using GANs to make everything from cookie recipes to pick-up lines (which is where the book gets its name).

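The alternating game described above can be sketched in a few lines of PyTorch. This is a minimal toy version: linear layers and 1-D vectors stand in for the real convolutional networks and images, and the sizes, learning rates, and optimizer choices here are illustrative assumptions, not the settings from my project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: the generator maps 8-D noise to a 4-D "image",
# the discriminator maps a 4-D "image" to a real/fake probability.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

real = torch.randn(32, 4)  # a batch of "real" samples
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

for step in range(3):
    # 1) Train the discriminator: push real toward 1, fakes toward 0.
    fake = G(torch.randn(32, 8))
    d_loss = criterion(D(real), ones) + criterion(D(fake.detach()), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator: make the discriminator output 1 on fakes.
    g_loss = criterion(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The `detach()` in the discriminator step is the one subtlety: it stops the discriminator's loss from pushing gradients back into the generator.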

If you don’t know what GANs are, I suggest reading this PyTorch article for an in-depth explanation.


Challenges

Creating a GANs model that generates satisfactory results comes with several difficulties which I’ll need to address in my project.


Data. Like all neural networks, you’ll need a lot of data; however, GANs appear to have an even more voracious appetite. Most GAN projects I’ve read about have leveraged tens or hundreds of thousands of images. In contrast, my dataset is only a few thousand images that I was able to pull from a Google image search. In terms of style, I’d love to end up with something that resembles a Rothko, but I’ll settle for generic Bed Bath & Beyond.


Training time. In NVIDIA’s paper on progressive GANs, they trained their network for days using multiple GPUs. In my case I’ll be using Google Colab and hoping the free-tier hardware will be good enough.


Mode Collapse. Besides being the name of my new dubstep project, mode collapse is what happens when the variety of the generated images begins to converge. Essentially, the generator sees that a few images are doing well at fooling the discriminator and decides to make all of its output look like those few images.

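One cheap way to watch for this, and this is my own heuristic rather than anything from the papers, is to track the per-pixel standard deviation across a generated batch: if every output looks the same, that number crashes toward zero.

```python
import torch

def batch_diversity(fake_batch: torch.Tensor) -> float:
    """Mean per-pixel std across the batch; near 0 suggests mode collapse."""
    return fake_batch.std(dim=0).mean().item()

healthy = torch.rand(64, 3, 32, 32)                        # varied outputs
collapsed = torch.rand(1, 3, 32, 32).repeat(64, 1, 1, 1)   # identical outputs

print(batch_diversity(healthy))    # noticeably above zero
print(batch_diversity(collapsed))  # exactly zero
```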

Image Resolution. The larger the wanted image, the larger the needed network. So how high of a resolution will I need? Well, the recommended density for digital prints is 300 pixels per inch, so if I want something I can hang in a 12x15" frame I’ll need a final resolution of 3600x4500, over 16 million pixels! I obviously won’t be able to build a model at that high of a resolution, but for this experiment I’ll say that’s the goal and see where I end up. To help with this, I’ll also be using a progressive GANs approach. This was pioneered by NVIDIA, where they first trained a model at a low resolution and then progressively added the extra layers needed to increase the image resolution. You can think of it as wading into the pool instead of diving directly into the deep end. In their paper they were able to generate celebrity images at a resolution of 1024 x 1024 pixels (my target is still about 15x that pixel count).

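As a quick sanity check on the print-resolution math (300 pixels per inch over a 12x15 inch frame, compared against NVIDIA’s 1024x1024 outputs):

```python
PPI = 300                    # recommended pixels per inch for digital prints
width_px = 12 * PPI          # 3600
height_px = 15 * PPI         # 4500
total_px = width_px * height_px

print(width_px, height_px, total_px)   # 3600 4500 16200000
print(total_px / (1024 * 1024))        # roughly 15x NVIDIA's 1024x1024 pixel count
```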

Getting into the Code

My full code can be found on GitHub. The main things I want to show in this article are the generator and the discriminator.


The Discriminator. My discriminator looks like any other image classification network. The unique thing about this class is that it takes the number of layers (based on the image size) as a parameter. This allows me to do the “progressive” part of progressive GANs without having to rewrite my classes each time I increment the image size.


class Discriminator(nn.Module):
    def __init__(self, ngpu, n_layers):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.n_layers = n_layers

        # makes the desired number of convolutional layers
        self.layers = nn.ModuleList([nn.Conv2d(N_CHANNELS, N_DISC_CHANNELS * 2, 4, 2, 1, bias=False)])
        self.layers.extend([nn.Conv2d(N_DISC_CHANNELS * 2, N_DISC_CHANNELS * 2, 4, 2, 1, bias=False)
                            for _ in range(self.n_layers - 2)])
        self.layers.append(nn.Conv2d(N_DISC_CHANNELS * 2, 1, 4, 1, 0, bias=False))

        # transformations
        self.batch2 = nn.BatchNorm2d(N_DISC_CHANNELS * 2)
        self.LeakyReLU = nn.LeakyReLU(0.2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == 0:
                x = self.LeakyReLU(x)
            elif layer.out_channels == N_DISC_CHANNELS * 2:
                x = self.batch2(x)
                x = self.LeakyReLU(x)
            else:
                x = self.sigmoid(x)
        return x
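Picking `n_layers` for a given input size comes down to the standard convolution output-size formula: each kernel-4, stride-2, padding-1 conv halves the spatial size, and the final kernel-4, stride-1, padding-0 conv maps a 4x4 feature map down to 1x1. A small sketch of that arithmetic (the formula is standard; applying it to the class above is my reading of the code):

```python
def conv_out(size: int, kernel=4, stride=2, padding=1) -> int:
    """Conv output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_trace(image_size: int, n_layers: int):
    """Spatial sizes after each conv in a Discriminator with n_layers layers."""
    sizes = [image_size]
    for _ in range(n_layers - 1):        # the stride-2 halving convs
        sizes.append(conv_out(sizes[-1]))
    sizes.append(conv_out(sizes[-1], stride=1, padding=0))  # final 4x4 -> 1x1
    return sizes

print(discriminator_trace(32, 4))   # [32, 16, 8, 4, 1]
print(discriminator_trace(64, 5))   # [64, 32, 16, 8, 4, 1]
```

So a 32-pixel image needs 4 layers, and each doubling of the image size needs one more.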

The Generator. The generator is essentially the reverse of the discriminator. It takes a vector of random values as noise and uses transposed convolutional layers to scale the noise up into an image. The more layers I have, the larger the end image.


class Generator(nn.Module):
    def __init__(self, ngpu, n_layers):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.n_layers = n_layers

        # makes the desired number of transposed convolutional layers
        self.layers = nn.ModuleList([nn.ConvTranspose2d(GEN_INPUT_SIZE, N_GEN_CHANNELS * 2, 4, 1, 0, bias=False)])
        self.layers.extend([nn.ConvTranspose2d(N_GEN_CHANNELS * 2, N_GEN_CHANNELS * 2, 4, 2, 1, bias=False)
                            for _ in range(self.n_layers - 3)])
        self.layers.extend([nn.ConvTranspose2d(N_GEN_CHANNELS * 2, N_GEN_CHANNELS, 4, 2, 1, bias=False),
                            nn.ConvTranspose2d(N_GEN_CHANNELS, N_CHANNELS, 4, 2, 1, bias=False)])

        # other transformations
        self.batch1 = nn.BatchNorm2d(N_GEN_CHANNELS)
        self.batch2 = nn.BatchNorm2d(N_GEN_CHANNELS * 2)
        self.relu = nn.ReLU(True)
        self.tanh = nn.Tanh()

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if layer.out_channels == N_GEN_CHANNELS * 2:
                x = self.batch2(x)
                x = self.relu(x)
            elif layer.out_channels == N_GEN_CHANNELS:
                x = self.batch1(x)
                x = self.relu(x)
            else:
                x = self.tanh(x)
        return x
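The generator runs the same arithmetic in reverse with the transposed-convolution formula: the first kernel-4, stride-1, padding-0 layer turns the 1x1 noise vector into a 4x4 map, and every later kernel-4, stride-2, padding-1 layer doubles it. Again, a sketch of my reading of the class above:

```python
def convT_out(size: int, kernel=4, stride=2, padding=1) -> int:
    """Transposed-conv output-size formula: (n - 1) * s - 2p + k."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_trace(n_layers: int):
    """Spatial sizes after each transposed conv in a Generator with n_layers layers."""
    sizes = [1, convT_out(1, stride=1, padding=0)]  # 1x1 noise -> 4x4
    for _ in range(n_layers - 1):                   # the stride-2 doubling layers
        sizes.append(convT_out(sizes[-1]))
    return sizes

print(generator_trace(4))   # [1, 4, 8, 16, 32]
print(generator_trace(6))   # [1, 4, 8, 16, 32, 64, 128]
```

With matching `n_layers`, the generator’s output size is exactly the input size the discriminator expects.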

Testing the Network

Before I dive into trying to generate abstract art, I first want to test my network to make sure things are set up correctly. To do this I’m going to run the network on a dataset of images from another GANs project and then see if I get similar results. The animeGAN project is a good fit for this use-case. For their project they used 143,000 images of anime characters’ faces to create a generator that makes new characters. After downloading their dataset, I ran my model for 100 epochs with a target image size of 32 pixels, and voila!


Results from my GAN model

The results are actually better than I expected. With these results, I’m confident that my network is set up correctly and I can move on to my dataset.


Training

Now it’s time to finally train the model on the art data. My initial image size is going to be a meager 32 pixels. I’ll train at this size for a while, after which I’ll add an additional layer to the generator and discriminator to double the image size to 64. Then it’s rinse and repeat until I get to a satisfactory image resolution. But how do I know when to progress to the next size? There’s a lot of work that’s been done on this question; I’m going to take the simple approach of training until I hit a GPU usage limit from Google and then manually checking the results. If they look like they need more time, I’ll wait a day (so the usage limit is lifted) and train another round.

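Putting the layer arithmetic together, each doubling step just means re-instantiating both networks with one more layer. For power-of-two sizes, the mapping works out to n_layers = log2(size) - 1 (this formula is my inference from the layer counts in the classes above, not something I stated explicitly in the code):

```python
from math import log2

def n_layers_for(size: int) -> int:
    """Layers needed so the nets produce/consume a size x size image."""
    assert size >= 8 and size & (size - 1) == 0, "size must be a power of two"
    return int(log2(size)) - 1

schedule = [32, 64, 128, 256]
print([(s, n_layers_for(s)) for s in schedule])
# [(32, 4), (64, 5), (128, 6), (256, 7)]
```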

Hello darkness my old friend

32 Pixel Results. My first set of results looks great. Not only is there no sign of mode collapse, the generator even replicated the fact that some of the source images include a frame.


Generated images at size 32

64 and 128 Pixel Results. The 64 pixel results also turned out pretty well; however, by the time I increased the size to 128 pixels I was starting to see signs of mode collapse in the generator results.


Starting to see identical output

256 Pixel Results. By the time I got to this image size, mode-collapse had reduced the results to only about 3 or 4 types of images. I suspect this may have to do with my limited dataset. By the time I got to this resolution I only had about 1000 images, and it’s possible that the generator is just mimicking a few of the images in that collection.


Mode collapse

Conclusion

In the end, my progressive GANs model didn’t progress very far. However, I’m still amazed by what a fairly simple network was able to create. It was shocking when it generated anime faces or when it placed some of its generated paintings in frames. I understand why people consider GANs one of the greatest machine learning breakthroughs of recent years. For now this was just my hello-world introduction to GANs, but I’ll probably be coming back.


Translated from: https://towardsdatascience.com/i-tried-and-failed-to-use-gans-to-create-art-but-it-was-still-worth-it-c392bcd29f39
