Which GPU for Deep Learning

Reposted from: http://timdettmers.com/2014/08/14/which-gpu-for-deep-learning/

It is again and again amazing to see how much speedup you get when you use GPUs for deep learning: Compared to CPUs 10x speedups are typical, but on larger problems one can achieve 20x speedups. With GPUs you can try out new ideas, algorithms and experiments much faster than usual and get almost immediate feedback as to what works and what does not. If you do not have a GPU and you are serious about deep learning you should definitely get one. But which one should you get? In this blog post I will guide you through the choices to the GPU which is best for you.

Having a fast GPU is a very important aspect when one begins to learn deep learning as this rapid gain in practical experience is key to building the expertise with which you will be able to apply deep learning to new problems. Without this rapid feedback it just takes too much time to learn from one’s mistakes and it can be discouraging and frustrating to go on with deep learning.

With GPUs I quickly learned how to apply deep learning on a range of Kaggle competitions and I managed to earn second place in the Partly Sunny with a Chance of Hashtags Kaggle competition, where the task was to predict weather ratings for a given tweet. In the competition I used a rather large two-layered deep neural network with rectified linear units and dropout for regularization, and this deep net barely fitted into my 6GB of GPU memory. More details on my approach can be found here.

Should I get multiple GPUs?

Excited by what deep learning can do with GPUs, I plunged myself into multi-GPU territory by assembling a small GPU cluster with a 40Gbit/s InfiniBand interconnect. I was thrilled to see whether even better results could be obtained with multiple GPUs.

I quickly found that it is not only very difficult to parallelize neural networks on multiple GPUs efficiently, but also that the speedup was only mediocre for dense neural networks. Small neural networks could be parallelized rather efficiently using data parallelism, but larger neural networks like the one I used in the Partly Sunny with a Chance of Hashtags Kaggle competition received almost no speedup.

Figure: Setup in my main computer: you can see three GTX Titans and an InfiniBand card. Is this a good setup for doing deep learning?

However, using model parallelism, I was able to train neural networks that were much larger and that had almost 3 billion connections. But to leverage these connections one needs much larger data sets, which are uncommon outside of large companies – I found such data when I trained a language model on the entire Wikipedia corpus, but that's about it.

On the other hand, one advantage of multiple GPUs is that you can run multiple algorithms or experiments separately on each GPU. You gain no speedups, but you get more information about your performance by using different algorithms or parameters at once. This is highly useful if your main goal is to gain deep learning experience as quickly as possible, and it is also very useful for researchers who want to try multiple versions of a new algorithm at the same time.

If you use deep learning only occasionally, or you use rather small data sets (smaller than say 10-15GB) and foremost dense neural networks, then multiple GPUs are probably not for you. However, when you use convolutional neural networks a lot, then multiple GPUs might still make sense.

Alex Krizhevsky released his new updated code which can run convolutional neural networks on up to four GPUs. Convolutional neural networks – unlike dense neural networks – can be run very efficiently on multiple GPUs because their use of weight sharing makes data parallelism very efficient. On top of that, Alex Krizhevsky’s implementation utilizes model parallelism for the densely connected final layers of the network which gives additional gains in speed.

However, if you want to program similar networks yourself, be aware that to program efficient multiple GPU networks is a difficult undertaking which will consume much more time than programming a simple network on one GPU.

So overall, one can say that one GPU should be sufficient for almost any task and that additional GPUs convey benefits only under very specific circumstances (mainly for very, very large data sets).

So what kind of GPU should I get? NVIDIA or AMD?

NVIDIA’s standard libraries made it very easy to establish the first deep learning libraries in CUDA, while there were no such powerful standard libraries for AMD’s OpenCL. Right now, there are just no good deep learning libraries for AMD cards – so NVIDIA it is. Even if some OpenCL libraries became available in the future, I would stick with NVIDIA: the GPU computing or GPGPU community is very large for CUDA and rather small for OpenCL. Thus, in the CUDA community, good open source solutions and solid advice for your programming are readily available.

Required memory size for simple neural networks

People often ask me if the GPUs with the largest memory are best for them, as this would enable them to run the largest neural networks. I thought like this when I bought my GPU, a GTX Titan with 6GB memory. And I also thought that this was a good choice when my neural network in the Partly Sunny with a Chance of Hashtags Kaggle competition barely fitted in my GPU memory. But later I found out that my neural network implementation was very memory inefficient and much less memory would have been sufficient – so sometimes the underlying code is limiting rather than the GPU. Although standard deep learning libraries are efficient, these libraries are often optimized for speed rather than for memory efficiency.

There is an easy formula for calculating the memory requirements of a simple neural network. The formula gives the memory requirement in GB for a standard implementation of a simple neural network with dropout and momentum/Nesterov/AdaGrad/RMSProp:

$$\mbox{Memory in GB} = 12\times 1024^{-3}\left(\left(\sum\limits_{i=0}^{\mbox{weights}} \mbox{rows}_i\times \mbox{columns}_i \right) + \mbox{batchsize}\sum\limits_{i=0}^{\mbox{layers}} \mbox{units}_i \right)$$
Memory formula: the number of units for the first layer is the dimensionality of the input. In words, this formula means: sum up the weight sizes and the unit counts separately; multiply the unit counts by the batch size; multiply everything by 4 for bytes and by another 3 for the momentum and gradient matrices in the first term, and the dropout and error matrices in the second term; then divide by 1024³ to get gigabytes.

For the Kaggle competition I used a 9000x4000x4000x32 network with batchsize 128 which uses up:
$$12\times 1024^{-3}\left(\left(9000\times 4000 + 4000\times 4000 + 4000 \times 32\right) + 128\left(9000+4000+4000+32\right)\right) \approx 0.62\mbox{ GB}$$
So this fits even into a small GPU with 1.5GBs memory. However, the world looks quite different when you use convolutional neural networks.
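To make this arithmetic easy to repeat for your own architecture, here is a minimal Python sketch of the formula above (the function name and structure are my own, not code from any particular library):

```python
# Memory estimate for a simple dense net with dropout and a momentum-style
# optimizer: 4 bytes per float32 value, times 3 matrices per term
# (weights/momentum/gradient and activations/dropout-mask/error).

def simple_net_memory_gb(layer_sizes, batch_size):
    """layer_sizes includes the input dimensionality, e.g. [9000, 4000, 4000, 32]."""
    weight_params = sum(rows * cols
                        for rows, cols in zip(layer_sizes[:-1], layer_sizes[1:]))
    unit_count = sum(layer_sizes)  # units of all layers, input included
    return 12 * (weight_params + batch_size * unit_count) / 1024**3

# The Kaggle network from above: roughly 0.6 GB, in line with the estimate.
print(simple_net_memory_gb([9000, 4000, 4000, 32], batch_size=128))
```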

Understanding the memory requirements of convolutional nets

In the last update I made some corrections for the memory requirement of convolutional neural networks, but the topic was still unclear and lacked definite advice on the situation.

I sat down and tried to build a similar formula as for simple neural networks, but the many parameters that vary from one net to another, and the implementations that vary from one net to another made such a formula too impractical – it would just contain too many variables and would be too large to give a quick overview of what memory size is needed. Instead I want to give a practical rule of thumb. But first let us dive into the question why convolutional nets require so much memory.

There are generally two types of convolutional implementations. One uses Fourier transforms, the other operates directly on image patches.

$$\mbox{featuremap}(\mathbf{x}) = \int\limits_{-\infty}^\infty \mbox{input}(\mathbf{x} - \mathbf{x_0})\,\mbox{kernel}(\mathbf{x_0})\, d\mathbf{x_0} = \sqrt{2\pi}\times \mbox{input}^\star\times\mbox{kernel}^\star$$
Continuous convolution theorem with abuse of notation: the input represents an image or feature map, and the subtraction of the argument can be thought of as creating image patches of width $\mathbf{x_0}$ with respect to some $\mathbf{x}$, which are then multiplied by the kernel. The integration turns into a multiplication in the Fourier domain; here $f^\star$ denotes a Fourier-transformed function. For discrete "dimensions" $\mathbf{x}$ we have a sum instead of an integral, but the idea is the same.

The mathematical operation of convolution can be described by a simple matrix multiplication in the Fourier frequency domain. So one can perform a fast Fourier transform on the inputs and on each kernel and multiply them to obtain feature maps – the outputs of a convolutional layer. During the backward pass we do an inverse fast Fourier transform to receive gradients in the standard domain so that we can update the weights. Ideally, we store all these Fourier transforms in memory to save the time of allocating the memory during each pass. This can amount to a lot of extra memory and this is the chunk of memory that is added for the Fourier method for convolutional nets – holding all this memory is just required to make everything run smoothly.
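A small numpy sketch of the Fourier method (my own illustration, not the code of any deep learning library) shows how convolution becomes an element-wise multiplication of transforms; the cached transforms of inputs and kernels are exactly the extra memory discussed above:

```python
import numpy as np
from scipy.signal import convolve2d  # direct convolution, used only as a check

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))    # one input feature map
kernel = rng.standard_normal((5, 5))     # one convolution kernel

# Zero-pad both to the full output size, transform, multiply, transform back.
out_shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
image_fft = np.fft.rfft2(image, out_shape)     # these transforms are what an
kernel_fft = np.fft.rfft2(kernel, out_shape)   # FFT-based net keeps in memory
feature_map = np.fft.irfft2(image_fft * kernel_fft, out_shape)

# The result matches direct convolution of the image with the kernel.
assert np.allclose(feature_map, convolve2d(image, kernel, mode='full'))
```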

The method that operates directly on image patches realigns memory for overlapping patches to allow contiguous memory access. This means that all memory addresses lie next to each other – there is no “skipping” of indices – and this allows much faster memory reads. One can imagine this operation as unrolling a square (or cubic) kernel window in the convolutional net into a single line, a single vector. Slow memory access is probably the thing that hurts an algorithm’s performance the most, and this prefetching and aligning of memory makes the convolution run much faster. There is much more going on in the CUDA code for this approach of calculating convolutions, but prefetching of inputs or pixels seems to be the main reason for the increased memory usage.
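A toy im2col-style sketch (again my own simplification, not the actual CUDA code) makes the memory trade-off visible: overlapping patches are copied out into one contiguous matrix, duplicating pixels so that the convolution becomes a single matrix multiply.

```python
import numpy as np

def im2col(image, k):
    """Unroll every k-by-k patch of a 2D image into one row of a contiguous matrix."""
    h, w = image.shape
    rows = [image[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    return np.array(rows)                       # shape: (num_patches, k*k)

image = np.arange(36, dtype=np.float64).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0                  # a simple box filter
patches = im2col(image, 3)                      # overlapping pixels are duplicated here
feature_map = (patches @ kernel.ravel()).reshape(4, 4)  # convolution as one matrix multiply
```

The `patches` matrix holds each interior pixel up to nine times, which is where the extra memory of this approach goes.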

Another memory-related issue, which is general for any convolutional net implementation, is that convolutional nets have small weight matrices due to weight sharing, but many more units than densely connected, simple neural networks. The result is that the second term in the equation for simple neural networks above is much larger for convolutional neural networks, while the first term is smaller. So this gives another intuition for where the memory requirements come from.

I hope this gives an idea about what is going on with memory in convolutional neural nets. Now let us have a look at what practical advice might look like.

Required memory size for convolutional nets

Now that we understand that implementations and architectures vary wildly for convolutional nets, we know that it might be best to look for other ways to gauge memory requirements.
In general, if you have enough memory, convolutional neural nets can be made nearly arbitrarily large for a given problem (dozens of convolutional layers). However, you will soon run into problems with overfitting, so that larger nets will not be any better than smaller ones. Therefore the data set size and the number of label classes might serve well as a gauge of how big you can make your net and, in turn, how large your GPU memory needs to be.

One example here is the Kaggle plankton detection competition. At first I thought about entering the competition as I might have a huge advantage through my 4-GPU system. I reasoned I might be able to train a very large convolutional net in a very short time – one thing that others cannot do because they lack the hardware. However, due to the small data set (about 50×50 pixels, 2 color channels, 400k training images; about 100 classes) I quickly found that overfitting was an issue even for small nets that neatly fit into one GPU and which are fast to train. So there was hardly a speed advantage of multiple GPUs and no advantage at all of having a large GPU memory.

If you look at the ImageNet competition, you have 1000 classes and over a million 250×250 images with three color channels – that’s more than 250GB of data. Here Alex Krizhevsky’s original convolutional net did not fit into a single GPU’s 3GB memory (but it did not use much more than 3GB). For ImageNet, 6GB will usually be sufficient for a competitive net. You can get slightly better results if you throw dozens of high-memory GPUs at the problem, but the improvement is only marginal compared to the resources that you need for that.

I think it is very likely that the next breakthrough in deep learning will be made on a single GPU by researchers who try new recipes, rather than by researchers who use GPU clusters and try variations of the same recipe. So if you are a researcher you should not fear a small memory size – a faster GPU, or multiple smaller GPUs, will often give you a better experience than a single large GPU.

Right now, there are not many relevant data sets that are larger than ImageNet, and thus for most people 6GB of GPU memory should be plenty; if you want to invest in a smaller GPU and are unsure whether 3GB or 4GB is okay, a good rule might be to compare the size of your problems to ImageNet and judge from that.

Overall, I think memory size is overrated. You can nicely gain some speedups if you have very large memory, but these speedups are rather small. I would say that GPU clusters are nice to have, but that they cause more overhead than they accelerate progress; a single 12GB GPU will last you for 3-6 years; a 6GB GPU is good for now; a 4GB GPU is good but might be limiting on some problems; and a 3GB GPU will be fine for most research that tests new architectures and algorithms on small data sets.

Fastest GPU for a given budget

Processing performance is most often measured in floating-point operations per second (FLOPS). This measure is often advertised in GPU computing and it is also the measure which determines which supercomputer enters the TOP500 list of the fastest supercomputers. However, this measure is misleading, as it measures processing power on problems that do not occur in practice.
It turns out that the most important practical measure for GPU performance is bandwidth in GB/s, which measures how much memory can be read and written per second. This is because almost all mathematical operations, such as dot product, sum, addition etcetera, are bandwidth bound, i.e. limited by the GB/s of the card rather than its FLOPS.
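A back-of-the-envelope sketch illustrates why an element-wise operation is bandwidth bound; the bandwidth and FLOPS figures below are rough numbers I assume for a GTX-980-class card, purely for illustration:

```python
bandwidth_bytes_per_s = 224e9   # assumed: ~224 GB/s memory bandwidth
flops_per_s = 4.6e12            # assumed: ~4.6 single-precision TFLOPS

n = 10**8                       # float32 elements in the update a = a + b
bytes_moved = 3 * 4 * n         # read a, read b, write a (4 bytes each)
flops = n                       # one addition per element

time_memory = bytes_moved / bandwidth_bytes_per_s   # ~5.4 ms just to move data
time_compute = flops / flops_per_s                  # ~0.02 ms of arithmetic
print(time_memory, time_compute)  # memory traffic dominates by a factor of ~250
```

For operations like this, GB/s predicts the runtime and the FLOPS rating barely matters.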

Figure: Comparison of bandwidth for CPUs and GPUs over time. Bandwidth is one of the main reasons why GPUs are faster for computing than CPUs are.

To determine the fastest GPU for a given budget one can use this Wikipedia page and look at the bandwidth in GB/s; the listed prices are quite accurate for newer cards (700 and 900 series), but older cards are significantly cheaper than the listed prices – especially if you buy those cards via eBay. Sometimes cryptocurrency mining benchmarks are also a reliable gauge of performance (you could gauge the performance of the GTX 980 nicely before there were any deep learning benchmarks). But beware: some cryptocurrency mining benchmarks are compute bound and thus are uninformative for deep learning performance.

Another important factor to consider, however, is that the Maxwell and Fermi architectures (Maxwell 900 series; Fermi 400 and 500 series) are quite a bit faster than the Kepler architecture (600 and 700 series); so, for example, the GTX 580 is faster than any GTX 600 series GPU. The new Maxwell GPUs are significantly faster than most Kepler GPUs, and you should prefer Maxwell if you have the money. If you cannot afford a GTX Titan X or a GTX 980, a 4GB GTX 960 or a GTX 680 from eBay will be cheap choices with no troubles. If you run primarily dense and recurrent neural networks a GTX 970 will be a very solid option, but if you use convolutional networks heavily its 3.5GB memory and its weird architecture (see below) will cause you a lot of trouble; a GTX 970 is not recommended in that case. Previously, I recommended a GTX 580 as a cheap solution, but I no longer favor this GPU due to the updated cuDNN library, which features fast implementations of convolution. In its new update, cuDNN is significantly faster, and it is to be expected that more and more libraries will integrate with cuDNN. The bad thing about the GTX 580 is that the card is too dated to be compatible with cuDNN, so I no longer recommend the GTX 580.

To give a rough estimate of how the cards perform with respect to each other on deep learning tasks:

GTX Titan X = GTX 980 Ti = 0.66 GTX 980 = 0.6 GTX 970 = 0.5 GTX Titan

GTX Titan X = 0.35 GTX 680 = 0.35 AWS GPU instance (g2.2 and g2.8) = 0.33 GTX 960
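If you want to turn these relative speeds into a rough cost-efficiency ranking, a few lines of Python are enough; the relative speeds come from the relations above (GTX Titan X = 1.0), while the prices are placeholder values I made up, so substitute current eBay or retail prices:

```python
relative_speed = {          # from the comparison above, GTX Titan X = 1.0
    "GTX Titan X": 1.0, "GTX 980 Ti": 1.0, "GTX 980": 0.66, "GTX 970": 0.6,
    "GTX Titan": 0.5, "GTX 680": 0.35, "GTX 960": 0.33,
}
price_usd = {               # hypothetical prices, for illustration only
    "GTX Titan X": 1000, "GTX 980 Ti": 650, "GTX 980": 500, "GTX 970": 330,
    "GTX Titan": 280, "GTX 680": 160, "GTX 960": 200,
}
# Rank cards by relative speed per dollar spent.
for card in sorted(price_usd, key=lambda c: relative_speed[c] / price_usd[c], reverse=True):
    print(card, round(1000 * relative_speed[card] / price_usd[card], 2))
```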

The 700 series is outdated, but the GTX Titan is still interesting as a cost-effective 6GB option. The GTX 970 might also be an option, but along with the GTX 580 there are some things you will need to consider.

Cheap but troubled

The GTX 970 is a special case you need to be aware of. The GTX 970 has a weird architecture which may cripple performance if more than 3.5GB is used and so it might be troublesome to use if you train large convolutional nets. The problem had been dramatized quite a bit in the news, but it turned out that the whole problem was not as dramatic for deep learning as the original benchmarks showed: If you do not go above 3.75GB, it will still be faster than a GTX 960.

To manage the memory problem, it would still be best if you learned to extend libraries with your own software routines that alert you when you hit the performance-decreasing memory zone above 3.5GB – if you do this, then the GTX 970 is an excellent, very cost-efficient card that is just limited to 3.5GB. You might get the GTX 970 even cheaper on eBay, where disappointed gamers – who do not have control over the memory routines – may sell it cheaply. Other people have made it clear to me that they think a recommendation for the GTX 970 is crazy, but if you are short on money this is just better than no GPU, and other cheap GPUs like the GTX 580 have troubles too.
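As a sketch of such an alerting routine (my own idea of how it could look, not code from any existing library), you could simply poll nvidia-smi and warn once usage crosses the 3.5GB boundary:

```python
import subprocess, time, warnings

THRESHOLD_MB = 3.5 * 1024  # the GTX 970's fast-memory boundary

def used_memory_mb(gpu_index=0):
    """Read the currently used GPU memory in MB via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"])
    return float(out.decode().splitlines()[gpu_index])

while True:                            # run this alongside your training job
    if used_memory_mb() > THRESHOLD_MB:
        warnings.warn("GTX 970: above 3.5GB, expect a large slowdown")
    time.sleep(5)
```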

The GTX 580 is now a bit dated and the important cuDNN library does not support it, but you can still use libraries like cuda-convnet 1 and deepnet for convolution, which are both excellent libraries, but you should be aware that you cannot use (or not fully) libraries like torch7. If you do not use convolutional nets at all, the GTX 580 might still be a good choice.

If you do not like all these troubles, then go with a 4GB GTX 960 or a GTX 680 from eBay for a cheap solution with no troubles.

Amazon Web Services (AWS) GPU instances

Amazon Web Services instances can be a good option if you lack the money for a dedicated machine. They are also a very good option if you need to run multiple small experiments. Be aware, however, that it is hardly possible to use multiple GPUs on a single deep learning architecture: the problem with AWS is that it uses specialized GPUs which support virtualization (the real machine has 8 GPUs in one computer; the virtualized machine only 1-4), and it is this virtualization which cripples the bandwidth between the GPUs. There are some patches which you can apply on the server to improve this, but they will not completely remove the problem either. In the end, most algorithms will run more slowly on multiple GPUs on an AWS instance compared to a single GPU. But this is only a minor issue, and I raise this point only so that you do not waste your time on parallelization on AWS.

The true beauty of AWS is spot instances: spot instances are very cheap virtual computers which you usually rent for a couple of hours to run an algorithm, and after you have completed your work you shut them down again. For about $1.5 you can rent an AWS GPU spot instance for two hours, in which you can easily run 4 experiments on MNIST concurrently and do a total of 160 experiments in those two hours. You can use the same time to run 8-12 full CIFAR-10 or CIFAR-100 experiments.

It will generally be difficult to run ImageNet on AWS GPU instances because they offer only 4GB of memory. With 4GB you will be able to run some models, but you will need more memory to run the newer, more successful architectures. Another difficulty is the data set size of ImageNet, for which you need to rent extra storage for your instance.

The usage of an AWS instance is quite simple, but it may take a couple of hours until you get used to the process of setting up an instance, logging into it and using it; after that it becomes easy to fire up some GPU instances to do some (small) deep learning work. Installing all the necessary programs on an AWS instance can be a pain, and this is why Amazon created a feature where you can launch operating systems with pre-installed software (rather than naked OSes). These pre-installed packages are called Amazon Machine Images (AMIs), and you can use any public AMI which is available in your selected region (you can always change the region at the top right of your AWS console). There are many different AMIs for deep learning where all deep learning software is already pre-installed (try Google or the option introduced below) – so a GPU instance with all deep learning software is essentially only two steps away!

Money-wise, AWS GPU instances will be quite cheap, not only for short experiments but also in the long run. The main disadvantages compared to a dedicated system are that AWS instances are much slower and will not allow you to work with large data sets easily.

Another problem can be the delay between the server and your computer, which can make working with an AWS instance a real pain (especially in the Asian region, so I have heard); another problem is that you only have a console to work with. However, all this hassle can be reduced somewhat with IPython and iTorch notebooks, from which you can execute theano and torch code in your browser as an interactive session which you can save and load. A manual on how to get IPython and iTorch notebooks working for deep learning on an AWS instance can be found here (the only problem is that you need to download cuDNN yourself).

In the developing world, even a cheap deep learning system can create big holes in one’s pocket, and this is the point where AWS GPU instances can help you out – if you have very little money, AWS will just be the best choice for you.

Another use-case is to use AWS GPU instances to run multiple experiments. While this use-case is not so typical for everyday deep learning work, it is a blessing if you really want to learn how to train deep learning architectures. If you grab CIFAR-10 or CIFAR-100 and run four different convolutional nets on a large GPU instance, you will very quickly get the hang of training convolutional nets successfully.

Conclusion

With all the information in this article you should be able to reason which GPU to choose by balancing the required memory size, the bandwidth in GB/s for speed, and the price of the GPU, and this reasoning will be solid for many years to come. But right now my recommendation is to get a GTX Titan X or GTX 980 if you have the money, a GTX 960 or GTX 680 from eBay for a cheap solution, and, if you are fine with their problems, a 3GB GTX 580 or a GTX 970 from eBay might be suitable. If you need cheap memory for large convolutional nets, a 6GB GTX Titan from eBay will be good (a Titan Black if you have the money). If you have very little money then AWS GPU spot instances will be the best choice.

TL;DR advice

Best GPU overall: GTX Titan X
Cost efficient but expensive: GTX Titan X, GTX 980, GTX 980 Ti
Cost efficient but troubled: GTX 580 3GB (lacks software support) or GTX 970 (has memory problem)
Cheapest card with no troubles: GTX 960 4GB or GTX 680
I work with data sets > 250GB: GTX Titan, GTX 980 Ti or GTX Titan X
I have little money: GTX 680 3GB eBay
I have almost no money: AWS GPU spot instance
I do Kaggle: GTX 980 or GTX 960 4GB
I am a researcher: 1-4x GTX Titan X
I want to build a GPU cluster: This is really complicated, you can get some ideas here
I started deep learning and I am serious about it: Start with one GTX 680, GTX 980, or GTX 970 and buy more of those as you feel the need for them; save money for Pascal GPUs in 2016 Q2/Q3 (they will be much faster than current GPUs)

Update 2014-09-28: Added emphasis for memory requirement of CNNs
Update 2015-02-23: Updated GPU recommendations and memory calculations
Update 2015-03-16: Updated GPU recommendations: GTX 970 and GTX 580
Update 2015-04-22: GTX 580 no longer recommended; added performance relationships between cards
Update 2015-08-20: Added section for AWS GPU instances; added GTX 980 Ti to the comparison relation

Acknowledgements

I want to thank Mat Kelcey for helping me to debug and test custom code for the GTX 970; I want to thank Sander Dieleman for making me aware of the shortcomings of my GPU memory advice for convolutional nets; I want to thank Hannes Bretschneider for pointing out software dependency problems for the GTX 580; and I want to thank Oliver Griesel for pointing out notebook solutions for AWS instances.

[Image source: http://docs.nvidia.com/cuda/cuda-c-programming-guide/#axzz3AI18t18Z]

