Hardware Configuration Needed for Deep Learning (repost)

A Full Hardware Guide to Deep Learning

Reposted from: http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/

Deep Learning is very computationally intensive, so you will need a fast CPU with many cores, right? Or is it maybe wasteful to buy a fast CPU? One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.

In my work on parallelizing deep learning I built a GPU cluster for which I needed to make careful hardware selections. Despite careful research and reasoning I made my fair share of mistakes when I selected the hardware parts which often became clear to me when I used the cluster in practice. Here I want to share what I have learned so you will not step into the same traps as I did.

GPU

This blog post assumes that you will use a GPU for deep learning. If you are building or upgrading your system for deep learning, it is not sensible to leave out the GPU. The GPU is the heart of deep learning applications – the improvement in processing speed is simply too huge to ignore.

I talked at length about GPU choice in my previous blog post, and the choice of your GPU is probably the most critical choice for your deep learning system. Generally, I recommend a GTX 680 from eBay if you lack money, a GTX Titan X (if you have the money; for convolution) or a GTX 980 (very cost effective; a bit limited for very large convolutional nets) for the best current GPUs, and a GTX Titan from eBay if you need cheap memory. I recommended the GTX 580 before, but due to new updates to the cuDNN library which increase the speed of convolution dramatically, all GPUs that do not support cuDNN have become obsolete – and the GTX 580 is such a GPU. If you do not use convolutional nets at all, however, the GTX 580 is still a solid choice.

[Image: Suspect line-up] Can you identify the hardware part which is at fault for bad performance? One of these GPUs? Or maybe it is the fault of the CPU after all?

CPU

To be able to make a wise choice for the CPU we first need to understand the CPU and how it relates to deep learning. What does the CPU do for deep learning? The CPU does little computation when you run your deep nets on a GPU, but your CPU does still work on these things:

  • Writing and reading variables in your code
  • Executing instructions such as function calls
  • Initiating function calls on your GPU
  • Creating mini-batches from data
  • Initiating transfers to the GPU

Needed number of CPU cores

When I train deep neural nets with three different libraries I always see that one CPU thread is at 100% (and sometimes another thread will fluctuate between 0 and 100% for some time). And this immediately tells you that most deep learning libraries – and in fact most software applications in general – just use a single thread. This means that multi-core CPUs are rather useless. If you run multiple GPUs however and use parallelization frameworks like MPI, then you will run multiple programs at once and you will need multiple threads also. You should be fine with one thread per GPU, but two threads per GPU will result in better performance for most deep learning libraries; these libraries run on one core, but sometimes call functions asynchronously for which a second CPU thread will be utilized. Remember that many CPUs can run multiple threads per core (that is true especially for Intel CPUs), so that one core per GPU will often suffice.

CPU and PCI-Express

It’s a trap! Some new Haswell CPUs do not support the full 40 PCIe lanes that older CPUs support – avoid these CPUs if you want to build a system with multiple GPUs. Also make sure that your processor actually supports PCIe 3.0 if you have a motherboard with PCIe 3.0.

CPU cache size

As we shall see later, CPU cache size is rather irrelevant further along the CPU-GPU pipeline, but I included a short analysis section anyway so that every possible bottleneck along this pipeline is considered and we get a thorough understanding of the overall process.

CPU cache is often ignored when people buy a CPU, but generally it is a very important piece in the overall performance puzzle. The CPU cache is a very small amount of on-chip memory, very close to the CPU, which can be used for high speed calculations and operations. A CPU often has a hierarchy of caches, which range from small, fast caches (L1, L2) to slow, large caches (L3, L4). As a programmer, you can think of the cache as a hash table, where every entry is a key-value pair on which you can do very fast lookups by key: if the key is found, one can perform fast read and write operations on the value in the cache; if the key is not found (a cache miss), the CPU has to wait for the RAM to catch up and then reads the value from there – a very slow process. Repeated cache misses result in significant decreases in performance. Efficient CPU caching procedures and architectures are often critical to CPU performance.

How the CPU determines its caching procedure is a very complex topic, but generally one can assume that variables, instructions, and RAM addresses that are used repeatedly will stay in the cache, while less frequent items do not.

In deep learning, the same memory is read repeatedly for every mini-batch before it is sent to the GPU (the memory is just overwritten), but whether that memory can be stored in the cache depends on the mini-batch size. For a mini-batch size of 128, we have 0.4 MB and 1.5 MB for MNIST and CIFAR, respectively, which will fit into most CPU caches; for ImageNet, we have more than 85 MB per mini-batch ($4\times 128\times 244^2\times 3\times 1024^{-2}$), which is much too large even for the largest cache (L3 caches are limited to a few MB).
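To make these numbers concrete, here is a minimal sketch of the arithmetic, assuming 32-bit floats (4 bytes per value) and the input dimensions used above (28x28x1 for MNIST, 32x32x3 for CIFAR, 244x244x3 for ImageNet):

    // Rough mini-batch sizes in MB for a batch size of 128, assuming 4-byte floats.
    #include <cstdio>

    int main() {
        const double bytes = 4.0, batch = 128.0, MB = 1024.0 * 1024.0;
        double mnist    = bytes * batch * 28  * 28  * 1 / MB;  // ~0.4 MB
        double cifar    = bytes * batch * 32  * 32  * 3 / MB;  // ~1.5 MB
        double imagenet = bytes * batch * 244 * 244 * 3 / MB;  // ~87 MB
        printf("MNIST %.1f MB, CIFAR %.1f MB, ImageNet %.1f MB\n", mnist, cifar, imagenet);
        return 0;
    }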

Because data sets in general are too large to fit into the cache, new data need to be read from the RAM for each new mini-batch – so there will be a constant need to access the RAM either way.

RAM memory addresses stay in the cache (the CPU can perform fast lookups in the cache which point to the exact location of the data in RAM), but this is only true if your whole data set fits into your RAM, otherwise the memory addresses will change and there will be no speed up from caching (one might be able to prevent that when one uses pinned memory, but as you shall see later, it does not matter anyway).

Other pieces of deep learning code – like variables and function calls – will benefit from the cache, but these are generally few in number and fit easily into the small and fast L1 cache of almost any CPU.

From this reasoning it is sensible to conclude that CPU cache size should not really matter, and the further analysis in the next sections is consistent with this conclusion.

Needed CPU clock rate (frequency)

When people think about fast CPUs they usually first think about the clock rate. 4 GHz is better than 3.5 GHz, or is it? This is generally true for comparing processors with the same architecture, e.g. “Ivy Bridge”, but it does not compare well between processors of different architectures, and it is not always the best measure of performance.

In the case of deep learning there is very little computation to be done by the CPU: Increase a few variables here, evaluate some Boolean expression there, make some function calls on the GPU or within the program – all these depend on the CPU core clock rate.

While this reasoning seems sensible, there is the fact that the CPU has 100% usage when I run deep learning programs, so what is the issue here? I did some CPU core rate underclocking experiments to find out.

[Figure] CPU underclocking on MNIST and ImageNet: performance is measured as time taken for 200 epochs of MNIST or a quarter epoch of ImageNet at different CPU core clock rates, where the maximum clock rate is taken as the baseline for each CPU. For comparison: upgrading from a GTX 680 to a GTX Titan is about +15% performance; from a GTX Titan to a GTX 980 another +20%; GPU overclocking yields about +5% performance for any GPU.

So why is the CPU usage at 100% when the CPU core clock rate is rather irrelevant? The answer might be CPU cache misses: the CPU is constantly busy accessing the RAM, but at the same time it has to wait for the RAM to catch up with its slower clock rate, and this might result in a paradoxical busy-while-waiting state. If this is true, then underclocking the CPU core would not result in dramatic decreases in performance – just like the results you see above.

The CPU also performs other operations, like copying data into mini-batches, and preparing data to be copied to the GPU, but these operations depend on the memory clock rate and not the CPU core clock rate. So now we look at the memory.

Needed RAM clock rate

CPU-RAM and other interactions with the RAM are quite complicated. I will show a simplified version of the process here. Let's dive in and dissect this process from CPU RAM to GPU RAM for a more thorough understanding.

The CPU memory clock and RAM are intertwined. The memory clock of your CPU determines the maximum clock rate of your RAM, and together they determine the overall memory bandwidth of your CPU; but usually the RAM itself determines the overall available bandwidth, because it can be slower than the CPU memory clock. You can determine the bandwidth like this:

$$\text{bandwidth in GB/s} = \text{RAM clock in GHz} \times \text{memory channels of CPU} \times 64 \times 8^{-1}$$

where the 64 is for a 64-bit CPU architecture. For my processors and RAM modules the bandwidth is 51.2 GB/s.
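As a quick sanity check of this formula, here is a small sketch with assumed numbers – DDR3-1600 (an effective 1.6 GHz RAM clock) on a quad-channel CPU – which is one combination that reproduces the 51.2 GB/s figure above:

    // Peak CPU-RAM bandwidth from the formula above (assumed example values).
    #include <cstdio>

    int main() {
        double ram_clock_ghz   = 1.6;  // effective clock of DDR3-1600 (assumption)
        double memory_channels = 4.0;  // quad-channel CPU (assumption)
        double bandwidth_gbs   = ram_clock_ghz * memory_channels * 64.0 / 8.0;
        printf("bandwidth = %.1f GB/s\n", bandwidth_gbs);  // prints 51.2 GB/s
        return 0;
    }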

However, the bandwidth is only relevant if you copy large amounts of data. Usually the timings – for example 8-8-8 – on your RAM are more relevant for small pieces of data and determine how long your CPU has to wait for your RAM to catch up. But as I outlined above, almost all data from your deep learning program will either easily fit into the CPU cache, or will be much too large to benefit from caching. This implies that timings will be rather unimportant and that bandwidth might be important.

So how does this relate to deep learning programs? I just said that bandwidth might be important, but this is not so when we look at the next step in the process. The memory bandwidth of your RAM determines how fast a mini-batch can be overwritten and allocated for initiating a GPU transfer, but the next step, CPU-RAM-to-GPU-RAM, is the true bottleneck – this step makes use of direct memory access (DMA). As noted above, the memory bandwidth of my RAM modules is 51.2 GB/s, but the DMA bandwidth is only 12 GB/s!

The DMA bandwidth relates to the regular bandwidth, but the details are unnecessary and I will just refer you to this Wikipedia entry, in which you can look up the DMA bandwidth for RAM modules (peak transfer limit). But let's have a look at how DMA works.

Direct memory access (DMA)

The CPU with its RAM can only communicate with a GPU through DMA. In the first step, a specific DMA transfer buffer is reserved in both CPU RAM and GPU RAM; in the second step, the CPU writes the requested data into the CPU-side DMA buffer; in the third step, the reserved buffer is transferred to your GPU RAM without any help from the CPU. Your PCIe bandwidth is 8 GB/s (PCIe 2.0) or 15.75 GB/s (PCIe 3.0), so you should get RAM with a good peak transfer limit as determined above, right?

Not necessarily. Software plays a big role here. If you do some transfers in a clever way, you can get away with cheaper, slower memory. Here is how.

Asynchronous mini-batch allocation

Once your GPU has finished computation on the current mini-batch, it wants to immediately work on the next mini-batch. You can, of course, initiate a DMA transfer now and then wait for the transfer to complete so that your GPU can continue to crunch numbers. But there is a much more efficient way: prepare the next mini-batch in advance so that your GPU does not have to wait at all. This can be done easily and asynchronously with no degradation in GPU performance.

[Figure] CUDA code for asynchronous mini-batch allocation: the first two calls are made when the GPU starts on the current batch; the last two calls are made when the GPU has finished with the current batch. The transfer of the data will be completed long before the stream is synchronized in the second step, so there will be no delay before the GPU begins with the next batch.
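Since the original code is only shown as an image, here is a minimal sketch of what such asynchronous allocation can look like in CUDA; the buffer names and sizes are illustrative assumptions, but the calls (pinned host memory, cudaMemcpyAsync on a dedicated stream, cudaStreamSynchronize) are the standard way to get DMA transfers that overlap with GPU computation:

    // Minimal sketch of asynchronous mini-batch transfer (illustrative names and sizes).
    #include <cuda_runtime.h>

    float *h_batch, *d_batch;                       // pinned host buffer and device buffer
    cudaStream_t copy_stream;
    const size_t batch_bytes = 4UL * 128 * 244 * 244 * 3;  // one ImageNet mini-batch, ~87 MB

    void setup() {
        cudaMallocHost((void**)&h_batch, batch_bytes);  // pinned CPU-side DMA buffer
        cudaMalloc((void**)&d_batch, batch_bytes);      // GPU-side buffer
        cudaStreamCreate(&copy_stream);
    }

    // Call as soon as the GPU starts crunching the *current* batch:
    void start_next_batch_transfer() {
        // ... fill h_batch with the next mini-batch on the CPU ...
        cudaMemcpyAsync(d_batch, h_batch, batch_bytes,
                        cudaMemcpyHostToDevice, copy_stream);  // DMA, no CPU involvement
    }

    // Call right before the GPU needs the *next* batch:
    void wait_for_next_batch() {
        cudaStreamSynchronize(copy_stream);         // usually returns immediately
    }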

An ImageNet 2012 mini-batch of size 128 for Alex Krizhevsky's convolutional net takes 0.35 seconds for a full backprop pass. Can we allocate the next batch in this time?

If we take the batch size to be 128 and the dimensions of the data to be 244x244x3, that is a total of roughly 0.085 GB ($4\times 128\times 244^2\times 3\times 1024^{-3}$). With ultra-slow memory we have 6.4 GB/s, or in other terms 75 mini-batches per second! So with asynchronous mini-batch allocation even the slowest RAM will be more than sufficient for deep learning. There is no advantage in buying faster RAM modules if you use asynchronous mini-batch allocation.
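A quick check of those numbers, with 6.4 GB/s taken as an assumed worst-case transfer rate:

    // One ImageNet mini-batch at 32-bit precision versus a slow 6.4 GB/s transfer rate.
    #include <cstdio>

    int main() {
        double batch_gb   = 4.0 * 128 * 244 * 244 * 3 / (1024.0 * 1024.0 * 1024.0);  // ~0.085 GB
        double slow_gbs   = 6.4;                  // assumed ultra-slow memory
        double per_second = slow_gbs / batch_gb;  // ~75 mini-batches per second
        printf("%.3f GB per batch, %.0f batches per second\n", batch_gb, per_second);
        return 0;
    }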

This procedure also implies indirectly that the CPU cache is irrelevant. It does not really matter how fast your CPU can overwrite (in the fast cache) and prepare (write the cache to RAM) a mini-batch for a DMA transfer, because the whole transfer will be completed long before your GPU requests the next mini-batch – so a large cache really does not matter much.

So the bottom line is really that the RAM clock rate is irrelevant. Buy what is cheap – end of story.

But how much should you buy?

RAM size

You should have at least the same RAM size as your GPU has. You could work with less RAM, but you might need to transfer data step by step. From my experience however, it is much more comfortable to work with more RAM.

Psychology tells us that concentration is a resource that is depleted over time. RAM is one of the few hardware pieces that allows you to conserve your concentration resource for more difficult programming problems. Rather than spending lots of time circumnavigating RAM bottlenecks, with more RAM you can invest your concentration in more pressing matters, save time, and increase productivity. Especially in Kaggle competitions I found additional RAM very useful for feature engineering. So if you have the money and do a lot of pre-processing, then additional RAM might be a good choice.

Hard drive/SSD

A hard drive can be a significant bottleneck in some cases for deep learning. If your data set is large you will typically have some of it on your SSD/hard drive, some of it in your RAM, and two mini-batches in your GPU RAM. To feed the GPU constantly, we need to provide new mini-batches at the same rate as the GPU can go through them.

For this to be true we need to use the same idea as asynchronous mini-batch allocation. We need to read files with multiple mini-batches asynchronously – this is really important! If we do not do this asynchronously, we will cripple performance by quite a bit (5-10%) and render carefully crafted advantages in hardware useless – good deep learning software will run faster on a GTX 680 than bad deep learning software on a GTX 980.

With this in mind, in the case of Alex's ImageNet convolutional net we have 0.085 GB ($4\times 128\times 244^2\times 3\times 1024^{-3}$) every 0.3 seconds, or 290 MB/s if we save the data as 32-bit floating point data. If we instead save it as jpeg data, we can compress it 5-15 fold, bringing down the required read bandwidth to about 30 MB/s. If we look at hard drive speeds we typically see 100-150 MB/s, so this will be sufficient for data compressed as jpeg. Similarly, one can use mp3 or other compression techniques for sound files, but for data sets that deal with raw 32-bit floating point data it is not possible to compress the data so well: we can compress 32-bit floating point data by only 10-15%. So if you have large 32-bit data sets, then you will definitely need an SSD, as hard drives with a speed of 100-150 MB/s will be too slow to keep up with your GPU – so if you work with such data get an SSD, otherwise a hard drive will be fine.
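The required read bandwidth again, as a small sketch with an assumed middle-of-the-road 10x jpeg compression ratio:

    // Required read bandwidth: one ~87 MB mini-batch every 0.3 seconds.
    #include <cstdio>

    int main() {
        double batch_mb = 4.0 * 128 * 244 * 244 * 3 / (1024.0 * 1024.0);  // ~87 MB
        double raw_mbs  = batch_mb / 0.3;    // ~290 MB/s for raw 32-bit floats
        double jpeg_mbs = raw_mbs / 10.0;    // ~29 MB/s with an assumed 10x jpeg compression
        printf("raw: %.0f MB/s, jpeg: %.0f MB/s\n", raw_mbs, jpeg_mbs);
        return 0;
    }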

Many people buy an SSD for comfort: programs start and respond more quickly, and pre-processing with large files is quite a bit faster, but for deep learning it is only required if your input dimensions are high and you cannot compress your data sufficiently.

If you buy an SSD you should get one which is able to hold data sets of the sizes you typically work with, plus a few tens of GB of extra space. It is also a good idea to get a hard drive to store your unused data sets on.

Power supply unit (PSU)

Generally, you want a PSU that is sufficient to accommodate all your future GPUs. GPUs typically get more energy efficient over time; so while other components will need to be replaced, a PSU should last a long while, and a good PSU is therefore a good investment.

You can calculate the required wattage by adding up the wattage of your CPU and GPUs, with an additional 100-300 watts for other components and as a buffer for power spikes.
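For example, a minimal sketch of this calculation with assumed part wattages (four 250 W GPUs, a 140 W CPU, and a 200 W buffer):

    // Rough PSU sizing with assumed example wattages.
    #include <cstdio>

    int main() {
        double gpus   = 4 * 250.0;   // four 250 W GPUs (assumption)
        double cpu    = 140.0;       // CPU TDP (assumption)
        double buffer = 200.0;       // anywhere in the 100-300 W range
        printf("required PSU wattage: %.0f W\n", gpus + cpu + buffer);  // 1340 W
        return 0;
    }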

One important thing to be aware of is whether the PCIe connectors of your PSU are able to supply an 8-pin plus a 6-pin connector on one cable. I bought a PSU which had 6x PCIe ports, but which was only able to power either an 8-pin or a 6-pin connector per cable, so I could not run 4 GPUs with that PSU.

Another important thing is to buy a PSU with a high power efficiency rating – especially if you run many GPUs and will run them for a long time.

Running a 4 GPU system at full power (1000-1500 watts) to train a convolutional net for two weeks will amount to 300-500 kWh, which in Germany – with rather high power costs of 20 cents per kWh – will amount to 60-100€ ($66-111). If this price is for one hundred percent efficiency, then training such a net with an 80% efficient power supply would increase the costs by an additional 18-26€ – ouch! This is much less for a single GPU, but the point still holds – spending a bit more money on an efficient power supply makes good sense.
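The same arithmetic, written out as a small sketch:

    // Electricity cost of a two-week training run at 0.20 EUR/kWh,
    // and the extra cost of an 80% efficient PSU.
    #include <cstdio>

    int main() {
        const double hours = 14 * 24.0;
        const double watt_options[] = {1000.0, 1500.0};
        for (double watts : watt_options) {
            double kwh   = watts * hours / 1000.0;   // 336-504 kWh
            double cost  = kwh * 0.20;               // 67-101 EUR at 100% efficiency
            double extra = cost * (1.0 / 0.8 - 1.0); // ~17-25 EUR more at 80% efficiency
            printf("%4.0f W: %.0f kWh, %.0f EUR, +%.0f EUR at 80%% efficiency\n",
                   watts, kwh, cost, extra);
        }
        return 0;
    }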

Cooling

Cooling is important and it can be a significant bottleneck – one which reduces performance more than poor hardware choices do. You should be fine with a standard heat sink for your CPU, but for your GPU you will need to make special considerations.

Modern GPUs will increase their speed – and thus power consumption – up to their maximum when they run an algorithm, but as soon as the GPU hits a temperature barrier – often 80 °C – the GPU will decrease the speed so that the temperature threshold is not breached. This enables best performance while keeping your GPU safe from overheating.

However, typical pre-programmed schedules for fan speeds are badly designed for deep learning programs, so that this temperature threshold is reached within seconds after starting a deep learning program. The result is decreased performance (a few percent), which can be significant for multiple GPUs (10-25%) where each GPU heats up the GPUs next to it.

Since NVIDIA GPUs are first and foremost gaming GPUs, they are optimized for Windows. You can change the fan schedule with a few clicks in Windows, but not so in Linux, and as most deep learning libraries are written for Linux this is a problem.

The easiest and most cost efficient work-around is to flash your GPU with a new BIOS which includes a new, more reasonable fan schedule that keeps your GPU cool and the noise level at an acceptable threshold (if you use a server, you can crank the fan speed to maximum, which is otherwise not really bearable noise-wise). You can also overclock your GPU memory by a few MHz (30-50) and this is very safe to do. The software for flashing the BIOS is designed for Windows, but you can use wine to call that program from your Linux/Unix OS.

The other option is to set a configuration for your Xorg server (Ubuntu) where you set the option “coolbits”. This works very well for a single GPU, but if you have multiple GPUs where some of them are headless, i.e. they have no monitor attached to them, you have to emulate a monitor, which is hard and hacky. I tried it for a long time and had frustrating hours with a live boot CD to recover my graphics settings – I could never get it running properly on headless GPUs.

Another, more costly, and craftier option is to use water cooling. For a single GPU, water cooling will nearly halve your temperatures even under maximum load, so that the temperature threshold is never reached. Even multiple GPUs stay cool which is rather impossible when you cool with air. Another advantage of water cooling is that it operates much more silently, which is a big plus if you run multiple GPUs in an area where other people work. Water cooling will cost you about $100 for each GPU and some additional upfront costs (something like $50). Water cooling will also require some additional effort to assemble your computer, but there are many detailed guides on that and it should only require a few more hours of time in total. Maintenance should not be that complicated or effortful.

From my experience these are the most relevant points. I bought large towers for my deep learning cluster, because they have additional fans for the GPU area, but I found this to be largely irrelevant: about a 2-5 °C decrease, not worth the investment and the bulkiness of the cases. The most important part is really the cooling solution directly on your GPU – flash your BIOS, use water cooling, or live with a decrease in performance – these are all reasonable choices in certain situations. Just think about what you want in your situation and you will be fine.

Motherboard and computer case

Your motherboard should have enough PCIe ports to support the number of GPUs you want to run (usually limited to four GPUs, even if you have more PCIe slots); remember that most GPUs have a width of two PCIe slots, so you will need 7 slots to run 4 GPUs, for example. PCIe 2.0 is okay for a single GPU, but PCIe 3.0 is quite cost-effective even for a single GPU; for multiple GPUs always buy PCIe 3.0 boards, which will be a boon when you do multi-GPU computing, as the PCIe connection will be the bottleneck here.

The motherboard choice is straightforward: Just pick a motherboard that supports the hardware components that you want.

When you select a case, you should make sure that it supports full length GPUs that sit on top of your motherboard. Most cases support full length GPUs, but you should be suspicious if you buy a small case. Check its dimensions and specifications; you can also try a google image search of that model and see if you find pictures with GPUs in them.

Monitors

I first thought it would be silly to write about monitors also, but they make such a huge difference and are so important that I just have to write about them.

The money I spent on my three 27-inch monitors is probably the best money I have ever spent. Productivity goes up by a lot when using multiple monitors. I feel desperately crippled if I have to work with a single monitor. Do not short-change yourself on this matter. What good is a fast deep learning system if you are not able to operate it in an efficient manner?

[Image] Typical monitor layout when I do deep learning: left: papers, Google searches, gmail, stackoverflow; middle: code; right: output windows, R, folders, system monitors, GPU monitors, to-do list, and other small applications.

Some words on building a PC

Many people are scared to build computers. The hardware components are expensive and you do not want to do something wrong. But it is really simple, as components that do not belong together do not fit together. The motherboard manual is often very specific about how to assemble everything, and there are tons of guides and step-by-step videos which walk you through the process if you have no experience.

The great thing about building a computer is that you know everything there is to know about it once you have done it a single time, because all computers are built in the very same way – so building a computer will become a life skill that you will be able to apply again and again. So there is no reason to hold back!

Conclusion / TL;DR

GPU: GTX 680 or GTX 960 (no money); GTX 980 (best performance); GTX Titan (if you need memory); GTX 970 (no convolutional nets)

CPU: Two threads per GPU; full 40 PCIe lanes and correct PCIe spec (same as your motherboard); > 2GHz; cache does not matter;

RAM: Use asynchronous mini-batch allocation; clock rate and timings do not matter; buy at least as much CPU RAM as you have GPU RAM;

Hard drive/SSD: Use asynchronous batch-file reads and compress your data if you have image or sound data; a hard drive will be fine unless you work with 32 bit floating point data sets with large input dimensions

PSU: Add up watts of GPUs + CPU + (100-300) for required power; get high efficiency rating if you use large conv nets; make sure it has enough PCIe connectors (6+8pins) and watts for your (future) GPUs

Cooling: Set coolbits flag in your config if you run a single GPU; otherwise flashing BIOS for increased fan speeds is easiest and cheapest; use water cooling for multiple GPUs and/or when you need to keep down the noise (you work with other people in the same room)

Motherboard: Get PCIe 3.0 and as many slots as you need for your (future) GPUs (one GPU takes two slots; max 4 GPUs per system)

Monitors: If you want to upgrade your system to be more productive, it might make more sense to buy an additional monitor rather than upgrading your GPU
