Choosing a Deep Learning Machine

Description | Quantity | Unit Price | Amount
Intel® Core™ i7-6700K Processor (Skylake, 8M Cache, Socket LGA1151, 14nm, unlocked for overclocking, 4 cores / 8 threads, TDP 91W, Gen9 graphics, up to 4.20 GHz) | 1 | |
ANTEC TPC750 TruePower Classic 750W 80Plus Gold power supply | 1 | |
ASUS Z170-PRO GAMING (Z170, DDR4, LGA1151, Intel GbE LAN, USB 3.1, SATA3 6Gb/s, ATX M/B) | 1 | |
Kingston HyperX Fury HX424C15FB2K2/16 DDR4 2400MHz, 16GB Kit (2x8GB) | 2 | |
Toshiba DT01ACA300 3TB SATA3 6Gb/s, 64MB cache HDD | 2 | |
Samsung 850 EVO Series MZ-75E1T0BW 1TB 2.5" SATA3 6Gb/s SSD, 7mm | 1 | |
GeForce GTX Titan X Pascal | 1 | |
Cooler Master Hyper 212X CPU fan | 1 | |
NZXT Tempest 410 Elite T410E-001 (Black) ATX Tower Case | 1 | |

https://www.microway.com/knowledge-center-articles/comparison-of-nvidia-geforce-gpus-and-nvidia-tesla-gpus/
This resource was prepared by Microway from data provided by NVIDIA.

All NVIDIA GPUs support general-purpose computation (GPGPU), but not all GPUs offer the same performance or support the same features. The consumer line of GeForce GPUs (GTX Titan X, in particular) may be attractive to those running GPU-accelerated applications. However, it’s wise to keep in mind the differences between the products. There are many features only available on the professional line of Tesla GPUs.

64-bit (Double Precision) Floating Point Calculations

Many applications require higher-accuracy mathematical calculations. In these applications, data is represented by values that are twice as large (using 64 binary bits instead of 32 bits). These larger values are called double-precision (64-bit). Less accurate values are called single-precision (32-bit).
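The gap between the two formats is easy to see even without a GPU. The following pure-Python sketch uses `struct` to round values to 32-bit precision, showing a small update that vanishes in single precision but survives in double precision:

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# float32 keeps a 24-bit significand. Representing 1e8 needs 27 bits,
# so adding 1.0 falls below the last representable bit and rounds away.
lost = to_f32(to_f32(1e8) + 1.0) - 1e8

# In double precision (53-bit significand) the same update survives.
kept = (1e8 + 1.0) - 1e8

print(lost)  # 0.0, the single-precision update vanished
print(kept)  # 1.0, double precision preserved it
```

Errors like this accumulate over the billions of operations in a long simulation, which is why double-precision throughput matters for high-accuracy workloads.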

Although almost all NVIDIA GPU products support both single- and double-precision calculations, the performance for double-precision values is significantly lower on the consumer-level GeForce GPUs. Here is a comparison of the double-precision floating-point calculation performance between a GeForce and a Tesla:

NVIDIA GPU Model | Double-Precision (64-bit) Floating-Point Performance
GeForce GTX 1080 | up to 0.277 TFLOPS
GeForce GTX Titan X (Maxwell) | up to 0.206 TFLOPS
GeForce GTX Titan X (Pascal) | up to 0.343 TFLOPS
Tesla K80 | 1.87+ TFLOPS
Tesla P100 | 4.7+ TFLOPS

Some earlier versions of the GeForce GTX Titan products did support up to 1.3 TFLOPS double-precision calculations. However, those models were discontinued in 2014.
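Figures like these follow from simple arithmetic: peak FLOPS is core count × 2 (one fused multiply-add per cycle) × clock speed × the chip's FP64-to-FP32 throughput ratio. A sketch, assuming commonly cited Titan X Pascal specs (3584 CUDA cores, roughly 1.53 GHz boost clock, 1/32 FP64 rate):

```python
def peak_tflops(cuda_cores, boost_clock_ghz, precision_ratio=1.0):
    """Theoretical peak TFLOPS: cores x 2 ops/cycle (FMA) x clock x precision ratio."""
    return cuda_cores * 2 * boost_clock_ghz * precision_ratio / 1000.0

# GeForce GTX Titan X (Pascal): FP64 units run at 1/32 of the FP32 rate.
fp32 = peak_tflops(3584, 1.531)          # ~10.97 TFLOPS single precision
fp64 = peak_tflops(3584, 1.531, 1 / 32)  # ~0.343 TFLOPS double precision

print(round(fp32, 2), round(fp64, 3))
```

The 1/32 ratio is a design choice of the consumer silicon; Tesla parts such as the P100 use a 1/2 ratio, which is where their much higher FP64 numbers come from.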

Error Detection and Correction

On a GPU running a computer game, one memory error typically causes no issues (e.g., one pixel color might be incorrect for one frame). The user is very unlikely to even be aware of the issue.

However, technical computing applications rely on the accuracy of the data returned by the GPU. For some applications, a single error can cause the simulation to be grossly and obviously incorrect. For others, a single-bit error may not be so easy to detect (returning incorrect results which appear reasonable).

Titan X GPUs do not include error correction or error detection capabilities. Neither the GPU nor the system can alert the user to errors should they occur. It is up to the user to detect errors (whether they cause application crashes, obviously incorrect data, or subtly incorrect data). Such issues are not uncommon – our technicians regularly encounter memory errors on consumer gaming GPUs.

NVIDIA Tesla GPUs are able to correct single-bit errors and detect & alert on double-bit errors. On the latest Tesla P100 GPUs, ECC support is included in the main HBM2 memory, as well as in register files, shared memories, L1 cache and L2 cache.

Warranty

NVIDIA’s warranty on GeForce GPU products explicitly states that the GeForce products are not designed for installation in servers. Running GeForce GPUs in a server system will void the warranty. From NVIDIA’s manufacturer warranty website:

Warranted Product is intended for consumer end user purposes only, and is not intended for datacenter use and/or GPU cluster commercial deployments (“Enterprise Use”). Any use of Warranted Product for Enterprise Use shall void this warranty.

GPU Memory Performance

Computationally-intensive applications require high-performance compute units, but fast access to data is also critical. For many HPC applications, an increase in compute performance does not help unless memory performance is also improved. For this reason, the Tesla GPUs provide better real-world performance than the GeForce GPUs:

NVIDIA GPU Model | GPU Memory Bandwidth
GeForce GTX 1080 | 320 GB/s
GeForce GTX Titan X (Maxwell) | 336 GB/s
GeForce GTX Titan X (Pascal) | 480 GB/s
Tesla K80 | 480 GB/s
Tesla P40 | 346 GB/s
Tesla P100 12GB | 549 GB/s
Tesla P100 16GB | 732 GB/s

The primary reason for this performance disparity is that GeForce GPUs use GDDR5 memory, while the latest Tesla GPUs use HBM2 memory stacked on the same package as the GPU.

GPU Memory Quantity

In general, the more memory a system has the faster it will run. For some HPC applications, it’s not even possible to perform a single run unless there is sufficient memory. For others, the quality and fidelity of the results will be degraded unless sufficient memory is available. Tesla GPUs offer as much as twice the memory of GeForce GPUs:

NVIDIA GPU Model | GPU Memory Quantity
GeForce GTX 1080 | 8GB
GeForce GTX Titan X | 12GB
Tesla K80 | 24GB
Tesla P40 | 24GB
Tesla P100 | 12GB or 16GB*

* note that Tesla Pascal Unified Memory allows GPUs to share each other’s memory to load even larger datasets
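Whether a model fits is itself a quick calculation: weights, gradients, and optimizer state each cost parameter count × bytes per value, before activations are even counted. A rough sketch (the FP32 values and the three-copies layout for SGD with momentum are illustrative assumptions):

```python
def model_memory_gb(n_params, bytes_per_value=4, copies=3):
    """Rough GPU memory for weights + gradients + momentum buffers (activations excluded)."""
    return n_params * bytes_per_value * copies / 1024**3

# A 1-billion-parameter model in FP32 with SGD + momentum:
need = model_memory_gb(1_000_000_000)
print(f"{need:.1f} GB")  # ~11.2 GB: marginal on a 12GB Titan X, comfortable on a 24GB Tesla P40
```

Activations and framework overhead come on top of this, so a card whose capacity merely equals the model footprint is usually not enough.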

PCI-E vs NVLink – Device-to-Host and Device-to-Device Throughput

One of the largest potential bottlenecks is in waiting for data to be transferred to the GPU. Additional bottlenecks are present when multiple GPUs operate in parallel. Faster data transfers directly result in faster application performance.

The GeForce GPUs connect via PCI-Express, which has a theoretical peak throughput of 16GB/s. NVIDIA Tesla GPUs with NVLink are able to leverage much faster connectivity. NVLink allows each GPU to communicate at up to 80GB/s (160GB/s bidirectional). NVLink connections are supported between GPUs, and also between the CPUs and the GPUs on supported OpenPOWER platforms. Only the Tesla line of GPUs supports NVLink.
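The difference is easy to put in time terms, since transfer time is just payload size divided by link throughput. A sketch using the peak figures quoted above (real links deliver somewhat less than peak):

```python
def transfer_ms(payload_gb, link_gb_per_s):
    """Time to move a payload across a link running at its peak rate, in milliseconds."""
    return payload_gb / link_gb_per_s * 1000.0

payload = 4.0  # GB, e.g. one large training batch plus model weights

pcie = transfer_ms(payload, 16.0)    # PCI-Express peak, per the figure above
nvlink = transfer_ms(payload, 80.0)  # NVLink peak, per the figure above

print(f"PCIe: {pcie:.0f} ms, NVLink: {nvlink:.0f} ms")
```

When the same payload moves every training step, the 200 ms saved per transfer compounds directly into wall-clock speedup.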

Application Software Support

While some software programs are able to operate on any GPU which supports CUDA, others are designed and optimized for the professional GPU series. Most professional software packages only officially support the NVIDIA Tesla and Quadro GPUs. Using a GeForce GPU may be possible, but will not be supported by the software vendor. In other cases, the applications will not function at all when launched on a GeForce GPU (for example, the software products from Schrödinger, LLC).

Operating System Support

Although NVIDIA’s GPU drivers are quite flexible, there are no GeForce drivers available for Windows Server operating systems. GeForce GPUs are only supported on Windows 7, Windows 8, and Windows 10. Groups that use Windows Server should look to NVIDIA’s professional Tesla and Quadro GPU products. The Linux drivers, on the other hand, support all NVIDIA GPUs.

Product Life Cycle

Due to the nature of the consumer GPU market, GeForce products have a relatively short lifecycle (commonly no more than a year between product release and end of production). Projects which require a longer product lifetime (such as those which might require replacement parts 3+ years after purchase) should use a professional GPU.

NVIDIA’s professional Tesla and Quadro GPU products have an extended lifecycle and long-term support from the manufacturer (including notices of product End of Life and opportunities for last buys before production is halted). Furthermore, the professional GPUs undergo a more thorough testing and validation process during production.

Power Efficiency

GeForce GPUs are intended for consumer gaming usage, and are not usually designed for power efficiency. In contrast, the Tesla GPUs are designed for large-scale deployment where power efficiency is important. This makes the Tesla GPUs a better choice for larger installations.

For example, the GeForce GTX Titan X is popular for desktop deep learning workloads. In server deployments, the Tesla P40 GPU provides matching performance and double the memory capacity. Moreover, when put side by side, the Tesla consumes less power and generates less heat.

DMA Engines

The Direct Memory Access (DMA) Engine of a GPU allows for speedy data transfers between the system memory and the GPU memory. Because such transfers are part of any real-world application, the performance is vital to GPU-acceleration. Slow transfers cause the GPU cores to sit idle until the data arrives in GPU memory. Likewise, slow returns cause the CPU to wait until the GPU has finished returning results.

GeForce products feature a single DMA Engine* which is able to transfer data in one direction at a time. If data is being uploaded to the GPU, any results computed by the GPU cannot be returned until the upload is complete. Likewise, results being returned from the GPU will block any new data which needs to be uploaded to the GPU.
* one GeForce GPU model, the GeForce GTX Titan X, features dual DMA engines

The Tesla GPU products feature dual DMA Engines to alleviate this bottleneck. Data may be transferred into the GPU and out of the GPU simultaneously.
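The dual-engine benefit can be sketched as a steady-state pipeline model: with one engine, the upload, compute, and download phases of each step serialize, while with two engines (and enough work queued) both transfer directions hide behind compute. The timings below are illustrative, not measured:

```python
def step_time_single_dma(upload, compute, download):
    """One DMA engine: the three phases of each step serialize."""
    return upload + compute + download

def step_time_dual_dma(upload, compute, download):
    """Dual DMA engines: the next upload and the previous download overlap
    compute, so the steady-state step time is the slowest single phase."""
    return max(upload, compute, download)

# Per-step timings in milliseconds (illustrative assumptions):
up, comp, down = 10.0, 25.0, 10.0
print(step_time_single_dma(up, comp, down))  # 45.0
print(step_time_dual_dma(up, comp, down))    # 25.0, transfers fully hidden
```

The model also shows the limit of the feature: once transfers take longer than compute, the pipeline becomes transfer-bound and dual engines alone cannot hide the cost.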

GPU Direct RDMA

NVIDIA’s GPU-Direct technology allows for greatly improved data transfer speeds between GPUs. Various capabilities fall under the GPU-Direct umbrella, but the RDMA capability promises the largest performance gain.

Traditionally, sending data between the GPUs of a cluster required 3 memory copies (once to the GPU’s system memory, once to the CPU’s system memory and once to the InfiniBand driver’s memory). GPU Direct RDMA removes the system memory copies, allowing the GPU to send data directly through InfiniBand to a remote system. In practice, this has resulted in up to 67% reductions in latency and 430% increases in bandwidth for small MPI message sizes [1].

In CUDA version 8.0, NVIDIA has introduced GPU Direct RDMA ASYNC, which allows the GPU to initiate RDMA transfers without any interaction with the CPU.

GeForce GPUs do not support GPU-Direct RDMA. Although the MPI calls will still return successfully, the transfers will be performed through the standard memory-copy paths. The only form of GPU-Direct which is supported on the GeForce cards is GPU Direct Peer-to-Peer (P2P). This allows for fast transfers within a single computer, but does nothing for applications which run across multiple servers/compute nodes.

Tesla GPUs have full support for GPU Direct RDMA and the various other GPU Direct capabilities. They are the primary target for these capabilities and thus have the most testing and use in the field.

Hyper-Q

Hyper-Q Proxy for MPI and CUDA Streams allows multiple CPU threads or processes to launch work on a single GPU. This is particularly important for existing parallel applications written with MPI, as these codes have been designed to take advantage of multiple CPU cores. Allowing the GPU to accept work from each of the MPI threads running on a system can offer a potentially significant performance boost. It can also reduce the amount of source code re-architecting required to add GPU acceleration to an existing application.

However, the only form of Hyper-Q which is supported on the GeForce GPUs is Hyper-Q for CUDA Streams. This allows the GeForce to efficiently accept and run parallel calculations from separate CPU cores, but applications running across multiple computers will be unable to efficiently launch work on the GPU.

GPU Health Monitoring and Management Capabilities

Many health monitoring and GPU management capabilities (which are vital for maintaining multiple GPU systems) are only supported on the professional Tesla GPUs. Health features which are not supported on the GeForce GPUs include:

  • NVML/nvidia-smi for monitoring and managing the state and capabilities of each GPU. This enables GPU support from a number of 3rd party applications and tools such as Ganglia. Perl and Python bindings are also available.
  • OOB (out of band monitoring via IPMI) allows the system to monitor GPU health, adjust fan speeds to appropriately cool the devices and send alerts when an issue is seen
  • InfoROM (persistent configuration and state data) provides the system with additional data about each GPU
  • NVHealthmon utility provides cluster administrators with a ready-to-use GPU health status tool
  • TCC allows GPUs to be specifically set to display-only or compute-only modes
  • ECC (memory error detection & correction)

Cluster tools rely upon the capabilities provided by NVIDIA NVML. Roughly 60% of the capabilities are not available on GeForce – this table offers a more detailed comparison of the NVML features supported in Tesla and GeForce GPUs:

Feature | Tesla | GeForce
Product Name | yes | yes
Show GPU Count | yes | yes
PCI-Express Generation (e.g., 2.0 vs 3.0) | yes | no
PCI-Express Link Width (e.g., x4, x8, x16) | yes | no
Current Fan Speed | yes | yes
Current Temperature | yes | yes*
Current Performance State | yes | no
Clock Throttle Status | yes | no
Current GPU Usage (percentage) | yes | no
Current Memory Usage (percentage) | yes | yes
GPU Boost Capability | yes | yes^
ECC Error Detection/Correction Support | yes | no
List Retired Pages | yes | no
Current Power Draw | yes | no
Set Power Draw Limit | yes | no
Current GPU Clock Speed | yes | no
Current Memory Clock Speed | yes | no
Show Available Clock Speeds | yes | no
Show Available Memory Speeds | yes | no
Set GPU Boost Speed (core clock and memory clock) | yes | no
Show Current Compute Processes | yes | no
Card Serial Number | yes | no
InfoROM image and objects | yes | no
Accounting Capability (resource usage per process) | yes | no
PCI-Express IDs | yes | yes
NVIDIA Driver Version | yes | yes
NVIDIA VBIOS Version | yes | yes

* Temperature reading is not available to the system platform, which means fan speeds cannot be adjusted.
^ GPU Boost is disabled during double precision calculations. Additionally, GeForce clock speeds will be automatically reduced in certain scenarios.
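For scripted monitoring of whichever fields a card does support, nvidia-smi can emit query results as CSV, which is straightforward to parse. A minimal sketch; the sample line is made up for illustration, and on GeForce cards unsupported fields come back as "[Not Supported]":

```python
# Produced on the host by:
#   nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,power.draw --format=csv,noheader
sample = "Tesla P100-PCIE-16GB, 34, 12 %, 28.92 W"

def parse_gpu_line(line):
    """Split one CSV line from nvidia-smi into a small dict."""
    name, temp, util, power = [field.strip() for field in line.split(",")]
    return {
        "name": name,
        "temperature_c": int(temp),
        "utilization_pct": int(util.rstrip(" %")),
        "power_w": power,  # left as text: GeForce may report "[Not Supported]"
    }

info = parse_gpu_line(sample)
print(info["name"], info["temperature_c"])
```

A cron job feeding such parsed values into a tool like Ganglia is the simplest form of the fleet monitoring described above.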

GPU Boost

All of the latest NVIDIA GPU products support GPU Boost, but their implementations vary depending upon the intended usage scenario. GeForce cards are built for interactive desktop usage and gaming. Tesla GPUs are built for intensive, constant number crunching with stability and reliability placed at a premium. Given the differences between these two use cases, GPU Boost functions differently on Tesla than on GeForce.

In GeForce’s case, the graphics card automatically determines clock speed and voltage based on the temperature of the GPU. Temperature is the appropriate independent variable as heat generation affects fan speed. For less graphically-intense games or for general desktop usage, the end user can enjoy a quieter computing experience. When playing games that require serious GPU compute, however, GPU Boost automatically cranks up the voltage and clock speeds (in addition to generating more noise).

Tesla’s GPU Boost level, on the other hand, may be specified by the system administrator or computational user – the desired clock speed is set to a specific frequency. Rather than floating the clock speed at various levels, the desired clock speed may be statically maintained unless the power consumption threshold (TDP) is reached. This is an important consideration because accelerators in an HPC environment often need to be in sync with one another. The deterministic aspect of Tesla’s GPU Boost allows system administrators to determine optimal clock speeds and lock them in across all GPUs.

For applications that require additional performance, the most recent Tesla GPUs include Auto Boost within synchronous boost groups. With Auto Boost enabled, each group of GPUs will increase clock speeds when headroom allows. The group will keep clocks in sync with each other to ensure matching performance across the group.
