1. Install the GPU version of PyTorch
Go to the PyTorch website https://pytorch.org/ and install the build that matches your CUDA version. (Do not just run a plain `pip install torch` — that may give you the CPU-only build.)
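As an example, the install selector on pytorch.org generates a command of roughly the following shape; the `cuXXX` suffix in the index URL must match your CUDA version, so copy the exact command from the website rather than this sketch:

```shell
# Example pip command from the pytorch.org selector (here for CUDA 12.1);
# adjust the cu121 suffix to your installed CUDA version
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```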
2. Check GPU information
(1) Key information
!nvidia-smi
My GPU is quite weak; this blog is only meant to demonstrate the method.
(2) Detailed information
!nvidia-smi -i 0 -q
==============NVSMI LOG==============

Timestamp                           : Sun Jul 30 19:59:38 2023
Driver Version                      : 527.37
CUDA Version                        : 12.0

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : NVIDIA GeForce GTX 1650
    Product Brand                   : GeForce
    Product Architecture            : Turing
    Driver Model
        Current                     : WDDM
    VBIOS Version                   : 90.17.46.00.ab
    GPU UUID                        : GPU-fc8d7cba-de6b-c3b4-f29a-1554c1aa0ba0
    Performance State               : P8
    FB Memory Usage
        Total                       : 4096 MiB
        Reserved                    : 146 MiB
        Used                        : 710 MiB
        Free                        : 3239 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
    Temperature
        GPU Current Temp            : 56 C
        GPU Shutdown Temp           : 99 C
        GPU Slowdown Temp           : 94 C
        GPU Max Operating Temp      : 75 C
    Power Readings
        Power Draw                  : 3.65 W
    Clocks
        Graphics                    : 300 MHz
        SM                          : 300 MHz
        Memory                      : 405 MHz
    Max Clocks
        Graphics                    : 1785 MHz
        SM                          : 1785 MHz
        Memory                      : 6001 MHz
    Processes
        Process ID                  : 8260
            Type                    : C
            Name                    : D:\PYTHON\Anaconda\envs\basic_torch\python.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 14084
            Type                    : C+G
            Used GPU Memory         : Not available in WDDM driver model

(Output abridged: PCI, ECC, encoder/FBC, and other N/A fields omitted.)
3. Check the number of available GPUs
torch.cuda.device_count()
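Beyond the count, each device can also be inspected from Python; a small sketch (no GPU required — on a CPU-only machine the loop simply runs zero times):

```python
import torch

# Number of CUDA-capable GPUs visible to PyTorch (0 on a CPU-only build)
n = torch.cuda.device_count()
print(f"Visible GPUs: {n}")

for i in range(n):
    # Name and total memory of each device
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1024**2:.0f} MiB")
```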
4. These two functions let us run code even when the requested GPU does not exist

import torch

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    """Return all available GPUs, or [cpu(),] if there is no GPU."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()
5. Define tensors on the GPU
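The source has no snippet for this step; a minimal sketch, reusing the `try_gpu` helper from step 4 so it also runs on a CPU-only machine:

```python
import torch

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create a tensor directly on the chosen device...
x = torch.ones(2, 3, device=try_gpu())
# ...or move an existing CPU tensor over with .to()
y = torch.randn(2, 3).to(try_gpu())
print(x.device, y.device)
```

Creating the tensor directly on the device avoids the extra host-to-device copy that `.to()` performs.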
6. Define a network on the GPU
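Again no snippet in the source; a sketch with a small hypothetical `nn.Sequential` model, assuming the `try_gpu` helper from step 4. Note that the inputs must live on the same device as the parameters:

```python
import torch
from torch import nn

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# A toy network; .to() moves all of its parameters to the device
net = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
net = net.to(try_gpu())

# Input must be on the same device as the network's parameters
X = torch.ones(2, 3, device=try_gpu())
out = net(X)
print(out.shape)   # torch.Size([2, 1])
print(next(net.parameters()).device)
```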
7. Check GPU information again
!nvidia-smi
If you find that defining just a few tiny tensors already takes several hundred MiB of GPU memory, that is normal: initializing the GPU (the CUDA context) itself consumes memory. In my tests the overhead varies by GPU — about 583 MiB on a 1060 Ti, and about 1449 MiB on a server V100 — and this part cannot be optimized away. In other words, even running nothing more than a = torch.randn((1, 1)).to('cuda') may show several hundred MiB in use; only a tiny fraction of that belongs to the tensor a, and almost all of it is the initialization overhead. Nothing to worry about.
To verify this, define XX = torch.ones(2000, 3000, device=try_gpu()) and watch the reported usage grow from 710 MiB to only 734 MiB: a tensor of 6 million elements occupies very little memory by comparison.
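The same check can be done from Python with torch.cuda.memory_allocated, which counts only tensor storage and excludes the context overhead that nvidia-smi includes. A sketch (the CUDA branch only runs on a GPU machine); 2000 × 3000 float32 values are 24,000,000 bytes ≈ 22.9 MiB, consistent with the 710 → 734 MiB jump seen above:

```python
import torch

if torch.cuda.is_available():
    before = torch.cuda.memory_allocated()
    XX = torch.ones(2000, 3000, device='cuda')  # 6M float32 elements
    after = torch.cuda.memory_allocated()
    # ~24 MB attributed to the tensor itself, context overhead excluded
    print(f"Tensor storage: {(after - before) / 1024**2:.1f} MiB")
else:
    print("No GPU available; skipping the measurement.")
```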
8. Restarting resets GPU memory to zero (CPU RAM and GPU VRAM are both RAM in nature; their contents are lost without power)
!nvidia-smi