Rocky Linux installation tutorial
https://developer.aliyun.com/article/1074889
Notes
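- Tesla P100 server: press Delete to enter the BIOS and set the boot mode to Dual. For the first boot option, choose UEFI hard disk when installing from a disk drive, or UEFI USB when installing from a USB stick.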
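- When installing Rocky Linux, leave these two options at their defaults. Do not change them without good reason, or the installer may refuse to proceed to the next step.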
Production environment setup (installing the NVIDIA driver, CUDA, cuDNN, etc.)
NVIDIA tutorial: https://blog.csdn.net/dendi_hust/article/details/111177699
CUDA and NVIDIA driver version compatibility: http://www.8fe.com/jiaocheng/2376.html
Driver download: https://www.nvidia.cn/download/driverResults.aspx/207499/cn/
Viewing GPU and driver information: https://blog.csdn.net/m0_67403073/article/details/126749126
Installing CUDA and cuDNN: https://blog.csdn.net/bluewind_1988/article/details/105244396
Relationship between CUDA and cuDNN: https://www.jianshu.com/p/622f47f94784
Meaning of network interface configuration parameters: https://blog.csdn.net/z1014347942/article/details/78069966
Setting a static IP: https://jingyan.baidu.com/article/9989c746d2161af649ecfe44.html
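Once the driver and CUDA are installed, a quick sanity check is to confirm that the expected tools are on PATH. The sketch below assumes the usual command names (nvidia-smi from the driver, nvcc from the CUDA toolkit). report_tools is a hypothetical helper written for these notes, not part of any NVIDIA package.

```shell
# Report which of the given commands are on PATH.
# Returns non-zero if at least one of them is missing.
report_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: found\n' "$tool"
        else
            printf '%s: missing\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (not run automatically):
#   report_tools nvidia-smi nvcc
#   lspci | grep -i nvidia    # NVIDIA devices visible on the PCI bus
#   nvidia-smi                # driver version and GPU status
#   nvcc --version            # CUDA compiler version
```

If nvidia-smi is missing, the driver install probably did not complete. If only nvcc is missing, the CUDA bin directory may simply not be on PATH.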
Notes
- If the GPU driver fails to install, for example with the message "Unable to find the kernel source tree for the currently running kernel.", the installer may simply have failed to locate the kernel sources. Check whether /usr/src/kernels contains a directory for your kernel. If it does, add --kernel-source-path /usr/src/kernels/<your-kernel> when running the NVIDIA installer, for example:
  bash ./NVIDIA-Linux-x86_64-440.64.00.run --kernel-source-path /usr/src/kernels/<your-kernel>
  If that does not solve it, see: https://blog.csdn.net/chris_pei/article/details/79203033
  https://www.cnblogs.com/liuke-note/p/13712202.html?ivk_sa=1024320u
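- When searching for the driver for your GPU, note that some product series are grouped together. For example, the Tesla series is listed under Data Center.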
- When setting a static IP, if restarting the service reports "Unit network.service not found", use this restart command instead: 'systemctl restart NetworkManager'
- If installing the NVIDIA driver fails with "requires nvidia-kmod =", see:
  https://thelinuxcluster.com/tag/nvidia/
- If installing CUDA fails with "Install of driver component failed", see:
  https://blog.csdn.net/bluewind_1988/article/details/105244396
- If nvcc is not reported as available after CUDA is installed, see:
  https://blog.csdn.net/weixin_44750512/article/details/123156020
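Two of the notes above (the missing kernel source tree and nvcc not being found) can be sketched as small shell helpers. This is a rough sketch, not a definitive fix. find_kernel_source and prepend_path are hypothetical names, and /usr/local/cuda/bin is an assumed install location that may differ on your machine.

```shell
# Print the kernel source directory for a kernel release, or fail if it
# does not exist.
# Arguments: release (default: the running kernel),
#            base directory (default: /usr/src/kernels).
find_kernel_source() {
    local release="${1:-$(uname -r)}"
    local base="${2:-/usr/src/kernels}"
    if [ -d "$base/$release" ]; then
        printf '%s\n' "$base/$release"
    else
        return 1
    fi
}

# Print a PATH-style list with dir prepended, unless dir is already in it.
# Arguments: dir, current list (default: $PATH).
prepend_path() {
    local dir="$1"
    local current="${2-$PATH}"
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "$dir${current:+:$current}" ;;
    esac
}

# Example use (not run automatically):
#   src="$(find_kernel_source)" &&
#       bash ./NVIDIA-Linux-x86_64-440.64.00.run --kernel-source-path "$src"
#   export PATH="$(prepend_path /usr/local/cuda/bin)"   # assumed CUDA location
```

If find_kernel_source fails, no directory under /usr/src/kernels matches the running kernel, so passing --kernel-source-path will not help and the linked articles are the next place to look.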
Installing conda
Tutorial: https://blog.csdn.net/qq_44173974/article/details/125336916