0. Preparation
0.1. Update the yum repository (using the Aliyun mirror as an example)
0.1.1 Back up the current yum repository file
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
0.1.2 Download the new CentOS-Base.repo into /etc/yum.repos.d/
CentOS 5
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-5.repo
or
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-5.repo
CentOS 6
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
or
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
CentOS 7
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
or
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
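The per-release commands above differ only in the version number inside the URL, so the choice can be scripted. The following is a minimal sketch, not part of the original procedure: the function name aliyun_repo_url is hypothetical, and the usage comment assumes the %{rhel} rpm macro is defined on the target system (it is on CentOS 7).

```shell
# Hypothetical helper: map a CentOS major version to the Aliyun repo URL used above.
aliyun_repo_url() {
    case "$1" in
        5|6|7) echo "http://mirrors.aliyun.com/repo/Centos-$1.repo" ;;
        *)     echo "unsupported CentOS version: $1" >&2; return 1 ;;
    esac
}

# Usage on the target machine (as root); shown as comments so nothing is downloaded here:
#   major=$(rpm -E '%{rhel}')
#   url=$(aliyun_repo_url "$major") && curl -o /etc/yum.repos.d/CentOS-Base.repo "$url"
```

Rejecting unknown versions keeps the script from writing an error page into CentOS-Base.repo.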
0.1.3 Clear and rebuild the cache
yum clean all
yum makecache
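Note:
yum keeps downloaded packages and headers in its cache (by default under /var/cache/yum/) and does not delete them automatically. If the cache takes up too much disk space, clear it with yum clean: yum clean headers removes the headers, yum clean packages removes the downloaded rpm packages, and yum clean all removes everything.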
———————————————————————————————————————————
1. Install dependencies
yum -y install gcc pciutils
yum -y install gcc
yum -y install gcc-c++
yum -y install make
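After the packages are installed, it can be useful to confirm that the tools they provide are actually on PATH. A minimal sketch follows; the function name missing_tools is hypothetical, and the mapping of packages to commands (gcc-c++ provides g++, pciutils provides lspci) is stated in the comments.

```shell
# Print each command from the argument list that is not found on PATH.
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Commands provided by the packages above: gcc, g++ (gcc-c++), make, lspci (pciutils).
# Usage:
#   missing=$(missing_tools gcc g++ make lspci)
#   [ -z "$missing" ] || echo "still missing: $missing"
```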
2. Check the kernel version
uname -a
3. Check the graphics card model
lspci | grep -i nvidia
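The grep above prints nothing when no NVIDIA device is present, which is easy to miss. A small sketch that turns the same match into an explicit yes/no answer is shown below; the function name has_nvidia_gpu is hypothetical.

```shell
# Read lspci-style output on stdin; succeed if any line mentions NVIDIA (case-insensitive).
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Usage on the target machine:
#   if lspci | has_nvidia_gpu; then echo "NVIDIA device found"; else echo "no NVIDIA device"; fi
```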
4. Disable the system's built-in nouveau driver
(1) Check whether nouveau is loaded
lsmod | grep nouveau
(2) Open the blacklist file (dist-blacklist.conf) in vi, then press i or Insert to enter insert mode
vi /lib/modprobe.d/dist-blacklist.conf
(3) Comment out nvidiafb:
#blacklist nvidiafb
(4) Add the following two lines at the end of dist-blacklist.conf (in vi, G jumps to the last line), then save with :w and quit with :q
blacklist nouveau
options nouveau modeset=0
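Steps (3) and (4) can also be applied without an interactive editor. The following is a sketch under stated assumptions rather than the procedure used in these notes: the function name blacklist_nouveau is hypothetical, and it assumes the file ends with a newline, as configuration files normally do.

```shell
# Hypothetical helper: apply steps (3) and (4) to the given blacklist file without vi.
blacklist_nouveau() {
    file=$1
    tmp=$(mktemp) || return 1
    # Step (3): comment out an active "blacklist nvidiafb" line, if present.
    sed 's/^blacklist nvidiafb/#&/' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Step (4): append each line only if it is not already there, so reruns are harmless.
    for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
        grep -qxF "$line" "$tmp" || echo "$line" >> "$tmp"
    done
    # Write back through the original path to keep its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root): blacklist_nouveau /lib/modprobe.d/dist-blacklist.conf
```

Because each line is added only when missing, running the helper twice leaves the file unchanged the second time.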
5. Rebuild the initramfs image
(1) Back up the existing image (it is moved aside, not deleted)
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
(2) Rebuild
dracut /boot/initramfs-$(uname -r).img $(uname -r)
6. Set the default run level to text mode
systemctl set-default multi-user.target
7. Reboot
reboot
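Once the machine is back up, it is worth confirming that steps 4 to 6 took effect before running the driver installer. A minimal sketch follows; the function name module_loaded is hypothetical, and its optional second argument (the file to read) exists only so the check can be tried against a sample listing.

```shell
# Succeed if the named module appears in a /proc/modules-style listing.
# The second argument defaults to /proc/modules, where each line starts with the module name.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Usage after the reboot:
#   module_loaded nouveau && echo "nouveau is still loaded" || echo "nouveau is disabled"
#   systemctl get-default    # prints multi-user.target if step 6 took effect
```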
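8. Download the graphics driver and CUDA
Open the link: https://www.nvidia.cn/drivers/lookup/
Download the driver that matches your graphics card model. When the download finishes, change into the download directory and run the installer from a terminal: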
sudo sh NVIDIA-Linux-x86_64-550.120.run
CUDA can be downloaded with wget, or downloaded on a Windows machine and then uploaded to the server:
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
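After typing accept, the installer shows a component menu. The listing below was captured with CUDA 11.0 and driver 450.51.06; the version numbers shown by the 10.2 runfile above will differ:
CUDA Installer
- [X] Driver
     [X] 450.51.06
+ [X] CUDA Toolkit 11.0
  [X] CUDA Samples 11.0
  [X] CUDA Demo Suite 11.0
  [X] CUDA Documentation 11.0
  Options
  Install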
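Select Install to proceed. Because the driver was already installed from the separate .run file, consider unchecking Driver first so that the older driver bundled with CUDA is not installed over it.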
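The command history at the end of these notes shows /etc/profile being edited and re-sourced before nvcc --version was run again, but it does not record what was added. The following sketch shows the usual post-install environment setup under stated assumptions: the function name add_cuda_to_env is hypothetical, and /usr/local/cuda-10.2 is assumed to be the install prefix (the runfile's default for this version).

```shell
# Prepend a CUDA toolkit's bin and lib64 directories to the current shell's environment.
add_cuda_to_env() {
    prefix=$1
    [ -d "$prefix/bin" ] || { echo "no CUDA toolkit under $prefix" >&2; return 1; }
    PATH="$prefix/bin:$PATH"
    # Avoid a trailing colon when LD_LIBRARY_PATH was previously empty or unset.
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Usage, for example from /etc/profile (the prefix is an assumption; adjust to the real path):
#   add_cuda_to_env /usr/local/cuda-10.2 && nvcc --version
```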
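Error: installing nvidia-container-toolkit on CentOS fails with "No package nvidia-container-toolkit available". The following steps resolve it.
(1) Set up the docker-ce repository: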
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
(2) Install the containerd.io package:
sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm
(3) Install the docker-ce package:
sudo yum install docker-ce -y
Use the following command to make sure the Docker service is enabled and running:
sudo systemctl --now enable docker
(4) Set up the nvidia-container-toolkit repository and GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
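The repository URL in the command above depends on the value of $distribution, which is built by sourcing /etc/os-release and joining ID and VERSION_ID. The sketch below isolates that step so the value can be inspected before it is used; the function name distribution_id is hypothetical.

```shell
# Compute the same "$ID$VERSION_ID" string as the command above, from a given os-release file.
# Sourcing happens in a subshell so the file's variables do not leak into the caller.
distribution_id() {
    ( . "$1" && echo "$ID$VERSION_ID" )
}

# Usage: distribution_id /etc/os-release    # for example, centos7 on CentOS 7
```

If the printed value is not one the repository publishes, the curl in the command above saves an error page instead of a .repo file, so checking it first can save a confusing yum failure later.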
Enable the experimental branch in the repository list:
yum-config-manager --enable libnvidia-container-experimental
(5) Refresh the package list, then install the nvidia-container-toolkit package:
sudo yum clean expire-cache
sudo yum install -y nvidia-container-toolkit
Configure the Docker daemon to recognize the NVIDIA container runtime:
sudo nvidia-ctk runtime configure --runtime=docker
After the runtime is configured, restart the Docker daemon to complete the installation:
sudo systemctl restart docker
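nvidia-ctk runtime configure --runtime=docker records the runtime in /etc/docker/daemon.json, so a quick look at that file shows whether the configuration step worked. A minimal sketch follows; the function name has_nvidia_runtime is hypothetical, and the check is a plain text match rather than a real JSON parse.

```shell
# Succeed if the given Docker daemon config mentions an "nvidia" runtime entry.
# This is a text match only, so treat it as a quick sanity check.
has_nvidia_runtime() {
    [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

# Usage: has_nvidia_runtime /etc/docker/daemon.json && echo "nvidia runtime registered"
# Alternatively, "docker info" lists the registered runtimes.
```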
(6) Verify
docker run -it -d -v /home/data/jt/:/data/jt -v /etc/localtime:/etc/localtime:ro --restart always --net host --name EventDetectorV3 --gpus all jt20240711_gongsi
docker ps
docker exec -it EventDetectorV3 bash
Once inside the container:
nvidia-smi
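The same verification can be done without opening an interactive shell in the container. The following is a sketch, with the hypothetical function name container_listed; it relies only on docker ps --format and docker exec, and the container name is the one used above.

```shell
# Read container names (one per line) on stdin; succeed if the given name is among them.
container_listed() {
    grep -qxF "$1"
}

# Usage on the host:
#   docker ps --format '{{.Names}}' | container_listed EventDetectorV3 \
#       && docker exec EventDetectorV3 nvidia-smi
```

Matching the whole line avoids a false positive from a container whose name merely contains EventDetectorV3.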
Recorded command history (for reference):
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
cd yum.repos.d/
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all
yum makecache
yum -y install gcc pciutils
yum -y install gcc
yum -y install gcc-c++
yum -y install make
uname -a
lsmod | grep nouveau
vi /lib/modprobe.d/dist-blacklist.conf
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
systemctl set-default multi-user.target
reboot
set +o history;
sh NVIDIA-Linux-x86_64-550.120.run
nvidia-smi
docker
docker ps
ll
cd jt20240731/
ll
cat x* >test.tar
docker load -i test.tar
docker images
docker ps
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sh cuda_10.2.89_440.33.01_linux.run
nvcc --version
vi /etc/profile
source /etc/profile
nvcc --version
yum install nvidia-container-runtime
yum update
yum install nvidia-container-runtime
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm
sudo yum install docker-ce -y
sudo systemctl --now enable docker
sudo docker run --rm hello-world
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
yum-config-manager --enable libnvidia-container-experimental
sudo yum clean expire-cache
sudo yum install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker ps
docker ps -a
docker rm EventDetectorV3
ll
cd jt20240731/
ll
docker images
history
docker run -it -d -v /home/data/jt/:/data/jt -v /etc/localtime:/etc/localtime:ro --restart always --net host --name EventDetectorV3 --gpus all jt20240711_gongsi
docker ps
docker exec -it EventDetectorV3 bash
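The history contains cat x* >test.tar followed by docker load -i test.tar, which suggests the image archive arrived as several pieces produced by split. That is an assumption, since the history does not show how the pieces were made. The sketch below demonstrates the split and cat round trip in a scratch directory, using a small stand-in file instead of a real image archive.

```shell
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    # Build a small stand-in file; a real input would be the output of "docker save".
    i=0
    while [ "$i" -lt 500 ]; do echo "line $i"; i=$((i + 1)); done > original.tar
    # split names its pieces xaa, xab, ... by default; each piece is 1000 bytes here.
    split -b 1000 original.tar
    # The shell expands x* in sorted order, so the pieces are concatenated in sequence.
    cat x* > test.tar
    # cmp is silent and returns 0 only when the two files are byte-for-byte identical.
    cmp original.tar test.tar && echo "round trip OK"
)
```

The glob only works as intended when no other file in the directory starts with x, which is worth checking with ll before running cat.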