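To install and configure NVIDIA Container Toolkit correctly on CentOS, follow the steps below. If steps 1 and 2 are already done, skip straight to step 3 (installing and configuring NVIDIA Container Toolkit).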
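1. Install the NVIDIA GPU driver:
Download the driver for your GPU model and CentOS version from the NVIDIA website and install it by following its installation guide. The remaining steps assume the NVIDIA GPU driver is installed and working.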
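Before moving on, you can confirm that the driver is actually loaded. The helper below is a minimal sketch. The function name `check_nvidia_driver` is hypothetical, and it assumes `nvidia-smi`, which is installed together with the NVIDIA driver, is on your PATH. Paste it into a shell or source it from a file, then run `check_nvidia_driver`.

```shell
# Hypothetical helper: confirm the NVIDIA driver is installed and working.
# Assumes nvidia-smi (shipped with the NVIDIA driver) is on the PATH.
check_nvidia_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: install the NVIDIA driver first" >&2
    return 1
  fi
  # Print one CSV line per GPU: model name and driver version.
  if ! nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
    echo "nvidia-smi failed: the driver may not be loaded (try rebooting)" >&2
    return 1
  fi
}
```

A line per GPU with the model name and driver version means the driver is working. Fix any error before you install the toolkit.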
2. Install Docker CE:
2.1 Remove old versions of Docker (if present):
sudo yum remove -y docker docker-common docker-selinux docker-engine
2.2 Install the required packages:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
2.3 Add the Docker CE repository:
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
2.4 Install Docker CE:
sudo yum install -y docker-ce
2.5 Start the Docker service:
sudo systemctl start docker
2.6 Enable Docker to start at boot:
sudo systemctl enable docker
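You can check that steps 2.5 and 2.6 took effect with `systemctl is-active` and `systemctl is-enabled`. The following is a minimal sketch, and the function name `check_docker_service` is hypothetical:

```shell
# Hypothetical helper: confirm the Docker service is running and enabled at boot.
check_docker_service() {
  # "systemctl is-active --quiet" exits 0 only if the unit is running.
  if ! systemctl is-active --quiet docker; then
    echo "docker is not running: sudo systemctl start docker" >&2
    return 1
  fi
  # "systemctl is-enabled --quiet" exits 0 only if the unit starts at boot.
  if ! systemctl is-enabled --quiet docker; then
    echo "docker is not enabled: sudo systemctl enable docker" >&2
    return 1
  fi
  # Show the installed Docker client version.
  docker --version
}
```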
3. Install NVIDIA Container Toolkit:
3.1 Add the NVIDIA Container Toolkit repository (the repository GPG keys are imported later, during the yum install in step 3.2):
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
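The first command sources /etc/os-release and joins its ID and VERSION_ID fields. On CentOS 7, `distribution` therefore evaluates to `centos7`, which selects the matching path in the repository URL. The sketch below wraps the same logic in a hypothetical function, `get_distribution`, that takes the os-release file as a parameter so you can try it against any file:

```shell
# Hypothetical helper: reproduce the "distribution" value used in the repo URL.
# The file is sourced in a subshell so ID/VERSION_ID do not leak into the caller.
get_distribution() {
  local os_release="${1:-/etc/os-release}"
  ( . "$os_release" && echo "${ID}${VERSION_ID}" )
}
```

For example, `curl -s -L "https://nvidia.github.io/nvidia-docker/$(get_distribution)/nvidia-docker.repo"` fetches the same file as the command above.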
Sample session:
[xxx]# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
[xxx]# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

[libnvidia-container-experimental]
name=libnvidia-container-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

[nvidia-container-runtime-experimental]
name=nvidia-container-runtime-experimental
baseurl=https://nvidia.github.io/nvidia-container-runtime/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

[nvidia-docker]
name=nvidia-docker
baseurl=https://nvidia.github.io/nvidia-docker/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-docker/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
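With the repository file in place, `yum repolist enabled` should list the three NVIDIA repositories that have `enabled=1` (libnvidia-container, nvidia-container-runtime and nvidia-docker). The two experimental repositories should not appear. The following is a minimal sketch with some assumptions. The function name `list_nvidia_repos` is hypothetical. Filtering on the string `nvidia` relies on the repository IDs shown above. The `-y` flag is there on the assumption that yum may ask to import the repository GPG keys, as it does in the step 3.2 output.

```shell
# Hypothetical helper: show which NVIDIA repositories yum sees as enabled.
# -y accepts the repository GPG key import if yum asks for it (assumption).
list_nvidia_repos() {
  # "yum repolist enabled" prints one line per enabled repository.
  yum -y repolist enabled 2>/dev/null | grep -Ei 'nvidia'
}
```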
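3.2 Install NVIDIA Container Toolkit (the nvidia-docker2 package pulls in nvidia-container-toolkit and its libraries as dependencies, as the output below shows):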
sudo yum install -y nvidia-docker2
Sample session:
[ xxx ]# yum install -y nvidia-docker2
Loaded plugins: fastestmirror, langpacks, nvidia
Loading mirror speeds from cached hostfile
epel/x86_64/metalink | 14 kB 00:00:00
base | 3.6 kB 00:00:00
centos-sclo-rh | 3.0 kB 00:00:00
centos-sclo-sclo | 3.0 kB 00:00:00
cuda-rhel7-x86_64 | 3.0 kB 00:00:00
docker-ce-stable | 3.5 kB 00:00:00
epel | 4.7 kB 00:00:00
extras | 2.9 kB 00:00:00
libnvidia-container/x86_64/signature | 833 B 00:00:00
Retrieving key from https://nvidia.github.io/libnvidia-container/gpgkey
Importing GPG key 0xF796ECB0:
 Userid     : "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>"
 Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0
 From       : https://nvidia.github.io/libnvidia-container/gpgkey
libnvidia-container/x86_64/signature | 2.1 kB 00:00:00 !!!
nvidia-container-runtime/x86_64/signature | 833 B 00:00:00
Retrieving key from https://nvidia.github.io/nvidia-container-runtime/gpgkey
Importing GPG key 0xF796ECB0:
 Userid     : "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>"
 Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0
 From       : https://nvidia.github.io/nvidia-container-runtime/gpgkey
nvidia-container-runtime/x86_64/signature | 2.1 kB 00:00:00 !!!
nvidia-docker/x86_64/signature | 833 B 00:00:00
Retrieving key from https://nvidia.github.io/nvidia-docker/gpgkey
Importing GPG key 0xF796ECB0:
 Userid     : "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>"
 Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0
 From       : https://nvidia.github.io/nvidia-docker/gpgkey
nvidia-docker/x86_64/signature | 2.1 kB 00:00:00 !!!
updates | 2.9 kB 00:00:00
(1/6): nvidia-docker/x86_64/primary | 8.0 kB 00:00:01
(2/6): epel/x86_64/updateinfo | 1.0 MB 00:00:01
(3/6): nvidia-container-runtime/x86_64/primary | 11 kB 00:00:01
(4/6): libnvidia-container/x86_64/primary | 35 kB 00:00:01
(5/6): epel/x86_64/primary_db | 7.0 MB 00:00:04
(6/6): updates/7/x86_64/primary_db | 22 MB 00:00:10
libnvidia-container 231/231
nvidia-container-runtime 71/71
nvidia-docker 54/54
Resolving Dependencies
--> Running transaction check
---> Package nvidia-docker2.noarch 0:2.13.0-1 will be installed
--> Processing Dependency: nvidia-container-toolkit >= 1.13.0-1 for package: nvidia-docker2-2.13.0-1.noarch
--> Running transaction check
---> Package nvidia-container-toolkit.x86_64 0:1.13.5-1 will be installed
--> Processing Dependency: nvidia-container-toolkit-base = 1.13.5-1 for package: nvidia-container-toolkit-1.13.5-1.x86_64
--> Processing Dependency: libnvidia-container-tools < 2.0.0 for package: nvidia-container-toolkit-1.13.5-1.x86_64
--> Processing Dependency: libnvidia-container-tools >= 1.13.5-1 for package: nvidia-container-toolkit-1.13.5-1.x86_64
--> Running transaction check
---> Package libnvidia-container-tools.x86_64 0:1.13.5-1 will be installed
--> Processing Dependency: libnvidia-container1(x86-64) >= 1.13.5-1 for package: libnvidia-container-tools-1.13.5-1.x86_64
--> Processing Dependency: libnvidia-container.so.1(NVC_1.0)(64bit) for package: libnvidia-container-tools-1.13.5-1.x86_64
--> Processing Dependency: libnvidia-container.so.1()(64bit) for package: libnvidia-container-tools-1.13.5-1.x86_64
---> Package nvidia-container-toolkit-base.x86_64 0:1.13.5-1 will be installed
--> Running transaction check
---> Package libnvidia-container1.x86_64 0:1.13.5-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                         Arch     Version    Repository             Size
================================================================================
Installing:
 nvidia-docker2                  noarch   2.13.0-1   libnvidia-container   8.7 k
Installing for dependencies:
 libnvidia-container-tools       x86_64   1.13.5-1   libnvidia-container    52 k
 libnvidia-container1            x86_64   1.13.5-1   libnvidia-container   1.0 M
 nvidia-container-toolkit        x86_64   1.13.5-1   libnvidia-container   909 k
 nvidia-container-toolkit-base   x86_64   1.13.5-1   libnvidia-container   3.1 M

Transaction Summary
================================================================================
Install  1 Package (+4 Dependent packages)

Total download size: 5.1 M
Installed size: 15 M
Downloading packages:
(1/5): libnvidia-container-tools-1.13.5-1.x86_64.rpm | 52 kB 00:00:01
(2/5): libnvidia-container1-1.13.5-1.x86_64.rpm | 1.0 MB 00:00:01
(3/5): nvidia-container-toolkit-1.13.5-1.x86_64.rpm | 909 kB 00:00:01
(4/5): nvidia-docker2-2.13.0-1.noarch.rpm | 8.7 kB 00:00:00
(5/5): nvidia-container-toolkit-base-1.13.5-1.x86_64.rpm | 3.1 MB 00:00:02
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1.1 MB/s | 5.1 MB 00:00:04
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : libnvidia-container1-1.13.5-1.x86_64                 1/5
  Installing : libnvidia-container-tools-1.13.5-1.x86_64            2/5
  Installing : nvidia-container-toolkit-base-1.13.5-1.x86_64        3/5
  Installing : nvidia-container-toolkit-1.13.5-1.x86_64             4/5
  Installing : nvidia-docker2-2.13.0-1.noarch                       5/5
warning: /etc/docker/daemon.json saved as /etc/docker/daemon.json.rpmorig
  Verifying  : nvidia-container-toolkit-base-1.13.5-1.x86_64        1/5
  Verifying  : libnvidia-container-tools-1.13.5-1.x86_64            2/5
  Verifying  : nvidia-docker2-2.13.0-1.noarch                       3/5
  Verifying  : libnvidia-container1-1.13.5-1.x86_64                 4/5
  Verifying  : nvidia-container-toolkit-1.13.5-1.x86_64             5/5

Installed:
  nvidia-docker2.noarch 0:2.13.0-1

Dependency Installed:
  libnvidia-container-tools.x86_64 0:1.13.5-1
  libnvidia-container1.x86_64 0:1.13.5-1
  nvidia-container-toolkit.x86_64 0:1.13.5-1
  nvidia-container-toolkit-base.x86_64 0:1.13.5-1

Complete!
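You can confirm that the transaction above left all of its packages installed with `rpm -q`. The following is a minimal sketch. The function name `check_toolkit_packages` is hypothetical, and the package names come from the output above.

```shell
# Hypothetical helper: confirm every package from the transaction is installed.
check_toolkit_packages() {
  local missing=0 pkg
  for pkg in nvidia-docker2 nvidia-container-toolkit nvidia-container-toolkit-base \
             libnvidia-container-tools libnvidia-container1; do
    # "rpm -q" exits non-zero when the package is not installed.
    if ! rpm -q "$pkg" >/dev/null 2>&1; then
      echo "missing package: $pkg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Another quick check is `nvidia-container-cli --version`, which comes from libnvidia-container-tools. It should print the CLI and library versions.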
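4. Configure Docker:
4.1 Create or edit the Docker configuration file /etc/docker/daemon.json. Installing nvidia-docker2 already places a daemon.json there, and the warning near the end of the output above shows that the previous file was saved as /etc/docker/daemon.json.rpmorig. Copy back any settings you still need from that .rpmorig file: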
sudo nano /etc/docker/daemon.json
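4.2 Make sure the file contains the following content, which registers the nvidia runtime and makes it Docker's default runtime: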
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
4.3 Save and close the file.
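You can also script this edit. The following is a minimal sketch with some assumptions. It replaces the whole file, so back up any existing settings first, including those in daemon.json.rpmorig. It also assumes a Python interpreter is available for the JSON syntax check. The function name `write_daemon_json` and its path parameter are hypothetical.

```shell
# Hypothetical helper: write the daemon.json from step 4.2 and check its syntax.
# The path defaults to /etc/docker/daemon.json (run as root); pass another
# path to try it out without touching the real file.
write_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  local py
  # Keep a timestamped backup of any existing file before overwriting it.
  if [ -f "$target" ]; then
    cp -p "$target" "$target.bak.$(date +%s)" || return 1
  fi
  cat > "$target" <<'EOF' || return 1
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
  # Validate the JSON: dockerd will not start if daemon.json is malformed.
  py="$(command -v python3 || command -v python)" || {
    echo "no python interpreter found; skipping the JSON check" >&2
    return 0
  }
  "$py" -m json.tool "$target" >/dev/null
}
```

There is another option. The nvidia-container-toolkit package installed in step 3.2 (version 1.13.5 in the output) includes the `nvidia-ctk` CLI, and `sudo nvidia-ctk runtime configure --runtime=docker` registers the `nvidia` runtime in /etc/docker/daemon.json for you. Check the official documentation to find out whether your version can also make it the default runtime.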
5. Restart the Docker service so that it loads the new configuration:
sudo systemctl restart docker
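After the restart, you can check that Docker actually loaded the new configuration. The following is a minimal sketch, assuming the daemon.json from step 4, which sets `default-runtime` to `nvidia`. The function name `check_nvidia_runtime` is hypothetical. It reads the `DefaultRuntime` field of `docker info`, so run it as root or as a user in the docker group.

```shell
# Hypothetical helper: confirm Docker picked up the nvidia runtime after the restart.
check_nvidia_runtime() {
  local default_runtime
  # DefaultRuntime comes from "docker info"; it should be "nvidia" after step 4.
  default_runtime="$(docker info --format '{{.DefaultRuntime}}')" || return 1
  if [ "$default_runtime" != "nvidia" ]; then
    echo "default runtime is '$default_runtime', expected 'nvidia'" >&2
    return 1
  fi
  echo "default runtime: $default_runtime"
}
```

For an end-to-end test, you can run a sample workload similar to the one in the NVIDIA documentation: `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`. If it prints the same GPU table as on the host, containers can see the GPU.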
After you complete the steps above, NVIDIA Container Toolkit is installed and configured on your CentOS system. Docker containers can then use the host's GPUs through the nvidia runtime.
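Note that the steps above apply to CentOS 7 and later; the sample output was produced on CentOS 7, as the centos7 repository paths show. If you use a different CentOS version, follow the installation and configuration guide for that version in the official NVIDIA Container Toolkit documentation.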
6. Official NVIDIA Container Toolkit documentation:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/index.html