linux nvidia 361.run,Nvidia-Docker

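This article looks at the growing importance of container technology for applications that use NVIDIA GPUs, and in particular at how nvidia-docker simplifies deploying, isolating, and collaborating on GPU applications. A worked example shows how to use nvidia-docker to get reproducible builds, deploy GPU resources easily, and run across different driver and toolkit environments with only the NVIDIA driver installed on the host.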


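Overview

Container technology is being adopted ever more widely thanks to its advantages, and traditional virtualization techniques are gradually being adapted to containers. One example is applying SR-IOV (Single-Root Input/Output Virtualization) to containers: Intel's experiments [1] show that network and storage performance can come close to that of the physical device. Meanwhile, GPUs (Graphics Processing Units) have kept advancing in recent years in fields such as high-performance computing and cloud desktops. Developing, debugging, and running GPU-intensive applications involves varied environments and heavy version dependencies. Building on the strengths of containers in CI/CD, NVIDIA Docker takes over this tedious work, and containerized GPU applications gain the benefits listed below. This article gives a first look at nvidia-docker [2] and walks through a simple hands-on setup.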

Benefits of GPU containerization:

- Reproducible builds
- Ease of deployment
- Isolation of individual devices
- Run across heterogeneous driver/toolkit environments
- Requires only the NVIDIA driver to be installed
- Enables "fire and forget" GPU applications
- Facilitate collaboration

[Figure: Example of how CUDA integrates with Docker.]

Test environment

System configuration

The operating system is CentOS:

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# yum update -y
[root@localhost ~]# yum install -y wget tmux vim git pciutils kernel-devel kernel-headers gcc make epel-release

GPU details

[root@localhost ~]# lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K4200] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1)
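Filtering on the string VGA works for the two cards here, but some NVIDIA compute cards report themselves as "3D controller" instead. The following is a minimal sketch, not part of the original setup, of a filter that accepts both classes; the function name `nvidia_pci_addrs` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the PCI addresses of NVIDIA display devices found in lspci-style
# text on stdin. Both "VGA compatible controller" and "3D controller"
# lines are accepted.
# NOTE: nvidia_pci_addrs is a hypothetical helper name, not an existing tool.
nvidia_pci_addrs() {
    awk '/NVIDIA/ && /(VGA compatible|3D) controller/ { print $1 }'
}

# On a real host, feed it the output of lspci (from the pciutils package).
if command -v lspci >/dev/null 2>&1; then
    lspci | nvidia_pci_addrs
fi
```

Each printed address can then be passed to `lspci -v -s <address>` to get the detailed view of that device.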

[root@localhost ~]# lspci -v -s 03:00.0
03:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K4200] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 1096
	Physical Slot: 2
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at e0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at d000 [size=128]
	Expansion ROM at fb000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024

[root@localhost ~]# lspci -v -s 04:00.0
04:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Tesla C2075
	Physical Slot: 4
	Flags: fast devsel, IRQ 11
	Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at e8000000 (64-bit, prefetchable) [disabled] [size=128M]
	Memory at f0000000 (64-bit, prefetchable) [disabled] [size=32M]
	I/O ports at c000 [disabled] [size=128]
	Expansion ROM at f9000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024

Installing Docker

[root@localhost ~]# sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
> [dockerrepo]
> name=Docker Repository
> baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
> enabled=1
> gpgcheck=1
> gpgkey=https://yum.dockerproject.org/gpg
> EOF
[root@localhost ~]# yum install docker-engine
[root@localhost ~]# systemctl restart docker
[root@localhost ~]# systemctl enable docker

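Installing the NVIDIA driver

[root@localhost ~]# uname -r
3.10.0-327.13.1.el7.x86_64
[root@localhost ~]# ll /usr/src/kernels/3.10.0-327.13.1.el7.x86_64/

The running kernel version must match the kernel source tree installed under /usr/src/kernels. If it does not, check and adjust the GRUB2 boot entry and reboot.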
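The version check above can be scripted so that a mismatch is caught before the driver installer starts building its kernel module. This is a minimal sketch under the assumption that kernel-devel installs its tree at /usr/src/kernels/<release>, as on this CentOS host; the helper name `kernel_src_matches` is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if a kernel source tree for the given release exists under the
# base directory (default: /usr/src/kernels, where kernel-devel installs).
# NOTE: kernel_src_matches is an illustrative helper, not an existing tool.
kernel_src_matches() {
    local release="$1" base="${2:-/usr/src/kernels}"
    [ -d "$base/$release" ]
}

running="$(uname -r)"
if kernel_src_matches "$running"; then
    echo "kernel source tree found for running kernel $running"
else
    echo "no source tree for $running under /usr/src/kernels;" \
         "reboot into the updated kernel or install the matching kernel-devel"
fi
```

A mismatch typically appears right after `yum update -y` installs a newer kernel and kernel-devel while the machine is still running the old kernel.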

For compatibility reasons, an older driver version is installed:

[root@localhost ~]# sh ./NVIDIA-Linux-x86_64-352.79_Tesla_C2050.run
[root@localhost ~]# ll /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Apr  6 18:19 /dev/nvidia0
crw-rw-rw-. 1 root root 195,   1 Apr  6 18:19 /dev/nvidia1
crw-rw-rw-. 1 root root 195, 255 Apr  6 18:19 /dev/nvidiactl
crw-rw-rw-. 1 root root 246,   0 Apr  6 18:19 /dev/nvidia-uvm

If nvidia-uvm is missing, load the module manually with modprobe:

[root@localhost ~]# sudo modprobe nvidia_uvm
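The device-node listing can also be checked mechanically, which is handy after a reboot. The sketch below reports which expected nodes are absent; the node list mirrors this two-GPU host and the helper name `missing_nvidia_nodes` is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Print the names of expected device nodes that do not exist in a directory.
# $1 = directory to inspect, remaining arguments = node names to look for.
# NOTE: missing_nvidia_nodes is a hypothetical helper name.
missing_nvidia_nodes() {
    local dir="$1" node
    shift
    for node in "$@"; do
        [ -e "$dir/$node" ] || echo "$node"
    done
}

# Node names taken from the two-GPU listing in this article.
missing="$(missing_nvidia_nodes /dev nvidia0 nvidia1 nvidiactl nvidia-uvm)"
if [ -n "$missing" ]; then
    echo "missing device nodes:" $missing
else
    echo "all expected NVIDIA device nodes are present"
fi
```

If only nvidia-uvm is reported as missing, the modprobe step is the first thing to try.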

Installing and configuring CUDA

Installation

[root@localhost ~]# rpm -ivh cuda-repo-rhel7-7.5-18.x86_64.rpm
[root@localhost ~]# yum clean expire-cache
[root@localhost ~]# yum install cuda -y

If a DKMS dependency error appears, check that yum install -y epel-release has been run.

Configuring environment variables

[root@localhost ~]# find / -name nvcc
/usr/local/cuda-7.5/bin/nvcc

This shows that the CUDA version is 7.5 and that it is installed under /usr/local/cuda-7.5/.

[root@localhost ~]# vim /etc/profile
…….
……..
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
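Two small pitfalls of plain `export` lines are that sourcing the profile twice duplicates the entries, and that an empty LD_LIBRARY_PATH leaves a trailing colon, which the dynamic loader treats as the current directory. The following is one possible sketch of a guard against both. The helper `prepend_unique` and the variable name `CUDA_HOME` are illustrative assumptions, and the path is the one reported by `find / -name nvcc` on this host.

```shell
#!/usr/bin/env bash
# Prepend a directory to a colon-separated list unless it is already there.
# $1 = directory, $2 = current list (may be empty); prints the new list.
# NOTE: prepend_unique is a hypothetical helper name.
prepend_unique() {
    local dir="$1" list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;              # already present
        *)          printf '%s\n' "$dir${list:+:$list}" ;; # no stray colon
    esac
}

CUDA_HOME=/usr/local/cuda-7.5   # assumed install prefix, from the find output
PATH="$(prepend_unique "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_unique "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

The same lines could live in a separate file under /etc/profile.d/ instead of being appended to /etc/profile, which keeps the CUDA settings easy to remove when the toolkit version changes.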

Installing nvidia-docker

The commands below assume that the release tarball has already been downloaded to /tmp.

# Install nvidia-docker and nvidia-docker-plugin
[root@localhost ~]# sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.beta.3_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz

# Run nvidia-docker-plugin
[root@localhost ~]# sudo -b nohup nvidia-docker-plugin > /tmp/nvidia-docker.log

[root@localhost ~]# docker images
REPOSITORY    TAG       IMAGE ID        CREATED        SIZE
nvidia/cuda   latest    22bde803e760    2 weeks ago    1.226 GB

Error:

[root@localhost ~]# nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_352.79: create nvidia_driver_352.79: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.

Fix:

[root@localhost ~]# nvidia-docker volume setup
[root@localhost ~]# docker volume ls
DRIVER    VOLUME NAME
local     nvidia_driver_352.79
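Whether the driver volume already exists can be decided from the `docker volume ls` output before running any container. The sketch below parses that output; the helper name `has_driver_volume` is hypothetical, and the sample text is the listing from this article.

```shell
#!/usr/bin/env bash
# Succeed if `docker volume ls`-style text on stdin lists a volume named
# nvidia_driver_<version>. The header row is skipped.
# NOTE: has_driver_volume is an illustrative helper, not part of nvidia-docker.
has_driver_volume() {
    local version="$1"
    awk -v name="nvidia_driver_$version" \
        'NR > 1 && $2 == name { found = 1 } END { exit !found }'
}

# Demo on the listing shown in this article.
if has_driver_volume 352.79 <<'EOF'
DRIVER              VOLUME NAME
local               nvidia_driver_352.79
EOF
then
    echo "driver volume present"
fi
```

On a live host the check might be wired up as `docker volume ls | has_driver_volume 352.79 || nvidia-docker volume setup`, with the version taken from the installed driver rather than hard-coded.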

Testing

Start several containers and confirm that the NVIDIA devices are present in each of them:

mkdir -p ~/docker/digits
nvidia-docker run -it -p 8080:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8081:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8082:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8083:8080 -v ~/docker/digits:/digits nvidia/cuda

docker ps -a
CONTAINER ID    IMAGE          COMMAND        CREATED               STATUS                 PORTS                     NAMES
5d86dbc4047b    nvidia/cuda    "/bin/bash"    About a minute ago    Up About a minute      0.0.0.0:8083->8080/tcp    romantic_williams
0c8d3300140b    nvidia/cuda    "/bin/bash"    About a minute ago    Up About a minute      0.0.0.0:8082->8080/tcp    tiny_shaw
5c927720fa16    nvidia/cuda    "/bin/bash"    2 minutes ago         Up About a minute      0.0.0.0:8081->8080/tcp    drunk_brattain
5ab94e3a21d2    nvidia/cuda    "/bin/bash"    2 minutes ago         Up 2 minutes           0.0.0.0:8080->8080/tcp    evil_pare
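The four run commands differ only in the host port, so they can be generated by a loop. This is a small sketch that only prints the commands (a dry run) so they can be reviewed before being executed in separate terminals; the helper name `print_run_commands` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print one `nvidia-docker run` command per container. Consecutive host
# ports are mapped to container port 8080 and all containers share the
# same host directory mounted at /digits.
# $1 = number of containers, $2 = first host port, $3 = host directory.
# NOTE: print_run_commands is a hypothetical helper name.
print_run_commands() {
    local count="$1" first_port="$2" host_dir="$3" i=0
    while [ "$i" -lt "$count" ]; do
        printf 'nvidia-docker run -it -p %d:8080 -v %s:/digits nvidia/cuda\n' \
            "$((first_port + i))" "$host_dir"
        i=$((i + 1))
    done
}

print_run_commands 4 8080 "$HOME/docker/digits"
```

Inside each started container, listing /dev/nvidia* is a quick way to confirm that the device nodes were passed through.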

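Compiling and running a CUDA program in a container

The CUDA source file:

root@5ab94e3a21d2:~# cd /digits/
root@5ab94e3a21d2:/digits# ls
hellocuda.cu

Compile it with nvcc inside the container:

root@5ab94e3a21d2:/digits# nvcc hellocuda.cu -o hellocuda

Run the program inside the container:

root@5ab94e3a21d2:/digits# ./hellocuda
16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

Other tests, such as NVIDIA/DIGITS [4][5], can also be run in the container.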

[1] Single-Root Input/Output Virtualization (SR-IOV) with Linux* Containers

### Installing NVIDIA Docker on Ubuntu 22.04

#### Preparing the environment

To install NVIDIA Docker on Ubuntu 22.04, first confirm that the Universe repository is enabled[^2]. This step is needed to obtain the required dependencies.

#### Updating the package index and installing Docker Engine

Bring the system up to date by running `apt-get update && apt-get upgrade`. Then follow the official guide to set up Docker's APT source and install Docker Engine v20.10.20[^1].

```bash
sudo apt-get update && sudo apt-get upgrade -y
```

#### Adding the NVIDIA APT repository

Next, add the APT repository of the NVIDIA container toolkit to the system's source list, so that the toolkit packages can be installed through apt.

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```

#### Installing the NVIDIA Container Toolkit

With the preparation done, install the NVIDIA container toolkit:

```bash
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
```

#### Restarting and verifying the service

Restart the Docker daemon so that the changes take effect, then run a simple test to check that GPU support is enabled.

```bash
sudo systemctl restart docker
# Test that the GPU is visible from inside a container
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```