Installing the NVIDIA Driver and nvidia-docker on Ubuntu 16.04

1 Install the NVIDIA driver

    Check the GPU model:

lspci | grep -i nvidia

02:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
03:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
84:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)

    Download the driver from https://www.geforce.cn/drivers/beta-legacy, choosing the package that matches your GPU model.

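1.1 Remove the existing driver

    Purge any previously installed NVIDIA packages (the pattern is quoted so the shell does not expand it against files in the current directory):

sudo apt-get remove --purge 'nvidia*'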

1.2 Disable the nouveau driver

    Append the following to the end of /etc/modprobe.d/blacklist.conf:

blacklist nouveau
options nouveau modeset=0

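    Regenerate the initramfs:

sudo update-initramfs -u

    Check that nouveau is disabled. No output means success:

lsmod | grep nouveau

    If nouveau is still listed, reboot and check again.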
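    Appending to blacklist.conf by hand is easy to repeat by accident. As a minimal sketch, the helper below adds the two nouveau lines only when they are missing. The function name ensure_nouveau_blacklisted is hypothetical and not part of any package; the file path is passed in as an argument.

```shell
# ensure_nouveau_blacklisted FILE  (hypothetical helper name)
# Append the two nouveau blacklist lines to FILE unless the exact line is
# already there, so running it twice does not create duplicate entries.
ensure_nouveau_blacklisted() {
    local conf line
    conf="$1"
    touch "$conf"
    # If the file is non-empty and does not end in a newline, add one so the
    # first appended entry starts on its own line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

    Writing to /etc/modprobe.d/blacklist.conf requires root, so the function would have to be called from a root shell, and sudo update-initramfs -u is still needed afterwards.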

1.3 Stop the display manager

    Stop lightdm:

sudo service lightdm stop

1.4 Make the installer executable

    Set the execute permission:

sudo chmod a+x NVIDIA-Linux-x86_64-440.59.run

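1.5 Install the driver

    Run the installer:

sudo ./NVIDIA-Linux-x86_64-440.59.run

    Follow the prompts to complete the installation. If the previous steps were done in order, it should generally go through without problems.

1.6 Verify the driver installation

    Run nvidia-smi: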

tilyp@tilyp:~/$ nvidia-smi

Thu Mar  5 14:39:58 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 17%   37C    P0    63W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 11%   38C    P0    29W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

    The GPU information is displayed, so the driver was installed successfully.
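    For a scripted check it can help to pull just the driver version out of that table. The following is a rough sketch that reads nvidia-smi output on standard input; the function name driver_version_from_smi is made up for this example, and it assumes the header line keeps the "Driver Version: X.Y" layout shown above.

```shell
# driver_version_from_smi  (hypothetical helper name)
# Read nvidia-smi's table output on stdin and print the driver version
# found after the "Driver Version:" label in the header line.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: \([0-9.][0-9.]*\).*/\1/p' | head -n 1
}
```

    It would be used as nvidia-smi | driver_version_from_smi, and the result can be compared with the version in the installer file name (440.59 here). nvidia-smi can also report the value directly with --query-gpu=driver_version --format=csv,noheader, which avoids parsing the table.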

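2 Install CUDA (not required for nvidia-docker)

2.1 Download the CUDA installer

    Download from https://developer.nvidia.com/cuda-downloads, choosing the CUDA version that matches your driver version.

2.2 Make the installer executable

    Set the execute permission:

sudo chmod a+x cuda_10.2.89_440.33.01_linux.run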

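2.3 Install

    Run the installer:

sudo ./cuda_10.2.89_440.33.01_linux.run

    Note: do not select the option to reinstall the NVIDIA driver, since the driver was already installed in step 1.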

2.4 Configure environment variables

    Append the following to the end of ~/.bashrc:

export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

    Reload the configuration:

source ~/.bashrc
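    Because the export lines prepend unconditionally, every additional source ~/.bashrc in the same shell adds another copy of the CUDA directories. One possible way around that, sketched here with a hypothetical helper named prepend_once, is to prepend only when the directory is not already listed:

```shell
# prepend_once VAR DIR  (hypothetical helper name)
# Put DIR at the front of the colon-separated variable VAR unless it is
# already one of its entries. An empty VAR becomes just DIR, with no
# trailing colon.
prepend_once() {
    local var dir current
    var="$1"
    dir="$2"
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) eval "export $var=\"\$dir\${current:+:\$current}\"" ;;
    esac
}
```

    In ~/.bashrc this would be called as prepend_once PATH /usr/local/cuda-10.2/bin and prepend_once LD_LIBRARY_PATH /usr/local/cuda-10.2/lib64, after the function definition.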

2.5 Check the CUDA version

    Output like the following confirms that the installation succeeded:

tilyp@tilyp:~/$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

3 Install cuDNN (not required for nvidia-docker)

3.1 Download cuDNN

    Download from https://developer.nvidia.com/rdp/cudnn-archive. Again, pick the build that matches your CUDA version.

    Extract the archive:

tar -zxvf cudnn-10.1-linux-x64-v7.6.4.38.tgz

    Copy the files into the CUDA directory:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

    Check the cuDNN version:

tilyp@tilyp:~/$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
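    The three macros above can be combined into a single version string. This is a small sketch, assuming the header uses the plain "#define NAME VALUE" layout shown in the output; cudnn_version is a hypothetical function name, and the default path is the one used in this walkthrough.

```shell
# cudnn_version [HEADER]  (hypothetical helper name)
# Print MAJOR.MINOR.PATCHLEVEL from the CUDNN_* macros in a cuDNN header.
# HEADER defaults to the location the files were copied to above.
cudnn_version() {
    local header
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

    For the header shown above this yields 7.6.4. In cuDNN 8 and later the version macros live in cudnn_version.h, so that file would need to be passed as the argument instead.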

4 Install Docker

    Update the package index:

sudo apt-get update

    Install the packages needed to use an apt repository over HTTPS:

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

    Add Docker's GPG key:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

    Verify the key fingerprint:

sudo apt-key fingerprint 0EBFCD88

    Add the Docker repository to the system:

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

    Update the package index again:

sudo apt-get update

    Install Docker:

sudo apt-get install docker-ce docker-ce-cli containerd.io

    Verify the installation:

tilyp@tilyp:~/$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete 
Digest: sha256:fc6a51919cfeb2e6763f62b6d9e8815acbf7cd2e476ea353743570610737b752
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

    Docker is now installed.

5 Install nvidia-docker

    Add the repository and update the package index:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
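    The first command sources /etc/os-release and concatenates ID and VERSION_ID, which gives ubuntu16.04 on this system, and that string selects the repository list to download. The sketch below makes the step explicit; nvidia_docker_list_url is a hypothetical function name, and it takes the os-release path as an optional argument so it can be tried against a copy of the file.

```shell
# nvidia_docker_list_url [OS_RELEASE_FILE]  (hypothetical helper name)
# Print the nvidia-docker.list URL that the commands above would fetch,
# built from the ID and VERSION_ID fields of an os-release file.
nvidia_docker_list_url() {
    local os_release
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so ID and VERSION_ID do not leak into
    # the calling shell.
    (
        . "$os_release"
        printf 'https://nvidia.github.io/nvidia-docker/%s%s/nvidia-docker.list\n' "$ID" "$VERSION_ID"
    )
}
```

    Printing the URL before piping it into sudo tee makes it easier to spot an unsupported distribution string before the list file is written.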

    Install nvidia-docker:

sudo apt-get install nvidia-docker2

    Signal dockerd to reload its configuration:

sudo pkill -SIGHUP dockerd

    Verify nvidia-docker:

tilyp@tilyp:~/$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.2-base-ubuntu16.04 nvidia-smi
Thu Mar  5 06:11:30 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 16%   36C    P0    63W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 14%   38C    P0    30W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

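    Running nvidia-smi inside an nvidia-docker container gives the same result as running it directly on the host, which completes the installation. One note on the --runtime=nvidia flag: installing nvidia-docker2 registers the nvidia runtime in /etc/docker/daemon.json, so passing the flag is all that is needed to run a container with it. When "default-runtime" is set to "nvidia", as in the configuration below, the flag can be omitted.

    The /etc/docker/daemon.json on this machine looks like this: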

{
    "insecure-registries": ["192.168.0.198:5000"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
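
    Whether --runtime=nvidia is needed depends on the "default-runtime" entry. As a rough sketch, the check below looks for that entry with a plain text match. It is not a JSON parser, so it assumes the key and value sit on one line as in the file above, and uses_nvidia_by_default is a hypothetical function name.

```shell
# uses_nvidia_by_default [DAEMON_JSON]  (hypothetical helper name)
# Succeed (exit status 0) if the file sets "default-runtime" to "nvidia".
# This is a crude line-based text match, not real JSON parsing.
uses_nvidia_by_default() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' \
        "${1:-/etc/docker/daemon.json}"
}
```

    A script could then add --runtime=nvidia to its docker run command only when uses_nvidia_by_default fails.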

 

Done.

If you have questions, join QQ group 526855734.
