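Verifying a successful CUDA installation on Linux (CentOS)

This article walks through installing and verifying CUDA 9.1 on a physical machine running CentOS 7: confirming the CUDA toolkit and NVIDIA driver versions, installing the graphics driver, setting environment variables, and verifying the installation with deviceQuery. It stresses that installation inside a virtual machine is likely to fail, and it gives fixes for common problems.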


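Note: do not attempt this inside a virtual machine. It will not succeed, because this setup is currently not supported there.

To succeed, work on a physical machine.

Preparation

Confirm versions

The two things to confirm are the CUDA toolkit version and the NVIDIA driver version.

In practice, the most reliable approach turned out to be:

First determine the NVIDIA driver version from the graphics card model in the machine, then determine the CUDA toolkit version from the driver version.

Check the graphics card model

The card in this machine is a GeForce GTX 1060 3GB.

Number of CUDA cores: 1152

Determine the driver version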

https://www.geforce.com/drivers

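There you can look up every driver version that supports the card. The topmost entry is the latest release (betas excluded).

At the time of writing, the latest NVIDIA driver version was 390.87.

Determine the CUDA toolkit version

Each CUDA toolkit release requires a minimum NVIDIA driver version. See the CUDA Driver section of https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.

On Linux, with 390.87 as the latest NVIDIA driver, CUDA 9.2 is ruled out because it requires a driver >= 396.26. CUDA 9.1 requires >= 390.46, which is satisfied, so CUDA 9.1 is the version to install.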
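The selection rule above can be sketched as a small shell script. This is only an illustration: the function names `version_ge`, `min_driver_for`, and `pick_toolkit` are hypothetical, the table holds just the two minimum driver versions quoted in this article, and the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Linux driver versions quoted in this article (not a complete table).
min_driver_for() {
    case "$1" in
        9.1) echo 390.46 ;;
        9.2) echo 396.26 ;;
        *)   return 1 ;;
    esac
}

# Print the newest toolkit, among those listed, that the given driver can run.
pick_toolkit() {
    local driver="$1" cuda
    for cuda in 9.2 9.1; do
        if version_ge "$driver" "$(min_driver_for "$cuda")"; then
            echo "$cuda"
            return 0
        fi
    done
    return 1
}

pick_toolkit 390.87   # prints 9.1
```

For any other toolkit release, take the minimum driver version from the release notes rather than from this sketch.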

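System and kernel requirements

See the System Requirements section of https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html, which lists what CUDA 9.1 requires of each supported distribution.

For CentOS 7.x, for example, it requires kernel 3.10, gcc 4.8.5, and GLIBC 2.17.

Required checks

See chapter 2 of https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html.

(1) Check whether a CUDA-capable GPU is present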

lspci | grep -i nvidia

You can check whether your card supports CUDA at https://developer.nvidia.com/cuda-gpus.

(2) Check whether the current Linux release is supported

The CUDA Development Tools are only supported on some specific distributions of Linux.

$ uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system:

x86_64

Red Hat Enterprise Linux Workstation release 6.0 (Santiago)

The x86_64 line indicates you are running on a 64-bit system.

The remainder gives information about your distribution.

(3) Check the gcc version:

$ gcc --version

(4) Check the glibc version

ls -l /lib64/libc.so.*

(5) Install the kernel headers for the running kernel

This step is important.

sudo yum install "kernel-devel-uname-r == $(uname -r)"
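The checks above can be collected into one script so they are easy to repeat. The sketch below is an assumption-laden convenience, not part of the official guide: the function name `preflight` and the output labels are made up for illustration, and every check is guarded so that a missing tool is reported instead of aborting the run.

```shell
#!/usr/bin/env bash
# Pre-installation checks gathered into one function (hypothetical helper).
preflight() {
    echo "arch: $(uname -m)"
    echo "kernel: $(uname -r)"

    # Look for an NVIDIA device on the PCI bus.
    if command -v lspci >/dev/null 2>&1; then
        echo "gpu: $(lspci | grep -i nvidia | head -n 1)"
    else
        echo "gpu: lspci not found"
    fi

    # Report the compiler version, if gcc is installed.
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc: $(gcc --version | head -n 1)"
    else
        echo "gcc: not installed"
    fi

    # On RPM-based systems, check for kernel-devel matching the running kernel.
    if command -v rpm >/dev/null 2>&1; then
        if rpm -q "kernel-devel-$(uname -r)" >/dev/null 2>&1; then
            echo "kernel-devel: installed"
        else
            echo "kernel-devel: missing"
        fi
    else
        echo "kernel-devel: rpm not found"
    fi
}

preflight
```

If the last line reports `kernel-devel: missing`, run the yum command from step (5) before going further.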

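Installing the graphics driver and the CUDA toolkit

The "Handle Conflicting Installation Methods" section of the installation guide covers reinstallation.

In short, when reinstalling the same version of the graphics driver or the CUDA toolkit, the old installation must be removed first.

If the CUDA toolkit is already installed, it can be removed as follows: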

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin

To uninstall the NVIDIA Driver, run nvidia-uninstall

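Installing the graphics driver

Installing with yum

Most Linux distributions use the open-source nouveau graphics driver by default. For NVIDIA cards, the official closed-source driver works better.

For installing the official driver, see: https://blog.youkuaiyun.com/u013378306/article/details/69229919

It describes a simple way to install the NVIDIA driver with yum.

Before starting, the nouveau driver that ships by default must be disabled.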

lsmod | grep nouveau

If the command above prints nothing, nouveau has been disabled successfully.
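If `lsmod` still lists nouveau, it has to be blacklisted first. The following is a minimal sketch of the usual approach on CentOS 7, assuming a dracut-based initramfs. The function name `write_nouveau_blacklist` and the file name `blacklist-nouveau.conf` are illustrative choices, and the system-changing commands are left as comments.

```shell
#!/usr/bin/env bash
# Write a modprobe configuration that stops nouveau from loading.
# The target path is a parameter so the function can be tried on a scratch file first.
write_nouveau_blacklist() {
    local conf="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Typical use as root (commented out because it modifies the system):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force        # rebuild the initramfs so the setting applies at boot
#   reboot
#   lsmod | grep nouveau  # should print nothing after the reboot
```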

With this method, the last step is:

yum -y install kmod-nvidia

This step sometimes fails. Even then, you can still run

nvidia-detect -v

and use its output to look up the matching driver version and install it.

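Installing from source

A reliable place to find drivers: https://www.geforce.com/drivers

For the installation procedure, see: https://blog.youkuaiyun.com/itaacy/article/details/72628792?utm_source=itdadao&utm_medium=referral

Once the driver is installed, GPU information can be displayed with: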

nvidia-smi

If the command prints the GPU status table, the driver was installed successfully.

To uninstall the driver, use:

nvidia-uninstall

Installing the CUDA toolkit

Note: shut down GNOME before installing.
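One way to shut down GNOME on CentOS 7 is to switch systemd to its text-mode target. The sketch below assumes a stock systemd setup. The helper names `stop_gui` and `start_gui` are hypothetical, while `multi-user.target` and `graphical.target` are the standard systemd targets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the standard systemd targets.
stop_gui() {
    # Switch to text mode, which stops the display manager and the X server.
    systemctl isolate multi-user.target
}

start_gui() {
    # Return to the graphical desktop after the installer has finished.
    systemctl isolate graphical.target
}

# Run as root from a text console or an SSH session:
#   stop_gui
#   sh cuda_9.1.85_387.26_linux.run
#   start_gui
```

Running `stop_gui` from a terminal inside the desktop session ends that session, so it is safer to log in over SSH or on a virtual console first.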

CUDA toolkit download page: https://developer.nvidia.com/cuda-toolkit-archive

Download CUDA 9.1.

Install CUDA:

sh cuda_9.1.85_387.26_linux.run

Installation session (the log below is from an installation of version 9.2 and is for reference only):

Do you accept the previously read EULA?

accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?

(y)es/(n)o/(q)uit: yes

Do you want to install the OpenGL libraries?

(y)es/(n)o/(q)uit [ default is yes ]: yes

Do you want to run nvidia-xconfig?

This will update the system X configuration file so that the NVIDIA X driver

is used. The pre-existing X configuration file will be backed up.

This option should not be used on systems that require a custom

X configuration, such as systems with multiple GPU vendors.

(y)es/(n)o/(q)uit [ default is no ]: y

Install the CUDA 9.2 Toolkit?

(y)es/(n)o/(q)uit: y

Enter Toolkit Location

[ default is /usr/local/cuda-9.2 ]: y

Toolkit location must be an absolute path.

Enter Toolkit Location

[ default is /usr/local/cuda-9.2 ]: /usr/local/cuda-9.2

Do you want to install a symbolic link at /usr/local/cuda?

(y)es/(n)o/(q)uit: y

Install the CUDA 9.2 Samples?

(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location

[ default is /root ]: y

Samples location must be an absolute path

Enter CUDA Samples Location

[ default is /root ]: y

Samples location must be an absolute path

Enter CUDA Samples Location

[ default is /root ]: /root

Installing the NVIDIA display driver...

Log of a successful installation:

Installing the NVIDIA display driver...

Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...

Installing the CUDA Samples in /root ...

Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...

Finished copying samples.

===========

= Summary =

===========

Driver: Installed

Toolkit: Installed in /usr/local/cuda-9.2

Samples: Installed in /root

Please make sure that

- PATH includes /usr/local/cuda-9.2/bin

- LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin

To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_3101.log

Configuring environment variables

See https://www.jianshu.com/p/73399a4c9114 for how to set the environment variables.
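The installer summary asks for the toolkit's bin directory on PATH and its lib64 directory on LD_LIBRARY_PATH. A minimal sketch, assuming the /usr/local/cuda symbolic link was created during installation, is shown below. The variable name `CUDA_HOME` is a common convention used here for illustration, not something the installer requires.

```shell
# Lines to append to ~/.bashrc (or a file under /etc/profile.d/ for all users).
# Assumes the installer created the /usr/local/cuda symbolic link.
CUDA_HOME=/usr/local/cuda

# Prepend the toolkit directories, keeping any existing value intact.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After editing the file, open a new shell or run `source ~/.bashrc` so the settings take effect.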

Verifying that CUDA was installed successfully

cd /root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery

make

./deviceQuery

On success, the output ends with Result = PASS.

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 3GB"

CUDA Driver Version / Runtime Version 8.0 / 8.0

CUDA Capability Major/Minor version number: 6.1

Total amount of global memory: 3013 MBytes (3159293952 bytes)

( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores

GPU Max Clock rate: 1747 MHz (1.75 GHz)

Memory Clock rate: 4004 Mhz

Memory Bus Width: 192-bit

L2 Cache Size: 1572864 bytes

Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)

Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers

Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 65536

Warp size: 32

Maximum number of threads per multiprocessor: 2048

Maximum number of threads per block: 1024

Max dimension size of a thread block (x,y,z): (1024, 1024, 64)

Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and kernel execution: Yes with 2 copy engine(s)

Run time limit on kernels: No

Integrated GPU sharing Host Memory: No

Support host page-locked memory mapping: Yes

Alignment requirement for Surfaces: Yes

Device has ECC support: Disabled

Device supports Unified Addressing (UVA): Yes

Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0

Compute Mode:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 3GB

Result = PASS

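The output includes values such as CUDA Driver Version / Runtime Version 8.0 / 8.0 and

( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores

This sample output appears to have been captured on a CUDA 8.0 setup. After installing CUDA 9.1, the runtime version should read 9.1.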
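For scripted checks, the deviceQuery output can be filtered instead of read by eye. The helpers below are a sketch. The function names are hypothetical, and the patterns assume the output format shown in the sample above.

```shell
#!/usr/bin/env bash
# Reads deviceQuery output on stdin; succeeds only if it contains the PASS line.
device_query_passed() {
    grep -q '^Result = PASS$'
}

# Reads deviceQuery output on stdin; prints "<driver version> <runtime version>".
device_query_versions() {
    sed -n 's#.*CUDA Driver Version / Runtime Version *\([0-9.]*\) / \([0-9.]*\).*#\1 \2#p'
}

# Usage, from the deviceQuery sample directory:
#   ./deviceQuery | device_query_passed && echo "CUDA OK"
#   ./deviceQuery | device_query_versions
```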

How to check the CUDA version

nvcc --version
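When only the release number is needed, for example in a script, it can be extracted from the `nvcc --version` output. This sketch assumes the usual "release X.Y," wording in that output. The function name `nvcc_release` is made up for illustration.

```shell
#!/usr/bin/env bash
# Extract the release number (for example 9.1) from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage:
#   nvcc --version | nvcc_release
```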

Problems encountered and their solutions:

The driver installation is unable to locate the kernel source.

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.

If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

Solution:

sudo yum install epel-release

yum install --enablerepo=epel dkms

Missing recommended library

Installing the NVIDIA display driver...

Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...

Missing recommended library: libGLU.so

Missing recommended library: libX11.so

Missing recommended library: libXi.so

Missing recommended library: libXmu.so

Installing the CUDA Samples in /root ...

Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...

Finished copying samples.

===========

= Summary =

===========

Driver: Installed

Toolkit: Installed in /usr/local/cuda-9.2

Samples: Installed in /root, but missing recommended libraries

Please make sure that

- PATH includes /usr/local/cuda-9.2/bin

- LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin

To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_7498.log

Solution:

yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 \
    mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 \
    mesa-libGLw-devel.x86_64 libXi-devel.x86_64 \
    freeglut-devel.x86_64 freeglut.x86_64

cudaGetDeviceCount returned 30

When verifying the CUDA installation, the following output appears:

[root@localhost deviceQuery]# ./deviceQuery

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30

-> unknown error

Result = FAIL

[root@localhost deviceQuery]# pwd

/root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery

This is usually a problem with the NVIDIA graphics driver. Install the latest NVIDIA driver.

http://elrepo.org/tiki/tiki-index.php

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

Then install the NVIDIA driver with yum as described in https://blog.youkuaiyun.com/u013378306/article/details/69229919.

cudaGetDeviceCount returned 35


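This is usually a CUDA version problem. Determine the correct version and install it.

The message "CUDA driver version is insufficient for CUDA runtime version" means that the CUDA runtime library is newer than the installed driver supports. Either install a newer driver or use an older CUDA runtime library. All packages can be found at http://developer.download.nvidia.com/compute/cuda/repos/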

Your kernel headers for kernel xxx cannot be found


The solution can likely be found in the Stack Overflow question listed in the references. The short version is to run

sudo yum install "kernel-devel-uname-r == $(uname -r)"

That will install the kernel headers for the version of the kernel you are currently running.

References:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

https://baiweiblog.wordpress.com/2017/07/21/cuda-8-0%E5%9C%A8linux%E4%B8%8A%E7%9A%84%E5%AE%89%E8%A3%85%E6%B5%81%E7%A8%8B/

https://stackoverflow.com/questions/38016466/installing-cuda-7-5-on-centos-7-unable-to-locate-the-kernel-source

https://bitsanddragons.wordpress.com/2016/10/07/cuda-on-centos-7/

https://devtalk.nvidia.com/default/topic/1027413/cuda-setup-and-installation/linux-installation-error-cudagetdevicecount-returned-30-gt-unknown-error/

https://developer.download.nvidia.com/compute/cuda/9.2/Prod2/docs/sidebar/CUDA_Installation_Guide_Linux.pdf

https://blog.youkuaiyun.com/10km/article/details/61665578

https://medium.com/@changrongko/nv-how-to-check-cuda-and-cudnn-version-e05aa21daf6c

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

https://www.cnblogs.com/wolflzc/p/9117291.html

http://detail.zol.com.cn/picture_index_1760/index17594460.shtml
