[Solved] failed call to cuInit: CUDA_ERROR_NO_DEVICE

License: CC 4.0 BY-SA


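Summary: after a server reboot, the NVIDIA driver stopped responding and the GPU became unusable. TensorFlow still ran, but only on the CPU, even though the GPU build was installed. The fix is to reinstall the driver module through dkms in two steps, with no reboot required. This post also covers how to find the NVIDIA driver version and how to check whether TensorFlow is the GPU build.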

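After rebooting the server, the system could no longer connect to the NVIDIA driver. TensorFlow still ran, but only on the CPU. Attempting to install the GPU build of TensorFlow only reported that it was already installed.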

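First, run nvidia-smi in a terminal.

It reports: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Next, run nvcc -V in the terminal. It still responds, which shows the CUDA toolkit is still installed (nvcc is the CUDA compiler, so this says nothing about whether the kernel driver is loaded):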

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
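
The two checks above can also be scripted. The following is a minimal sketch, not part of the original fix: the helper names `classify_smi` and `run` are made up for illustration, and the only thing taken from the post is the error text that nvidia-smi prints when the driver is unreachable.

```python
import shutil
import subprocess

# Fragment of the error message quoted above.
DRIVER_ERROR = "couldn't communicate with the NVIDIA driver"


def classify_smi(returncode, output):
    """Turn the result of an nvidia-smi call into a short status string."""
    if DRIVER_ERROR in output:
        return "driver-not-loaded"
    if returncode != 0:
        return "other-failure"
    return "ok"


def run(cmd):
    """Run a command; return (returncode, output), or (None, "") if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None, ""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    code, out = run(["nvidia-smi"])
    print("nvidia-smi:", "not found" if code is None else classify_smi(code, out))

    code, out = run(["nvcc", "-V"])
    print("nvcc:", "not found" if code is None else ("ok" if code == 0 else "failed"))
```

In the situation described in this post, one would expect the first line to report `driver-not-loaded` while the nvcc check still reports `ok`.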

The fix takes only two steps and does not require a reboot.

Step 1: sudo apt-get install dkms

Step 2: sudo dkms install -m nvidia -v 410.73

Running nvidia-smi again now shows the normal output.
The 410.73 in step 2 is the NVIDIA driver version. If you do not know the version, go to /usr/src; it contains an nvidia folder whose name ends with the version number.

cd /usr/src
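
The version lookup can be automated as well. This is a rough sketch under one assumption: that the folder is named `nvidia-<version>` (for example `nvidia-410.73`), which matches the description above but should be confirmed on your machine. The function names are hypothetical, and the script only prints the step 2 command rather than running it.

```python
import os
import re

# Assumed naming pattern: "nvidia-" followed by a dotted version number.
NVIDIA_DIR = re.compile(r"^nvidia-(\d+(?:\.\d+)*)$")


def nvidia_versions(names):
    """Return the version suffix of every entry named nvidia-<version>."""
    versions = []
    for name in names:
        match = NVIDIA_DIR.match(name)
        if match:
            versions.append(match.group(1))
    return sorted(versions)  # plain string sort, good enough for display


def dkms_install_command(version):
    """Build the step 2 command for a given driver version."""
    return ["sudo", "dkms", "install", "-m", "nvidia", "-v", version]


if __name__ == "__main__":
    src = "/usr/src"
    names = os.listdir(src) if os.path.isdir(src) else []
    for version in nvidia_versions(names):
        # Print only; review the command before running it yourself.
        print(" ".join(dkms_install_command(version)))
```

If more than one version is printed, pick the one that matches the driver you actually installed.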

Also: how to check whether TensorFlow is the GPU build or the CPU build

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
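
The printed list can be long, so it may help to filter it down to GPU entries. The sketch below assumes each returned entry exposes `name` and `device_type` attributes, as the printed output suggests; `gpu_device_names` and `SAMPLE` are illustrative names, and the sample values are stand-ins rather than real output.

```python
from types import SimpleNamespace


def gpu_device_names(devices):
    """Return the names of all entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


# Stand-in objects shaped like the entries TensorFlow returns (illustrative values).
SAMPLE = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        gpus = gpu_device_names(device_lib.list_local_devices())
        print("GPUs visible to TensorFlow:", gpus if gpus else "none (running on CPU)")
```

An empty result means TensorFlow is running on the CPU only, either because the CPU build is installed or because the driver is not reachable, as in the case described here.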

Reference: https://blog.youkuaiyun.com/hangzuxi8764/article/details/86572093