Fixing "Could not load dynamic library 'libcudart.so.10.0'"

Symptoms and analysis

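After installing CUDA, cuDNN, and the latest official release of TensorFlow, 2.0, I ran TensorFlow code from PyCharm and hit the following problem.

The log reports that several dynamic libraries cannot be found. A quick look at the file names shows they are all CUDA libraries, so why would they be missing?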

2019-10-15 19:19:41.440285: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-15 19:19:41.465433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.465758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
2019-10-15 19:19:41.465809: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465841: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465870: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465900: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465930: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.468179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-15 19:19:41.468189: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-15 19:19:41.468361: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-15 19:19:41.490938: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2019-10-15 19:19:41.492057: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x520eba0 executing computations on platform Host. Devices:
2019-10-15 19:19:41.492085: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-10-15 19:19:41.559665: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.560029: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5241a20 executing computations on platform CUDA. Devices:
2019-10-15 19:19:41.560040: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-10-15 19:19:41.560084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-15 19:19:41.560088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
2019-10-15 19:19:41.562457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.562855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
2019-10-15 19:19:41.562913: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.562945: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.562975: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563032: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563062: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563069: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-15 19:19:41.563073: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-15 19:19:41.563080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-15 19:19:41.563083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-10-15 19:19:41.563086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-10-15 19:19:41.563504: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

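At first I suspected my environment variables, but a quick test showed that CUDA and cuDNN were both where they should be and worked normally from a terminal.

The actual cause turned out to be a version mismatch: the current TensorFlow release only supports CUDA 10.0 (the log above shows it looking for .so.10.0 files), while NVIDIA has already moved CUDA on to 10.1. The fix is to keep both versions installed and switch to whichever one is needed.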
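The mismatch can be checked from Python without starting TensorFlow. The sketch below is not from the original post: it uses the standard ctypes module to ask the dynamic loader for the CUDA runtime under each version-specific name. The 10.1 file name is an assumption about how that runtime is named on your system.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the file is missing or cannot be opened.
        return False


if __name__ == "__main__":
    # "libcudart.so.10.0" is the name from the log; "libcudart.so.10.1"
    # is an assumed name for the CUDA 10.1 runtime.
    for soname in ("libcudart.so.10.0", "libcudart.so.10.1"):
        print(soname, "->", "found" if can_load(soname) else "not found")
```

If only the 10.1 name is found, the system has a working CUDA installation but not the 10.0 libraries that this TensorFlow build requests.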

 

Solution steps

Download CUDA 10.0. My machine already had version 10.1 configured.

wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux

Make the installer executable

sudo chmod +x cuda_10.0.130_410.48_linux

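Run the installer with sudo (sudo ./cuda_10.0.130_410.48_linux). Choose to install only the CUDA 10.0 toolkit, and do not install the driver.

Once installation finishes, edit ~/.bashrc (for example with vim ~/.bashrc; sudo is not needed for a file in your own home directory). Delete the CUDA settings configured earlier, replace them with the following, and save the file.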

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export PATH=/usr/local/cuda/bin:$PATH
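The dynamic loader silently skips LD_LIBRARY_PATH entries that do not exist, so a mistyped directory only shows up later as "cannot open shared object file". As a small sanity check (a sketch that is not part of the original post; the helper name missing_dirs is hypothetical), the following lists path entries that are not real directories. Run it from a shell where the new settings have been loaded.

```python
import os


def missing_dirs(path_value):
    """Return the entries of a colon-separated path list that are not existing directories."""
    entries = [entry for entry in path_value.split(":") if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


if __name__ == "__main__":
    for var in ("LD_LIBRARY_PATH", "PATH"):
        bad = missing_dirs(os.environ.get(var, ""))
        print(var, "entries that do not exist:", bad if bad else "none")
```

Any directory it reports is worth comparing character by character with the real installation path.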

Run source ~/.bashrc to apply the configuration. If everything is in place, nvcc --version should now report CUDA 10.0:

arenascat@TensorSystem:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
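The same check can be scripted, which helps when switching between toolkits often. This is a minimal sketch, assuming nvcc is on PATH and prints a "release X.Y" line like the output above; the function names are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract the release string (for example '10.0') from nvcc --version output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_nvcc_release():
    """Run nvcc from PATH and return its release, or None if nvcc is not available."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc release:", installed_nvcc_release())
```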

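If Python code run from PyCharm still fails, delete /usr/local/cuda, which is really just a symbolic link, and then recreate the link so that it points at the 10.0 installation.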

sudo rm -rf /usr/local/cuda

sudo ln -s /usr/local/cuda-10.0/ /usr/local/cuda
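To confirm where the link points and which toolkits are installed, something like the following can be used. It is a sketch that assumes the toolkits live under /usr/local in directories named cuda-<version>, as in the commands above; the function names are hypothetical.

```python
import glob
import os


def cuda_link_target(link="/usr/local/cuda"):
    """Return the real path behind `link`, or None if `link` does not exist."""
    if not os.path.lexists(link):
        return None
    return os.path.realpath(link)


def installed_toolkits(prefix="/usr/local"):
    """Return the sorted cuda-* directories found under `prefix`."""
    return sorted(glob.glob(os.path.join(prefix, "cuda-*")))


if __name__ == "__main__":
    print("installed toolkits:", installed_toolkits())
    print("/usr/local/cuda ->", cuda_link_target())
```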

Run code that uses tensorflow-gpu again, and the log should show that the libraries are now loaded successfully.
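A short script makes this check repeatable. The sketch below is not the author's code: it lists the GPUs TensorFlow registers, using tf.config.experimental.list_physical_devices (available in TensorFlow 2.0) and preferring the non-experimental name where a later release provides it. It returns None when TensorFlow is not installed.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Later releases expose list_physical_devices directly on tf.config;
    # TensorFlow 2.0 only has it under tf.config.experimental.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", list_gpus())
```

An empty list means TensorFlow imported but skipped GPU registration, which is what the warnings in the log above lead to.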

To switch back to CUDA 10.1, repeat the steps above with 10.1 in place of 10.0.

 

If the problem is still not solved, try setting the environment variables in more places as well. Create a new file:

sudo vi /etc/profile.d/cuda.sh

Copy the following into it and save:

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda

Create another file at the location below. If CUDA 10.1 is installed, the directory may already contain a cuda10.1.conf, which can be left alone.

sudo vi /etc/ld.so.conf.d/cuda.conf

The file should contain this line:

/usr/local/cuda/lib64

Apply the settings

sudo ldconfig
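Unlike ~/.bashrc, the loader cache applies to every process, including programs started from a desktop IDE that may not read shell startup files. To see whether the cache now contains the library from the error message, the output of ldconfig -p can be searched. This is a sketch under the assumption that ldconfig is on PATH and prints lines of the form "name (flags) => path"; the function names are hypothetical.

```python
import subprocess


def cached_paths(ldconfig_output, soname):
    """Return the paths that ldconfig -p output lists for `soname`."""
    paths = []
    for line in ldconfig_output.splitlines():
        name, sep, path = line.strip().partition(" => ")
        # Library lines look like: "libfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1"
        if sep and name.split(" ")[0] == soname:
            paths.append(path)
    return paths


def read_ldconfig_cache():
    """Return the text printed by ldconfig -p, or an empty string if ldconfig is not found."""
    try:
        result = subprocess.run(
            ["ldconfig", "-p"], stdout=subprocess.PIPE, universal_newlines=True
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    print(cached_paths(read_ldconfig_cache(), "libcudart.so.10.0"))
```

An empty list after running sudo ldconfig suggests that /usr/local/cuda/lib64 does not contain the 10.0 libraries, for example because the symbolic link still points at the 10.1 installation.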

 
