Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0...

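On Ubuntu 16.04 with CUDA 10.0 and cuDNN 7.6.5, training failed with the warning "Could not load dynamic library 'libcupti.so.10.0'". Checking LD_LIBRARY_PATH showed that none of the listed directories contains libcupti.so.10.0. The file is in /usr/local/cuda-10.0/extras/CUPTI/lib64/. Without root access, the fix is to add that directory to the environment variable for the current user only, then verify that the setting took effect. In every new server session, a specific command has to be run to activate the environment variable before deep learning training works normally.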
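
The check and the per-user fix described above can be sketched in Python. This is a minimal sketch, not the exact commands used in the post: the helper names `split_search_path`, `dirs_containing`, and `export_line` are hypothetical, and appending the printed line to `~/.bashrc` is an assumption about how the variable is made persistent for the current user.

```python
import os
from pathlib import Path

LIB_NAME = "libcupti.so.10.0"
# Directory named in the post; adjust it if CUDA is installed elsewhere.
CUPTI_DIR = "/usr/local/cuda-10.0/extras/CUPTI/lib64"


def split_search_path(value):
    """Split an LD_LIBRARY_PATH string, dropping empty entries and duplicates."""
    seen, dirs = set(), []
    for entry in value.split(":"):
        if entry and entry not in seen:
            seen.add(entry)
            dirs.append(entry)
    return dirs


def dirs_containing(lib_name, dirs):
    """Return the directories in `dirs` that contain a file named `lib_name`."""
    return [d for d in dirs if (Path(d) / lib_name).exists()]


def export_line(extra_dir, current):
    """Build a shell line that appends `extra_dir` to LD_LIBRARY_PATH."""
    dirs = split_search_path(current)
    if extra_dir not in dirs:
        dirs.append(extra_dir)
    return "export LD_LIBRARY_PATH=" + ":".join(dirs)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if dirs_containing(LIB_NAME, split_search_path(current)):
        print(LIB_NAME, "is already reachable through LD_LIBRARY_PATH")
    elif (Path(CUPTI_DIR) / LIB_NAME).exists():
        print("Add this line to ~/.bashrc, then open a new shell:")
        print(export_line(CUPTI_DIR, current))
    else:
        print(LIB_NAME, "was not found in", CUPTI_DIR)
```

Run with the same user account that launches training. Because the change only touches that user's environment, no root access is needed.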

Environment

  • Ubuntu: 16.04
  • CUDA: 10.0
  • CUDNN: 7.6.5
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
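
The `CUDNN_VERSION` macro above combines the three version fields into one integer. As a small worked example, the sketch below parses the same `#define` lines and evaluates the formula; the function name `cudnn_version` and the embedded header excerpt are illustrative only.

```python
import re

# Excerpt of the #define lines shown above.
HEADER_EXCERPT = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""


def cudnn_version(header_text):
    """Evaluate CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL."""
    fields = dict(
        re.findall(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", header_text)
    )
    major, minor, patch = (int(fields[k]) for k in ("MAJOR", "MINOR", "PATCHLEVEL"))
    return major * 1000 + minor * 100 + patch


print(cudnn_version(HEADER_EXCERPT))  # 7 * 1000 + 6 * 100 + 5 = 7605
```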

Run output:

resnet50 (Model)             (None, 2048)              23587712  
_________________________________________________________________
dense (Dense)                (None, 1024)              2098176   
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 2050      
=================================================================
Total params: 25,687,938
Trainable params: 25,634,818
Non-trainable params: 53,120
_________________________________________________________________
Train for 2 steps, validate for 1 steps
Epoch 1/50
2020-10-12 16:46:46.980388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 16:46:47.348629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 16:46:48.700629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720496: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node model/resnet50/conv1_conv/Conv2D}}]]
2020-10-12 16:46:48.733878: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2020-10-12 16:46:48.734055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2020-10-12 16:46:48.734080: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
1/2 [==============>...............] - ETA: 9s2020-10-12 16:46:48.735185: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 0 kernel records, 0 memcpy records.
2020-10-12 16:46:48.735327: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
Traceback (most recent call last):
  File "train.py", line 168, in <module>
    trainer.train()
  File "train.py", line 131, in train
    callbacks=callbacks)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, i
### Fixing TensorFlow's failure to load cudart64_110.dll

#### 1. Check TensorFlow and CUDA version compatibility

The problem is usually caused by a **mismatch between the TensorFlow version and the CUDA toolkit version**. For example:

- TensorFlow 2.4 requires CUDA 11.0 (later 2.x releases moved to CUDA 11.2)[^1][^4]
- TensorFlow 2.1 to 2.3 require CUDA 10.1[^3]
- TensorFlow 1.13 to 1.15 and 2.0 require CUDA 10.0[^2]

Confirm the installed TensorFlow version with:

```python
import tensorflow as tf
print(tf.__version__)
```

#### 2. Install the matching CUDA toolkit

Visit the [NVIDIA CUDA download page](https://developer.nvidia.com/cuda-toolkit-archive):

- For TensorFlow 2.4, download **CUDA 11.0**
- During installation, select `Visual Studio Integration` (VS2019 or later must be installed beforehand)

#### 3. Configure system environment variables

After installation, add these paths to the system environment variables:

```text
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\libnvvp
```

#### 4. Install the matching cuDNN version

Download it from the [NVIDIA cuDNN page](https://developer.nvidia.com/cudnn):

- CUDA 11.0 pairs with cuDNN 8.0.4
- Copy the extracted `bin`, `include`, and `lib` folders into the CUDA installation directory

#### 5. Verify GPU support

Run the following code to check whether a GPU is available:

```python
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
```

#### 6. Fallback options

If GPU acceleration is not needed:

- Install the CPU build of TensorFlow: `pip install tensorflow-cpu`
- Or ignore the warning (basic functionality is unaffected)[^3]
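
The version pairing in step 1 can also be written as a small lookup. This is a sketch under stated assumptions: the table `REQUIRED_CUDA` and the function `required_cuda` are hypothetical helpers, not part of TensorFlow, and the table covers only the release lines mentioned above.

```python
# CUDA toolkit version each TensorFlow GPU release line expects.
# Hypothetical helper table; it lists only the releases discussed here.
REQUIRED_CUDA = {
    (1, 13): "10.0", (1, 14): "10.0", (1, 15): "10.0", (2, 0): "10.0",
    (2, 1): "10.1", (2, 2): "10.1", (2, 3): "10.1",
    (2, 4): "11.0",
}


def required_cuda(tf_version):
    """Return the expected CUDA version for a string like '2.0.0', or None."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return REQUIRED_CUDA.get((major, minor))


for version in ("1.15.0", "2.0.0", "2.3.1", "2.4.0"):
    print(version, "->", required_cuda(version))
```

Pass in the string printed by `tf.__version__`. A result of `None` means the release is outside this table and the official tested build configurations should be consulted instead.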