Problem description
Pull the Python 3 image for tensorflow-gpu==1.13.1:
docker pull tensorflow/tensorflow:1.13.1-gpu-py3
Create and start a container from the pulled image:
docker run -it -v /xxx/xxx/xxx:/xxx/xxx --name tf-113 xxxxxxxxxxxx /bin/bash
Inside the container, pip confirms that tensorflow-gpu 1.13.1 is already installed:
pip list
However, importing tensorflow fails with:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
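A quick way to confirm the diagnosis before changing anything in the image (a minimal sketch, not part of the original post): try to load the exact library the error complains about. In a container started without --gpus, the load is expected to fail.

import ctypes

try:
    # driver library, normally injected into the container by the NVIDIA container runtime
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 is loadable: the NVIDIA driver is visible in this container")
except OSError as err:
    print("libcuda.so.1 is missing:", err)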
Problem analysis
The libcuda error means the container cannot see the NVIDIA driver. My first thought was that a matching cudatoolkit and cudnn had to be reinstalled, but the real cause is that the container was started without GPU access: libcuda.so.1 belongs to the host driver and is only mounted into the container by the NVIDIA container runtime, which is not engaged when docker run is invoked without a --gpus option.
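The split between "toolkit baked into the image" and "driver mounted from the host" can be checked directly. This is a rough sketch under the assumption that the image ships CUDA 10.0 and cuDNN 7 (the versions TensorFlow 1.13 was built against), so the library sonames below are assumptions:

import ctypes

# libcudart / libcudnn are part of the image; libcuda only appears with --gpus
for lib in ("libcudart.so.10.0", "libcudnn.so.7", "libcuda.so.1"):
    try:
        ctypes.CDLL(lib)
        print(lib, "-> loadable")
    except OSError:
        print(lib, "-> not found")

In a container started without --gpus, the first two should load while libcuda.so.1 does not, which points at the missing driver mount rather than a broken CUDA installation.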
Solution
Re-run the container with "--gpus all" added to the command:
docker run -it --gpus all -v /xxx/xxx/xxx:/xxx --name tf-113 xxxxxxxxxxxx /bin/bash
As shown below:
root@b8d6213d8bf7:/# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.13.1'
>>> tf.test.is_gpu_available()
2022-10-26 03:10:24.960562: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-10-26 03:10:25.268171: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55769f0 executing computations on platform CUDA. Devices:
2022-10-26 03:10:25.268237: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA RTX XXXXX, Compute Capability 8.6
2022-10-26 03:10:25.268255: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): NVIDIA RTX XXXXX, Compute Capability 8.6
2022-10-26 03:10:25.291696: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3400000000 Hz
2022-10-26 03:10:25.298380: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x58085e0 executing computations on platform Host. Devices:
2022-10-26 03:10:25.298429: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2022-10-26 03:10:25.298664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA RTX XXXXX major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:73:00.0
totalMemory: 23.68GiB freeMemory: 23.06GiB
2022-10-26 03:10:25.298733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: NVIDIA RTX XXXXX major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:a6:00.0
totalMemory: 23.69GiB freeMemory: 771.88MiB
2022-10-26 03:10:25.299380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2022-10-26 03:10:25.301950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-26 03:10:25.301983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2022-10-26 03:10:25.301998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y
2022-10-26 03:10:25.302009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N
2022-10-26 03:10:25.302189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 22432 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX XXXXX, pci bus id: 0000:73:00.0, compute capability: 8.6)
2022-10-26 03:10:25.302951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 546 MB memory) -> physical GPU (device: 1, name: NVIDIA RTX XXXXX, pci bus id: 0000:a6:00.0, compute capability: 8.6)
True
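As an optional follow-up (a sketch, not part of the original post): the log above shows GPU 1 has only ~772 MiB free, so it can be convenient to expose only GPU 0 to TensorFlow. CUDA_VISIBLE_DEVICES must be set before TensorFlow initializes CUDA, i.e. before the first import:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # make only physical GPU 0 visible to this process

import tensorflow as tf
print(tf.test.is_gpu_available())          # True when GPU 0 is usable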
