Fixing the Docker image error "ImportError: libcuda.so.1: cannot open shared object file: No such file or directory"

When running the Python 3 tensorflow-gpu==1.13.1 image, the error 'ImportError: libcuda.so.1: cannot open shared object file: No such file or directory' appears. The root cause is that the container has no access to the GPU driver. The fix is to add the '--gpus all' option when starting the container so it can access the GPU.

Problem description

Pull the Python 3 image for tensorflow-gpu==1.13.1:

docker pull tensorflow/tensorflow:1.13.1-gpu-py3

Start a container from the image:

docker run -it -v /xxx/xxx/xxx:/xxx/xxx --name tf-113 xxxxxxxxxxxx /bin/bash

Inside the container, pip confirms that tensorflow-gpu==1.13.1 is already installed:

pip list

But importing tensorflow raises an error:

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Problem analysis

The libcuda error means the container cannot see the GPU driver. At first I assumed the matching versions of cudatoolkit and cudnn had to be reinstalled, but then realized the container had simply been started without GPU access: libcuda.so.1 ships with the host's NVIDIA driver, not with the CUDA toolkit inside the image, so it only appears in the container when GPU access is enabled at startup.
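One way to confirm this (a minimal sketch, meant to be run inside the container's shell on a Linux system) is to ask the dynamic linker cache whether the driver library is visible at all:

```shell
# Check whether the host's driver library has been mounted into the container.
# Without --gpus, grep finds nothing and the fallback message is printed;
# with --gpus all, libcuda.so.1 shows up in the linker cache.
ldconfig -p | grep libcuda || echo "libcuda.so.1 not visible: container started without --gpus"
```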

Solution

Rerun the container with the "--gpus all" option added:

docker run -it --gpus all -v /xxx/xxx/xxx:/xxx --name tf-113 xxxxxxxxxxxx /bin/bash

As shown in the session below:

root@b8d6213d8bf7:/# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.13.1'
>>> tf.test.is_gpu_available()
2022-10-26 03:10:24.960562: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-10-26 03:10:25.268171: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55769f0 executing computations on platform CUDA. Devices:
2022-10-26 03:10:25.268237: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): NVIDIA RTX XXXXX, Compute Capability 8.6
2022-10-26 03:10:25.268255: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): NVIDIA RTX XXXXX, Compute Capability 8.6
2022-10-26 03:10:25.291696: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3400000000 Hz
2022-10-26 03:10:25.298380: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x58085e0 executing computations on platform Host. Devices:
2022-10-26 03:10:25.298429: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2022-10-26 03:10:25.298664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA RTX XXXXX major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:73:00.0
totalMemory: 23.68GiB freeMemory: 23.06GiB
2022-10-26 03:10:25.298733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: 
name: NVIDIA RTX XXXXX major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:a6:00.0
totalMemory: 23.69GiB freeMemory: 771.88MiB
2022-10-26 03:10:25.299380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2022-10-26 03:10:25.301950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-26 03:10:25.301983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1 
2022-10-26 03:10:25.301998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y 
2022-10-26 03:10:25.302009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N 
2022-10-26 03:10:25.302189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 22432 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX XXXXX, pci bus id: 0000:73:00.0, compute capability: 8.6)
2022-10-26 03:10:25.302951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 546 MB memory) -> physical GPU (device: 1, name: NVIDIA RTX XXXXX, pci bus id: 0000:a6:00.0, compute capability: 8.6)
True
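Besides the Python check above, GPU visibility can also be verified from the host with nvidia-smi. This is a hedged sketch: it assumes the container name `tf-113` from earlier and a host with the NVIDIA driver and container toolkit installed, and it guards for machines where docker is unavailable:

```shell
# Verify GPU access in the running container (guarded so the script
# degrades gracefully on machines without docker or without the container).
if command -v docker >/dev/null 2>&1; then
    docker exec tf-113 nvidia-smi || echo "tf-113 not running or GPU not visible"
else
    echo "docker not installed; skipping check"
fi
# Docker 19.03+ also accepts selectors instead of "all", e.g.:
#   docker run --rm --gpus device=0 tensorflow/tensorflow:1.13.1-gpu-py3 nvidia-smi
```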
