invalid argument "type=bind,source=/tmp/tfserving/serving/tensorflow_serving

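This post works through a common TensorFlow Serving Docker deployment problem that began with an error about libcuda.so not being found. After adjusting the docker run command (removing the backslash line continuations so that the mount specification is passed as a single argument), the TensorFlow Serving GPU container started successfully.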

Today I was testing the TensorFlow Serving Docker example code. At first it kept failing with an error saying that libcuda.so could not be found:

 libcuda reported version is: Not found: was unable to find libcuda.so
  DSO loaded into this program

I looked through many suggested solutions and found one that has to be run inside the Dockerfile (reference link):

run rm /usr/local/cuda/lib64/stubs/libcuda.so.1 fixed my problem

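That fix means modifying the Dockerfile, though, and I wanted to see whether the problem could be solved at its root. We had been using TF 1.9 GPU, so I switched to the latest Docker image and tested TensorFlow with the official TensorFlow Serving Docker example.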

The command I ran:

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,\ source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,\
  target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &

But it still failed, this time with a different error:

invalid argument "type=bind,source=/tmp/tfserving/serving/tensorflow_serving/
servables/tensorflow/testdata/saved_model_half_plus_two_cpu," 
for "--mount" flag: invalid field '' must be a key=value pair
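The empty field in this message is consistent with how the shell splits a command into words, which can be checked without Docker. The following is a minimal sketch under some assumptions: `show_args` is a hypothetical helper that prints the arguments it receives, and the mount value is shortened to two fields for readability. It only illustrates the word splitting, not Docker's own parsing.

```shell
#!/bin/sh
# Hypothetical helper: print each received argument on its own line,
# wrapped in brackets, instead of running docker.
show_args() { printf '[%s]\n' "$@"; }

# Indented continuation: the backslash-newline is removed, but the leading
# spaces on the next line still split the --mount value into two words.
# The first word ends in a comma, i.e. it has an empty trailing field.
broken=$(show_args --mount type=bind,\
  target=/models/half_plus_two)

# Continuation that starts at column 0: the value stays a single word.
fixed=$(show_args --mount type=bind,\
target=/models/half_plus_two)

echo "indented continuation:"
echo "$broken"
echo "continuation at column 0:"
echo "$fixed"
```

In the first case the `--mount` flag receives only `type=bind,`, and `target=...` becomes a separate argument. In the second case it receives the whole comma-separated list.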

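Following this answer, the cause turned out to be whitespace inside the command: the `--mount` value must be a single comma-separated list of key=value pairs, and the spaces around the line continuations break it apart. In the end I removed all the backslashes and put the whole command on one line, which worked: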

docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu

With that, the container started normally.
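If the one-line command is hard to read, one possible alternative is to build the mount specification in a variable and pass it quoted. This is a sketch, not the official example: the variable names `MODEL_DIR` and `MOUNT_SPEC` and the whitespace guard are assumptions added for illustration, and the final command is only printed with `echo` so the snippet can be tried without a GPU host.

```shell
#!/bin/sh
# Host directory of the example model (path taken from the command above).
MODEL_DIR=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu

# Assemble the comma-separated key=value list as one string.
MOUNT_SPEC="type=bind,source=${MODEL_DIR},target=/models/half_plus_two"

# Guard (an added assumption): stop early if the spec contains whitespace.
case "$MOUNT_SPEC" in
  *[[:space:]]*) echo "mount spec contains whitespace" >&2; exit 1 ;;
esac

# Line continuations between separate arguments are harmless; only a break
# inside the --mount value causes trouble. Quoting keeps the value one word.
# Remove the leading "echo" to actually start the container.
echo docker run --runtime=nvidia -p 8501:8501 \
  --mount "$MOUNT_SPEC" \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu
```

Keeping the value in a quoted variable means the continuation lines can be indented freely without changing what the `--mount` flag receives.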
