Installing tensorflow-gpu with Conda on Ubuntu 22.04

Environment

Same as the previous post:
OS: Ubuntu 22.04
GPU: A100

Installing tensorflow-gpu with Conda

According to the TensorFlow website, getting tensorflow-gpu working this way would require compiling it from source. That is far too much work, so we take the lazy route and install it through the conda package manager.

Create a virtual environment

conda create -n tensorflow python=3.8

Activate the virtual environment

conda activate tensorflow

Install tensorflow-gpu with conda

Do not install with pip

Important: do not run pip install --upgrade tensorflow-gpu==2.4.
At least in my case, the version it installed was broken: the import could not find libcudart.so.11.0.

>>> import tensorflow as tf
2023-05-22 21:52:56.234235: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-22 21:52:56.234290: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
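The missing-library message means the pip wheel was built against the CUDA 11 runtime, which the dynamic loader cannot find on this machine. As a diagnostic sketch (not part of the original steps), `ctypes.util.find_library` performs a similar lookup and shows which CUDA libraries are actually visible:

```python
import ctypes.util

# "Could not load dynamic library 'libcudart.so.11.0'" means the pip
# wheel expects the CUDA 11 runtime but the loader cannot locate it.
# find_library returns None when a library is not visible on this machine.
for name in ("cudart", "cublas", "cudnn"):
    print(name, "->", ctypes.util.find_library(name))
```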

Install with conda

conda install tensorflow-gpu
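Unlike the pip wheel, the conda package pulls matching CUDA runtime packages (cudatoolkit, cudnn) into the environment itself, which is why the import below finds libcudart without a system-wide CUDA install. A small sketch to confirm this from inside the env (the paths are illustrative):

```python
import glob
import os
import sys

# In an active conda env, sys.prefix points at the env root
# (e.g. .../anaconda3/envs/tensorflow). The CUDA runtime libraries
# installed by conda live under its lib/ directory.
env_lib = os.path.join(sys.prefix, "lib")
print(sorted(glob.glob(os.path.join(env_lib, "libcud*"))))
```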

After installation, verify. There is one error, shown below.

python
Python 3.8.16 (default, Mar  2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-05-22 22:04:51.350939: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
  np.object,
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 46, in <module>
    from tensorflow.python import data
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/experimental/__init__.py", line 96, in <module>
    from tensorflow.python.data.experimental import service
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/experimental/service/__init__.py", line 140, in <module>
    from tensorflow.python.data.experimental.ops.data_service_ops import distribute
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 25, in <module>
    from tensorflow.python.data.experimental.ops import compression_ops
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/experimental/ops/compression_ops.py", line 20, in <module>
    from tensorflow.python.data.util import structure
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/util/structure.py", line 26, in <module>
    from tensorflow.python.data.util import nest
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/data/util/nest.py", line 41, in <module>
    from tensorflow.python.framework import sparse_tensor as _sparse_tensor
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/framework/sparse_tensor.py", line 29, in <module>
    from tensorflow.python.framework import constant_op
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 29, in <module>
    from tensorflow.python.eager import execute
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 27, in <module>
    from tensorflow.python.framework import dtypes
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py", line 513, in <module>
    np.object,
  File "/home/yzhou/base/anaconda3/envs/tensorflow/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
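The traceback bottoms out at `np.object`, an alias that NumPy deprecated in 1.20 and removed in 1.24; TensorFlow 2.4's dtypes.py still references it. Modern code simply uses the builtin `object`, which every NumPy version accepts as a dtype. A minimal illustration:

```python
import numpy as np

# np.object was only ever an alias for the builtin `object`.
# Passing `object` directly works on every NumPy version, old or new.
arr = np.empty(3, dtype=object)
print(np.__version__, arr.dtype)  # the dtype prints as "object"
```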

This means the installed NumPy is too new. The install log shows the NumPy version is 1.24.3.

  numpy              pkgs/main/linux-64::numpy-1.24.3-py38hf6e8229_1
  numpy-base         pkgs/main/linux-64::numpy-base-1.24.3-py38h060ed82_1

Downgrading NumPy to 1.23.4 fixes this.
Important: do not downgrade with pip; do it with conda.
If you use pip, you will see the following errors.

$ pip install --upgrade numpy==1.23.4
Collecting numpy==1.23.4
  Downloading numpy-1.23.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 3.1 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires typing-extensions~=3.7.4, which is not installed.
tensorflow 2.4.1 requires absl-py~=0.10, but you have absl-py 1.3.0 which is incompatible.
tensorflow 2.4.1 requires flatbuffers~=1.12.0, but you have flatbuffers 2.0 which is incompatible.
tensorflow 2.4.1 requires gast==0.3.3, but you have gast 0.4.0 which is incompatible.
tensorflow 2.4.1 requires grpcio~=1.32.0, but you have grpcio 1.48.2 which is incompatible.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.23.4 which is incompatible.
tensorflow 2.4.1 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.4.1 requires tensorflow-estimator<2.5.0,>=2.4.0, but you have tensorflow-estimator 2.6.0 which is incompatible.
tensorflow 2.4.1 requires termcolor~=1.1.0, but you have termcolor 2.1.0 which is incompatible.
tensorflow 2.4.1 requires wrapt~=1.12.1, but you have wrapt 1.14.1 which is incompatible.
Successfully installed numpy-1.23.4
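Most of those complaints come from TensorFlow 2.4.1's compatible-release pins: numpy~=1.19.2 means >= 1.19.2 and < 1.20, so even 1.23.4 is outside the pin. A stdlib-only sketch of that check (real resolvers use the packaging library; the function name here is made up for illustration):

```python
def satisfies_compatible_release(installed: str, pin: str) -> bool:
    """Rough model of pip's "~=" (compatible release) operator:
    installed >= pin, and all but the last pin component must match."""
    v = tuple(int(part) for part in installed.split("."))
    p = tuple(int(part) for part in pin.split("."))
    return v >= p and v[: len(p) - 1] == p[:-1]

print(satisfies_compatible_release("1.19.5", "1.19.2"))  # True
print(satisfies_compatible_release("1.23.4", "1.19.2"))  # False: why pip complains
```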

To repeat: use the conda package manager.

$ conda install numpy==1.23.4

Verify the installation

$ python
Python 3.8.16 (default, Mar  2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
2023-05-22 22:24:42.577927: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-05-22 22:24:42.579631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:21:00.0 name: NVIDIA A100 80GB PCIe computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 79.19GiB deviceMemoryBandwidth: 1.76TiB/s
2023-05-22 22:24:42.579693: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-05-22 22:24:42.579757: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-05-22 22:24:42.579772: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2023-05-22 22:24:42.579784: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-05-22 22:24:42.579797: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-05-22 22:24:42.579810: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2023-05-22 22:24:42.579824: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2023-05-22 22:24:42.579845: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2023-05-22 22:24:42.581861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
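Beyond listing devices, a slightly stronger check is to run an actual op and confirm it was placed on the GPU. A sketch, assuming the conda env created above is active (guarded so it exits quietly where TensorFlow is not installed):

```python
try:
    import tensorflow as tf
except ImportError:  # not running inside the conda env created above
    tf = None

def matmul_device(n: int = 256) -> str:
    """Multiply two random n x n matrices and return the device used."""
    product = tf.matmul(tf.random.normal((n, n)), tf.random.normal((n, n)))
    return product.device

if tf is not None and tf.config.list_physical_devices("GPU"):
    print(matmul_device())  # the string should contain "GPU:0"
```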

With that, all the tools of the trade are in place.
