[Large Models · 8] Adapting vLLM to the RTX 5090 GPU - 2025.05.14 to 06.13

Option 1 ==> Verified

  • Verified: https://zhuanlan.zhihu.com/p/1902008703462406116 , succeeded on Ubuntu 22.04
  • Not verified: https://blog.youkuaiyun.com/liutian_ye/article/details/147069770
  1. Test platform: AutoDL, Ubuntu 22.04
  2. Preparing the environment
# Check the OS version: "22.04.1 LTS"; CUDA driver 570.124.04
>> cat /etc/os-release 

cuda-12.8
nvcc --version  # 12.8

g++ --version   # 13.3 (reference value); I kept the system default, 11.3.0
gcc --version   # 13.3 (reference value); I kept the system default, 11.3.0; "conda list | grep gcc" shows libgcc-ng=11.2.0
cmake --version # 3.28 (reference value); mine was 4.0.2; "conda list | grep cmake" shows 4.0.2
ninja --version # 1.11 (reference value); installed separately with "sudo apt-get install -y ninja-build"; "conda list | grep ninja" shows 1.11.1.4
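
Before installing anything, it is worth confirming that the driver actually sees the RTX 5090 and reports its compute capability (12.0, i.e. sm_120). A minimal check, assuming a driver new enough to expose the compute_cap query field:

>> nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
# Expect something like: NVIDIA GeForce RTX 5090, 570.124.04, 12.0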
  3. Installation steps
# Step 1: run the CUDA 12.8 installer without selecting "nvidia-fs"
>> ./cuda_12.8.0_570.86.10_linux.run --override

# Step 2: install Miniconda
>> ./Miniconda3-py311_25.3.1-1-Linux-x86_64.sh -b -p /root/autodl-tmp/miniconda3  # choose the conda install directory (must match the PATH below)
>> vim ~/.bashrc  # append: export PATH=/root/autodl-tmp/miniconda3/bin:$PATH
>> source ~/.bashrc

# Step 3: create the conda environment and install PyTorch built for CUDA 12.8
>> conda create --name vllm python=3.11 -y
>> conda activate vllm
>> pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
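
A quick sanity check that this torch build can actually target the 5090: the reported device capability should be (12, 0) and the compiled arch list should include sm_120 (a minimal check, not part of the original steps):

>> python -c "import torch; print(torch.__version__, torch.version.cuda)"
>> python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"
>> python -c "import torch; print(torch.cuda.get_arch_list())"  # should include 'sm_120'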

# Step 4: clone vLLM and check out v0.8.5
>> git clone https://github.com/vllm-project/vllm.git  # do not put your own launch scripts in the same path as the vllm project, or imports break (see "Built successfully but still got an error" below)
>> cd vllm
>> git fetch --all
>> git checkout v0.8.5

# Step 5: build vLLM against the already-installed torch
>> python use_existing_torch.py  # reuse the installed torch instead of pulling a new one
>> pip install -r requirements/build.txt
>> pip install -r requirements/common.txt
>> pip install --upgrade pip setuptools setuptools-scm  # older setuptools versions can break editable-mode installs

# The default MAX_JOBS here was 128; if your machine has fewer CPU cores the build can fail, so cap it
# After a successful build, do not delete the vllm checkout, or you will get "ModuleNotFoundError: No module named 'vllm'"
>> MAX_JOBS=8 SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5 pip install -e . --no-build-isolation -v  # "-v" for more logs, "-vv" for even more
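
Optional: compile time can be cut down by restricting the build to the 5090's architecture via TORCH_CUDA_ARCH_LIST (12.0 corresponds to sm_120). This was not part of the steps above; a hedged sketch, assuming vLLM's CMake honors the variable as described in its build documentation:

>> MAX_JOBS=8 TORCH_CUDA_ARCH_LIST="12.0" SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5 pip install -e . --no-build-isolation -v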

# Step 6: upgrade NCCL to fix multi-GPU parallelism
# Reference (the whole install also succeeds if you follow that link step by step): https://blog.youkuaiyun.com/qq_63455100/article/details/148613949?sharetype=blogdetail&sharerId=148613949&sharerefer=PC&sharesource=qq_63455100&spm=1011.2480.3001.8118
>> pip install -U nvidia-nccl-cu12  # upgrade NCCL after building vllm; no rebuild is needed afterwards
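
To confirm which NCCL torch actually loads after the upgrade (a minimal check, not from the original steps):

>> pip show nvidia-nccl-cu12 | grep Version
>> python -c "import torch; print(torch.cuda.nccl.version())"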

# Step 7: the first five steps installed fine, but I then hit "ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location)", so I installed lsmod and dkms and repeated step 5
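# The exact install commands were not recorded; on Ubuntu 22.04 this roughly corresponds
# to the following (lsmod ships in the kmod package; a hedged sketch):
>> sudo apt-get update
>> sudo apt-get install -y kmod dkms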
  4. Built successfully but still got an error: step 5 above should already have completed the install; I probably overlooked the notice below, which led me to go on to the later steps.
/root/autodl-tmp/miniconda3/envs/vllm/lib/python3.11/site-packages/setuptools/command/editable_wheel.py:351: InformationOnly: Editable installation.
  !!

          ********************************************************************************
          Please be careful with folders in your working directory with the same
          name as your package as they may take precedence during imports.
          ********************************************************************************

  !!
    with strategy, WheelFile(wheel_path, "w") as wheel_obj:
  Building editable for vllm (pyproject.toml) ... done
  Created wheel for vllm: filename=vllm-0.8.5+cu128-0.editable-cp311-cp311-linux_x86_64.whl size=12865 sha256=2028bb7922040ce6d9b4028d586236391302223b9fb27cd701326b646d775ec2
  Stored in directory: /tmp/pip-ephem-wheel-cache-bp4c02jv/wheels/55/19/4d/0ba1030ee37cc8a147ca29552d435863f639435c056b9aa440
Successfully built vllm
Installing collected packages: vllm
  changing mode of /root/autodl-tmp/miniconda3/envs/vllm/bin/vllm to 755
Successfully installed vllm-0.8.5+cu128

Explanation

  • Meaning: this is the standard notice setuptools prints for editable-mode (development-mode) installs. An editable install lets the package be imported straight from the source directory (no reinstall needed), but if the working directory contains a folder with the same name as the package, imports can clash (for example when both the project root and the package are called vllm).
  • Recommendation:
    If you just use vllm normally (rather than developing against its source), this notice can be ignored. If you hit import problems during later development, check the project layout.
  • Example: after "git clone vllm" the project path was "/root/vllm", and my Qwen3 launch script was also under "/root", which produced ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location) at runtime. Fix: put the launch script in a different path from the vllm project (see the check below).
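
A quick way to see which vllm is actually being imported from a given working directory (a minimal check; /path/to/your/script is a placeholder):

>> cd /path/to/your/script && python -c "import vllm; print(vllm.__file__)"
# If the printed path points at a bare source folder in the current directory rather than
# the package registered by "pip install -e .", that folder is shadowing the install;
# run the script from elsewhere or rename the folder.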
  5. Pitfalls
  • Log message 1:
/root/autodl-tmp/vllm/csrc/moe/marlin_moe_wna16/kernel.h(42): warning #20281-D: in whole program compilation mode ("-rdc=false"), a __global__ function template instantiation or specialization ("marlin_moe_wna16::Marlin<    ::__nv_bfloat16, (long)1125899906843648l, (int)128, (int)4, (int)8, (int)4, (bool)0, (int)4, (bool)0, (bool)1, (int)8, (bool)0> ") will be required to have a definition in the current translation unit, when "-static-global-template-stub" will be set to "true" by default in the future. To resolve this issue, either use "-rdc=true", or explicitly set "-static-global-template-stub=false" (but see nvcc documentation about downsides of turning it off)
  • Handling: safe to ignore. If instead you run export CUDAFLAGS="-rdc=true -lcudadevrt" and then pip install, the build fails with:
Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - broken
CMake Error at /root/autodl-tmp/miniconda3/envs/vllm/lib/python3.11/site-packages/cmake/data/share/cmake-4.0/Modules/CMakeTestCUDACompiler.cmake:59 (message):
The CUDA compiler

  "/usr/local/cuda/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

  Change Dir: '/tmp/tmp2v82_f1s.build-temp/CMakeFiles/CMakeScratch/TryCompile-rE8LAN'

  Run Build Command(s): /root/autodl-tmp/miniconda3/envs/vllm/bin/ninja -v cmTC_835b4
  [1/2] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler   -rdc=true -lcudadevrt  "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -MD -MT CMakeFiles/cmTC_835b4.dir/main.cu.o -MF CMakeFiles/cmTC_835b4.dir/main.cu.o.d -x cu -c /tmp/tmp2v82_f1s.build-temp/CMakeFiles/CMakeScratch/TryCompile-rE8LAN/main.cu -o CMakeFiles/cmTC_835b4.dir/main.cu.o
  nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
  [2/2] : && /usr/bin/g++  CMakeFiles/cmTC_835b4.dir/main.cu.o -o cmTC_835b4  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/usr/local/cuda/targets/x86_64-linux/lib/stubs" -L"/usr/local/cuda/targets/x86_64-linux/lib" && :
  FAILED: cmTC_835b4
  : && /usr/bin/g++  CMakeFiles/cmTC_835b4.dir/main.cu.o -o cmTC_835b4  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/usr/local/cuda/targets/x86_64-linux/lib/stubs" -L"/usr/local/cuda/targets/x86_64-linux/lib" && :
  /usr/bin/ld: CMakeFiles/cmTC_835b4.dir/main.cu.o: in function `__sti____cudaRegisterAll()':
  tmpxft_00007a42_00000000-6_main.cudafe1.cpp:(.text+0x127): undefined reference to `__cudaRegisterLinkedBinary_e73a7380_7_main_cu_main'
  collect2: error: ld returned 1 exit status
  ninja: build stopped: subcommand failed.
  • Pitfall 2: because of the ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location) message I also ran the commands below, but all of these packages were already present in the environment, so they made no difference.
apt-get update && apt-get install -y build-essential python3-dev
# Install Python build dependencies
pip install setuptools wheel
  6. My conda environment
# vllm.yml: export it with "conda env export -n vllm > vllm.yml" and re-create the env with "conda env create -f vllm.yml"; "conda pack -n vllm -o env.tar.gz" (vllm = conda env name) packages the whole environment
name: vllm
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h5eee18b_6
  - ca-certificates=2025.2.25=h06a4308_0
  - ld_impl_linux-64=2.40=h12ee557_0
  - libffi=3.4.4=h6a678d5_1
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.16=h5eee18b_0
  - python=3.11.11=he870216_0
  - readline=8.2=h5eee18b_0
  - sqlite=3.45.3=h5eee18b_0
  - tk=8.6.14=h39e8969_0
  - tzdata=2025b=h04d1e81_0
  - wheel=0.45.1=py311h06a4308_0
  - xz=5.6.4=h5eee18b_1
  - zlib=1.2.13=h5eee18b_1
  - pip:
      - aiohappyeyeballs==2.6.1
      - aiohttp==3.11.18
      - aiosignal==1.3.2
      - airportsdata==20250224
      - annotated-types==0.7.0
      - anyio==4.9.0
      - astor==0.8.1
      - attrs==25.3.0
      - blake3==1.0.4
      - cachetools==5.5.2
      - certifi==2025.4.26
      - charset-normalizer==3.4.2
      - click==8.2.0
      - cloudpickle==3.1.1
      - cmake==4.0.2
      - compressed-tensors==0.9.3
      - cupy-cuda12x==13.4.1
      - deprecated==1.2.18
      - depyf==0.18.0
      - dill==0.4.0
      - diskcache==5.6.3
      - distro==1.9.0
      - dnspython==2.7.0
      - einops==0.8.1
      - email-validator==2.2.0
      - fastapi==0.115.12
      - fastapi-cli==0.0.7
      - fastrlock==0.8.3
      - filelock==3.18.0
      - frozenlist==1.6.0
      - fsspec==2024.6.1
      - gguf==0.16.3
      - googleapis-common-protos==1.70.0
      - grpcio==1.72.0
      - h11==0.16.0
      - hf-xet==1.1.1
      - httpcore==1.0.9
      - httptools==0.6.4
      - httpx==0.28.1
      - huggingface-hub==0.31.2
      - idna==3.10
      - importlib-metadata==8.0.0
      - interegular==0.3.3
      - jinja2==3.1.6
      - jiter==0.9.0
      - jsonschema==4.23.0
      - jsonschema-specifications==2025.4.1
      - lark==1.2.2
      - llguidance==0.7.19
      - llvmlite==0.44.0
      - lm-format-enforcer==0.10.11
      - markdown-it-py==3.0.0
      - markupsafe==2.1.5
      - mdurl==0.1.2
      - mistral-common==1.5.4
      - mpmath==1.3.0
      - msgpack==1.1.0
      - msgspec==0.19.0
      - multidict==6.4.3
      - nest-asyncio==1.6.0
      - networkx==3.3
      - ninja==1.11.1.4
      - numba==0.61.2
      - numpy==2.1.2
      - nvidia-cublas-cu12==12.8.3.14
      - nvidia-cuda-cupti-cu12==12.8.57
      - nvidia-cuda-nvrtc-cu12==12.8.61
      - nvidia-cuda-runtime-cu12==12.8.57
      - nvidia-cudnn-cu12==9.7.1.26
      - nvidia-cufft-cu12==11.3.3.41
      - nvidia-cufile-cu12==1.13.0.11
      - nvidia-curand-cu12==10.3.9.55
      - nvidia-cusolver-cu12==11.7.2.55
      - nvidia-cusparse-cu12==12.5.7.53
      - nvidia-cusparselt-cu12==0.6.3
      - nvidia-nccl-cu12==2.26.2
      - nvidia-nvjitlink-cu12==12.8.61
      - nvidia-nvtx-cu12==12.8.55
      - openai==1.78.1
      - opencv-python-headless==4.11.0.86
      - opentelemetry-api==1.26.0
      - opentelemetry-exporter-otlp==1.26.0
      - opentelemetry-exporter-otlp-proto-common==1.26.0
      - opentelemetry-exporter-otlp-proto-grpc==1.26.0
      - opentelemetry-exporter-otlp-proto-http==1.26.0
      - opentelemetry-proto==1.26.0
      - opentelemetry-sdk==1.26.0
      - opentelemetry-semantic-conventions==0.47b0
      - opentelemetry-semantic-conventions-ai==0.4.8
      - outlines==0.1.11
      - outlines-core==0.1.26
      - packaging==25.0
      - partial-json-parser==0.2.1.1.post5
      - pillow==11.0.0
      - pip==25.1.1
      - prometheus-client==0.21.1
      - prometheus-fastapi-instrumentator==7.1.0
      - propcache==0.3.1
      - protobuf==4.25.7
      - psutil==7.0.0
      - py-cpuinfo==9.0.0
      - pycountry==24.6.1
      - pydantic==2.11.4
      - pydantic-core==2.33.2
      - pygments==2.19.1
      - python-dotenv==1.1.0
      - python-json-logger==3.3.0
      - python-multipart==0.0.20
      - pyyaml==6.0.2
      - pyzmq==26.4.0
      - ray==2.46.0
      - referencing==0.36.2
      - regex==2024.11.6
      - requests==2.32.3
      - rich==14.0.0
      - rich-toolkit==0.14.6
      - rpds-py==0.24.0
      - safetensors==0.5.3
      - scipy==1.15.3
      - sentencepiece==0.2.0
      - setuptools==80.4.0
      - setuptools-scm==8.3.1
      - shellingham==1.5.4
      - sniffio==1.3.1
      - starlette==0.46.2
      - sympy==1.13.3
      - tiktoken==0.9.0
      - tokenizers==0.21.1
      - torch==2.7.0+cu128
      - torchaudio==2.7.0+cu128
      - torchvision==0.22.0+cu128
      - tqdm==4.67.1
      - transformers==4.51.3
      - triton==3.3.0
      - typer==0.15.3
      - typing-extensions==4.12.2
      - typing-inspection==0.4.0
      - urllib3==2.4.0
      - uvicorn==0.34.2
      - uvloop==0.21.0
      - vllm==0.8.5+cu128
      - watchfiles==1.0.5
      - websockets==15.0.1
      - wrapt==1.17.2
      - xgrammar==0.1.18
      - yarl==1.20.0
      - zipp==3.21.0
prefix: /root/autodl-tmp/miniconda3/envs/vllm

  7. Installing specific versions of gcc, cmake, etc.
  • Installing GCC/G++ 13.3
# Install the software-properties-common package
sudo apt update
sudo apt install software-properties-common

# Add the PPA repository
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update

# Install GCC 13
sudo apt install gcc-13 g++-13

# Set it as the default version
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130 --slave /usr/bin/g++ g++ /usr/bin/g++-13 --slave /usr/bin/gcov gcov /usr/bin/gcov-13

# Verify the versions
gcc --version
g++ --version
  • Installing CMake 3.28
# Install dependencies
sudo apt-get install -y wget libssl-dev

# Download and build CMake
wget https://github.com/Kitware/CMake/releases/download/v3.28.0/cmake-3.28.0.tar.gz
tar -zxvf cmake-3.28.0.tar.gz
cd cmake-3.28.0
./bootstrap --prefix=/usr/local
make -j$(nproc)
sudo make install

# Verify the version
cmake --version
  • Installing Ninja 1.11
# Install dependencies
sudo apt-get install -y wget

# Download and build Ninja
wget https://github.com/ninja-build/ninja/archive/v1.11.1.tar.gz
tar -zxvf v1.11.1.tar.gz
cd ninja-1.11.1
./configure.py --bootstrap

# Install to a system path
sudo cp ninja /usr/local/bin/
sudo chmod +x /usr/local/bin/ninja

# Verify the version
ninja --version
  8. Other tips
# Uninstall all pip packages
>> pip freeze > req.txt
>> pip uninstall -r req.txt -y
>> sudo apt-get install -y build-essential gcc-13 g++-13 ninja-build  # cmake 3.28 is built from source as in the previous section

# Temporarily use a pip mirror
>> pip install -i https://mirrors.huaweicloud.com/repository/pypi/simple/ <package>
>> pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package>

# Git mirror inside China
https://gitee.com/

# Miniconda downloads
https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/?C=M&O=D