[Large Models · 8] Adapting vLLM to the RTX 5090 GPU - 2025.05.14 to 06.13

Option 1 ==> Verified

  • Verified: https://zhuanlan.zhihu.com/p/1902008703462406116 , succeeded on Ubuntu 22.04
  • Not verified: https://blog.youkuaiyun.com/liutian_ye/article/details/147069770
  1. Test platform: AutoDL, Ubuntu 22.04
  2. Preparing the environment
# Check the OS version: "22.04.1 LTS"; CUDA driver 570.124.04
>> cat /etc/os-release 

cuda-12.8
nvcc --version  # 12.8

g++ --version   # 13.3 (reference value); I kept the system default, 11.3.0
gcc --version   # 13.3 (reference value); I kept the system default, 11.3.0; "conda list | grep gcc" shows libgcc-ng=11.2.0
cmake --version # 3.28 (reference value); mine was 4.0.2; "conda list | grep cmake" shows 4.0.2
ninja --version # 1.11 (reference value); installed separately with "sudo apt-get install -y ninja-build"; "conda list | grep ninja" shows 1.11.1.4
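
Before installing anything, it is worth confirming that the driver actually sees the RTX 5090 and reports its compute capability (12.0, i.e. sm_120). A minimal check, assuming a driver new enough to expose the compute_cap query field:

>> nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
# Expect something like: NVIDIA GeForce RTX 5090, 570.124.04, 12.0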
  3. Installation steps
# Step 1: run the CUDA 12.8 installer without selecting "nvidia-fs"
>> ./cuda_12.8.0_570.86.10_linux.run --override

# Step 2: install Miniconda
>> ./Miniconda3-py311_25.3.1-1-Linux-x86_64.sh -b -p /root/autodl-tmp/miniconda3  # choose the conda install directory (must match the PATH below)
>> vim ~/.bashrc  # append: export PATH=/root/autodl-tmp/miniconda3/bin:$PATH
>> source ~/.bashrc

# Step 3: create the conda environment and install PyTorch built for CUDA 12.8
>> conda create --name vllm python=3.11 -y
>> conda activate vllm
>> pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
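
A quick sanity check that this torch build can actually target the 5090: the reported device capability should be (12, 0) and the compiled arch list should include sm_120 (a minimal check, not part of the original steps):

>> python -c "import torch; print(torch.__version__, torch.version.cuda)"
>> python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"
>> python -c "import torch; print(torch.cuda.get_arch_list())"  # should include 'sm_120'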

# Step 4: clone vLLM and check out v0.8.5
>> git clone https://github.com/vllm-project/vllm.git  # do not put your own launch scripts in the same path as the vllm project, or imports break (see "Built successfully but still got an error" below)
>> cd vllm
>> git fetch --all
>> git checkout v0.8.5

# Step 5: build vLLM against the already-installed torch
>> python use_existing_torch.py  # reuse the installed torch instead of pulling a new one
>> pip install -r requirements/build.txt
>> pip install -r requirements/common.txt
>> pip install --upgrade pip setuptools setuptools-scm  # older setuptools versions can break editable-mode installs

# The default MAX_JOBS here was 128; if your machine has fewer CPU cores the build can fail, so cap it
# After a successful build, do not delete the vllm checkout, or you will get "ModuleNotFoundError: No module named 'vllm'"
>> MAX_JOBS=8 SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5 pip install -e . --no-build-isolation -v  # "-v" for more logs, "-vv" for even more
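
Optional: compile time can be cut down by restricting the build to the 5090's architecture via TORCH_CUDA_ARCH_LIST (12.0 corresponds to sm_120). This was not part of the steps above; a hedged sketch, assuming vLLM's CMake honors the variable as described in its build documentation:

>> MAX_JOBS=8 TORCH_CUDA_ARCH_LIST="12.0" SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5 pip install -e . --no-build-isolation -v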

# Step 6: upgrade NCCL to fix multi-GPU parallelism
# Reference (the whole install also succeeds if you follow that link step by step): https://blog.youkuaiyun.com/qq_63455100/article/details/148613949?sharetype=blogdetail&sharerId=148613949&sharerefer=PC&sharesource=qq_63455100&spm=1011.2480.3001.8118
>> pip install -U nvidia-nccl-cu12  # upgrade NCCL after building vllm; no rebuild is needed afterwards
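
To confirm which NCCL torch actually loads after the upgrade (a minimal check, not from the original steps):

>> pip show nvidia-nccl-cu12 | grep Version
>> python -c "import torch; print(torch.cuda.nccl.version())"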

# Step 7: the first five steps installed fine, but I then hit "ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location)", so I installed lsmod and dkms and repeated step 5
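# The exact install commands were not recorded; on Ubuntu 22.04 this roughly corresponds
# to the following (lsmod ships in the kmod package; a hedged sketch):
>> sudo apt-get update
>> sudo apt-get install -y kmod dkms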
  4. Built successfully but still got an error: step 5 above should already have completed the install; I probably overlooked the notice below, which led me to go on to the later steps.
/root/autodl-tmp/miniconda3/envs/vllm/lib/python3.11/site-packages/setuptools/command/editable_wheel.py:351: InformationOnly: Editable installation.
  !!

          ********************************************************************************
          Please be careful with folders in your working directory with the same
          name as your package as they may take precedence during imports.
          ********************************************************************************

  !!
    with strategy, WheelFile(wheel_path, "w") as wheel_obj:
  Building editable for vllm (pyproject.toml) ... done
  Created wheel for vllm: filename=vllm-0.8.5+cu128-0.editable-cp311-cp311-linux_x86_64.whl size=12865 sha256=2028bb7922040ce6d9b4028d586236391302223b9fb27cd701326b646d775ec2
  Stored in directory: /tmp/pip-ephem-wheel-cache-bp4c02jv/wheels/55/19/4d/0ba1030ee37cc8a147ca29552d435863f639435c056b9aa440
Successfully built vllm
Installing collected packages: vllm
  changing mode of /root/autodl-tmp/miniconda3/envs/vllm/bin/vllm to 755
Successfully installed vllm-0.8.5+cu128

Explanation

  • Meaning: this is the standard notice setuptools prints for editable-mode (development-mode) installs. An editable install lets the package be imported straight from the source directory (no reinstall needed), but if the working directory contains a folder with the same name as the package, imports can clash (for example when both the project root and the package are called vllm).
  • Recommendation:
    If you just use vllm normally (rather than developing against its source), this notice can be ignored. If you hit import problems during later development, check the project layout.
  • Example: after "git clone vllm" the project path was "/root/vllm", and my Qwen3 launch script was also under "/root", which produced ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location) at runtime. Fix: put the launch script in a different path from the vllm project (see the check below).
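
A quick way to see which vllm is actually being imported from a given working directory (a minimal check; /path/to/your/script is a placeholder):

>> cd /path/to/your/script && python -c "import vllm; print(vllm.__file__)"
# If the printed path points at a bare source folder in the current directory rather than
# the package registered by "pip install -e .", that folder is shadowing the install;
# run the script from elsewhere or rename the folder.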
  5. Pitfalls
  • Log message 1:
/root/autodl-tmp/vllm/csrc/moe/marlin_moe_wna16/kernel.h(42): warning #20281-D: in whole program compilation mode ("-rdc=false"), a __global__ function template instantiation or specialization ("marlin_moe_wna16::Marlin<    ::__nv_bfloat16, (long)1125899906843648l, (int)128, (int)4, (int)8, (int)4, (bool)0, (int)4, (bool)0, (bool)1, (int)8, (bool)0> ") will be required to have a definition in the current translation unit, when "-static-global-template-stub" will be set to "true" by default in the future. To resolve this issue, either use "-rdc=true", or explicitly set "-static-global-template-stub=false" (but see nvcc documentation about downsides of turning it off)
  • Handling: safe to ignore. If instead you run export CUDAFLAGS="-rdc=true -lcudadevrt" and then pip install, the build fails with:
Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - broken
CMake Error at /root/autodl-tmp/miniconda3/envs/vllm/lib/python3.11/site-packages/cmake/data/share/cmake-4.0/Modules/CMakeTestCUDACompiler.cmake:59 (message):
The CUDA compiler

  "/usr/local/cuda/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

  Change Dir: '/tmp/tmp2v82_f1s.build-temp/CMakeFiles/CMakeScratch/TryCompile-rE8LAN'

  Run Build Command(s): /root/autodl-tmp/miniconda3/envs/vllm/bin/ninja -v cmTC_835b4
  [1/2] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler   -rdc=true -lcudadevrt  "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -MD -MT CMakeFiles/cmTC_835b4.dir/main.cu.o -MF CMakeFiles/cmTC_835b4.dir/main.cu.o.d -x cu -c /tmp/tmp2v82_f1s.build-temp/CMakeFiles/CMakeScratch/TryCompile-rE8LAN/main.cu -o CMakeFiles/cmTC_835b4.dir/main.cu.o
  nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
  [2/2] : && /usr/bin/g++  CMakeFiles/cmTC_835b4.dir/main.cu.o -o cmTC_835b4  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/usr/local/cuda/targets/x86_64-linux/lib/stubs" -L"/usr/local/cuda/targets/x86_64-linux/lib" && :
  FAILED: cmTC_835b4
  : && /usr/bin/g++  CMakeFiles/cmTC_835b4.dir/main.cu.o -o cmTC_835b4  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/usr/local/cuda/targets/x86_64-linux/lib/stubs" -L"/usr/local/cuda/targets/x86_64-linux/lib" && :
  /usr/bin/ld: CMakeFiles/cmTC_835b4.dir/main.cu.o: in function `__sti____cudaRegisterAll()':
  tmpxft_00007a42_00000000-6_main.cudafe1.cpp:(.text+0x127): undefined reference to `__cudaRegisterLinkedBinary_e73a7380_7_main_cu_main'
  collect2: error: ld returned 1 exit status
  ninja: build stopped: subcommand failed.
  • Pitfall 2: because of the ImportError: cannot import name 'PoolingParams' from 'vllm' (unknown location) message I also ran the commands below, but all of these packages were already present in the environment, so they made no difference.
apt-get update && apt-get install -y build-essential python3-dev
# Install Python build dependencies
pip install setuptools wheel
  6. My conda environment
# vllm.yml: export it with "conda env export -n vllm > vllm.yml" and re-create the env with "conda env create -f vllm.yml"; "conda pack -n vllm -o env.tar.gz" (vllm = conda env name) packages the whole environment
name: vllm
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h5eee18b_6
  - ca-certificates=2025.2.25=h06a4308_0
  - ld_impl_linux-64=2.40=h12ee557_0
  - libffi=3.4.4=h6a678d5_1
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.16=h5eee18b_0
  - python=3.11.11=he870216_0
  - readline=8.2=h5eee18b_0
  - sqlite=3.45.3=h5eee18b_0
  - tk=8.6.14=h39e8969_0
  - tzdata=2025b=h04d1e81_0
  - wheel=0.45.1=py311h06a4308_0
  - xz=5.6.4=h5eee18b_1
  - zlib=1.2.13=h5eee18b_1
  - pip:
      - aiohappyeyeballs==2.6.1
      - aiohttp==3.11.18
      - aiosignal==1.3.2
      - airportsdata==20250224
      - annotated-types==0.7.0
      - anyio==4.9.0
      - astor==0.8.1
      - attrs==25.3.0
      - blake3==1.0.4
      - cachetools==5.5.2
      - certifi==2025.4.26
      - charset-normalizer==3.4.2
      - click==8.2.0
      - cloudpickle==3.1.1
      - cmake==4.0.2
      - compressed-tensors==0.9.3
      - cupy-cuda12x==13.4.1
      - deprecated==1.2.18
      - depyf==0.18.0
      - dill==0.4.0
      - diskcache==5.6.3
      - distro==1.9.0
      - dnspython==2.7.0
      - einops==0.8.1
      - email-validator==2.2.0
      - fastapi==0.115.12
      - fastapi-cli==0.0.7
      - fastrlock==0.8.3
      - filelock==3.18.0
      - frozenlist==1.6.0
      - fsspec==2024.6.1
      - gguf==0.16.3
      - googleapis-common-protos==1.70.0
      - grpcio==1.72.0
      - h11==0.16.0
      - hf-xet==1.1.1
      - httpcore==1.0.9
      - httptools==0.6.4
      - httpx==0.28.1
      - huggingface-hub==0.31.2
      - idna==3.10
      - importlib-metadata==8.0.0
      - interegular==0.3.3
      - jinja2==3.1.6
      - jiter==0.9.0
      - jsonschema==4.23.0
      - jsonschema-specifications==2025.4.1
      - lark==1.2.2
      - llguidance==0.7.19
      - llvmlite==0.44.0
      - lm-format-enforcer==0.10.11
      - markdown-it-py==3.0.0
      - markupsafe==2.1.5
      - mdurl==0.1.2
      - mistral-common==1.5.4
      - mpmath==1.3.0
      - msgpack==1.1.0
      - msgspec==0.19.0
      - multidict==6.4.3
      - nest-asyncio==1.6.0
      - networkx==3.3
      - ninja==1.11.1.4
      - numba==0.61.2
      - numpy==2.1.2
      - nvidia-cublas-cu12==12.8.3.14
      - nvidia-cuda-cupti-cu12==12.8.57
      - nvidia-cuda-nvrtc-cu12==12.8.61
      - nvidia-cuda-runtime-cu12==12.8.57
      - nvidia-cudnn-cu12==9.7.1.26
      - nvidia-cufft-cu12==11.3.3.41
      - nvidia-cufile-cu12==1.13.0.11
      - nvidia-curand-cu12==10.3.9.55
      - nvidia-cusolver-cu12==11.7.2.55
      - nvidia-cusparse-cu12==12.5.7.53
      - nvidia-cusparselt-cu12==0.6.3
      - nvidia-nccl-cu12==2.26.2
      - nvidia-nvjitlink-cu12==12.8.61
      - nvidia-nvtx-cu12==12.8.55
      - openai==1.78.1
      - opencv-python-headless==4.11.0.86
      - opentelemetry-api==1.26.0
      - opentelemetry-exporter-otlp==1.26.0
      - opentelemetry-exporter-otlp-proto-common==1.26.0
      - opentelemetry-exporter-otlp-proto-grpc==1.26.0
      - opentelemetry-exporter-otlp-proto-http==1.26.0
      - opentelemetry-proto==1.26.0
      - opentelemetry-sdk==1.26.0
      - opentelemetry-semantic-conventions==0.47b0
      - opentelemetry-semantic-conventions-ai==0.4.8
      - outlines==0.1.11
      - outlines-core==0.1.26
      - packaging==25.0
      - partial-json-parser==0.2.1.1.post5
      - pillow==11.0.0
      - pip==25.1.1
      - prometheus-client==0.21.1
      - prometheus-fastapi-instrumentator==7.1.0
      - propcache==0.3.1
      - protobuf==4.25.7
      - psutil==7.0.0
      - py-cpuinfo==9.0.0
      - pycountry==24.6.1
      - pydantic==2.11.4
      - pydantic-core==2.33.2
      - pygments==2.19.1
      - python-dotenv==1.1.0
      - python-json-logger==3.3.0
      - python-multipart==0.0.20
      - pyyaml==6.0.2
      - pyzmq==26.4.0
      - ray==2.46.0
      - referencing==0.36.2
      - regex==2024.11.6
      - requests==2.32.3
      - rich==14.0.0
      - rich-toolkit==0.14.6
      - rpds-py==0.24.0
      - safetensors==0.5.3
      - scipy==1.15.3
      - sentencepiece==0.2.0
      - setuptools==80.4.0
      - setuptools-scm==8.3.1
      - shellingham==1.5.4
      - sniffio==1.3.1
      - starlette==0.46.2
      - sympy==1.13.3
      - tiktoken==0.9.0
      - tokenizers==0.21.1
      - torch==2.7.0+cu128
      - torchaudio==2.7.0+cu128
      - torchvision==0.22.0+cu128
      - tqdm==4.67.1
      - transformers==4.51.3
      - triton==3.3.0
      - typer==0.15.3
      - typing-extensions==4.12.2
      - typing-inspection==0.4.0
      - urllib3==2.4.0
      - uvicorn==0.34.2
      - uvloop==0.21.0
      - vllm==0.8.5+cu128
      - watchfiles==1.0.5
      - websockets==15.0.1
      - wrapt==1.17.2
      - xgrammar==0.1.18
      - yarl==1.20.0
      - zipp==3.21.0
prefix: /root/autodl-tmp/miniconda3/envs/vllm

  7. Installing specific versions of gcc, cmake, etc.
  • Installing GCC/G++ 13.3
# Install the software-properties-common package
sudo apt update
sudo apt install software-properties-common

# Add the PPA repository
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update

# Install GCC 13
sudo apt install gcc-13 g++-13

# Set it as the default version
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130 --slave /usr/bin/g++ g++ /usr/bin/g++-13 --slave /usr/bin/gcov gcov /usr/bin/gcov-13

# Verify the versions
gcc --version
g++ --version
  • Installing CMake 3.28
# Install dependencies
sudo apt-get install -y wget libssl-dev

# Download and build CMake
wget https://github.com/Kitware/CMake/releases/download/v3.28.0/cmake-3.28.0.tar.gz
tar -zxvf cmake-3.28.0.tar.gz
cd cmake-3.28.0
./bootstrap --prefix=/usr/local
make -j$(nproc)
sudo make install

# Verify the version
cmake --version
  • Installing Ninja 1.11
# Install dependencies
sudo apt-get install -y wget

# Download and build Ninja
wget https://github.com/ninja-build/ninja/archive/v1.11.1.tar.gz
tar -zxvf v1.11.1.tar.gz
cd ninja-1.11.1
./configure.py --bootstrap

# Install to a system path
sudo cp ninja /usr/local/bin/
sudo chmod +x /usr/local/bin/ninja

# Verify the version
ninja --version
  8. Other tips
# Uninstall all pip packages
>> pip freeze > req.txt
>> pip uninstall -r req.txt -y
>> sudo apt-get install -y build-essential gcc-13 g++-13 ninja-build  # cmake 3.28 is built from source as in the previous section

# Temporarily use a pip mirror
>> pip install -i https://mirrors.huaweicloud.com/repository/pypi/simple/ <package>
>> pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package>

# Git mirror inside China
https://gitee.com/

# Miniconda downloads
https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/?C=M&O=D