Full environment setup walkthrough
Access the server through the MobaXterm terminal tool.
Server login:
ssh sp.hengyue.yuan@console.uat.enflame.cc -p 31081
To copy files from the server to Windows (use forward slashes in the Windows path):
scp <Linux username>@<server IP>:<remote file path> <Windows destination path (use /)>
1. Download the model and the installation package to the host
Model:
sudo wget -nH --no-parent --recursive http://10.12.110.200:8080/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/
Installation package:
sudo wget http://nfs01.enflame.cn/qodftp_builds/tops_daily_packages/master/test_package/x86_64/gcc7_with_paddle/x86_64-linux-rel_3.4.20250620.tar.gz
2. Extract and unpack the archive
Note: the tar -C target directory must already exist (create it with mkdir -p if needed).
tar -zxvf /home/cssg/yhy/x86_64-linux-rel_3.4.20250620.tar.gz -C /home/cssg/yhy/ttB
3. In the extracted x86_64 directory, run the following command (handles cases where some files are missing):
./make_runnable no_fail
4. On the host, run the following command in the lib directory:
bash TopsPlatform_1.5.0.5-9de43a_deb_amd64.run -y
5. Create the container and map host directories into it:
sudo docker run -it --name ubuntu0703 -v /home:/home -v /sys/:/sys/ --network host --privileged --ipc=host enflame:ubuntu_base
6. Inside the newly created container, run the installer again:
bash TopsPlatform_1.5.0.5-9de43a_deb_amd64.run -y
7. Install dependencies
# (1) Replace the torch stack
pip uninstall -y torch torchaudio torchvision
pip install torch==2.5.1 -i https://mirrors.cloud.tencent.com/pypi/simple
pip install torchaudio -i https://mirrors.cloud.tencent.com/pypi/simple
pip install torchvision -i https://mirrors.cloud.tencent.com/pypi/simple
# (2) Debian packages (lib)
dpkg -i \
  eccl_3.4.20250605-1_amd64.deb \
  eccl-tests_3.4.20250605-1_amd64.deb \
  topsaten_3.5.20250619-1_amd64.deb \
  topsfactor_3.4.20250620-1_amd64.deb \
  topsgraph_3.5.0-1_amd64.deb \
  tops-sdk_3.4.20250620-1_amd64_internal.deb
# (3) Python wheels
# Note: the wheel names must match the Python (cp310) and torch (2.5.1) versions.
pip install \
  flash_attn-2.6.3+torch.2.5.1.gcu.3.4.20250616-cp310-cp310-linux_x86_64.whl \
  tops_extension-3.2.20250604+torch.2.5.1-cp310-cp310-linux_x86_64.whl \
  topsgraph-3.5.0-cp310-cp310-linux_x86_64.whl \
  torch_gcu-2.5.1+3.4.0.9-cp310-cp310-linux_x86_64.whl \
  vllm_gcu-0.7.2+3.4.20250321-cp39-abi3-linux_x86_64.whl \
  xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250427-cp310-cp310-linux_x86_64.whl
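The version note above can be checked mechanically: each wheel filename carries its Python tag, so a quick script can confirm everything matches the cp310 interpreter before installing. A minimal sketch, assuming only the standard wheel filename convention (the abi3 rule is the stable-ABI convention; filenames are taken from the list above):

```python
# Sanity-check that wheel filenames match the target interpreter (cp310).
wheels = [
    "flash_attn-2.6.3+torch.2.5.1.gcu.3.4.20250616-cp310-cp310-linux_x86_64.whl",
    "tops_extension-3.2.20250604+torch.2.5.1-cp310-cp310-linux_x86_64.whl",
    "topsgraph-3.5.0-cp310-cp310-linux_x86_64.whl",
    "torch_gcu-2.5.1+3.4.0.9-cp310-cp310-linux_x86_64.whl",
    "vllm_gcu-0.7.2+3.4.20250321-cp39-abi3-linux_x86_64.whl",
    "xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250427-cp310-cp310-linux_x86_64.whl",
]

def wheel_ok(name: str, py_tag: str = "cp310") -> bool:
    """A wheel is installable if its python tag matches the interpreter,
    or if it is a stable-ABI (abi3) wheel built for an older CPython."""
    # wheel filename: {dist}-{version}(-{build})?-{pytag}-{abitag}-{platform}.whl
    parts = name[: -len(".whl")].split("-")
    tag_py, tag_abi = parts[-3], parts[-2]
    if tag_abi == "abi3":
        # abi3 wheels run on any CPython at or above the tagged version
        return int(tag_py[2:]) <= int(py_tag[2:])
    return tag_py == py_tag

assert all(wheel_ok(w) for w in wheels)
```

This also explains why the cp39-abi3 vllm_gcu wheel installs cleanly on Python 3.10.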
1. Launch the model service with vLLM
vllm serve /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-7B/ --max-model-len 65536 --block-size=64 --disable-log-requests
nohup vllm serve /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ --max-model-len 32768 --block-size=64 --disable-log-requests &> server.log &
tail -f server.log
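Before sending benchmark traffic to the background server, make sure it has finished loading the model. A minimal sketch that polls server.log for a readiness marker; the default marker string is an assumption that varies by vLLM version, so confirm it against your own log first:

```python
import time

def server_ready(log_path: str, marker: str = "Application startup complete") -> bool:
    """Return True once the readiness marker appears in the server log.
    The default marker is an assumption; check your own server.log output."""
    try:
        with open(log_path, encoding="utf-8", errors="replace") as f:
            return any(marker in line for line in f)
    except FileNotFoundError:
        return False

def wait_for_server(log_path: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll the log until the server is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if server_ready(log_path):
            return True
        time.sleep(poll_s)
    return False
```

For example, call wait_for_server("server.log") before kicking off benchmark_serving.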
2. Benchmark serving performance with the vLLM utilities
set -x
python3 -m vllm_utils.benchmark_serving \
  --backend vllm \
  --dataset-name random \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --num-prompts 1 \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --trust-remote-code \
  --ignore_eos \
  --strict-in-out-len \
  --keep-special-tokens
python3 -m vllm_utils.benchmark_serving \
  --backend vllm \
  --dataset-name random \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --num-prompts 16 \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --trust-remote-code \
  --ignore_eos \
  --strict-in-out-len \
  --keep-special-tokens
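Because --strict-in-out-len and --ignore_eos pin every request to exactly the configured lengths, the totals behind the reported throughput are easy to compute by hand. A small sanity-check sketch; the benchmark's own reported metrics are authoritative, and the 40 s wall time below is a made-up example, not a measured result:

```python
def expected_tokens(num_prompts: int, input_len: int, output_len: int):
    """Total prompt and generated tokens for a strict-length random run."""
    return num_prompts * input_len, num_prompts * output_len

def output_throughput(total_output_tokens: int, duration_s: float) -> float:
    """Generated tokens per second over the whole run."""
    return total_output_tokens / duration_s

# The 16-prompt run above should process 16 * 1024 tokens in and out.
prompt_tok, gen_tok = expected_tokens(16, 1024, 1024)
assert (prompt_tok, gen_tok) == (16384, 16384)
# Hypothetical 40 s wall time -> 16384 / 40 = 409.6 generated tokens/s.
```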
python3.10 -m vllm_utils.benchmark_test \
  --perf \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --tensor-parallel-size 1 \
  --max-model-len=65536 \
  --input-len=1024 \
  --output-len=1024 \
  --dtype=bfloat16 \
  --device gcu \
  --num-prompts 1 \
  --block-size=64 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code
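The --max-model-len, --dtype, and --gpu-memory-utilization settings above jointly decide how much device memory the KV cache needs. A rough back-of-the-envelope sketch, assuming the model follows the Qwen2.5-14B layout (48 layers, 8 KV heads of dim 128, bf16); verify these numbers against the model's config.json before relying on them:

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: K and V, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

def kv_cache_gib(max_model_len: int, per_token_bytes: int) -> float:
    """KV cache for one sequence at full context length, in GiB."""
    return max_model_len * per_token_bytes / 2**30

# Assumed Qwen2.5-14B-style config: 48 layers, 8 KV heads, head_dim 128, bf16.
per_tok = kv_cache_bytes_per_token(48, 8, 128)   # 196608 B = 192 KiB per token
full_ctx = kv_cache_gib(65536, per_tok)          # ~12 GiB at max-model-len 65536
```

This is why a larger --max-model-len forces a lower batch count (or a lower --gpu-memory-utilization headroom) on the same device.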
3. Using the topsprof tool
topsprof \
  --enable-activities operator \
  --export-visual-profiler ./DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1 \
  --print-app-log \
  --force-overwrite \
  --export-rawdata ./DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1/DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1.rawdata \
  --trace all \
  --topstx-domain-include topsrt \
  python3.10 -m vllm_utils.benchmark_test \
    --perf \
    --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
    --tensor-parallel-size 1 \
    --max-model-len=32768 \
    --input-len=1024 \
    --output-len=1024 \
    --dtype=bfloat16 \
    --device gcu \
    --num-prompts 1 \
    --block-size=64 \
    --gpu-memory-utilization 0.945 \
    --trust-remote-code
–trust-remote-code