Full environment setup walkthrough
Access the server through the MobaXterm terminal tool.
Server login:
ssh sp.hengyue.yuan@console.uat.enflame.cc -p 31081
To copy files from the server to Windows (use forward slashes in the Windows path):
scp <Linux username>@<server IP>:<remote file path> <Windows destination path (use /)>
1. Download the model and the installation package to the host
Model:
sudo wget -nH --no-parent --recursive http://10.12.110.200:8080/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/
Installation package:
sudo wget http://nfs01.enflame.cn/qodftp_builds/tops_daily_packages/master/test_package/x86_64/gcc7_with_paddle/x86_64-linux-rel_3.4.20250620.tar.gz
2. Extract and unpack the archive
Note: the tar -C target directory must already exist (create it with mkdir -p if needed).
tar -zxvf /home/cssg/yhy/x86_64-linux-rel_3.4.20250620.tar.gz -C /home/cssg/yhy/ttB
3. In the extracted x86_64 directory, run the following command (handles cases where some files are missing):
./make_runnable no_fail
4. On the host, run the following command in the lib directory:
bash TopsPlatform_1.5.0.5-9de43a_deb_amd64.run -y
5. Create the container and map host directories into it:
sudo docker run -it --name ubuntu0703 -v /home:/home -v /sys/:/sys/ --network host --privileged --ipc=host enflame:ubuntu_base
6. Inside the newly created container, run the installer again:
bash TopsPlatform_1.5.0.5-9de43a_deb_amd64.run -y
7. Install dependencies
# (1) Replace the torch stack
pip uninstall -y torch torchaudio torchvision
pip install torch==2.5.1 -i https://mirrors.cloud.tencent.com/pypi/simple
pip install torchaudio -i https://mirrors.cloud.tencent.com/pypi/simple
pip install torchvision -i https://mirrors.cloud.tencent.com/pypi/simple
# (2) Debian packages (lib)
dpkg -i \
  eccl_3.4.20250605-1_amd64.deb \
  eccl-tests_3.4.20250605-1_amd64.deb \
  topsaten_3.5.20250619-1_amd64.deb \
  topsfactor_3.4.20250620-1_amd64.deb \
  topsgraph_3.5.0-1_amd64.deb \
  tops-sdk_3.4.20250620-1_amd64_internal.deb
# (3) Python wheels
# Note: the wheel names must match the Python (cp310) and torch (2.5.1) versions.
pip install \
  flash_attn-2.6.3+torch.2.5.1.gcu.3.4.20250616-cp310-cp310-linux_x86_64.whl \
  tops_extension-3.2.20250604+torch.2.5.1-cp310-cp310-linux_x86_64.whl \
  topsgraph-3.5.0-cp310-cp310-linux_x86_64.whl \
  torch_gcu-2.5.1+3.4.0.9-cp310-cp310-linux_x86_64.whl \
  vllm_gcu-0.7.2+3.4.20250321-cp39-abi3-linux_x86_64.whl \
  xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250427-cp310-cp310-linux_x86_64.whl
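The version note above can be checked mechanically: each wheel filename carries its Python tag, so a quick script can confirm everything matches the cp310 interpreter before installing. A minimal sketch, assuming only the standard wheel filename convention (the abi3 rule is the stable-ABI convention; filenames are taken from the list above):

```python
# Sanity-check that wheel filenames match the target interpreter (cp310).
wheels = [
    "flash_attn-2.6.3+torch.2.5.1.gcu.3.4.20250616-cp310-cp310-linux_x86_64.whl",
    "tops_extension-3.2.20250604+torch.2.5.1-cp310-cp310-linux_x86_64.whl",
    "topsgraph-3.5.0-cp310-cp310-linux_x86_64.whl",
    "torch_gcu-2.5.1+3.4.0.9-cp310-cp310-linux_x86_64.whl",
    "vllm_gcu-0.7.2+3.4.20250321-cp39-abi3-linux_x86_64.whl",
    "xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250427-cp310-cp310-linux_x86_64.whl",
]

def wheel_ok(name: str, py_tag: str = "cp310") -> bool:
    """A wheel is installable if its python tag matches the interpreter,
    or if it is a stable-ABI (abi3) wheel built for an older CPython."""
    # wheel filename: {dist}-{version}(-{build})?-{pytag}-{abitag}-{platform}.whl
    parts = name[: -len(".whl")].split("-")
    tag_py, tag_abi = parts[-3], parts[-2]
    if tag_abi == "abi3":
        # abi3 wheels run on any CPython at or above the tagged version
        return int(tag_py[2:]) <= int(py_tag[2:])
    return tag_py == py_tag

assert all(wheel_ok(w) for w in wheels)
```

This also explains why the cp39-abi3 vllm_gcu wheel installs cleanly on Python 3.10.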
1. Launch the model service with vLLM
vllm serve /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-7B/ --max-model-len 65536 --block-size=64 --disable-log-requests
nohup vllm serve /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ --max-model-len 32768 --block-size=64 --disable-log-requests &> server.log &
tail -f server.log
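Before sending benchmark traffic to the background server, make sure it has finished loading the model. A minimal sketch that polls server.log for a readiness marker; the default marker string is an assumption that varies by vLLM version, so confirm it against your own log first:

```python
import time

def server_ready(log_path: str, marker: str = "Application startup complete") -> bool:
    """Return True once the readiness marker appears in the server log.
    The default marker is an assumption; check your own server.log output."""
    try:
        with open(log_path, encoding="utf-8", errors="replace") as f:
            return any(marker in line for line in f)
    except FileNotFoundError:
        return False

def wait_for_server(log_path: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll the log until the server is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if server_ready(log_path):
            return True
        time.sleep(poll_s)
    return False
```

For example, call wait_for_server("server.log") before kicking off benchmark_serving.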
2. Benchmark serving performance with the vLLM utilities
set -x
python3 -m vllm_utils.benchmark_serving \
  --backend vllm \
  --dataset-name random \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --num-prompts 1 \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --trust-remote-code \
  --ignore_eos \
  --strict-in-out-len \
  --keep-special-tokens
python3 -m vllm_utils.benchmark_serving \
  --backend vllm \
  --dataset-name random \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --num-prompts 16 \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --trust-remote-code \
  --ignore_eos \
  --strict-in-out-len \
  --keep-special-tokens
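Because --strict-in-out-len and --ignore_eos pin every request to exactly the configured lengths, the totals behind the reported throughput are easy to compute by hand. A small sanity-check sketch; the benchmark's own reported metrics are authoritative, and the 40 s wall time below is a made-up example, not a measured result:

```python
def expected_tokens(num_prompts: int, input_len: int, output_len: int):
    """Total prompt and generated tokens for a strict-length random run."""
    return num_prompts * input_len, num_prompts * output_len

def output_throughput(total_output_tokens: int, duration_s: float) -> float:
    """Generated tokens per second over the whole run."""
    return total_output_tokens / duration_s

# The 16-prompt run above should process 16 * 1024 tokens in and out.
prompt_tok, gen_tok = expected_tokens(16, 1024, 1024)
assert (prompt_tok, gen_tok) == (16384, 16384)
# Hypothetical 40 s wall time -> 16384 / 40 = 409.6 generated tokens/s.
```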
python3.10 -m vllm_utils.benchmark_test \
  --perf \
  --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
  --tensor-parallel-size 1 \
  --max-model-len=65536 \
  --input-len=1024 \
  --output-len=1024 \
  --dtype=bfloat16 \
  --device gcu \
  --num-prompts 1 \
  --block-size=64 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code
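The --max-model-len, --dtype, and --gpu-memory-utilization settings above jointly decide how much device memory the KV cache needs. A rough back-of-the-envelope sketch, assuming the model follows the Qwen2.5-14B layout (48 layers, 8 KV heads of dim 128, bf16); verify these numbers against the model's config.json before relying on them:

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: K and V, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

def kv_cache_gib(max_model_len: int, per_token_bytes: int) -> float:
    """KV cache for one sequence at full context length, in GiB."""
    return max_model_len * per_token_bytes / 2**30

# Assumed Qwen2.5-14B-style config: 48 layers, 8 KV heads, head_dim 128, bf16.
per_tok = kv_cache_bytes_per_token(48, 8, 128)   # 196608 B = 192 KiB per token
full_ctx = kv_cache_gib(65536, per_tok)          # ~12 GiB at max-model-len 65536
```

This is why a larger --max-model-len forces a lower batch count (or a lower --gpu-memory-utilization headroom) on the same device.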
3. Using the topsprof tool
topsprof \
  --enable-activities operator \
  --export-visual-profiler ./DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1 \
  --print-app-log \
  --force-overwrite \
  --export-rawdata ./DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1/DeepSeek-R1-Distill-Qwen-14B_in1024_out1024_bs1.rawdata \
  --trace all \
  --topstx-domain-include topsrt \
  python3.10 -m vllm_utils.benchmark_test \
    --perf \
    --model /home/cssg/yhy/attModel/sse_ard_data/inference/pretrained_models/DeepSeek-R1-Distill-Qwen-14B/ \
    --tensor-parallel-size 1 \
    --max-model-len=32768 \
    --input-len=1024 \
    --output-len=1024 \
    --dtype=bfloat16 \
    --device gcu \
    --num-prompts 1 \
    --block-size=64 \
    --gpu-memory-utilization 0.945 \
    --trust-remote-code
–trust-remote-code