Setting Up an LLM PyTorch Inference Environment on the Ascend Platform

有来有去9527

Last modified: 2024-08-21 14:38:49


First published: 2023-11-23 18:06:01

Article link: https://blog.youkuaiyun.com/bmfire/article/details/134583456


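This article describes how to set up a PyTorch inference environment for LLMs on the Ascend platform. It covers base environment configuration, installation of PyTorch and its dependencies, verification steps, and example code.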



1. Base environment preparation

Download the resources from: https://www.hiascend.com/developer/download
You need to download:
Driver
Firmware
CANN
PyTorch
torch_npu (the Ascend adapter for PyTorch)

Python and PyTorch version compatibility

PyTorch version	Python version
PyTorch 1.11.0	Python 3.7.x (>=3.7.5), Python 3.8.x, Python 3.9.x, Python 3.10.x
PyTorch 2.0.1	Python 3.8.x, Python 3.9.x, Python 3.10.x
PyTorch 2.1.0	Python 3.8.x, Python 3.9.x, Python 3.10.x
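The table can be turned into a quick check before installing anything. The following is only a minimal sketch: the helper name `python_supported` and the dictionary layout are made up for this illustration, and the version data is copied from the table above.

```python
import sys

# Supported (major, minor) Python versions per PyTorch release,
# copied from the table above. Hypothetical layout for illustration.
SUPPORTED_PYTHON = {
    "1.11.0": {(3, 7), (3, 8), (3, 9), (3, 10)},
    "2.0.1": {(3, 8), (3, 9), (3, 10)},
    "2.1.0": {(3, 8), (3, 9), (3, 10)},
}


def python_supported(torch_version, python_version=None):
    """Return True if the Python version is listed for the given PyTorch version."""
    if python_version is None:
        python_version = tuple(sys.version_info[:3])
    major, minor = python_version[0], python_version[1]
    patch = python_version[2] if len(python_version) > 2 else 0
    if (major, minor) not in SUPPORTED_PYTHON.get(torch_version, set()):
        return False
    # The table lists Python 3.7 only from 3.7.5 onwards.
    if (major, minor) == (3, 7) and patch < 5:
        return False
    return True
```

Calling `python_supported("2.1.0")` with no second argument checks the interpreter that runs the script.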

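To check that the NPUs are detected, run lspci | grep d802. If the server has N NPUs, the output should contain N lines with the string "d802", which means the NPUs are present.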
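The same check can be scripted, for example as part of a node health check. This is a rough sketch, assuming `lspci` is on the PATH; the function names `count_npu_lines` and `detected_npus` are hypothetical and not part of any Ascend tool.

```python
import subprocess


def count_npu_lines(lspci_output, device_tag="d802"):
    """Count the lines of lspci output that contain the device tag."""
    return sum(1 for line in lspci_output.splitlines() if device_tag in line)


def detected_npus():
    """Run lspci and count the matching lines (requires lspci to be installed)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return count_npu_lines(result.stdout)
```

Compare the value returned by `detected_npus()` with the number of NPUs you expect in the server.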

1.1 Driver installation

sh Ascend-hdk-910b-npu-driver_23.0.rc3_linux-aarch64.run --full --install-for-all
sh Ascend-hdk-910b-npu-firmware_6.4.0.4.220.run --full

Run npu-smi info to check the result.

1.2 CANN installation

sh Ascend-cann-nnrt_7.0.RC1_linux-aarch64.run --install

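Confirm that the installation succeeded. The path below is for the ascend-toolkit package in its default location; adjust it if you installed a different CANN package or used another location: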
cat /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend-toolkit_install.info
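If you want to read the installed version from a script rather than by eye, something like the following may help. It is a sketch under assumptions: it assumes the install info file consists of simple `key=value` lines and that one of the keys is `version`; neither is confirmed by this article, so inspect the file with `cat` first. The function names are hypothetical.

```python
def parse_install_info(text):
    """Parse key=value lines into a dict, skipping blank lines and lines without '='."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip()
    return info


def read_install_info(path):
    """Read and parse an install info file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_install_info(handle.read())
```

For example, `read_install_info(path).get("version")` returns the version string if the assumed `version` key is present, and `None` otherwise.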

2. Install PyTorch

2.1 Install PyTorch

aarch64:

pip install torch==2.1.0

2.2 Install dependencies

pip3 install pyyaml
pip3 install setuptools

2.3 Install torch-npu

pip3 install torch-npu==2.1.0rc1
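The torch-npu release has to match the installed torch release (2.1.0rc1 for torch 2.1.0 here). A small sanity check could look like the sketch below. The helper names are hypothetical, and the comparison only looks at the leading numeric release, which is an assumption about how the two packages are versioned.

```python
import re
from importlib import metadata


def release_of(version):
    """Extract the leading numeric release, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def versions_match(torch_version, torch_npu_version):
    """Return True if both version strings share the same numeric release."""
    return release_of(torch_version) == release_of(torch_npu_version)


def installed_versions_match():
    """Compare the installed torch and torch-npu distributions.

    Raises importlib.metadata.PackageNotFoundError if either is missing.
    """
    return versions_match(metadata.version("torch"), metadata.version("torch-npu"))
```

Run `installed_versions_match()` in the same Python environment that you used for the pip commands above.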

3. Verification

3.1 Set environment variables

# Default path, change it if needed.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
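Sourcing the script only affects the current shell, so a new terminal or a service started elsewhere will not see the variables. The sketch below checks whether `libhccl.so`, the library named in the import error described near the end of this article, can be found on `LD_LIBRARY_PATH`. It assumes that `set_env.sh` extends `LD_LIBRARY_PATH` with the CANN library directories; the helper name `find_library` is hypothetical.

```python
import os


def find_library(lib_name, search_path):
    """Return the directories on an os.pathsep-separated path that contain lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_library("libhccl.so", os.environ.get("LD_LIBRARY_PATH", ""))
    if found:
        print("libhccl.so found in:", found)
    else:
        print("libhccl.so not found on LD_LIBRARY_PATH; was set_env.sh sourced?")
```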

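Check the status

python3 -c "import torch;import torch_npu;print(torch_npu.npu.is_available())"
# Code that checks torch.cuda instead gets False until the CUDA API is migrated (mapped) to the NPU

# Automatically map CUDA API calls to the NPU
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu
print(torch_npu.npu.is_available())
print(torch.cuda.is_available())

# True
# True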

3.2 Verify with code

import torch
import torch_npu

x = torch.randn(1, 3, 224, 224).npu()

print(x.shape)
print(type(x))

4. LLM example

This example uses chatglm3-6b.


import time

import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu

from transformers import AutoTokenizer, AutoModel


model_dir = 'path/to/chatglm3-6b'

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
t0 = time.time()
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().npu()
model = model.eval()
print("model load consume:", time.time() - t0)
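# About 38 s in the author's run

# Prompt: "Hello"
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

# Prompt: "What should I do if I can't fall asleep at night?"
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)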
print(response)

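FAQ

Installing directly with pip in step 2.3 fails
Download the source code and install from it:
git clone https://gitee.com/ascend/pytorch.git -b v2.1.0-5.0.rc3 --depth 1
“ImportError: libhccl.so: cannot open shared object file: No such file or directory”
Set the environment variables as described in section 3.1.
“npu error, error code is 507008” fail to obtain the soc version
Check whether the current user has permission to use the driver.
Run npu-smi info as the current user and check whether the NPU information is displayed. If it is not, the driver was installed incorrectly; reinstall it with --install-for-all added to the install command.
npu() / cuda() migration issues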
https://support.huaweicloud.com/bestpractice-modelarts/modelarts_10_2503.html

CANN and PyTorch version compatibility

CANN version	Supported PyTorch version	Supported Adapter version	GitHub branch	AscendHub image version/name (link)
CANN 7.0.RC1	2.1.0	2.1.0.rc1	v2.1.0-5.0.rc3	-
	2.0.1	2.0.1	v2.0.1-5.0.rc3	-
	1.11.0	1.11.0.post4	v1.11.0-5.0.rc3	-
CANN 6.3.RC3.1	1.11.0	1.11.0.post3	v1.11.0-5.0.rc2.2	-
CANN 6.3.RC3	1.11.0	1.11.0.post2	v1.11.0-5.0.rc2.1	-
CANN 6.3.RC2	2.0.1	2.0.1.rc1	v2.0.1-5.0.rc2	-
	1.11.0	1.11.0.post1	v1.11.0-5.0.rc2	23.0.RC1-1.11.0
	1.8.1	1.8.1.post2	v1.8.1-5.0.rc2	23.0.RC1-1.8.1
CANN 6.3.RC1	1.11.0	1.11.0	v1.11.0-5.0.rc1	-
CANN 6.3.RC1	1.8.1	1.8.1.post1	v1.8.1-5.0.rc1	-
CANN 6.0.1	1.5.0	1.5.0.post8	v1.5.0-3.0.0	22.0.0
	1.8.1	1.8.1	v1.8.1-3.0.0	22.0.0-1.8.1
	1.11.0	1.11.0.rc2 (beta)	v1.11.0-3.0.0	-
CANN 6.0.RC1	1.5.0	1.5.0.post7	v1.5.0-3.0.rc3	22.0.RC3
	1.8.1	1.8.1.rc3	v1.8.1-3.0.rc3	22.0.RC3-1.8.1
	1.11.0	1.11.0.rc1 (beta)	v1.11.0-3.0.rc3	-
CANN 5.1.RC2	1.5.0	1.5.0.post6	v1.5.0-3.0.rc2	22.0.RC2
CANN 5.1.RC2	1.8.1	1.8.1.rc2	v1.8.1-3.0.rc2	22.0.RC2-1.8.1
CANN 5.1.RC1	1.5.0	1.5.0.post5	v1.5.0-3.0.rc1	22.0.RC1
CANN 5.1.RC1	1.8.1	1.8.1.rc1	v1.8.1-3.0.rc1	-
CANN 5.0.4	1.5.0	1.5.0.post4	2.0.4.tr5	21.0.4
CANN 5.0.3	1.5.0	1.5.0.post3	2.0.3.tr5	21.0.3
CANN 5.0.2	1.5.0	1.5.0.post2	2.0.2.tr5	21.0.2
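When building the adapter from source, the branch has to be chosen from this table. The lookup below is a minimal sketch that copies only the CANN 7.0.RC1 through 6.3.RC2 rows; the names `ADAPTER_TABLE`, `lookup_adapter`, and `clone_command` are hypothetical, and the clone command follows the one shown in the FAQ.

```python
# (CANN version, PyTorch version) -> (adapter version, branch).
# Only the most recent rows of the table are copied here for illustration.
ADAPTER_TABLE = {
    ("7.0.RC1", "2.1.0"): ("2.1.0.rc1", "v2.1.0-5.0.rc3"),
    ("7.0.RC1", "2.0.1"): ("2.0.1", "v2.0.1-5.0.rc3"),
    ("7.0.RC1", "1.11.0"): ("1.11.0.post4", "v1.11.0-5.0.rc3"),
    ("6.3.RC3.1", "1.11.0"): ("1.11.0.post3", "v1.11.0-5.0.rc2.2"),
    ("6.3.RC3", "1.11.0"): ("1.11.0.post2", "v1.11.0-5.0.rc2.1"),
    ("6.3.RC2", "2.0.1"): ("2.0.1.rc1", "v2.0.1-5.0.rc2"),
    ("6.3.RC2", "1.11.0"): ("1.11.0.post1", "v1.11.0-5.0.rc2"),
    ("6.3.RC2", "1.8.1"): ("1.8.1.post2", "v1.8.1-5.0.rc2"),
}


def lookup_adapter(cann_version, torch_version):
    """Return (adapter version, branch) for a CANN/PyTorch pair, or None if not listed."""
    cann = cann_version.strip()
    if cann.upper().startswith("CANN "):
        cann = cann[5:].strip()
    return ADAPTER_TABLE.get((cann, torch_version.strip()))


def clone_command(branch):
    """Build the shallow-clone command for a given adapter branch."""
    return f"git clone https://gitee.com/ascend/pytorch.git -b {branch} --depth 1"
```

For the setup in this article, `lookup_adapter("7.0.RC1", "2.1.0")` gives the adapter version and branch used in steps 2.3 and the FAQ.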