Introduction
This is not a deep technical post; think of it as a quick primer.
Recently, many people have asked: GPU resources are getting harder and harder to obtain, so is there Ascend hardware they could use instead?
Below is a relatively simple resource path for getting a model up and running.
The fastest way to bring up a model is to use a third-party library that is already integrated, so this post uses the Hugging Face Transformers suite, which is natively supported on Ascend.
The list of third-party libraries natively adapted for Ascend is provided in the community documentation. If you need additional libraries adapted, you can raise a request on the community forum; feedback is welcome.
OpenAI o1 is all the rage at the moment, so let's use these resources to bring up an o1 math model and try out its reasoning.
Obtaining the resources
| Resource | Provider | How to obtain |
|---|---|---|
| Device: NPU | 启智 (OpenI) community | A points-based community: registration grants 50 points, and the machine used in this post is billed at 4 points per hour, so that is plenty. Sign up on the OpenI community website. |
| Requirements: torch=2.1, torch_npu=2.1 | Ascend community | torch_npu is available from the community documentation. |
| Third-party library: Transformers=4.43.2 | Hugging Face community | A list of third-party libraries natively adapted for Ascend is provided in the community documentation; requests for further adaptations can be raised on the community forum. |
| Model | Skywork (uploaded to the HF Hub) | Open-source o1 models have appeared in a burst over the past week; Skywork released Skywork-o1-Open-Llama-3.1-8B, downloadable directly from the HF Hub. |
| Download tool | HF-Mirror | Downloads from HF are often blocked in mainland China; set the environment variable per the HF-Mirror instructions to speed them up (see the sketch below the table). |
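If you prefer to pre-download the weights through the mirror rather than letting `from_pretrained` fetch them on first use, a minimal sketch with `huggingface_hub` looks like this (note that `HF_ENDPOINT` must be set before `huggingface_hub` is imported, because the endpoint is read at import time):

```python
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must precede the huggingface_hub import

from huggingface_hub import snapshot_download

# Fetch the whole model repo into the local HF cache
snapshot_download("Skywork/Skywork-o1-Open-Llama-3.1-8B")
```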
Quick start
1. Create an NPU environment
For how to use the OpenI community, see its community documentation.
OpenI community --> Personal Center --> Cloud Brain Tasks --> New Cloud Brain Task --> Debug Task --> Ascend NPU
For the resource spec, select NPU: 1*Ascend-D910 (VRAM: 32GB), CPU: 20, memory: 60GB.
For the image, select torch-npu-cann8-debug.
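Once the debug task is up, a quick sanity check inside the container confirms that PyTorch can see the NPU (a minimal sketch; the expected values assume the spec and image chosen above):

```python
import torch
import torch_npu  # importing torch_npu registers the "npu" device with PyTorch

print(torch.__version__)             # the image ships torch 2.1.0
print(torch_npu.npu.is_available())  # True if the Ascend device is visible
print(torch_npu.npu.device_count())  # 1 for the spec chosen above
```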
2. Install the dependencies
The torch shipped in the image is 2.1.0, whose matching Transformers version is 4.41.2.
However, the argument validation in Skywork-o1-Open-Llama-3.1-8B requires Transformers 4.43.2 or later, so it needs an upgrade:

```bash
pip install transformers==4.43.2
```

To speed up the model download, set the environment variable:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```
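An optional check that the upgrade took effect and that the mirror variable is visible to Python:

```python
import os
import transformers

print(transformers.__version__)       # expect 4.43.2
print(os.environ.get("HF_ENDPOINT"))  # expect https://hf-mirror.com
```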
3. Launch the script
Import the dependencies:

```python
import torch
import torch_npu  # side-effect import: enables the NPU backend for the code below
from transformers import AutoModelForCausalLM, AutoTokenizer
```
设定问答模板
system_prompt: 你是Skywork-o1,Skywork AI开发的思维模型,擅长通过深入思考解决涉及数学、编码和逻辑推理的复杂问题。当面对用户的请求时,你首先要进行漫长而深入的思考过程,以探索问题的可能解决方案。完成你的想法后,你在回复中提供对解决方案过程的详细说明。
problem: Jane有12个苹果。她把4个苹果给了她的朋友Mark,然后又买了1个苹果,最后把所有的苹果平均分给了她自己和2个兄弟姐妹。请问最后每人得到多少个苹果?
system_prompt = """You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. After completing your thoughts, you then provide a detailed explanation of the solution process in your response."""
# An Example Case
problem = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
user_message = problem
conversation = [
{
"role": "system",
"content": system_prompt
},
{
"role": "user",
"content": user_message
}
]
Set the model to "Skywork/Skywork-o1-Open-Llama-3.1-8B" (use the full Hub repo id, including the Skywork/ organization prefix, so that from_pretrained can resolve it):

```python
model_name = "Skywork/Skywork-o1-Open-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto"    # place the weights on the NPU automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)
```
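Optionally, you can inspect what the chat template actually produced and where the weights ended up (a small sketch reusing the variables above):

```python
# Render the prompt as plain text instead of token ids
print(tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True))

print(model.device)     # should report an NPU device, e.g. npu:0
print(input_ids.shape)  # (1, prompt_length)
```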
Run the inference:

```python
generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,      # greedy decoding; temperature is ignored (see the warning below)
    pad_token_id=128009,
    temperature=0)
completion = tokenizer.decode(
    generation[0][len(input_ids[0]):],  # drop the prompt tokens, keep only the generated ones
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True)
print(completion)
```
4. Output log
```text
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.01it/s]
/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
[W AddKernelNpu.cpp:82] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
```
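These warnings are harmless here, but they are easy to silence: drop the sampling-only arguments and pass an explicit attention mask. A sketch reusing the variables from the script above (`return_dict=True` makes `apply_chat_template` return both `input_ids` and `attention_mask`):

```python
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,        # returns input_ids plus attention_mask
    return_tensors="pt").to(model.device)

generation = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # silences the pad/eos warning
    max_new_tokens=2048,
    do_sample=False,         # greedy decoding; no temperature/top_p needed
    pad_token_id=128009)
```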
5. Inference result
The complete chain of thought is visible: the model works through the reasoning step by step, following the operations in the problem statement.
To solve the problem, let's break it down into a series of logical steps:
1. **Initial Number of Apples**: Jane starts with 12 apples.
2. **Apples Given Away**: Jane gives 4 apples to her friend Mark. So, the number of apples she has now is:
\[
12 - 4 = 8
\]
3. **Apples Bought**: Jane then buys 1 more apple. So, the number of apples she has now is:
\[
8 + 1 = 9
\]
4. **Apples Split Equally**: Jane splits all her apples equally among herself and her 2 siblings. This means the apples are divided among 3 people. So, the number of apples each person gets is:
\[
\frac{9}{3} = 3
\]
Therefore, each person gets \(\boxed{3}\) apples.
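A one-line check of the arithmetic the model carried out:

```python
print((12 - 4 + 1) // 3)  # 3, matching the model's boxed answer
```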
Excluding account registration and build time, this quick-start walkthrough takes about ten minutes. If you are interested, give it a try.