Introduction
This is not a deep technical post; think of it as a quick primer.
Recently, many people have asked: GPU resources are getting harder and harder to obtain, so is there Ascend hardware they could use instead?
Below is a relatively simple resource path for getting a model up and running.
The fastest way to bring up a model is to use a third-party library that is already integrated, so this post uses the Hugging Face Transformers suite, which is natively supported on Ascend.
The list of third-party libraries natively adapted for Ascend is provided in the community documentation. If you need additional libraries adapted, you can raise a request on the community forum; feedback is welcome.
OpenAI o1 is all the rage at the moment, so let's use these resources to bring up an o1 math model and try out its reasoning.
Obtaining the resources
| Resource | Provider | How to obtain |
|---|---|---|
| Device: NPU | 启智 (OpenI) community | A points-based community: registration grants 50 points, and the machine used in this post is billed at 4 points per hour, so that is plenty. Sign up on the OpenI community website. |
| Requirements: torch=2.1, torch_npu=2.1 | Ascend community | torch_npu is available from the community documentation. |
| Third-party library: Transformers=4.43.2 | Hugging Face community | A list of third-party libraries natively adapted for Ascend is provided in the community documentation; requests for further adaptations can be raised on the community forum. |
| Model | Skywork (uploaded to the HF Hub) | Open-source o1 models have appeared in a burst over the past week; Skywork released Skywork-o1-Open-Llama-3.1-8B, downloadable directly from the HF Hub. |
| Download tool | HF-Mirror | Downloads from HF are often blocked in mainland China; set the environment variable per the HF-Mirror instructions to speed them up (see the sketch below the table). |
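If you prefer to pre-download the weights through the mirror rather than letting `from_pretrained` fetch them on first use, a minimal sketch with `huggingface_hub` looks like this (note that `HF_ENDPOINT` must be set before `huggingface_hub` is imported, because the endpoint is read at import time):

```python
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must precede the huggingface_hub import

from huggingface_hub import snapshot_download

# Fetch the whole model repo into the local HF cache
snapshot_download("Skywork/Skywork-o1-Open-Llama-3.1-8B")
```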
Quick start
1. Create an NPU environment
For how to use the OpenI community, see its community documentation.
OpenI community --> Personal Center --> Cloud Brain Tasks --> New Cloud Brain Task --> Debug Task --> Ascend NPU
For the resource spec, select NPU: 1*Ascend-D910 (VRAM: 32GB), CPU: 20, memory: 60GB.
For the image, select torch-npu-cann8-debug.
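Once the debug task is up, a quick sanity check inside the container confirms that PyTorch can see the NPU (a minimal sketch; the expected values assume the spec and image chosen above):

```python
import torch
import torch_npu  # importing torch_npu registers the "npu" device with PyTorch

print(torch.__version__)             # the image ships torch 2.1.0
print(torch_npu.npu.is_available())  # True if the Ascend device is visible
print(torch_npu.npu.device_count())  # 1 for the spec chosen above
```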
2. Install the dependencies
The torch shipped in the image is 2.1.0, whose matching Transformers version is 4.41.2.
However, the argument validation in Skywork-o1-Open-Llama-3.1-8B requires Transformers 4.43.2 or later, so it needs an upgrade:

```bash
pip install transformers==4.43.2
```

To speed up the model download, set the environment variable:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```
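An optional check that the upgrade took effect and that the mirror variable is visible to Python:

```python
import os
import transformers

print(transformers.__version__)       # expect 4.43.2
print(os.environ.get("HF_ENDPOINT"))  # expect https://hf-mirror.com
```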
3. Launch the script
Import the dependencies:

```python
import torch
import torch_npu  # side-effect import: enables the NPU backend for the code below
from transformers import AutoModelForCausalLM, AutoTokenizer
```
设定问答模板
system_prompt: 你是Skywork-o1,Skywork AI开发的思维模型,擅长通过深入思考解决涉及数学、编码和逻辑推理的复杂问题。当面对用户的请求时,你首先要进行漫长而深入的思考过程,以探索问题的可能解决方案。完成你的想法后,你在回复中提供对解决方案过程的详细说明。
problem: Jane有12个苹果。她把4个苹果给了她的朋友Mark,然后又买了1个苹果,最后把所有的苹果平均分给了她自己和2个兄弟姐妹。请问最后每人得到多少个苹果?
system_prompt = """You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. After completing your thoughts, you then provide a detailed explanation of the solution process in your response."""
# An Example Case
problem = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
user_message = problem
conversation = [
{
"role": "system",
"content": system_prompt
},
{
"role": "user",
"content": user_message
}
]
Set the model to "Skywork/Skywork-o1-Open-Llama-3.1-8B" (use the full Hub repo id, including the Skywork/ organization prefix, so that from_pretrained can resolve it):

```python
model_name = "Skywork/Skywork-o1-Open-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto"    # place the weights on the NPU automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)
```
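Optionally, you can inspect what the chat template actually produced and where the weights ended up (a small sketch reusing the variables above):

```python
# Render the prompt as plain text instead of token ids
print(tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True))

print(model.device)     # should report an NPU device, e.g. npu:0
print(input_ids.shape)  # (1, prompt_length)
```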
Run the inference:

```python
generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,      # greedy decoding; temperature is ignored (see the warning below)
    pad_token_id=128009,
    temperature=0)
completion = tokenizer.decode(
    generation[0][len(input_ids[0]):],  # drop the prompt tokens, keep only the generated ones
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True)
print(completion)
```
4. Output log
```text
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.01it/s]
/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
[W AddKernelNpu.cpp:82] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
```
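These warnings are harmless here, but they are easy to silence: drop the sampling-only arguments and pass an explicit attention mask. A sketch reusing the variables from the script above (`return_dict=True` makes `apply_chat_template` return both `input_ids` and `attention_mask`):

```python
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,        # returns input_ids plus attention_mask
    return_tensors="pt").to(model.device)

generation = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # silences the pad/eos warning
    max_new_tokens=2048,
    do_sample=False,         # greedy decoding; no temperature/top_p needed
    pad_token_id=128009)
```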
5. Inference result
The complete chain of thought is visible: the model works through the reasoning step by step, following the operations in the problem statement.
To solve the problem, let's break it down into a series of logical steps:
1. **Initial Number of Apples**: Jane starts with 12 apples.
2. **Apples Given Away**: Jane gives 4 apples to her friend Mark. So, the number of apples she has now is:
\[
12 - 4 = 8
\]
3. **Apples Bought**: Jane then buys 1 more apple. So, the number of apples she has now is:
\[
8 + 1 = 9
\]
4. **Apples Split Equally**: Jane splits all her apples equally among herself and her 2 siblings. This means the apples are divided among 3 people. So, the number of apples each person gets is:
\[
\frac{9}{3} = 3
\]
Therefore, each person gets \(\boxed{3}\) apples.
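A one-line check of the arithmetic the model carried out:

```python
print((12 - 4 + 1) // 3)  # 3, matching the model's boxed answer
```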
Excluding account registration and build time, this quick-start walkthrough takes about ten minutes. If you are interested, give it a try.