Alibaba Cloud Releases New Math Reasoning Model Qwen2.5-Math-PRM; the 7B Version Surpasses GPT-4o

Today, Alibaba Cloud's Tongyi (Qwen) team officially released Qwen2.5-Math-PRM, a new process reward model for mathematical reasoning. The model is available in two sizes, 72B and 7B, both of which significantly outperform comparable open-source process reward models, particularly at identifying reasoning errors.

Remarkably, the 7B version of Qwen2.5-Math-PRM surpasses the widely used GPT-4o, marking an important step forward in Alibaba Cloud's work on reasoning models. To evaluate models on mathematical reasoning more thoroughly, the team also open-sourced ProcessBench, the first step-level evaluation benchmark. It contains 3,400 math test cases, including problems at International Mathematical Olympiad difficulty, and every case is annotated with a detailed reasoning process by human experts, ensuring a scientific and comprehensive evaluation.
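For readers who want to inspect the benchmark directly, the sketch below shows one way it might be loaded, assuming ProcessBench is published as a Hugging Face dataset under the ID `Qwen/ProcessBench`; the split and field names are assumptions for illustration, not details stated in this article:

```python
# A minimal sketch for browsing ProcessBench; the dataset ID, split, and field names are assumptions.
from datasets import load_dataset

ds = load_dataset("Qwen/ProcessBench", split="gsm8k")  # assumed split name
example = ds[0]
print(example["problem"])  # the math problem statement (assumed field)
print(example["steps"])    # the step-by-step solution to be judged (assumed field)
print(example["label"])    # index of the first erroneous step, or -1 if all steps are correct (assumed field)
```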

Evaluating Qwen2.5-Math-PRM on ProcessBench, the research team found that both the 72B and the 7B models perform strongly. The 7B version in particular not only outperforms open-source models of the same size, but even surpasses the closed-source GPT-4o-0806 in some respects. This demonstrates the great potential of process reward models (PRMs) for improving reasoning reliability and offers new directions for future work on reasoning process supervision.
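In practice, a PRM's step rewards can be turned into the kind of error-identification prediction ProcessBench measures by flagging the first step whose reward falls below a cutoff. The helper below is a hypothetical illustration of that idea, not code from the Qwen release; the 0.5 threshold is an assumption:

```python
def first_error_step(step_rewards, threshold=0.5):
    """Return the index of the first step whose PRM reward falls below
    `threshold`, or -1 if every step is judged correct.

    This mirrors the ProcessBench task of locating the earliest erroneous
    step; the thresholding rule is an assumption for illustration only.
    """
    for i, reward in enumerate(step_rewards):
        if reward < threshold:
            return i
    return -1

# Example, using the step rewards printed by the quick-start code below:
print(first_error_step([1.0, 0.1904296875, 0.9765625, 1.0]))  # -> 1
```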

This innovative work by the Tongyi team not only advances AI reasoning technology but also provides a valuable reference for other developers in the field. By open-sourcing the models and the benchmark, the team hopes to share its experience with more researchers and drive technical progress across the industry.

Quick Start

Using `transformers>=4.37.0` is required, because the Qwen2.5 code has been integrated into transformers since version 4.37.0.
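A quick environment check along these lines (a minimal sketch, not part of the official example) can catch an outdated installation before loading the model:

```python
# Verify the installed transformers version before running the quick-start code below.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.37.0"), (
    f"transformers {transformers.__version__} is too old for Qwen2.5; "
    "upgrade with `pip install -U transformers`"
)
```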

```python
import torch
from transformers import AutoModel, AutoTokenizer
import torch.nn.functional as F


def make_step_rewards(logits, token_masks):
    # Convert the two-class logits at each <extra_0> separator position into
    # a per-step probability that the corresponding reasoning step is correct.
    probabilities = F.softmax(logits, dim=-1)
    probabilities = probabilities * token_masks.unsqueeze(-1)  # bs, seq_len, num_labels
    
    all_scores_res = []
    for i in range(probabilities.size(0)):
        sample = probabilities[i] # seq_len, num_labels
        positive_probs = sample[sample != 0].view(-1, 2)[:, 1]  # (valid_tokens,) probability of the positive class
        non_zero_elements_list = positive_probs.cpu().tolist()
        all_scores_res.append(non_zero_elements_list)
    return all_scores_res


model_name = "Qwen/Qwen2.5-Math-PRM-7B"
device = "auto"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name, 
    device_map=device, 
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()


data = {
    "system": "Please reason step by step, and put your final answer within \boxed{}.",
    "query": "Sue lives in a fun neighborhood.  One weekend, the neighbors decided to play a prank on Sue.  On Friday morning, the neighbors placed 18 pink plastic flamingos out on Sue's front yard.  On Saturday morning, the neighbors took back one third of the flamingos, painted them white, and put these newly painted white flamingos back out on Sue's front yard.  Then, on Sunday morning, they added another 18 pink plastic flamingos to the collection. At noon on Sunday, how many more pink plastic flamingos were out than white plastic flamingos?",
    "response": [
      "To find out how many more pink plastic flamingos were out than white plastic flamingos at noon on Sunday, we can break down the problem into steps. First, on Friday, the neighbors start with 18 pink plastic flamingos.",
      "On Saturday, they take back one third of the flamingos. Since there were 18 flamingos, (1/3 \times 18 = 6) flamingos are taken back. So, they have (18 - 6 = 12) flamingos left in their possession. Then, they paint these 6 flamingos white and put them back out on Sue's front yard. Now, Sue has the original 12 pink flamingos plus the 6 new white ones. Thus, by the end of Saturday, Sue has (12 + 6 = 18) pink flamingos and 6 white flamingos.",
      "On Sunday, the neighbors add another 18 pink plastic flamingos to Sue's front yard. By the end of Sunday morning, Sue has (18 + 18 = 36) pink flamingos and still 6 white flamingos.",
      "To find the difference, subtract the number of white flamingos from the number of pink flamingos: (36 - 6 = 30). Therefore, at noon on Sunday, there were 30 more pink plastic flamingos out than white plastic flamingos. The answer is (\boxed{30})."
    ]
}

# The assistant turn joins the solution steps with the <extra_0> separator so the PRM can score each step.
messages = [
    {"role": "system", "content": data['system']},
    {"role": "user", "content": data['query']},
    {"role": "assistant", "content": "<extra_0>".join(data['response']) + "<extra_0>"},
]
conversation_str = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=False
)

input_ids = tokenizer.encode(
    conversation_str, 
    return_tensors="pt", 
).to(model.device)

outputs = model(input_ids=input_ids)

# <extra_0> marks the end of each reasoning step; the PRM is read out at those positions.
step_sep_id = tokenizer.encode("<extra_0>")[0]
token_masks = (input_ids == step_sep_id)
step_reward = make_step_rewards(outputs[0], token_masks)
print(step_reward)  # [[1.0, 0.1904296875, 0.9765625, 1.0]]
```
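Beyond judging a single response, process reward models are often used to rerank several sampled solutions to the same problem (best-of-N selection). The sketch below is a hypothetical illustration of that usage on top of the step rewards produced above; aggregating by the product of step rewards is one common choice (taking the minimum is another), not a rule prescribed by the Qwen release:

```python
import math

def solution_score(step_rewards):
    # Collapse per-step rewards into one solution-level score; the product
    # heavily penalizes any single low-confidence step.
    return math.prod(step_rewards)

# Hypothetical step rewards for three sampled solutions to the same problem.
candidates = {
    "solution_a": [1.0, 0.19, 0.98, 1.0],
    "solution_b": [0.99, 0.97, 0.95, 0.99],
    "solution_c": [0.90, 0.40, 0.85, 0.70],
}
best = max(candidates, key=lambda name: solution_score(candidates[name]))
print(best)  # -> solution_b
```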

### How to Deploy the Qwen2.5-VL-7B Model

To deploy the Qwen2.5-VL-7B model successfully, proceed as follows.

#### Download the model files

First, download the Qwen2.5-VL-7B weights to a local directory. This can be done from the command line:

```bash
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --local_dir ./Qwen2___5-VL-7B-Instruct
```

This command saves the specified model and its associated files to the `./Qwen2___5-VL-7B-Instruct` folder.

#### Accelerate inference with vLLM

For efficient inference in real applications, **vLLM** is recommended. vLLM is a high-performance inference framework designed specifically for large language models; its core strengths are lower latency and higher throughput.

The main steps for deploying Qwen2.5-VL-7B with vLLM are:

1. Install the required environment and library dependencies;
2. Load the downloaded model into memory;
3. Tune the parameters to match the target hardware configuration;
4. Start a service port so that clients can call the API (see the client sketch after the code below).

With these steps, Qwen2.5-VL-7B can be deployed effectively, with vLLM improving overall runtime efficiency. For further details, refer to the official documentation and related resources.

```python
from vllm import LLM, SamplingParams

# Initialize the model instance
llm = LLM("./Qwen2___5-VL-7B-Instruct")

# Set the sampling parameters
sampling_params = SamplingParams(temperature=0.8, top_p=0.9)

# Run a simple generation task
outputs = llm.generate(["你好"], sampling_params=sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

The snippet above shows how to initialize the model with vLLM and run a simple text-generation task.
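Step 4 above mentions exposing the deployed model through a service port. Once a vLLM OpenAI-compatible server is running in front of the model (for example via vLLM's `api_server` entrypoint), a client can call it as sketched below; this is a hedged illustration, and the base URL, port, and placeholder API key are assumptions rather than values from this article:

```python
# Hypothetical client for a locally running vLLM OpenAI-compatible server.
# Assumes the server was started beforehand, e.g. with:
#   python -m vllm.entrypoints.openai.api_server --model ./Qwen2___5-VL-7B-Instruct --port 8000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local address
resp = client.chat.completions.create(
    model="./Qwen2___5-VL-7B-Instruct",  # must match the model name the server was started with
    messages=[{"role": "user", "content": "你好"}],
    temperature=0.8,
    top_p=0.9,
)
print(resp.choices[0].message.content)
```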