Thinkless Project Best Practices Tutorial

符卿玺

Published 2025-05-27 09:01:11

Article link: https://blog.youkuaiyun.com/gitblog_00568/article/details/148244934

Thinkless: [Preprint 2025] Thinkless: LLMs Learn When to Think. Project address (GitCode mirror): https://gitcode.com/gh_mirrors/th/Thinkless

1. Project Introduction

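Thinkless is an open-source project that uses reinforcement learning to teach large language models (LLMs) to choose adaptively between giving a short answer and reasoning in detail. It was proposed by the xML Lab at the National University of Singapore, and its core is an algorithm called Decoupled Group Relative Policy Optimization (DeGRPO). After training, Thinkless substantially improves the computational efficiency of reasoning language models on several benchmarks, because the model no longer produces long chains of reasoning for questions that do not need them.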
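To make the idea of "decoupling" a little more concrete, here is a minimal sketch in plain Python. It assumes, based on our reading of the Thinkless paper, that DeGRPO splits the GRPO objective into a control-token term (the single token that selects short or long reasoning) and a response term, and weights the two separately. The function names, the `alpha` weight and the simplified loss (no PPO ratio clipping, no KL penalty) are illustrative assumptions, not code from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_policy_loss(rewards, ctrl_logps, resp_logps, alpha=1.0):
    """Hypothetical decoupled objective for one group of G sampled outputs.

    rewards:    G scalar rewards
    ctrl_logps: G log-probs of the control token chosen for each output
    resp_logps: G lists of per-token log-probs of each response
    alpha:      hypothetical weight of the mode-selection (control-token) term
    """
    advs = group_relative_advantages(rewards)
    total = 0.0
    for adv, c_lp, r_lps in zip(advs, ctrl_logps, resp_logps):
        # Control-token term: one token, NOT divided by the response length.
        mode_term = alpha * adv * c_lp
        # Response term: averaged over the tokens of this response.
        resp_term = adv * sum(r_lps) / len(r_lps)
        total += mode_term + resp_term
    # Return the negative objective so it can be minimized.
    return -total / len(rewards)


if __name__ == "__main__":
    loss = decoupled_policy_loss(
        rewards=[1.0, 0.0],
        ctrl_logps=[-0.1, -0.2],
        resp_logps=[[-1.0, -1.0], [-2.0]],
    )
    print(round(loss, 4))
```

In a length-normalized GRPO loss, the control token would be averaged together with every response token, so in a long response its contribution shrinks to roughly 1/(L+1). Keeping it as a separate term with its own weight avoids that dilution; treat the sketch as an illustration of the weighting only.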

2. Quick Start

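Before you begin, note that the project requires the following dependencies (the installation steps below set them up inside a conda environment):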

Python 3.10
PyTorch
lm_eval
Ray
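Before installing anything, a quick way to see what is already available is a small Python check. The sketch below uses only the standard library; the helper name `check_environment` is our own, and it treats the import names `torch`, `lm_eval` and `ray` as the modules to look for.

```python
import importlib.util
import sys

# Import names of the dependencies listed above (assumed to match the packages).
REQUIRED_MODULES = ["torch", "lm_eval", "ray"]


def check_environment(modules=REQUIRED_MODULES, python_version=(3, 10)):
    """Report whether the running interpreter and importable modules meet the prerequisites."""
    return {
        # Compare only major.minor, e.g. (3, 10).
        "python_ok": sys.version_info[:2] == python_version,
        # find_spec returns None when a top-level module cannot be imported.
        "missing": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    print(check_environment())
```

An empty `missing` list and `python_ok: True` mean the interpreter already satisfies the list above.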

The quick-start steps are as follows. First, run these commands in a shell:

# Create and activate a virtual environment
conda create -n thinkless python=3.10
conda activate thinkless

# Clone the project repository
git clone https://github.com/VainF/Thinkless.git
cd Thinkless

# Install dependencies
pip install torch==2.4.0 lm_eval==0.4.8 ray==2.45.0
pip install -e ./verl
pip install -e .
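After the installation, you can confirm that the pinned versions from the `pip install` line actually ended up in the environment. This is a hypothetical helper (`check_pins` is not part of the project) built on `importlib.metadata`. It compares only the part of the version before any `+` local suffix, because PyTorch wheels may carry one such as `+cu121`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above.
PINS = {"torch": "2.4.0", "lm_eval": "0.4.8", "ray": "2.45.0"}


def check_pins(pins=PINS):
    """Map each pinned distribution to (installed_version_or_None, matches_pin)."""
    result = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Drop a local version suffix like "+cu121" before comparing.
        public = installed.split("+")[0] if installed else None
        result[name] = (installed, public == wanted)
    return result


if __name__ == "__main__":
    for pkg, (found, ok) in check_pins().items():
        print(f"{pkg}: installed={found} matches_pin={ok}")
```

Any entry whose second value is `False` points to a package that is missing or installed at a different version.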

Next, in Python, load the model and tokenizer and generate an answer:

# Load the model and tokenizer (device_map="auto" requires the accelerate package)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate an answer with the model
instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
messages = [{"role": "user", "content": f"{instruction}\n{prompt}"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=16384, do_sample=True, temperature=0.6, top_p=0.95)
# generate() returns the prompt tokens followed by the new tokens; keep only the new ones
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text + response)
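The instruction asks the model to put its final answer inside `\boxed{}`, so a small helper that pulls that answer out of `response` is handy when checking results. For the example prompt the expected value is 17, since (7 + 2 + x + 10) / 4 = 9 gives x = 36 - 19 = 17. The sketch also includes a hypothetical `reasoning_mode` helper: it assumes Thinkless begins its output with a `<short>` or `<think>` control token and that you decode with `skip_special_tokens=False` so that token is still visible. Both function names are our own, not part of the project.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # braces never balanced


def reasoning_mode(decoded):
    """Hypothetical helper: report which control token the decoded output starts with.

    Assumes the control token is <short> or <think> and that special tokens
    were kept during decoding (skip_special_tokens=False).
    """
    head = decoded.lstrip()
    if head.startswith("<short>"):
        return "short"
    if head.startswith("<think>"):
        return "think"
    return "unknown"


if __name__ == "__main__":
    sample = r"<short>The sum must be 36, so x = 36 - 19 = \boxed{17}."
    print(reasoning_mode(sample), extract_boxed(sample))
```

With a correct model answer, `extract_boxed(response)` should return `"17"` for the example prompt.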