快手AI团队开源 KAT (Kwaipilot-AutoThink) 思考模型

吴脑的键客

于 2025-08-06 12:09:36 发布

阅读量570

点赞数 15

CC 4.0 BY-SA版权

分类专栏：人工智能文章标签：人工智能开源语言模型

本文链接：https://blog.youkuaiyun.com/weixin_41446370/article/details/149965147

人工智能专栏收录该内容

644 篇文章

订阅专栏

新闻

Kwaipilot-AutoThink 在专为防止数据泄露设计的挑战性基准测试 LiveCodeBench Pro 中位列所有开源模型榜首，甚至超越了 Seed 和 o3-mini 等强大的专有系统。

简介

KAT（快想-自动思考） 是一个开源大语言模型，通过学会判断何时生成显式推理链与何时直接回答，有效缓解了过度思考问题。

在这里插入图片描述

其发展遵循简洁的两阶段训练流程：

Stage	Core Idea	Key Techniques	Outcome
1. Pre-training	Inject knowledge while separating “reasoning” from “direct answering”.	Dual-regime data • Think-off queries labeled via a custom tagging system. • Think-on queries generated by a multi-agent solver. Knowledge Distillation + Multi-Token Prediction for fine-grained utility.	Base model attains strong factual and reasoning skills without full-scale pre-training costs.
2. Post-training	Make reasoning optional and efficient.	Cold-start AutoThink — majority vote sets the initial thinking mode. Step-SRPO — intermediate supervision rewards correct mode selection and answer accuracy under that mode.	Model triggers CoT only when beneficial, reducing token use and speeding inference.

在这里插入图片描述

数据格式

KAT以结构化模板生成响应，使推理路径清晰且机器可解析。
支持两种模式：

在这里插入图片描述

特殊标记

标记	描述
`<judge>`	分析输入以判断是否需要显式推理过程。
`<think_on>` / `<think_off>`	表示推理功能是否启用（“on”）或跳过（“off”）。
`<think>`	当选择`think_on`时，标记思维链片段的开始。
`<answer>`	标记最终面向用户答案的开始。

🔧 Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Kwaipilot/KAT-V1-40B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536,
    temperature=0.6,
    top_p=0.95,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("prompt:\n", prompt)
print("content:\n", content)
"""
prompt:
Give me a short introduction to large language model.
content:
<judge>
The user's request is to provide a concise factual introduction to large language models, which involves retrieving and summarizing basic information. This task is straightforward as it only requires recalling and presenting well-known details without deeper analysis. No complex reasoning is needed here—just a simple explanation will suffice.
</judge>

<think_off>
<answer>
A **Large Language Model (LLM)** is an advanced AI system trained on vast amounts of text data to understand, generate, and process human-like language. Here’s a concise introduction:

### Key Points:
1. **Training**: Trained on diverse text sources (books, websites, etc.) using deep learning.
2. **Capabilities**: 
   - Answer questions, generate text, summarize content, translate languages.
   - Understand context, sentiment, and nuances in language.
3. **Architecture**: Often based on **transformer models** (e.g., BERT, GPT, LLaMA).
4. **Scale**: Billions of parameters, requiring massive computational resources.
5. **Applications**: Chatbots, content creation, coding assistance, research, and more.

### Examples:
- **OpenAI’s GPT-4**: Powers ChatGPT.
- **Google’s Gemini**: Used in Bard.
- **Meta’s LLaMA**: Open-source alternative.

### Challenges:
- **Bias**: Can reflect biases in training data.
- **Accuracy**: May hallucinate "facts" not grounded in reality.
- **Ethics**: Raises concerns about misinformation and job displacement.

LLMs represent a leap forward in natural language processing, enabling machines to interact with humans in increasingly sophisticated ways. 🌐🤖
</answer>
"""