DeepSeek-R1 Paper Analysis
1. Basic Paper Information
Title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Authors: DeepSeek-AI (contact: research@deepseek.com)
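Publication date and venue: January 2025, arXiv preprint (arXiv:2501.12948). AIME 2024 (American Invitational Mathematics Examination) is one of the paper's evaluation benchmarks, not its venue.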
Keywords:
- Reinforcement Learning
- Reasoning Models
- Chain-of-Thought
- Model Distillation
- Mixed Precision Training
- Self-Evolution
- Cold Start
- GRPO (Group Relative Policy Optimization)
- SWE-Bench (software engineering benchmark)
Note: the paper focuses on reinforcement learning for LLM reasoning capability