DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

The authors present their first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning without supervised fine-tuning as a preliminary step, demonstrates strong reasoning capabilities but suffers from challenges such as poor readability and language mixing. To address these issues and further improve reasoning performance, DeepSeek-R1 incorporates multi-stage training and cold-start data before RL, achieving performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, the authors open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1, based on Qwen and Llama.

https://arxiv.org/abs/2501.12948