arxiv:2501.12948

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Published on Jan 22
· Submitted by AK on Jan 23
#1 Paper of the day
· deepseek-ai DeepSeek

Abstract

AI-generated summary: DeepSeek-R1-Zero and DeepSeek-R1 use reinforcement learning to enhance reasoning capabilities, with DeepSeek-R1 adding multi-stage training; DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217.

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
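The paper reports that DeepSeek-R1-Zero's RL stage uses rule-based rewards rather than a neural reward model: an accuracy reward that checks the final answer, and a format reward that requires the reasoning to appear between `<think>` and `</think>` tags. The sketch below illustrates that idea under loudly stated assumptions: the exact `<think>...</think><answer>...</answer>` template, exact string matching against a reference answer (standing in for the paper's task-specific checkers, such as answer matching for math or test cases for code), and equal weighting of the two rewards are illustrative choices, and every function name here is hypothetical, not DeepSeek's code.

```python
import re

# Hypothetical output template: reasoning in <think>...</think>, final answer in <answer>...</answer>.
_TEMPLATE = re.compile(r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed think/answer template, else 0.0."""
    return 1.0 if _TEMPLATE.match(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference after stripping whitespace."""
    match = _TEMPLATE.match(completion)
    if match is None:
        return 0.0  # no parseable answer, so no accuracy credit
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    # Equal weighting of the two rewards is an assumption made for illustration only.
    return accuracy_reward(completion, reference) + format_reward(completion)


if __name__ == "__main__":
    good = "<think>2 + 2 = 4</think> <answer>4</answer>"
    bad = "The answer is 4."
    print(total_reward(good, "4"))  # 2.0
    print(total_reward(bad, "4"))   # 0.0
```

Because both signals are computed by fixed rules, they are cheap to evaluate at scale; the paper gives reward hacking and the extra cost of retraining a reward model as its reasons for not using a neural reward model in DeepSeek-R1-Zero.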

Community

A written and video review - https://aipapersacademy.com/deepseek-r1/

Here is Ajith's AI Pulse article on this paper: https://ajithp.com/2025/01/26/deepseek-r1-ai-reasoning/

Bookmark of the GRPO equation (LaTeX code and term explanations):

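For reference, the Group Relative Policy Optimization (GRPO) objective as written in the DeepSeek-R1 paper is transcribed below. For each question q, GRPO samples a group of G outputs o_1, ..., o_G from the old policy π_θold and updates the policy π_θ by maximizing this objective; ε (clipping range) and β (KL coefficient) are hyperparameters, π_ref is a reference policy, and the advantage A_i is computed from the group's rewards r_1, ..., r_G, so no separate critic model is needed. This is a convenience transcription; the paper remains the authoritative source.

```latex
\begin{aligned}
\mathcal{J}_{\mathrm{GRPO}}(\theta) &= \mathbb{E}\left[q \sim P(Q),\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}(O \mid q)\right] \\
&\quad \frac{1}{G}\sum_{i=1}^{G}\left( \min\!\left( \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)} A_i,\ \operatorname{clip}\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}, 1-\varepsilon, 1+\varepsilon\right) A_i \right) - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right) \right), \\
\mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right) &= \frac{\pi_{\mathrm{ref}}(o_i \mid q)}{\pi_\theta(o_i \mid q)} - \log\frac{\pi_{\mathrm{ref}}(o_i \mid q)}{\pi_\theta(o_i \mid q)} - 1, \\
A_i &= \frac{r_i - \operatorname{mean}(\{r_1, r_2, \ldots, r_G\})}{\operatorname{std}(\{r_1, r_2, \ldots, r_G\})}.
\end{aligned}
```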

Equation: http://www.deepnlp.org/equation/group-relative-policy-optimization-grpo
Equation Search Engine (http://www.deepnlp.org/search/equation) and Paper related AI Agents List (http://www.deepnlp.org/store/ai-agent)
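The GRPO objective referenced in this comment can be evaluated numerically for a single question with a short sketch. Assumptions, all illustrative: each output is represented by one sequence-level log-probability under the current, old, and reference policies (a simplification of per-token implementations), the standard deviation is the population one with a small `eps` guard, and the default `clip_eps=0.2` and `beta=0.04` are hypothetical values. The function names are hypothetical as well.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """A_i = (r_i - mean(r)) / std(r), computed within one group of G outputs.

    Assumptions: population standard deviation, plus a small eps so a group in
    which every output earns the same reward yields zero advantages instead of
    a division by zero.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp: float, logp_ref: float) -> float:
    """pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, evaluated from log-probabilities."""
    log_ratio = logp_ref - logp
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    logp_ref: Sequence[float],
    rewards: Sequence[float],
    clip_eps: float = 0.2,  # hypothetical value, for illustration only
    beta: float = 0.04,  # hypothetical value, for illustration only
) -> float:
    """Group-averaged GRPO surrogate for a single question (higher is better).

    Each entry is the log-probability of one whole output o_i under the current,
    old, and reference policies; this is a sequence-level simplification.
    """
    advantages = group_relative_advantages(rewards)
    terms = []
    for lp, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp - lp_old)  # pi_theta(o_i|q) / pi_theta_old(o_i|q)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic (clipped) surrogate
        terms.append(surrogate - beta * kl_estimate(lp, lp_ref))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Four sampled answers to one question: two judged correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    print(grpo_objective([-1.2, -0.8], [-1.0, -1.0], [-1.1, -0.9], [1.0, 0.0]))
```

Normalizing rewards within the group is what lets GRPO drop the learned value baseline: outputs that beat their siblings get positive advantages and the rest get negative ones. In actual training, the returned value would be negated and minimized with an autodiff framework rather than computed with plain floats.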

Models citing this paper 274

Datasets citing this paper 9

Spaces citing this paper 2,381

Collections including this paper 112