
DeepSeek: Improving Language Model Reasoning Capabilities Using Pure Reinforcement Learning


A new technical paper titled “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” was published by DeepSeek.

Abstract:
“We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.”

Find the technical paper at https://arxiv.org/abs/2501.12948. Published January 2025.
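
For readers who want to try the released checkpoints, below is a minimal sketch of loading one of the distilled dense models with the Hugging Face transformers library. The repository identifier is an assumption based on DeepSeek's published naming convention and should be verified on the model hub before use.

# Minimal sketch, assuming the distilled 1.5B checkpoint is published under
# the identifier below (verify on Hugging Face before running).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Reasoning models of this kind are typically given a plain question and
# emit their chain of thought before the final answer.
prompt = "What is 17 * 24? Reason step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))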

Guo, Daya, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu et al. “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv preprint arXiv:2501.12948 (2025).


