DeepSeek R1 (Jan '25)

Size: Unknown

By DeepSeek • Released 2025-01-20

[Figure: Capability Radar chart]

Avg Score: 44 (across all benchmarks)

Benchmarks participated: 13

Benchmark Performance

Benchmark	Category	Score
MATH-500	Reasoning	96.6
MMLU-Pro	Knowledge	84.4
GPQA Diamond	Knowledge	70.8
AIME 2025	Reasoning	68
LiveCodeBench	Coding	61.7
LCR	Long-Context Reasoning	52.3
IFBench	Agent	39
SciCode	Reasoning, Knowledge	35.7
Artificial Analysis Intelligence Index	Knowledge	18.8
Artificial Analysis Coding Index	Coding	15.9
𝜏²-Bench Telecom	Reasoning, Knowledge	11.4
HLE	Knowledge, Multi-Modal	9.3
Terminal-Bench Hard	Agent, Coding	6.1
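
The headline average can be cross-checked against the table. The sketch below assumes the page's "Avg Score" is a plain unweighted mean of the 13 listed scores, which the page does not state. The dictionary is copied from the table above, and the variable names are illustrative only.

```python
# Scores copied from the Benchmark Performance table above.
scores = {
    "MATH-500": 96.6,
    "MMLU-Pro": 84.4,
    "GPQA Diamond": 70.8,
    "AIME 2025": 68,
    "LiveCodeBench": 61.7,
    "LCR": 52.3,
    "IFBench": 39,
    "SciCode": 35.7,
    "Artificial Analysis Intelligence Index": 18.8,
    "Artificial Analysis Coding Index": 15.9,
    "𝜏²-Bench Telecom": 11.4,
    "HLE": 9.3,
    "Terminal-Bench Hard": 6.1,
}

# Assumption: the reported average is an unweighted arithmetic mean.
average = sum(scores.values()) / len(scores)
rounded = round(average)

print(f"{len(scores)} benchmarks, mean {average:.2f}, rounded {rounded}")
# prints "13 benchmarks, mean 43.85, rounded 44"
```

Under that assumption, the mean rounds to the reported 44. Note that the table mixes accuracy-style percentages with composite indices, so the average is a rough summary rather than a like-for-like comparison.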