DeepSeek R1 (Jan '25)

Model size: unknown

By DeepSeek • Released 2025-01-20

[Figure: capability radar chart]

Average score: 44 (across all benchmarks)

Benchmarks participated: 13

Benchmark Performance

Benchmark                                 Category                  Score
MATH-500                                  Reasoning                 96.6
MMLU-Pro                                  Knowledge                 84.4
GPQA Diamond                              Knowledge                 70.8
AIME 2025                                 Reasoning                 68
LiveCodeBench                             Coding                    61.7
LCR                                       Long-Context Reasoning    52.3
IFBench                                   Agent                     39
SciCode                                   Reasoning, Knowledge      35.7
Artificial Analysis Intelligence Index    Knowledge                 18.8
Artificial Analysis Coding Index          Coding                    15.9
𝜏²-Bench Telecom                          Reasoning, Knowledge      11.4
HLE                                       Knowledge, Multi-Modal    9.3
Terminal-Bench Hard                       Agent, Coding             6.1
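
The page does not say how the average score is derived. The reported value of 44 is consistent with an unweighted mean of the 13 listed scores, so the following is a minimal sketch under that assumption. The function name `average_score` is hypothetical and used only for illustration.

```python
# Scores copied from the benchmark list above.
scores = {
    "MATH-500": 96.6,
    "MMLU-Pro": 84.4,
    "GPQA Diamond": 70.8,
    "AIME 2025": 68,
    "LiveCodeBench": 61.7,
    "LCR": 52.3,
    "IFBench": 39,
    "SciCode": 35.7,
    "Artificial Analysis Intelligence Index": 18.8,
    "Artificial Analysis Coding Index": 15.9,
    "𝜏²-Bench Telecom": 11.4,
    "HLE": 9.3,
    "Terminal-Bench Hard": 6.1,
}


def average_score(scores):
    """Unweighted mean of all scores (an assumption, not a documented method)."""
    return sum(scores.values()) / len(scores)


mean = average_score(scores)
print(len(scores), round(mean, 1), round(mean))  # prints: 13 43.8 44
```

An unweighted mean treats the composite indices and the individual benchmarks alike, even though they may not share a scale, so the figure is best read as a rough summary.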