DeepSeek V3.1 (Reasoning)

Model size: unknown

By DeepSeek • Released 2025-08-21

[Figure: Capability Radar chart]

Average score: 50 (across all benchmarks)
Benchmarks participated: 12

Benchmark Performance

Benchmark                                 Category                  Score
AIME 2025                                 Reasoning                 89.7
MMLU-Pro                                  Knowledge                 85.1
LiveCodeBench                             Coding                    78.4
GPQA Diamond                              Knowledge                 77.9
LCR                                       Long-Context Reasoning    53.3
IFBench                                   Agent                     41.5
SciCode                                   Reasoning, Knowledge      39.1
τ²-Bench Telecom                          Reasoning, Knowledge      37.4
Artificial Analysis Coding Index          Coding                    29.7
Artificial Analysis Intelligence Index    Knowledge                 27.6
Terminal-Bench Hard                       Agent, Coding             25
HLE                                       Knowledge, Multi-Modal    13
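
The reported average score of 50 can be checked against the twelve benchmark scores listed above. The sketch below assumes the average is a plain unweighted arithmetic mean over all twelve entries; the page does not state how it is computed, so that is an assumption, and the names `scores` and `average_score` are illustrative only.

```python
# Scores copied from the Benchmark Performance listing above.
scores = {
    "AIME 2025": 89.7,
    "MMLU-Pro": 85.1,
    "LiveCodeBench": 78.4,
    "GPQA Diamond": 77.9,
    "LCR": 53.3,
    "IFBench": 41.5,
    "SciCode": 39.1,
    "τ²-Bench Telecom": 37.4,
    "Artificial Analysis Coding Index": 29.7,
    "Artificial Analysis Intelligence Index": 27.6,
    "Terminal-Bench Hard": 25.0,
    "HLE": 13.0,
}


def average_score(benchmark_scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean (assumed; the page does not say how it averages)."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)


mean = average_score(scores)
print(f"{len(scores)} benchmarks, mean = {mean:.1f}")  # 12 benchmarks, mean = 49.8
```

Under that assumption the mean is about 49.8, which is consistent with the displayed value of 50 after rounding to the nearest integer.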