DeepSeek

DeepSeek V3.1 Terminus (Reasoning)

Model size: unknown

By DeepSeek • Released 2025-09-22

Average score: 54 (across all benchmarks)

Benchmarks evaluated: 12

Benchmark Performance

Benchmark                               Category                Score
AIME 2025                               Reasoning               89.7
MMLU-Pro                                Knowledge               85.1
LiveCodeBench                           Coding                  79.8
GPQA Diamond                            Knowledge               79.2
LCR                                     Long-Context Reasoning  65
IFBench                                 Agent                   57
SciCode                                 Reasoning, Knowledge    40.6
𝜏²-Bench Telecom                        Reasoning, Knowledge    37.1
Artificial Analysis Intelligence Index  Knowledge               33.8
Artificial Analysis Coding Index        Coding                  33.7
Terminal-Bench Hard                     Agent, Coding           30.3
HLE                                     Knowledge, Multi-Modal  15.2
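The page does not say how the headline average is computed. The figure is consistent with a plain unweighted mean of the 12 scores in the benchmark table, but that method is an assumption here. The following minimal Python sketch reproduces the average under that assumption. The names `SCORES` and `average_score` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Scores copied from the benchmark table (benchmark name -> reported score).
SCORES: dict[str, float] = {
    "AIME 2025": 89.7,
    "MMLU-Pro": 85.1,
    "LiveCodeBench": 79.8,
    "GPQA Diamond": 79.2,
    "LCR": 65.0,
    "IFBench": 57.0,
    "SciCode": 40.6,
    "𝜏²-Bench Telecom": 37.1,
    "Artificial Analysis Intelligence Index": 33.8,
    "Artificial Analysis Coding Index": 33.7,
    "Terminal-Bench Hard": 30.3,
    "HLE": 15.2,
}


def average_score(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean over all benchmarks.

    Assumption: every benchmark counts equally, with no per-category weighting.
    """
    return mean(scores.values())


if __name__ == "__main__":
    avg = average_score(SCORES)
    print(f"{len(SCORES)} benchmarks, mean = {avg:.3f}, rounded = {round(avg)}")
```

The scores sum to 646.5, so the unweighted mean is 646.5 / 12 = 53.875. This rounds to the reported 54. A weighted or per-category scheme could produce a different headline number.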