DeepSeek V3.1 Terminus (Reasoning)

Unknown Size

By DeepSeek • Released 2025-09-22

[Figure: Capability Radar]

Avg Score: 54 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark	Category	Score
AIME 2025	Reasoning	89.7
MMLU-Pro	Knowledge	85.1
LiveCodeBench	Coding	79.8
GPQA Diamond	Knowledge	79.2
LCR	Long-Context Reasoning	65
IFBench	Agent	57
SciCode	Reasoning Knowledge	40.6
τ²-Bench Telecom	Reasoning Knowledge	37.1
Artificial Analysis Intelligence Index	Knowledge	33.8
Artificial Analysis Coding Index	Coding	33.7
Terminal-Bench Hard	Agent Coding	30.3
HLE	Knowledge Multi-Modal	15.2
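
The Avg Score reported above can be checked against the table. The sketch below assumes the figure is an unweighted arithmetic mean of the 12 listed scores, rounded to the nearest integer. The page does not state how the average is computed, so that is an assumption, and the `scores` dictionary name is illustrative only.

```python
from statistics import mean

# Scores copied from the Benchmark Performance table above.
scores = {
    "AIME 2025": 89.7,
    "MMLU-Pro": 85.1,
    "LiveCodeBench": 79.8,
    "GPQA Diamond": 79.2,
    "LCR": 65,
    "IFBench": 57,
    "SciCode": 40.6,
    "τ²-Bench Telecom": 37.1,
    "Artificial Analysis Intelligence Index": 33.8,
    "Artificial Analysis Coding Index": 33.7,
    "Terminal-Bench Hard": 30.3,
    "HLE": 15.2,
}

# Assumption: unweighted mean, rounded to the nearest integer.
average = mean(scores.values())
print(f"benchmarks: {len(scores)}")
print(f"mean score: {average:.3f} (rounded: {round(average)})")
```

Under that assumption the rounded mean agrees with the Avg Score of 54 shown on the page. Note that the table mixes individual benchmarks with two composite indices, so the mean is a rough summary rather than a like-for-like comparison.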