DeepSeek V3.1 Terminus (Non-reasoning)

Size: Unknown

By DeepSeek • Released 2025-09-22

[Figure: Capability Radar chart]

Avg Score: 43 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark	Category	Score
MMLU-Pro	Knowledge	83.6
GPQA Diamond	Knowledge	75.1
AIME 2025	Reasoning	53.7
LiveCodeBench	Coding	52.9
LCR	Long-Context Reasoning	43.3
IFBench	Agent	41.2
τ²-Bench Telecom	Reasoning Knowledge	37.1
SciCode	Reasoning Knowledge	32.1
Artificial Analysis Coding Index	Coding	31.9
Terminal-Bench Hard	Agent Coding	31.8
Artificial Analysis Intelligence Index	Knowledge	28.4
HLE	Knowledge Multi-Modal	8.4
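
The Avg Score of 43 is consistent with a plain unweighted mean of the 12 scores in the table. The page does not state how the figure is aggregated, so the sketch below treats the unweighted mean and the rounding to a whole number as assumptions; the `scores` dictionary and the `average_score` helper are illustrative names, not part of any published tooling.

```python
from statistics import mean

# Scores copied from the Benchmark Performance table.
scores = {
    "MMLU-Pro": 83.6,
    "GPQA Diamond": 75.1,
    "AIME 2025": 53.7,
    "LiveCodeBench": 52.9,
    "LCR": 43.3,
    "IFBench": 41.2,
    "τ²-Bench Telecom": 37.1,
    "SciCode": 32.1,
    "Artificial Analysis Coding Index": 31.9,
    "Terminal-Bench Hard": 31.8,
    "Artificial Analysis Intelligence Index": 28.4,
    "HLE": 8.4,
}


def average_score(table: dict[str, float]) -> float:
    """Unweighted arithmetic mean of all scores (an assumed aggregation)."""
    return mean(list(table.values()))


avg = average_score(scores)
# Number of benchmarks and the mean rounded to a whole number.
print(len(scores), round(avg))  # prints "12 43"
```

The unrounded mean is about 43.3, so the reported 43 matches under this assumption. Note that the mean mixes individual benchmarks with composite indices, so it is only a rough summary.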