DeepSeek V3.1 (Reasoning)

Size: unknown

By DeepSeek • Released 2025-08-21

Avg Score: 50 (across all benchmarks)

Participated: 12 benchmarks

Benchmark Performance

Benchmark	Category	Score
AIME 2025	Reasoning	89.7
MMLU-Pro	Knowledge	85.1
LiveCodeBench	Coding	78.4
GPQA Diamond	Knowledge	77.9
LCR	Long-Context Reasoning	53.3
IFBench	Agent	41.5
SciCode	Reasoning, Knowledge	39.1
𝜏²-Bench Telecom	Reasoning, Knowledge	37.4
Artificial Analysis Coding Index	Coding	29.7
Artificial Analysis Intelligence Index	Knowledge	27.6
Terminal-Bench Hard	Agent, Coding	25
HLE	Knowledge, Multi-Modal	13
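The page does not say how "Avg Score" is computed. Assuming it is a plain unweighted arithmetic mean of the twelve scores in the table, this short sketch reproduces it. The dictionary name `scores` and the helper `average_score` are hypothetical, introduced only for illustration.

```python
# Scores copied from the benchmark table.
# Assumption: "Avg Score" is an unweighted mean over every listed benchmark.
scores: dict[str, float] = {
    "AIME 2025": 89.7,
    "MMLU-Pro": 85.1,
    "LiveCodeBench": 78.4,
    "GPQA Diamond": 77.9,
    "LCR": 53.3,
    "IFBench": 41.5,
    "SciCode": 39.1,
    "𝜏²-Bench Telecom": 37.4,
    "Artificial Analysis Coding Index": 29.7,
    "Artificial Analysis Intelligence Index": 27.6,
    "Terminal-Bench Hard": 25.0,
    "HLE": 13.0,
}


def average_score(results: dict[str, float]) -> float:
    """Return the unweighted arithmetic mean of all benchmark scores."""
    return sum(results.values()) / len(results)


avg = average_score(scores)
print(f"{len(scores)} benchmarks, mean = {avg:.2f}, rounded = {round(avg)}")
# prints "12 benchmarks, mean = 49.81, rounded = 50"
```

The exact mean is 597.7 / 12 ≈ 49.81, which rounds to the displayed 50. That is consistent with the unweighted-mean assumption, though the site could also be using a different aggregation that happens to round to the same value.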