DeepSeek

DeepSeek V3.1 (Non-reasoning)

Model size: unknown

By DeepSeek • Released 2025-08-21

Average score: 42 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

| Benchmark | Category | Score |
|---|---|---|
| MMLU-Pro | Knowledge | 83.3 |
| GPQA Diamond | Knowledge | 73.5 |
| LiveCodeBench | Coding | 57.7 |
| AIME 2025 | Reasoning | 49.7 |
| LCR | Long-Context Reasoning | 45 |
| IFBench | Agent | 37.8 |
| SciCode | Reasoning, Knowledge | 36.7 |
| 𝜏²-Bench Telecom | Reasoning, Knowledge | 34.8 |
| Artificial Analysis Coding Index | Coding | 28.4 |
| Artificial Analysis Intelligence Index | Knowledge | 28 |
| Terminal-Bench Hard | Agent, Coding | 24.2 |
| HLE | Knowledge, Multi-Modal | 6.3 |
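
The page does not state how the average score is derived. As a minimal sketch, assuming it is an unweighted arithmetic mean of the twelve listed scores (an assumption, not something the page confirms), the following Python snippet recomputes it from the values above. The names `scores` and `average_score` are illustrative only.

```python
# Scores copied from the benchmark performance listing above.
scores = {
    "MMLU-Pro": 83.3,
    "GPQA Diamond": 73.5,
    "LiveCodeBench": 57.7,
    "AIME 2025": 49.7,
    "LCR": 45.0,
    "IFBench": 37.8,
    "SciCode": 36.7,
    "τ²-Bench Telecom": 34.8,
    "Artificial Analysis Coding Index": 28.4,
    "Artificial Analysis Intelligence Index": 28.0,
    "Terminal-Bench Hard": 24.2,
    "HLE": 6.3,
}


def average_score(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the page)."""
    values = list(values)
    return sum(values) / len(values)


mean = average_score(scores.values())
print(f"{len(scores)} benchmarks, mean = {mean:.1f}")
```

Under that assumption the mean comes to roughly 42.1, which rounds to the reported 42, so the summary figure is at least consistent with a plain average over the 12 benchmarks.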