Devstral Small (Jul '25)

By Mistral • Released 2025-07-10 • Model size: unknown


Average score across all benchmarks: 30
Benchmarks participated in: 14

Benchmark Performance

Benchmark                               Category                Score
MATH-500                                Reasoning                63.5
MMLU-Pro                                Knowledge                62.2
SWE-bench (Bash Only)                   Coding, Agent            56.4
GPQA Diamond                            Knowledge                41.4
IFBench                                 Agent                    34.6
AIME 2025                               Reasoning                29.3
𝜏²-Bench Telecom                        Reasoning, Knowledge     28.4
LiveCodeBench                           Coding                   25.4
SciCode                                 Reasoning, Knowledge     24.3
LCR                                     Long-Context Reasoning     17
Artificial Analysis Intelligence Index  Knowledge                15.2
Artificial Analysis Coding Index        Coding                   12.1
Terminal-Bench Hard                     Agent, Coding             6.1
HLE                                     Knowledge, Multi-Modal    3.7
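The headline average of 30 is consistent with a plain unweighted mean of the 14 benchmark scores: they sum to 419.6, and 419.6 / 14 ≈ 29.97, which rounds to 30. The page does not say how its average is computed, so the short Python sketch below simply assumes every benchmark is weighted equally; `SCORES` and `unweighted_mean` are illustrative names, not part of any published tool.

```python
# Benchmark scores for Devstral Small (Jul '25), copied from the benchmark table.
SCORES = {
    "MATH-500": 63.5,
    "MMLU-Pro": 62.2,
    "SWE-bench (Bash Only)": 56.4,
    "GPQA Diamond": 41.4,
    "IFBench": 34.6,
    "AIME 2025": 29.3,
    "𝜏²-Bench Telecom": 28.4,
    "LiveCodeBench": 25.4,
    "SciCode": 24.3,
    "LCR": 17.0,
    "Artificial Analysis Intelligence Index": 15.2,
    "Artificial Analysis Coding Index": 12.1,
    "Terminal-Bench Hard": 6.1,
    "HLE": 3.7,
}


def unweighted_mean(values):
    """Arithmetic mean in which every benchmark counts equally (an assumption)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one score")
    return sum(values) / len(values)


avg = unweighted_mean(SCORES.values())
print(len(SCORES), round(avg, 2), round(avg))  # prints: 14 29.97 30
```

If the site instead weighted benchmarks or normalized their scales (the two Artificial Analysis indices are composites rather than single-test accuracies), the headline figure could differ; the close match here suggests, but does not prove, a plain mean.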