Claude 4 Sonnet (Reasoning)

Model size: unknown

By Anthropic • Released 2025-05-22

Capability Radar

Avg Score: 57 (across all benchmarks)

Benchmarks Participated: 13

Benchmark Performance

Benchmark                               Category                Score
MATH-500                                Reasoning               99.1
MMLU-Pro                                Knowledge               84.2
GPQA Diamond                            Knowledge               77.7
AIME 2025                               Reasoning               74.3
LiveCodeBench                           Coding                  65.5
LCR                                     Long-Context Reasoning  64.7
𝜏²-Bench Telecom                        Reasoning, Knowledge    64.6
IFBench                                 Agent                   54.7
SciCode                                 Reasoning, Knowledge    40
Artificial Analysis Intelligence Index  Knowledge               38.6
Artificial Analysis Coding Index        Coding                  34.1
Terminal-Bench Hard                     Agent, Coding           31.1
HLE                                     Knowledge, Multi-Modal  9.6
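
The reported Avg Score of 57 is consistent with an unweighted mean of the 13 scores listed above. The page does not say how the average is computed, so the following is only a minimal sketch under that assumption; the helper name `average_score` is hypothetical and the scores are copied from the listing.

```python
# Scores copied from the benchmark listing above.
scores = {
    "MATH-500": 99.1,
    "MMLU-Pro": 84.2,
    "GPQA Diamond": 77.7,
    "AIME 2025": 74.3,
    "LiveCodeBench": 65.5,
    "LCR": 64.7,
    "𝜏²-Bench Telecom": 64.6,
    "IFBench": 54.7,
    "SciCode": 40.0,
    "Artificial Analysis Intelligence Index": 38.6,
    "Artificial Analysis Coding Index": 34.1,
    "Terminal-Bench Hard": 31.1,
    "HLE": 9.6,
}


def average_score(values):
    """Unweighted arithmetic mean (an assumption about how the page aggregates)."""
    values = list(values)
    return sum(values) / len(values)


mean = average_score(scores.values())
print(len(scores), round(mean, 1), round(mean))  # prints: 13 56.8 57
```

Note that the mean treats the two Artificial Analysis composite indices as ordinary benchmarks, as the page's count of 13 appears to do; excluding them would give a different figure.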