Claude 4.5 Sonnet (Reasoning)

Model size: unknown

By Anthropic • Released 2025-09-29

Average score: 59 (across all benchmarks)
Participated in: 12 benchmarks

Benchmark Performance

Benchmark                               Category                Score
AIME 2025                               Reasoning               88
MMLU-Pro                                Knowledge               87.5
GPQA Diamond                            Knowledge               83.4
𝜏²-Bench Telecom                        Reasoning, Knowledge    78.1
LiveCodeBench                           Coding                  71.4
LCR                                     Long-Context Reasoning  65.7
IFBench                                 Agent                   57.3
SciCode                                 Reasoning, Knowledge    44.7
Artificial Analysis Intelligence Index  Knowledge               42.9
Artificial Analysis Coding Index        Coding                  38.6
Terminal-Bench Hard                     Agent, Coding           35.6
HLE                                     Knowledge, Multi-Modal  17.3
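You can check the headline average against the 12 benchmark scores listed on this page. The page does not state how it computes the average. The sketch below is a minimal check that assumes an unweighted arithmetic mean of the listed scores, rounded to the nearest integer. `SCORES` and `average_score` are illustrative names, not part of any published API.

```python
# Scores as listed in the Benchmark Performance table on this page.
SCORES = {
    "AIME 2025": 88.0,
    "MMLU-Pro": 87.5,
    "GPQA Diamond": 83.4,
    "𝜏²-Bench Telecom": 78.1,
    "LiveCodeBench": 71.4,
    "LCR": 65.7,
    "IFBench": 57.3,
    "SciCode": 44.7,
    "Artificial Analysis Intelligence Index": 42.9,
    "Artificial Analysis Coding Index": 38.6,
    "Terminal-Bench Hard": 35.6,
    "HLE": 17.3,
}


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted arithmetic mean of the scores.

    Assumption: the page's "Average score" uses this method; the page
    itself does not document how its average is computed.
    """
    if not scores:
        raise ValueError("no scores to average")
    return sum(scores.values()) / len(scores)


mean = average_score(SCORES)
# Two decimals for the raw mean, then rounded to an integer as on the page.
print(f"{len(SCORES)} benchmarks, mean = {mean:.2f}, rounded = {round(mean)}")
```

Run as-is, the script prints `12 benchmarks, mean = 59.21, rounded = 59`, which agrees with the listed average of 59. That agreement is consistent with a plain mean but does not prove the page uses one.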