Anthropic

Claude 4.5 Sonnet (Non-reasoning)

Unknown Size

By Anthropic • Released 2025-09-29

Avg Score: 52 (across all benchmarks)

Benchmarks participated: 14

Benchmark Performance

| Benchmark | Category | Score |
| --- | --- | --- |
| MMLU-Pro | Knowledge | 86 |
| τ-bench | Agent Knowledge | 84.7 |
| GPQA Diamond | Knowledge | 72.7 |
| SWE-bench (Bash Only) | Coding Agent | 70.6 |
| τ²-Bench Telecom | Reasoning Knowledge | 70.5 |
| LiveCodeBench | Coding | 59 |
| LCR | Long-Context Reasoning | 51.3 |
| SciCode | Reasoning Knowledge | 42.8 |
| IFBench | Agent | 42.7 |
| Artificial Analysis Intelligence Index | Knowledge | 37.1 |
| AIME 2025 | Reasoning | 37 |
| Artificial Analysis Coding Index | Coding | 33.5 |
| Terminal-Bench Hard | Agent Coding | 28.8 |
| HLE | Knowledge Multi-Modal | 7.1 |
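
The page does not say how the average score is derived. As a rough check, and assuming it is an unweighted arithmetic mean of the 14 listed scores rounded to the nearest integer, the reported value of 52 can be reproduced with the sketch below. The `scores` dictionary simply restates the figures listed above; the averaging method is an assumption, not something the source confirms.

```python
# Scores copied from the benchmark list above.
scores = {
    "MMLU-Pro": 86,
    "τ-bench": 84.7,
    "GPQA Diamond": 72.7,
    "SWE-bench (Bash Only)": 70.6,
    "τ²-Bench Telecom": 70.5,
    "LiveCodeBench": 59,
    "LCR": 51.3,
    "SciCode": 42.8,
    "IFBench": 42.7,
    "Artificial Analysis Intelligence Index": 37.1,
    "AIME 2025": 37,
    "Artificial Analysis Coding Index": 33.5,
    "Terminal-Bench Hard": 28.8,
    "HLE": 7.1,
}

# Assumption: the average is an unweighted mean over all listed benchmarks.
mean = sum(scores.values()) / len(scores)

print(f"benchmarks: {len(scores)}")
print(f"mean score: {mean:.1f}")
print(f"rounded:    {round(mean)}")
```

Under that assumption the mean is about 51.7, which rounds to the 52 shown on the page. Note that two of the entries are themselves composite indices, so a plain mean weights some underlying benchmarks more than once.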