Claude 4 Sonnet (Non-reasoning)

Model size: unknown

By Anthropic • Released 2025-05-22

[Figure: Capability Radar]

Avg Score: 48 (across all benchmarks)

Benchmarks participated: 14

Benchmark Performance

Benchmark                                 Category                  Score
MATH-500                                  Reasoning                 93.4
MMLU-Pro                                  Knowledge                 83.7
GPQA Diamond                              Knowledge                 68.3
SWE-bench (Bash Only)                     Coding, Agent             64.93
τ²-Bench Telecom                          Reasoning, Knowledge      52.3
IFBench                                   Agent                     45.4
LiveCodeBench                             Coding                    44.9
LCR                                       Long-Context Reasoning    44.3
AIME 2025                                 Reasoning                 38
SciCode                                   Reasoning, Knowledge      37.3
Artificial Analysis Intelligence Index    Knowledge                 33
Artificial Analysis Coding Index          Coding                    30.6
Terminal-Bench Hard                       Agent, Coding             27.3
HLE                                       Knowledge, Multi-Modal    4
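
The reported Avg Score of 48 can be cross-checked against the 14 scores listed above. The sketch below assumes the average is a plain unweighted mean rounded to the nearest integer; the page does not state how the figure is computed, so treat that as an assumption rather than the site's actual method.

```python
# Scores copied from the benchmark list above (14 entries).
scores = {
    "MATH-500": 93.4,
    "MMLU-Pro": 83.7,
    "GPQA Diamond": 68.3,
    "SWE-bench (Bash Only)": 64.93,
    "τ²-Bench Telecom": 52.3,
    "IFBench": 45.4,
    "LiveCodeBench": 44.9,
    "LCR": 44.3,
    "AIME 2025": 38,
    "SciCode": 37.3,
    "Artificial Analysis Intelligence Index": 33,
    "Artificial Analysis Coding Index": 30.6,
    "Terminal-Bench Hard": 27.3,
    "HLE": 4,
}

# Assumption: "Avg Score" is an unweighted mean over all listed benchmarks,
# rounded to the nearest integer for display.
mean = sum(scores.values()) / len(scores)

print(len(scores), round(mean, 2), round(mean))  # 14 47.67 48
```

Under that assumption the mean is about 47.67, which rounds to the 48 shown on the page. The scores are not all on the same scale or from the same kind of test (two entries are themselves composite indices), so the average is best read as a rough summary.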