Claude 4 Opus (Reasoning)

Model size: Unknown

By Anthropic • Released 2025-05-22

Avg Score: 54 (across all benchmarks)

Participated: 13 benchmarks

Benchmark Performance

| Benchmark | Category | Score |
|---|---|---|
| MATH-500 | Reasoning | 98.2 |
| MMLU-Pro | Knowledge | 87.3 |
| GPQA Diamond | Knowledge | 79.6 |
| AIME 2025 | Reasoning | 73.3 |
| 𝜏²-Bench Telecom | Reasoning, Knowledge | 70.5 |
| LiveCodeBench | Coding | 63.6 |
| IFBench | Agent | 53.7 |
| SciCode | Reasoning, Knowledge | 39.8 |
| Artificial Analysis Coding Index | Coding | 34 |
| LCR | Long-Context Reasoning | 33.7 |
| Terminal-Bench Hard | Agent, Coding | 31.1 |
| Artificial Analysis Intelligence Index | Knowledge | 27.4 |
| HLE | Knowledge, Multi-Modal | 11.7 |
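
The page does not state how the average score is computed. As a rough check, and assuming it is a plain unweighted mean over the 13 listed scores (an assumption, not something the page confirms), the following sketch reproduces the reported value of 54 after rounding. The names `scores` and `average_score` are illustrative only.

```python
from statistics import mean

# Scores copied from the benchmark list on this page.
scores = {
    "MATH-500": 98.2,
    "MMLU-Pro": 87.3,
    "GPQA Diamond": 79.6,
    "AIME 2025": 73.3,
    "τ²-Bench Telecom": 70.5,
    "LiveCodeBench": 63.6,
    "IFBench": 53.7,
    "SciCode": 39.8,
    "Artificial Analysis Coding Index": 34.0,
    "LCR": 33.7,
    "Terminal-Bench Hard": 31.1,
    "Artificial Analysis Intelligence Index": 27.4,
    "HLE": 11.7,
}


def average_score(results):
    """Unweighted mean of all scores (assumed aggregation method)."""
    return mean(results.values())


avg = average_score(scores)
print(f"{len(scores)} benchmarks, average {avg:.1f}")  # 13 benchmarks, average 54.1
```

Note that an unweighted mean treats the two composite indices (Artificial Analysis Coding Index and Intelligence Index) the same as individual benchmarks, so the figure is best read as a rough summary, not a calibrated overall score.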