Claude 4 Opus (Non-reasoning)

Model size: unknown

By Anthropic • Released 2025-05-22

Capability Radar

Average score: 51 (across all benchmarks)
Benchmarks participated: 11

Benchmark Performance

Benchmark                                 Category                 Score
MATH-500                                  Reasoning                94.1
MMLU-Pro                                  Knowledge                86
GPQA Diamond                              Knowledge                70.1
SWE-bench (Bash Only)                     Coding Agent             67.6
LiveCodeBench                             Coding                   54.2
IFBench                                   Agent                    43.3
SciCode                                   Reasoning Knowledge      40.9
AIME 2025                                 Reasoning                36.3
LCR                                       Long-Context Reasoning   36
Artificial Analysis Intelligence Index    Knowledge                22.2
HLE                                       Knowledge Multi-Modal    5.9
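
The page does not say how the average score of 51 is derived. It is consistent with an unweighted arithmetic mean of the eleven listed scores, rounded to the nearest integer. The sketch below checks that under this assumption; the `scores` dictionary is an illustrative name, with values copied from the benchmark list above.

```python
from statistics import mean

# Scores copied from the Benchmark Performance list on this page.
scores = {
    "MATH-500": 94.1,
    "MMLU-Pro": 86,
    "GPQA Diamond": 70.1,
    "SWE-bench (Bash Only)": 67.6,
    "LiveCodeBench": 54.2,
    "IFBench": 43.3,
    "SciCode": 40.9,
    "AIME 2025": 36.3,
    "LCR": 36,
    "Artificial Analysis Intelligence Index": 22.2,
    "HLE": 5.9,
}

# Assumption: the page's "Avg Score" is an unweighted mean over all
# listed benchmarks, rounded to the nearest integer.
average = mean(list(scores.values()))

print(f"benchmarks: {len(scores)}")
print(f"mean score: {average:.1f}")
print(f"rounded:    {round(average)}")
```

Under this assumption the mean comes to about 50.6, which rounds to the reported 51. A weighted or category-level average would give a different figure, so this only shows that the simple mean matches.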