Claude 4.1 Opus (Reasoning)

Unknown Size

By Anthropic • Released 2025-08-05

[Figure: Capability Radar chart]

Avg Score: 55 (across all benchmarks)

Participated: 12 benchmarks

Benchmark Performance

Benchmark                               Category                Score
MMLU-Pro                                Knowledge               88
GPQA Diamond                            Knowledge               80.9
AIME 2025                               Reasoning               80.3
𝜏²-Bench Telecom                        Reasoning, Knowledge    71.4
LCR                                     Long-Context Reasoning  66.3
LiveCodeBench                           Coding                  65.4
IFBench                                 Agent                   55.4
SciCode                                 Reasoning, Knowledge    40.9
Artificial Analysis Coding Index        Coding                  36.5
Terminal-Bench Hard                     Agent, Coding           34.3
Artificial Analysis Intelligence Index  Knowledge               31.9
HLE                                     Knowledge, Multi-Modal  11.9
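
The page does not say how the Avg Score is computed. As a rough check, the sketch below assumes it is the unweighted arithmetic mean of the twelve scores listed above; the `scores` dictionary and the `average_score` helper are illustrative names, not part of the source site.

```python
from statistics import mean

# Scores copied from the Benchmark Performance listing above.
scores = {
    "MMLU-Pro": 88,
    "GPQA Diamond": 80.9,
    "AIME 2025": 80.3,
    "𝜏²-Bench Telecom": 71.4,
    "LCR": 66.3,
    "LiveCodeBench": 65.4,
    "IFBench": 55.4,
    "SciCode": 40.9,
    "Artificial Analysis Coding Index": 36.5,
    "Terminal-Bench Hard": 34.3,
    "Artificial Analysis Intelligence Index": 31.9,
    "HLE": 11.9,
}


def average_score(scores):
    """Unweighted mean of all scores (assumed, not confirmed by the source)."""
    return mean(scores.values())


avg = average_score(scores)
print(f"{len(scores)} benchmarks, mean = {avg:.1f}")
```

Under that assumption the mean comes to about 55.3, which agrees with the displayed Avg Score of 55 if the site rounds to the nearest integer.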