Claude 3.7 Sonnet (Reasoning)

Unknown Size

By Anthropic • Released 2025-02-24

Capability Radar

Avg Score: 51 (across all benchmarks)
Benchmarks participated: 13

Benchmark Performance

Benchmark                                Category                 Score
MATH-500                                 Reasoning                94.7
MMLU-Pro                                 Knowledge                83.7
GPQA Diamond                             Knowledge                77.2
LCR                                      Long-Context Reasoning   60.7
AIME 2025                                Reasoning                56.3
𝜏²-Bench Telecom                         Reasoning Knowledge      54.7
IFBench                                  Agent                    48.3
LiveCodeBench                            Coding                   47.3
SciCode                                  Reasoning Knowledge      40.3
Artificial Analysis Intelligence Index   Knowledge                34.6
Artificial Analysis Coding Index         Coding                   27.6
Terminal-Bench Hard                      Agent Coding             21.2
HLE                                      Knowledge Multi-Modal    10.3
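
The page does not say how the Avg Score is derived. As a rough check, the sketch below assumes it is the unweighted arithmetic mean of the 13 scores listed above, rounded to the nearest integer. That formula is an assumption, not something the source confirms, and the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the Benchmark Performance listing above.
scores = {
    "MATH-500": 94.7,
    "MMLU-Pro": 83.7,
    "GPQA Diamond": 77.2,
    "LCR": 60.7,
    "AIME 2025": 56.3,
    "tau2-Bench Telecom": 54.7,
    "IFBench": 48.3,
    "LiveCodeBench": 47.3,
    "SciCode": 40.3,
    "Artificial Analysis Intelligence Index": 34.6,
    "Artificial Analysis Coding Index": 27.6,
    "Terminal-Bench Hard": 21.2,
    "HLE": 10.3,
}

# Assumption: Avg Score is a plain, unweighted mean over every benchmark.
average = mean(scores.values())
rounded_average = round(average)

print(f"benchmarks: {len(scores)}")
print(f"mean score: {average:.2f} (rounded: {rounded_average})")
```

Under that assumption the mean comes to about 50.53, which rounds to the 51 shown, and the benchmark count matches the 13 reported. Note that two of the rows are themselves composite indices, so a plain mean weights some underlying results more than once.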