Claude 3.7 Sonnet (Non-reasoning)

Model size: Unknown

By Anthropic • Released 2025-02-24

[Figure: Capability Radar chart]

Avg Score: 45 (across all benchmarks)
Benchmarks participated: 15

Benchmark Performance

| Benchmark | Category | Score |
| --- | --- | --- |
| MATH-500 | Reasoning | 85 |
| MMLU-Pro | Knowledge | 80.3 |
| GPQA Diamond | Knowledge | 65.6 |
| τ-bench | Agent Knowledge | 61.8 |
| SWE-bench (Bash Only) | Coding Agent | 52.8 |
| 𝜏²-Bench Telecom | Reasoning Knowledge | 50 |
| LCR | Long-Context Reasoning | 48.3 |
| IFBench | Agent | 44 |
| LiveCodeBench | Coding | 39.4 |
| SciCode | Reasoning Knowledge | 37.6 |
| Artificial Analysis Intelligence Index | Knowledge | 30.8 |
| Artificial Analysis Coding Index | Coding | 26.7 |
| Terminal-Bench Hard | Agent Coding | 21.2 |
| AIME 2025 | Reasoning | 21 |
| HLE | Knowledge Multi-Modal | 4.8 |
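
The page does not state how the Avg Score is derived. A minimal sketch, assuming it is an unweighted mean of the 15 listed scores rounded to the nearest integer (the `scores` dictionary and the `average_score` helper are illustrative names, not part of the page):

```python
# Scores copied from the Benchmark Performance listing.
scores = {
    "MATH-500": 85,
    "MMLU-Pro": 80.3,
    "GPQA Diamond": 65.6,
    "τ-bench": 61.8,
    "SWE-bench (Bash Only)": 52.8,
    "𝜏²-Bench Telecom": 50,
    "LCR": 48.3,
    "IFBench": 44,
    "LiveCodeBench": 39.4,
    "SciCode": 37.6,
    "Artificial Analysis Intelligence Index": 30.8,
    "Artificial Analysis Coding Index": 26.7,
    "Terminal-Bench Hard": 21.2,
    "AIME 2025": 21,
    "HLE": 4.8,
}


def average_score(values: dict[str, float]) -> float:
    """Unweighted mean of all benchmark scores (an assumed definition)."""
    return sum(values.values()) / len(values)


mean = average_score(scores)
print(len(scores), round(mean, 2), round(mean))  # prints: 15 44.62 45
```

Under that assumption the listed scores reproduce both summary figures: 15 benchmarks and an average that rounds to 45. Note that the mean mixes single benchmarks with composite indices, so it is a rough summary rather than a calibrated measure.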