
Claude 3.5 Sonnet (Oct '24)

Unknown Size

By Anthropic • Released 2024-10-22

[Figure: Capability Radar]

Avg Score: 42 (across all benchmarks)
Participated: 8 benchmarks

Benchmark Performance

Benchmark                                 Category                 Score
MMLU-Pro                                  Knowledge                77.2
MATH-500                                  Reasoning                77.1
GPQA Diamond                              Knowledge                59.9
LiveCodeBench                             Coding                   38.1
SciCode                                   Reasoning, Knowledge     36.6
Artificial Analysis Coding Index          Coding                   30.2
Artificial Analysis Intelligence Index    Knowledge                15.9
HLE                                       Knowledge, Multi-Modal    3.9
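
The page does not say how the average score is derived. As a rough check, the sketch below assumes it is an unweighted arithmetic mean of the eight scores listed above, rounded to the nearest integer. The variable names are illustrative and not taken from the source.

```python
# Scores copied from the Benchmark Performance table.
scores = {
    "MMLU-Pro": 77.2,
    "MATH-500": 77.1,
    "GPQA Diamond": 59.9,
    "LiveCodeBench": 38.1,
    "SciCode": 36.6,
    "Artificial Analysis Coding Index": 30.2,
    "Artificial Analysis Intelligence Index": 15.9,
    "HLE": 3.9,
}

# Assumption: the reported average is a plain, unweighted mean.
mean_score = sum(scores.values()) / len(scores)
rounded_mean = round(mean_score)

print(f"{len(scores)} benchmarks, mean {mean_score:.2f}, rounded {rounded_mean}")
```

Under that assumption the mean is about 42.36, which is consistent with the reported value of 42.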