OpenAI

o3

Unknown Size

By OpenAI • Released 2025-04-16

[Figure: Capability Radar chart]

Avg Score: 64 (across all benchmarks)
Benchmarks participated: 14

Benchmark Performance

| Benchmark | Category | Score |
|---|---|---|
| MATH-500 | Reasoning | 99.2 |
| AIME 2025 | Reasoning | 88.3 |
| MMLU-Pro | Knowledge | 85.3 |
| GPQA Diamond | Knowledge | 82.7 |
| LiveCodeBench | Coding | 80.8 |
| 𝜏²-Bench Telecom | Reasoning, Knowledge | 80.7 |
| IFBench | Agent | 71.4 |
| LCR | Long-Context Reasoning | 69.3 |
| SWE-bench (Bash Only) | Coding, Agent | 58.4 |
| SciCode | Reasoning, Knowledge | 41 |
| Artificial Analysis Coding Index | Coding | 38.4 |
| Artificial Analysis Intelligence Index | Knowledge | 38.3 |
| Terminal-Bench Hard | Agent, Coding | 37.1 |
| HLE | Knowledge, Multi-Modal | 20 |
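
The reported average can be checked against the listed scores. The sketch below assumes the Avg Score is an unweighted mean of the 14 benchmark scores, rounded to the nearest integer; the page does not state how the figure is computed, so the formula and the `scores` dictionary name are illustrative assumptions.

```python
# Scores copied from the Benchmark Performance listing above.
scores = {
    "MATH-500": 99.2,
    "AIME 2025": 88.3,
    "MMLU-Pro": 85.3,
    "GPQA Diamond": 82.7,
    "LiveCodeBench": 80.8,
    "tau2-Bench Telecom": 80.7,
    "IFBench": 71.4,
    "LCR": 69.3,
    "SWE-bench (Bash Only)": 58.4,
    "SciCode": 41.0,
    "Artificial Analysis Coding Index": 38.4,
    "Artificial Analysis Intelligence Index": 38.3,
    "Terminal-Bench Hard": 37.1,
    "HLE": 20.0,
}

# Assumption: unweighted arithmetic mean over all listed benchmarks.
mean_score = sum(scores.values()) / len(scores)

print(f"benchmarks: {len(scores)}")
print(f"mean score: {mean_score:.1f}")
print(f"rounded:    {round(mean_score)}")
```

Under that assumption, the mean comes to about 63.6, which rounds to the displayed 64. Note that the listed benchmarks use different scales and difficulty levels, so a plain mean is only a rough summary.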