OpenAI

GPT-4.1

Unknown Size

By OpenAI • Released 2025-04-14

[Figure: Capability Radar chart]

Avg Score: 45 (across all benchmarks)

Benchmarks participated: 15

Benchmark Performance

| Benchmark | Category | Score |
|---|---|---|
| MATH-500 | Reasoning | 91.3 |
| MMLU-Pro | Knowledge | 80.6 |
| GPQA Diamond | Knowledge | 66.6 |
| LCR | Long-Context Reasoning | 61 |
| τ-bench | Agent Knowledge | 54.7 |
| τ²-Bench Telecom | Reasoning Knowledge | 47.1 |
| LiveCodeBench | Coding | 45.7 |
| IFBench | Agent | 43 |
| SWE-bench (Bash Only) | Coding Agent | 39.58 |
| SciCode | Reasoning Knowledge | 38.1 |
| AIME 2025 | Reasoning | 34.7 |
| Artificial Analysis Intelligence Index | Knowledge | 25.6 |
| Artificial Analysis Coding Index | Coding | 21.8 |
| Terminal-Bench Hard | Agent Coding | 13.6 |
| HLE | Knowledge Multi-Modal | 4.6 |
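
The page does not state how the average score is derived. As a rough check, and assuming it is a plain unweighted mean of the 15 listed scores (an assumption, not something the page confirms), the arithmetic can be sketched as follows. The dictionary name `scores` is illustrative only.

```python
from statistics import mean

# Scores copied from the Benchmark Performance listing.
# Assumption: the page's "Avg Score" is an unweighted mean of these values.
scores = {
    "MATH-500": 91.3,
    "MMLU-Pro": 80.6,
    "GPQA Diamond": 66.6,
    "LCR": 61,
    "τ-bench": 54.7,
    "τ²-Bench Telecom": 47.1,
    "LiveCodeBench": 45.7,
    "IFBench": 43,
    "SWE-bench (Bash Only)": 39.58,
    "SciCode": 38.1,
    "AIME 2025": 34.7,
    "Artificial Analysis Intelligence Index": 25.6,
    "Artificial Analysis Coding Index": 21.8,
    "Terminal-Bench Hard": 13.6,
    "HLE": 4.6,
}

average = mean(scores.values())
print(f"{len(scores)} benchmarks, mean score {average:.2f}")
# prints "15 benchmarks, mean score 44.53"
```

Under that assumption the mean of about 44.53 rounds to the displayed 45, and the count matches the 15 benchmarks reported.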