GPT-5.1 Codex (high)

Size: unknown

By OpenAI • Released 2025-11-13

Average score: 63 (across all benchmarks)

Benchmarks participated: 13

Benchmark Performance

Benchmark                               Category                Score
AIME 2025                               Reasoning               95.7
GPQA Diamond                            Knowledge               86
MMLU-Pro                                Knowledge               86
LiveCodeBench                           Coding                  84.9
τ²-Bench Telecom                        Reasoning Knowledge     83
IFBench                                 Agent                   70
LCR                                     Long-Context Reasoning  67.3
SWE-bench (Bash Only)                   Coding Agent            66
Artificial Analysis Intelligence Index  Knowledge               42.2
SciCode                                 Reasoning Knowledge     40.2
Artificial Analysis Coding Index        Coding                  36.6
Terminal-Bench Hard                     Agent Coding            34.8
HLE                                     Knowledge Multi-Modal   23.4
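
The page does not say how the average score is computed. As a rough check, the sketch below assumes it is an unweighted arithmetic mean of the 13 listed scores, rounded to the nearest integer. The `scores` dictionary and the `average_score` helper are illustrative names, not part of the source page.

```python
# Scores copied from the Benchmark Performance listing on this page.
scores = {
    "AIME 2025": 95.7,
    "GPQA Diamond": 86,
    "MMLU-Pro": 86,
    "LiveCodeBench": 84.9,
    "τ²-Bench Telecom": 83,
    "IFBench": 70,
    "LCR": 67.3,
    "SWE-bench (Bash Only)": 66,
    "Artificial Analysis Intelligence Index": 42.2,
    "SciCode": 40.2,
    "Artificial Analysis Coding Index": 36.6,
    "Terminal-Bench Hard": 34.8,
    "HLE": 23.4,
}


def average_score(values):
    """Unweighted arithmetic mean (an assumption about how the page aggregates)."""
    values = list(values)
    return sum(values) / len(values)


mean = average_score(scores.values())
print(f"{len(scores)} benchmarks, mean = {mean:.2f}, rounded = {round(mean)}")
# prints: 13 benchmarks, mean = 62.78, rounded = 63
```

Under that assumption the result agrees with the reported average of 63 and the benchmark count of 13. Because the list mixes composite indices with individual benchmark scores, the average is best read as a coarse summary rather than a calibrated metric.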