xAI

Grok 3

Model size: unknown

By xAI • Released 2025-02-19

Capability Radar

Avg Score: 45 (across all benchmarks)

Participated: 13 benchmarks

Benchmark Performance

Benchmark                                 Category                  Score
MATH-500                                  Reasoning                 87
MMLU-Pro                                  Knowledge                 79.9
GPQA Diamond                              Knowledge                 69.3
AIME 2025                                 Reasoning                 58
LCR                                       Long-Context Reasoning    54.7
𝜏²-Bench Telecom                          Reasoning, Knowledge      48.8
IFBench                                   Agent                     46.9
LiveCodeBench                             Coding                    42.5
SciCode                                   Reasoning, Knowledge      36.8
Artificial Analysis Intelligence Index    Knowledge                 25
Artificial Analysis Coding Index          Coding                    19.8
Terminal-Bench Hard                       Agent, Coding             11.4
HLE                                       Knowledge, Multi-Modal    5.1
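
The reported Avg Score can be checked against the scores listed above. The sketch below assumes the average is a plain unweighted mean over all 13 benchmarks; the page does not state how it is computed, so that is an assumption, and the variable names are illustrative only.

```python
# Scores copied from the Benchmark Performance listing above.
scores = {
    "MATH-500": 87,
    "MMLU-Pro": 79.9,
    "GPQA Diamond": 69.3,
    "AIME 2025": 58,
    "LCR": 54.7,
    "tau2-Bench Telecom": 48.8,
    "IFBench": 46.9,
    "LiveCodeBench": 42.5,
    "SciCode": 36.8,
    "Artificial Analysis Intelligence Index": 25,
    "Artificial Analysis Coding Index": 19.8,
    "Terminal-Bench Hard": 11.4,
    "HLE": 5.1,
}

# Assumption: "Avg Score" is the unweighted arithmetic mean of all scores.
avg = sum(scores.values()) / len(scores)

print(f"{len(scores)} benchmarks, mean score {avg:.1f}")
```

Under that assumption the mean rounds to 45, which matches the Avg Score shown for 13 benchmarks. Note that the two Artificial Analysis indices are themselves aggregates, so an unweighted mean counts some underlying results more than once.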