Kimi K2 Thinking

Unknown Size

By Kimi • Released 2025-11-06

Avg Score: 62 (across all benchmarks)

Participated: 13 benchmarks

Benchmark Performance

Benchmark                                 Category                 Score
AIME 2025                                 Reasoning                94.7
𝜏²-Bench Telecom                          Reasoning Knowledge      93
LiveCodeBench                             Coding                   85.3
MMLU-Pro                                  Knowledge                84.8
GPQA Diamond                              Knowledge                83.8
IFBench                                   Agent                    68.1
LCR                                       Long-Context Reasoning   66.3
SWE-bench (Bash Only)                     Coding Agent             63.4
SciCode                                   Reasoning Knowledge      42.4
Artificial Analysis Intelligence Index    Knowledge                40.7
Artificial Analysis Coding Index          Coding                   34.8
Terminal-Bench Hard                       Agent Coding             31.1
HLE                                       Knowledge Multi-Modal    22.3
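
The page does not say how the Avg Score is derived. As a rough check, and assuming it is a plain unweighted mean of the 13 listed scores (an assumption, not something the page confirms), the figure can be reproduced with a short sketch. The dictionary name `scores` is illustrative only; the values are copied from the benchmark listing above.

```python
from statistics import mean

# Scores copied from the Benchmark Performance listing.
# Assumption: the page's "Avg Score" is an unweighted mean of these values.
scores = {
    "AIME 2025": 94.7,
    "𝜏²-Bench Telecom": 93.0,
    "LiveCodeBench": 85.3,
    "MMLU-Pro": 84.8,
    "GPQA Diamond": 83.8,
    "IFBench": 68.1,
    "LCR": 66.3,
    "SWE-bench (Bash Only)": 63.4,
    "SciCode": 42.4,
    "Artificial Analysis Intelligence Index": 40.7,
    "Artificial Analysis Coding Index": 34.8,
    "Terminal-Bench Hard": 31.1,
    "HLE": 22.3,
}

# Unweighted mean across every listed benchmark.
average = mean(list(scores.values()))

print(f"benchmarks: {len(scores)}")
print(f"mean score: {average:.2f}")
print(f"rounded:    {round(average)}")
```

Under that assumption the mean rounds to the 62 shown on the page. Note that the listing mixes per-benchmark scores with two aggregate indices (the Artificial Analysis Intelligence and Coding indices), so a plain mean weights those aggregates the same as any single benchmark.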