Qwen3 Max Thinking

Unknown Size

By Alibaba • Released 2026-01-26

Capability Radar

Avg Score: 55 (across all benchmarks)

Participated: 10 benchmarks

Benchmark Performance

Benchmark                               Category                Score
GPQA Diamond                            Knowledge               86.1
τ²-Bench Telecom                        Reasoning Knowledge     83.6
τ-bench                                 Agent Knowledge         82.2
IFBench                                 Agent                   70.7
LCR                                     Long-Context Reasoning  66
SciCode                                 Reasoning Knowledge     43.1
Artificial Analysis Intelligence Index  Knowledge               39.7
Artificial Analysis Coding Index        Coding                  30.5
HLE                                     Knowledge Multi-Modal   26.2
Terminal-Bench Hard                     Agent Coding            24.2
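
The displayed Avg Score of 55 appears consistent with a plain unweighted mean of the ten listed scores. The page does not say how the average is computed, so the unweighted mean is an assumption here. A minimal sketch of that check, with the variable names `scores` and `average` chosen for illustration:

```python
from statistics import mean

# Scores copied from the benchmark table above.
scores = {
    "GPQA Diamond": 86.1,
    "τ²-Bench Telecom": 83.6,
    "τ-bench": 82.2,
    "IFBench": 70.7,
    "LCR": 66,
    "SciCode": 43.1,
    "Artificial Analysis Intelligence Index": 39.7,
    "Artificial Analysis Coding Index": 30.5,
    "HLE": 26.2,
    "Terminal-Bench Hard": 24.2,
}

# Assumption: the page's "Avg Score" is an unweighted arithmetic mean.
average = mean(scores.values())

print(f"{len(scores)} benchmarks, mean score {average:.2f}")
```

Under that assumption the mean comes to about 55.23, which rounds to the 55 shown on the page.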