Qwen3 Max Thinking (Preview)

Size: unknown

By Alibaba • Released 2025-11-03

Avg Score: 51 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark | Category | Score
𝜏²-Bench Telecom | Reasoning, Knowledge | 83.6
MMLU-Pro | Knowledge | 82.4
AIME 2025 | Reasoning | 82.3
GPQA Diamond | Knowledge | 77.6
LCR | Long-Context Reasoning | 57.7
IFBench | Agent | 53.8
LiveCodeBench | Coding | 53.5
SciCode | Reasoning, Knowledge | 38.7
Artificial Analysis Intelligence Index | Knowledge | 32.5
Artificial Analysis Coding Index | Coding | 24.5
Terminal-Bench Hard | Agent, Coding | 17.4
HLE | Knowledge, Multi-Modal | 12
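The page does not say how the "Avg Score" is aggregated. A quick sanity check suggests it is consistent with an unweighted arithmetic mean of the 12 scores in the table above, rounded to the nearest integer (616.0 / 12 ≈ 51.33). The sketch below assumes that aggregation method; the dictionary name `scores` is illustrative only.

```python
# Scores copied from the Benchmark Performance table above.
scores = {
    "𝜏²-Bench Telecom": 83.6,
    "MMLU-Pro": 82.4,
    "AIME 2025": 82.3,
    "GPQA Diamond": 77.6,
    "LCR": 57.7,
    "IFBench": 53.8,
    "LiveCodeBench": 53.5,
    "SciCode": 38.7,
    "Artificial Analysis Intelligence Index": 32.5,
    "Artificial Analysis Coding Index": 24.5,
    "Terminal-Bench Hard": 17.4,
    "HLE": 12.0,
}

# Assumption: "Avg Score" is an unweighted arithmetic mean over all
# participated benchmarks, rounded to the nearest integer.
mean = sum(scores.values()) / len(scores)

print(len(scores), round(mean, 2), round(mean))  # prints: 12 51.33 51
```

Under this assumption, benchmarks with very different scales (for example, the composite indices versus single-task accuracies) are weighted equally, so the average is best read as a rough summary rather than a calibrated capability measure.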