Alibaba

Qwen3 Max (Preview)

Unknown Size

By Alibaba • Released 2025-09-05

Capability Radar

Avg Score: 45 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark                                 Category                  Score
MMLU-Pro                                  Knowledge                 83.8
GPQA Diamond                              Knowledge                 76.4
AIME 2025                                 Reasoning                 75
LiveCodeBench                             Coding                    65.1
IFBench                                   Agent                     48
LCR                                       Long-Context Reasoning    39.7
SciCode                                   Reasoning Knowledge       37
𝜏²-Bench Telecom                          Reasoning Knowledge       32.7
Artificial Analysis Intelligence Index    Knowledge                 25.9
Artificial Analysis Coding Index          Coding                    25.5
Terminal-Bench Hard                       Agent Coding              19.7
HLE                                       Knowledge Multi-Modal     9.3
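
The page does not say how Avg Score is computed. As a rough check, the sketch below assumes it is the unweighted arithmetic mean of the 12 listed scores; that aggregation rule is an assumption, and the names `scores` and `mean_score` are illustrative only.

```python
# Scores copied from the Benchmark Performance listing on this page.
scores = {
    "MMLU-Pro": 83.8,
    "GPQA Diamond": 76.4,
    "AIME 2025": 75,
    "LiveCodeBench": 65.1,
    "IFBench": 48,
    "LCR": 39.7,
    "SciCode": 37,
    "𝜏²-Bench Telecom": 32.7,
    "Artificial Analysis Intelligence Index": 25.9,
    "Artificial Analysis Coding Index": 25.5,
    "Terminal-Bench Hard": 19.7,
    "HLE": 9.3,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the page)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"benchmarks: {len(scores)}")
print(f"mean score: {average:.2f} (rounded: {round(average)})")
```

Under that assumption the mean comes to roughly 44.8, which rounds to the 45 shown as Avg Score, and the entry count matches the 12 benchmarks reported.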