Qwen3 Max

Unknown Size

By Alibaba • Released 2025-09-23

[Figure: Capability Radar]

Avg Score: 53 (across all benchmarks)

Participated: 13 benchmarks

Benchmark Performance

| Benchmark | Category | Score |
|---|---|---|
| MMLU-Pro | Knowledge | 84.1 |
| AIME 2025 | Reasoning | 80.7 |
| LiveCodeBench | Coding | 76.7 |
| GPQA Diamond | Knowledge | 76.4 |
| τ²-Bench Telecom | Reasoning, Knowledge | 74.3 |
| τ-bench | Agent, Knowledge | 72 |
| LCR | Long-Context Reasoning | 46.7 |
| IFBench | Agent | 44.1 |
| SciCode | Reasoning, Knowledge | 38.3 |
| Artificial Analysis Intelligence Index | Knowledge | 31.3 |
| Artificial Analysis Coding Index | Coding | 26.4 |
| Terminal-Bench Hard | Agent, Coding | 20.5 |
| HLE | Knowledge, Multi-Modal | 11.1 |
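
The page does not say how the Avg Score is derived. As a rough check, the sketch below assumes it is the unweighted arithmetic mean of the 13 listed scores, rounded to the nearest integer. That assumption is not confirmed by the source, and the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the Benchmark Performance listing.
scores = {
    "MMLU-Pro": 84.1,
    "AIME 2025": 80.7,
    "LiveCodeBench": 76.7,
    "GPQA Diamond": 76.4,
    "τ²-Bench Telecom": 74.3,
    "τ-bench": 72.0,
    "LCR": 46.7,
    "IFBench": 44.1,
    "SciCode": 38.3,
    "Artificial Analysis Intelligence Index": 31.3,
    "Artificial Analysis Coding Index": 26.4,
    "Terminal-Bench Hard": 20.5,
    "HLE": 11.1,
}

# Assumption: Avg Score is a plain unweighted mean over every benchmark.
benchmark_count = len(scores)
average = mean(list(scores.values()))
rounded_average = round(average)

print(f"{benchmark_count} benchmarks, mean {average:.2f}, rounded {rounded_average}")
```

Under that assumption the mean comes to roughly 52.5, which rounds to the reported 53, and the count matches the 13 benchmarks listed. Because the two Artificial Analysis indices are themselves composites of other benchmarks, a plain mean over all rows mixes aggregate and individual scores, so the figure is best read as a coarse summary.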