Alibaba

Qwen3 4B 2507 (Reasoning)

Unknown Size

By Alibaba • Released 2025-08-06

Capability Radar

Avg Score: 38 (across all benchmarks)

Participated: 12 benchmarks

Benchmark Performance

| Benchmark | Category | Score |
| --- | --- | --- |
| AIME 2025 | Reasoning | 82.7 |
| MMLU-Pro | Knowledge | 74.3 |
| GPQA Diamond | Knowledge | 66.7 |
| LiveCodeBench | Coding | 64.1 |
| IFBench | Agent | 49.8 |
| LCR | Long-Context Reasoning | 37.7 |
| SciCode | Reasoning, Knowledge | 25.6 |
| 𝜏²-Bench Telecom | Reasoning, Knowledge | 25.4 |
| Artificial Analysis Intelligence Index | Knowledge | 18.6 |
| Artificial Analysis Coding Index | Coding | 9.5 |
| HLE | Knowledge, Multi-Modal | 5.9 |
| Terminal-Bench Hard | Agent, Coding | 1.5 |
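
The page does not say how the Avg Score is computed. The sketch below assumes it is an unweighted arithmetic mean of the twelve listed scores, which happens to agree with the displayed value of 38 after rounding. The `scores` dictionary and the `mean_score` helper are illustrative names, not part of the source.

```python
# Scores copied from the Benchmark Performance listing above.
scores = {
    "AIME 2025": 82.7,
    "MMLU-Pro": 74.3,
    "GPQA Diamond": 66.7,
    "LiveCodeBench": 64.1,
    "IFBench": 49.8,
    "LCR": 37.7,
    "SciCode": 25.6,
    "𝜏²-Bench Telecom": 25.4,
    "Artificial Analysis Intelligence Index": 18.6,
    "Artificial Analysis Coding Index": 9.5,
    "HLE": 5.9,
    "Terminal-Bench Hard": 1.5,
}


def mean_score(values):
    """Unweighted arithmetic mean (an assumed aggregation, not confirmed by the page)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"{len(scores)} benchmarks, mean = {average:.2f}")  # 12 benchmarks, mean = 38.48
```

Two of the twelve rows are themselves composite indexes, so a plain mean over all rows mixes individual benchmarks with aggregates. The agreement with the displayed 38 suggests the page does this, but does not prove it.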