Alibaba

Qwen3 4B 2507 Instruct

Unknown Size

By Alibaba • Released 2025-08-06

Capability Radar

Avg Score: 27 (across all benchmarks)

Participated: 12 benchmarks

Benchmark Performance

Benchmark                                 Category                 Score
MMLU-Pro                                  Knowledge                67.2
AIME 2025                                 Reasoning                52.3
GPQA Diamond                              Knowledge                51.7
LiveCodeBench                             Coding                   37.7
IFBench                                   Agent                    33.5
𝜏²-Bench Telecom                          Reasoning, Knowledge     26.6
SciCode                                   Reasoning, Knowledge     18.1
Artificial Analysis Intelligence Index    Knowledge                13.2
Artificial Analysis Coding Index          Coding                   9.1
LCR                                       Long-Context Reasoning   7.3
HLE                                       Knowledge, Multi-Modal   4.7
Terminal-Bench Hard                       Agent, Coding            4.5
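
The reported Avg Score of 27 is consistent with a plain unweighted mean of the 12 scores listed above. The page does not say how the average is computed, so the sketch below assumes an unweighted arithmetic mean over every listed benchmark; the `scores` dictionary and variable names are illustrative only.

```python
# Scores copied from the benchmark list above.
# Assumption: the page's "Avg Score" is an unweighted mean over all 12 rows.
scores = {
    "MMLU-Pro": 67.2,
    "AIME 2025": 52.3,
    "GPQA Diamond": 51.7,
    "LiveCodeBench": 37.7,
    "IFBench": 33.5,
    "𝜏²-Bench Telecom": 26.6,
    "SciCode": 18.1,
    "Artificial Analysis Intelligence Index": 13.2,
    "Artificial Analysis Coding Index": 9.1,
    "LCR": 7.3,
    "HLE": 4.7,
    "Terminal-Bench Hard": 4.5,
}

# Unweighted arithmetic mean across all benchmarks.
average = sum(scores.values()) / len(scores)

print(f"{len(scores)} benchmarks, mean score {average:.1f}")
# prints "12 benchmarks, mean score 27.2"
```

Rounded to the nearest integer, the mean is 27, which matches the headline figure. Note that this treats the two Artificial Analysis composite indices as ordinary rows, as the page's count of 12 benchmarks appears to do.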