Benchmark Hub
Explore Different Kinds of Benchmarks
Found 15 Results
AIME 2025
1,000
by Artificial Analysis
Reasoning
Sync: 2026-01-28
Artificial Analysis Coding Index
1,000
by Artificial Analysis
Coding
Sync: 2026-02-12
Artificial Analysis Intelligence Index
1,000
by Artificial Analysis
Knowledge
Sync: 2026-02-12
GPQA Diamond
1,000
by idavidrein
Knowledge
Sync: 2026-02-12
HLE
1,000
by Center for AI Safety & Scale AI
Knowledge, Multi-Modal
Sync: 2026-02-12
IFBench
1,000
by Allen Institute for AI (Ai2)
Agent
Sync: 2026-02-12
LCR
1,000
by Artificial Analysis
Long-Context, Reasoning
Sync: 2026-02-12
LiveCodeBench
1,000
by UC Berkeley & MIT & Cornell University
Coding
Sync: 2026-01-28
MATH-500
1,000
by OpenAI
Reasoning
Sync: 2026-01-25
MMLU-Pro
1,000
by TIGER AI Lab
Knowledge
Sync: 2026-01-28
SciCode
1,000
by SciCode
Reasoning, Knowledge
Sync: 2026-02-12
SWE-bench (Bash Only)
1,000
by Princeton University & University of Chicago
Coding, Agent
Sync: 2026-01-12
τ²-Bench Telecom
1,000
by Sierra AI
Reasoning, Knowledge
Sync: 2026-02-12
Terminal-Bench Hard
1,000
by Stanford & Laude
Agent, Coding
Sync: 2026-02-12
τ-bench
1,000
by Sierra
Agent, Knowledge
Sync: 2026-01-31