Overview
Humanity’s Last Exam (HLE) is a large-scale academic benchmark created by the Center for AI Safety and Scale AI. It is intended to test AI models at the frontier of expert knowledge across a wide range of academic domains.
Key Statistics
| Metric | Value |
|---|---|
| Total Questions | 2,500 |
| Subject Areas | Dozens of subjects |
| Question Types | Multiple-choice, short-answer |
| Grading | Suitable for automated grading |
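Because both question types have a single closed-ended answer, grading can in principle be automated. The following is a minimal sketch of that idea, not HLE's actual evaluation pipeline, which may differ. The function names, the `"multiple_choice"` and `"short_answer"` labels, and the normalization rules are all assumptions made for illustration, and the sample records are invented.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and trim surrounding punctuation.

    Assumed normalization for illustration only; interior punctuation
    (for example the dot in "3.14") is deliberately left untouched.
    """
    collapsed = re.sub(r"\s+", " ", text.strip().lower())
    return collapsed.strip(string.punctuation + " ")


def grade(question_type: str, predicted: str, reference: str) -> bool:
    """Return True when the predicted answer matches the reference."""
    if question_type == "multiple_choice":
        # Compare option letters, ignoring case and surrounding spaces.
        return predicted.strip().upper() == reference.strip().upper()
    if question_type == "short_answer":
        # Exact match after light normalization.
        return normalize(predicted) == normalize(reference)
    raise ValueError(f"unknown question type: {question_type!r}")


def accuracy(records: list[tuple[str, str, str]]) -> float:
    """Fraction of (question_type, predicted, reference) records graded correct."""
    if not records:
        return 0.0
    correct = sum(grade(qtype, pred, ref) for qtype, pred, ref in records)
    return correct / len(records)


# Hypothetical records, not real HLE questions or answers.
sample = [
    ("multiple_choice", "b", "B"),
    ("short_answer", "The Krebs cycle.", "krebs cycle"),
]
score = accuracy(sample)
```

Exact matching is strict: in the sample, the second answer is marked wrong because of the leading "the". A real grader would need a more forgiving comparison for short answers.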
Subject Coverage
HLE spans a broad range of disciplines, including:
- Mathematics: From advanced calculus to abstract algebra
- Humanities: Literature, philosophy, history, political science
- Natural Sciences: Physics, chemistry, biology, astronomy
- Social Sciences: Economics, psychology, sociology
- Professional Fields: Law, medicine, engineering
Development Process
Questions in HLE were developed by subject-matter experts from around the world, with the aim of ensuring:
- High quality and accuracy
- True representation of expert-level challenges
- Resistance to memorization-based approaches
- Coverage of both common and obscure topics
Purpose
HLE is designed to be the final closed-ended academic benchmark of its kind: a broad test of whether AI systems have reached expert-level performance across domains of human knowledge.
Source: Last Exam