Humanity's Last Exam

Benchmark Tags:

Knowledge, Multi-Modal

Publisher:

Center for AI Safety & Scale AI

Last Sync:

2026-02-12

Overview

Humanity’s Last Exam (HLE) is a multi-modal benchmark of expert-level academic questions created by the Center for AI Safety and Scale AI. It is intended to test AI models at the frontier of human knowledge across a wide range of academic domains.

Key Statistics

Metric	Value
Total Questions	2,500
Subject Areas	Dozens of subjects
Question Types	Multiple-choice, short-answer
Grading	Compatible with automated grading
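
Because the questions are closed-ended (multiple-choice or short-answer), a model's responses can in principle be scored without a human in the loop. The Python below is a minimal sketch of that idea, not the official HLE grading procedure: the function names, the question-type labels, and the normalization rules (case-folding, whitespace collapsing, comparing only the option letter) are all assumptions made for illustration.

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and collapse runs of whitespace (illustrative rule)."""
    return " ".join(answer.lower().split())


def grade(question_type: str, predicted: str, reference: str) -> bool:
    """Return True when the predicted answer matches the reference answer."""
    if question_type == "multiple-choice":
        # Compare only the option letter, so "(B)" and "b)" both count as "b".
        pred = predicted.strip().strip("().").lower()[:1]
        ref = reference.strip().strip("().").lower()[:1]
        return pred != "" and pred == ref
    if question_type == "short-answer":
        # Exact match after normalization; real graders may be more lenient.
        return normalize(predicted) == normalize(reference)
    raise ValueError(f"unknown question type: {question_type}")


def accuracy(records) -> float:
    """Fraction of (question_type, predicted, reference) records graded correct."""
    if not records:
        return 0.0
    correct = sum(grade(qtype, pred, ref) for qtype, pred, ref in records)
    return correct / len(records)
```

Exact matching of this kind is deliberately strict: it would reject equivalent answers written differently (for example "1/2" versus "0.5"), so a practical grader would need a more forgiving comparison step.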

Subject Coverage

HLE spans a broad range of disciplines, including:

Mathematics: From advanced calculus to abstract algebra
Humanities: Literature, philosophy, history, political science
Natural Sciences: Physics, chemistry, biology, astronomy
Social Sciences: Economics, psychology, sociology
Professional Fields: Law, medicine, engineering

Development Process

Questions in HLE were contributed by subject-matter experts from around the world, with the aim of ensuring:

High quality and accuracy
True representation of expert-level challenges
Resistance to memorization-based approaches
Coverage of both common and obscure topics

Purpose

HLE is designed to be the final closed-ended academic benchmark of its kind: a comprehensive test for measuring whether AI has reached expert-level performance across human knowledge domains.

Source: Last Exam