Artificial Analysis Intelligence Index

Benchmark Tags: Knowledge

Publisher: Artificial Analysis

Last Sync: 2026-02-12

Official Site: Link

Overview

The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that combines a model's results on 10 separate evaluations into a single score. Together, these evaluations cover reasoning, knowledge, coding, instruction following, agentic tasks, and scientific understanding.

Component Evaluations

This intelligence index aggregates performance from the following benchmarks:

Category	Benchmark	Description
Reasoning	𝜏²-Bench Telecom	Dual-control conversational AI benchmark for technical support scenarios
Agent	Terminal-Bench Hard	AI capabilities in terminal environments
Coding	SciCode	Scientist-curated coding problems from laboratory settings
Reasoning	AA-LCR	Long Context Reasoning benchmark
Knowledge	AA-Omniscience	Factual knowledge and hallucination assessment across domains
Instruction-Following	IFBench	Precise instruction-following generalization
Academic	Humanity’s Last Exam	Multi-modal benchmark at the frontier of human knowledge
Scientific	GPQA Diamond	Graduate-level scientific Q&A
Scientific	CritPt	Research-level physics reasoning problems
General	GDPval-AA	Economically valuable real-world tasks across occupations, based on the GDPval dataset
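
As a rough illustration of how a composite index can be built from the benchmarks in the table, the sketch below averages ten per-benchmark scores with equal weights. The equal weighting, the 0-100 score scale, and the function name `composite_index` are assumptions made for this example; this page does not state the weighting that Artificial Analysis actually uses.

```python
from statistics import mean

# Component benchmarks, as listed in the table above.
COMPONENTS = [
    "𝜏²-Bench Telecom",
    "Terminal-Bench Hard",
    "SciCode",
    "AA-LCR",
    "AA-Omniscience",
    "IFBench",
    "Humanity’s Last Exam",
    "GPQA Diamond",
    "CritPt",
    "GDPval-AA",
]


def composite_index(scores: dict[str, float]) -> float:
    """Return the equal-weighted mean of the component scores.

    Assumption: every score is on the same 0-100 scale and every
    benchmark counts equally. The real index may weight differently.
    """
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    # Extra keys in `scores` are ignored; only listed components count.
    return mean(scores[name] for name in COMPONENTS)
```

Calling `composite_index` with a dictionary that maps each benchmark name to a model's score returns one number for that model. A missing benchmark raises an error rather than silently skewing the average.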

Purpose

This composite index summarizes an AI model’s performance across multiple capability dimensions in a single score. It helps identify well-rounded models that perform well on diverse tasks rather than only in a narrow domain.

Source: Artificial Analysis