Artificial Analysis Intelligence Index

Publisher: Artificial Analysis
Last Sync: 2026-02-12

Overview

The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that scores AI models across 10 challenging evaluations. Together these cover reasoning, knowledge, coding, agentic tasks, instruction following, and scientific understanding.

Component Evaluations

This intelligence index aggregates performance from the following benchmarks:

| Category | Benchmark | Description |
|---|---|---|
| Reasoning | 𝜏²-Bench Telecom | Dual-control conversational AI benchmark for technical support scenarios |
| Agent | Terminal-Bench Hard | AI capabilities in terminal environments |
| Coding | SciCode | Scientist-curated coding problems from laboratory settings |
| Reasoning | AA-LCR | Long Context Reasoning benchmark |
| Knowledge | AA-Omniscience | General knowledge assessment |
| Instruction-Following | IFBench | Precise instruction-following generalization |
| Academic | Humanity’s Last Exam | Multi-modal benchmark at the frontier of human knowledge |
| Scientific | GPQA Diamond | Graduate-level scientific Q&A |
| Agent | CritPt | Critical thinking and problem-solving |
| General | GDPval-AA | General domain performance validation |
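
As a rough illustration of what aggregating ten component scores into one index can look like, here is a minimal Python sketch. It assumes every component is reported on a 0-100 scale and uses a plain unweighted mean. That weighting is an assumption made for illustration only; this page does not state how Artificial Analysis weights the components. The function name `composite_index` and the scores in the usage example are hypothetical.

```python
from statistics import mean

# The ten component benchmarks listed in the table above.
COMPONENTS = [
    "𝜏²-Bench Telecom",
    "Terminal-Bench Hard",
    "SciCode",
    "AA-LCR",
    "AA-Omniscience",
    "IFBench",
    "Humanity’s Last Exam",
    "GPQA Diamond",
    "CritPt",
    "GDPval-AA",
]


def composite_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the component scores.

    ASSUMPTION: equal weights and a shared 0-100 scale. The official
    weighting is not given in this document.
    """
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    return mean(scores[name] for name in COMPONENTS)


if __name__ == "__main__":
    # Placeholder scores, not real results for any model.
    example = {name: 50.0 for name in COMPONENTS}
    example["GPQA Diamond"] = 80.0
    print(f"composite index: {composite_index(example):.1f}")
```

Requiring all ten scores, rather than averaging whatever is available, keeps indices comparable between models: a model missing its weakest benchmark would otherwise receive an inflated composite.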

Purpose

This composite index condenses a model’s performance on the ten component evaluations into a single score. It helps identify well-rounded models that perform strongly across diverse tasks rather than specializing in a narrow domain.


Source: Artificial Analysis

Benchmark Snapshot