Nous Research

Hermes 4 - Llama-3.1 405B (Reasoning)

Unknown Size

By Nous Research • Released 2025-08-27

[Figure: Capability Radar]

Avg Score: 38 (across all benchmarks)
Benchmarks participated: 12

Benchmark Performance

Benchmark                                 Category                  Score
MMLU-Pro                                  Knowledge                 82.9
GPQA Diamond                              Knowledge                 72.7
AIME 2025                                 Reasoning                 69.7
LiveCodeBench                             Coding                    68.6
IFBench                                   Agent                     32.7
SciCode                                   Reasoning, Knowledge      25.2
𝜏²-Bench Telecom                          Reasoning, Knowledge      22.2
LCR                                       Long-Context Reasoning    20.7
Artificial Analysis Intelligence Index    Knowledge                 18.6
Artificial Analysis Coding Index          Coding                    16
Terminal-Bench Hard                       Agent, Coding             11.4
HLE                                       Knowledge, Multi-Modal    10.3
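
The page does not say how the average score is derived. As a rough check, the sketch below assumes it is the unweighted arithmetic mean of the 12 listed scores, rounded to the nearest integer. The variable names are illustrative, and the weighting is an assumption rather than a documented method.

```python
# Scores copied from the Benchmark Performance listing above.
scores = {
    "MMLU-Pro": 82.9,
    "GPQA Diamond": 72.7,
    "AIME 2025": 69.7,
    "LiveCodeBench": 68.6,
    "IFBench": 32.7,
    "SciCode": 25.2,
    "tau2-Bench Telecom": 22.2,
    "LCR": 20.7,
    "Artificial Analysis Intelligence Index": 18.6,
    "Artificial Analysis Coding Index": 16.0,
    "Terminal-Bench Hard": 11.4,
    "HLE": 10.3,
}

# Assumption: every benchmark carries equal weight in the average.
mean_score = sum(scores.values()) / len(scores)

print(f"benchmarks: {len(scores)}")               # benchmarks: 12
print(f"mean score: {mean_score:.2f}")            # mean score: 37.58
print(f"rounded mean score: {round(mean_score)}") # rounded mean score: 38
```

Under that assumption the result matches the reported average of 38 across 12 benchmarks. Note that the two Artificial Analysis indices are themselves composite scores, so an unweighted mean mixes aggregate and individual benchmarks.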