Nous Research

Hermes 4 - Llama-3.1 405B (Non-reasoning)

Unknown Size

By Nous Research • Released 2025-08-27

[Figure: Capability Radar]

Avg Score: 30 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark                               Category                Score
MMLU-Pro                                Knowledge               72.9
LiveCodeBench                           Coding                  54.6
GPQA Diamond                            Knowledge               53.6
IFBench                                 Agent                   34.8
SciCode                                 Reasoning, Knowledge    34.6
𝜏²-Bench Telecom                        Reasoning, Knowledge    26.6
LCR                                     Long-Context Reasoning  20
Artificial Analysis Coding Index        Coding                  18.1
Artificial Analysis Intelligence Index  Knowledge               17.1
AIME 2025                               Reasoning               15.3
Terminal-Bench Hard                     Agent, Coding           9.8
HLE                                     Knowledge, Multi-Modal  4.2
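
The page does not say how Avg Score is computed. A minimal sketch, assuming it is the unweighted arithmetic mean of the 12 listed scores rounded to the nearest integer, reproduces the reported value of 30. The `scores` dictionary and variable names are illustrative, not part of the source.

```python
from statistics import mean

# Scores copied from the Benchmark Performance table.
scores = {
    "MMLU-Pro": 72.9,
    "LiveCodeBench": 54.6,
    "GPQA Diamond": 53.6,
    "IFBench": 34.8,
    "SciCode": 34.6,
    "𝜏²-Bench Telecom": 26.6,
    "LCR": 20,
    "Artificial Analysis Coding Index": 18.1,
    "Artificial Analysis Intelligence Index": 17.1,
    "AIME 2025": 15.3,
    "Terminal-Bench Hard": 9.8,
    "HLE": 4.2,
}

# Assumption: "Avg Score" is an unweighted mean, rounded to an integer.
avg_score = mean(scores.values())
print(f"{len(scores)} benchmarks, mean = {avg_score:.2f}, rounded = {round(avg_score)}")
# prints "12 benchmarks, mean = 30.13, rounded = 30"
```

Note that two rows (Artificial Analysis Coding Index and Artificial Analysis Intelligence Index) are themselves composite indices, so an unweighted mean counts some underlying results more than once.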