Nous Research

Hermes 4 - Llama-3.1 70B (Reasoning)

Unknown Size

By Nous Research • Released 2025-08-27

Capability Radar

Avg Score: 35 (across all benchmarks)

Benchmarks participated: 12

Benchmark Performance

Benchmark                               Category                  Score
MMLU-Pro                                Knowledge                 81.1
GPQA Diamond                            Knowledge                 69.9
AIME 2025                               Reasoning                 68.7
LiveCodeBench                           Coding                    65.3
SciCode                                 Reasoning, Knowledge      34.1
IFBench                                 Agent                     31.3
𝜏²-Bench Telecom                        Reasoning, Knowledge      22.5
Artificial Analysis Intelligence Index  Knowledge                 16
Artificial Analysis Coding Index        Coding                    14.4
HLE                                     Knowledge, Multi-Modal    7.9
LCR                                     Long-Context Reasoning    6.7
Terminal-Bench Hard                     Agent, Coding             4.5
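
The reported Avg Score of 35 looks consistent with a plain unweighted mean of the 12 scores listed above. The page does not say how the average is computed, so the equal weighting in this sketch is an assumption, and the variable names are illustrative only.

```python
# Scores copied from the Benchmark Performance table.
# Assumption: the page's "Avg Score" is an unweighted mean over all listed benchmarks.
scores = {
    "MMLU-Pro": 81.1,
    "GPQA Diamond": 69.9,
    "AIME 2025": 68.7,
    "LiveCodeBench": 65.3,
    "SciCode": 34.1,
    "IFBench": 31.3,
    "𝜏²-Bench Telecom": 22.5,
    "Artificial Analysis Intelligence Index": 16.0,
    "Artificial Analysis Coding Index": 14.4,
    "HLE": 7.9,
    "LCR": 6.7,
    "Terminal-Bench Hard": 4.5,
}

benchmark_count = len(scores)
average = sum(scores.values()) / benchmark_count

print(f"benchmarks: {benchmark_count}")
print(f"unweighted mean: {average:.1f}")
```

Under that assumption the mean comes to about 35.2, which rounds to the displayed 35. Note that the list mixes individual benchmarks with two composite indices, so the figure is a rough summary, not a like-for-like comparison.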