Nous Research

Hermes 4 - Llama-3.1 70B (Non-reasoning)

Parameter count: not listed (the model name indicates a Llama-3.1 70B base)

By Nous Research • Released 2025-08-27

Average score: 22 (across all benchmarks)
Benchmarks evaluated: 12

Benchmark Performance

Benchmark                               Category                Score
MMLU-Pro                                Knowledge               66.4
GPQA Diamond                            Knowledge               49.1
IFBench                                 Agent                   29
SciCode                                 Reasoning Knowledge     27.7
LiveCodeBench                           Coding                  26.9
τ²-Bench Telecom                        Reasoning Knowledge     21.6
Artificial Analysis Intelligence Index  Knowledge               13.6
AIME 2025                               Reasoning               11.3
Artificial Analysis Coding Index        Coding                  9.2
HLE                                     Knowledge Multi-Modal   3.6
LCR                                     Long-Context Reasoning  2
Terminal-Bench Hard                     Agent Coding            0
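The page does not say how the headline average score is computed. The twelve scores above are consistent with a plain unweighted arithmetic mean (260.4 / 12 = 21.7, which rounds to 22), so the sketch below assumes that aggregation. Note that under this assumption the two composite Artificial Analysis indices are weighted the same as every individual benchmark. The names `SCORES` and `average_score` are hypothetical and used only for illustration.

```python
# Scores copied from the benchmark table above.
# Assumption: the page's "average score" is an unweighted mean of these values.
SCORES = {
    "MMLU-Pro": 66.4,
    "GPQA Diamond": 49.1,
    "IFBench": 29.0,
    "SciCode": 27.7,
    "LiveCodeBench": 26.9,
    "τ²-Bench Telecom": 21.6,
    "Artificial Analysis Intelligence Index": 13.6,
    "AIME 2025": 11.3,
    "Artificial Analysis Coding Index": 9.2,
    "HLE": 3.6,
    "LCR": 2.0,
    "Terminal-Bench Hard": 0.0,
}


def average_score(scores: dict[str, float]) -> float:
    """Hypothetical helper: unweighted arithmetic mean over all listed benchmarks."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    avg = average_score(SCORES)
    # Prints the benchmark count, the exact mean, and the mean rounded to an integer.
    print(f"{len(SCORES)} benchmarks, mean = {avg:.2f}, rounded = {round(avg)}")
```

Running it prints `12 benchmarks, mean = 21.70, rounded = 22`, which matches both the reported benchmark count and the headline average.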