Hermes 3 8B (llama.cpp, Q6_K_L)

Hermes 3 8B ist ein lokales GGUF-Finetuning von Nous Research auf Basis von Llama 3.1 8B. Die Variante ist auf Instruction-Following, Tool Use und kreative oder ambivalente Anfragen ausgelegt und läuft hier als Q6_K_L-GGUF über llama.cpp. Die bewusst reduzierte Ablehnungsrate ist ein zentrales Merkmal der Distribution.

NousResearch Version Q6_K_L (GGUF) Kommerzielle Nutzung erlaubt Dense 8 B 128 K Context 09/2024 $0 / $0 per 1M

Restricted Weights
Edge
LCL
Instruct
Uncensored-Finetuned
Real-Time

Sovereign Risk: MEDIUM NousResearch ist ein US-amerikanisches Unternehmen; CLOUD Act ist nur bei API-Nutzung relevant, nicht bei lokaler Ausführung der Open-Weights-Variante.

Schlüsselmetriken

Score · Latenz · Kosten · Qualität

Total Score Bronze: 59.21

Routine: 37.75
Reasoning: 21.46

Rank #85

LLM Judge Avg: 2.78; 100% Coverage

Avg Task Duration: 12.93; Real-Time

Token Rate: 47.99; Output Rate

P95 Latency: 28.73; Top 5 %

Total Tokens: 38.1K; Output Volume

Cost per 1K: $0; USD / 1K Requests

Benchmark Cost: $0; Total · 38.1K tok

Benchmark-Module

7 Module · gewichtet · vs. Modellmedian & Spitzenreiter

Hermes 3 8B (llama.cpp, Q6_K_L) Bestes Modell Ø Alle Modelle

Token-Effizienz & Latenz

Verbrauch pro Modul vs. Modellmedian

Hermes 3 8B (llama.cpp, Q6_K_L)

Schlüsselmetriken

Benchmark-Module

Token-Effizienz & Latenz

Token-Verbrauch pro Modul

Performance-Profil