```bash
# Linux / WSL2
curl -fsSL https://ollama.com/install.sh | sh

# macOS (Homebrew)
brew install ollama
```
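brew">
Before pulling any models, it is worth confirming that the CLI is on your PATH and that the local server is answering. A minimal check, assuming Ollama's default port 11434:

```shell
# Check the CLI and the local API. If the server reports "down", start it
# with `ollama serve` (or via the system service on Linux installs).
if command -v ollama >/dev/null 2>&1; then
  cli="found"
else
  cli="not on PATH"
fi
echo "cli: $cli"

if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  server="up"
else
  server="down"
fi
echo "server: $server"
```

On Linux the install script typically registers Ollama as a systemd service, so the server may already be running; on macOS, `brew services start ollama` or a plain `ollama serve` starts it.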
```bash
# Best overall local model
ollama pull llama4-maverick

# Best for budget VPS (8 GB RAM)
ollama pull qwen3:8b

# Best lightweight option
ollama pull mistral-small
```
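Pulls can be several GB each, so confirm they completed before configuring Hermes. A small sketch using `ollama list`, which prints each local model with its tag and size (the `command -v` guard just makes the snippet safe on machines without Ollama):

```shell
# Collect local model names (skip the header line of `ollama list`).
if command -v ollama >/dev/null 2>&1; then
  models=$(ollama list 2>/dev/null | tail -n +2 | awk '{print $1}')
else
  models=""  # ollama not installed on this machine
fi
echo "local models: ${models:-none}"
```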
```bash
# Run the model selector
hermes model
# Select "ollama" as provider, then choose your downloaded model
```
```yaml
provider: ollama
model: qwen3:8b
```
```yaml
# Llama 4 Maverick
provider: ollama
model: llama4-maverick
```
```yaml
# Qwen 3 8B
provider: ollama
model: qwen3:8b
```
```yaml
# Mistral Small
provider: ollama
model: mistral-small
```
```yaml
# DeepSeek-R1 14B
provider: ollama
model: deepseek-r1:14b
```

- Hermes Agent auto-detects Ollama models and ships per-model tool call parsers for reliable local function calling.
- Llama 4 Maverick (1M context, strong tool calling) is the best local model but needs 16+ GB RAM.
- Qwen 3 8B runs on a VPS with 8 GB RAM and handles straightforward agent tasks at zero API cost.
- Mistral Small fits in 8 GB RAM with 128K context and solid function calling — the best lightweight option.
- Hardware requirements: 8 GB RAM minimum for 7-8B models, 16 GB for 14B models, 48+ GB for 70B models.
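The RAM figures above follow a common rule of thumb rather than anything Ollama-specific: at 4-bit quantization, weights take roughly half a byte per parameter, plus a couple of GB for the KV cache and runtime. A back-of-the-envelope check (estimates, not measurements):

```shell
# Weights at Q4 ≈ parameters × 0.5 bytes; add ~1–2 GB of KV-cache and
# runtime overhead on top of each figure.
estimate() {
  awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 }'
}
gb_8b=$(estimate 8)    # 8B-parameter model
gb_14b=$(estimate 14)  # 14B-parameter model
gb_70b=$(estimate 70)  # 70B-parameter model
echo "8B:  ~${gb_8b} GB weights  -> fits in 8 GB RAM"
echo "14B: ~${gb_14b} GB weights -> wants 16 GB RAM"
echo "70B: ~${gb_70b} GB weights -> wants 48+ GB RAM"
```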
Limitations and tradeoffs:

- Response speed is slower. Local models on CPU-only hardware generate 2–10 tokens per second for 7-8B models. Cloud APIs return 50–100+ tokens per second. Interactive agent sessions feel noticeably slower without a GPU.
- Tool calling quality is lower. Even the best open-source models generate malformed tool calls more often than Claude Sonnet 4.6 or GPT-4.1. Retries consume compute time on local hardware, adding latency.
- Context windows are smaller. Qwen 3 8B (32K) and Mistral Small (128K) have smaller context windows than cloud models (1M). Hermes Agent loads tool definitions, memory, and history into every request — smaller windows mean earlier context gets truncated in long sessions.
- You manage the infrastructure. Updates, monitoring, disk space, and model downloads are your responsibility. A cloud API abstracts all of this away.
- Quantization reduces quality. Running at Q4 quantization (necessary to fit larger models in less RAM) reduces output quality compared to full-precision inference. The effect is measurable on benchmarks but often acceptable for practical agent tasks.