
Using Hermes Agent to Manage Your Infrastructure with Clanker Cloud

Run Hermes 3 locally via Ollama, connect it to Clanker Cloud via MCP, and manage your entire infrastructure with zero API cost and no data leaving your machine.

The Zero-API-Cost Infrastructure Agent

Every major AI infrastructure tool today requires an API key — OpenAI, Anthropic, Google. Every query has a token cost. For a team running infrastructure queries dozens of times per day, those costs accumulate quickly, and more importantly, every query sends your infrastructure context — pod names, resource states, AWS account data — to a third-party endpoint.

There is a different approach. Hermes 3 by NousResearch is an open-weights model specifically designed for agentic tasks: tool use, function calling, structured output, and long-context instruction following. It runs locally via Ollama. Point the Clanker Cloud desktop app at that local Ollama endpoint, and Hermes becomes the reasoning layer inside a full infrastructure workspace rather than a raw local model sitting behind ad hoc scripts.

That distinction matters. Hermes alone is just the model runtime. Clanker Cloud is what gives Hermes the live provider context, cost data, logs, topology, Deep Research scans, and MCP bridge that make the model operationally useful inside a real production workflow.

Your cloud credentials stay local. Your Hermes model stays local. Your queries stay local. For teams with compliance requirements or cost discipline, this matters.

This guide covers everything from pulling the model to running fully autonomous infrastructure monitoring — no managed API required.


What Hermes 3 Actually Is

NousResearch's Hermes 3 is a fine-tune of Meta's Llama 3.1 base models, released as open weights under Meta's Llama community license. It is available directly through Ollama, which means no third-party hosting, no rate limits, and no per-token billing.

What distinguishes Hermes 3 from a general-purpose chat model is its agentic fine-tuning. The model was specifically trained for:

  • Tool use and function calling — Hermes reliably selects the right tool from a set of available functions and formats the call correctly, even across multi-step sequences.
  • Structured output — JSON, YAML, and other structured formats come out correctly, which matters when you are generating Kubernetes manifests or Terraform blocks.
  • Long-context instruction following — system prompts with complex rules and constraints are respected consistently throughout a conversation.

These properties are why Hermes 3 is well-suited for infrastructure automation. Base Llama 3 can answer questions about infrastructure. Hermes 3 can drive infrastructure tools autonomously.
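To make the tool-use claim concrete, here is what consuming a Hermes-style tool call looks like. The JSON-in-`<tool_call>`-tags wrapper is the convention NousResearch documents for the Hermes series (verify the exact tags against your model version), and the tool name `get_pod_status` is a made-up example:

```shell
# Parse a Hermes-style tool call out of raw model output. The sample
# output and the get_pod_status tool name are illustrative only.
MODEL_OUTPUT='<tool_call>{"name": "get_pod_status", "arguments": {"namespace": "production"}}</tool_call>'

# Strip the wrapper tags, then validate and inspect the JSON payload
CALL=$(echo "$MODEL_OUTPUT" | sed -e 's|</*tool_call>||g')
echo "$CALL" | python3 -c 'import json, sys; print(json.load(sys.stdin)["name"])'
# → get_pod_status
```

Because the payload is plain JSON, an agent loop can dispatch on the "name" field and pass "arguments" straight to the matching function.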



Installation — Hermes 3 via Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Hermes 3 — choose based on your hardware
ollama pull hermes3:8b     # For 16GB+ VRAM / Mac M-series
ollama pull hermes3:70b    # For 48GB+ VRAM

# Verify the model loads and runs
ollama run hermes3:8b "what tools do you have access to?"

# Test instruction-following on an infrastructure task
ollama run hermes3:8b \
  "List the steps to diagnose a failing Kubernetes pod, in order"

# Run Ollama as a background service (required for MCP integration)
ollama serve &
# Ollama is now available at http://127.0.0.1:11434

ollama serve runs the Ollama server in the background: it loads the model on first request, keeps it warm between calls, and exposes an OpenAI-compatible API on port 11434. Any tool that supports a custom OpenAI endpoint can point to it — including Clanker Cloud.
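A quick smoke test against that endpoint looks like the following (the prompt is arbitrary, and the command falls back to a notice if no local Ollama is listening):

```shell
# Send one chat completion to the local OpenAI-compatible endpoint.
# Assumes hermes3:8b has already been pulled via `ollama pull`.
PAYLOAD='{"model": "hermes3:8b", "messages": [{"role": "user", "content": "Reply with the word ok"}]}'

RESPONSE=$(curl -s --max-time 30 http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo '{"error": "ollama not reachable at 127.0.0.1:11434"}')

echo "$RESPONSE"
```

If this returns a chat completion, any OpenAI-compatible client will work by pointing its base URL at http://127.0.0.1:11434/v1.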


Why Hermes Works Better Inside Clanker Cloud

Running Hermes in Ollama gives you a local model. Running Hermes inside Clanker Cloud gives you a local model attached to a complete infrastructure app.

  • The desktop app already manages provider connections for AWS, GCP, Azure, Kubernetes, Cloudflare, Hetzner, and DigitalOcean.
  • The Ask view and Deep Research flows gather live context from those providers, then send the relevant context to your local Hermes runtime.
  • Results come back into the app alongside the rest of the workflow: connected accounts, topology, logs, cost visibility, and review-first actions.
  • The CLI and MCP server are optional companion surfaces for terminal-heavy workflows and external agents, not the reason Hermes is useful in the first place.

The short version: Hermes is the brain, but Clanker Cloud is the operating surface. Without the app, you still have to build the cloud connections, context gathering, and workflow layer yourself.


Setting Up Clanker Cloud with Hermes

Start with the Clanker Cloud desktop app. Connect your providers there first, then point the app at your local Ollama runtime.

Clanker Cloud desktop app: Settings → AI Model → Bring Your Own Key → Ollama → model name: hermes3:8b (or hermes3:70b) → endpoint: http://127.0.0.1:11434

No API key field is required. Ollama is local.

Inside the app, Clanker Cloud gathers live state from your connected providers, assembles the relevant context, sends that context to the local Hermes endpoint at http://127.0.0.1:11434, and renders Hermes's answer back into the app. The same pattern powers Ask, Deep Research, and other local-first flows. The model stays local; the workflow is already built.

You do not need the CLI to make this work. The CLI is useful if you also want terminal commands or the local MCP server:

brew tap clankercloud/tap && brew install clanker

Once configured, the app gives Hermes the provider integrations — AWS, GCP, Azure, Kubernetes, Cloudflare, Hetzner, DigitalOcean — and Hermes handles the reasoning.

# Query active pods with restart issues
clanker ask "show me all pods in the production namespace that are restarting frequently"

# AWS cost breakdown
clanker ask "what is my AWS EC2 spend this month by instance type"

# Security posture check
clanker ask "find any S3 buckets that have public access enabled"

Compare this to the raw kubectl equivalent for that first query:

# The hard way — without Clanker Cloud (jsonpath version; assumes
# single-container pods, since extra containers spill onto their own lines)
kubectl get pods -n production \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.restartCount}{"\n"}{end}{end}' \
  | sort -k2,2nr | head -20

# Or with jq for structured output
kubectl get pods -n production -o json | \
  jq '.items[] | select(.status.containerStatuses != null) |
  {name: .metadata.name,
   restarts: [.status.containerStatuses[].restartCount] | add}
  | select(.restarts > 5)'

Both approaches give you the same data. The difference is query time and cognitive load — especially across multiple cloud providers simultaneously.
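To see what the jq variant actually emits, here is the same restart filter run against a canned two-pod payload (pod names are made up), so no cluster is required:

```shell
# Canned kubectl-style JSON: one noisy pod, one healthy pod
cat > /tmp/pods.json <<'EOF'
{"items": [
  {"metadata": {"name": "api-7f9"}, "status": {"containerStatuses": [{"restartCount": 12}]}},
  {"metadata": {"name": "web-2b1"}, "status": {"containerStatuses": [{"restartCount": 0}]}}
]}
EOF

# Same filter as the kubectl pipeline, with -c for one-line output
jq -c '.items[] | select(.status.containerStatuses != null) |
  {name: .metadata.name,
   restarts: [.status.containerStatuses[].restartCount] | add}
  | select(.restarts > 5)' /tmp/pods.json
# → {"name":"api-7f9","restarts":12}
```

The healthy pod is filtered out by the restart threshold, which is exactly the shape of answer the natural-language query returns.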

Total API cost for every clanker ask query above: $0.


Exposing Clanker Cloud as an MCP Server for Hermes

For most teams, the desktop app is the main Hermes experience already. Set up the MCP bridge only if you want an external agent to drive Clanker Cloud programmatically. This allows a Hermes agent process to call Clanker Cloud's infrastructure tools directly.

# Start Clanker Cloud MCP server
clanker mcp --transport http --listen 127.0.0.1:39393

# Verify the MCP server is responding
curl http://127.0.0.1:39393/health

The MCP server exposes three core tools that a Hermes agent can call:

  • clanker_version — check Clanker version and connected providers
  • clanker_route_question — route a natural-language infrastructure question to the correct provider and return structured results
  • clanker_run_command — execute an infrastructure command with optional --maker (plan), --apply (execute), or --destroyer (destructive ops) flags

See the full tool schema in the Clanker Cloud docs.
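If you want to poke the bridge by hand before wiring up an agent, MCP's HTTP transport speaks JSON-RPC 2.0, so a tools/call request can be sketched with curl. Treat this as an illustration, not the documented API: the exact endpoint path and response shape depend on your Clanker Cloud version.

```shell
# Hedged sketch: invoke clanker_route_question with a raw JSON-RPC 2.0
# tools/call request. Falls back to a notice if the server is down.
REQUEST='{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "clanker_route_question",
    "arguments": {"question": "are all my services healthy right now"}
  }
}'

curl -s --max-time 10 http://127.0.0.1:39393 \
  -H "Content-Type: application/json" \
  -d "$REQUEST" || echo "MCP server not reachable at 127.0.0.1:39393"
```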


Using Hermes + OpenClaw for Autonomous Infrastructure Management

OpenClaw (68K+ GitHub stars) supports Ollama as its model backend. Combined with Hermes 3 and Clanker Cloud's MCP server, this creates a fully local autonomous infrastructure agent.

# Configure OpenClaw to use Hermes 3 via Ollama
# In OpenClaw settings: Model → Custom → Ollama → hermes3:70b
# Endpoint: http://127.0.0.1:11434

# Register Clanker Cloud as an MCP skill for OpenClaw
openclaw mcp set clanker-cloud --url http://127.0.0.1:39393

Now create a HEARTBEAT.md file that OpenClaw runs on a schedule:

# Infrastructure Monitoring — runs every 30 minutes

## Tasks

- [ ] Check all services health: ask clanker-cloud "are all my services healthy right now"
- [ ] Check for new alerts: ask clanker-cloud "show any new CloudWatch alarms fired in the last 30 minutes"
- [ ] Check pod restarts: ask clanker-cloud "find any pods that restarted more than 3 times in the last hour"

## Weekly (Mondays)

- [ ] Run cost review: ask clanker-cloud "what is my cloud spend this week vs last week"
- [ ] Run deep research: ask clanker-cloud "run a deep research scan and find misconfigurations and cost waste"

OpenClaw runs this file every 30 minutes. Hermes 3 is the reasoning engine. Clanker Cloud is the infrastructure data source. Total API cost: $0.

The deep research scan fans out across every connected provider simultaneously, runs parallel agent swarms, and returns severity-ranked findings — all driven by your local Hermes model.


What Hermes + Clanker Cloud Can Handle Autonomously

Incident Triage

An alert fires. OpenClaw triggers. Hermes 3 reasons through the alert context and calls clanker_route_question with a structured query:

clanker ask "what changed in the last 30 minutes that could explain a 503 spike on the payments service"

Kubernetes equivalent — what you'd run manually without Clanker:

# Check recent events in the namespace
kubectl get events -n payments --sort-by='.lastTimestamp' | tail -30

# Find pods that restarted recently
kubectl get pods -n payments -o json | \
  jq '.items[] | select(.status.containerStatuses != null) |
  {name: .metadata.name,
   restarts: [.status.containerStatuses[].restartCount] | add,
   states: [.status.containerStatuses[].state]}'

# Check if any deployments were rolled out recently
kubectl rollout history deployment -n payments

# Describe any pods not in Running state
kubectl get pods -n payments --field-selector=status.phase!=Running -o wide

Hermes reasons through all of this context, summarizes the probable cause, and posts it to Slack. You get the diagnosis without being the one who ran all those commands at 2 AM.
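The Slack step at the end can be as small as an incoming-webhook POST. In this sketch the summary text is a placeholder and SLACK_WEBHOOK_URL is an environment variable you configure yourself, not something Clanker Cloud sets:

```shell
# Build a Slack payload from the Hermes summary, then post it via an
# incoming webhook if one is configured. Summary text is illustrative.
SUMMARY="payments 503 spike correlates with a rollout at 01:47 UTC"
PAYLOAD=$(printf '{"text": "%s"}' "$SUMMARY")
echo "$PAYLOAD"

if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```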

Cost Monitoring

# Weekly
clanker ask "compare my cloud spend this week vs last week across all providers and flag any anomalies"

# Monthly audit
clanker ask "run a deep research scan and find the top 5 cost optimization opportunities"

Security Checks

clanker ask "scan for any new public S3 buckets, open security groups, or IAM policies created in the last 7 days"

Kubernetes Health

# Find pods in failure states
clanker ask "list all pods in CrashLoopBackOff state and summarize why they're failing"

# Check replication status
clanker ask "find all Kubernetes deployments with fewer replicas than their desired count"

The kubectl equivalents:

# CrashLoopBackOff pods across all namespaces. Note: these pods usually
# report status.phase=Running, so filter on the container waiting reason
# rather than using --field-selector=status.phase!=Running.
kubectl get pods --all-namespaces -o json | jq '.items[] | select(
    .status.containerStatuses != null and
    ([.status.containerStatuses[].state.waiting.reason] | any(. == "CrashLoopBackOff"))
  ) | {namespace: .metadata.namespace, name: .metadata.name,
       reasons: [.status.containerStatuses[].state.waiting.reason],
       messages: [.status.containerStatuses[].state.waiting.message]}'

# Deployments not at desired replica count
kubectl get deployments --all-namespaces -o json | \
  jq '.items[] | select(.status.availableReplicas < .spec.replicas) |
  {namespace: .metadata.namespace,
   name: .metadata.name,
   desired: .spec.replicas,
   available: .status.availableReplicas}'

# OOM-killed pods (useful alongside the above)
kubectl get events --all-namespaces --field-selector reason=OOMKilling

Hermes vs. Managed API Models for Infrastructure

The comparison is not capability — it is cost and data residency.

vs. Claude Opus 4.6: Hermes 3 70B is less capable on complex multi-step reasoning tasks, but it runs at zero marginal cost versus roughly $15–75 per million tokens. For routine operations queries and incident triage, Hermes 3 is sufficient.

vs. GPT-5.4 Pro: GPT-5.4 Pro is the right choice for generating complex Terraform modules, deep security audits, or novel IaC generation. It is overkill for "show me pods restarting frequently" or "what is my EC2 spend this month."

The practical pattern: Run Hermes 3 locally for the 90% of daily queries that are monitoring, status checks, and standard incident triage — at zero cost. Reserve Claude Opus 4.6 or GPT-5.4 via BYOK for deep research scans and complex audits where the reasoning ceiling matters. See the full BYOK model comparison for specifics on when to use which model.
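One way to sketch that split is a tiny routing shim in front of your query path. The route_model function, the "managed-byok" label, and the keyword list are all hypothetical, not part of the Clanker Cloud CLI; tune the heuristic for your own workload:

```shell
# Hypothetical routing shim: keep routine queries on local Hermes and
# escalate reasoning-heavy work to a managed BYOK model.
route_model() {
  case "$1" in
    *"deep research"*|*audit*|*terraform*) echo "managed-byok" ;;
    *)                                     echo "hermes3:8b"  ;;
  esac
}

route_model "show me pods restarting frequently"        # → hermes3:8b
route_model "run a deep research scan for cost waste"   # → managed-byok
```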

This setup also fits naturally into teams that want to move from vibe-coding to production-grade infrastructure automation — Hermes handles the routine layer while managed models handle the complex edge cases.


Hardware Requirements

  • hermes3:8b: 8–16GB VRAM, CPU-only possible (slow); best for monitoring and simple queries
  • hermes3:70b: 48GB VRAM (2× RTX 3090 or A100), not practical CPU-only; best for complex reasoning and multi-step tasks

Starting point for most teams: hermes3:8b on a Mac Studio (M4 Pro, 36GB unified memory) or a single workstation with a 24GB GPU (RTX 3090 or 4090). It runs comfortably within the memory budget and handles routine infrastructure queries without delay.

Full autonomous infrastructure management: hermes3:70b on a Hetzner AX102 (two RTX 3090s, 48GB VRAM total) at €189/month. Fully self-hosted, fully offline. For teams managing significant cloud infrastructure, this pays for itself quickly relative to managed API costs.
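A rough way to sanity-check these sizes yourself: multiply parameter count by bytes per weight at your quantization level, then pad for KV cache and runtime overhead. The 20% pad is a rule of thumb of ours, not an Ollama-published figure:

```shell
# Back-of-envelope VRAM estimate: params (billions) x bytes per weight,
# plus ~20% for KV cache and runtime overhead. Rule of thumb only.
estimate_gb() {  # usage: estimate_gb <params_in_billions> <bytes_per_weight>
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b * 1.2 }'
}

estimate_gb 8 1      # hermes3:8b at 8-bit, ~9.6GB: fits the 8-16GB row
estimate_gb 70 0.5   # hermes3:70b at 4-bit, ~42GB: fits in 48GB of VRAM
```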


FAQ

What is Hermes 3 and why use it for infrastructure management?

Hermes 3 is an open-weights model from NousResearch, fine-tuned from Llama 3.1 with a focus on agentic tasks: tool use, function calling, structured output, and complex instruction following. Unlike general-purpose chat models, Hermes 3 reliably calls tools in sequence and formats structured output correctly — both critical properties for infrastructure automation. It is released under Meta's Llama community license and runs locally via Ollama, which means zero API cost and no data leaving your network.

How do I connect Hermes to Clanker Cloud?

Install the Clanker Cloud desktop app, connect your providers, and run Ollama locally with your chosen Hermes model (ollama pull hermes3:8b, then ollama serve). In the app, go to Settings → AI Model → Bring Your Own Key → Ollama, set the model to hermes3:8b or hermes3:70b, and set the endpoint to http://127.0.0.1:11434. That is enough to make Hermes the active reasoning model inside the app. Install the CLI only if you also want terminal commands or the local MCP server for external agents.

Can Hermes 3 run autonomous infrastructure monitoring?

Yes. Using OpenClaw with Ollama as the model backend and Clanker Cloud registered as an MCP skill, you can run a HEARTBEAT.md file on a 30-minute schedule. Hermes 3 processes each check, calls clanker_route_question or clanker_run_command for live infrastructure data, and can post summaries to Slack or write findings to a log. The entire loop — model inference, infrastructure queries, output — runs locally. For more on building AI-driven DevOps workflows, see AI DevOps for teams.

How does Hermes compare to Claude or GPT-5.4 for infrastructure tasks?

Hermes 3 70B is less capable than Claude Opus 4.6 or GPT-5.4 Pro on complex multi-step tasks like generating novel Terraform modules from scratch or running deep cross-provider security audits. For those tasks, use a managed model via BYOK. Hermes 3 is well-suited for the majority of daily infrastructure queries: status checks, incident triage, cost monitoring, and Kubernetes health — all at zero API cost. The practical approach is to run Hermes for routine work and escalate to managed APIs for complex analysis. Full comparison at /faq.


Get Started

Run hermes3:8b locally today, point the Clanker Cloud app at it, and use Hermes inside a real infrastructure workspace instead of wiring a local model to cloud tools by hand.