Most AI DevOps tools have a quiet problem: they're cloud-hosted. Every natural-language query about your infrastructure, every generated deployment plan, every incident triage prompt — it all leaves your network and lands on someone else's servers. For personal side projects, that tradeoff is fine. For regulated industries, government contracts, or organizations with strict data handling policies, it's a blocker.
This guide covers a stack that eliminates that tradeoff: Hermes 3 by NousResearch running locally via Ollama, wired into Clanker Cloud's MCP server. The result is an AI DevOps agent that can query live AWS, GCP, Kubernetes, and other infrastructure contexts — with zero data egress, no per-token API costs, and no credential exposure to a third-party model provider.
Why Local Models Matter for DevOps Agents
When you describe your infrastructure to a cloud-hosted LLM — pod counts, environment variables, deployment configs, cost breakdowns — you are sending sensitive operational data off-premises. Most enterprise security policies don't account for this yet, but they will. And in healthcare (HIPAA), finance (SOX, PCI-DSS), and government (FedRAMP), the answer is already clear: it doesn't fly.
Beyond compliance, there are practical reasons to prefer local inference for DevOps agents:
- No per-token cost. A cost audit agent that runs on a daily cron schedule against your full cloud inventory would cost real money if routed through OpenAI or Anthropic. Running locally, it costs electricity.
- Offline operation. Incident response shouldn't depend on whether api.openai.com is reachable. A local agent works even when external services are degraded.
- No rate limits. Cloud APIs throttle. A local model doesn't.
- Auditability. You control every inference call. Nothing is logged by a third party.
The missing piece, historically, was that local models weren't reliable enough at structured function calling to be useful in agent frameworks. That has changed.
What Hermes 3 Is and Why It Works Here
Hermes 3 is an open-source model series from NousResearch, fine-tuned on Meta's Llama 3.1 base (available in 8B, 70B, and 405B parameter sizes). Where most fine-tunes optimize for general conversational quality, Hermes 3 was explicitly trained for:
- Reliable function calling. Hermes uses a structured <tool_call>/<tool_response> format within the ChatML prompt template. The model consistently outputs parseable JSON function calls without needing special prompting tricks.
- Agentic multi-step reasoning. Hermes 3 supports a <scratch_pad> mechanism — a Goal-Oriented Action Planning (GOAP) framework where the model reasons through a task, plans its tool calls, observes results, and reflects before taking the next step. This is critical for non-trivial DevOps workflows.
- Structured output / JSON mode. When you need the agent to produce a typed report (cost anomalies, config drift, triage summary), Hermes can output strict JSON conforming to a Pydantic schema (see the sketch after this list).
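As a hedged illustration of that JSON mode, here is a minimal sketch using Ollama's structured-outputs feature to pin Hermes to a Pydantic schema. The TriageSummary fields are hypothetical, chosen for this example rather than taken from any Hermes or Clanker Cloud API:

from pydantic import BaseModel
from ollama import chat

# Hypothetical report schema, for illustration only
class TriageSummary(BaseModel):
    probable_cause: str
    evidence: list[str]
    recommended_actions: list[str]

resp = chat(
    model="hermes3",
    messages=[{"role": "user", "content": "Pod api-7f9c is crashlooping after a config change. Summarize."}],
    format=TriageSummary.model_json_schema(),  # constrain output to this JSON schema
)
report = TriageSummary.model_validate_json(resp.message.content)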
The Hermes 3 technical report describes it directly: "When combined, Hermes 3 can perform planning, incorporate outside data, and make use of external tools in an interpretable and transparent manner out-of-the-box, making it an excellent choice for agentic tasks."
For DevOps agent work specifically, the 8B variant handles most tasks with low latency on a machine with 16GB of RAM. The 70B variant produces noticeably more careful reasoning for complex multi-step problems like incident triage across distributed systems.
The Stack: Hermes + Ollama + Agent Framework + Clanker Cloud MCP
Here is how the four layers fit together:
┌─────────────────────────────────────┐
│ Agent Framework │ CrewAI / LangChain / AutoGen
│ (task decomposition, tool routing) │
└────────────────┬────────────────────┘
│
┌────────────────▼────────────────────┐
│ Hermes 3 via Ollama │ Local inference, function calling
│ (LLM brain of the agent) │
└────────────────┬────────────────────┘
│ MCP protocol
┌────────────────▼────────────────────┐
│ Clanker Cloud Desktop │ Live infra context, auth layer
│ (MCP server + cloud connectors) │
└────────────────┬────────────────────┘
│
┌────────────────▼────────────────────┐
│ Cloud Providers & Platforms │ AWS · GCP · Azure · K8s · GitHub
│ │ Cloudflare · Hetzner · DigitalOcean
└─────────────────────────────────────┘
Hermes 3 is the language model. It receives tool schemas from the agent framework, decides which tools to call, parses the responses, and generates plans or reports.
Ollama is the local inference runtime. It serves Hermes via an OpenAI-compatible API (the server listens on http://localhost:11434, with the OpenAI-compatible routes under /v1), which means any LangChain or CrewAI code that can talk to OpenAI can talk to Hermes locally with a one-line change.
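To make that concrete, here is a minimal sketch of the one-line change using the official OpenAI Python client. The api_key value is a placeholder; Ollama does not check it, but the client requires one:

from openai import OpenAI

# Point the standard OpenAI client at local Ollama instead of api.openai.com
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="hermes3",
    messages=[{"role": "user", "content": "List three Kubernetes health checks."}],
)
print(resp.choices[0].message.content)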
The agent framework (CrewAI, LangChain, or others) handles task decomposition, agent roles, memory, and tool invocation. It is the orchestration layer that turns Hermes's function calling into multi-step workflows.
Clanker Cloud is the infrastructure layer. It is a local-first desktop app that holds your cloud credentials on your machine and exposes all connected infrastructure — AWS, GCP, Azure, Kubernetes, Cloudflare, GitHub, and more — as a live MCP server. Your agent framework connects to that MCP endpoint and gets access to real-time infrastructure state without ever handling credentials directly.
Clanker Cloud also operates read-first. Before any change is applied, it generates a reviewed plan. Changes only execute in explicit "maker mode." This means a Hermes agent querying infrastructure context through Clanker Cloud has safe read access by default — it cannot accidentally delete a production database.
Setup Walkthrough
Step 1: Install Ollama and pull Hermes 3
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Start the Ollama server
ollama serve
# Pull Hermes 3 (8B is a good starting point)
ollama pull hermes3
# Verify it runs
ollama run hermes3 "List three Kubernetes health checks in JSON"
The model name on Ollama is hermes3; the default tag is the 8B variant, with 70b and 405b tags also available. It maps to the NousResearch Hermes-3-Llama-3.1 series.
Step 2: Install Clanker Cloud desktop
Download the desktop app from clankercloud.ai and install it. During setup, connect your cloud providers. Credentials are stored locally — they never leave your machine.
Clanker Cloud is a BYOK (bring your own keys) platform. Hermes via Ollama is one of the supported model options, alongside Gemma 4, Claude Code, and Codex; because Hermes inference runs locally through Ollama, no external API key is required.
Once your providers are connected, the MCP server endpoint is available at http://localhost:PORT/mcp (the exact port is shown in the desktop app settings). See the full docs at docs.clankercloud.ai for the MCP server configuration.
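Before wiring in an agent framework, you can sanity-check the endpoint with the official mcp Python SDK. A minimal sketch; the port 3100 is an assumption, so substitute the value shown in the desktop app settings:

import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def list_clanker_tools():
    # Open a streamable-HTTP session against the local MCP endpoint
    async with streamablehttp_client("http://localhost:3100/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name)  # the infra tools your agent will see

asyncio.run(list_clanker_tools())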
Step 3: Install your agent framework
pip install crewai langchain langchain-ollama langchain-mcp-adapters langgraph
Step 4: Wire Hermes to Clanker Cloud MCP
LangChain example:
from langchain_ollama import ChatOllama
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

# Local Hermes model — no API key, no data egress
llm = ChatOllama(model="hermes3", base_url="http://localhost:11434")

# Connect to Clanker Cloud's MCP server for live infra tools
async def build_agent():
    client = MultiServerMCPClient(
        {
            "clanker-cloud": {
                "url": "http://localhost:3100/mcp",  # Port from Clanker Cloud desktop
                "transport": "streamable_http",
            }
        }
    )
    tools = await client.get_tools()  # MCP tools surfaced as LangChain tools
    return create_react_agent(llm, tools)
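A usage sketch for the agent above; the question is illustrative:

import asyncio

async def main():
    agent = await build_agent()
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "Which namespaces saw deployments in the last 2 hours?"}]}
    )
    print(result["messages"][-1].content)  # final model answer

asyncio.run(main())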
CrewAI example:
from crewai import Agent, Task, Crew, LLM
# Hermes 3 via Ollama
hermes_llm = LLM(
model="ollama/nous-hermes-3",
base_url="http://localhost:11434",
temperature=0.1, # Low temperature for consistent structured output
)
# Clanker Cloud MCP tools are loaded separately via mcp_server_url
devops_agent = Agent(
role="Infrastructure Analyst",
goal="Query live infrastructure state and produce triage reports",
backstory="A senior SRE with deep knowledge of AWS, GCP, and Kubernetes.",
llm=hermes_llm,
verbose=True,
)
For team workflows, see the AI DevOps for teams guide for multi-agent patterns. For getting from prototype to production, the vibe coding to production guide covers the deployment path.
Use Case 1: Incident Triage Agent
An alert fires at 2 AM. A pod is crashlooping in your production Kubernetes cluster. Before you dig in manually, your Hermes triage agent can do the first pass.
The agent wakes (triggered via webhook or Alertmanager integration), connects to Clanker Cloud's MCP server, and executes a sequence of read queries:
- List recent deployments in the affected namespace (last 2 hours)
- Pull current pod status and restart counts
- Fetch recent config changes from the Clanker Cloud change log
- Cross-reference with GitHub — any recent merges to the production branch?
Hermes's <scratch_pad> reasoning framework makes it well-suited to this sequential investigation. It plans what to check, observes results, and revises its hypothesis before writing a triage report.
Example triage task:
triage_task = Task(
description="""
A CrashLoopBackOff has been detected for pod {pod_name} in namespace {namespace}.
Query Clanker Cloud for:
1. Recent deployments in the namespace (last 2 hours)
2. Current pod resource limits vs. actual usage
3. Any config changes in the last 24 hours
Generate a structured triage report with probable root cause and next steps.
""",
expected_output="JSON triage report with probable_cause, evidence, and recommended_actions fields",
agent=devops_agent,
)
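To run the triage, here is a minimal sketch assembling a one-task crew from the pieces above. The pod name and namespace inputs are placeholders that would normally come from the alert payload:

crew = Crew(agents=[devops_agent], tasks=[triage_task], verbose=True)
result = crew.kickoff(inputs={"pod_name": "api-7f9c", "namespace": "prod"})  # values interpolate into the task description
print(result.raw)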
Everything stays local. The pod names, namespace configs, and recent change history never leave your machine.
Use Case 2: Cost Audit Agent
Cloud costs drift. A forgotten load balancer, an oversized RDS instance, a Lambda function that started hitting a paid tier — these accumulate quietly. A cost audit agent on a weekly cron is a cheap way to surface anomalies before they become surprises.
# cron: 0 8 * * 1 (every Monday at 8 AM)
cost_audit_task = Task(
description="""
Connect to Clanker Cloud and retrieve the last 7 days of cloud cost data across
all connected providers. Identify:
- Any service with >20% week-over-week cost increase
- Resources running in unexpected regions
- Idle resources (compute with <5% average CPU over 7 days)
Write the findings to /reports/cost-audit-{date}.json
""",
expected_output="Cost anomaly report with resource ID, current cost, baseline cost, and anomaly type",
agent=devops_agent,
)
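A hypothetical runner script (run_cost_audit.py) that the cron entry above can invoke, assuming devops_agent and cost_audit_task are importable from a module of your own:

# run_cost_audit.py — invoked by cron
from datetime import date
from crewai import Crew

from audit_tasks import devops_agent, cost_audit_task  # hypothetical module

crew = Crew(agents=[devops_agent], tasks=[cost_audit_task])
crew.kickoff(inputs={"date": date.today().isoformat()})  # fills {date} in the report path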
The agent handles auth by querying through Clanker Cloud — the cloud provider credentials stay in the desktop app, not in the agent code. No AWS access key is ever passed to Hermes.
Use Case 3: Pre-Deploy Config Review
Before pushing a deployment, it is useful to know whether the target environment matches what you're expecting. Config drift is a common source of failed deploys — a Kubernetes namespace that was manually patched, an environment variable that was never updated, a service account that was rotated without updating the secret.
A Hermes agent connected to Clanker Cloud can automate this check as a CI/CD step:
pre_deploy_task = Task(
description="""
Before deploying {service_name} version {version} to production:
1. Query the current production Kubernetes config for {service_name} via Clanker Cloud
2. Compare actual config against the expected config in {config_file}
3. Flag any mismatches: image tags, replica counts, resource limits, env vars, secrets
4. Return PASS or FAIL with a diff report
""",
expected_output="Pre-deploy validation report with status (PASS/FAIL) and list of config mismatches",
agent=devops_agent,
)
This can block a deploy pipeline on FAIL, or surface warnings for review. Because it is running local inference, it adds minimal latency to the pipeline — no round-trip to an external API.
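One hedged way to enforce that gate is a small script whose non-zero exit fails the CI job. A sketch; the service name, version, config path, and module are placeholders:

import sys
from crewai import Crew

from pipeline_tasks import devops_agent, pre_deploy_task  # hypothetical module

crew = Crew(agents=[devops_agent], tasks=[pre_deploy_task])
result = crew.kickoff(inputs={
    "service_name": "checkout-api",
    "version": "v2.4.1",
    "config_file": "deploy/production.yaml",
})
if "FAIL" in result.raw:
    print(result.raw)
    sys.exit(1)  # non-zero exit blocks the deploy job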
See the demo for a live walkthrough of this workflow in Clanker Cloud.
Security and Compliance
The standard objection to AI-assisted DevOps in regulated environments is: "We can't send infrastructure data to OpenAI." This stack is the answer to that objection.
Zero egress. Hermes runs on your machine via Ollama. Clanker Cloud stores credentials locally. The MCP server runs locally. No infrastructure data travels off-premises during agent operation.
Credential isolation. The agent code never handles cloud provider credentials. It calls the Clanker Cloud MCP server, which handles auth internally. This means your AWS access keys, Kubernetes service account tokens, and other credentials are never present in agent prompts or logs.
Auditability. Every tool call the agent makes is logged locally. Clanker Cloud's read-first model means a full audit trail of what was queried and when. No black-box inference on a remote server.
Suitable for regulated industries. The stack can be deployed in:
- Healthcare environments requiring HIPAA-compliant data handling
- Financial services with SOX or PCI-DSS obligations
- Government systems operating under FedRAMP or equivalent frameworks
- Any organization with a data residency requirement
The question "can we use AI for DevOps without sending data to OpenAI?" has a concrete yes here.
FAQ
What is Hermes 3 and how does it compare to GPT-4?
Hermes 3 is an open-source instruction-following and function-calling model by NousResearch, built on Meta's Llama 3.1 base. It is not directly comparable to GPT-4 in general capability, but for structured tool use and function calling in agent frameworks, it performs at a level that makes it production-viable. The key difference is deployment: Hermes 3 runs fully locally via Ollama, with no API key, no data egress, and no per-token cost. GPT-4 is a cloud-hosted proprietary model. For DevOps agents in regulated environments, that distinction matters more than benchmark scores.
Can I run an AI DevOps agent locally without sending data to the cloud?
Yes. The Hermes + Ollama + Clanker Cloud stack described in this guide operates entirely on your local machine. Hermes performs inference locally. Clanker Cloud holds cloud provider credentials locally and serves live infrastructure context via a local MCP server. No infrastructure data, no model prompts, and no query results leave your network.
How do I use Hermes with Clanker Cloud?
Install Ollama, pull Hermes 3 with ollama pull hermes3, and install the Clanker Cloud desktop app. Connect your cloud providers in the desktop app, note the MCP server endpoint, and configure your agent framework (LangChain or CrewAI) to use Hermes as the LLM and Clanker Cloud's MCP endpoint as the tool source. Full setup is covered in the walkthrough above, and detailed documentation is at docs.clankercloud.ai.
What agent frameworks work with Hermes 3?
Any framework that accepts an OpenAI-compatible API endpoint works with Hermes via Ollama. This includes CrewAI, LangChain, AutoGen, and LlamaIndex. The most common pattern is configuring ChatOllama(model="hermes3", base_url="http://localhost:11434") as the LLM within the framework. CrewAI's LLM class also supports LLM(model="ollama/hermes3") directly. For MCP tool integration, LangChain's MCP adapters or the CrewAI MCP toolset handle the protocol bridge to Clanker Cloud. See the for-agents page for integration patterns and the FAQ for common setup questions.
Get Started
The Clanker Cloud desktop app is free during beta. Connect your first cloud provider, start the MCP server, and wire in Hermes via the snippets above.
Full documentation — including MCP server configuration, supported providers, and BYOK model setup — is at docs.clankercloud.ai.
For teams already running infrastructure workflows, the AI DevOps for teams guide covers multi-agent patterns, role-based access, and shared workspace setup.
