Most AI DevOps tools are hosted SaaS products. You send your cloud credentials and infrastructure queries to their servers. Their model processes the request. Their servers return an answer. For many teams — those with compliance requirements, data residency obligations, or existing AI tooling investments — that architecture is a non-starter.
Clanker Cloud works differently. You bring your own AI. Whether that means running Gemma 4 locally for DevOps with zero network egress, or connecting Claude Code or Codex as an MCP-enabled agent that can query your live infrastructure, the model is yours. The data flow is yours. The cost is yours — no token markup.
This article covers the four primary integration patterns (Gemma 4 local inference, Claude Code, Codex, and Hermes) and how each connects to Clanker Cloud's local infrastructure query layer.
What BYOK Actually Means in AI DevOps
"Bring Your Own Keys" gets thrown around loosely. Here's the precise meaning in the context of Clanker Cloud:
- You supply your own API key (Anthropic, OpenAI, Gemini, or any OpenAI-compatible provider), or you point at a local endpoint (Ollama, LM Studio, llama.cpp).
- When you ask a question about your infrastructure, that query travels: your machine → your AI provider → back to your machine.
- Clanker Cloud's servers are never in that path. Your queries, your infrastructure metadata, your resource names, and your cloud credentials do not transit Clanker Cloud's infrastructure.
- You control cost directly — you're billed by your own provider at their standard rates, with no platform markup on tokens.
Contrast this with most hosted AI DevOps tools: they act as a proxy. Your data enters their system, gets processed by their chosen model at their markup, and returns to you. That's a fundamentally different data flow — one that may conflict with your security posture, data residency requirements, or existing contract obligations.
Clanker Cloud's architecture separates two things that most tools conflate: the infrastructure query layer (fetching live state from AWS, GCP, Azure, Kubernetes, Cloudflare, etc.) and the AI layer (reasoning over that state). You own the AI layer completely.
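To make that separation concrete, here's a minimal sketch in Python, assuming a local Ollama endpoint. The fetch_live_state function is an illustrative stand-in for Clanker Cloud's query layer, not its actual API:

# Conceptual sketch of the two-layer split. fetch_live_state stands in for
# the query layer (illustrative only); the AI layer is whatever
# OpenAI-compatible endpoint you configured.
from openai import OpenAI

def fetch_live_state(question: str) -> str:
    # In reality: cloud API calls made locally with your own credentials.
    return "ECS service frontend-prod: 3 running tasks, last deploy 14m ago"

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
context = fetch_live_state("state of frontend-prod")
response = client.chat.completions.create(
    model="gemma4:27b",  # tag assumed; any model served by your endpoint works
    messages=[{
        "role": "user",
        "content": f"Given this live infrastructure state:\n{context}\n\nWhat stands out?",
    }],
)
print(response.choices[0].message.content)

Swap the base URL and API key for a cloud provider and nothing else changes: the query layer stays local either way.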
Option 1: Gemma 4 for Fully Local Inference
What Gemma 4 Is
Gemma 4 is Google's open-weight model family released in 2025/2026. The two variants most relevant for infrastructure operations are Gemma 4 12B (fast, runs well on a modern developer workstation) and Gemma 4 27B (more capable reasoning, better for complex multi-step infrastructure analysis). Gemma 4 brings multimodal improvements and stronger reasoning than earlier Gemma generations, and the 27B variant handles instruction following and structured output well, both of which matter for infrastructure queries.
It runs on Ollama, LM Studio, llama.cpp, and any server that exposes an OpenAI-compatible /v1 endpoint.
Why It Matters for Infrastructure Operations
When you run Gemma 4 locally for DevOps, everything stays on your machine:
- Cloud credentials never touch an external API
- Resource names, IP ranges, IAM policies, cost data — local only
- Queries and responses — local only
- Zero network egress on the AI side
For teams in regulated industries — healthcare (HIPAA), finance (SOC 2, PCI-DSS), defence, critical infrastructure — this is often a requirement, not a preference. For EU teams subject to strict GDPR data residency rules, local inference eliminates the question of whether your cloud metadata constitutes personal data that can't leave a jurisdiction. For air-gapped production environments, it's the only option.
It's also increasingly viable for Hetzner-hosted workloads: a Hetzner dedicated server with an H100 or two A100s running Ollama and Gemma 4 27B is a legitimate production inference setup.
How to Set It Up with Clanker Cloud
Step 1 — Run Gemma 4 locally via Ollama:
ollama pull gemma4:27b
ollama run gemma4:27b
(Model tags here are assumed; check the Ollama library for the exact Gemma 4 tags.)
Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1 by default.
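Before pointing Clanker Cloud at it, you can sanity-check the endpoint with a few lines of Python (assuming Ollama's default port):

# List models on Ollama's OpenAI-compatible endpoint.
import json, urllib.request

with urllib.request.urlopen("http://localhost:11434/v1/models") as resp:
    models = json.load(resp)

# The list should include the Gemma tag you pulled in Step 1.
print([m["id"] for m in models["data"]])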
Step 2 — Point Clanker Cloud at your local endpoint:
In Clanker Cloud's settings, choose "Custom / Local" as your AI provider and enter:
Base URL: http://localhost:11434/v1
Model: gemma4:27b
API Key: (leave blank or use "ollama")
Step 3 — Use Clanker Cloud normally. Your infrastructure queries go to Gemma 4 running on your machine. No API call is made to any external AI service.
Trade-offs
Gemma 4 27B requires approximately 20GB VRAM for GPU inference or ~32GB RAM for CPU-only inference. Query latency is higher than cloud API models — expect 5–30 seconds for complex infrastructure questions depending on your hardware, versus 2–5 seconds with a cloud API. For most infrastructure investigation tasks (topology queries, cost analysis, incident triage), the quality is excellent. For highly complex multi-hop reasoning chains, the 27B model outperforms the 12B significantly.
Gemma 4 12B is faster and works on a machine with 16GB RAM or an 8GB GPU. It's well-suited for quick "what's running in this namespace?" queries during development.
Option 2: Claude Code as Your Infrastructure Agent
What Claude Code Is
Claude Code is Anthropic's agentic coding tool. Engineers use it to write code, debug, refactor, and run terminal tasks — increasingly as a full end-to-end development workflow manager. It's particularly strong at reasoning over multiple files, understanding codebases, and executing multi-step plans.
The MCP Integration
Clanker Cloud exposes a Model Context Protocol (MCP) server endpoint. Claude Code can call into Clanker Cloud mid-task to fetch live infrastructure context — without you switching tools.
Example workflow — debugging a deployment failure:
You're in Claude Code, investigating why a production deploy failed. Instead of switching to a separate infrastructure console:
You (in Claude Code): "Debug why the ECS service frontend-prod failed to deploy"
Claude Code → calls Clanker Cloud MCP: "What's the current state of the frontend-prod ECS service in us-east-1?"
Clanker Cloud: [returns live ECS service status, task definition diff, CloudWatch errors, recent deployments]
Claude Code: uses that context to identify the root cause and suggest a fix
The infrastructure context is available inside Claude Code's reasoning loop. You never leave your terminal. Reasoning over code and reasoning over infrastructure state happen in the same context window.
This is most powerful for Kubernetes workflows in Claude Code: deployments, rollbacks, scaling events, and pod-level debugging all become queries that Claude Code can issue inline.
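To give a sense of what happens under the hood, here's a minimal sketch of the equivalent MCP call using the official Python SDK. The endpoint URL and the query_infrastructure tool name are assumptions for illustration; once the server is registered, Claude Code issues this kind of call for you:

# Minimal MCP client sketch (endpoint URL and tool name are assumed,
# not Clanker Cloud's documented values).
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("http://localhost:8080/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "query_infrastructure",  # hypothetical tool name
                {"query": "current state of the frontend-prod ECS service in us-east-1"},
            )
            print(result.content)

asyncio.run(main())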
Data Flow Note
With Claude Code, your infrastructure context (the queries and the fetched state) travels to Anthropic's API. Your cloud credentials themselves stay on your machine — Clanker Cloud fetches infrastructure data using your credentials locally, then passes only the relevant context to Claude Code. If sending infrastructure metadata to Anthropic is acceptable for your use case, this is the highest-quality reasoning option available.
For technical MCP setup documentation, see Clanker Cloud's agent integration docs.
Option 3: Codex for Coding-to-Infrastructure Workflows
What Codex Is in 2025/2026
OpenAI's Codex has evolved into an agentic coding system used for automated coding tasks, PR generation, and increasingly, DevOps automation. Codex CLI runs locally and can execute multi-step tasks that involve reading, writing, and running code. It's the natural home for vibe coders who build in OpenAI's ecosystem and need to connect AI-generated code to real cloud infrastructure.
Codex Infrastructure Automation via MCP
The integration uses the same MCP pattern as Claude Code. Codex agent workflows can call Clanker Cloud to get live infrastructure context inline — turning Codex infrastructure automation into a real workflow.
Example — Kubernetes manifest validation:
Codex generates a Kubernetes deployment manifest for a new service.
Before applying, Codex queries Clanker Cloud:
"Does this deployment conflict with anything currently running in the production cluster?
What's the current resource utilization on the nodes it would be scheduled on?"
Clanker Cloud returns live cluster state.
Codex revises the manifest based on actual constraints.
This closes the loop between AI-generated code and production reality, a gap behind many "it worked in dev" failures. For teams building on an OpenAI-first stack who want infrastructure-aware automation, this is the practical path; a sketch of the pre-apply check follows below.
For vibe coders in particular: generating infrastructure-touching code without knowing the current state of your infrastructure is how you accidentally create duplicate resources, blow your budget, or break existing services. Codex + Clanker Cloud MCP means the agent knows what it's deploying into.
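Here is that pre-apply check sketched in Python, assuming the agent wraps its MCP call in a simple callable (query_live is illustrative, not a real Clanker Cloud API):

# Pre-apply guard: consult live cluster state before applying an
# AI-generated manifest.
from typing import Callable

def validate_before_apply(manifest: dict, query_live: Callable[[str], str]) -> str:
    name = manifest["metadata"]["name"]
    namespace = manifest["metadata"].get("namespace", "default")
    conflicts = query_live(
        f"Is anything named {name} already deployed in namespace {namespace}?"
    )
    capacity = query_live(
        f"Current CPU and memory headroom on nodes eligible to schedule {name}"
    )
    # The agent reasons over these answers and revises the manifest if needed.
    return f"Conflicts: {conflicts}\nCapacity: {capacity}"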
Option 4: Hermes for Local Agent Workflows
What Hermes Is
NousResearch's Hermes model series — Hermes 3, Hermes Pro — is fine-tuned specifically for function calling, tool use, and agentic reasoning. It's widely used in local agent frameworks: LangChain, LlamaIndex, AutoGen, CrewAI, and custom agent pipelines.
Hermes differs from general-purpose local models in a meaningful way: it's trained to call tools reliably, handle structured output, and reason through multi-step plans: exactly the capability profile agentic DevOps use cases demand, such as incident triage, automated runbooks, and infrastructure investigation pipelines.
The Integration
Run Hermes locally via Ollama or LM Studio, connect it as your BYOK model in Clanker Cloud, and you have a fully local, agent-capable reasoning backend for infrastructure operations.
For agentic setups, Hermes-based agents can call Clanker Cloud via MCP to include live infrastructure context in multi-step tasks. Everything stays local.
Example — local CrewAI incident triage agent:
# CrewAI agent setup (conceptual; ClankerCloudMCPTool is an illustrative
# MCP tool wrapper, not a published package)
from crewai import Agent, LLM

incident_agent = Agent(
    role="On-call incident responder",
    goal="Triage production alerts using live infrastructure context",
    backstory="An SRE agent with read-only access to cloud state",
    llm=LLM(model="ollama/hermes3", base_url="http://localhost:11434"),
    tools=[ClankerCloudMCPTool()],  # exposed via MCP
)
# Agent flow:
# 1. Detect anomaly (alert fires)
# 2. Hermes calls Clanker Cloud: "What changed in the last 30 minutes?"
# 3. Clanker Cloud returns recent deployments, config changes, scaling events
# 4. Hermes reasons over the context, generates a triage report
# 5. Agent escalates or auto-remediates with human approval
This is a bring-your-own-model DevOps workflow that runs entirely on your infrastructure. No data leaves your network. No SaaS subscription. No per-token costs beyond the hardware you already own.
Comparison: Which Model for Which Use Case
| Model | Type | Best For | Trade-off |
|---|---|---|---|
| Gemma 4 27B | Local open-weight | Air-gapped environments, GDPR-strict teams, zero data egress | ~20GB VRAM required, slower than cloud APIs |
| Gemma 4 12B | Local open-weight | Developer workstations, fast local queries, cost-sensitive | Less capable than 27B for complex multi-step reasoning |
| Claude Code (Sonnet/Opus) | API + agentic | End-to-end coding + infra workflows, highest reasoning quality | Infrastructure context goes to Anthropic's API |
| Codex (OpenAI API) | API + agentic | OpenAI-native workflows, Codex CLI integrations, vibe coding | Infrastructure context goes to OpenAI's API |
| Hermes 3 / Hermes Pro | Local open-weight | Local agentic pipelines, function calling, multi-step tasks | Requires local setup, smaller ecosystem than cloud models |
| Any OpenAI-compatible | API or local | Custom or fine-tuned models, enterprise self-hosted deployments | Varies by model and hosting setup |
For teams with strict data sovereignty requirements, the local options (Gemma 4, Hermes) are the answer. For teams optimizing for reasoning quality and already using Anthropic or OpenAI, Claude Code or Codex with MCP gives the best end-to-end workflow. For teams wanting a hybrid — local for sensitive queries, cloud for complex reasoning — Clanker Cloud lets you switch the model without changing anything else.
Security and Data Flow: What Stays Local, What Doesn't
This is the question that matters. Here's the precise breakdown:
Gemma 4 or Hermes running locally:
- Cloud credentials: stay on your machine
- Infrastructure queries: processed locally
- Fetched infrastructure state (resource names, IPs, costs, IAM): processed locally
- Responses: generated locally
- Result: zero external data flow
Claude Code or Codex API:
- Cloud credentials: stay on your machine (Clanker Cloud never transmits your AWS/GCP/Azure keys)
- Infrastructure queries + fetched context: sent to Anthropic / OpenAI API
- Responses: returned from Anthropic / OpenAI API
The key architectural distinction: Clanker Cloud is the query layer, not the AI layer. It fetches live infrastructure data using your credentials (locally), packages that as context, and hands it to whichever AI you've configured. If that AI is local, nothing leaves your machine. If that AI is an API, the context goes to that API — but only the context, never the raw credentials.
Your cloud API keys are not transmitted to any AI provider by Clanker Cloud. The model sees the result of a query like "ECS service frontend-prod has 3 running tasks, last deployment 14 minutes ago, exit code 1 on task abc123" — not your AWS access key.
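As an illustration (the field names here are invented for the example, not Clanker Cloud's actual payload format):

# What the model sees vs. what never leaves your machine.
context_sent_to_model = {
    "service": "frontend-prod",
    "running_tasks": 3,
    "last_deployment_minutes_ago": 14,
    "failed_task": {"id": "abc123", "exit_code": 1},
}

stays_local = {
    "aws_access_key_id": "<never transmitted>",
    "aws_secret_access_key": "<never transmitted>",
}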
For DevOps teams with compliance requirements, this distinction is worth documenting in your data flow diagrams.
Getting Started: Quick Setup
Step 1 — Download the Clanker Cloud desktop app
https://clankercloud.ai — installs in under a minute.
Step 2 — Connect your cloud providers
AWS, GCP, Azure, Kubernetes, Cloudflare, Hetzner, DigitalOcean, GitHub — your credentials are stored locally on your machine.
Step 3 — Choose your AI model
- Anthropic (Claude): paste your Anthropic API key
- OpenAI (Codex/GPT-4o): paste your OpenAI API key
- Local (Gemma 4, Hermes, any Ollama model): enter http://localhost:11434/v1 as the base URL, no API key required
Step 4 — Start querying
"What's consuming the most cost in my AWS account this month?"
"Show me all pods with restart count > 5 in the production namespace"
"What changed in my infrastructure in the last 24 hours?"
For MCP and agent integration, the full setup guide is at docs.clankercloud.ai. The open-source CLI is also available at github.com/bgdnvk/clanker if you prefer a terminal-first workflow.
Book a demo to see the full workflow with your preferred model.
Conclusion
The default assumption in AI DevOps tooling — that you'll pipe your infrastructure data through a vendor's hosted model — doesn't hold for a large portion of the teams that most need these tools. Regulated industries, EU teams, security-conscious startups, and engineers who've already built AI workflows around Claude Code, Codex, or local models all need a different architecture.
Clanker Cloud's BYOK model means you're not choosing between AI-assisted infrastructure operations and data control. You get both. Run Gemma 4 local inference for zero-egress operations. Connect Claude Code or Codex via MCP to bring infrastructure context into your existing agent workflows. Run Hermes locally for multi-step agentic pipelines that stay entirely on your hardware.
The infrastructure query layer is handled. You own the AI.
Download Clanker Cloud — free, installs in one minute.
FAQ
Can I use my own AI model with Clanker Cloud?
Yes. Clanker Cloud supports any OpenAI-compatible endpoint as a model source. That includes cloud APIs (Anthropic, OpenAI, Google, Mistral) and locally hosted models via Ollama, LM Studio, or llama.cpp. You configure the base URL and API key (or no key for local models), and Clanker Cloud routes all AI inference through your endpoint. Nothing goes through Clanker Cloud's servers.
How do I run Gemma 4 locally for infrastructure operations?
Install Ollama (brew install ollama or equivalent), pull the Gemma model (ollama pull gemma4:27b; the exact tag may vary), and start the server. Ollama runs an OpenAI-compatible API at http://localhost:11434/v1. In Clanker Cloud, set the AI provider to "Custom" and enter that URL. All infrastructure queries will be processed by your local Gemma 4 instance: no external AI API calls, no data egress. For CPU-only machines, Gemma 4 12B is more practical; Gemma 4 27B benefits significantly from GPU acceleration.
What is BYOK in AI DevOps tools?
BYOK (Bring Your Own Keys) in AI DevOps means you supply your own AI provider credentials rather than using the platform's hosted AI. In practice, this means your infrastructure queries travel from your machine to your AI provider directly, without routing through the DevOps platform's servers. The result: you control data flow, cost (no token markup), and model selection. Clanker Cloud extends this to local models — BYOK in its strongest form means no external AI call at all.
Can Claude Code manage cloud infrastructure?
Claude Code can query and reason over live cloud infrastructure when connected to Clanker Cloud via MCP (Model Context Protocol). Claude Code sends queries to Clanker Cloud's MCP endpoint, which fetches live state from AWS, GCP, Azure, Kubernetes, and other providers, then returns the context to Claude Code's reasoning loop. This means Claude Code can answer questions like "what's the current state of the production ECS cluster?" as part of a coding or deployment task, without you switching tools. For infrastructure changes, Clanker Cloud requires explicit human approval in maker mode — Claude Code can plan, but a human confirms execution. See the agent integration docs for MCP setup.
How do I connect Codex to my infrastructure?
Codex connects to Clanker Cloud via the Model Context Protocol (MCP). Configure Clanker Cloud as an MCP tool in your Codex agent setup, with the Clanker Cloud MCP server URL as the endpoint. Codex can then issue infrastructure queries mid-task — "what resources are currently deployed in this cluster?", "what's the current state of this S3 bucket policy?" — and receive live answers from Clanker Cloud. Full setup documentation is at docs.clankercloud.ai. The open-source CLI provides an alternative path for terminal-native Codex workflows.
