
Running Production Infrastructure on Hetzner and DigitalOcean: An AI Operations Guide

Run production workloads on Hetzner and DigitalOcean without the ops blindspots. A practical AI operations guide for EU and global engineering teams.

Hetzner and DigitalOcean have built their reputations on the same promise: serious compute at a fraction of AWS and GCP prices, without the enterprise weight. A Hetzner AX162 dedicated server — 128 GB RAM, 2× 3.84 TB NVMe, AMD EPYC — costs around €80/month. The AWS r5 equivalent runs five times that. A DigitalOcean managed Kubernetes cluster with managed Postgres and Spaces storage is fully operational before most AWS architecture diagrams are finished.

This is why Hetzner production infrastructure powers a significant share of European startups, indie SaaS tools, and GDPR-sensitive workloads. And why DigitalOcean production deployment is the default choice for early-stage global teams who want transparent billing and Kubernetes without a certification prerequisite. The infrastructure is genuinely excellent.

The gap isn't compute. It's operations. When something breaks at 2am on a Hetzner dedicated server — or a pod in your DOKS cluster starts crashlooping — you need observability tooling, not just good hardware. That's where the AWS tooling advantage historically showed up. This guide covers how to close that gap with AI-assisted ops.


Why Engineering Teams Choose Hetzner and DigitalOcean

Hetzner

Hetzner Cloud operates data centers in Nuremberg, Falkenstein, Helsinki, and Ashburn (US East). The European-first topology makes it the default choice for teams with EU data residency requirements and GDPR obligations.

The product lineup covers a wide range: CX-series shared vCPU Cloud Servers starting under €5/month, AX and EX dedicated root servers with NVMe arrays, and GPU instances including A100 configurations for ML inference workloads. Hetzner Kubernetes runs via the open-source hcloud-controller-manager, which provisions Hetzner Cloud Servers as Kubernetes nodes and integrates Floating IPs and Load Balancers natively.

Self-hosted tools — Gitea, Plausible Analytics, Meilisearch, n8n — are almost universally deployed on Hetzner in the EU community, because the hardware density per euro is unmatched.

DigitalOcean

DigitalOcean's strength is managed abstractions done right. DOKS (DigitalOcean Kubernetes Service) is arguably the cleanest managed Kubernetes offering below the hyperscaler tier: straightforward node pool management, automatic upgrades, built-in monitoring, and no hidden networking fees. Managed Databases (Postgres, MySQL, Redis, MongoDB) are production-ready and sized predictably. App Platform removes the Kubernetes layer entirely for simpler workloads. Spaces provides S3-compatible object storage with a CDN that waives egress fees when traffic flows through it.

The common thread across both providers: the teams choosing them want operational efficiency without enterprise pricing. It's a deliberate architectural choice, but one with an ops cost that isn't always obvious at signup.


The Operational Gaps Compared to AWS

Being honest about the trade-offs matters. Here's what you give up:

Native monitoring is thin. Hetzner provides basic CPU, disk, and network graphs in the Cloud Console and Robot interface. DigitalOcean offers Droplet monitoring with alert thresholds and a basic metrics dashboard. Neither has anything comparable to CloudWatch Logs Insights, AWS Security Hub, or Cost Explorer's filtering depth.

Kubernetes on Hetzner requires more ownership. While hcloud-controller-manager handles cloud-provider integration well, you're responsible for your own CNI plugin (typically Flannel or Cilium), ingress controller, cert-manager, cluster autoscaler configuration, and storage provisioning. DOKS removes most of this, but if you're on Hetzner Kubernetes you're closer to bare metal operations than you might expect.
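What that ownership looks like in practice can be sketched. The manifest URLs and chart names below are illustrative assumptions (pin them to releases you have vetted), and `$HCLOUD_TOKEN` is assumed to hold a read-write Hetzner Cloud API token:

```shell
# Cloud-provider integration needs the API token as a cluster secret
kubectl -n kube-system create secret generic hcloud \
  --from-literal=token="$HCLOUD_TOKEN"

# hcloud-cloud-controller-manager: node lifecycle, Load Balancers, Floating IPs
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm.yaml

# CNI (Cilium shown; Flannel is the lighter alternative)
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium --namespace kube-system

# Persistent volumes via the Hetzner CSI driver
helm repo add hcloud https://charts.hetzner.cloud
helm install hcloud-csi hcloud/hcloud-csi --namespace kube-system
```

Ingress controller, cert-manager, and the cluster autoscaler still sit on top of this baseline.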

Multi-cloud complexity is real. A common production stack for EU teams: Hetzner for compute, DigitalOcean for managed Postgres or Redis, Cloudflare for DNS and CDN, GitHub for CI. That's four separate consoles, four separate toolchains (hcloud, doctl, the Cloudflare API, gh), and zero native correlation between them. Tracing a latency issue across this stack manually means browser-tab archaeology.

CLI tools are capable but terse. hcloud and doctl are genuinely good CLIs. But both assume you already know what you're looking for. Querying across resource types, correlating events, or exploring an unfamiliar cluster requires knowing the exact subcommands and flags.
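Both CLIs can emit JSON, which turns flag archaeology into ordinary jq filters. A minimal sketch: the inline sample stands in for live `hcloud server list -o json` output, so the filter logic is visible without credentials.

```shell
# In practice: hcloud server list -o json | jq -r '<filter>'
# (doctl takes --output json for the same purpose)
sample='[{"name":"web-1","status":"running"},{"name":"db-1","status":"off"}]'

# Name every server that is not currently running
echo "$sample" | jq -r '.[] | select(.status != "running") | .name'
```

The same pattern applies to `doctl compute droplet list --output json` and most other list subcommands on both CLIs.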


Common Operational Scenarios: Manual vs. AI-Assisted

Scenario 1: A Hetzner Server Is Unresponsive

The manual path:

# List all servers and status
hcloud server list

# Get detailed server info
hcloud server describe <server-id>

# Check server metrics in the last hour
hcloud server metrics --type cpu,disk,network <server-id> --start $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)

# SSH in and investigate
ssh root@<server-ip>
top
dmesg | tail -50
systemctl status --failed
journalctl -p err --since "1 hour ago"

Then open the Hetzner Cloud Console to check if the IPMI/out-of-band console shows anything, cross-reference with the Hetzner Status page for regional incidents, and decide whether to hard-reboot via the Robot API.
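One portability note on the `date` invocation above: `-d '1 hour ago'` is GNU-specific and fails on macOS/BSD, where the equivalent flag is `-v`. A small helper (a sketch; adjust to your shell environment) covers both:

```shell
# ISO 8601 UTC timestamp for one hour ago, on GNU and BSD date alike
one_hour_ago() {
  date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
}

# Usage: hcloud server metrics --type cpu <server-id> --start "$(one_hour_ago)"
one_hour_ago
```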

With Clanker Cloud:

"What's the status of my Hetzner servers and are any showing high resource usage?"

Clanker Cloud pulls live status across all servers, surfaces any with elevated CPU, disk pressure, or failed services, and can tell you the last-known metrics without you touching a terminal.


Scenario 2: A DOKS Pod Is Crashlooping

The manual path:

# Save DOKS kubeconfig
doctl kubernetes cluster list
doctl kubernetes cluster kubeconfig save <cluster-id>

# Find unhealthy pods
kubectl get pods -A | grep -v Running

# Investigate the crashlooping pod
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

# Check resource pressure on nodes
kubectl top nodes
kubectl top pods -A
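The `grep -v Running` filter above misses pods that report Running while restart-looping. Filtering on restart counts via JSON output is more reliable. A sketch, with an inline sample standing in for live `kubectl get pods -A -o json` output:

```shell
# In practice: kubectl get pods -A -o json | jq -r '<filter>'
sample='{"items":[
  {"metadata":{"name":"api-7d9","namespace":"prod"},
   "status":{"containerStatuses":[{"restartCount":14}]}},
  {"metadata":{"name":"web-5f2","namespace":"prod"},
   "status":{"containerStatuses":[{"restartCount":0}]}}]}'

# Print namespace/name of any pod whose containers have restarted more than 3 times
echo "$sample" | jq -r \
  '.items[]
   | select((.status.containerStatuses // []) | map(.restartCount) | add > 3)
   | "\(.metadata.namespace)/\(.metadata.name)"'
```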

With Clanker Cloud:

"Which pods in my DigitalOcean Kubernetes cluster are unhealthy?"

The answer comes back with pod status, recent events, and log snippets — without saving a kubeconfig first or knowing which namespace to look in.


Scenario 3: DigitalOcean Spaces Costs Are Spiking

The manual path:

# Check current month's invoice
doctl invoice list
doctl invoice get <invoice-id>

# List Spaces buckets (requires s3cmd or AWS CLI configured for DO endpoint)
s3cmd ls --host=nyc3.digitaloceanspaces.com

# Get bucket size — no native doctl subcommand for this
s3cmd du s3://<bucket-name>

Then manually correlate bucket sizes with line items on the invoice to identify the offending bucket.
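The correlation step can be scripted: `s3cmd du` prints the byte count as the first field on each line, so sorting on that field yields the ranked breakdown directly. A sketch, where the sample lines stand in for real `s3cmd du` output (whose exact format varies by version):

```shell
# Rank buckets by size, largest first, converted to GB
rank_buckets() {
  sort -rn | awk '{printf "%8.1f GB  %s\n", $1/1024/1024/1024, $NF}'
}

printf '%s\n' \
  "107374182400 s3://media-prod" \
  "5368709120 s3://backups" | rank_buckets
```

In practice you would pipe the `s3cmd du` output for each bucket into `rank_buckets` instead of the sample lines.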

With Clanker Cloud:

"What's consuming the most storage in my DigitalOcean Spaces buckets?"

Gets you a ranked breakdown by bucket without configuring a separate S3-compatible client.


Scenario 4: Understanding the Full Production Topology

The manual path: Open Hetzner Cloud Console in one tab. DigitalOcean Console in another. Cloudflare DNS in a third. Piece together which server handles what, which database backs which service, how traffic routes through Cloudflare Workers or Page Rules, and where the Floating IPs resolve.

With Clanker Cloud:

"Show me the full architecture of my production stack."

Surfaces Hetzner servers with their Floating IPs and attached volumes, DOKS clusters with node pools and services, DigitalOcean Managed Databases and which apps are connected, and Cloudflare DNS records and proxy status — in a single response. This is the topology view that neither console provides on its own.


Debugging Guide: Hetzner-Specific Operations

The hcloud CLI covers Hetzner Cloud resources. For dedicated servers, the Hetzner Robot API is separate.

Useful hcloud commands for production debugging:

# List all resources with status
hcloud server list -o columns=id,name,status,ipv4,datacenter,server_type
hcloud volume list
hcloud load-balancer list
hcloud firewall list

# Inspect a specific server
hcloud server describe <id>

# Reboot / power cycle
hcloud server reboot <id>
hcloud server reset <id>        # hard reset
hcloud server power-on <id>

# Attach/detach floating IP
hcloud floating-ip assign <floating-ip-id> <server-id>

# Boot into the Hetzner rescue system (prints a root password for SSH)
hcloud server enable-rescue --type linux64 <id>
hcloud server reset <id>

# Metrics (CPU, disk, network)
hcloud server metrics --type cpu <id> \
  --start 2025-01-01T00:00:00Z \
  --end 2025-01-01T01:00:00Z

For dedicated root servers (Hetzner Robot):

The Robot API (https://robot-ws.your-server.de) handles dedicated servers. Authentication is separate from Hetzner Cloud API tokens. Key endpoints: /server (list/describe), /reset (hardware reset types), /boot (rescue modes), /rdns (reverse DNS).
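A hedged sketch of what a Robot call looks like with curl; the credentials and server number are placeholders, and the available reset types should be confirmed against the Robot API documentation:

```shell
# List dedicated servers (Robot webservice credentials, not a Cloud API token)
curl -u "<robot-user>:<robot-password>" https://robot-ws.your-server.de/server

# Trigger a hardware reset on one server
curl -u "<robot-user>:<robot-password>" \
  -d "type=hw" \
  https://robot-ws.your-server.de/reset/<server-number>
```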

Where Hetzner's native tooling falls short:

  • No log aggregation or centralized logging — you need to ship logs to an external service (Loki, Datadog, Papertrail) yourself
  • Metrics retention is short; there's no anomaly detection or alerting built in
  • No correlation between server metrics and application-level events
  • Kubernetes control-plane metrics require self-configuration (Prometheus, metrics-server)

Clanker Cloud supplements this by combining live hcloud API data with your connected Kubernetes context and any Cloudflare or GitHub data, giving you a correlated view that the Hetzner Console can't provide. See the Clanker Cloud documentation for how to connect your Hetzner API token.


Debugging Guide: DigitalOcean-Specific Operations

Useful doctl commands for production debugging:

# Droplets
doctl compute droplet list
doctl compute droplet get <id>
doctl compute droplet-action reboot <id>

# DOKS
doctl kubernetes cluster list
doctl kubernetes cluster get <id>
doctl kubernetes cluster kubeconfig save <id>
doctl kubernetes cluster node-pool list <cluster-id>

# Managed Databases
doctl databases list
doctl databases get <id>
doctl databases connection <id>    # connection string
doctl databases backups <id>       # list backups
doctl databases firewalls list <id> # trusted sources

# Spaces: no native doctl subcommand; use s3cmd or the AWS CLI
# pointed at the S3-compatible endpoint (see Scenario 3)

# Billing
doctl balance get
doctl invoice list

# Monitoring alerts
doctl monitoring alert list
doctl monitoring alert get <alert-id>

Where DigitalOcean's native monitoring falls short:

DO monitoring supports threshold alerts on Droplet-level metrics (CPU %, memory %, disk %, bandwidth). It does not provide log analytics, Kubernetes-level application metrics without a separate agent, or cross-resource correlation. The App Platform provides some deployment-level logging but not query-level observability.

For DOKS specifically: DO doesn't expose Kubernetes control-plane logs or API server audit logs. Debugging cluster-level issues requires deploying your own observability stack (Prometheus + Grafana, or an external APM).
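One common starting point (an option, not DO's official recommendation) is the community kube-prometheus-stack chart plus metrics-server; chart values and sizing are your own call:

```shell
# Prometheus + Grafana + alerting CRDs in a dedicated namespace
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# metrics-server ships separately and is what makes `kubectl top` work
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```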

Clanker Cloud's AI DevOps for Teams workflow integrates with doctl natively, letting you query Droplet status, DOKS cluster health, managed database connectivity, and Spaces usage through a single conversational interface without configuring separate monitoring agents.


How Clanker Cloud Works with Hetzner and DigitalOcean

Clanker Cloud has native support for both providers. Connection is straightforward:

  1. Generate a Hetzner Cloud API token (read-only for read-first mode, read-write for maker mode)
  2. Generate a DigitalOcean personal access token with the required scopes
  3. Add both to Clanker Cloud's credential store — they stay local, never transmitted to a third-party service

From there, both providers appear on the same surface alongside any connected AWS, GCP, Azure, Kubernetes, Cloudflare, or GitHub accounts. You're querying across all of them simultaneously.

Maker mode — Clanker Cloud's term for write operations — requires explicit opt-in. In read-first mode, the agent gathers live context and surfaces recommendations. In maker mode, it can restart a Droplet, resize a Hetzner server, apply a Kubernetes manifest, or update a firewall rule — but shows you the intended change and asks for confirmation first. The plan-then-execute approach means no accidental production changes from an ambiguous prompt.

BYOK model flexibility: Clanker Cloud is bring-your-own-key across all supported models. For EU teams with strict data sovereignty requirements, running Gemma 4 locally on a Hetzner GPU server means zero infrastructure metadata leaves your network. The agent runs on your machine, calls your local model, and queries your infrastructure via local API tokens. For teams that want higher reasoning capability, Claude Code, Codex, or Hermes can be connected via their own API keys with no token markup. Watch the product demo for how this works in practice.


Hetzner + Clanker Cloud for EU Teams: The Sovereign AI Ops Stack

This combination warrants a specific callout.

Hetzner is a German company. Its data centers are in Germany and Finland. Its infrastructure sits inside the EU legal framework. For teams building with GDPR obligations — SaaS products handling personal data, health tech, fintech, anything touching EU user data — Hetzner is often the first infrastructure choice precisely because of this.

Clanker Cloud's local-first architecture means your infrastructure metadata — server IDs, IP addresses, cluster configurations, database connection strings — never leaves your machine. The app is installed locally. Credentials are stored locally. API calls go directly from your machine to provider APIs.

Combine these with BYOK local inference (Gemma 4 running on a Hetzner GPU node, or locally on your workstation) and you have an AI operations stack where no data — infrastructure metadata, query context, or model inputs — transits a third-party cloud. That's the most GDPR-sovereign AI ops configuration available today.

As EU-based model providers mature (Mistral already offers strong options; more will follow), the BYOK layer means you can swap in any compliant model without changing anything else in your ops workflow.


FAQ

Is Hetzner good for production?

Yes, with the right setup. Hetzner's hardware is enterprise-grade — EX and AX dedicated servers run the same NVMe, ECC RAM, and AMD EPYC or Intel Xeon configurations as data center hardware costing multiples more. The trade-off is that Hetzner provides less managed tooling than AWS or GCP. You're responsible for your own monitoring, alerting, log aggregation, and (on Hetzner Kubernetes) much of your cluster control plane configuration. Teams that run production workloads successfully on Hetzner typically pair it with external observability (Grafana Cloud, Loki, Datadog) and, increasingly, AI-assisted ops tooling to compensate for the thinner native visibility layer.

How do I monitor DigitalOcean Kubernetes?

DOKS doesn't expose control-plane metrics or logs natively. For production monitoring you'll want to deploy the DigitalOcean Kubernetes Monitoring Stack (Prometheus + Grafana via the DO Marketplace, or self-deployed), configure metrics-server for kubectl top to work, and set up log forwarding (Fluent Bit → Loki or an external aggregator). DO's built-in monitoring covers Droplet-level metrics and basic alerting. For application-level and Kubernetes-level observability, you need a self-managed or external stack. Clanker Cloud can query live cluster state — pod health, events, node pressure — without a pre-deployed monitoring stack, which helps for quick triage before a full observability setup is in place.

Can I use AI tools with Hetzner Cloud?

Yes. Hetzner exposes a full REST API and the hcloud CLI covers all Cloud resources programmatically. Clanker Cloud connects via your Hetzner API token and queries servers, volumes, load balancers, floating IPs, networks, and firewall rules in natural language. For dedicated servers, the Robot API provides equivalent programmatic access. Hetzner GPU instances (including A100 configurations) are also useful for running local AI models — combining a Hetzner GPU server for model inference with Clanker Cloud's local-first architecture gives you a fully self-contained AI operations setup where no data leaves your infrastructure.

What is the best alternative to AWS for EU teams?

Hetzner is the strongest choice for EU-sovereign compute, particularly for cost-sensitive and data-residency-sensitive workloads. It offers the best price-to-performance ratio in the EU market by a significant margin, German and Finnish data centers, and full compliance with EU data protection law. DigitalOcean is a strong complement for managed services — particularly managed Kubernetes (DOKS), Postgres, and Redis — with a developer-friendly API and transparent pricing. The combination of Hetzner for compute and DigitalOcean for managed services, with Cloudflare at the edge, is a popular and well-tested EU production stack. For teams that also need AI operations tooling, Clanker Cloud's local-first, BYOK architecture is designed specifically to work within this stack without introducing additional data residency concerns.


Conclusion

Hetzner and DigitalOcean aren't compromises. They're deliberate infrastructure choices made by engineers who've done the math and decided that 3-5× cheaper compute, EU data residency, and developer-friendly APIs are worth operating outside the AWS ecosystem.

The operational gap is real, but it's closeable. With the right CLI fluency, an external observability stack, and AI-assisted ops tooling that understands both providers natively, you get the cost and sovereignty advantages without giving up production-grade visibility.

If your stack runs on Hetzner, DigitalOcean, or both — and you want to query, debug, and operate it in plain English without context-switching between four browser tabs — try Clanker Cloud free. One-minute setup, local-first, no token markup on your AI model keys.

More on how AI-assisted operations works for engineering teams: AI DevOps for Teams | Product Demo | Documentation

Next step

Move the repo from prototype to production

Install the desktop app, connect GitHub plus one cloud provider, and review the deployment plan before Clanker Cloud touches real infrastructure.

Download and plan a deploy | Watch demo