
Infrastructure as Code and Kubernetes: Practical Implementation Patterns for 2026

Five practical IaC patterns for Kubernetes in 2026: Terraform + Helm, GitOps, Crossplane, Pulumi, and the minimal viable stack. Real trade-offs, real gotchas.

IaC for Kubernetes is not a single thing. In 2026, at least five distinct approaches are running in production across teams of all sizes — and each one is the right answer under specific conditions. This is not a tool survey. It is a patterns guide: given the tools that exist, how do you actually structure your IaC for Kubernetes, and where does each pattern break down?

Every pattern below includes a concrete repo structure, the specific failure mode you will hit, and how to fix it before it hits production.


Pattern 1: Terraform + Helm (the most common production pattern)

This is the default for teams migrating from traditional infrastructure to Kubernetes. Terraform manages the cluster itself and the surrounding cloud resources — VPC, subnets, IAM roles, RDS, S3 buckets. Helm manages application deployments inside the cluster. The boundary is clean: Terraform owns infrastructure, Helm owns workloads.

Typical repo structure:

infra/
  terraform/
    modules/
      eks/
      vpc/
      rds/
    envs/
      staging/
      production/
  helm/
    charts/
      api-service/
      worker/
    values/
      staging.yaml
      production.yaml

How Terraform outputs feed Helm values:

The most common production pitfall is Helm chart values drifting from Terraform outputs. You provision an EKS cluster and an IAM role for a service account (IRSA). Terraform outputs the role ARN. Your Helm values.yaml hardcodes that ARN. Six months later, you recreate the IAM role — new ARN — and the Helm values are never updated. The pod starts failing to access S3 at 2 AM.

The fix is to make Terraform outputs the single source of truth for any value that Helm consumes:

# CI pipeline step after terraform apply
ROLE_ARN=$(terraform output -raw api_service_role_arn)
helm upgrade api-service ./helm/charts/api-service \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="$ROLE_ARN" \
  -f ./helm/values/production.yaml

This wires Terraform outputs directly into the Helm install step. No manual values editing, no drift.

Where it breaks down: When your team grows and multiple squads own different Helm charts, the values sprawl becomes hard to govern. You end up with dozens of values-overrides-final-v2.yaml files with no clear ownership. At that point, moving toward GitOps (Pattern 2) pays off.


Pattern 2: GitOps with ArgoCD or Flux

In a GitOps model, Git is the single source of truth for all Kubernetes state. ArgoCD or Flux watches one or more Git repositories and continuously syncs the cluster to match the declared state. Any deviation between live cluster state and Git is surfaced as drift.

Repo structure — app-of-apps pattern (ArgoCD):

gitops/
  bootstrap/
    argocd-app-of-apps.yaml    # root Application pointing at /apps
  apps/
    api-service.yaml
    worker.yaml
    monitoring.yaml
  manifests/
    api-service/
      deployment.yaml
      service.yaml
      hpa.yaml
    worker/
      deployment.yaml

The argocd-app-of-apps.yaml is a single ArgoCD Application that points at the /apps directory. Each file in /apps is itself an ArgoCD Application pointing at a specific subdirectory in /manifests. This gives you a single bootstrap entrypoint and independent sync status per service.
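
For reference, the root Application is only a few lines of YAML. Here is a minimal sketch, assuming a placeholder repo URL and the directory layout above:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/gitops   # placeholder: your GitOps repo
    targetRevision: main
    path: apps                                       # each file here is itself an Application
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd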

Environment promotion is handled via directory structure, not branching:

gitops/
  manifests/
    api-service/
      base/
        deployment.yaml
      overlays/
        staging/
          kustomization.yaml
        production/
          kustomization.yaml

Kustomize overlays per environment. ArgoCD Applications reference overlays/staging and overlays/production respectively. Promotion is a PR that updates the production overlay's image tag.
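
As a sketch, the production overlay's kustomization.yaml pins the image tag like this (image name and tag are placeholders):

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: registry.example.com/api-service   # placeholder image name
    newTag: "1.42.0"                         # the promotion PR bumps this tag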

Secret management — the one thing you must get right:

Never put secrets in Git, even base64-encoded. The External Secrets Operator is the standard solution in 2026. It connects to AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault and syncs secrets as native Kubernetes Secret objects. Your Git repo contains only the ExternalSecret resource, which references a secret path — not the secret value itself.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-db-credentials
spec:
  refreshInterval: 1h              # re-sync from the backing store hourly
  secretStoreRef:
    name: aws-secrets-manager      # ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: api-db-credentials       # the K8s Secret the operator creates
  data:
    - secretKey: DB_PASSWORD       # key inside the created K8s Secret
      remoteRef:
        key: production/api/db     # path in AWS Secrets Manager
        property: password

The alternative — Sealed Secrets — is acceptable for smaller teams, but External Secrets Operator is easier to rotate and audit in production.

Where it breaks down: GitOps works until a developer makes a kubectl change directly on the cluster. ArgoCD detects the drift and reverts it on the next sync cycle, which causes confusion if the team does not know why their change disappeared. Enforce Git-only changes as a team policy and configure ArgoCD sync policies to auto-heal.
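
Auto-heal is a short sync policy on each Application; a minimal excerpt:

# excerpt from an ArgoCD Application spec
syncPolicy:
  automated:
    prune: true      # remove resources that were deleted from Git
    selfHeal: true   # revert manual kubectl changes on the next sync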

For a deeper look at structuring GitOps for multi-team environments, see the AI DevOps for Teams guide.


Pattern 3: Crossplane (Kubernetes-native everything)

Crossplane moves cloud resource provisioning — RDS, S3, VPC, GKE clusters — into Kubernetes via Custom Resource Definitions. You write a PostgreSQLInstance custom resource instead of a Terraform resource block. The Crossplane controller reconciles live cloud state against those custom resources, the same way the K8s controller manager reconciles Deployment replicas.

When this makes sense:

  • Your team is deeply Kubernetes-native and already operates the cluster well
  • You want a single control plane for both workloads and infrastructure
  • You are running a platform engineering team building self-service infrastructure for other squads (Compositions + CompositeResourceDefinitions are powerful here)

A minimal Crossplane RDS definition:

apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: api-postgres
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.medium
    engine: postgres
    engineVersion: "15"
    masterUsername: apiuser
    allocatedStorage: 20
  writeConnectionSecretToRef:
    namespace: default
    name: api-postgres-conn

Crossplane writes the connection secret directly to Kubernetes. Your application reads it as a normal K8s Secret. No Terraform state file, no separate pipeline.
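
For illustration, a container consumes that secret like any other (Deployment excerpt, matching the RDSInstance above):

# Deployment container excerpt
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: api-postgres-conn   # written by Crossplane
        key: password             # standard connection keys also include username, endpoint, port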

Where it breaks down: Crossplane Compositions — the mechanism for building reusable, self-service infrastructure templates — are genuinely complex to write and debug. A Composition that provisions a VPC + subnets + EKS + node group involves nested patch-and-transform rules that are easy to misconfigure and hard to trace when they fail. The error messages from Crossplane controllers are improving, but debugging a failed Composite Resource still requires understanding both the Crossplane resource model and the underlying cloud provider API. Budget significant ramp-up time if your team is coming from a Terraform background.
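
To give a feel for the complexity, here is a heavily trimmed Composition excerpt with a single patch-and-transform rule (the XPostgres type and field names are hypothetical):

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xpostgres-aws
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1   # hypothetical XRD
    kind: XPostgres
  resources:
    - name: rds
      base:
        apiVersion: database.aws.crossplane.io/v1beta1
        kind: RDSInstance
        spec:
          forProvider:
            engine: postgres
      patches:
        # copy a claim field onto the managed resource
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage

A real Composition stacks dozens of these patches across multiple resources, and a typo in a field path is typically skipped silently rather than rejected.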


Pattern 4: Pulumi (code-first everything)

Pulumi manages cloud resources and Kubernetes resources in a single codebase using real programming languages — TypeScript, Python, Go, or C#. You write a program, not a DSL. One Pulumi stack can provision a GKE cluster and deploy a Kubernetes workload to it in the same program.

TypeScript example — GKE cluster + Kubernetes Deployment:

import * as gcp from "@pulumi/gcp";
import * as k8s from "@pulumi/kubernetes";

const cluster = new gcp.container.Cluster("api-cluster", {
  initialNodeCount: 3,
  nodeConfig: { machineType: "e2-standard-4" },
});

// generateKubeconfig is a helper (not shown) that renders a kubeconfig
// from the cluster's endpoint, CA certificate, and auth config
const kubeconfig = cluster.name.apply(name =>
  generateKubeconfig(cluster, name)
);

const provider = new k8s.Provider("gke-provider", { kubeconfig });

const deployment = new k8s.apps.v1.Deployment("api", {
  spec: {
    replicas: 2,
    selector: { matchLabels: { app: "api" } },
    template: {
      metadata: { labels: { app: "api" } },
      spec: {
        containers: [{
          name: "api",
          image: "gcr.io/my-project/api:latest",
          ports: [{ containerPort: 8080 }],
        }],
      },
    },
  },
}, { provider });

The cluster and the workload are managed in one pulumi up. You get type checking, IDE autocompletion, and the ability to use loops, conditionals, and functions across your entire infrastructure definition.

Where it breaks down: Mixing cloud infrastructure and Kubernetes workloads in one codebase couples their release cycles. A change to your GKE node pool version and a change to a Deployment replicas value go through the same Pulumi program. For small teams, this is fine. For larger teams, you will want to split Pulumi stacks by concern — one stack for the cluster, one stack for workloads — which partially negates the "everything in one place" benefit.

Also, Pulumi state must be stored somewhere (Pulumi Cloud, S3, or Azure Blob). State locking and concurrent runs require the same care as Terraform remote state.


Pattern 5: Minimal viable IaC for Kubernetes

If you are a startup or a small team and you are not yet sure which pattern is right for you, here is the minimum setup that is defensible at a Series A engineering review:

The stack:

  • OpenTofu (or Terraform) with an S3 backend + DynamoDB state locking for cluster provisioning
  • Helm for application packaging
  • ArgoCD for GitOps-style delivery
  • External Secrets Operator for secret management

The rules:

  1. All cluster config changes go through OpenTofu, reviewed in PRs
  2. All app deployments go through ArgoCD — no direct kubectl apply in production
  3. No secrets in Git — ever
  4. One environment per namespace (or per cluster if you can afford it)

This is not clever. It is intentionally boring. It covers drift detection, rollback, audit trails, and secret safety without requiring a platform engineering team to maintain it. You can graduate to Crossplane or full GitOps mono-repo later when the complexity is earned.


The AI layer above IaC

Regardless of which pattern you run, your IaC code describes the desired state — but it does not tell you what is actually running right now, why a pod is crashlooping, or which resource drifted since your last apply.

ClankerCloud.ai sits above your IaC as a plain-English query and operations layer. You still write Terraform, Helm, Crossplane, or Pulumi as usual. ClankerCloud adds:

  • Plain English cluster queries: clanker ask "which deployments have fewer than 2 ready replicas?" — runs against live K8s state, no kubectl gymnastics
  • Drift detection between IaC and live state: clanker ask "what's drifted from my last Terraform apply?" — surfaces config drift across namespaces instantly
  • AI agent context via MCP: Claude Code writing a Helm chart, or Codex generating a Crossplane Composition, can query live K8s state through ClankerCloud's MCP endpoint while they write. The agent sees what is actually running, not just what the spec says

ClankerCloud supports EKS, GKE, AKS, Hetzner K3s, and bare-metal Kubernetes. It is BYOK: bring your own model (Gemma 4 via Ollama, Claude Code, Codex, Hermes) or run local-first with no cloud API calls.

For teams building AI-assisted workflows on top of Kubernetes IaC, see how ClankerCloud works with AI agents and the live demo.

The open-source clanker CLI powers the backend if you want to self-host.


Common IaC + Kubernetes mistakes

Terraform state file corruption. Running terraform apply from two machines at once without remote state locking can corrupt the state file. Always use a remote backend with locking — S3 + DynamoDB, Terraform Cloud, or the OpenTofu equivalent. Recover by restoring a previous state version (enable S3 versioning on the backend bucket) or by re-importing resources with terraform import, not by editing the state file directly.

Secret leakage in GitOps. A single git commit containing a base64-encoded Secret pushed to a public repo means credential rotation for everything in that secret. Use External Secrets Operator from day one. There is no retroactive fix once a secret hits git history — git filter-branch and BFG Repo Cleaner clean the history but cannot revoke tokens that were already scraped.

Namespace collisions in multi-tenant clusters. Two teams both deploying a service named api to the default namespace. Use namespaces per team, per environment, or per service. Enforce namespace conventions via admission controllers (Kyverno or OPA Gatekeeper) — not just documentation.
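
A minimal Kyverno policy in that spirit (the rule itself is illustrative):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace
spec:
  validationFailureAction: Enforce
  rules:
    - name: block-default-namespace
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - Service
      validate:
        message: "Deploy to a team namespace, not default."
        pattern:
          metadata:
            namespace: "!default"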

Helm values sprawl. Ten environment overrides, five release-specific patches, and a hotfix values file that no one remembers creating. Consolidate values into base + overlay structure using Helm's --values layering, and delete override files when the condition they address is resolved.
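
A sketch of that layering, with later files winning (applied as helm upgrade ... -f values/base.yaml -f values/production.yaml):

# values/base.yaml: shared defaults
replicaCount: 2
image:
  tag: "1.42.0"

# values/production.yaml: environment override; a later -f wins on conflicts
replicaCount: 4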

Crossplane Composition debugging. A Composite Resource that never becomes Ready, with no error on the CR itself. Check the managed resources the Composition is creating — they carry the actual provider error. Run kubectl describe on the managed resource to find the cloud provider API rejection message.


FAQ

What is the best IaC approach for Kubernetes in 2026?

It depends on team size and K8s maturity. Terraform + Helm is the safest choice for most teams — it is widely understood, well-documented, and has the largest support ecosystem. GitOps with ArgoCD is the right next step when you need strong audit trails and multi-team collaboration. Crossplane is the right choice if you have strong K8s expertise and want a unified control plane. Pulumi is the right choice if your team prefers real programming languages over HCL or YAML.

Should I use Terraform or Crossplane for Kubernetes?

Use Terraform if your team already knows it, you need to manage non-Kubernetes cloud resources alongside K8s, or you want a proven recovery path when things go wrong. Use Crossplane if you want everything in the Kubernetes API, you are building a self-service platform for other teams, and you have the expertise to maintain Compositions.

How do I implement GitOps with Kubernetes?

Install ArgoCD (or Flux) on your cluster, create a Git repository for your manifests, and define ArgoCD Applications that point at directories in that repo. Use the app-of-apps pattern for managing multiple services. Use External Secrets Operator for secret management. Enforce that no changes go directly to the cluster — all changes go through Git. See the ClankerCloud demo for a walkthrough of GitOps drift detection in practice.

What is the minimum IaC setup a startup needs for Kubernetes?

OpenTofu (or Terraform) with remote state for cluster provisioning, Helm for application packaging, ArgoCD for deployment, and External Secrets Operator for secrets. That is it. Resist the urge to add Crossplane or Pulumi until you have a concrete problem that they solve.


Get started with ClankerCloud

If you are setting up IaC for Kubernetes today, the infrastructure code is one part of the problem. Knowing what is actually running, what has drifted, and why something is broken is the other part.

Start using ClankerCloud — Beta is free, Lite is $5/month, Pro is $20/month. Full documentation at docs.clankercloud.ai.

Next step

Ask ClankerCloud what your cluster is doing

Install the local app, connect your kubeconfig, and turn cluster state, workload health, cost context, and safe next steps into one readable answer.

Download and inspect a clusterWatch demo