Best Kubernetes Platform for GPU Workload Orchestration in 2026

Merged into the canonical NVIDIA Kubernetes cost optimization guide to keep one stable page for GPU-operations coverage.

GPU workloads on Kubernetes are not the same problem they were three years ago. The tooling has matured, managed platforms have added genuine GPU-aware features, and bare-metal has become a credible option for teams that know what they are doing. But the choice between GKE, EKS, AKS, and self-managed Kubernetes on Hetzner still involves real trade-offs: cost, operational burden, driver management, and observability differ significantly across the four.

This article is for ML engineers and infrastructure engineers who need to make that choice in 2026 with real numbers.


The GPU Scheduling Problem on Kubernetes

GPU is not like CPU. It is an opaque resource from the scheduler's perspective: the default Kubernetes scheduler sees nvidia.com/gpu: 1 as an integer unit and has no awareness of GPU memory, compute capability, or topology. For years, sharing one physical GPU between pods meant reaching for third-party solutions. That has changed.

nvidia-device-plugin is the bridge between Kubernetes and physical GPUs. It runs as a DaemonSet on every GPU-capable node, advertises GPU resources to the kubelet, and manages device access for containers. Where it breaks: driver version mismatches between the plugin and the installed CUDA toolkit, nodes cordoned or drained leaving the plugin pod in a bad state, and cluster upgrades that touch kernel versions without coordinated driver re-installation.

Time-slicing and MIG are now production-ready. NVIDIA time-slicing (configured via the device plugin ConfigMap) allows multiple pods to share a single physical GPU by time-multiplexing access. It carries no memory isolation — all pods share the full VRAM — but for inference workloads that are not memory-bound, it delivers real cost savings. MIG (Multi-Instance GPU), available on A100 and H100, provides hard memory and compute partitioning at the hardware level. Both are stable in production as of 2025 and are supported across all platforms discussed here.


Evaluation Criteria

When comparing Kubernetes platforms for GPU workloads, six things matter:

  1. Native GPU node pool management — does the platform auto-provision GPU nodes, handle taints, and support scale-to-zero?
  2. NVIDIA driver and toolkit version management — installing nvidia-device-plugin is the easy part; keeping drivers aligned across kernel upgrades is where platforms diverge sharply.
  3. GPU-aware auto-scaling — can the cluster scale up GPU nodes in response to pending pods requesting nvidia.com/gpu, and scale down idle nodes to zero?
  4. Spot/preemptible GPU instance support — training jobs can tolerate interruption; the cost difference between on-demand and spot GPU nodes is often 60–70%.
  5. Observability — DCGM Exporter integration, per-GPU utilization metrics, memory bandwidth, temperature, and NVLink throughput exposed to Prometheus.
  6. Multi-node training networking — for distributed training across multiple GPU nodes, the interconnect matters (EFA, InfiniBand, or plain Ethernet).

GKE (Google Kubernetes Engine)

GKE has the best managed GPU experience of the three cloud providers in 2026. Google Cloud's broad NVIDIA GPU lineup (A100, H100, and L4 for inference) comes with tight integration between node pools and the Kubernetes scheduler.

Node pool setup is straightforward. Specify the accelerator type (nvidia-tesla-a100, nvidia-h100-80gb, nvidia-l4) and count in the node pool configuration; GKE handles driver installation automatically on Container-Optimized OS (COS) nodes. The cloud.google.com/gke-accelerator node selector label is applied automatically and can be used directly in pod specs.
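As a minimal sketch of how the node selector and the GPU request fit together, the Python below builds a pod manifest as a plain dictionary and prints it as JSON, which kubectl apply -f accepts. The pod name, container name, and image are placeholders chosen for illustration; only the cloud.google.com/gke-accelerator label, the accelerator type, and the nvidia.com/gpu resource name come from the text above.

```python
import json


def gpu_pod_manifest(name, image, accelerator, gpu_count=1):
    """Build a pod manifest that requests GPUs on a matching GKE node pool."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            # Label that GKE applies to GPU nodes automatically
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
            "containers": [
                {
                    "name": "main",  # placeholder container name
                    "image": image,
                    # GPUs are requested as whole integer units
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only
    manifest = gpu_pod_manifest(
        "l4-smoke-test", "example.com/my-inference:latest", "nvidia-l4"
    )
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it should schedule the pod only onto nodes carrying the matching accelerator label, assuming such a node pool exists or can be provisioned.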

GKE Autopilot takes this further: GPU nodes are provisioned on demand per workload, driver management is fully abstracted, and you pay for the resources your pods request rather than for whole nodes. Autopilot supports L4 and A100 GPU classes in 2026; the cloud.google.com/gke-accelerator node selector still works and Autopilot provisions matching hardware transparently.

Node auto-provisioning (NAP) in Standard mode supports GPU nodes scaling to zero. When no workloads request GPU resources, nodes are terminated. When pending pods specify nvidia.com/gpu requests, NAP provisions matching nodes within 60–90 seconds. This is the cleanest GPU auto-scaling story in the managed cloud space.

Observability is a genuine differentiator: DCGM Exporter metrics are available in GKE Managed Prometheus with no additional DaemonSet configuration. GPU utilization, memory utilization, and SM activity are exposed per-GPU and per-pod out of the box.

Weaknesses: GKE is locked into Google Cloud. Egress costs are real for large model artifact workloads. Spot GPU availability varies by region — A100 spot nodes are frequently unavailable in us-central1 during peak demand. The g2-standard (L4) family is more reliably available on spot than a2 (A100).

Pricing: A100 on-demand via a2-highgpu-1g is approximately $2.93/hr. A100 spot is approximately $0.88/hr. L4 on-demand (g2-standard-4) is approximately $0.70/hr.


EKS (Amazon Elastic Kubernetes Service)

EKS is the dominant platform for multi-node distributed training, primarily because of EFA (Elastic Fabric Adapter) networking on p4d.24xlarge and p5.48xlarge instances. For single-node inference or fine-tuning workloads, it is competitive but requires more operational effort than GKE.

GPU instance families: p3 (V100), p4d (A100 + EFA), g4dn (T4), g5 (A10G), p5 (H100 + EFA). EKS does not auto-install GPU drivers; use the Amazon Linux 2 GPU AMI (CUDA, cuDNN, and NVIDIA drivers pre-baked) or manage driver installation via a bootstrap script or the NVIDIA GPU Operator.

The NVIDIA device plugin requires a DaemonSet deployment — without it, nodes will not advertise nvidia.com/gpu resources regardless of the GPU AMI used:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml

Karpenter is the right auto-scaler for GPU workloads on EKS. Managed node groups with Cluster Autoscaler cannot bin-pack GPU pods or respond to mixed-instance-type requirements efficiently. Karpenter's NodePool CRD lets you specify GPU instance families with interruption handling for Spot, and it achieves GPU node scale-up in under 45 seconds in typical configurations. Without Karpenter, EKS GPU scaling is painful.
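To make the NodePool idea concrete, here is a rough sketch that builds a GPU NodePool manifest as a Python dictionary and prints it as JSON. The field names follow my understanding of the karpenter.sh/v1 schema and should be checked against the Karpenter version you run. The pool name, the EC2NodeClass name "default", the GPU limit, and the taint are all illustrative assumptions, not values from this article.

```python
import json


def gpu_nodepool(name, instance_families, node_class="default", gpu_limit=8):
    """Build a Karpenter NodePool restricted to the given GPU instance families."""
    return {
        "apiVersion": "karpenter.sh/v1",  # assumed API version; verify for your install
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Assumes an EC2NodeClass with this name already exists
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        {
                            "key": "karpenter.k8s.aws/instance-family",
                            "operator": "In",
                            "values": list(instance_families),
                        },
                        {
                            # Allow Spot, with on-demand as a fallback
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["spot", "on-demand"],
                        },
                    ],
                    # Keep non-GPU pods off expensive nodes (illustrative taint)
                    "taints": [
                        {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}
                    ],
                }
            },
            # Cap the total number of GPUs this pool may provision
            "limits": {"nvidia.com/gpu": gpu_limit},
            # Remove nodes once they have been empty for a while
            "disruption": {"consolidationPolicy": "WhenEmpty", "consolidateAfter": "5m"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_nodepool("gpu-inference", ["g5", "p4d"]), indent=2))
```

GPU pods would then need a matching toleration for the taint; dropping the taint is a reasonable simplification for a cluster that only runs GPU workloads.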

EFA for multi-node training is a genuine advantage. p4d.24xlarge nodes with EFA deliver 400 Gbps of network bandwidth between nodes, enabling near-linear scaling for large distributed training runs. If you are running training across 64+ GPUs, EKS with EFA is the correct answer.

Weaknesses: Driver management is manual unless you use the GPU AMI (which ties you to Amazon Linux 2). Karpenter adds operational complexity but is required for good GPU scaling. The EKS control plane fee ($0.10/hr per cluster) adds up across multiple clusters. GPU observability requires separate DCGM Exporter and Prometheus setup.

Pricing: p3.2xlarge (1x V100) is approximately $3.06/hr on-demand and $0.92/hr on Spot. g5.xlarge (1x A10G) is approximately $1.01/hr on-demand and $0.40/hr on Spot.
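As a quick sanity check on the spot savings, the snippet below computes the discount for the price pairs quoted in this article (the GKE A100 figures from the GKE section and the two EKS pairs above). The prices are the article's approximations, not live quotes.

```python
# (on-demand $/hr, spot $/hr) as quoted in this article
PRICES = {
    "GKE a2-highgpu-1g (A100)": (2.93, 0.88),
    "EKS p3.2xlarge (V100)": (3.06, 0.92),
    "EKS g5.xlarge (A10G)": (1.01, 0.40),
}


def spot_discount_pct(on_demand, spot):
    """Return the spot discount relative to on-demand, in percent."""
    return round((1 - spot / on_demand) * 100, 1)


if __name__ == "__main__":
    for name, (on_demand, spot) in PRICES.items():
        print(f"{name}: {spot_discount_pct(on_demand, spot)}% cheaper on spot")
```

With these inputs the discounts come out at roughly 70%, 70%, and 60%, which is where the 60–70% range used in the evaluation criteria comes from.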


AKS (Azure Kubernetes Service)

AKS supports the NC, ND, and NV GPU VM series. NC v3 and NC A100 v4 are the most commonly used for ML workloads; NV series are optimized for remote visualization, not training or inference.

The NVIDIA GPU Operator is the most useful tool for running GPUs on AKS. Rather than requiring manual driver, CUDA toolkit, and device plugin installation, the Operator manages the full stack via CRDs. Deploy it, configure a ClusterPolicy, and it handles driver installation via a DaemonSet targeting GPU nodes using the feature.node.kubernetes.io/pci-10de.present=true NFD label. This is cleaner than EKS's manual setup, though GKE's managed driver installation is still simpler.

Azure Spot VMs on AKS run as VMSS-backed spot node pools; the eviction policy (Delete or Deallocate) is chosen when the node pool is created. Azure's spot eviction rate for GPU SKUs is generally lower than AWS for comparable regions, making Spot GPU more predictable on AKS.

Strengths: The GPU Operator support is excellent and well-documented by Microsoft. Enterprises already in Azure benefit from native AD integration, Azure Monitor, and Azure Container Registry without additional configuration. AKS is the right platform for Windows-based GPU container workloads (DirectML, Windows CUDA containers).

Weaknesses: GPU SKU availability outside eastus and westeurope is inconsistent. NC A100 v4 nodes are frequently in limited availability. Base costs for comparable GPU compute are higher than GCP or AWS in several regions.

Best for: Enterprises with existing Azure commitments, Windows GPU container workloads, and teams that need Azure Active Directory integration with Kubernetes RBAC.


Bare-Metal Kubernetes on Hetzner

Hetzner's dedicated GPU servers are not a managed Kubernetes offering: they are bare-metal servers you install Kubernetes on yourself. The trade-off is stark: roughly 10–15x lower cost than comparable on-demand cloud GPU capacity, with full operational responsibility in return.

Available GPU hardware: The AX102 server offers 2x RTX 3090 (48 GB VRAM total) at €189/month (~$205/month). The AX101 offers 1x RTX 3090 at approximately €119/month. RTX 4090 configurations are appearing in Hetzner's robot ordering system as of early 2026.

Setup: k3s is the simplest Kubernetes installation for Hetzner servers — single binary, low memory overhead, compatible with the NVIDIA device plugin:

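# Install NVIDIA drivers and the container toolkit first (Ubuntu 22.04).
# Assumes NVIDIA's apt repository for nvidia-container-toolkit is already configured.
apt update && apt install -y nvidia-driver-535 nvidia-container-toolkit

# Install k3s on the control node. k3s looks for the NVIDIA container runtime
# when it starts, so restart k3s if the toolkit is installed afterwards.
curl -sfL https://get.k3s.io | sh -

# Deploy the device plugin. Pods may need runtimeClassName: nvidia unless
# nvidia is configured as the default containerd runtime.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml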

All driver updates, kernel compatibility checks, and toolkit upgrades are manual. There is no auto-scaling — you scale by adding and configuring servers. There is no managed control plane.

Real cost comparison: 2x RTX 3090 on Hetzner is ~$205/month. Two g5.2xlarge (A10G) instances on AWS run approximately $2,900/month on-demand, or ~$1,000/month on Spot. For inference workloads running 24/7, the Hetzner cost structure is difficult to match in the cloud even accounting for bandwidth costs and operational time.
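The arithmetic behind this comparison can be sketched in a few lines. The monthly figures are the ones quoted above; the 730-hour month is the usual averaging assumption (8,760 hours per year divided by 12).

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Monthly figures quoted in this article, in USD
HETZNER_2X_3090 = 205.0
AWS_2X_A10G_ON_DEMAND = 2900.0
AWS_2X_A10G_SPOT = 1000.0


def hourly(monthly_usd, units=1):
    """Convert a flat monthly price into an hourly price per unit."""
    return monthly_usd / HOURS_PER_MONTH / units


def cost_ratio(cloud_monthly, bare_metal_monthly):
    """How many times more expensive the cloud option is."""
    return cloud_monthly / bare_metal_monthly


if __name__ == "__main__":
    print(f"Hetzner per server: ${hourly(HETZNER_2X_3090):.2f}/hr")
    print(f"Hetzner per GPU:    ${hourly(HETZNER_2X_3090, units=2):.2f}/hr")
    print(f"vs on-demand: {cost_ratio(AWS_2X_A10G_ON_DEMAND, HETZNER_2X_3090):.1f}x")
    print(f"vs Spot:      {cost_ratio(AWS_2X_A10G_SPOT, HETZNER_2X_3090):.1f}x")
```

On these numbers the Hetzner server works out to about $0.28/hr, or about $0.14/hr per GPU. The gap is about 14x against on-demand pricing and closer to 5x against Spot, so the headline multiple depends on which cloud pricing model you compare with.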

Best for: Startups running inference at scale, teams with Kubernetes operations experience, workloads that do not depend on fast provisioning of new capacity, and projects where RTX 3090/4090 VRAM is sufficient (24 GB per card).


Platform Comparison Table

Platform | Managed GPU nodes | Auto-scaling | NVIDIA driver mgmt | GPU spot support | Est. GPU cost/hr | Best for
GKE | Yes (COS, Autopilot) | NAP, scale to zero | Fully managed | Yes (preemptible) | A100: ~$2.93 on-demand, ~$0.88 spot | Best overall managed GPU experience
EKS | Partial (GPU AMI) | Karpenter required | Manual or GPU AMI | Yes (Spot) | V100: ~$3.06 on-demand, ~$0.92 Spot | Multi-node training, EFA networking
AKS | Yes (GPU Operator) | VMSS autoscale | GPU Operator | Yes (Spot VMs) | Varies by region | Azure enterprises, Windows GPU
Hetzner | No | Manual | Manual | No | RTX 3090: ~$0.14 equiv. (~$205/month for 2 GPUs) | Cost-optimized inference, startups

Managing GPU Infrastructure with Clanker Cloud

Regardless of which platform you choose — or if you are running GPU workloads across multiple platforms simultaneously — Clanker Cloud gives you a unified workspace to monitor, query, and manage GPU infrastructure without switching between cloud consoles.

From a single workspace, you can query across all connected clusters:

clanker ask "list all GPU nodes across my clusters and show current utilization"
clanker ask "find pods requesting GPU resources that have been pending for more than 10 minutes"
clanker ask "what is my total GPU spend across EKS and GKE this month"

For write operations — adding node pools, updating device plugin configurations, or modifying taints — Maker mode generates a plan before executing:

clanker ask --maker "add a GPU node pool to my GKE cluster for inference workloads"

The --maker flag produces a plan. Pass --apply to execute. All credentials stay local; Clanker Cloud runs on your machine, not on a remote SaaS platform. You can connect your own models via BYOK — Gemma 4 (via Ollama with gemma4:31b or gemma4:26b tags), Claude Code, Codex, or Hermes — or use the default GPT-5 backend. This is particularly useful for teams running on Hetzner who want AI-assisted operations without sending cluster credentials to a third-party service.

Full documentation is at docs.clankercloud.ai. For teams moving GPU workloads from vibe-coded prototypes to production deployments, the vibe-coding-to-production guide covers the full path.


Deep Scan: Finding GPU Infrastructure Problems Automatically

GPU infrastructure accumulates invisible problems: device plugin pods in CrashLoopBackOff on recently upgraded nodes, A100 nodes idle because a taint was never removed, pending pods that cannot schedule because no GPU node matches the required accelerator label, and DCGM metrics showing 95% memory utilization while the scheduler sees the node as available.

Clanker Cloud's deep research feature runs an agent swarm across all connected providers simultaneously to surface these issues in a single report:

clanker ask "run a deep scan of my GPU infrastructure — find idle GPU nodes, pending pods, driver mismatches, and cost anomalies"

The swarm checks nvidia-device-plugin DaemonSet status, DCGM metrics if the exporter is running, node taints blocking pod scheduling, kubelet scheduling errors, and GPU nodes running without active pod assignments. Everything runs on your machine; no credentials leave the device. See the for-ai-agents.md documentation for the agent architecture and how to extend it with custom checks.
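To illustrate two of these checks, here is a small sketch that finds GPU nodes with no GPU pods assigned and GPU pods that have not been scheduled. It is not Clanker Cloud's implementation. It assumes the inputs are the "items" lists from kubectl get nodes -o json and kubectl get pods -A -o json, and the function names are hypothetical.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_request(pod):
    """Sum the GPU limits across all containers in a pod."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        total += int(limits.get(GPU_RESOURCE, 0))
    return total


def idle_gpu_nodes(nodes, pods):
    """Return names of nodes that advertise GPUs but host no GPU pods."""
    used = {}
    for pod in pods:
        node_name = pod.get("spec", {}).get("nodeName")
        phase = pod.get("status", {}).get("phase")
        if node_name and phase in ("Pending", "Running"):
            used[node_name] = used.get(node_name, 0) + gpu_request(pod)
    idle = []
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        # The API reports allocatable quantities as strings, e.g. "2"
        if int(allocatable.get(GPU_RESOURCE, 0)) > 0:
            if used.get(node["metadata"]["name"], 0) == 0:
                idle.append(node["metadata"]["name"])
    return idle


def unscheduled_gpu_pods(pods):
    """Return names of GPU pods that are Pending with no node assigned."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Pending"
        and not pod.get("spec", {}).get("nodeName")
        and gpu_request(pod) > 0
    ]
```

A fuller check would also compare node taints against pod tolerations and accelerator labels against node selectors; this sketch only covers the bookkeeping part.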


FAQ

What is the best Kubernetes platform for GPU workloads in 2026?

GKE is the best managed option for most teams: cleanest driver management, node auto-provisioning (NAP) with scale-to-zero GPU nodes, and DCGM metrics in Managed Prometheus without additional setup. EKS is the better choice for multi-node distributed training due to EFA networking. Hetzner bare-metal is the best option for cost-optimized inference if your team has Kubernetes operations experience.

How do you set up GPU time-slicing in Kubernetes?

Configure time-slicing via the nvidia-device-plugin ConfigMap. Set sharing.timeSlicing.resources with replicas equal to the number of virtual GPUs per physical GPU. For example, replicas: 4 on a single A10G creates four nvidia.com/gpu units that share the physical device. Apply the ConfigMap, then restart the device plugin DaemonSet. Time-slicing provides no memory isolation — all sharing pods can see the full VRAM. For memory isolation, use MIG on A100 or H100.
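A minimal sketch of the configuration described above: the function below builds the device plugin's sharing config as a dictionary and prints it as JSON, which is also valid YAML. How the config is wrapped into a ConfigMap (the data key name and how the plugin is pointed at it) depends on whether you deploy via the Helm chart or the GPU Operator, so that part is left out. The helper names are hypothetical.

```python
import json


def time_slicing_config(replicas, resource="nvidia.com/gpu"):
    """Build the sharing.timeSlicing section of the device plugin config."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": resource, "replicas": replicas}],
            }
        },
    }


def advertised_gpus(physical_gpus, replicas):
    """Number of nvidia.com/gpu units a node advertises after time-slicing."""
    return physical_gpus * replicas


if __name__ == "__main__":
    # The example from the text: replicas: 4 on a single A10G
    print(json.dumps(time_slicing_config(4), indent=2))
    print("advertised nvidia.com/gpu units:", advertised_gpus(1, 4))
```

The scheduler treats each replica as a full GPU, so memory limits have to be enforced by the workloads themselves.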

How do you monitor GPU utilization in Kubernetes?

Deploy the NVIDIA DCGM Exporter as a DaemonSet on GPU nodes. It exposes per-GPU metrics (DCGM_FI_DEV_GPU_UTIL, DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_SM_CLOCK) to Prometheus on port 9400. On GKE, DCGM metrics are available in Managed Prometheus without a separate exporter. On EKS and AKS, deploy the exporter via the NVIDIA DCGM Exporter Helm chart and configure a Prometheus scrape targeting the dcgm-exporter Service. Alternatively, clanker ask "show GPU utilization across all nodes" works if Clanker Cloud is connected.
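If you want to inspect the exporter output directly, for example from curl against port 9400, the sketch below parses the Prometheus text format and flags underused GPUs. The sample text is made up, the label names (gpu, Hostname) are an assumption about what your exporter version emits, and the 10% threshold is arbitrary.

```python
import re

# Made-up sample in Prometheus text exposition format
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 97
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 2
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 20480
"""

LINE = re.compile(r"^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metric(text, metric):
    """Return {(hostname, gpu_index): value} for one metric name."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skips comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        key = (labels.get("Hostname", ""), labels.get("gpu", ""))
        values[key] = float(match.group("value"))
    return values


def underused_gpus(text, threshold=10.0):
    """List (hostname, gpu_index) pairs whose utilization is below threshold."""
    util = parse_metric(text, "DCGM_FI_DEV_GPU_UTIL")
    return sorted(key for key, value in util.items() if value < threshold)


if __name__ == "__main__":
    print(underused_gpus(SAMPLE))
```

A single scrape is only a snapshot; in practice you would evaluate utilization over a time window in Prometheus rather than from one sample.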

Is Hetzner a viable option for GPU workloads on Kubernetes?

Yes, with caveats. Hetzner dedicated GPU servers (AX101, AX102) deliver RTX 3090 hardware at €119–189/month — a 10–15x cost reduction versus equivalent cloud GPU on-demand pricing. What you give up: no managed control plane, no auto-scaling, manual driver management, and no hardware replacement SLA. For teams running inference 24/7 that do not need dynamic scaling and have K8s operations experience, Hetzner is a serious option, not a workaround.


Get Started

Request a demo to see Clanker Cloud connected to GPU clusters across GKE, EKS, AKS, and Hetzner in a single workspace. Ready to connect your clusters now? Create an account and have your first clanker ask query running in under five minutes. Browse the FAQ for common setup questions.
