A production-ready Kubernetes cluster for €20/month

date: Dec 8 2025

Three control planes, high availability, GitOps, encrypted secrets. Enterprise-grade infrastructure without the enterprise budget.

Tags: Kubernetes, Hetzner, Talos, ArgoCD, Terragrunt, GitOps

Companies pay thousands per month for a “production-ready” Kubernetes cluster. High availability, GitOps, secrets management, observability. The whole package

I built the same thing for €20/month. Three control planes, etcd quorum, automatic failover, GitOps deployments. The code is public, the cluster runs in production

This isn’t a hobbyist homelab. It’s a proof of concept that you can have enterprise-grade infrastructure without the enterprise budget

The problem with “production-ready”

Most enterprise Kubernetes setups are theater. A €500/month EKS cluster, managed nodes, and behind the scenes? Manual kubectl apply, secrets in plaintext ConfigMaps, zero reproducibility

Pay a lot, do whatever. That’s the dominant model

Meanwhile, you can have a clean infrastructure for a fraction of the cost. The price isn’t money, it’s competence. You need to know what you’re doing

The stack

Talos Linux: the OS that prevents cheating

No SSH. No shell. No “let me quickly connect to debug”. All node configuration goes through an API and a YAML file

machine:
  type: controlplane
  network:
    hostname: cp-1
cluster:
  controlPlane:
    endpoint: https://api.etcd.me:6443

You apply this config, the node is in that state. Reapply it six months later, same state. Zero drift, zero surprises

Teams doing “GitOps” but SSH-ing into nodes when things break? Talos makes that impossible. You do things properly or you don’t do them at all
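With three control-plane nodes and no separate workers (an assumption drawn from the cost table further down), application pods have to be schedulable on the control planes, and Cilium has to take the place of the default CNI. A minimal sketch of a Talos config patch that would express both. The field names come from the Talos machine configuration reference; whether the real cluster uses exactly this patch is an assumption:

```yaml
# Hypothetical Talos config patch, merged into the machine config
# before it is applied. Not taken from the actual repository.
cluster:
  # Let regular workloads run on control-plane nodes,
  # since this sketch assumes there are no dedicated workers.
  allowSchedulingOnControlPlanes: true
  network:
    cni:
      # Skip the default CNI so Cilium can be installed instead.
      name: none
  proxy:
    # Only relevant if Cilium runs in kube-proxy replacement mode
    # (an assumption, not stated in the post).
    disabled: true
```

Because the patch is just another YAML file, it can live in Git next to the rest of the infrastructure code and be reapplied with the same result every time.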

Terragrunt: infrastructure that redeploys in one command

The entire infrastructure is split into modules: cluster, DNS, firewall, floating IPs. Each module has its own Terraform state and explicit dependencies on the others

terraform/
├── modules/
│   ├── cluster/      # Talos + Hetzner
│   ├── dns/          # Route53
│   └── firewall/     # Network rules
└── live/
    └── etcdme/       # The prod environment

One terragrunt run-all apply and the complete cluster deploys. In order. Automatically. I destroyed and rebuilt this cluster a dozen times during development. Same command, same result

ArgoCD: the self-healing cluster

Every application is a manifest in Git. ArgoCD watches the repo, detects changes, and applies them automatically. No manual kubectl apply, no “forgot to deploy that”

syncPolicy:
  automated:
    selfHeal: true
    prune: true

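selfHeal: true is the magic. Someone modifies something manually on the cluster? ArgoCD detects the drift and reverts it. prune: true does the same for deletions: a resource removed from Git is removed from the cluster. The Git repo is the source of truth. Period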
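For context, the syncPolicy fragment lives inside an ArgoCD Application resource. A minimal sketch of a complete one, assuming a hypothetical repository URL, path, and namespace (none of these names come from the actual cluster):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app            # hypothetical application name
  namespace: argocd            # namespace where ArgoCD itself runs
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-repo.git  # placeholder repo
    targetRevision: main
    path: apps/example-app     # placeholder directory of manifests
  destination:
    server: https://kubernetes.default.svc  # the cluster ArgoCD runs in
    namespace: example-app
  syncPolicy:
    automated:
      selfHeal: true           # revert manual changes made on the cluster
      prune: true              # delete resources that were removed from Git
    syncOptions:
      - CreateNamespace=true   # create the target namespace if it is missing
```

Once this resource exists, deploying a change means merging a commit under the watched path. Nothing else has to be run by hand.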

Cilium + Hetzner Load Balancer: transparent failover

Cilium handles networking via eBPF and implements the Kubernetes Gateway API for inbound traffic. The Hetzner load balancer health-checks the nodes. A node goes down? Traffic shifts to the others. Automatically. No intervention needed

All for the price of three VMs
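To make the Gateway API part concrete, here is a sketch of how an application could be exposed through Cilium. It assumes Cilium's Gateway API support is enabled (which provides a GatewayClass named cilium) and uses a hypothetical hostname, service name, and port:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway        # hypothetical name
spec:
  gatewayClassName: cilium     # class provided by Cilium's Gateway API support
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route          # hypothetical name
spec:
  parentRefs:
    - name: example-gateway    # attach this route to the Gateway above
  hostnames:
    - app.example.com          # placeholder hostname
  rules:
    - backendRefs:
        - name: example-service  # placeholder Service in the same namespace
          port: 8080
```

Both objects are ordinary manifests, so they go through the same Git workflow as everything else on the cluster.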

Secrets in Git (yes, really)

SOPS + age. Secrets are encrypted in the repo, decrypted at runtime on the cluster

stringData:
  password: ENC[AES256_GCM,data:xxx,iv:xxx,tag:xxx,type:str]

Anyone can see the file. No one can read it without the private key. Secrets follow the same workflow as code: versioned, reviewed, auditable

It’s cleaner than an external Vault for this use case. Fewer dependencies, fewer failure points
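On the repository side, SOPS reads its rules from a .sops.yaml file. A minimal sketch, assuming a hypothetical naming convention for secret files and a placeholder age public key (how the cluster decrypts at runtime depends on the integration chosen and is not shown here):

```yaml
# .sops.yaml at the root of the repository (illustrative, not the real file)
creation_rules:
  # Apply this rule only to files that follow the assumed naming convention.
  - path_regex: '.*\.secret\.yaml$'
    # Encrypt only the values under data/stringData, so metadata stays
    # readable in diffs and reviews.
    encrypted_regex: '^(data|stringData)$'
    # Placeholder age public key (recipient). The matching private key
    # never goes into Git.
    age: age1examplepublickeyreplaceme
```

With a rule like this in place, running sops --encrypt --in-place on a matching file produces the ENC[...] values shown above, and the result can be committed like any other manifest.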

Backups and PITR

A cluster without backups isn’t production, it’s hope. PostgreSQL runs with automatic backups and Point-In-Time Recovery

WAL files are continuously shipped to Backblaze B2. Why B2? At $6/TB/month it costs roughly a quarter of S3 Standard's list price, and the API is S3-compatible. You can restore the database to any point in the last 24 hours. Not just “the 3am backup”, no, the exact state at 2:37:22pm if you want

Data corruption? A DELETE without WHERE? You roll back time to just before the mistake. That’s the safety net most “production clusters” don’t have
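As an illustration of what such a rollback can look like, here is a sketch that assumes the CloudNativePG operator. That is an assumption: the operator is not named above, so substitute your own tooling. The bucket, endpoint, secret name, and timestamp are all placeholders, and the field names should be checked against your operator version:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-restored                  # new cluster created from the backups
spec:
  instances: 1
  storage:
    size: 10Gi                       # placeholder volume size
  bootstrap:
    recovery:
      source: pg-origin              # refers to the externalClusters entry below
      recoveryTarget:
        # Replay WAL up to this instant, just before the mistake (placeholder).
        targetTime: "2025-12-08 14:37:21+00"
  externalClusters:
    # Must match the original cluster's name in the object store
    # (or set serverName explicitly).
    - name: pg-origin
      barmanObjectStore:
        destinationPath: s3://example-bucket/pg              # placeholder bucket
        endpointURL: https://s3.us-west-004.backblazeb2.com  # placeholder B2 region
        s3Credentials:
          accessKeyId:
            name: b2-credentials     # hypothetical Secret holding the B2 key
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: b2-credentials
            key: ACCESS_SECRET_KEY
```

The restore creates a new cluster next to the damaged one, so the original data stays untouched until the recovered state has been checked.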

The real cost

Resource            Monthly
3x Hetzner CX22     €15
Floating IP         €4
Domain              ~€1
Total               ~€20/month

For comparison, an EKS cluster with three nodes costs at least €200/month. And that still doesn’t include GitOps, encrypted secrets, or infrastructure as code

What this demonstrates

High availability isn’t a luxury. Three control planes instead of one cost €10 more per month. In exchange, etcd keeps quorum with two of its three members, so the cluster survives the loss of any single node. The cost/benefit ratio is absurd

GitOps isn’t complicated. ArgoCD + a Git repo + YAML manifests. That’s it. No complex CI/CD pipelines, no proprietary tools needed

Immutability works. Talos proves you can manage nodes without ever connecting to them. The OS becomes an implementation detail, not something to maintain

Public code forces quality. When your infrastructure is visible, you do things properly. No temporary hacks that become permanent, no hardcoded secrets “for now”