I decided to build my own alternative to Kubernetes. Here's the architecture I chose (and why).

The name

The study phase

It's not a new Kubernetes

The design: learning from others' mistakes

Houdini's architecture

Principle #1: One binary, zero external dependencies

Principle #2: Multi-runtime

Principle #3: Networking built-in

Principle #4: Smart deploy pipeline

Principle #5: Continuous reconciliation

Principle #6: Security is not optional

Simplified overview

But what about the user experience?

In my last post I explained why I decided to build an alternative to Kubernetes. I got a lot of comments: some supportive, others questioning whether I really knew what I was getting into. Fair enough. But the decision was already made, so the next step was simple: stop thinking and start doing.

The name

The first thing I did was pick a name. It might seem silly, but I always do this with my projects, because naming something turns a vague idea into something real. It stops being "that crazy thought I had in the shower" and becomes an actual thing.

This time the name came instantly. One of the main goals of the project is to radically simplify the experience of getting things running. No stress, no 40-hour courses, no infinite YAML. Something that feels almost... magical.

Harry Houdini was one of the world's great magicians and escape artists. Maybe it's pretentious to compare an open-source project with the guy who escaped coffins underwater, but the name fits perfectly with what I want to deliver: making something complex look simple.

The study phase

Let me clarify something that was confusing in the previous post: some people understood that I don't know Kubernetes at all. I don't blame them; I probably wasn't clear. When I mentioned "using AI to create the cluster" I was talking about saving time, not total ignorance. I've been using and deploying K8s in production for years.

But there's a difference between using Kubernetes and creating something that competes in the same space. For that, I needed to go much deeper.

I started studying the Kubernetes source code. Not the docs, not the tutorials: the code. How things were implemented, why certain decisions were made, what trade-offs were accepted. I did the same with HashiCorp's Nomad. This study even led me to consider getting the CKA certification, not for the certificate itself but for the structured learning process.

AI helps a lot at this stage, I'll be honest. Reading 2 million lines of Kubernetes Go code without any guide would be insane. But the understanding has to be mine. AI accelerates; it doesn't replace.

It's not a new Kubernetes

A crucial point: I'm not trying to build a new Kubernetes. A solo developer is unlikely to achieve that, and it's not the goal anyway. I want to build an alternative. Not better, not worse, simply a different way of thinking about how an orchestrator should work.

The project is opinionated. Unapologetically. It reflects how I would like an orchestrator to work, based on 22 years of writing code and deploying to production.

The design: learning from others' mistakes

To arrive at Houdini's design, I did three things:

- Studied the architecture of Kubernetes and Nomad in depth
- Catalogued the main community criticisms of K8s, Nomad, and Swarm
- Combined all of that with my practical experience of what works and what doesn't

The most common criticisms I found:

Kubernetes:

- Absurd complexity for simple use cases
- Dozens of components to maintain (etcd, kube-apiserver, scheduler, controller-manager, kubelet, kube-proxy...)
- YAML hell: verbose configs that are painful to debug
- Service mesh requires external components (Istio, Linkerd)
- Learning curve measured in months

Nomad:

- Simple but incomplete: needs Consul for service discovery, Vault for secrets
- No built-in service mesh
- Smaller community, less tooling

Docker Swarm:

- Abandoned by Docker Inc.
- No real autoscaling
- Limited deployment strategies

Houdini's architecture

With all of that in mind, these are the design decisions I made. Note: I'm not saying these are immutable. The project is in active development and things may change as I encounter real problems.

Principle #1: One binary, zero external dependencies

    $ houdini server   # control plane
    $ houdini agent    # worker node
    $ houdini deploy   # CLI

The same binary does everything. No separate etcd, no Consul, no Vault, no Prometheus server. Everything is embedded. Storage uses BoltDB (an embedded key-value store in Go), distributed consensus uses Raft (the same library Nomad uses), and the service mesh uses native WireGuard.

Why? Because the biggest source of operational complexity in Kubernetes isn't the concept, it's the 15 components you need to keep alive. I want someone to be able to spin up a working cluster with literally houdini server and houdini agent --server <ip>.

Principle #2: Multi-runtime

Houdini doesn't just run containers. It supports four types of workload with the same API:

- Container: Docker, for when you need full isolation
- Process: native OS processes, for local dev or binaries that don't need a container
- WASM: WebAssembly modules, <1ms startup, lightweight sandboxing
- Function: serverless, event-driven, automatic scale-to-zero

Why? Because not every workload needs a container. A Python script that runs every 5 minutes doesn't need a 200MB Docker image. A webhook handler doesn't need to be alive 24/7 consuming RAM.

Principle #3: Networking built-in

Each node gets a WireGuard subnet (10.42.X.0/24). Containers communicate directly between nodes through encrypted tunnels. Internal DNS resolves service.namespace.houdini to the correct IPs. An Ingress controller with automatic HTTPS (Let's Encrypt) is included.

Why? Because in Kubernetes you need to choose between Calico, Flannel, and Cilium for CNI, then install an Ingress controller (NGINX, Traefik, Envoy), then configure cert-manager for TLS. That's 3-4 decisions and installations before external traffic reaches your service.

Principle #4: Smart deploy pipeline

Deploying isn't "send the YAML and pray." It's a pipeline with phases:

- Validation: does the spec make sense?
- Scheduling: where to run? (bin-pack or spread, with anti-affinity, constraints, failure scoring)
- Dispatch: direct push to the agent via gRPC stream

Four strategies: Rolling (zero-downtime), Canary (test with a % of traffic), Blue-Green (atomic switch), and All-at-once (fast, accepts downtime).

If a deploy fails, it is retried with exponential backoff. If all attempts are exhausted, it goes to a Dead Letter Queue, so nothing is silently lost.

Principle #5: Continuous reconciliation

Every 30 seconds, a reconciler compares the desired state (what you asked for) with the actual state (what's really running). If they diverge, it corrects. Workload crashed? Restart with backoff. Node died? Reschedule to another node. Orphan container? Automatic GC.

Why? This is the pattern Kubernetes absolutely nailed: declarative + reconciliation. I copied it. I'm not ashamed of copying what works.

Principle #6: Security is not optional

- Secrets encrypted with AES-256-GCM (Argon2id for key derivation)
- RBAC with 4 roles (admin, operator, developer, viewer)
- 2FA with TOTP
- Agent↔server communication with token + gRPC streams
- Policy engine for admission control (blocks :latest in prod, requires health checks, etc.)

Simplified overview

    ┌─────────── CONTROL PLANE (houdini server) ───────────────┐
    │                                                          │
    │  REST API (:4646) │ gRPC (:4648) │ Ingress (:443)        │
    │                                                          │
    │  Scheduler │ Reconciler │ Deploy Pipeline │ Autoscaler   │
    │                                                          │
    │  State Store (BoltDB / Raft cluster)                     │
    └──────────────────────────┬───────────────────────────────┘
                               │ gRPC Streams + WireGuard Mesh
                               ▼
    ┌──────── WORKER NODE (houdini agent) ─────────┐
    │                                              │
    │  Runtime Registry                            │
    │  ├── Docker (containers)                     │
    │  ├── Process (native)                        │
    │  ├── WASM (modules)                          │
    │  └── Function (serverless)                   │
    │                                              │
    │  Workload Manager │ Health Probes │ Logs     │
    └──────────────────────────────────────────────┘

But what about the user experience?

I know what you're thinking: "Ok, lots of technical stuff under the hood, but what about in practice?"

Good question. The point is that all this complexity (scheduling, mesh, autoscaling, reconciliation) exists so that the user doesn't have to think about it. All of this will be managed transparently through a Web UI where you configure, deploy, monitor, and scale your services without ever opening a terminal if you don't want to.

The idea is simple: if you want to go deep with TOML and the CLI, go ahead. If you want to click "Deploy" and go grab a coffee, you can do that too. No judgment.

The Web UI will be the topic of a future article, and I promise that one will be way less technical than this one. Actually, this is probably the most technical article I'll write in this series. If you survived this far, the next ones will be a breeze.

Next steps

The project isn't public yet, but it will be soon. I'm preparing everything so that when I open the repository, people can actually clone, run, and test it. I don't want to open something half-baked and give the wrong first impression.

In the next post I'll talk about the Web UI and the experience I want to deliver for those who don't want (or need) to live in the terminal.

Until then, if you have questions, criticism, or suggestions, send them. Especially if you disagree with any decision. That's how projects improve.
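To make the policy engine from Principle #6 a bit more concrete, here is a minimal sketch of admission control covering the two rules mentioned there: no :latest in prod, and health checks required. Everything in it is an assumption for illustration, not Houdini's real code. Go is a guess based on the BoltDB and Raft choices, and the Workload fields, the Rule type, and the Admit function are all hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Workload is a hypothetical, trimmed-down deploy spec. The field names are
// assumptions for this sketch, not Houdini's real schema.
type Workload struct {
	Name        string
	Image       string // e.g. "registry.example.com/api:1.4.2"
	Environment string // e.g. "prod" or "dev"
	HealthCheck string // probe path; empty means none configured
}

// Rule inspects a workload and returns a non-empty reason when it must be rejected.
type Rule func(w Workload) string

// imageTag returns the tag of an image reference, defaulting to "latest"
// when no tag is given. Digest references (name@sha256:...) report no tag.
func imageTag(image string) string {
	if strings.Contains(image, "@") {
		return ""
	}
	lastSlash := strings.LastIndex(image, "/")
	lastColon := strings.LastIndex(image, ":")
	// A colon before the last slash is a registry port, not a tag.
	if lastColon > lastSlash {
		return image[lastColon+1:]
	}
	return "latest"
}

// noLatestInProd rejects floating tags in the prod environment only.
func noLatestInProd(w Workload) string {
	if w.Environment == "prod" && imageTag(w.Image) == "latest" {
		return "image must be pinned to a version in prod, not :latest"
	}
	return ""
}

// requireHealthCheck rejects any workload without a health check.
func requireHealthCheck(w Workload) string {
	if w.HealthCheck == "" {
		return "a health check is required"
	}
	return ""
}

// Admit runs every rule and collects all violations, so the user sees the
// full list in one pass instead of fixing problems one at a time.
func Admit(w Workload, rules ...Rule) []string {
	var violations []string
	for _, rule := range rules {
		if reason := rule(w); reason != "" {
			violations = append(violations, reason)
		}
	}
	return violations
}

func main() {
	rules := []Rule{noLatestInProd, requireHealthCheck}
	bad := Workload{Name: "api", Image: "registry.example.com/api:latest", Environment: "prod"}
	for _, v := range Admit(bad, rules...) {
		fmt.Println("rejected:", v)
	}
}
```

Keeping each rule as a plain function means a new policy is one more entry in the slice, and an empty result from Admit is the signal that the spec can move on to scheduling.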