# Tools: How I built sandboxes that boot in 28ms using Firecracker snapshots
*A deep-dive into building a sandbox orchestrator that gives AI agents their own isolated machines. Firecracker microVMs, snapshot restore, and why 28ms matters.*

Contents:

- What Firecracker actually is
- The cold boot problem
- The snapshot trick
- How I implemented it in Go
- The copy-on-write detail
- The guest agent
- Why not just Docker?
- Numbers
- What I'd do differently
- Try it
tags: go, opensource, ai, devops

I've been building AI agents that generate and execute code. The agents write Python scripts, run data analysis, generate charts, process files. Standard stuff in 2026. The problem I kept hitting: where does that code actually run?

I tried Docker. It works, but containers share the host kernel. When the runc CVEs dropped in 2024-2025 (CVE-2024-21626, then three more in 2025), I started thinking harder about what "isolation" actually means when an AI is writing arbitrary code on my machine.

I tried E2B. Great product, but my data was leaving my machine. For an internal tool processing company data, that was a non-starter.

So I built ForgeVM: a single Go binary that orchestrates isolated sandboxes. This article is about the hardest part: getting Firecracker microVMs to boot in 28ms.

## What Firecracker actually is

Firecracker is AWS's microVM manager. It's what powers Lambda and Fargate. Open source, written in Rust, runs on KVM.

The key insight: Firecracker is not QEMU. QEMU emulates an entire PC with hundreds of devices. Firecracker emulates exactly 4 devices: a virtio network device, a virtio block device, a virtio vsock channel, and a serial console. That's it. No USB, no GPU, no sound card, no PCI bus. This minimal device model is why it's fast and why the attack surface is tiny.

Each Firecracker microVM gets its own guest kernel, its own memory, and its own handful of virtual devices. A guest exploit can't reach the host because there's a hardware boundary (KVM) between them. Compare that to Docker, where a kernel vulnerability affects every container on the host.

## The cold boot problem

Here's the thing, though. Booting a Firecracker microVM from scratch takes about 1 second. That includes booting the guest kernel, running init, and starting the agent inside the VM.

1 second is fine for long-running workloads. It's not fine when your AI agent needs to run `print(1+1)` and return the result in a chat interface. Users notice 1 second of latency. I needed sub-100ms. Ideally sub-50ms.

## The snapshot trick

Firecracker supports snapshotting a running VM's complete state to disk. This includes the guest's memory, the vCPU state, and the state of the emulated devices. When you restore from a snapshot, Firecracker doesn't boot a kernel. It doesn't run init. It doesn't start your agent.
It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off. The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward. In practice, a full restore lands at about 28ms on my machine.

## How I implemented it in Go

ForgeVM's Firecracker provider manages the snapshot lifecycle. The snapshot files are per-image: the first time someone spawns `python:3.12`, it cold-boots, snapshots, and every subsequent `python:3.12` spawn restores in 28ms. Different images get different snapshots.

## The copy-on-write detail

You can't share a single snapshot file across multiple running VMs, because each VM writes to memory. The solution is copy-on-write: each VM maps the snapshot's memory file privately, so writes land in per-VM pages while untouched pages keep pointing at the shared file. This means 50 running VMs from the same snapshot share most of their memory pages. Only the pages that each VM actually wrote are unique. Memory efficient.

## The guest agent

Each Firecracker VM runs a custom agent binary (`forgevm-agent`) as PID 1. The agent listens on vsock, executes the commands the host sends it, and returns the results. The protocol is simple: length-prefixed JSON frames.

vsock is important here. It's a virtio socket, not TCP/IP. The guest has no network stack visible to the host. There's no IP address, no port, no routing. Just a direct kernel-to-kernel channel. This eliminates an entire class of network-based attacks.

## Why not just Docker?

I actually built a Docker provider too. ForgeVM has a provider interface, and Docker is one of the backends. Here's the honest comparison: Firecracker microVMs give you a true hardware boundary via KVM, but they need /dev/kvm on the host. gVisor (via the Docker provider with the runsc runtime) intercepts syscalls with a userspace kernel: weaker isolation than a VM, but it runs anywhere Docker does.

In ForgeVM, you switch between these with one config change. Same API. Same SDKs. Same pool mode. Different isolation level. For development, I use Docker (it runs on my Mac). For production, Firecracker. The application code doesn't know or care which provider is active.

The pool design is the part I'm most proud of, and it has nothing to do with Firecracker specifically. Traditional sandbox tools: 1 user = 1 VM (or container). If you have 100 concurrent users, you need 100 VMs. At 512MB each, that's 50GB of RAM just for sandboxes.
ForgeVM's pool mode: 1 VM serves up to N users. Each user gets a logical "sandbox" with its own workspace directory (`/workspace/{sandbox-id}/`). The orchestrator routes each sandbox to a pooled VM with spare capacity: 100 users, 20 VMs instead of 100, an 80% reduction in VM count.

The security trade-off is real: pool mode gives you directory-level isolation, not kernel-level. Users in the same VM share a kernel. For internal tools where you trust the users but want to isolate the AI-generated code from the host, this is fine. For multi-tenant public platforms, you'd want the optional per-user UID and PID namespace hardening on top.

## Numbers

Some benchmarks from my development machine (AMD Ryzen 7, 32GB RAM, NVMe SSD): snapshot restore comes in at about 28ms, and exec round-trips are faster on the Firecracker provider than on Docker. The Firecracker exec latency is lower because vsock is a direct kernel channel, while Docker exec creates a new exec instance and attaches via the Docker daemon.

## What I'd do differently

**Start with Docker, not Firecracker.** I built the Firecracker provider first because I was excited about 28ms boots. But 80% of people trying ForgeVM don't have KVM available (Mac users, CI/CD, cloud VMs without nested virt). The Docker provider should have been day one.

**The guest agent protocol should have been gRPC, not custom JSON.** The length-prefixed JSON protocol works fine, but I'm essentially maintaining a custom RPC framework. gRPC over vsock would have given me streaming, error codes, and code generation for free.

**Pool mode security should have been built-in from the start.** The directory-level isolation works, but per-user UIDs and PID namespace isolation should be default-on, not optional. I'm retrofitting this now.

## Try it

MIT licensed. Single binary. No telemetry. No cloud.

GitHub: github.com/DohaerisAI/forgevm

If you made it this far and found this useful, a star on GitHub genuinely helps with discoverability. Happy to answer questions in the comments about the Firecracker internals, the provider architecture, or the pool mode design.