Why Linux Copy-Primitive Bugs Keep Breaking Containers
Dirty COW: The Bug That Started the Pattern
Leaky Vessels: Same Pattern, Different Layer
Do Rootless Containers Actually Protect You From Copy-Primitive Bugs?
How to Actually Defend Against Linux Copy-Primitive Container Escapes
The Pattern Isn't Going Away
Frequently Asked Questions
Can Dirty COW still affect containers in 2026?
What is the difference between a container escape and a privilege escalation?
Do rootless containers prevent all container escape attacks?
How does gVisor protect against kernel vulnerabilities?
Was Leaky Vessels (CVE-2024-21626) exploited in the wild?
Why do copy-related bugs keep appearing in the Linux kernel?

Three times in a decade. That's how often a Linux copy-primitive bug has blown a hole through container isolation. In 2016 it was Dirty COW. In 2024 it was Leaky Vessels. In 2026, a new class of Linux copy-primitive bugs is proving, again, that containers share a kernel. And that kernel keeps betraying them.

The pattern is hard to ignore. Bugs in how the Linux kernel copies, references, or manages data at the lowest level keep punching through container isolation boundaries. If you're running Docker or Podman in production, rootless or not, this should be on your radar. The next copy-primitive container escape isn't a question of if. It's when.

Why Linux Copy-Primitive Bugs Keep Breaking Containers

Containers aren't virtual machines. They don't have their own kernel. Every container on a host shares the same Linux kernel, separated only by namespaces, cgroups, and a handful of security mechanisms like seccomp and AppArmor. That's the fundamental bargain: lightweight, fast isolation in exchange for sharing the most privileged piece of software on the machine.

When a bug exists in the kernel's handling of copy operations (whether it's copying memory pages, file descriptors, or data between user and kernel space), it cuts across every isolation boundary containers rely on.

I learned this the hard way. After migrating production workloads to rootless Podman containers in 2022, I thought we'd significantly reduced our attack surface. We had. But the kernel was still the kernel. When Leaky Vessels dropped in early 2024, it was a cold reminder that our "rootless" setup was only as strong as the syscall layer sitting underneath it.

The copy-primitive pattern is consistent. The kernel needs to move or reference data: a memory page, a file descriptor, a buffer. The operation has a race condition, a leaked reference, or a missing permission check.
An attacker inside a container exploits that flaw to read or write data they shouldn't touch, punching through the namespace boundary.

Three times in ten years. That's not a coincidence. That's a systemic weakness in how Linux manages data at the lowest level.

Dirty COW: The Bug That Started the Pattern

Dirty COW (CVE-2016-5195) was a race condition in the Linux kernel's memory subsystem. It exploited how the kernel handles Copy-on-Write (COW) memory mappings. When a process tries to write to a private mapping of a read-only file, the kernel is supposed to give the process its own private copy of the page. Dirty COW raced that copy operation, allowing a local user to write to the underlying read-only mapping instead.

The bug had existed in the kernel for nearly nine years before anyone found it. Nine years. In a component so fundamental that virtually every Linux system was affected.

For containers, Dirty COW was devastating. Because containers share the host kernel, any process inside a container could exploit the race condition to escalate privileges on the host. The isolation that namespaces and cgroups provided was irrelevant. The bug was beneath all of it.

Dirty COW proved something the container community didn't want to hear: if the kernel's copy mechanism is broken, your container boundary doesn't exist.

The fix was a kernel patch. But the lesson was bigger than one CVE. The kernel's memory management code is ancient, complex, and handles billions of operations per second. Copy-on-Write is not a feature you can rip out. It's foundational to how Linux works. And foundational code is where the worst bugs hide.

Leaky Vessels: Same Pattern, Different Layer

Fast forward to January 2024. Snyk's security research team disclosed Leaky Vessels, a set of vulnerabilities in runc, the container runtime used by both Docker and Podman, and in BuildKit. The most critical was CVE-2024-21626, a runc flaw that exploited a file descriptor leak during container initialization. Different mechanism than Dirty COW.
Identical pattern: a low-level operation that copies or references data across a trust boundary had a flaw. In this case, runc leaked a file descriptor pointing to the host filesystem into the container's process space. An attacker who controlled the container's working directory could use that leaked descriptor to escape the container and access the host filesystem.

This is a copy-primitive bug in spirit. The kernel and runtime are supposed to carefully manage which file descriptors are visible to which namespaces. A file descriptor is just a reference, a pointer to data. When that reference leaks across the container boundary, it's functionally the same as Dirty COW's memory page write: data that should be isolated isn't.

Having worked with container runtimes in production, I can tell you what made Leaky Vessels particularly terrifying wasn't just the escape. It was that the attack could be embedded in a malicious container image. Pull the wrong image, run it, and the container breaks out during initialization, before your runtime security tools even start monitoring. The attack surface was the docker run command itself.

The affected runc versions were patched quickly. But the incident reinforced a point that Adrian Mouat, author of Using Docker, has written about extensively: rootless containers aren't a magic bullet. If a kernel or runtime exploit exists, an attacker can still escalate privileges after breaking out.

Do Rootless Containers Actually Protect You From Copy-Primitive Bugs?

Rootless containers are the single best security improvement most teams can make to their container infrastructure. That's not the debate. The debate is whether they're sufficient.

Rootless containers operate within a distinct user namespace, mapping the container's internal root user to an unprivileged user ID on the host. As Red Hat has documented, the core benefit is straightforward: if there's a container breakout, the attacker only has the privileges of the unprivileged host user, not root. That matters.
A Dirty COW-style exploit inside a rootless container would land the attacker as an unprivileged user on the host rather than root. Massive reduction in blast radius.

But here's where teams get into trouble: they treat rootless mode as the finish line for container security rather than one layer of it.

The most severe attacks chain a container escape with a separate kernel privilege escalation. You break out of the container as an unprivileged user, then use a second kernel bug to escalate to root. When Dirty COW was unpatched, that second step was trivial: the same bug that got you out of the container could also get you to root.

This chaining is exactly why copy-primitive bugs are so dangerous. They tend to affect the kernel at a level that's useful for both container escape and privilege escalation. A single bug gives you two steps of the kill chain. I wrote about similar defense-in-depth thinking for AI agents in production, and the principle is the same: no single safeguard survives a determined, multi-step attack.

[YOUTUBE:x1npPrzyKfs|Linux Container Primitives: cgroups, namespaces, and more!]

How to Actually Defend Against Linux Copy-Primitive Container Escapes

I've spent the last two years hardening container deployments, and the boring answer is the right one: no single tool solves this. You need layers. Here's what I've seen actually work in production:

Patch aggressively and automatically. Copy-primitive bugs get patched in the kernel within days of disclosure. The problem is most organizations take weeks or months to roll out kernel updates. If you're running Kubernetes, tools like kured (Kubernetes Reboot Daemon) can automate node reboots after kernel updates. If you're running standalone Docker or Podman hosts, unattended-upgrades for the kernel package is table stakes. The window between a patch being released and that patch actually being deployed is where these bugs get exploited.

Run rootless by default. Yes, I just spent a section explaining why rootless isn't sufficient. It's still essential. Rootless mode in Podman is mature and production-ready.
Docker's rootless mode has improved significantly since 2023. If you're still running containers as root in 2026, you're handing attackers a free privilege escalation on every container escape. Stop. Seriously.

Deploy syscall filtering with seccomp profiles. Kernel-level copy-primitive bugs require specific syscalls to exploit. Dirty COW needed madvise and write. Leaky Vessels is the exception that shows the limits of this layer: it abused WORKDIR processing during container init rather than an unusual syscall. Custom seccomp profiles that restrict unnecessary syscalls reduce the exploitability of kernel bugs you haven't even heard about yet. The default Docker seccomp profile blocks about 44 syscalls. For sensitive workloads, you should be blocking far more.

Consider gVisor for high-value workloads. Google's gVisor interposes a userspace kernel between your container and the host kernel. Your container's syscalls don't hit the real Linux kernel directly; they're intercepted by gVisor's Sentry process, which reimplements a subset of Linux syscalls in a sandboxed environment. A copy-primitive bug in the host kernel becomes far harder to reach from inside the container because the container never makes the vulnerable syscall directly. The tradeoff is performance overhead and compatibility limitations. For multi-tenant or security-critical workloads, it's among the strongest isolation you can get without a full VM.

Monitor for anomalous file descriptor and memory behavior. Tools like Falco can detect runtime behaviors associated with container escapes: unexpected file descriptor access patterns, attempts to access /proc/self/fd entries pointing outside the container's filesystem, or memory mapping operations that shouldn't be happening in your workload. This won't prevent the exploit, but it can catch it in progress. Having worked through incident response on container escapes, I can tell you that detection at the early stages of exploit chains matters more than most teams realize.
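To make the seccomp advice above concrete, here is a minimal sketch of generating an allowlist-style profile, where every syscall not explicitly listed is refused. The field names (defaultAction, syscalls, names, action) follow the JSON layout Docker's seccomp profiles use as I understand it; the function name and the syscall list are hypothetical and exist only for illustration. A real workload needs a much longer list, usually gathered by tracing the application first.

```python
import json
from typing import Dict, Iterable, List


def build_allowlist_profile(allowed: Iterable[str]) -> Dict[str, object]:
    """Return a seccomp profile dict that refuses every syscall not listed."""
    names: List[str] = sorted(set(allowed))
    if not names:
        raise ValueError("an empty allowlist would block every syscall")
    return {
        # Anything not explicitly allowed fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical list for illustration only. It is far too short to start a
# real container; treat it as a placeholder for syscalls you have observed.
OBSERVED_SYSCALLS = ["read", "write", "openat", "close", "mmap", "exit_group"]

profile = build_allowlist_profile(OBSERVED_SYSCALLS)
profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Saved to a file, a profile like this can be passed to Docker with --security-opt seccomp=/path/to/profile.json. Note that a custom profile replaces the default one rather than adding to it, which is why the sketch uses an allowlist: a short deny-list with a permissive default would end up weaker than what Docker ships.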
The Pattern Isn't Going Away

Here's my prediction: we will see another major copy-primitive container escape within the next 18 months.

The Linux kernel's memory management, file descriptor handling, and data copying paths are some of the oldest and most complex code in the entire operating system. They're also some of the most security-critical. Ancient + complex + security-critical = more bugs. Count on it.

The container model's fundamental architecture (shared kernel, namespace isolation) means every one of these bugs is a potential container escape. This isn't a flaw in Docker or Podman. It's a structural property of how Linux containers work.

The teams that survive the next copy-primitive bug won't be the ones who picked the right container runtime or checked the right compliance box. They'll be the ones who treated container isolation as one layer in a stack, patched their kernels in hours instead of weeks, and ran their most sensitive workloads behind gVisor or equivalent sandboxing.

Rootless mode buys you time. Syscall filtering reduces your surface area. Runtime monitoring catches what slips through. But the kernel is still the kernel. And until containers stop sharing it, copy-primitive bugs will keep breaking the boundaries we trust them to enforce.

The only question is whether you'll be patched when the next one drops.

Frequently Asked Questions

Can Dirty COW still affect containers in 2026?

Dirty COW (CVE-2016-5195) was patched in the Linux kernel in October 2016. Any kernel version from 4.8.3 onward includes the fix. If you're running a supported, updated Linux distribution in 2026, Dirty COW itself is not a direct threat. However, the class of vulnerability it represents, race conditions in copy-on-write memory handling, continues to produce new bugs.

What is the difference between a container escape and a privilege escalation?

A container escape is when code running inside a container gains access to resources outside the container's namespace boundary, such as the host filesystem or another container's processes.
A privilege escalation is when a process gains higher permissions than it was originally given, such as going from an unprivileged user to root. These are different attack steps, but they're often chained together: escape the container first, then escalate to root on the host.

Do rootless containers prevent all container escape attacks?

No. Rootless containers ensure that a breakout lands the attacker as an unprivileged host user instead of root, which significantly limits damage. But they don't prevent the escape itself. A kernel-level bug can still allow code inside a rootless container to access host resources. The attacker just has fewer permissions once they get there. For defense in depth, rootless mode should be combined with seccomp filtering, regular kernel patching, and runtime monitoring.

How does gVisor protect against kernel vulnerabilities?

gVisor runs a userspace kernel called Sentry that intercepts your container's system calls before they reach the host Linux kernel. Instead of your container code directly invoking kernel syscalls, gVisor reimplements a subset of those syscalls in a sandboxed Go process. This means a vulnerability in the host kernel's copy-on-write handling or file descriptor management is far harder to trigger from inside the container, because the container's calls are not passed straight through to the vulnerable host code.

Was Leaky Vessels (CVE-2024-21626) exploited in the wild?

As of early 2024, there was no confirmed evidence of active exploitation before the Leaky Vessels disclosure. Snyk coordinated disclosure with the runc maintainers, and patches were released before proof-of-concept exploits became widely available. However, working exploits were developed quickly after disclosure, making rapid patching essential for any organization running affected runc versions (1.0.0 through 1.1.11; the fix shipped in 1.1.12).

Why do copy-related bugs keep appearing in the Linux kernel?

The Linux kernel's memory management and data-copying code paths are among the oldest and most complex in the entire codebase. Copy-on-Write, file descriptor passing, and buffer management involve intricate concurrency logic with an enormous number of possible execution paths.
These operations are also performance-critical, so they're heavily optimized in ways that can introduce subtle race conditions. The combination of complexity, age, and performance pressure makes these code paths a recurring source of security bugs.
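The affected runc range quoted in the Leaky Vessels answer above (1.0.0 through 1.1.11) can be turned into a quick triage check. The sketch below is an illustration, not an official detection tool: the function names are hypothetical, it assumes the input is a string containing a dotted version number such as the first line printed by runc --version, and it ignores pre-release suffixes and distribution backports, so a "not affected" result still needs to be confirmed against your vendor's advisory.

```python
import re
from typing import Tuple

# Affected range as quoted in the article: runc 1.0.0 through 1.1.11.
AFFECTED_MIN = (1, 0, 0)
AFFECTED_MAX = (1, 1, 11)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH number found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_runc_affected(version_text: str) -> bool:
    """Return True if the version falls inside the quoted affected range."""
    return AFFECTED_MIN <= parse_version(version_text) <= AFFECTED_MAX


for candidate in ["runc version 1.1.11", "runc version 1.1.12"]:
    status = "affected" if is_runc_affected(candidate) else "outside quoted range"
    print(f"{candidate}: {status}")
```

Comparing versions as integer tuples avoids the classic string-comparison trap where "1.1.9" sorts after "1.1.11".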