Tools: CVE-2026-31431: Why Agent Sandboxes Need More Than Containers - 2025 Update

What the exploit actually does

Why this is specifically an agent platform problem

What actually works

Where AgentLair sits

The broader pattern

Yesterday, a 732-byte Python script was published that roots any Linux kernel shipped since 2017. No race condition. No kernel-version offsets. Same exploit script, four distributions, four root shells in one take.

CVE-2026-31431. "Copy Fail." Found by Xint Code. It's trending for a reason.

The copy.fail disclosure page doesn't just describe a kernel vulnerability. It explicitly categorizes agent sandboxes and tenant containers as HIGH risk, in the same tier as Kubernetes clusters and multi-tenant shell hosts.

What the exploit actually does

The logic flaw lives in authencesn, a combined authentication-encryption module in the Linux kernel crypto subsystem. The chain is listed step by step at the end of this post.

The result is not a crash. It's a root shell. And because the write targets the shared page cache, not the on-disk file, it crosses container boundaries. Two tenant containers sharing the same host kernel share the same page cache. One compromised tenant gets the page-cache write. The write is visible across both. No file-integrity monitor catches it. The on-disk binary is untouched.

The PoC is 732 bytes, uses only the Python 3.10 standard library, and requires no preinstalled primitives. The disclosure confirmed it on Ubuntu 24.04 LTS, Amazon Linux 2023, RHEL 10.1, and SUSE 16. Any other distribution running an unpatched kernel in the same window is in the same position.

Why this is specifically an agent platform problem

Standard server deployments are "medium" on the copy.fail risk scale. You're usually the only user, and the bug doesn't grant remote access by itself. It's a local privilege escalation.

Agent platforms are different. The entire architecture is built around executing untrusted code in isolated environments. Tenant isolation is not a security goal that complements the product. It's the product. When one tenant's agent runs its tool calls inside a container, and another tenant's agent runs inside a different container on the same host, the threat model requires that those two environments cannot see each other.

Copy Fail invalidates that assumption at the kernel level. A malicious agent gets a root shell. Root on the host. All containers exposed.

Container boundaries, without kernel-level enforcement, are process grouping. That's useful for resource limits and file isolation. It is not a security boundary against a kernel LPE. This isn't a new argument. But CVE-2026-31431 turns a theoretical concern into a 732-byte script with a publicly verified proof of concept.

What actually works

Three layers, in order of urgency.

First, patch. The fix is mainline commit a664bf3d603d, which reverts the 2017 in-place optimization in algif_aead. Most major distributions are shipping patched kernels now. This is the fix. Everything else is mitigation while you wait for it.

Second, disable the algif_aead module. Before patching, block the attack vector with the modprobe command given at the end of this post. What breaks? Almost nothing. AF_ALG is the userspace door to the kernel crypto API. OpenSSL, GnuTLS, NSS, SSH, LUKS, kTLS, IPsec: none of them use AF_ALG in their default configurations. The only things at risk are applications explicitly configured to call the AF_ALG socket interface directly, which is rare outside embedded crypto offload paths. Check with lsof | grep AF_ALG if you're unsure.

Third, block AF_ALG socket creation via seccomp. For workloads running untrusted code, blocking AF_ALG at the syscall level is the right long-term control. This denies the socket before the kernel module is even invoked, and it survives a module misconfiguration or future re-enable. It should live in your seccomp profile regardless of patch state.

The copy.fail page also mentions no-new-privileges and user namespaces. Both are good hygiene and worth having. But they don't close this specific vector. The issue is page-cache sharing at the host level, not privilege escalation within a namespace.

Where AgentLair sits

Honest answer: we run on Linux. Our tenant isolation is container-based. That puts us in the HIGH risk category on the copy.fail disclosure.

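For the third layer, blocking AF_ALG socket creation via seccomp, here is a minimal sketch of what such a rule could look like in the Docker-style seccomp JSON format. It is not AgentLair's actual profile. The function names and the default-allow policy are assumptions made for illustration, and the only fact it relies on is that AF_ALG is address family 38 on Linux.

```python
import json
import socket

# AF_ALG is address family 38 on Linux. Fall back to the literal where the
# Python build does not expose the constant (for example on non-Linux hosts).
AF_ALG = int(getattr(socket, "AF_ALG", 38))


def af_alg_deny_rule() -> dict:
    """Rule that makes socket(AF_ALG, ...) fail with an errno (hypothetical helper)."""
    return {
        "names": ["socket"],
        "action": "SCMP_ACT_ERRNO",
        "args": [
            # index 0 is the first argument of socket(): the address family.
            {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"},
        ],
    }


def build_profile() -> dict:
    """Illustrative profile: allow everything, deny only AF_ALG sockets."""
    return {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": [af_alg_deny_rule()]}


if __name__ == "__main__":
    # Emit JSON that can be saved to a file and passed to a container runtime.
    print(json.dumps(build_profile(), indent=2))
```

Saved as profile.json, the output could be passed to Docker with --security-opt seccomp=profile.json. A real agent sandbox would more likely start from an allowlist profile, where the same condition has to be folded into the existing socket rule rather than appended as a separate one; that part is left out of this sketch.
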
Our current mitigations: we're deploying the kernel patch as distributions ship it. algif_aead blacklisting is going out across the fleet. Seccomp profiles for agent containers already block a wide socket surface; AF_ALG is being added explicitly.

What we don't have yet: gVisor-level kernel virtualization or hardware virtualization per tenant. These would reduce the shared kernel surface substantially. They're on the roadmap, with a harder cost and complexity tradeoff than the mitigations above.

We're not claiming to be patched and done. CVE-2026-31431 is a useful forcing function. The architectural question it raises (shared kernel as a trust boundary) doesn't go away when the patch ships.

The broader pattern

Copy Fail is striking because it's reliable. 100% reliable, across every distribution tested, for nearly a decade. Most kernel CVEs require specific kernel versions, specific configurations, or race windows that make exploitation probabilistic. This one needs none of that.

That reliability is what makes it useful for attackers specifically targeting sandboxed environments. A race-dependent exploit is hard to deploy quietly inside a tenant workload. A straight-line 732-byte script is not.

Agent platforms are a target class. Not because of anything specific to AgentLair, but because the architecture is inherently attractive: multi-tenant, credential-holding, executing external code, with tenants who have legitimate reason to run complex workloads. An attacker who controls a compromised agent could, in theory, use Copy Fail to escape the tenant container and access other tenants' data or credentials. That's not a hypothetical concern. It's what the CVSS 7.8 score reflects.

The fix is layered: patch the kernel, block the module, enforce via seccomp, and plan for stronger isolation primitives. No single layer is sufficient.

Copy Fail was publicly disclosed on 2026-04-29 by Xint Code. Full details and the PoC at copy.fail.

The GitHub issue tracker for coordinated disclosure is at github.com/theori-io/copy-fail-CVE-2026-31431.

Command to disable the algif_aead module (the second mitigation layer described above):

# Make any attempt to load algif_aead run /bin/false instead (run as root)
$ echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
# Unload the module if it is already loaded; ignore the error if it is not
$ rmmod algif_aead 2>/dev/null || true

The exploit chain, step by step:

- Open an AF_ALG socket (the kernel crypto API's userspace interface, enabled by default on essentially every mainstream distro).
- Use splice() to route page-cache pages into the authenticated encryption path.
- The bug: an in-place optimization introduced in 2017 allows the writable destination scatterlist to reference the same page-cache page as the source. That gives userspace a writable reference to a read-only kernel page.
- Four bytes. Pick a setuid binary. Write in place.
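
Since the first step of the chain is opening an AF_ALG socket, one way to check whether the module block or a seccomp rule is actually in effect is to probe the interface from inside a sandbox. The following is a minimal sketch, not part of the disclosure. The function name, the status strings, and the choice of gcm(aes) as a test algorithm are assumptions made for illustration. It only opens and binds a socket; it performs no crypto operations.

```python
import socket


def probe_af_alg(alg_type: str = "aead", alg_name: str = "gcm(aes)") -> str:
    """Report how far an unprivileged process gets with AF_ALG (hypothetical helper).

    Returns one of:
      "unsupported"    - this Python build or OS has no AF_ALG constant
      "socket-blocked" - socket(AF_ALG) itself failed (seccomp, no kernel support)
      "bind-blocked"   - the socket opened, but binding the algorithm type failed
                         (for example because algif_aead cannot be loaded)
      "reachable"      - the AEAD interface is reachable from this process
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return "unsupported"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return "socket-blocked"
    try:
        # For AF_ALG, Python's bind() takes a (type, name) tuple.
        sock.bind((alg_type, alg_name))
    except OSError:
        return "bind-blocked"
    finally:
        sock.close()
    return "reachable"


if __name__ == "__main__":
    print(probe_af_alg())
```

A result of "reachable" inside a tenant container suggests neither mitigation is active there. "bind-blocked" can also mean the chosen algorithm name is simply unavailable on that kernel, so treat it as a hint rather than proof.
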