Tools: CVE-2026-31431 'Copy Fail' Deep Dive — Linux Page-Cache Bug and AF_ALG Kubernetes Container Escape

CVE-2026-31431 "Copy Fail" Deep Dive — A Nine-Year-Old Linux Kernel Page-Cache Bug, AF_ALG Container Escape, and the seccomp/Falco Playbook for 2026 Kubernetes Node Security

1. Why Copy Fail is the inflection point — when "kernel LPE" turned into "container escape"

2. The vulnerability mechanism — algif_aead in-place optimisation meets the page cache

2.1 Nine years of an in-place optimisation — why 2017 code broke in 2026

3. The Kubernetes container-escape scenario — overlayfs shared layers are the bridge

3.1 What the EKS/GKE/ACK validation means

4. The mitigation that actually buys time — seal AF_ALG sockets with seccomp

4.1 Node-level sealing — blacklist algif_aead.ko

5. Detection — Falco, auditd, eBPF, Sigma against the AF_ALG SOCK_SEQPACKET signal

6. Impact matrix — distributions, kernel versions, managed-K8s node images

7. The four-window operational checklist

8. ManoIT retrospective — what worked, what stuck

9. Closing: "syscall sealing" as the new node-security baseline

On April 29, 2026, Theori researcher Taeyang Lee disclosed CVE-2026-31431 "Copy Fail". On the surface it is another Linux kernel LPE (CVSS 7.8); in practice it is much more. A 2017 in-place optimisation in the algif_aead module (commit 72548b093ee3) slept for nine years until a four-syscall, 732-byte PoC woke it up. Two things make this disclosure heavier than a typical kernel LPE: (1) one AF_ALG socket plus a splice() chain yields a controlled 4-byte arbitrary write into any page-cache-backed page, and (2) on Kubernetes nodes, an unprivileged Pod can corrupt a setuid binary in a shared overlayfs lower layer so that a privileged DaemonSet on the same node executes the corrupted binary, turning unprivileged Pod code execution into node-level root in four syscalls.

Microsoft Security Blog, Sysdig, Unit 42, CERT-EU, Help Net Security and Ubuntu all published the same week. The mainline fix landed on April 1, 2026 (commit a664bf3d603d), but vendor errata are still rolling through the second week of May. This post breaks Copy Fail into eight axes and shares the seccomp-seal/Falco-detect/patch-priority checklist ManoIT applied to 17 EKS, GKE, and on-premises RKE2 clusters.

1. Why Copy Fail is the inflection point: when "kernel LPE" turned into "container escape"

Most Linux kernel LPEs of the past few years assumed the attacker already had a local shell. DirtyPipe (2022), StackRot (2023) and GameOver(lay) (2023) were all "you already have a shell, now you become root." For cloud-native operators, Pod/VM isolation was the first line of defence.

Copy Fail breaks that assumption. An ordinary unprivileged process inside a Pod, without CAP_SYS_ADMIN, without ptrace, without any capability at all, can open an AF_ALG socket and use splice() to corrupt the page cache. Because the page cache is a kernel-global resource, corrupting /usr/bin/su inside the container also corrupts the cache entry the host reads from.
The GitHub repository Percivalll/Copy-Fail-CVE-2026-31431-Kubernetes-PoC has also validated the container escape on Alibaba Cloud ACK, Amazon EKS, and Google GKE. Copy Fail moved from "fast-track LPE patching" to "isolate your nodes tonight or risk the whole cluster."

The table below compares major Linux LPEs since 2022 alongside Copy Fail. Two cells in the last row do the heavy lifting: the prerequisite drops from "local shell" to "unprivileged process," and the container escape is validated on three major managed Kubernetes platforms. That is what makes Copy Fail the 2026-H1 Kubernetes node-security inflection point. The patch cycle column is operationally crucial: AlmaLinux shipped its own kernel build on May 1, but Red Hat, Ubuntu, SUSE, and Amazon Linux published errata in stages through the second week of May, so the patch gap equals your exposure window.

2. The vulnerability mechanism: algif_aead in-place optimisation meets the page cache

Copy Fail lives in crypto/algif_aead.c, the AEAD (Authenticated Encryption with Associated Data) socket interface for the kernel's userspace crypto API (AF_ALG). The 2017 commit 72548b093ee3 ("crypto: af_alg - get_page upon reading from socket") added an in-place optimisation that allowed the destination scatterlist of an AEAD operation to reference page-cache pages directly when the user-supplied destination already pointed at such pages. The optimisation never distinguished between user-space pages and page-cache pages: if a user maps a read-only setuid binary and registers those pages as the destination, the kernel writes four AEAD-output bytes directly into the cached page.

The table below shows the four-syscall chain from unprivileged process to root. Step 4 is the kill: the Theori PoC overwrites four bytes in the authentication-decision branch of /usr/bin/su, making the binary accept any password. Because su is setuid root, the calling user immediately becomes root.
The corruption sits in the page cache, which is kernel-wide: anything on the host that reads that file from cache reads the corrupted bytes too.

2.1 Nine years of an in-place optimisation: why 2017 code broke in 2026

The 2017 commit was a perfectly reasonable optimisation at the time: AEAD throughput improves measurably when you skip one copy. Reviewers did not catch the page-cache implication. Nine years later, the rise of eBPF, io_uring, and splice-heavy usage patterns produced exactly the combinations that turn the in-place path into an arbitrary write. This is not a one-off defect. It is a structural signal that the ways zero-copy paths in the Linux kernel can be combined have reached corners that were never exhaustively reviewed. The operational implication is to treat AF_ALG as an opt-in interface: assume most workloads do not need it and seal it (§4) instead of waiting for the next nine-year-old bug to wake up.

3. The Kubernetes container-escape scenario: overlayfs shared layers are the bridge

Copy Fail upgrades from LPE to container escape because of overlayfs lower-layer sharing. Container runtimes (containerd, CRI-O) share read-only image layers across every container on a node. The sharing extends from disk into the page cache: the same file in the same image layer maps to one cache entry across all containers. Copy Fail corrupts that one entry.

Step 3 is the load-bearing step because most production Kubernetes clusters run many privileged DaemonSets: CNI (Calico, Cilium), CSI (EBS, Ceph), log shippers (Fluent Bit, Vector), security agents (Falco, Tetragon), node-exporter. Any one of them executing a binary backed by a shared image layer is enough. The defence while patches are landing is to make sure step 2 cannot happen, which is exactly what §4 addresses.

3.1 What the EKS/GKE/ACK validation means

The PoC repository validated container escape on the default node images of Amazon EKS, Google GKE, and Alibaba Cloud ACK. The implication is simple: "managed Kubernetes will block this" stopped being true on April 29.
Until each cloud rotates its node images (AWS EKS Optimized AMI, GKE COS), which usually takes 1–7 days, users themselves must enforce AF_ALG sealing, Pod Security Standards, and node isolation.

4. The mitigation that actually buys time: seal AF_ALG sockets with seccomp

The four-syscall chain collapses the moment step 1 fails: if socket(AF_ALG, SOCK_SEQPACKET, 0) is blocked, nothing else matters. AF_ALG is the userspace interface to the kernel crypto API, but almost no container workload uses it, because applications call OpenSSL/BoringSSL/libsodium directly in user space. So a container-level seccomp profile that returns EAFNOSUPPORT (errno 97) when the first argument to socket() or socketpair() is 38 (AF_ALG) seals step 1 without breaking legitimate workloads.

The same JSON works under Docker, Podman, containerd, and CRI-O. On Kubernetes, drop the file at /var/lib/kubelet/seccomp/profiles/block-af-alg.json on every node and reference it in the Pod securityContext as a Localhost profile (profiles/block-af-alg.json, relative to the kubelet seccomp directory). Cluster-wide enforcement uses Kyverno (or a ValidatingAdmissionPolicy).

4.1 Node-level sealing: blacklist algif_aead.ko

If seccomp is the container-level defence, blacklisting the kernel module is the node-level defence. Most distributions ship algif_aead as a loadable module, so a modprobe.d blacklist entry plus an rmmod of the running module takes it out of play.

Two caveats. (1) If you run LUKS or kcapi tooling, verify before unloading. (2) Some FIPS-mode RHEL setups depend on algif_aead. Validate on a staging node first; until then, the §4 seccomp profile is the safer first move.

5. Detection: Falco, auditd, eBPF, Sigma against the AF_ALG SOCK_SEQPACKET signal

Even on sealed nodes, and especially on unsealed nodes, detection matters. The high-signal indicator is an AF_ALG SOCK_SEQPACKET socket created by a process outside a short allowlist. SOCK_SEQPACKET is the only socket type the kernel accepts for AF_ALG, so the type alone does not separate legitimate from malicious use; the caller does. Legitimate AF_ALG users (cryptsetup, systemd-cryptsetup, kcapi-*) are few and predictable, so an AF_ALG socket opened by any other process, particularly inside a container, is an almost-deterministic exploitation precursor.

Auditd matches the same signal on socket() entry where a0=38 and a1=SOCK_SEQPACKET. eBPF EDRs hook tracepoint:syscalls:sys_enter_socket with the same condition; in container environments, attach bpf_get_current_pid_tgid and bpf_get_current_cgroup_id to carry Pod/namespace context through to the analyst.
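The auditd and eBPF conditions described here compare the raw socket() arguments. One detail is worth sketching: on Linux the type argument may carry SOCK_NONBLOCK and SOCK_CLOEXEC flag bits, so a literal comparison against 5 can miss calls that set them. Below is a minimal sketch of the match predicate, assuming the x86-64/arm64 constant values; the function name is hypothetical and does not come from any of the tools named in this post.

```python
# Linux constants, hard-coded so the sketch also runs where Python's
# socket module does not expose AF_ALG. Values assume x86-64/arm64.
AF_ALG = 38
SOCK_SEQPACKET = 5
SOCK_TYPE_MASK = 0xF        # low bits of `type` hold the socket type
SOCK_NONBLOCK = 0o4000      # flag bits that callers may OR into `type`
SOCK_CLOEXEC = 0o2000000


def is_af_alg_seqpacket(a0: int, a1: int) -> bool:
    """True when socket(a0, a1, ...) asks for an AF_ALG SOCK_SEQPACKET socket.

    a0 is the address family, a1 the (possibly flag-decorated) socket type,
    mirroring the auditd a0/a1 fields.
    """
    return a0 == AF_ALG and (a1 & SOCK_TYPE_MASK) == SOCK_SEQPACKET
```

A detection pipeline would apply the same masking before comparing a1, then consult the caller allowlist before raising an alert.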
Elastic's privilege_escalation_potential_copy_fail_cve_2026_31431_exploitation_via_af_alg_socket.toml rule (published early May) expresses the same signal in EQL. The operational recommendation is seccomp (block) + Falco (detect) + Sigma (SIEM) in parallel: seccomp alone misses unsealed nodes, Falco alone does not stop the attack. ManoIT pushed all three in a single GitOps PR (§8).

6. Impact matrix: distributions, kernel versions, managed-K8s node images

Copy Fail affects every distribution built on a 2017-or-later mainline kernel. The operationally useful question for the matrix is "above which kernel build are you safe?"

Two operational notes. (1) RHEL 10.1 publishes errata in stages: the first batch covers the general-purpose kernel, and the FIPS-mode and kpatch live-patch channels follow a few days later. (2) EKS and GKE auto-rotate node images, but Karpenter NodePools and ASGs only pick up a new image when they create a node. A forced rolling replace is the final step of patch landing.

7. The four-window operational checklist

The checklist below mirrors what ManoIT SecOps executed across 17 clusters between April 30 and May 7. The point is to keep nodes safe while patches are still in flight.

Step 4 (shrinking privileged DaemonSets) drew the longest debate. CNI and CSI cannot go, but non-essential debug DaemonSets (strace shells, hostPath ad-hoc workloads) can usually be paused for 24 hours. Reducing exposure within the patch gap is exactly the operational decision worth that debate.

8. ManoIT retrospective: what worked, what stuck

The biggest lesson was that the patch gap is not a policy gap; it is a procedure gap. The mainline fix landed on April 1, but ManoIT only pushed policy after the April 29 disclosure, because a mainline merge is not equivalent to a patched node. The highest-ROI security posture is to modularise the disclosure-day mitigations ahead of time: a seccomp profile, a Falco macro, and a Kyverno policy are exactly that module.

9. Closing: "syscall sealing" as the new node-security baseline

Copy Fail is 2017 code that broke nine years later.
More importantly, it is a PoC-validated demonstration that container isolation can be unwound through a kernel-global resource (the page cache) on three major managed-K8s platforms. Patches will land; they are landing. The bigger shift is that the operational baseline needs to move: "kernel interfaces most container workloads do not use should be sealed by default", starting with AF_ALG, then extending to io_uring, perf_event_open, userfaultfd, and bpf. ManoIT made AF_ALG sealing a permanent part of the standard seccomp profile and put io_uring and userfaultfd sealing on next quarter's agenda.

Node security is no longer a race to patch the kernel faster; it is a steady operational practice of shrinking the surface that touches the kernel. Copy Fail is the LPE that pins that one line at the top of the 2026-H1 security backlog.

Cross-posted from ManoIT. Authored by Claude (Opus 4.6), edited and technically reviewed by ManoIT.
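The seccomp seal from §4 can be spot-checked from inside a running container. The sketch below attempts the same first step as the exploit chain, socket(AF_ALG, SOCK_SEQPACKET, 0), and maps the outcome to a verdict. It is a minimal sketch under stated assumptions: the function names and verdict strings are hypothetical, and the probe only tests whether the socket can be created, so it does not by itself show whether algif_aead is blacklisted on the node.

```python
import errno
import socket
from typing import Optional

AF_ALG = 38          # Linux address-family number for the kernel crypto API
SOCK_SEQPACKET = 5   # Linux value; hard-coded for portability of the sketch


def classify_probe(err: Optional[int]) -> str:
    """Map the errno from socket(AF_ALG, SOCK_SEQPACKET, 0) to a verdict."""
    if err is None:
        return "exposed"                 # socket was created: AF_ALG is reachable
    if err == errno.EAFNOSUPPORT:
        return "sealed-or-unavailable"   # seccomp profile hit, or family not present
    return "blocked-other"               # e.g. EPERM from a different policy


def probe_af_alg() -> str:
    """Try to open an AF_ALG socket and classify the result."""
    try:
        sock = socket.socket(AF_ALG, SOCK_SEQPACKET, 0)
    except OSError as exc:
        # Fall back to -1 if the platform did not set errno.
        return classify_probe(exc.errno if exc.errno is not None else -1)
    sock.close()
    return classify_probe(None)


if __name__ == "__main__":
    print(probe_af_alg())
```

Run as a one-off job in each namespace, an "exposed" result would indicate that the Pod is not using the block-af-alg profile.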

seccomp profile for §4 (block-af-alg.json):

    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "syscalls": [
        {
          "names": ["socket", "socketpair"],
          "action": "SCMP_ACT_ERRNO",
          "errnoRet": 97,
          "args": [
            { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
          ],
          "comment": "Block AF_ALG (CVE-2026-31431 Copy Fail) — domain 38, errno EAFNOSUPPORT (97)"
        }
      ]
    }

Pod manifest referencing the profile (§4):

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-af-alg-block
      annotations:
        manoit.co.kr/cve: "CVE-2026-31431"
        manoit.co.kr/mitigation: "seccomp-af-alg-block"
    spec:
      securityContext:
        seccompProfile:
          type: Localhost
          # Path is relative to /var/lib/kubelet/seccomp
          localhostProfile: profiles/block-af-alg.json
      containers:
        - name: app
          # OpenSSL in user space, no AF_ALG needed
          image: registry.manoit.co.kr/svc/api:v1.42.0

Kyverno ClusterPolicy for cluster-wide enforcement (§4):

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-af-alg-seccomp
      annotations:
        policies.kyverno.io/severity: high
        policies.kyverno.io/subject: "CVE-2026-31431 Copy Fail"
    spec:
      validationFailureAction: Enforce
      rules:
        - name: require-block-af-alg-profile
          match:
            any:
              - resources:
                  kinds: ["Pod"]
                  namespaces: ["default", "app-*", "svc-*"]
          validate:
            message: "Pod must use the block-af-alg seccomp profile (CVE-2026-31431 mitigation)"
            pattern:
              spec:
                securityContext:
                  seccompProfile:
                    type: Localhost
                    localhostProfile: "profiles/block-af-alg.json"

Node-level module blacklist (§4.1):

    cat <<'EOF' | sudo tee /etc/modprobe.d/copy-fail-cve-2026-31431.conf
    # CVE-2026-31431 Copy Fail mitigation: block algif_aead until kernel is patched
    blacklist algif_aead
    install algif_aead /bin/true
    EOF

    # Unload from the running kernel too
    sudo rmmod algif_aead 2>/dev/null || true

    # Rebuild the initramfs so the blacklist survives reboot
    sudo update-initramfs -u   # Debian/Ubuntu
    # RHEL family: dracut --force

Falco macro and rule (§5):

    - macro: known_af_alg_callers
      condition: proc.name in (cryptsetup, systemd-cryptsetup, kcapi-enc, kcapi-dgst, kcapi-rng, kcapi-hasher)

    # Field names are as published; confirm them against `falco --list` for
    # your Falco version before deploying.
    - rule: AF_ALG SEQPACKET Socket — Copy Fail Precursor
      desc: >
        Detects unexpected AF_ALG SOCK_SEQPACKET socket creation, the documented
        prerequisite for CVE-2026-31431 (Copy Fail). AF_ALG accepts only
        SOCK_SEQPACKET, so the known_af_alg_callers allowlist is what separates
        legitimate use from exploitation.
      condition: >
        evt.type = socket and socket.domain = AF_ALG and socket.type = SOCK_SEQPACKET
        and not known_af_alg_callers
      output: >
        AF_ALG SEQPACKET socket opened by suspicious process
        (user=%user.name pid=%proc.pid comm=%proc.name parent=%proc.pname
        container=%container.name image=%container.image.repository
        k8s_pod=%k8s.pod.name k8s_ns=%k8s.ns.name)
      priority: WARNING
      tags: [cve-2026-31431, copy-fail, lpe, page-cache, T1068, T1611]
      source: syscall
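Before rolling the seccomp profile out to every node, it can help to sanity-check what it would do for a given call. The following is a minimal sketch of a profile evaluator, not a reimplementation of libseccomp: it assumes the first matching rule wins, that all argument conditions in a rule must hold, and that only SCMP_CMP_EQ appears. The evaluate function and the embedded PROFILE_JSON constant are hypothetical helpers for illustration.

```python
import json
from typing import Sequence

# Same rule as the block-af-alg profile in this post (comment field omitted).
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket", "socketpair"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 97,
      "args": [{"index": 0, "value": 38, "op": "SCMP_CMP_EQ"}]
    }
  ]
}
"""


def evaluate(profile: dict, syscall: str, args: Sequence[int]) -> str:
    """Return the action the profile would take for syscall(*args).

    Simplified model: first matching rule wins, every arg condition in a
    rule must match, and only SCMP_CMP_EQ is understood.
    """
    for rule in profile.get("syscalls", []):
        if syscall not in rule.get("names", []):
            continue
        conditions = rule.get("args", [])
        if any(c["op"] != "SCMP_CMP_EQ" for c in conditions):
            raise ValueError("this sketch only models SCMP_CMP_EQ")
        if all(
            c["index"] < len(args) and args[c["index"]] == c["value"]
            for c in conditions
        ):
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    profile = json.loads(PROFILE_JSON)
    print(evaluate(profile, "socket", [38, 5, 0]))  # AF_ALG request
    print(evaluate(profile, "socket", [2, 1, 0]))   # AF_INET request
```

A check like this fits in CI next to the GitOps PR, so that an edit to the profile that accidentally stops matching domain 38 is caught before it reaches a node.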