Tools

Tools: AppArmor and Seccomp in Kubernetes: What the Docs Don't Tell You

2026-03-23 0 views admin

Why Platform Teams Should Care

How the Kernel Enforces Security: Not a Pipeline

From Syscall to Enforcement: The Full Execution Path

The Runtime Default Trap

Managing Profiles at Scale: Declarative or Nothing

Writing a Real Profile: The Rule Model

The Path-Based Trap

What AppArmor Won't Stop

AppArmor vs. Seccomp vs. SELinux: An Opinionated Take

Choosing AppArmor vs. SELinux at Platform Level

Control Failure Mode Comparison

When Is seccomp Alone Enough?

LSM Stacking: The Frontier

Seccomp: Deeper Than You Think

The cBPF Filter Model

Return Actions (More Than Allow/Deny)

Argument Filtering: The Underused Power Feature

Two Non-Obvious Properties

What Seccomp Won't Stop

A Threat Scenario: Container Escape Attempt

What Breaks First in Production

Performance Considerations

A Production-Grade Pod Spec

Observability: Catching Denials Before They Become Incidents

Compliance Mapping

A Realistic Failure Postmortem

AppArmor's Threat Model Boundary

The Operational Cost of AppArmor

Common Anti-Patterns

Platform Team Playbook

Designing for Control Failure

Key Takeaways

The Real Purpose of AppArmor

If You Remember Only One Thing Per Control

Closing Thoughts You've read the Kubernetes security docs. You know to set appArmorProfile: RuntimeDefault and seccompProfile: RuntimeDefault. You've ticked the CIS Benchmark boxes. And yet, if a container in your cluster were compromised right now, you might be surprised by what these controls would — and wouldn't — stop. This post is for engineers who've moved past configuration and want to reason about AppArmor and seccomp under pressure: their real enforcement models, where each fails, how they interact, how to manage them at scale, and what breaks first in production. If you haven't read the companion post on syscalls — Syscalls in Kubernetes: The Invisible Layer That Runs Everything — the enforcement mechanics below will make more sense with that foundation. Both controls operate on the syscall path; understanding what a syscall is and how it traverses the kernel is prerequisite context. Most Kubernetes clusters already run with several security controls in place: So why add AppArmor to that stack? Because those controls primarily restrict what a container can ask the kernel to do — not what resources it can access once it's running. AppArmor fills a specific gap in that model: For platform teams operating multi-tenant clusters, this gap matters for two distinct reasons: Containment. A compromised container running under a tight AppArmor profile cannot read /etc/shadow, traverse /proc//maps, write to /sys/kernel/, or access service account tokens it wasn't explicitly granted. The blast radius of a post-exploitation scenario is substantially smaller. Detection signal. AppArmor denials fire early. When an attacker inside a container attempts reconnaissance — reading process maps, accessing credential paths, probing kernel interfaces — they hit AppArmor rules before they hit application-level controls. In many real incidents, AppArmor denial logs are the first signal that something is wrong, appearing minutes before behavioral anomalies surface in application logs. Without mandatory access controls like AppArmor or SELinux, a compromised container often has far broader read access to the host filesystem and /proc namespace than platform teams realize — even under PSS Restricted. AppArmor is the layer that makes that access explicit and auditable. A common mental model is that capabilities, AppArmor, and seccomp form an ordered enforcement stack. That's a useful simplification, but it's not how the kernel works — and the difference matters when you're reasoning about bypasses. All three are enforced inside the Linux kernel, but at different enforcement points with different objects: There is no strict serial pipeline. A process action may be evaluated against all three simultaneously, each at their respective hook point. AppArmor is uniquely path-aware in a way that neither capabilities nor seccomp are — it can express "this process may read /etc/nginx/ but not /etc/passwd" — which is why it complements rather than duplicates the others. Before diving into each control individually, it's worth being precise about when each fires. The ordering matters when you're reasoning about bypasses — and it's commonly misunderstood. When a container process makes a syscall, here's the actual sequence inside the kernel: The critical implication: seccomp executes before LSM hooks. In most common syscall paths, seccomp is evaluated at syscall entry, followed by LSM hooks (AppArmor/SELinux) and capability checks during operation-specific validation — the exact interleaving varies by syscall and operation type, but the invariant that matters is: a syscall denied by seccomp never reaches the AppArmor evaluation point. Conversely, a syscall allowed by seccomp is still subject to AppArmor's access controls on what that syscall can touch. Capabilities complete the triad. They're evaluated alongside LSM hooks for many operations and gate the privilege level of what a process can do — independent of both which syscalls it can invoke (seccomp) and which objects it can access (AppArmor). In practice, dropping capabilities is often the simplest way to eliminate entire exploit paths before seccomp or AppArmor need to engage. Dropping CAP_SYS_ADMIN removes more attack surface with one line than most seccomp tuning achieves: This is why the three controls are complementary by design, not redundant: A workload with all three configured correctly gets seccomp narrowing the callable surface, capabilities bounding privilege, then AppArmor restricting what permitted syscalls can reach. Remove any layer and the others become your only backstop. AppArmor reduces blast radius. Seccomp reduces reachable attack surface. These are different threat properties. Seccomp blocking mount() means the exploit path requiring mount() simply cannot execute — the kernel never sees it. AppArmor can't block a syscall entirely (it operates after kernel entry), but it can block every object that syscall would have reached. They defend different dimensions. One insight that gets lost in feature comparisons: most container escapes don't bypass all controls — they exploit the gaps between them. CVE-2022-0492 required unshare() and mount() in sequence; seccomp's RuntimeDefault blocked mount(), AppArmor's default profile independently denied it too. Either layer alone would have stopped the exploit. Security failures are rarely about a single mechanism failing — they're about assumptions breaking at the boundaries between seccomp, LSMs, and capabilities. Understanding those boundaries is what this article is actually about. RuntimeDefault is not a single profile. It's an instruction to the container runtime to apply its own default profile — and that profile differs across runtimes. Modern runtimes have largely converged on a similar baseline, but differences still exist in rule ordering, abstraction includes, and which operations are allowed. In hardened environments, those deltas matter — you're reasoning about a security envelope you don't fully control. The only way to get a consistent, auditable security envelope is to manage your own Localhost profiles: That localhostProfile value is a path relative to /etc/apparmor.d/ on the node. Which brings us to the hard problem: getting profiles onto nodes reliably. The platform implication for multi-cluster environments: two clusters running identical pod manifests can have subtly different effective security envelopes depending on their runtime. This creates configuration drift that is largely invisible in CI pipelines — a security review comparing manifests will see the same RuntimeDefault annotation, but the actual enforcement may differ. The only reliable mitigation is to treat profiles as versioned infrastructure and manage them declaratively, as covered in the next section. The naive approach is a DaemonSet that writes profile files and runs apparmor_parser -r. This works until it doesn't — profile updates require careful ordering, new nodes joining the cluster won't have profiles until the DaemonSet pod schedules there, and you have no audit trail. At cluster scale, profile lifecycle must be reconciled declaratively. The Security Profiles Operator (SPO) is currently the most production-ready implementation of that model — a Kubernetes-native controller that manages AppArmor (and seccomp) profiles as first-class CRDs. SPO reconciles profiles onto nodes, surfaces violations as Kubernetes events, and integrates with OPA/Gatekeeper to enforce that pods only reference profiles that are actually loaded. It also has a --record mode that observes a running workload and generates a profile from its real behavior — invaluable for brownfield workloads. Here's what a real SPO-managed AppArmorProfile CRD looks like for an nginx container. The Kubernetes metadata section is standard — name and namespace are how pods reference the profile. The spec.policy field is a raw AppArmor policy written in AppArmor's own language, which SPO writes directly to /etc/apparmor.d/ on each node: AppArmor rules follow a simple pattern: [qualifier] [resource] [permissions]. But the devil is in the details — particularly the path model. A profile worth deploying has explicit deny rules, not just allowances. The deny keyword takes precedence over allow rules and is your backstop against profile inheritance tricks. Default-deny with explicit allows is the correct mental model. AppArmor evaluates rules against resolved path strings, not inodes. This is a non-obvious but important limitation. If an attacker inside a container can manipulate how paths resolve — through bind mounts, symlinks, or mount namespace tricks — they may be able to access a file via an allowed path that reaches an inode your policy intended to restrict. For example: Well-written profiles must pair with readOnlyRootFilesystem: true and careful namespace configuration to close this class of bypass. It's not a reason to avoid AppArmor, but it's a reason to understand what you're actually enforcing. Generating a starting profile: Use aa-genprof to record behavior in complain mode, then tighten from there. For containers, SPO's --record mode is cleaner. This is the section most blog posts skip. Network exfiltration: AppArmor can allow or deny protocol families (TCP, UDP, raw) but has no concept of destination IPs or domains. A process with network inet tcp allowed can exfiltrate data to any external endpoint. That's NetworkPolicy's domain — and the two must work together. In-memory attacks: AppArmor is path-based and capability-based. It has no visibility into what happens in memory. A process with permitted capabilities can still execute heap sprays, ROP chains, or in-process exploitation. Runtime detection tools like Falco or Tetragon — which observe syscall patterns using eBPF — are the right layer for this. Kernel vulnerabilities: AppArmor is a kernel module that hooks via LSM interfaces. An exploit that compromises the kernel below those hooks bypasses AppArmor entirely. CVE-2022-0185 (a kernel heap overflow enabling container escape) is a real example — no AppArmor profile would have stopped it because the exploit occurred before LSM enforcement points were reached. Misloaded profiles: Depending on your runtime version and Kubernetes version, a missing Localhost profile may either cause pod admission failure or allow the container to start without confinement. This variance is precisely why profile lifecycle management must be automated — the SPO's status reporting makes this observable; bare annotation approaches fail silently. Path-based bypasses: As described above — bind mounts and mount namespace manipulation can cause policy to evaluate against a path that resolves to an unintended inode. These three are frequently described as interchangeable. They're not — they enforce different things at different kernel hook points. Most platform teams don't choose between AppArmor and SELinux for purely technical reasons. They choose based on node OS standardization — which is already determined before the security conversation happens. The operational cost of running both MAC engines across heterogeneous nodes — maintaining separate toolchains, policy languages, expertise, and audit pipelines — almost always outweighs any technical benefit. In practice, consistency of tooling and policy management matters more than the underlying MAC engine. This is the table most comparison posts omit: not what each control does, but what each control fails to stop. The table makes one thing clear: these two controls have almost no overlap in what they stop. They're not alternatives — they're complements covering different axes of the attack surface. Seccomp RuntimeDefault blocks syscalls that are rarely needed and frequently abused — keyctl, kexec_load, ptrace, mount, unshare, and others. For many workloads, this provides the most impactful risk reduction per unit of operational effort. Stay with seccomp-only when: The right answer is not "AppArmor everywhere" — it's "AppArmor where the containment value exceeds the operational cost." Modern Linux kernels (5.7+) support LSM stacking, which means AppArmor, SELinux, BPF-LSM, and Landlock can coexist in the same kernel, each enforcing at their respective hooks — though the exact combinations available depend on kernel configuration and distribution defaults. In hardened environments where stacking is supported, this enables layered MAC enforcement that goes well beyond any single module. Tetragon takes this further: where AppArmor is a static policy engine evaluated at access time, Tetragon uses eBPF to enforce dynamic policy based on runtime context — process ancestry, argument values, network connection state — things AppArmor cannot express. If you're running Tetragon, AppArmor and eBPF enforcement are complementary, not competing. The practical answer for most clusters: seccomp everywhere as a syscall filter, AppArmor on Ubuntu/Debian nodes for filesystem and capability restrictions, and SELinux on RHEL-based nodes. On kernel 5.7+, investigate BPF-LSM for workloads that need dynamic policy. The comparison section above treats seccomp as a peer of AppArmor. It is — but most engineers use it at a much shallower level than AppArmor because the documentation stops at "configure RuntimeDefault and move on." Here's what staff-level seccomp understanding looks like. Seccomp filters are classic BPF (cBPF) programs, not eBPF. This distinction matters: Unlike eBPF observability tools (Falco, Tetragon) which can maintain maps, call helper functions, and do complex processing, a seccomp filter is a simple decision function: given this syscall number and these argument values, return an action. That simplicity is why seccomp is evaluated first — before any LSM hook, before capability checks, before kernel logic runs at all. The filter is attached per-process (inheritable by children) and evaluated on every syscall entry. Attaching a filter requires CAP_SYS_ADMIN or the no_new_privs bit to be set — which is why allowPrivilegeEscalation: false is a prerequisite to meaningful seccomp enforcement. RuntimeDefault uses SCMP_ACT_ERRNO for blocked syscalls. But seccomp has a richer action set that custom profiles can leverage: The most powerful and least-known action is SCMP_ACT_NOTIFY (introduced in kernel 5.0). It sends the syscall event to a userspace supervisor via a file descriptor — the container's syscall is paused until the supervisor makes a decision. This turns seccomp into a programmable enforcement point: a policy engine can inspect the syscall's arguments, look up process context, consult external state, and then approve or deny — all before the kernel executes anything. This is how tools like sysbox implement OCI-compliant syscall interception without full gVisor overhead. For most Kubernetes workloads you'll never need SCMP_ACT_NOTIFY, but understanding it exists clarifies what seccomp is: not just a static blocklist, but a kernel-userspace interception interface with real programmability. Most engineers know seccomp filters on syscall numbers. Fewer know it can filter on syscall arguments. Seccomp's cBPF instructions can inspect the syscall argument registers. This enables policies like: Concretely, argument filtering enables: Argument filtering is how RuntimeDefault blocks namespace-creating clone() without breaking thread creation in multithreaded applications — a subtlety that gets lost in "seccomp blocks 44 syscalls" summaries. Seccomp filters are per-thread, not per-container. The filter is attached to a process and inherited by threads and child processes. In multithreaded applications, each thread runs under the same filter — but thread-specific behavior (signal handling, JVM internal threads, async runtimes) can produce syscall patterns that weren't covered during profile generation. JVM profiling windows that only observed the main application thread frequently miss the GC thread's madvise and mmap patterns. Seccomp is not namespace-aware. The filter applies equally regardless of which container, namespace, or cgroup the thread belongs to. A seccomp filter attached to a process doesn't know it's running inside a container. This is both a strength (it can't be bypassed by namespace tricks) and a limitation (you can't express "allow mount() inside the container's mount namespace but deny it in the host namespace" — that distinction lives in the capability and LSM layers, not seccomp). Seccomp has no concept of paths. It sees openat(AT_FDCWD, "/etc/shadow", O_RDONLY) as a permitted openat syscall — unless you've also checked the path argument. But path arguments are memory pointers, not inline values, and cBPF can't dereference pointers. Seccomp fundamentally cannot enforce path-level access control. That's AppArmor's domain. Seccomp cannot enforce stateful policies. Each syscall decision is independent. Seccomp cannot say "allow the first open() to this fd but deny the third" or "allow connect() unless the previous execve() was suspicious." For stateful, context-aware enforcement, you need eBPF-based tools (Tetragon) or SCMP_ACT_NOTIFY with a userspace supervisor. Seccomp cannot prevent allowed syscall abuse. If write() is allowed and an attacker has an open fd to a sensitive file, seccomp won't stop the write. Allowing a syscall means allowing it — what it operates on is AppArmor's responsibility. Argument filtering has coverage limits. Pointer arguments (file paths, struct pointers) cannot be dereferenced by cBPF. Only integer-valued arguments (flags, fd numbers, mode values) can be reliably checked. Seccomp answers "can this syscall be invoked?" but not "what does this syscall operate on?" Let's make this concrete. An attacker has achieved code execution inside a container via a deserialization vulnerability. What happens? Note that attack vector 2 is only blocked if you've explicitly added deny @{PROC}//maps r to your profile. Many RuntimeDefault profiles do not include this denial. This is a good example of why "we have AppArmor" and "we have AppArmor enforcing what we think it is" are different claims. Theory is necessary. Operational experience is different. Here's what actually causes profile-related incidents: Java and JVM workloads write to unexpected temp paths at startup — often derived from system properties and JDK version. A profile tight enough to deny /tmp/hsperfdata_* will break JVM health checks. Generate profiles from running workloads, not from documentation. Dynamic language runtimes (Python, Ruby, Node.js) load shared libraries and modules from paths that vary by distribution and package version. /usr/lib/x86_64-linux-gnu/ may need to be explicitly allowed, and that path is distribution-specific. Sidecars accessing shared volumes: If your app container and a sidecar (e.g., an Envoy proxy or log shipper) share an emptyDir, both containers need profiles that permit access to that volume's underlying path. The actual path under /var/lib/kubelet/pods/ is unpredictable — use @{run} and path globs carefully, or use a dedicated volume mount path that's consistent. Health and readiness probes: Kubernetes exec probes run inside the container's process namespace. If your probe invokes /bin/sh or /bin/curl and your profile restricts shell execution, probes will fail. Either allow the probe binary explicitly or switch to HTTP/TCP probes that don't require exec. Profile load ordering on node startup: If a node reboots and the SPO pod hasn't yet reconciled profiles before a workload pod schedules, the workload pod may fail admission. Build node readiness checks that verify profile presence, or use pod disruption budgets and node cordoning during maintenance windows. AppArmor's overhead is generally low, but not zero, and it scales with profile complexity. The cost is incurred at file open, exec, and network operations — each requires an LSM hook traversal and a policy lookup. For most workloads, this is imperceptible. For workloads doing high-frequency file I/O (logging pipelines, database engines, build systems), a dense profile with many path rules can add measurable latency to path lookups. Practical guidance: keep profiles focused. A profile with 20 precise rules is faster and easier to audit than one with 200 broad globs. Avoid / catch-alls on performance-sensitive paths — use specific subtree rules. And in complain mode, audit noise from high-frequency deny events can itself affect throughput; don't leave workloads in complain mode indefinitely. A few non-obvious choices: automountServiceAccountToken: false removes the default credential most pods get but rarely need. The emptyDir volumes provide writable space within a readOnlyRootFilesystem: true constraint — without them, many runtimes crash on startup trying to write to /tmp. Drop ALL capabilities and add back only what's needed; NET_BIND_SERVICE is the only one most web services require. Note also what Restricted PSS enforces at admission: it validates that appArmorProfile is present and set to RuntimeDefault or Localhost, but it does not validate the strength or content of the referenced profile. Admission compliance and actual security posture are not the same thing. AppArmor logs to the kernel audit subsystem: Distinguishing seccomp denials from AppArmor denials is a practical skill that gets skipped in documentation. They surface differently: When a container fails with Operation not permitted and you see nothing in the AppArmor audit log, seccomp is the likely culprit. Add SCMP_ACT_LOG to your profile's unknown syscalls during profiling to surface them — the Security Profiles Operator's --record mode does this automatically. The operation, profile, name, and comm fields tell you exactly what was denied, by which profile, from which binary. When a denial fires, the triage path matters: A denial from a known binary hitting a path the app shouldn't need is a high-fidelity signal — treat it as an incident until proven otherwise. Feed these logs into your SIEM(Security Information and Event Management) with a volume-based alert on any single pod. Note the gap: compliance frameworks check for the presence of controls, not their effectiveness. PSS Restricted enforces that a profile is declared, not that it actually restricts anything meaningful. That's your team's responsibility to close. Understanding where AppArmor goes wrong in practice is as important as knowing how to configure it. Here's a failure mode that plays out more often than it should. A platform team deploys a new microservice to production. They've done the right things: a custom Localhost profile, authored from SPO's recorded output in staging, reviewed and tightened before go-live. The deployment succeeds. Pods are running. And then, three hours later, Kubernetes starts restarting pods due to failing readiness probes. What went wrong: The profile was generated from the application process's behavior, but exec probes spawn a separate shell inside the container that the profiling run never observed. The profile correctly represented the app — but not all the processes Kubernetes would run inside it. The compounding failure: Under pressure at 2am, the team set type: Unconfined and moved on. That pod has been running without AppArmor enforcement for six months. Nobody notices because it's not visible in normal kubectl output. The systemic lesson: The root issue was not the profile itself — it was the rollout process. Security controls fail most often during deployment, not during steady state operation. The failure mode is predictable: profiles generated from synthetic or incomplete observation windows miss edge cases, and the incident response path of least resistance is to disable the control entirely. Treat AppArmor enforcement like any other breaking infrastructure change: progressive rollout, canary namespaces, and automated rollback to complain mode — not to Unconfined. The right rollout strategy: The lesson isn't that AppArmor is fragile. It's that profile coverage must be validated against everything the kernel will run in a container, not just the application binary. AppArmor is a meaningful control within a specific set of assumptions. Outside those assumptions, it provides weaker or no protection. Being explicit about this boundary is what separates operational security from security theater. The most common way these assumptions break silently in real clusters: None of these are AppArmor failures. They're assumption violations. The control is only as strong as the assumptions underneath it — which is why security posture reviews should explicitly verify these preconditions, not just check that the field is set. The real cost of AppArmor is not performance overhead — it's policy maintenance. Every Localhost profile you deploy becomes part of your platform's API surface. Applications depend on it. Admission controllers enforce its presence. And unlike most Kubernetes configuration, profile changes can silently break applications in ways that only surface under specific runtime conditions. The ongoing maintenance surface that teams underestimate: Without automation — SPO for lifecycle, GitOps for change tracking, SIEM alerts for denial spikes — AppArmor deployments tend to degrade into one of two failure modes: overly permissive profiles that allow nearly everything, or growing lists of Unconfined exceptions added during incidents and never revisited. The question for platform teams is not "should we use AppArmor?" but "do we have the operational infrastructure to maintain it at the security level it needs to operate?" These are the patterns that undermine AppArmor in production, across organizations that have done the work to deploy it. Treating RuntimeDefault as "secure enough." It's a reasonable baseline, but it's not a security posture. RuntimeDefault does not restrict filesystem access, doesn't prevent reading /proc/*/maps, and varies across runtimes. It's a starting point, not a destination. Generating profiles from synthetic or short-observation traffic. A profile generated from 30 minutes of staging traffic will miss weekly batch jobs, on-call runbook paths, slow-startup JVM behavior, and any probe interactions that didn't fire during the window. Observation windows must cover full operational cycles — including failure modes. Running security controls only in production. Profiles validated only in production are profiles you can't roll back safely. Complain mode in staging, enforce in production, with CI comparison of denial delta between environments. Setting Unconfined during incidents and not reverting. This is the most common way security posture degrades silently. Every Unconfined exception added under pressure is a permanent policy rollback unless tracked and scheduled for follow-up. Not auditing profile drift. Applications change. Profiles don't automatically change with them. An 18-month-old profile for a service that has been through three dependency upgrades is almost certainly wrong — either too permissive (allowing paths the app no longer needs) or insufficiently permissive (missing paths added in newer dependencies). Alerting on absolute denial counts rather than deviation. Some workloads produce steady, low-level denial noise from probe edge cases or library behavior. Alerting on any denial will exhaust on-call teams; alerting on zero denials misses real events. Baseline first, then alert on deviation. If you operate Ubuntu-based Kubernetes nodes and are building or hardening your security posture, here's a concrete sequence. This isn't theory — it's the operational order that reduces the risk of each step breaking what the previous step protected. A few implementation notes on the less obvious steps: Step 1 before AppArmor: Seccomp RuntimeDefault is lower risk, higher portability, and easier to validate. Getting it in first means AppArmor is hardening a surface that's already narrowed. Don't try to do both simultaneously — sequence reduces blast radius. Step 4 duration matters: 48 hours is a minimum. If your workload has weekly batch jobs, cron patterns, or on-call runbooks that trigger unusual paths, you need observation windows that cover those cycles. A profile generated from one hour of traffic will miss them. Step 7 baselining: Before you alert on denial spikes, you need to know what "normal" looks like. Some workloads legitimately produce periodic denials from probe edge cases or library behavior that's been allowed to be noisy. Baseline first, alert on deviation — not absolute counts. Step 8 is the one teams skip: Profiles drift out of sync with applications as code changes. An overly permissive profile that hasn't been reviewed in 18 months is security debt. Treat profile review as part of your service's security hygiene, not a one-time setup task. In real systems, controls fail. Profiles drift. Runtimes change defaults. Exceptions get introduced under pressure and never revisited. The value of layering is not redundancy — it's graceful degradation. When reasoning about your security posture, think through each layer's failure mode: This framing changes how you think about rollout decisions. A team that disables AppArmor under incident pressure hasn't removed one control — they've removed one layer of a degradation chain. The question is: which other layers are still in place, and are they configured to compensate? It also informs how you instrument for failure. Monitoring that seccomp is applied (via the pod's securityContext) and AppArmor is loaded (via aa-status) should be part of your cluster's security posture signals — not one-time setup validation. AppArmor's value comes from how you operate it, not that you've enabled it. Seccomp's value comes from the specificity of your profile, not the existence of one. RuntimeDefault for both is not a security posture — it's a starting point. RuntimeDefault seccomp blocks ~44 high-risk syscalls but doesn't cover newer attack surfaces like io_uring. RuntimeDefault AppArmor is not a single well-defined thing; modern runtimes have converged but differences remain, and those differences matter in hardened environments. AppArmor has real blind spots: network exfiltration, in-memory attacks, kernel exploits, and path-based bypass via mount manipulation. Seccomp has its own: it cannot enforce path-level access control, cannot reason about what allowed syscalls operate on, and cannot make stateful policy decisions. These aren't reasons to skip either control — they're reasons to understand what you're actually enforcing and pair both with NetworkPolicy and runtime detection. Custom Localhost AppArmor profiles managed declaratively via the Security Profiles Operator, combined with custom seccomp profiles scoped to actual workload behavior, are the only way to get a consistent, auditable posture at scale. On modern kernels with LSM stacking, AppArmor, BPF-LSM, and Landlock can coexist — enabling layered MAC enforcement that goes beyond any single module. eBPF-based systems like Tetragon express things AppArmor and seccomp cannot: dynamic, context-aware enforcement based on process ancestry and runtime state. These are complementary layers, not alternatives. When something goes wrong, AppArmor denial logs are among your highest-quality signals for distinguishing misconfiguration from intrusion. Build that triage path before you need it. AppArmor's goal is not to stop every exploit. No single control does that. Its goal is to force attackers to cross more boundaries — and to create high-fidelity signals when they try. A compromised container under a well-authored AppArmor profile cannot silently read credentials, probe /proc, write to kernel interfaces, or move laterally via the filesystem. The attacker's options narrow, and each attempt they make is logged with enough context to tell you exactly what was tried. In modern Kubernetes platforms, AppArmor and seccomp are most valuable not as standalone controls, but as two layers in a deliberate architecture: No single control is sufficient. The platform architecture is the control — and these two layers together make the attack surface explicit, auditable, and enforceable at both the syscall invocation and object access levels. One framing worth keeping: Kubernetes doesn't implement these controls — it orchestrates them. securityContext, appArmorProfile, and seccompProfile are instructions to the Linux kernel. The real enforcement always happens below the Kubernetes abstraction layer, in the kernel's syscall path. Understanding that boundary is what prevents "we have AppArmor configured" from being confused with "we have AppArmor enforcing what we think it is." Security is not about stacking controls — it's about understanding where each control stops. The table is also a checklist: if you can't articulate what each layer doesn't cover, you're configuring controls, not designing a security posture. The engineers who get this right aren't the ones who've read the most documentation. They're the ones who've been paged at 2am, watched a team disable AppArmor under pressure and never re-enable it, and learned the hard way that "we have security controls" and "we know what our security controls actually enforce" are different claims. AppArmor and seccomp are not hard to enable. They're hard to operate correctly over time — as applications change, nodes are replaced, sidecars are added, and profiles drift silently out of sync. The tooling exists to do this well: the Security Profiles Operator for lifecycle management, SPO's record mode for profile generation, SIEM integration for denial signals, and eBPF-based detection for the gaps neither control can fill. What separates a security posture from a compliance checkbox is whether you've thought through the failure modes — what happens when a profile is missing, when a runtime changes its defaults, when a team adds Unconfined during an incident. That's the work. The YAML is the easy part. For a deeper understanding of the syscall mechanics that both controls rely on, see the companion post: Syscalls in Kubernetes: The Invisible Layer That Runs Everything. Templates let you quickly answer FAQs or store snippets for re-use. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse

Code Block

Copy

User Process │ └── syscall() ← ring 3 → ring 0 transition │ ├── seccomp filter (classic BPF) │ │ │ ├── KILL / ERRNO / TRAP / NOTIFY → exit here, never reaches kernel │ └── ALLOW → continue │ ├── kernel executes syscall logic │ │ │ └── LSM hooks fire (AppArmor / SELinux) │ │ │ ├── path / capability / network label check │ └── DENY → EACCES, operation aborted │ ├── capability checks (if privileged op requested) │ │ │ └── e.g. CAP_SYS_ADMIN for mount(), CAP_NET_RAW for raw sockets │ └── actual resource access (filesystem, network, IPC) User Process │ └── syscall() ← ring 3 → ring 0 transition │ ├── seccomp filter (classic BPF) │ │ │ ├── KILL / ERRNO / TRAP / NOTIFY → exit here, never reaches kernel │ └── ALLOW → continue │ ├── kernel executes syscall logic │ │ │ └── LSM hooks fire (AppArmor / SELinux) │ │ │ ├── path / capability / network label check │ └── DENY → EACCES, operation aborted │ ├── capability checks (if privileged op requested) │ │ │ └── e.g. CAP_SYS_ADMIN for mount(), CAP_NET_RAW for raw sockets │ └── actual resource access (filesystem, network, IPC) User Process │ └── syscall() ← ring 3 → ring 0 transition │ ├── seccomp filter (classic BPF) │ │ │ ├── KILL / ERRNO / TRAP / NOTIFY → exit here, never reaches kernel │ └── ALLOW → continue │ ├── kernel executes syscall logic │ │ │ └── LSM hooks fire (AppArmor / SELinux) │ │ │ ├── path / capability / network label check │ └── DENY → EACCES, operation aborted │ ├── capability checks (if privileged op requested) │ │ │ └── e.g. CAP_SYS_ADMIN for mount(), CAP_NET_RAW for raw sockets │ └── actual resource access (filesystem, network, IPC) securityContext: appArmorProfile: type: Localhost localhostProfile: my-org/nginx-v2 securityContext: appArmorProfile: type: Localhost localhostProfile: my-org/nginx-v2 securityContext: appArmorProfile: type: Localhost localhostProfile: my-org/nginx-v2 apiVersion: security-profiles-operator.x-k8s.io/v1alpha1 kind: AppArmorProfile metadata: name: nginx-restricted # pods reference this name in appArmorProfile.localhostProfile namespace: production # profile is scoped to this namespace spec: policy: | #include <tunables/global> # defines @{PROC}, @{HOME} and other path variables profile nginx-restricted flags=(attach_disconnected) { # attach_disconnected: allow profile to apply even if the binary path # isn't reachable at load time (common in containers with overlayfs) #include <abstractions/base> # allows libc, locale files, /dev/null etc. #include <abstractions/nameservice> # allows DNS resolution (/etc/resolv.conf, nsswitch) # Allow outbound TCP only — no UDP, no raw sockets network inet tcp, network inet6 tcp, # Binary: map+read+execute (mr). Denies writes to the nginx binary itself. /usr/sbin/nginx mr, /etc/nginx/** r, # read-only access to all nginx config files /var/log/nginx/** w, # write access for access/error logs /var/cache/nginx/** rw, # read+write for proxy cache and temp files /tmp/** rw, # read+write for nginx temp upload/body buffers # Explicit denials — these take precedence over any allow rules above deny /proc/sys/kernel/core_pattern w, # prevent overwriting core dump handler (container escape vector) deny @{PROC}/*/mem rw, # prevent reading/writing any process's memory deny /sys/** w, # prevent writing to sysfs (kernel tunable manipulation) } apiVersion: security-profiles-operator.x-k8s.io/v1alpha1 kind: AppArmorProfile metadata: name: nginx-restricted # pods reference this name in appArmorProfile.localhostProfile namespace: production # profile is scoped to this namespace spec: policy: | #include <tunables/global> # defines @{PROC}, @{HOME} and other path variables profile nginx-restricted flags=(attach_disconnected) { # attach_disconnected: allow profile to apply even if the binary path # isn't reachable at load time (common in containers with overlayfs) #include <abstractions/base> # allows libc, locale files, /dev/null etc. #include <abstractions/nameservice> # allows DNS resolution (/etc/resolv.conf, nsswitch) # Allow outbound TCP only — no UDP, no raw sockets network inet tcp, network inet6 tcp, # Binary: map+read+execute (mr). Denies writes to the nginx binary itself. /usr/sbin/nginx mr, /etc/nginx/** r, # read-only access to all nginx config files /var/log/nginx/** w, # write access for access/error logs /var/cache/nginx/** rw, # read+write for proxy cache and temp files /tmp/** rw, # read+write for nginx temp upload/body buffers # Explicit denials — these take precedence over any allow rules above deny /proc/sys/kernel/core_pattern w, # prevent overwriting core dump handler (container escape vector) deny @{PROC}/*/mem rw, # prevent reading/writing any process's memory deny /sys/** w, # prevent writing to sysfs (kernel tunable manipulation) } apiVersion: security-profiles-operator.x-k8s.io/v1alpha1 kind: AppArmorProfile metadata: name: nginx-restricted # pods reference this name in appArmorProfile.localhostProfile namespace: production # profile is scoped to this namespace spec: policy: | #include <tunables/global> # defines @{PROC}, @{HOME} and other path variables profile nginx-restricted flags=(attach_disconnected) { # attach_disconnected: allow profile to apply even if the binary path # isn't reachable at load time (common in containers with overlayfs) #include <abstractions/base> # allows libc, locale files, /dev/null etc. #include <abstractions/nameservice> # allows DNS resolution (/etc/resolv.conf, nsswitch) # Allow outbound TCP only — no UDP, no raw sockets network inet tcp, network inet6 tcp, # Binary: map+read+execute (mr). Denies writes to the nginx binary itself. /usr/sbin/nginx mr, /etc/nginx/** r, # read-only access to all nginx config files /var/log/nginx/** w, # write access for access/error logs /var/cache/nginx/** rw, # read+write for proxy cache and temp files /tmp/** rw, # read+write for nginx temp upload/body buffers # Explicit denials — these take precedence over any allow rules above deny /proc/sys/kernel/core_pattern w, # prevent overwriting core dump handler (container escape vector) deny @{PROC}/*/mem rw, # prevent reading/writing any process's memory deny /sys/** w, # prevent writing to sysfs (kernel tunable manipulation) } # File rules /etc/nginx/** r # read all files under /etc/nginx /var/log/nginx/*.log w # write to log files /tmp/nginx-*/ rw # read/write temp directories /run/nginx.pid rw # read/write PID file # Capability rules capability net_bind_service, # allow binding to ports < 1024 capability dac_override, # override file permission checks (avoid if possible) # Network rules network inet tcp, network inet6 tcp, deny network raw, # deny raw sockets explicitly # Deny dangerous kernel paths explicitly deny /proc/sys/kernel/** w, deny @{PROC}/*/maps r, # prevent reading process memory maps (explicit deny required) # File rules /etc/nginx/** r # read all files under /etc/nginx /var/log/nginx/*.log w # write to log files /tmp/nginx-*/ rw # read/write temp directories /run/nginx.pid rw # read/write PID file # Capability rules capability net_bind_service, # allow binding to ports < 1024 capability dac_override, # override file permission checks (avoid if possible) # Network rules network inet tcp, network inet6 tcp, deny network raw, # deny raw sockets explicitly # Deny dangerous kernel paths explicitly deny /proc/sys/kernel/** w, deny @{PROC}/*/maps r, # prevent reading process memory maps (explicit deny required) # File rules /etc/nginx/** r # read all files under /etc/nginx /var/log/nginx/*.log w # write to log files /tmp/nginx-*/ rw # read/write temp directories /run/nginx.pid rw # read/write PID file # Capability rules capability net_bind_service, # allow binding to ports < 1024 capability dac_override, # override file permission checks (avoid if possible) # Network rules network inet tcp, network inet6 tcp, deny network raw, # deny raw sockets explicitly # Deny dangerous kernel paths explicitly deny /proc/sys/kernel/** w, deny @{PROC}/*/maps r, # prevent reading process memory maps (explicit deny required) # If /allowed-dir is permitted and an attacker can bind-mount /etc/shadow there: mount --bind /etc/shadow /allowed-dir/shadow # now readable via allowed path # If /allowed-dir is permitted and an attacker can bind-mount /etc/shadow there: mount --bind /etc/shadow /allowed-dir/shadow # now readable via allowed path # If /allowed-dir is permitted and an attacker can bind-mount /etc/shadow there: mount --bind /etc/shadow /allowed-dir/shadow # now readable via allowed path # Load a profile in complain mode (logs denials, doesn't enforce) apparmor_parser -C /etc/apparmor.d/my-profile # Watch would-be denials in real time journalctl -k -f | grep apparmor # Load a profile in complain mode (logs denials, doesn't enforce) apparmor_parser -C /etc/apparmor.d/my-profile # Watch would-be denials in real time journalctl -k -f | grep apparmor # Load a profile in complain mode (logs denials, doesn't enforce) apparmor_parser -C /etc/apparmor.d/my-profile # Watch would-be denials in real time journalctl -k -f | grep apparmor AppArmor Coverage vs. Threat Severity HIGH │ ✗ Kernel CVE bypass ║ ✓ Cgroup release_agent escape │ ✗ In-memory / ROP chain ║ ✓ Write to /sys or /proc/kernel S │ ✗ Network exfiltration ║ ✓ Service account token read E │ ✗ Misloaded profile (silent) ║ ✓ Read /proc/*/maps (recon) V │ ✗ Path traversal/bind mount ║ E ─────┼──────────────────────────────╫──────────────────────────────── R │ ║ I │ (no threats here — ║ ✓ Raw socket creation T │ low severity threats ║ Y │ not covered by AA ║ │ are acceptable risk) ║ LOW │ ║ └──────────────────────────────╨──────────────────────────────── LOW COVERAGE HIGH COVERAGE ◄── AppArmor Coverage ──► ✗ = AppArmor does NOT cover this threat (needs other controls) ✓ = AppArmor blocks this (if profile is correctly written) AppArmor Coverage vs. Threat Severity HIGH │ ✗ Kernel CVE bypass ║ ✓ Cgroup release_agent escape │ ✗ In-memory / ROP chain ║ ✓ Write to /sys or /proc/kernel S │ ✗ Network exfiltration ║ ✓ Service account token read E │ ✗ Misloaded profile (silent) ║ ✓ Read /proc/*/maps (recon) V │ ✗ Path traversal/bind mount ║ E ─────┼──────────────────────────────╫──────────────────────────────── R │ ║ I │ (no threats here — ║ ✓ Raw socket creation T │ low severity threats ║ Y │ not covered by AA ║ │ are acceptable risk) ║ LOW │ ║ └──────────────────────────────╨──────────────────────────────── LOW COVERAGE HIGH COVERAGE ◄── AppArmor Coverage ──► ✗ = AppArmor does NOT cover this threat (needs other controls) ✓ = AppArmor blocks this (if profile is correctly written) AppArmor Coverage vs. Threat Severity HIGH │ ✗ Kernel CVE bypass ║ ✓ Cgroup release_agent escape │ ✗ In-memory / ROP chain ║ ✓ Write to /sys or /proc/kernel S │ ✗ Network exfiltration ║ ✓ Service account token read E │ ✗ Misloaded profile (silent) ║ ✓ Read /proc/*/maps (recon) V │ ✗ Path traversal/bind mount ║ E ─────┼──────────────────────────────╫──────────────────────────────── R │ ║ I │ (no threats here — ║ ✓ Raw socket creation T │ low severity threats ║ Y │ not covered by AA ║ │ are acceptable risk) ║ LOW │ ║ └──────────────────────────────╨──────────────────────────────── LOW COVERAGE HIGH COVERAGE ◄── AppArmor Coverage ──► ✗ = AppArmor does NOT cover this threat (needs other controls) ✓ = AppArmor blocks this (if profile is correctly written) { "syscalls": [ { "names": ["clone"], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 2114060288, "op": "SCMP_CMP_MASKED_EQ" } ] } ] } { "syscalls": [ { "names": ["clone"], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 2114060288, "op": "SCMP_CMP_MASKED_EQ" } ] } ] } { "syscalls": [ { "names": ["clone"], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 2114060288, "op": "SCMP_CMP_MASKED_EQ" } ] } ] } apiVersion: v1 kind: Pod metadata: name: hardened-app namespace: production spec: automountServiceAccountToken: false # disable default SA token mount — most pods don't need API access securityContext: # pod-level: applies to all containers runAsNonRoot: true # kubelet rejects the pod if the image runs as UID 0 runAsUser: 1001 # explicit UID — avoid root (0) and well-known service UIDs runAsGroup: 1001 # primary GID for the process fsGroup: 1001 # volume files are chowned to this GID on mount seccompProfile: type: RuntimeDefault # use the container runtime's built-in seccomp profile (~44 blocked syscalls) # move to Localhost + custom profile for high-security workloads containers: - name: app image: my-org/app:1.4.2 # pin to digest in production; tags are mutable securityContext: # container-level: overrides pod-level where both exist appArmorProfile: type: Localhost # use a node-loaded custom profile, not RuntimeDefault localhostProfile: my-org/app-v1 # path relative to /etc/apparmor.d/ — must be loaded by SPO before pod starts allowPrivilegeEscalation: false # prevents setuid binaries and sudo from granting more privilege than the parent readOnlyRootFilesystem: true # container filesystem is immutable — writes go only to explicit volume mounts capabilities: drop: - ALL # drop every capability Linux grants by default add: - NET_BIND_SERVICE # re-add only if binding to ports < 1024; remove if app uses port >= 1024 volumeMounts: - name: tmp mountPath: /tmp # writable scratch space — required by many runtimes even under readOnlyRootFilesystem - name: cache mountPath: /var/cache/app # app-specific writable path; scope this as narrowly as possible volumes: - name: tmp emptyDir: {} # ephemeral, node-local; wiped on pod restart — not for persistent data - name: cache emptyDir: {} # same — both volumes exist only to satisfy readOnlyRootFilesystem apiVersion: v1 kind: Pod metadata: name: hardened-app namespace: production spec: automountServiceAccountToken: false # disable default SA token mount — most pods don't need API access securityContext: # pod-level: applies to all containers runAsNonRoot: true # kubelet rejects the pod if the image runs as UID 0 runAsUser: 1001 # explicit UID — avoid root (0) and well-known service UIDs runAsGroup: 1001 # primary GID for the process fsGroup: 1001 # volume files are chowned to this GID on mount seccompProfile: type: RuntimeDefault # use the container runtime's built-in seccomp profile (~44 blocked syscalls) # move to Localhost + custom profile for high-security workloads containers: - name: app image: my-org/app:1.4.2 # pin to digest in production; tags are mutable securityContext: # container-level: overrides pod-level where both exist appArmorProfile: type: Localhost # use a node-loaded custom profile, not RuntimeDefault localhostProfile: my-org/app-v1 # path relative to /etc/apparmor.d/ — must be loaded by SPO before pod starts allowPrivilegeEscalation: false # prevents setuid binaries and sudo from granting more privilege than the parent readOnlyRootFilesystem: true # container filesystem is immutable — writes go only to explicit volume mounts capabilities: drop: - ALL # drop every capability Linux grants by default add: - NET_BIND_SERVICE # re-add only if binding to ports < 1024; remove if app uses port >= 1024 volumeMounts: - name: tmp mountPath: /tmp # writable scratch space — required by many runtimes even under readOnlyRootFilesystem - name: cache mountPath: /var/cache/app # app-specific writable path; scope this as narrowly as possible volumes: - name: tmp emptyDir: {} # ephemeral, node-local; wiped on pod restart — not for persistent data - name: cache emptyDir: {} # same — both volumes exist only to satisfy readOnlyRootFilesystem apiVersion: v1 kind: Pod metadata: name: hardened-app namespace: production spec: automountServiceAccountToken: false # disable default SA token mount — most pods don't need API access securityContext: # pod-level: applies to all containers runAsNonRoot: true # kubelet rejects the pod if the image runs as UID 0 runAsUser: 1001 # explicit UID — avoid root (0) and well-known service UIDs runAsGroup: 1001 # primary GID for the process fsGroup: 1001 # volume files are chowned to this GID on mount seccompProfile: type: RuntimeDefault # use the container runtime's built-in seccomp profile (~44 blocked syscalls) # move to Localhost + custom profile for high-security workloads containers: - name: app image: my-org/app:1.4.2 # pin to digest in production; tags are mutable securityContext: # container-level: overrides pod-level where both exist appArmorProfile: type: Localhost # use a node-loaded custom profile, not RuntimeDefault localhostProfile: my-org/app-v1 # path relative to /etc/apparmor.d/ — must be loaded by SPO before pod starts allowPrivilegeEscalation: false # prevents setuid binaries and sudo from granting more privilege than the parent readOnlyRootFilesystem: true # container filesystem is immutable — writes go only to explicit volume mounts capabilities: drop: - ALL # drop every capability Linux grants by default add: - NET_BIND_SERVICE # re-add only if binding to ports < 1024; remove if app uses port >= 1024 volumeMounts: - name: tmp mountPath: /tmp # writable scratch space — required by many runtimes even under readOnlyRootFilesystem - name: cache mountPath: /var/cache/app # app-specific writable path; scope this as narrowly as possible volumes: - name: tmp emptyDir: {} # ephemeral, node-local; wiped on pod restart — not for persistent data - name: cache emptyDir: {} # same — both volumes exist only to satisfy readOnlyRootFilesystem # Live denial stream journalctl -k -f | grep 'apparmor="DENIED"' # Example denial entry kernel: audit: type=1400 audit(1708012345.123:42): apparmor="DENIED" operation="open" profile="my-org/app-v1" name="/proc/1/maps" pid=12345 comm="sh" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0 # Live denial stream journalctl -k -f | grep 'apparmor="DENIED"' # Example denial entry kernel: audit: type=1400 audit(1708012345.123:42): apparmor="DENIED" operation="open" profile="my-org/app-v1" name="/proc/1/maps" pid=12345 comm="sh" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0 # Live denial stream journalctl -k -f | grep 'apparmor="DENIED"' # Example denial entry kernel: audit: type=1400 audit(1708012345.123:42): apparmor="DENIED" operation="open" profile="my-org/app-v1" name="/proc/1/maps" pid=12345 comm="sh" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0 If seccomp fails (missing profile, wrong defaults) → AppArmor still restricts filesystem and object access → Capabilities still bound privilege → NetworkPolicy still governs egress If AppArmor fails (Unconfined exception, profile drift) → Seccomp still blocks high-risk syscall classes → readOnlyRootFilesystem still prevents write exploitation → Capabilities still block privileged operations If both fail → Capabilities + PSS Restricted still constrain privilege → Detection (Falco/Tetragon) becomes your last active layer → NetworkPolicy still limits lateral movement If seccomp fails (missing profile, wrong defaults) → AppArmor still restricts filesystem and object access → Capabilities still bound privilege → NetworkPolicy still governs egress If AppArmor fails (Unconfined exception, profile drift) → Seccomp still blocks high-risk syscall classes → readOnlyRootFilesystem still prevents write exploitation → Capabilities still block privileged operations If both fail → Capabilities + PSS Restricted still constrain privilege → Detection (Falco/Tetragon) becomes your last active layer → NetworkPolicy still limits lateral movement If seccomp fails (missing profile, wrong defaults) → AppArmor still restricts filesystem and object access → Capabilities still bound privilege → NetworkPolicy still governs egress If AppArmor fails (Unconfined exception, profile drift) → Seccomp still blocks high-risk syscall classes → readOnlyRootFilesystem still prevents write exploitation → Capabilities still block privileged operations If both fail → Capabilities + PSS Restricted still constrain privilege → Detection (Falco/Tetragon) becomes your last active layer → NetworkPolicy still limits lateral movement - Why Platform Teams Should Care - How the Kernel Enforces Security: Not a Pipeline - From Syscall to Enforcement: The Full Execution Path - The Runtime Default Trap - Managing Profiles at Scale: Declarative or Nothing - Writing a Real Profile: The Rule Model The Path-Based Trap - The Path-Based Trap - What AppArmor Won't Stop - AppArmor vs. Seccomp vs. SELinux: An Opinionated Take Choosing AppArmor vs. SELinux at Platform Level Control Failure Mode Comparison When Is seccomp Alone Enough? LSM Stacking: The Frontier - Choosing AppArmor vs. SELinux at Platform Level - Control Failure Mode Comparison - When Is seccomp Alone Enough? - LSM Stacking: The Frontier - Seccomp: Deeper Than You Think The cBPF Filter Model Return Actions (More Than Allow/Deny) Argument Filtering: The Underused Power Feature Two Non-Obvious Properties What Seccomp Won't Stop - The cBPF Filter Model - Return Actions (More Than Allow/Deny) - Argument Filtering: The Underused Power Feature - Two Non-Obvious Properties - What Seccomp Won't Stop - A Threat Scenario: Container Escape Attempt - What Breaks First in Production - Performance Considerations - A Production-Grade Pod Spec - Observability: Catching Denials Before They Become Incidents - Compliance Mapping - A Realistic Failure Postmortem - AppArmor's Threat Model Boundary - The Operational Cost of AppArmor - Common Anti-Patterns - Platform Team Playbook - Designing for Control Failure - Key Takeaways - The Real Purpose of AppArmor - If You Remember Only One Thing Per Control - Closing Thoughts - The Path-Based Trap - Choosing AppArmor vs. SELinux at Platform Level - Control Failure Mode Comparison - When Is seccomp Alone Enough? - LSM Stacking: The Frontier - The cBPF Filter Model - Return Actions (More Than Allow/Deny) - Argument Filtering: The Underused Power Feature - Two Non-Obvious Properties - What Seccomp Won't Stop - Pod Security Standards at admission - seccomp RuntimeDefault filtering syscalls - NetworkPolicies governing traffic paths - RBAC limiting API surface - Capabilities gate privileged operations at the point they're requested (e.g., CAP_NET_BIND_SERVICE before binding to a port below 1024). - Seccomp intercepts syscalls before they execute, using a BPF filter to allow, deny, or trap them. - AppArmor is a Linux Security Module (LSM) that hooks into kernel object access — mediating access to files, sockets, capabilities, and IPC based on a per-process policy. - Seccomp → reduce what the kernel will execute - AppArmor → reduce what processes can access - Capabilities → reduce what processes are privileged to do - Seccomp answers: can this syscall be invoked at all? - AppArmor answers: given this syscall is allowed, what can it operate on? - Capabilities answers: does this process hold the privilege required for this operation? - containerd delegates to the default.profile shipped by the runtime shim. - CRI-O uses a profile derived from the OCI runtime spec with its own modifications. - Docker uses its own docker-default profile (relevant in non-Kubernetes contexts). - You need path-level access control (restrict reads to specific filesystem subtrees) - You're running multi-tenant workloads and need isolation between namespace tenants - You need explicit capability access control beyond what the Pod securityContext expresses - You're building toward a compliance posture that requires MAC - Your nodes are heterogeneous (mixed OS) and profile management would span both engines - Your workloads are internal tooling with low breach impact - The operational cost of profile lifecycle management exceeds your team's capacity - Filters are compiled into a set of instructions evaluated in kernel context on every syscall entry - Execution is intentionally constrained: no loops, bounded instruction count, no memory allocation - This constraint is a feature — it guarantees the filter cannot hang or crash the kernel - Allow open() for read-only, deny write: check the flags argument for O_RDWR or O_WRONLY - Block clone() with namespace-creating flags: the RuntimeDefault profile already does this — it doesn't block clone entirely (threads need it), it blocks the CLONE_NEWUSER / CLONE_NEWNS flag combinations that enable container escapes - Restrict prctl() operations: allow PR_SET_NAME (used by many runtimes), block PR_SET_DUMPABLE and PR_CAP_AMBIENT - Deploy to a canary namespace with the profile in complain mode first (flags=(complain)), even if you generated it from production traffic. - Monitor denials for 24–48 hours across all probe types, init containers, and sidecar interactions. - Promote to enforce mode only after the denial stream is clean. - If enforcement causes an incident, roll back to complain mode — never to Unconfined. Complain mode preserves the security signal while restoring service. - Treat a post-incident Unconfined pod as technical debt with a ticket, not a resolved incident. - A team runs a debug container with privileged: true to diagnose an incident and never removes it. - A legacy workload requires hostPath mounts that weren't caught in policy review. - A node autoscaler provisions a new node type whose image doesn't have the SPO-managed profiles loaded. - An operator chart sets appArmorProfile: type: Unconfined for convenience during development and the override is never removed before promotion. - Profile lifecycle management — profiles must be versioned, reviewed, and retired as applications evolve. A profile that was accurate at authoring time may be wrong after a dependency upgrade or JDK version change. - Runtime compatibility — when containerd or CRI-O ships a new version, default behavior can shift. Profiles that relied on implicit runtime behavior may need updates. - Sidecar and probe coverage — every new sidecar (Envoy, log shipper, OTel collector) added to a namespace needs its own profile or must be explicitly covered. Forgetting this is how Unconfined exceptions accumulate. - Exception management under pressure — during incidents, the fastest resolution is always to disable the control. Without a clear policy on what constitutes a legitimate exception (and a process for revisiting it), profiles erode over time.

Share this article

Twitter Facebook LinkedIn Reddit

🏷️ Tags

toolsutilitiessecurity toolsapparmorseccompkubernetescontentsplatformrce

More from Tools

Tools: Gas-Aware Trading: Execute Only When Gas Is Cheap (2026)

2026-03-30 0

Tools: Grafana k6 Has a Free API That Load Tests Your APIs With JavaScript - Full Analysis

2026-03-30 0

Tools: Caddy Has a Free API That Gives You Automatic HTTPS With Zero Configuration (2026)

2026-03-30 0

Tools: Fly.io Has a Free API That Deploys Docker Apps Globally With Edge Hosting (2026)

2026-03-30 0

Trending

1

CVE-2025-61481: Critical Remote Code Execution Vulnerability in MikroTik RouterOS & SwitchOS

2025-10-27 • 189 views

2

CVE-2025-43939: Dell Unity OS Command Injection (High)

2025-10-30 • 148 views

3

Google disputes false claims of massive Gmail data breach

2025-10-30 • 130 views

4

Microsoft: DNS outage impacts Azure and Microsoft 365 services

2025-10-30 • 88 views

5

3.5B Accounts, 1 Critical Flaw: Meta Closes WhatsApp Data-Harvesting

2025-11-25 • 81 views

InfinitSec - Latest Cybersecurity, Technology & Gaming News