
Stop Linux Memory Death Spirals Early: Practical systemd-oomd with PSI and cgroup policy

Contents

- Why this is a different angle
- What the docs say
- First, confirm the host is compatible
- Install and enable systemd-oomd
- Make sure memory accounting is on
- Start with slice-level policy, not one-off service hacks
- Add swap-aware protection if appropriate
- Mark critical services as less likely kill candidates
- Inspect what systemd-oomd is watching
- A careful test plan
- A practical policy pattern
- What not to do
- References

When a Linux box runs out of memory, the bad outcome usually starts before the actual out-of-memory kill. SSH gets sticky. Web requests slow down. Latency spikes. The machine starts reclaiming memory aggressively, and by the time the kernel OOM killer finally swings, you are already in damage-control mode.

systemd-oomd is built to intervene earlier. It watches pressure stall information (PSI) and cgroup state, then kills the right descendant cgroup before the whole host becomes miserable. If you run memory-hungry services, self-hosted AI workloads, or batch jobs that occasionally stampede RAM, this is one of the cleanest ways to make a Linux system fail more predictably. This article covers:

- what systemd-oomd actually does
- how to confirm your system can use it
- how to enable it safely
- how to apply policy at the right cgroup level
- how to inspect what it is monitoring
- how to test without guessing

Why this is a different angle

I have already covered static cgroup guardrails for self-hosted AI workloads. This article is intentionally different. That approach is about hard ceilings such as MemoryMax= and CPUQuota=. This one is about proactive, pressure-based action. Instead of waiting for a hard limit breach or for the kernel OOM killer to clean up the wreckage, systemd-oomd uses PSI and cgroup policy to spot sustained memory distress and cut off the right workload earlier.

What the docs say

According to systemd-oomd.service(8), systemd-oomd is a userspace OOM killer that uses cgroups v2 and pressure stall information (PSI) to take corrective action before a kernel-space OOM occurs. The same documentation also notes a few important prerequisites:

- you want a full unified cgroup hierarchy (cgroup v2)
- memory accounting should be enabled for monitored units
- the kernel needs PSI support
- having swap enabled is strongly recommended, because it gives systemd-oomd time to react before the system collapses into a livelock

From oomd.conf(5), the global defaults are documented as:

- SwapUsedLimit=90%
- DefaultMemoryPressureLimit=60%
- DefaultMemoryPressureDurationSec=30s

Those are not magic numbers. They are just sane defaults. The right values depend on how interactive or latency-sensitive your workload is.

First, confirm the host is compatible

Check whether you are on cgroup v2; on a pure cgroup v2 host this prints cgroup2fs:

```bash
stat -fc %T /sys/fs/cgroup
```

Check whether PSI files exist:

```bash
ls /proc/pressure
```

You should see entries like:

```
cpu  io  memory
```

Peek at current system-wide memory pressure:

```bash
cat /proc/pressure/memory
```

```
some avg10=0.00 avg60=0.12 avg300=0.08 total=1234567
full avg10=0.00 avg60=0.05 avg300=0.02 total=345678
```

From the kernel PSI documentation:

- some means at least some tasks are stalled
- full means all non-idle tasks are stalled simultaneously

That second case is where a system starts feeling truly awful.
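As an aside, the oomd.conf(5) defaults quoted earlier can be tuned globally with a drop-in. The sketch below is illustrative only — the file name and values are examples, not recommendations, and the right numbers depend on your workload:

```ini
# Illustrative drop-in: /etc/systemd/oomd.conf.d/50-tuning.conf
# (file name and values are examples, not recommendations; see oomd.conf(5))
[OOM]
SwapUsedLimit=85%
DefaultMemoryPressureLimit=50%
DefaultMemoryPressureDurationSec=20s
```

After editing, restart the daemon (sudo systemctl restart systemd-oomd.service) so the new values take effect.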
Install and enable systemd-oomd

Packaging varies by distro. On some systems, systemd-oomd ships as part of the main systemd package. On others, it is split out. So start with discovery instead of guessing:

```bash
systemctl list-unit-files 'systemd-oomd*'
```

If the service is not present, check your package manager:

```bash
apt-cache policy systemd-oomd
```

On Debian-family systems that package it separately, install it with:

```bash
sudo apt install systemd-oomd
```

Then enable and start it:

```bash
sudo systemctl enable --now systemd-oomd.service
```

Confirm it is active:

```bash
systemctl status systemd-oomd.service --no-pager
```

Make sure memory accounting is on

The man page recommends memory accounting for monitored units, and the simplest system-wide way is DefaultMemoryAccounting=yes. Check the effective setting:

```bash
systemctl show --property=DefaultMemoryAccounting
```

If needed, add a systemd manager drop-in:

```bash
sudo mkdir -p /etc/systemd/system.conf.d
sudo tee /etc/systemd/system.conf.d/60-memory-accounting.conf >/dev/null <<'EOF'
[Manager]
DefaultMemoryAccounting=yes
EOF
```

Reload the manager configuration:

```bash
sudo systemctl daemon-reexec
```

Then check the setting again:

```bash
systemctl show --property=DefaultMemoryAccounting
```

Start with slice-level policy, not one-off service hacks

This is the part that matters most. systemd-oomd does not simply kill the unit where you set policy. Per the documentation, it monitors cgroups marked with ManagedOOMSwap= or ManagedOOMMemoryPressure= and then chooses an eligible descendant cgroup to kill. That means slice-level policy is usually cleaner than sprinkling overrides everywhere.

A good first target for server workloads is system.slice:

```bash
sudo systemctl edit system.slice
```

```ini
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
ManagedOOMMemoryPressureDurationSec=20s
```

Or write it directly:

```bash
sudo mkdir -p /etc/systemd/system/system.slice.d
sudo tee /etc/systemd/system/system.slice.d/60-oomd.conf >/dev/null <<'EOF'
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
ManagedOOMMemoryPressureDurationSec=20s
EOF
sudo systemctl daemon-reload
```

Why system.slice? Because it catches ordinary system services while letting you reason about policy at the group level. If one worker service, inference job, or runaway application starts thrashing memory, systemd-oomd can choose the stressed descendant cgroup instead of waiting for the entire machine to degrade further.

Add swap-aware protection if appropriate

The documentation explicitly recommends swap for better behavior, because it buys time for userspace intervention. If the host has swap and you want swap-based protection too, you can add:

```ini
[Slice]
ManagedOOMSwap=kill
```

For a combined drop-in:

```ini
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
ManagedOOMMemoryPressureDurationSec=20s
ManagedOOMSwap=kill
```

I would not enable aggressive policy everywhere on day one. Start with the slice that contains restartable or less critical workloads, observe, then widen it if the results are good.
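Since the victim is chosen among descendants, it helps to see what per-cgroup pressure actually looks like. A minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup; the helper name is mine, not a systemd interface:

```shell
#!/bin/sh
# List each child cgroup under a parent along with its "full avg10"
# memory pressure, the kind of per-cgroup signal systemd-oomd weighs
# when picking a kill candidate.
show_child_pressure() {
  parent="$1"   # e.g. /sys/fs/cgroup/system.slice
  for child in "$parent"/*/; do
    [ -f "${child}memory.pressure" ] || continue
    printf '%s %s\n' "$(basename "$child")" \
      "$(awk '$1 == "full" { sub("avg10=", "", $2); print $2 }' "${child}memory.pressure")"
  done
}

# On a real host:
# show_child_pressure /sys/fs/cgroup/system.slice
```

A persistently non-zero full avg10 for one service while its siblings sit at 0.00 is exactly the pattern that makes it the likely target.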
Mark critical services as less likely kill candidates

You may have services that should be sacrificed last, not first. systemd.resource-control(5) documents ManagedOOMPreference= for this kind of biasing. If a service is important to keep alive, add a drop-in like this:

```bash
sudo systemctl edit nginx.service
```

```ini
[Service]
ManagedOOMPreference=omit
```

For a lower-priority worker, you can lean the other direction:

```bash
sudo systemctl edit ollama.service
```

```ini
[Service]
ManagedOOMPreference=avoid
```

Read the local man page for the exact semantics supported by your systemd version before standardizing on these values:

```bash
man systemd.resource-control
```

That version check matters because systemd features do move over time.

Inspect what systemd-oomd is watching

oomctl exists for exactly this reason. Show the current state known to systemd-oomd:

```bash
oomctl dump
```

You can also inspect the slice and service properties directly:

```bash
systemctl show system.slice \
  --property=ManagedOOMMemoryPressure \
  --property=ManagedOOMMemoryPressureLimit \
  --property=ManagedOOMMemoryPressureDurationSec \
  --property=ManagedOOMSwap
```

And for a specific service:

```bash
systemctl show ollama.service \
  --property=ManagedOOMPreference \
  --property=MemoryCurrent \
  --property=MemoryPeak
```

Watch the logs while testing:

```bash
journalctl -u systemd-oomd -f
```

A careful test plan

Do not test this blindly on a production host during business hours. A safer sequence:

- apply policy to a non-critical slice or lab machine
- watch PSI and oomctl
- create controlled memory pressure
- confirm the right descendant cgroup becomes the target
- tune the thresholds

You can observe PSI live with:

```bash
watch -n 1 'cat /proc/pressure/memory'
```

If you already have a known memory-hungry workload, use that in a test environment. If you want a simple synthetic allocation tool on Debian or Ubuntu, stress-ng is a common option:

```bash
sudo apt install stress-ng
```

Then, in another terminal:

```bash
systemd-run --unit=oomd-test --slice=system.slice \
  stress-ng --vm 1 --vm-bytes 85% --vm-keep --timeout 2m
```

While it runs, follow the oomd logs:

```bash
journalctl -u systemd-oomd -f
```

The goal is not "make something die." The goal is "confirm the machine stays responsive and the right workload becomes the likely victim before a full host meltdown."

A practical policy pattern

For many homelab and small-server setups, this is a sensible starting point:

- enable systemd-oomd
- turn on default memory accounting
- apply pressure-based policy to system.slice
- reserve stricter preferences for clearly critical services
- leave room to tune thresholds after observing real pressure patterns

Example starting drop-in for system.slice:

```ini
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
ManagedOOMMemoryPressureDurationSec=20s
ManagedOOMSwap=kill
```

Then protect critical infra individually, for example:

```ini
[Service]
ManagedOOMPreference=omit
```

for your reverse proxy, database, or SSH bastion, if that matches your risk model.

What not to do

A few things I would avoid:

- Do not treat systemd-oomd as a substitute for capacity planning.
- Do not skip swap and expect equally graceful behavior.
- Do not set one ultra-aggressive threshold globally without testing.
- Do not forget that cgroup structure matters. If everything lives in one giant bucket, targeting gets worse.
- Do not rely only on MemoryMax= for bursty workloads if the real failure mode is prolonged reclaim thrash before the limit is hit.

Closing thought

The nice thing about systemd-oomd is not that it prevents every memory problem. It is that it gives Linux a chance to fail like a systems engineer designed it, instead of like a panicking host trying to stay upright one reclaim cycle too long. That is a much better bargain.


References

- systemd-oomd.service(8): https://www.man7.org/linux/man-pages/man8/systemd-oomd.8.html
- oomd.conf(5): https://www.man7.org/linux/man-pages/man5/oomd.conf.5.html
- systemd.resource-control(5): https://man7.org/linux/man-pages/man5/systemd.resource-control.5.html
- Linux kernel PSI documentation: https://docs.kernel.org/accounting/psi.html
- oomctl(1): https://www.freedesktop.org/software/systemd/man/latest/oomctl.html