Tools: Incident Report: Service Failure Due to Full Storage - Complete Guide

2026-04-18 0 views admin

Yesterday, my homelab server suddenly became unresponsive. It started with a flurry of Discord notifications, the universal signal that something has gone seriously wrong. I found all services offline. The logs pointed to a primary culprit: a Redis failure, reported as a "Server Out of Memory" error. The core error, though, was about disk space rather than RAM:

RedisClient::CommandError: MISCONF Errors writing to the AOF file: No space left on device

My first thought was: why is AOF even enabled? I had turned it on for testing and forgotten about it. My root partition was at 99% capacity, with just 270MB remaining out of 24GB.

Further investigation revealed where the "wasted" space was hiding: unrotated PM2 logs and hidden build caches. To get the system breathing again, I performed a quick "surgical" cleanup. That gave me enough space to spin up all processes again, but I still needed to recover Redis, since it had entered a read-only mode to protect data integrity. After that, I could restart all services and have everything running. None of the data in Redis was critical, so I didn't care about losing it. The details of each step are listed below.

Lessons learned: the failure was a classic case of neglecting "boring" infrastructure, namely log rotation and disk monitoring. To prevent a repeat performance, I've implemented the measures listed below.
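The 99% figure and the breakdown of where the space went come from ordinary disk inspection. Below is a minimal sketch of that kind of check, assuming a Linux box with the standard df and du tools. The helper name largest_dirs is hypothetical and is not taken from the original setup.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest entries one level below a
# directory, sizes in KB, biggest first.
largest_dirs() {
    dir=$1
    count=${2:-10}
    # -x stays on one filesystem, -k reports kilobytes,
    # -d 1 limits the report to the directory and its direct children.
    # The first line of the result is the total for the directory itself.
    du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Overall usage of the root filesystem.
df -h /

# Example invocation (left commented out because it can be slow on a
# large home directory):
# largest_dirs "$HOME" 10
```

Running largest_dirs against the home directory would be one way to surface oversized hidden directories such as ~/.pm2, ~/.cache or ~/.npm.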

Where the space was hiding:

- PM2 Logs (~3.8GB): The process manager was storing massive, unrotated text logs.
- Hidden Caches (~1.5GB): Accumulated ~/.cache, ~/.npm, and ~/.rvm source files from multiple builds and deployments.

Cleanup:

- PM2 Flush: Immediately cleared the massive log files using pm2 flush.
- Log Truncation: Emptied application logs using truncate -s 0 log/*.log (this empties each file without deleting it, so processes that hold it open keep writing to the same file).
- Cache Pruning: Deleted hidden build caches in ~/.npm and ~/.cache.
- Journal Vacuum: Cleared system logs with journalctl --vacuum-size=500M.

Redis recovery:

- Fixing the AOF Manifest: Because the disk filled during a Redis write, the appendonly.aof.manifest was corrupted. I fixed it using sudo redis-check-aof --fix on the manifest file inside /var/lib/redis/appendonlydir/.
- Clearing the MISCONF Lock: Even with free space, Redis remained in a "protected" state. I manually overrode this with redis-cli config set stop-writes-on-bgsave-error no.
- Service Restart: Reset the systemd failure counter with systemctl reset-failed redis-server and restarted the service.

Prevention:

- Log Management: Installed pm2-logrotate to cap PM2 logs at 10MB per file and limited journald to 500MB globally.
- Next Steps:
  - Expand the VM disk size (24GB is too tight for this stack).
  - Set up a cron job for weekly apt autoremove and cache clearing.
  - Implement an automated disk usage alert (likely via Grafana or a simple shell script to Discord).
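The log truncation step relies on truncate emptying a file in place rather than replacing it. The sketch below shows that behavior on a throwaway file. The demo.log file is a hypothetical stand-in for the real log/*.log files, and the script assumes GNU coreutils (for truncate).

```shell
#!/bin/sh
# Work on a throwaway file so nothing real is touched.
workdir=$(mktemp -d)
logfile="$workdir/demo.log"

printf 'line one\nline two\n' > "$logfile"
inode_before=$(ls -i "$logfile" | awk '{print $1}')

# Empty the file in place, as in the cleanup step.
truncate -s 0 "$logfile"

inode_after=$(ls -i "$logfile" | awk '{print $1}')
size_after=$(wc -c < "$logfile" | tr -d ' ')

# Same inode means the file was never deleted or recreated.
echo "inode before: $inode_before, after: $inode_after, size now: $size_after bytes"

rm -rf "$workdir"
```

Because the inode is unchanged, a process that already has the log open keeps a valid handle. One caveat: a writer that did not open the file in append mode continues at its old offset, so the file can reappear with a large apparent size.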
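For the "simple shell script to Discord" option, here is one possible sketch, not the author's actual implementation. The names THRESHOLD, MOUNT_POINT and DISCORD_WEBHOOK_URL are assumptions made for illustration, and the 85% default is arbitrary. It posts a JSON body with a content field to a Discord webhook, and just prints the alert when no webhook URL is configured.

```shell
#!/bin/sh
# Assumed configuration names; override them through the environment.
THRESHOLD=${THRESHOLD:-85}
MOUNT_POINT=${MOUNT_POINT:-/}

# Print the used percentage of a mount point as a bare integer.
disk_usage_percent() {
    # -P forces one line per filesystem; column 5 is the capacity in percent.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage ($1) is at or above the threshold ($2).
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Post a message to the webhook, or print it when no webhook is set.
# The message must not contain double quotes, since it is embedded in JSON as-is.
notify() {
    if [ -n "${DISCORD_WEBHOOK_URL:-}" ]; then
        curl -fsS -H 'Content-Type: application/json' \
            -d "{\"content\": \"$1\"}" "$DISCORD_WEBHOOK_URL" > /dev/null
    else
        echo "ALERT (no webhook set): $1"
    fi
}

usage=$(disk_usage_percent "$MOUNT_POINT")
if over_threshold "$usage" "$THRESHOLD"; then
    notify "Disk usage on $MOUNT_POINT is at ${usage}% (threshold ${THRESHOLD}%)"
fi
```

Scheduled from cron every few minutes, a script like this would have flagged the partition well before it reached 99%.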
