Why Your Self-Hosted App Keeps Dying at 3 AM (And How to Fix It)

Contents:

- The Root Cause: You Deployed an App, Not a System
- Step 1: Set Resource Limits
- Step 2: Fix the Silent Disk Killer
- Step 3: Add Health Checks That Actually Work
- Step 4: Reverse Proxy Configuration That Doesn't Suck
- Step 5: Backups (The Thing You'll Wish You Had)
- Prevention: The Checklist
- The Bigger Picture

So you spun up a VPS, deployed your app, told everyone it was live — and then woke up to angry Slack messages because the whole thing went down at 3 AM. Welcome to the club.

Self-hosting production applications is one of those things that sounds straightforward until you actually do it. I've been running self-hosted services for about six years now, and the gap between "it works on my server" and "it works reliably in production" is where most of the pain lives. There's a massive free guide floating around (750+ pages) covering this exact territory, which reminded me that a lot of developers keep hitting the same walls. Let me walk through the most common reasons self-hosted apps fail in production and how to actually fix them.

The Root Cause: You Deployed an App, Not a System

Here's the core issue. When you run docker compose up -d and walk away, you've deployed an application. But production needs a system: monitoring, automatic restarts, log rotation, backups, resource limits, and a reverse proxy configuration that doesn't fall over. Most 3 AM crashes come down to one of three things:

- Memory exhaustion — your app (or its database) slowly ate all available RAM
- Disk full — logs or temp files filled the drive
- No automatic recovery — the process crashed and nothing restarted it

Step 1: Set Resource Limits

If you're using Docker Compose, you need memory limits. Without them, a single misbehaving container can take down everything on the host. And that restart: unless-stopped line is doing heavy lifting: it tells Docker to automatically restart crashed containers unless you explicitly stopped them. I'm genuinely surprised by how many production setups I've seen without it. (The compose snippet at the end of this post shows both.)

Step 2: Fix the Silent Disk Killer

Docker logs will eat your disk alive if you don't configure rotation. By default, Docker just appends JSON logs forever. I learned this the hard way when a 40 GB log file took down a production Postgres instance. The fix is a log-opts limit in /etc/docker/daemon.json (snippet at the end of this post). Restart Docker after changing it, and note that existing containers need to be recreated, not just restarted, to pick up the new logging config.

While you're at it, set up a basic disk monitoring script, schedule it every 15 minutes with cron, and you'll never be surprised by a full disk again.
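The alert script's core is a one-line pipeline that pulls the use% column out of df. If you want to sanity-check that parsing without filling a real disk, you can feed it canned df output (the sample below is fabricated):

```shell
#!/bin/sh
# Same parsing as the monitoring script: last line, fifth column, strip the %.
usage_pct() {
  tail -1 | awk '{print $5}' | sed 's/%//'
}

# canned two-line output, shaped like what `df /` prints
SAMPLE="Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/vda1       41152736  34979825   4056911  90% /"

USAGE=$(printf '%s\n' "$SAMPLE" | usage_pct)
echo "$USAGE"    # prints 90

# the threshold comparison from the real script
if [ "$USAGE" -gt 85 ]; then
  echo "ALERT"
fi
```

Swap the canned sample for `df /` and the echo for your webhook call and you have the real thing.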
Step 3: Add Health Checks That Actually Work

Docker health checks let you detect when your app is technically running but not actually working — like when your Node.js server is up but stuck in an event loop block. But here's the thing most guides skip: your /health endpoint needs to actually check its dependencies. Don't just return 200.

Step 4: Reverse Proxy Configuration That Doesn't Suck

If you're exposing services to the internet, you need a reverse proxy. Caddy has become my go-to because it handles TLS certificates automatically and the config is minimal. Caddy gets its certificates through Let's Encrypt: no certbot cron jobs, no renewal scripts. It just works.

Step 5: Backups (The Thing You'll Wish You Had)

I know. Backups are boring. But future-you will be incredibly grateful. A minimal but functional approach for Postgres is a nightly pg_dump from the running container (script at the end of this post): run it daily via cron, and uncomment the rclone line once you've set up remote storage. Local-only backups on the same server are better than nothing, but not by much.

Prevention: The Checklist

Before you consider any self-hosted deployment "production ready," run through this:

- Resource limits set for every container
- Restart policies configured (unless-stopped at minimum)
- Log rotation enabled at the Docker daemon level
- Health checks that verify actual functionality, not just process liveness
- TLS termination via a reverse proxy with automatic cert renewal
- Automated backups with at least one off-server copy
- Disk and memory monitoring with alerts
- Firewall rules — only expose ports 80, 443, and your SSH port
- Unattended security updates enabled on the host OS

The Bigger Picture

You don't need Kubernetes for this. You don't need a managed platform. A single well-configured VPS with Docker Compose can reliably host a surprising amount of production traffic. The key word is well-configured.

Self-hosting is making a comeback for good reasons: cost control, data sovereignty, and honestly just the satisfaction of running your own infrastructure. But the gap between tutorials and production-grade setups is real, and it's where most people get burned. The pattern is almost always the same: the app itself is fine, but the operational wrapper around it is missing. Add restart policies, resource limits, health checks, log management, and backups, and you've eliminated probably 90% of the 3 AM pages.

Now go set up those log rotation limits before your disk fills up. Ask me how I know this is urgent.
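One last sanity check before you trust the backup script: its retention line deletes by mtime, and find's +N semantics trip people up (-mtime +7 matches files older than eight full days, not seven). You can dry-run the rule against throwaway files; the temp directory and filenames here are fabricated, and the relative-date touch assumes GNU coreutils:

```shell
#!/bin/sh
# Dry-run the retention rule from the backup script on throwaway files.
RETENTION_DAYS=7
TMP=$(mktemp -d)

touch "$TMP/app_fresh.dump"                   # made just now, should survive
touch -d "10 days ago" "$TMP/app_stale.dump"  # past retention, should be deleted (GNU touch)

# the exact expression the backup script uses
find "$TMP" -name "*.dump" -mtime +$RETENTION_DAYS -delete

REMAINING=$(ls "$TMP")
echo "$REMAINING"    # app_fresh.dump
rm -rf "$TMP"
```

Thirty seconds of this beats discovering the sign of that +7 the hard way.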

All the configs from the steps above, in one place.

Step 1: resource limits (docker-compose.yml)

```yaml
# docker-compose.yml
services:
  app:
    image: your-app:latest
    deploy:
      resources:
        limits:
          memory: 512M   # hard ceiling — container gets killed past this
          cpus: '1.0'
        reservations:
          memory: 256M   # guaranteed minimum
    restart: unless-stopped  # this alone prevents most 3 AM incidents

  postgres:
    image: postgres:16
    deploy:
      resources:
        limits:
          memory: 1G
    # tune shared_buffers to ~25% of the memory limit
    # (the official image has no POSTGRES_SHARED_BUFFERS env var; pass it as a flag)
    command: postgres -c shared_buffers=256MB
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  pgdata:
```

Step 2: log rotation (/etc/docker/daemon.json — JSON, so no comments allowed in the file)

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

Step 2: disk monitoring (/usr/local/bin/disk-check.sh)

```bash
#!/bin/bash
# /usr/local/bin/disk-check.sh
# Alert when disk usage crosses 85%
THRESHOLD=85
USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  # swap this for your preferred notification method
  curl -X POST "https://your-webhook-url" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"Disk usage at ${USAGE}% on $(hostname)\"}"
fi
```

Step 3: container health check (docker-compose.yml)

```yaml
# docker-compose.yml
services:
  app:
    image: your-app:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s  # grace period for startup
    restart: unless-stopped
```

Step 3: a /health endpoint that checks dependencies (Express)

```javascript
// Express health check that actually means something
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');  // check database connection
    await redis.ping();          // check redis if you use it
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    // returning 503 makes the healthcheck fail, so Docker marks the container unhealthy
    res.status(503).json({ status: 'degraded', error: err.message });
  }
});
```

Step 4: Caddyfile

```
yourapp.example.com {
    reverse_proxy app:3000 {
        # active health checks — stop sending traffic to dead upstreams
        health_uri /health
        health_interval 30s
    }

    # basic rate limiting to prevent abuse
    # (requires the caddy-ratelimit plugin; not in stock Caddy)
    rate_limit {
        zone dynamic {
            key {remote_host}
            events 100
            window 1m
        }
    }

    encode gzip

    log {
        output file /var/log/caddy/access.log {
            roll_size 50mb
            roll_keep 5
        }
    }
}
```

Step 5: Postgres backups (/usr/local/bin/backup-db.sh)

```bash
#!/bin/bash
# /usr/local/bin/backup-db.sh
BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7

mkdir -p "$BACKUP_DIR"

# dump the database from the running container
docker exec postgres pg_dump -U appuser -Fc appdb > "${BACKUP_DIR}/app_${TIMESTAMP}.dump"

# clean up old backups
find "$BACKUP_DIR" -name "*.dump" -mtime +$RETENTION_DAYS -delete

# optional: sync to remote storage
# rclone copy "$BACKUP_DIR" remote:backups/postgres --max-age 24h
```
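The last two checklist items are one-time, host-level setup rather than part of the compose stack. On Debian or Ubuntu, a minimal sketch looks like this; the port numbers are assumptions (adjust if you moved SSH), and everything needs root:

```shell
# Firewall: default-deny inbound, allow only SSH and HTTP(S)
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp      # your SSH port — change this if you run SSH elsewhere
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable    # --force skips the interactive confirmation

# Unattended security updates
apt-get install -y unattended-upgrades
# /etc/apt/apt.conf.d/20auto-upgrades should end up containing:
#   APT::Periodic::Update-Package-Lists "1";
#   APT::Periodic::Unattended-Upgrade "1";
dpkg-reconfigure -plow unattended-upgrades
```

Do the firewall step from the console or with an existing SSH session open, so a typo in the SSH rule can't lock you out.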