Building a real-time anomaly detection engine for a self-hosted Nextcloud (HNG Stage 3) - 2025 Update

TL;DR — I built a Python daemon that watches every HTTP request hitting an Nginx reverse proxy, learns what normal traffic looks like in real time, and automatically bans IPs that misbehave by inserting iptables rules at the kernel level. It alerts to Slack and ships its live metrics on a public dashboard. This post explains every piece in beginner-friendly terms.

Live demo: http://cloud-ng-anomaly.duckdns.org
Source code: https://github.com/ibraheembello/hng-stage3-anomaly-detector
What the project does and why it matters

Imagine you run a public website — say, a self-hosted Google-Drive-style cloud storage. Random people visit it all day. Most are real users. Some are bots. A few are attackers trying to flood your server with fake requests until it falls over. That last category is a DDoS attack — Distributed Denial of Service.

The traditional way to deal with this is reactive: when something breaks, an engineer wakes up at 3 a.m., diagnoses the issue, and manually blocks the offender. That's slow and painful. The better way is proactive detection: watch the traffic continuously, learn what "normal" looks like, and automatically block the abnormal stuff before it brings the site down.

That's what this project does — for a Nextcloud cluster running behind Nginx. It builds its own definition of "normal" from the last 30 minutes of real traffic, not from a hardcoded number, so it works whether you get 2 requests per second at 3 a.m. or 200 at peak. And when an attacker sends a burst, the detector reacts within a couple of seconds — it adds a kernel-level firewall rule, pings me on Slack, and shows the ban on a dashboard.
The architecture in one picture

Everything runs in Docker containers, orchestrated by Docker Compose, on one Ubuntu VPS: Nginx terminates incoming HTTP, the Nextcloud cluster sits behind it, and the detector daemon watches every request that passes through the proxy.
Step 1 — How the sliding window works

This is the heart of the detector. Skip to step 2 if data structures bore you, but trust me — this part is clever and simple.

A "rate" is a question: "how many requests has this IP made in the last 60 seconds?" If you naively count requests per minute by tallying them into fixed 60-second buckets, you get the wrong answer whenever a burst straddles the boundary between two buckets — and an attacker can exploit that gap. A sliding window answers the same question, but the 60-second period slides forward continuously with the clock. At 12:00:30, "the last 60 seconds" means 11:59:30 to 12:00:30; at 12:00:31, it means 11:59:31 to 12:00:31. Always exactly the past 60 seconds, never a fixed bucket.

How do you implement that without re-counting every time? Use a deque — a "double-ended queue". That's it. Two operations per request: pop expired entries off the front (O(1) each), append the new one to the back (O(1)). The current rate is just len(window) / 60. The detector keeps one such window per source IP and one global window across all traffic.

⚠️ The Stage 3 brief explicitly forbids "faking the sliding window with a per-minute counter." This deque pattern is the cheapest way to satisfy that requirement honestly.
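Here's a minimal sketch of that pattern in Python. It's my illustration, not the repo's actual code; the class and method names are made up:

```python
import time
from collections import deque

WINDOW_SECONDS = 60  # the post's 60-second window


class SlidingWindow:
    """Counts events in the trailing 60 seconds, sliding with the clock."""

    def __init__(self):
        self._timestamps = deque()  # event times, oldest at the front

    def record(self, now=None):
        now = time.monotonic() if now is None else now
        self._timestamps.append(now)  # O(1) append at the back
        self._evict(now)

    def rate(self, now=None):
        """Current requests per second over the trailing window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._timestamps) / WINDOW_SECONDS

    def _evict(self, now):
        # Pop expired entries off the front; each pop is O(1).
        while self._timestamps and now - self._timestamps[0] > WINDOW_SECONDS:
            self._timestamps.popleft()
```

Keep one instance per source IP (say, in a dict keyed by IP) plus one global instance, and you have the two kinds of windows the detector tracks.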
Step 2 — How the baseline learns from traffic

The detector now knows the current request rate. But it has no idea whether 5 requests per second is a lot or a little for this site. To answer that, it needs a baseline — a model of normal traffic.

The baseline keeps a 30-minute history of per-second request counts. Every second, the count of requests in that second is appended to a rolling list. The list is bounded, so anything older than 30 minutes is automatically forgotten. Every 60 seconds, the baseline thread recomputes statistics over that history, which gives us the two numbers describing the baseline: the mean requests-per-second and its standard deviation.
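A sketch of what that rolling history can look like, under the same assumptions (the names are mine, not the repo's):

```python
import statistics
from collections import deque

HISTORY_SECONDS = 30 * 60  # the post's 30-minute history

# One entry per second: how many requests arrived in that second.
# maxlen makes the deque self-trimming, so counts older than
# 30 minutes are automatically forgotten.
per_second_counts = deque(maxlen=HISTORY_SECONDS)


def recompute_baseline():
    """Meant to run every 60 seconds in the baseline thread."""
    if len(per_second_counts) < 2:
        return None  # not enough data to compute a stddev yet
    mean = statistics.mean(per_second_counts)
    stddev = statistics.stdev(per_second_counts)
    return mean, stddev
```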
Per-hour-of-day slots

Here's a subtle wrinkle: traffic at 3 a.m. is naturally lower than at 3 p.m. If you used one universal baseline, the detector would think 3 p.m. traffic was an attack just because it's higher than the all-day average — or it would miss a real attack at 3 a.m. because it's still below the all-day average.

The fix: keep 24 separate slots, one per hour of day. Once the slot for the current hour has at least 5 minutes of data, use that slot instead of the global rolling baseline. So 3 a.m. traffic is judged against other 3 a.m. traffic, and 3 p.m. against 3 p.m.
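A sketch of the slot selection, assuming the same one-sample-per-second layout as above (the structure and names are my guesses, not the repo's):

```python
from collections import deque
from datetime import datetime

MIN_SLOT_SAMPLES = 5 * 60  # five minutes of one-per-second samples

# 24 bounded histories, one per hour of day.
hourly_counts = [deque(maxlen=30 * 60) for _ in range(24)]


def history_for_now(global_counts):
    """Prefer the current hour's slot once it has >= 5 minutes of data."""
    slot = hourly_counts[datetime.now().hour]
    return slot if len(slot) >= MIN_SLOT_SAMPLES else global_counts
```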
Floors

If traffic is genuinely zero for a while, the mean and stddev both crash toward zero. The very next request would then have an infinite z-score (you can't divide by zero) and trigger a false alarm. Two cheap "floor" values prevent this: the mean is never allowed below 1.0, and the stddev never below 0.5. Boring, but necessary.
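In code, the floors are just two clamps (the constants match the post; the helper name is mine):

```python
MEAN_FLOOR = 1.0    # mean is never allowed below 1.0
STDDEV_FLOOR = 0.5  # stddev is never allowed below 0.5


def apply_floors(mean, stddev):
    # Clamping both values means a dead-quiet stretch can't produce
    # a divide-by-zero (or absurdly large) z-score on the next request.
    return max(mean, MEAN_FLOOR), max(stddev, STDDEV_FLOOR)
```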
Step 3 — How detection makes a decision

For every request that comes in, the detector asks two simultaneous questions:

1. Is this rate too many sigmas above the mean? The "z-score" answers that: z = (rate - mean) / stddev. If z > 3.0, the rate is more than three standard deviations away from normal — which statistically happens in about 0.3% of normal traffic. Almost certainly an anomaly.

2. Is this rate flat-out higher than 5× the mean? This is a hard ceiling that catches steady, sustained floods even when the stddev is wide. If the mean is 2 req/s, anything above 10 req/s is suspicious regardless of statistics.

If either fires, the IP (or the global stream) is anomalous. Whichever fires first wins.

There's one more wrinkle: an error surge. If an IP's 4xx/5xx response rate is at least 3× the baseline error rate, the detector tightens that IP's thresholds (multiplies them by 0.7). The intuition: an IP generating mostly errors is probing or scanning, and we want to ban it sooner.
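Both rules, plus the error-surge tightening, fit in a few lines. A sketch with my own names for the thresholds:

```python
Z_LIMIT = 3.0      # sigmas above the mean
RATIO_LIMIT = 5.0  # hard ceiling: 5x the mean
TIGHTEN = 0.7      # applied when an IP's error rate is >= 3x baseline


def is_anomalous(rate, mean, stddev, error_surge=False):
    """True if either detection rule fires for this rate."""
    z_limit, ratio_limit = Z_LIMIT, RATIO_LIMIT
    if error_surge:
        z_limit *= TIGHTEN      # ban error-heavy IPs sooner
        ratio_limit *= TIGHTEN
    z = (rate - mean) / stddev  # stddev is floored, so never zero
    return z > z_limit or rate > ratio_limit * mean
```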
Step 4 — How iptables is used to block an IP

Once the detector decides an IP is bad, the question is: how do you actually block it?

Linux kernels include a packet-filtering subsystem called netfilter. The user-space tool for talking to it is iptables. When you hand iptables a DROP rule for a source address, the kernel installs a rule that says: "any TCP packet coming in on port 80 from 1.2.3.4 — drop it silently." The packet never even reaches Nginx. It's the lowest-level, fastest way to block traffic.

The blocker module shells out to iptables directly. I scope the rule to TCP ports 80 and 443 only, rather than dropping all traffic from the IP. Why? So that when an admin's workstation accidentally trips the detector, the admin doesn't lose SSH at the same time. (Yes, I learned this the painful way.)
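A sketch of what shelling out looks like (my illustration; the repo's actual module will differ):

```python
import subprocess


def _iptables(action, ip, port):
    # action is "-I" (insert rule at the top) or "-D" (delete a matching rule)
    subprocess.run(
        ["iptables", action, "INPUT", "-p", "tcp", "--dport", port,
         "-s", ip, "-j", "DROP"],
        check=True,
    )


def ban_ip(ip):
    """DROP the IP's traffic on HTTP/HTTPS only, so SSH stays reachable."""
    for port in ("80", "443"):
        _iptables("-I", ip, port)


def unban_ip(ip):
    """Remove the matching rules when the ban expires."""
    for port in ("80", "443"):
        _iptables("-D", ip, port)
```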
The ban isn't permanent — it's tiered:

- The first ban lasts 10 minutes.
- If the same IP misbehaves again later, the second ban lasts 30 minutes.
- The third lasts 2 hours.
- The fourth is permanent.

This back-off pattern keeps you from permanently banning a real user who had one bad minute, but punishes repeat offenders harder each time. A small "unbanner" thread polls every 5 seconds for expired bans, removes the corresponding iptables rule, and pings Slack: "unbanned 1.2.3.4 (ban #1)".
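And a sketch of the tiered schedule plus the unbanner thread, reusing `unban_ip` from the blocker sketch above (the ban store, the Slack helper, and the webhook URL placeholder are all my inventions for illustration — Slack incoming webhooks accept a plain JSON POST like this):

```python
import json
import threading
import time
import urllib.request

BAN_DURATIONS = [600, 1800, 7200, None]  # 10 min, 30 min, 2 h, then permanent

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

active_bans = {}  # ip -> (expiry in epoch seconds, or None if permanent; ban count)


def notify_slack(text):
    """Send a plain-text message to a Slack incoming webhook."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


def unbanner_loop():
    """Poll every 5 seconds for expired bans, mirroring the post's unbanner."""
    while True:
        now = time.time()
        for ip, (expiry, count) in list(active_bans.items()):
            if expiry is not None and now >= expiry:
                unban_ip(ip)  # from the blocker sketch above
                notify_slack(f"unbanned {ip} (ban #{count})")
                del active_bans[ip]
        time.sleep(5)


threading.Thread(target=unbanner_loop, daemon=True).start()
```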
Putting it all together

Here's a real test run, straight from the audit log. I sent a burst of 400 HTTP requests from my home IP. The detector took ~30 seconds to process them, computed a z-score of 3.03, banned my IP for 10 minutes, and notified Slack. Exactly 600 seconds later, the unbanner removed the rule and notified Slack again. Meanwhile, the dashboard was updating every 3 seconds, the baseline was learning from the burst (its stddev climbed to 6.6 and decayed back over the next 30 minutes), and a synthetic multi-IP attack later that hour fired the global detection path with rate 11.28 > mean 2.26 × 5.00. Everything I described above ran without me intervening once.
What I would change next time

A few improvements are on my list, but those are polish. The core mechanism — sliding windows + rolling baseline + z-score + iptables — works.
Try it yourself

The README has a full from-scratch runbook, including how to install Docker on a fresh Ubuntu 24.04 box, how to point a free DuckDNS domain at the EC2 IP, and how to set up the Slack webhook.
If you build something on top of this, ping me — I'd love to see what you change.