Building a real-time anomaly detection engine for a self-hosted Nextcloud (HNG Stage 3) - 2025 Update

TL;DR — I built a Python daemon that watches every HTTP request hitting an Nginx reverse proxy, learns what normal traffic looks like in real time, and automatically bans IPs that misbehave by inserting iptables rules at the kernel level. It alerts to Slack and ships its live metrics on a public dashboard. This post explains every piece in beginner-friendly terms.

Live demo: http://cloud-ng-anomaly.duckdns.org
Source code: https://github.com/ibraheembello/hng-stage3-anomaly-detector
What the project does and why it matters

Imagine you run a public website — say, a self-hosted Google-Drive-style cloud storage. Random people visit it all day. Most are real users. Some are bots. A few are attackers trying to flood your server with fake requests until it falls over. That last category is a DDoS attack — Distributed Denial of Service.

The traditional way to deal with this is reactive: when something breaks, an engineer wakes up at 3 a.m., diagnoses the issue, and manually blocks the offender. That's slow and painful. The better way is proactive detection: watch the traffic continuously, learn what "normal" looks like, and automatically block the abnormal stuff before it brings the site down.

That's what this project does — for a Nextcloud cluster running behind Nginx. It builds its own definition of "normal" from the last 30 minutes of real traffic, not from a hardcoded number, so it works whether you get 2 requests per second at 3 a.m. or 200 at peak. And when an attacker sends a burst, the detector reacts within a couple of seconds — it adds a kernel-level firewall rule, pings me on Slack, and shows the ban on a dashboard.
The architecture in one picture

Everything runs in Docker containers, orchestrated by Docker Compose, on one Ubuntu VPS: Nginx terminates incoming HTTP, the Nextcloud cluster sits behind it, and the detector daemon watches every request that passes through the proxy.
Step 1 — How the sliding window works

This is the heart of the detector. Skip to step 2 if data structures bore you, but trust me — this part is clever and simple.

A "rate" is a question: "how many requests has this IP made in the last 60 seconds?" If you naively count requests per minute by tallying them into fixed 60-second buckets, you get the wrong answer whenever a burst straddles the boundary between two buckets — and an attacker can exploit that gap. A sliding window answers the same question, but the 60-second period slides forward continuously with the clock. At 12:00:30, "the last 60 seconds" means 11:59:30 to 12:00:30; at 12:00:31, it means 11:59:31 to 12:00:31. Always exactly the past 60 seconds, never a fixed bucket.

How do you implement that without re-counting every time? Use a deque — a "double-ended queue". That's it. Two operations per request: pop expired entries off the front (O(1) each), append the new one to the back (O(1)). The current rate is just len(window) / 60. The detector keeps one such window per source IP and one global window across all traffic.

⚠️ The Stage 3 brief explicitly forbids "faking the sliding window with a per-minute counter." This deque pattern is the cheapest way to satisfy that requirement honestly.
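Here's a minimal sketch of that pattern in Python. It's my illustration, not the repo's actual code; the class and method names are made up:

```python
import time
from collections import deque

WINDOW_SECONDS = 60  # the post's 60-second window


class SlidingWindow:
    """Counts events in the trailing 60 seconds, sliding with the clock."""

    def __init__(self):
        self._timestamps = deque()  # event times, oldest at the front

    def record(self, now=None):
        now = time.monotonic() if now is None else now
        self._timestamps.append(now)  # O(1) append at the back
        self._evict(now)

    def rate(self, now=None):
        """Current requests per second over the trailing window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._timestamps) / WINDOW_SECONDS

    def _evict(self, now):
        # Pop expired entries off the front; each pop is O(1).
        while self._timestamps and now - self._timestamps[0] > WINDOW_SECONDS:
            self._timestamps.popleft()
```

Keep one instance per source IP (say, in a dict keyed by IP) plus one global instance, and you have the two kinds of windows the detector tracks.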
Step 2 — How the baseline learns from traffic

The detector now knows the current request rate. But it has no idea whether 5 requests per second is a lot or a little for this site. To answer that, it needs a baseline — a model of normal traffic.

The baseline keeps a 30-minute history of per-second request counts. Every second, the count of requests in that second is appended to a rolling list. The list is bounded, so anything older than 30 minutes is automatically forgotten. Every 60 seconds, the baseline thread recomputes statistics over that history, which gives us the two numbers describing the baseline: the mean requests-per-second and its standard deviation.
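A sketch of what that rolling history can look like, under the same assumptions (the names are mine, not the repo's):

```python
import statistics
from collections import deque

HISTORY_SECONDS = 30 * 60  # the post's 30-minute history

# One entry per second: how many requests arrived in that second.
# maxlen makes the deque self-trimming, so counts older than
# 30 minutes are automatically forgotten.
per_second_counts = deque(maxlen=HISTORY_SECONDS)


def recompute_baseline():
    """Meant to run every 60 seconds in the baseline thread."""
    if len(per_second_counts) < 2:
        return None  # not enough data to compute a stddev yet
    mean = statistics.mean(per_second_counts)
    stddev = statistics.stdev(per_second_counts)
    return mean, stddev
```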
Per-hour-of-day slots

Here's a subtle wrinkle: traffic at 3 a.m. is naturally lower than at 3 p.m. If you used one universal baseline, the detector would think 3 p.m. traffic was an attack just because it's higher than the all-day average — or it would miss a real attack at 3 a.m. because it's still below the all-day average.

The fix: keep 24 separate slots, one per hour of day. Once the slot for the current hour has at least 5 minutes of data, use that slot instead of the global rolling baseline. So 3 a.m. traffic is judged against other 3 a.m. traffic, and 3 p.m. against 3 p.m.
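A sketch of the slot selection, assuming the same one-sample-per-second layout as above (the structure and names are my guesses, not the repo's):

```python
from collections import deque
from datetime import datetime

MIN_SLOT_SAMPLES = 5 * 60  # five minutes of one-per-second samples

# 24 bounded histories, one per hour of day.
hourly_counts = [deque(maxlen=30 * 60) for _ in range(24)]


def history_for_now(global_counts):
    """Prefer the current hour's slot once it has >= 5 minutes of data."""
    slot = hourly_counts[datetime.now().hour]
    return slot if len(slot) >= MIN_SLOT_SAMPLES else global_counts
```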
Floors

If traffic is genuinely zero for a while, the mean and stddev both crash toward zero. The very next request would then have an infinite z-score (you can't divide by zero) and trigger a false alarm. Two cheap "floor" values prevent this: the mean is never allowed below 1.0, and the stddev never below 0.5. Boring, but necessary.
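In code, the floors are just two clamps (the constants match the post; the helper name is mine):

```python
MEAN_FLOOR = 1.0    # mean is never allowed below 1.0
STDDEV_FLOOR = 0.5  # stddev is never allowed below 0.5


def apply_floors(mean, stddev):
    # Clamping both values means a dead-quiet stretch can't produce
    # a divide-by-zero (or absurdly large) z-score on the next request.
    return max(mean, MEAN_FLOOR), max(stddev, STDDEV_FLOOR)
```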
Step 3 — How detection makes a decision

For every request that comes in, the detector asks two simultaneous questions:

1. Is this rate too many sigmas above the mean? The "z-score" answers that: z = (rate - mean) / stddev. If z > 3.0, the rate is more than three standard deviations away from normal — which statistically happens in about 0.3% of normal traffic. Almost certainly an anomaly.

2. Is this rate flat-out higher than 5× the mean? This is a hard ceiling that catches steady, sustained floods even when the stddev is wide. If the mean is 2 req/s, anything above 10 req/s is suspicious regardless of statistics.

If either fires, the IP (or the global stream) is anomalous. Whichever fires first wins.

There's one more wrinkle: an error surge. If an IP's 4xx/5xx response rate is at least 3× the baseline error rate, the detector tightens that IP's thresholds (multiplies them by 0.7). The intuition: an IP generating mostly errors is probing or scanning, and we want to ban it sooner.
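Both rules, plus the error-surge tightening, fit in a few lines. A sketch with my own names for the thresholds:

```python
Z_LIMIT = 3.0      # sigmas above the mean
RATIO_LIMIT = 5.0  # hard ceiling: 5x the mean
TIGHTEN = 0.7      # applied when an IP's error rate is >= 3x baseline


def is_anomalous(rate, mean, stddev, error_surge=False):
    """True if either detection rule fires for this rate."""
    z_limit, ratio_limit = Z_LIMIT, RATIO_LIMIT
    if error_surge:
        z_limit *= TIGHTEN      # ban error-heavy IPs sooner
        ratio_limit *= TIGHTEN
    z = (rate - mean) / stddev  # stddev is floored, so never zero
    return z > z_limit or rate > ratio_limit * mean
```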
Step 4 — How iptables is used to block an IP

Once the detector decides an IP is bad, the question is: how do you actually block it?

Linux kernels include a packet-filtering subsystem called netfilter. The user-space tool for talking to it is iptables. When you hand iptables a DROP rule for a source address, the kernel installs a rule that says: "any TCP packet coming in on port 80 from 1.2.3.4 — drop it silently." The packet never even reaches Nginx. It's the lowest-level, fastest way to block traffic.

The blocker module shells out to iptables directly. I scope the rule to TCP ports 80 and 443 only, rather than dropping all traffic from the IP. Why? So that when an admin's workstation accidentally trips the detector, the admin doesn't lose SSH at the same time. (Yes, I learned this the painful way.)
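A sketch of what shelling out looks like (my illustration; the repo's actual module will differ):

```python
import subprocess


def _iptables(action, ip, port):
    # action is "-I" (insert rule at the top) or "-D" (delete a matching rule)
    subprocess.run(
        ["iptables", action, "INPUT", "-p", "tcp", "--dport", port,
         "-s", ip, "-j", "DROP"],
        check=True,
    )


def ban_ip(ip):
    """DROP the IP's traffic on HTTP/HTTPS only, so SSH stays reachable."""
    for port in ("80", "443"):
        _iptables("-I", ip, port)


def unban_ip(ip):
    """Remove the matching rules when the ban expires."""
    for port in ("80", "443"):
        _iptables("-D", ip, port)
```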
The ban isn't permanent — it's tiered:

- The first ban lasts 10 minutes.
- If the same IP misbehaves again later, the second ban lasts 30 minutes.
- The third lasts 2 hours.
- The fourth is permanent.

This back-off pattern keeps you from permanently banning a real user who had one bad minute, but punishes repeat offenders harder each time. A small "unbanner" thread polls every 5 seconds for expired bans, removes the corresponding iptables rule, and pings Slack: "unbanned 1.2.3.4 (ban #1)".
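And a sketch of the tiered schedule plus the unbanner thread, reusing `unban_ip` from the blocker sketch above (the ban store, the Slack helper, and the webhook URL placeholder are all my inventions for illustration — Slack incoming webhooks accept a plain JSON POST like this):

```python
import json
import threading
import time
import urllib.request

BAN_DURATIONS = [600, 1800, 7200, None]  # 10 min, 30 min, 2 h, then permanent

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

active_bans = {}  # ip -> (expiry in epoch seconds, or None if permanent; ban count)


def notify_slack(text):
    """Send a plain-text message to a Slack incoming webhook."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


def unbanner_loop():
    """Poll every 5 seconds for expired bans, mirroring the post's unbanner."""
    while True:
        now = time.time()
        for ip, (expiry, count) in list(active_bans.items()):
            if expiry is not None and now >= expiry:
                unban_ip(ip)  # from the blocker sketch above
                notify_slack(f"unbanned {ip} (ban #{count})")
                del active_bans[ip]
        time.sleep(5)


threading.Thread(target=unbanner_loop, daemon=True).start()
```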
Putting it all together

Here's a real test run, straight from the audit log. I sent a burst of 400 HTTP requests from my home IP. The detector took ~30 seconds to process them, computed a z-score of 3.03, banned my IP for 10 minutes, and notified Slack. Exactly 600 seconds later, the unbanner removed the rule and notified Slack again. Meanwhile, the dashboard was updating every 3 seconds, the baseline was learning from the burst (its stddev climbed to 6.6 and decayed back over the next 30 minutes), and a synthetic multi-IP attack later that hour fired the global detection path with rate 11.28 > mean 2.26 × 5.00. Everything I described above ran without me intervening once.
What I would change next time

A few improvements are on my list, but those are polish. The core mechanism — sliding windows + rolling baseline + z-score + iptables — works.
Try it yourself

The README has a full from-scratch runbook, including how to install Docker on a fresh Ubuntu 24.04 box, how to point a free DuckDNS domain at the EC2 IP, and how to set up the Slack webhook.
If you build something on top of this, ping me — I'd love to see what you change.