Building a Real-Time DDoS Detection Engine from Scratch: HNG DevOps Stage 3 (2026)
A Quick Recap
The Task
What the Project Does and Why It Matters
Step 1: Setting Up the Server
Step 2: Setting Up the DuckDNS Subdomain
Step 3: Writing the Code
Step 4: Deploying on the Server
Step 5: The First Problem — psutil Failed to Build
Step 6: The Second Problem — Nextcloud Architecture Mismatch
Step 7: Verifying the Stack
Step 8: The Third Problem — Slack Webhook Returning 404
Step 9: The Fourth Problem — iptables Inside Docker
Step 10: Testing Everything End to End
How the Detection Works (For Beginners)
The Sliding Window
The Rolling Baseline
How Detection Makes a Decision
iptables Blocking
The Live Dashboard
Final Verification
The Big Picture

This is part of my HNG DevOps internship series. Follow along as I document every stage.
Previous articles:
Stage 0: How I Secured a Linux Server from Scratch
Stage 1: Build, Deploy and Reverse Proxy a Rust API
Stage 2: Containerizing a Microservices App with Docker and CI/CD

A Quick Recap
Stage 0 was server hardening. Stage 1 was deploying an API. Stage 2 was containerization and CI/CD. Stage 3 is something different entirely.

The Task
This time the task was to build a security tool from scratch. The scenario: I have been hired as a DevSecOps engineer at a cloud storage company running Nextcloud. After a wave of suspicious traffic, my job is to build a daemon that watches every HTTP request in real time, learns what normal looks like, and automatically blocks attackers when traffic goes abnormal. No Fail2Ban. No rate-limiting libraries. Build it yourself.

The repository is here: https://github.com/GideonBature/hng-stage3

Here is a summary of what needed to be built:

What the Project Does and Why It Matters
Every public web server on the internet gets attacked. Sometimes it is a single IP sending thousands of requests per second trying to overwhelm your server. Sometimes it is a distributed flood from many IPs at once. Both are called Denial of Service attacks, and they can take a real service offline completely.

Traditional tools scan log files periodically, which introduces a delay. What I built here reads the log file line by line in real time as Nginx writes it, makes a decision within one second, and blocks the offending IP at the firewall level before the attack can do serious damage.

The full data flow looks like this:

Step 1: Setting Up the Server
I reused the same Oracle Cloud server from Stages 0, 1, and 2. It runs Ubuntu 24.04 on ARM64 (Ampere A1) with 4 OCPUs and 23GB RAM, well above the 2 vCPU and 2GB minimum required.

Docker was already installed from Stage 2. I confirmed it was working:

I also needed to open port 8080 for the detector dashboard, since the previous stages only had 80 and 443 open:

I also added port 8080 in Oracle Cloud's Security List, the same way I added ports 80 and 443 back in Stage 0.

Step 2: Setting Up the DuckDNS Subdomain
The task required the dashboard to be served at a domain or subdomain. I already had gideonbature.duckdns.org from Stage 1, pointing to my server IP 92.5.80.18. I went to duckdns.org, logged in, and created a second entry:

After saving, I verified the DNS was resolving correctly:

Step 3: Writing the Code
I built the detector locally on my Mac and organised it into seven separate modules, each with a single responsibility:

I also wrote the Nginx configuration and Docker Compose file to wire everything together. Once everything was ready, I pushed it to GitHub:

One issue came up immediately: GitHub's secret scanning blocked the push because my Slack webhook URL was hardcoded in config.yaml. The fix was to use an environment variable placeholder instead:

And resolve it at runtime in main.py:
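Conceptually, that resolution is just a few lines: read config.yaml as text, expand the ${VAR} placeholder from the environment, then parse the YAML. A simplified sketch of the idea (the placeholder name and config keys here are illustrative, not the exact ones from the repo):

```python
# Sketch: expand ${SLACK_WEBHOOK_URL}-style placeholders when loading config.yaml.
# Key names below are illustrative.
import os
import yaml

def load_config(path="config.yaml"):
    with open(path) as f:
        raw = f.read()
    # Replace ${VAR} placeholders with values from the environment
    expanded = os.path.expandvars(raw)
    return yaml.safe_load(expanded)

config = load_config()
webhook_url = config["alerts"]["slack_webhook_url"]
```

With this in place, the real webhook URL is supplied through the environment rather than committed to the repository.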
After removing the hardcoded secret, the push went through.

Step 4: Deploying on the Server
I cloned the repository on the server:

Then created the .env file with my real values:

Then brought the stack up:

Step 5: The First Problem — psutil Failed to Build
The first build attempt failed with:

psutil is a Python library for reading CPU and memory usage. It contains native C code that needs to be compiled. My Dockerfile was using python:3.11-alpine, and Alpine is a minimal Linux image that does not include build tools by default. The fix was to add the required build dependencies to the Dockerfile:

After adding those four packages, the build succeeded.

Step 6: The Second Problem — Nextcloud Architecture Mismatch
After the build succeeded, I ran docker compose up -d and saw this warning:

The task specified using the image kefaslungu/hng-nextcloud, but that image was built only for AMD64. My Oracle Cloud server runs on ARM64. Docker warned about the mismatch, and then Nextcloud started crashing repeatedly with:

This is a binary incompatibility. The AMD64 binary simply cannot execute on an ARM64 processor. Because Nextcloud was crashing, Nginx could not resolve the nextcloud hostname in its config and also crashed:

And because Nginx was down, the detector dashboard was also unreachable, even though the detector itself was running fine. The fix was to replace the image with the official Nextcloud image, which supports multiple architectures including ARM64:

After this change, Nextcloud started correctly, Nginx resolved nextcloud successfully, and the dashboard became accessible.

Step 7: Verifying the Stack
With all three containers running, I verified each piece:

The Nextcloud check returned a 200 OK with Nextcloud headers. The dashboard check returned 200 OK showing the live metrics page. I triggered a test flood to confirm bans were working:

The dashboard showed a ban fired correctly. But I never received a Slack notification.

Step 8: The Third Problem — Slack Webhook Returning 404
Checking the detector logs revealed:

The webhook URL was being rejected by Slack. This happened because I had initially hardcoded the URL in config.yaml, generated a new webhook URL after the GitHub push was blocked, but only updated the .env file. The config.yaml inside the container still had the old, expired URL hardcoded. The fix was to update config.yaml to use the environment variable placeholder:

And rebuild the detector container:

After this, the next ban produced a proper Slack notification immediately.

Step 9: The Fourth Problem — iptables Inside Docker
I ran another flood and saw the ban fire in the logs:

But when I checked the host iptables:

Nothing appeared. The DROP rule was being added inside the Docker container's network namespace, not the host machine's. This meant the attacker's traffic was still reaching the server untouched. The fix was to run the detector with network_mode: host, which makes the container share the host's network stack directly:

When using network_mode: host, the container cannot be on a named network, so I also removed the networks and ports entries from the detector service. This introduced a new problem: Nginx could no longer reach the detector at detector:8080, since the detector was no longer on the Docker bridge network. The fix for the dashboard proxy was to use the Docker bridge gateway IP, which I found with:

I updated nginx.conf to proxy the dashboard to 172.19.0.1:8080 instead of detector:8080:

After restarting Nginx, the dashboard came back up and iptables DROP rules now appeared on the host.

Step 10: Testing Everything End to End
With all fixes applied, I ran a final full test. I opened three terminals:

Terminal 1 (server) — watch iptables live:
Terminal 2 (server) — watch detector logs live:
Terminal 3 (Mac) — run the flood:

Within about 10 seconds of the flood starting, the detector fired:

Terminal 1 immediately showed:

Slack received both the ban notification and the global anomaly alert. Ten minutes later, the unbanner thread fired and Slack received the unban notification.

How the Detection Works (For Beginners)
Now that you have seen the process, let me explain the key concepts clearly.

The Sliding Window
Here is the core concept: to know whether traffic is abnormal, you first need to know the current rate of traffic. A sliding window is how you do that efficiently.

Think of it like a conveyor belt. Each request puts a timestamp on the belt. The belt is exactly 60 seconds long. Old timestamps fall off the left end automatically. At any moment, the number of timestamps on the belt divided by 60 gives you the current requests per second.
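In code, the window is only a few lines. Here is a simplified sketch (the real detector keeps one of these per IP plus a global one, as explained next):

```python
# Sketch of a 60-second sliding window of request timestamps.
import time
from collections import deque

WINDOW_SECONDS = 60

class SlidingWindow:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.timestamps = deque()  # oldest timestamp sits at the left end of the "belt"

    def add(self, ts=None):
        self.timestamps.append(time.time() if ts is None else ts)

    def rate(self, now=None):
        now = time.time() if now is None else now
        # Evict timestamps older than the window: they "fall off the left end"
        while self.timestamps and self.timestamps[0] < now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window  # requests per second

window = SlidingWindow()
window.add()
print(window.rate())
```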
In Python, a collections.deque is the perfect data structure for this because removing items from the left (eviction) is O(1), meaning it takes the same time regardless of how many items are in the queue. Every time a new log entry arrives, its timestamp goes into both the per-IP deque and the global deque. Every second, the evaluator evicts old timestamps and counts what remains. If the count is too high, it fires an alert.

The key thing to understand: this window is exact. It is not an approximation or a counter that resets every minute. It literally counts every request that arrived in the last 60 seconds.

The Rolling Baseline
Knowing the current rate is only half the problem. You also need to know what normal looks like. Is 10 requests per second high? It depends. At 3am with one user, yes. At 2pm with many users, maybe not. This is where the rolling baseline comes in. Instead of hardcoding a threshold like "block anything above 100 req/s", the daemon learns from actual traffic. The baseline works like this:

Step 1: Count requests per second in buckets. Each second gets its own bucket with a count. These buckets cover the last 30 minutes.

Step 2: Every 60 seconds, recalculate the mean and standard deviation. The mean tells you the average rate. The standard deviation tells you how much the rate varies normally.

Step 3: Keep per-hour slots. Traffic at 3am is different from traffic at 3pm. The baseline maintains a separate record for each clock hour. When the current hour has enough data, it is preferred over the global average. This means the detector adapts to time-of-day patterns automatically.

During very quiet periods, the computed mean might be nearly zero. If the mean is 0.001 req/s and one request arrives, the rate is suddenly thousands of times the mean, which would trigger a false alarm. To prevent this, a minimum floor is enforced:

How Detection Makes a Decision
With the current rate and the baseline established, detection comes down to a single calculation, the z-score: the current rate minus the mean, divided by the standard deviation.

The z-score measures how many standard deviations the current rate is above the mean. In a normal distribution, almost everything falls within about three standard deviations of the mean, so a rate far above that is very unlikely to be ordinary traffic.

Two checks run together: is the current rate more than 5x the mean, and is the z-score above the threshold? The 5x multiplier catches sudden bursts even when the stddev is large. The z-score threshold catches sustained elevated rates even when the burst is not huge but is statistically abnormal.

Error surge detection adds another layer. If an IP is sending a lot of 4xx or 5xx errors (typical of scanners probing for vulnerabilities), the thresholds are automatically tightened by 50%, making it easier to ban that IP.
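Putting the window, the baseline, and the thresholds together, the per-second decision can be sketched like this (the threshold values and the error-ratio cut-off are illustrative, not the exact numbers from my config):

```python
# Simplified sketch of the detection decision.
import statistics

Z_THRESHOLD = 3.0        # "statistically abnormal" cut-off
BURST_MULTIPLIER = 5.0   # sudden-burst cut-off relative to the mean
MIN_MEAN = 1.0           # floor so very quiet periods do not cause false alarms

def is_anomalous(current_rate, baseline_rates, error_ratio=0.0):
    mean = max(statistics.mean(baseline_rates), MIN_MEAN)
    stddev = statistics.pstdev(baseline_rates) or 0.1  # avoid division by zero

    z_threshold, burst_multiplier = Z_THRESHOLD, BURST_MULTIPLIER
    if error_ratio > 0.5:            # mostly 4xx/5xx responses: likely a scanner
        z_threshold *= 0.5           # tighten both thresholds by 50%
        burst_multiplier *= 0.5

    z_score = (current_rate - mean) / stddev
    return current_rate > burst_multiplier * mean or z_score > z_threshold

# Baseline hovering around 2 req/s, current rate 40 req/s -> anomalous
print(is_anomalous(40, [2, 3, 2, 1, 2, 3]))
```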
iptables Blocking
When an IP is flagged as anomalous, the blocker adds a DROP rule to the Linux firewall. iptables is the kernel-level packet filter in Linux. Adding a DROP rule means the kernel silently discards all packets from that IP before they even reach Nginx. The attacker gets no response at all.

The -I INPUT flag inserts the rule at position 1, which means it is evaluated before any other rules. This is important because iptables processes rules in order and stops at the first match.

The backoff schedule means repeat offenders get banned for longer:

When the ban expires, the unbanner thread removes the rule:

And sends a Slack notification so the operator knows the IP has been released.

The Live Dashboard
The dashboard is a Flask web application that serves a single HTML page. It auto-refreshes every 3 seconds using an HTML meta refresh tag and shows:

There is also a /api/metrics JSON endpoint so the data can be consumed programmatically:

Final Verification
Everything passed. The daemon was left running for the required 12 continuous hours and responded correctly to the test attack traffic sent by the graders.

The Big Picture
Stage 3 introduced concepts that real security engineers work with every day:

The hardest bugs were not the obvious ones. The architecture mismatch, the iptables namespace issue, and the stale webhook URL were all invisible until the system was running under real conditions. That is what makes security tooling hard and interesting.

Stage 4 is next. Follow along as I keep documenting the journey.

Find me on Dev.to | GitHub