Tools: I Built LLM Evaluation-as-Code in CI: Here's How to Avoid Shipping Regressions

API Rate Limiting Playbook: Protect Your Backend From Abuse

The Problem

Why Most Indie Teams Skip Rate Limiting

The Three-Layer Strategy

Layer 1: IP-Based Rate Limiting (Nginx)

Layer 2: User/Token-Based Rate Limiting (Redis + Python)

Layer 3: Endpoint-Specific Thresholds

Real-World Cost Breakdown

Implementation Checklist

Common Mistakes to Avoid

Debugging Rate Limit Issues

Next Steps

The Problem

Your API is live in production. Traffic is growing. Then one day, a bot discovers your endpoint and starts hammering it with 100,000 requests per second. Your database melts. Your users see 500 errors. You lose revenue and reputation.

Or worse: a malicious actor uses your API to brute-force user accounts. You didn't have rate limiting in place. You're liable.

This is the silent killer of indie SaaS. You ship the product. You don't ship the protection. Then production breaks.

Why Most Indie Teams Skip Rate Limiting

Rate limiting sounds complicated. "Distributed rate limiting"? "Token bucket algorithm"? "Redis backing stores"?

In reality, it's simple. And you don't need expensive tools. You don't need AWS API Gateway ($0.35 per million requests). You don't need third-party middleware. You need a methodology. Once you have a methodology, the implementation is trivial.

The Three-Layer Strategy

Layer 1: IP-Based Rate Limiting (Nginx)

First line of defense: block obvious bots and abusers at the edge.

```nginx
# limit_req_zone belongs in the http context (e.g. a conf.d include)
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=auth:10m rate=1r/s;

server {
    # Reject with 429 Too Many Requests instead of the default 503
    limit_req_status 429;

    location /api/ {
        limit_req zone=general burst=20 nodelay;
    }

    # Stricter limit for login attempts
    location /api/auth/login {
        limit_req zone=auth burst=3 nodelay;
    }
}
```

Cost: $0 (Nginx is free). Setup time: 15 minutes. Blocks: 95% of bot traffic and accidental DDoS.

Layer 2: User/Token-Based Rate Limiting (Redis + Python)

Your authenticated users have legitimate spikes. A single IP-based rule punishes them unfairly. Instead, rate limit per API key or user ID:

```python
import time

import redis
from flask import Flask
from flask_login import current_user  # assumes Flask-Login handles authentication

app = Flask(__name__)
r = redis.Redis()

def is_rate_limited(user_id, limit=100, window_seconds=3600):
    # Fixed-window counter: one Redis key per user per window.
    window = int(time.time() // window_seconds)
    key = f"rate_limit:{user_id}:{window}"
    # INCR and EXPIRE go out in one MULTI/EXEC transaction,
    # so a counter is never left behind without a TTL.
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    current, _ = pipe.execute()
    return current > limit

@app.route('/api/resource')
def get_resource():
    if is_rate_limited(current_user.id):
        return {'error': 'Rate limit exceeded'}, 429
    return process_request()  # placeholder for your existing handler logic
```

Cost: Redis Cloud free tier (up to 30MB). Setup time: 30 minutes. Blocks: authenticated abuse, account enumeration, brute-force attacks.

Layer 3: Endpoint-Specific Thresholds

Different endpoints have different abuse vectors:

- Public endpoints (search, info): 100 req/min per IP
- Auth endpoints (login, signup): 5 req/min per IP + distributed rate limit
- Resource creation (write APIs): 10 req/min per user
- Admin endpoints: 1000 req/day per user (tight control)

Document these in your API spec. Expose rate limit headers to clients:

```python
response.headers['X-RateLimit-Limit'] = '100'
response.headers['X-RateLimit-Remaining'] = '87'
# unix_timestamp: Unix time (in seconds) at which the current window resets
response.headers['X-RateLimit-Reset'] = str(unix_timestamp)
```

Real-World Cost Breakdown

Nginx is free and the Redis Cloud free tier covers up to 30MB, so the layers above run at $0. Compare to AWS API Gateway: $0.35 per million requests comes to $3,500/month at 10 billion requests per month.

Implementation Checklist

- [ ] Deploy Nginx rate limiting (zone + limit_req directive)
- [ ] Set up Redis account (free tier)
- [ ] Write rate limit middleware in your framework
- [ ] Define endpoint-specific limits
- [ ] Add rate limit headers to responses
- [ ] Test with Apache Bench or the Vegeta load testing tool
- [ ] Set up alerts (Slack notification when a user hits limits)
- [ ] Document rate limits in your API docs

Time to implement: 2–4 hours. Cost: $0 (for 95% of use cases).

Common Mistakes to Avoid

- Only IP-based limiting: Punishes corporate networks and VPNs, where many users share one address.
- No graduated response: Banning immediately instead of throttling first.
- Storing counts in the database: Too slow. Use Redis or an in-memory cache.
- Not exposing rate limit headers: Clients can't intelligently back off.
- Ignoring health check endpoints: Don't rate limit your own monitoring.

Debugging Rate Limit Issues

When a user reports "API blocked", here's how to troubleshoot:

- Check Redis keys: redis-cli KEYS "rate_limit:*" (on a busy production instance, prefer redis-cli --scan --pattern "rate_limit:*", because KEYS blocks the server while it walks the whole keyspace)
- Inspect their request pattern: high burst vs sustained?
- Whitelist their IP/user if it's a legitimate use case
- Adjust thresholds based on real traffic patterns

Next Steps

This playbook includes:

- Ready-to-deploy Nginx configs for all major frameworks
- Redis setup guide (AWS ElastiCache, DigitalOcean, Heroku)
- Complete Python/Node.js middleware code
- GitHub Actions workflow for load testing
- Real abuse patterns from production SaaS systems
- Cost optimization strategies (cache tiers, fallback limits)
- Comprehensive debugging guide
- Whitelist/bypass strategies for trusted partners

Implementing rate limiting takes 2–4 hours. Ignoring it costs you production incidents and security breaches.
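
One way the Layer 3 thresholds could be turned into something middleware can query is a small lookup table. The sketch below is only an illustration: the limit values come from the Layer 3 list, but the path prefixes, the `Limit` type, and the `limit_for` helper are hypothetical names introduced here, and treating every non-GET request as a "write API" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    max_requests: int
    window_seconds: int
    scope: str  # "ip" or "user": which identity the counter is keyed on


# Limit values copied from the Layer 3 list; the path prefixes are hypothetical.
PREFIX_LIMITS = {
    "/api/auth/": Limit(5, 60, "ip"),
    "/api/admin/": Limit(1000, 86_400, "user"),
}
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
WRITE_LIMIT = Limit(10, 60, "user")   # resource creation (write APIs)
PUBLIC_LIMIT = Limit(100, 60, "ip")   # search, info, and anything unmatched


def limit_for(method: str, path: str) -> Limit:
    """Pick the limit for a request; the longest matching prefix wins."""
    matches = [prefix for prefix in PREFIX_LIMITS if path.startswith(prefix)]
    if matches:
        return PREFIX_LIMITS[max(matches, key=len)]
    if method.upper() in WRITE_METHODS:
        return WRITE_LIMIT
    return PUBLIC_LIMIT
```

A middleware could call `limit_for(request.method, request.path)` and pass the resulting `max_requests` and `window_seconds` to the Layer 2 counter, keyed on the client IP or the user ID depending on `scope`.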
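
The header snippet in Layer 3 uses fixed values; in practice they would be derived from the counter. A minimal sketch, assuming the same fixed-window scheme as the Layer 2 example (the window index is `now // window_seconds`, so the window resets at the start of the next index). The function name `rate_limit_headers` and its parameters are hypothetical.

```python
import time
from typing import Dict, Optional


def rate_limit_headers(
    current: int,
    limit: int,
    window_seconds: int,
    now: Optional[float] = None,
) -> Dict[str, str]:
    """Build X-RateLimit-* headers for a fixed-window counter.

    `current` is the request count in the active window, for example
    the value returned by Redis INCR.
    """
    now = time.time() if now is None else now
    window_index = int(now // window_seconds)
    reset_at = (window_index + 1) * window_seconds  # start of the next window
    remaining = max(0, limit - current)             # never report a negative value
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_at),
    }
```

For this to work with the Layer 2 code, `is_rate_limited` would need to return the counter value rather than only a boolean, so the same number drives both the 429 decision and the headers.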