# 429: TOO MANY REQUESTS
2026-03-05
admin
## The Case for Rate Limiting

As systems scale to serve millions of concurrent users, the need for controlled and predictable API access has never been more critical. Rate limiting is a traffic-management technique that restricts the number of requests a client can make to a server within a defined time period. It serves as a foundational safeguard for system reliability, fairness, and security in production environments.

An API without rate limiting is vulnerable. The consequences of unbounded request traffic range from degraded microservices to complete service outages. The primary motivations for implementing rate limiting include:

- Protection against denial-of-service (DoS) attacks: malicious actors can flood endpoints with requests, exhausting server resources and crashing the system.
- Prevention of resource monopolization: a single client or user can degrade the experience for all other users.
- Cost management: uncontrolled traffic directly affects the cost of the infrastructure the application is deployed on.
- Enforcement of API usage plans: rate limiting enables business models built around tiered service plans (e.g., free vs. premium), where higher tiers allow more requests in a given time period.

## Rate Limiting Algorithms

Several well-established algorithms exist for implementing rate limiting, each with distinct trade-offs in accuracy, memory usage, and burst tolerance.

## Token Bucket

Imagine a bucket of fixed size into which water flows at a constant rate, and for each request you take one glass of water out of it. If there is enough water in the bucket to fill the glass, you accept the request; otherwise you rate-limit it. Now replace the water with tokens: tokens are added at a constant rate, and each request consumes one token. If the bucket is empty, the request is rate limited; if the bucket is full, newly added tokens overflow and are dropped.

To implement this algorithm we require two parameters:

- capacity: the size of the bucket, i.e. the maximum number of tokens it can hold at a time.
- refillRate: the rate at which tokens are added back to the bucket.

It is usually better to maintain a separate bucket per API endpoint. Key users of this algorithm include Amazon AWS (API Gateway), Stripe, OpenAI, Cloudflare, and Netflix (Zuul/Envoy).

## Leaky Bucket

Now take the same bucket, but instead of removing a glass of water per request, make a hole at the bottom through which water leaks out at a constant rate (i.e., requests are processed at a constant rate). If requests leak out of the bucket, what goes into it: tokens or requests? In this case, requests.

Whenever a new request arrives, we add a glass of water to the bucket, and the water leaks out of the hole at a constant rate. What if the bucket is full? The water overflows and the request is dropped.

To implement this algorithm we require two parameters:

- capacity: the size of the bucket, i.e. the maximum number of requests it can hold at a time.
- outFlowRate: the rate at which requests leak out of the bucket and are processed.

Shopify, Uber, and NGINX are known to use the leaky bucket algorithm in their APIs.

## Fixed Window Counter

The Fixed Window Counter algorithm divides time into fixed, non-overlapping intervals called windows and tracks how many requests arrive within each one. Every window starts with a counter at zero. Each incoming request increments it, and if the counter exceeds the configured limit, the request is rejected until the next window begins and the counter resets.

To implement this algorithm we require two parameters:

- counter: the configured limit on the number of requests per window.
- timeWindow: the duration of each window.

This algorithm is resource-efficient and easy to understand. Its one weakness is that a burst of traffic at the edges of a window can admit more requests than the configured quota: a client can send the full limit at the end of one window and again at the start of the next.

## Sliding Window

As discussed, the fixed window algorithm has a major issue: it admits extra requests at window boundaries. Sliding window algorithms overcome this hurdle. There are actually two distinct algorithms under the "sliding window" umbrella:

- The Sliding Window Log stores a timestamp for every request and counts how many fall within the rolling window. It is accurate but memory-inefficient at scale.
- The Sliding Window Counter, covered in depth below, is the production-grade choice for most APIs.

The Sliding Window Counter is a hybrid approach: it combines the memory efficiency of fixed windows with the accuracy of true sliding windows. Instead of storing individual timestamps, it maintains just two counters, one for the current window and one for the previous window. When a request arrives, the algorithm estimates how many requests fall in the rolling window using a weighted average:

`estimatedCount = currentCount + previousCount * (1 - windowProgress)`

where windowProgress is how far you are into the current time window (a value between 0 and 1).

While it is resource-efficient and ready for distributed systems (e.g., with Redis storing the counters), it has drawbacks. It assumes requests were uniformly distributed across the previous window, which may not always hold true, so in edge cases a few extra requests may slip through due to the estimation.

## Comparison: All Major Algorithms

Summarizing the trade-offs discussed above:

| Algorithm | Memory | Boundary accuracy | Burst behavior |
|---|---|---|---|
| Token Bucket | O(1) | Good | Allows bursts up to bucket capacity |
| Leaky Bucket | O(1) counter plus queue | Good | Smooths bursts to a constant outflow |
| Fixed Window Counter | O(1) | Poor at window edges | Can exceed quota at boundaries |
| Sliding Window Log | O(n) timestamps | Exact | Strictly enforced |
| Sliding Window Counter | O(1) | Approximate (weighted estimate) | Good |

## Conclusion

There is no single "best" rate limiting algorithm; we should always select one based on requirements. For most distributed, high-traffic APIs, the Sliding Window Counter is the best fit: O(1) memory, Redis-friendly, and resilient to boundary bursts. When accuracy is non-negotiable, upgrade to the Sliding Window Log. When bursty-but-bounded traffic is the norm for your APIs, the Token Bucket performs best. Understanding these trade-offs is what separates a system that survives traffic spikes from one that crashes under them.

Check out the implementation of these algorithms here.

Thanks for reading, stay tuned!
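As a quick appendix, here is a minimal, single-process sketch of the token bucket described above. The class and method names (`TokenBucket`, `allow`) are my own for illustration, not from any particular library:

```python
import time


class TokenBucket:
    """Single-process token bucket sketch (illustrative, not production code)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time; overflow beyond capacity is dropped.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self) -> bool:
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False                    # bucket empty: rate limited


# A burst of 5 instant requests against a 3-token bucket refilling 1 token/s:
bucket = TokenBucket(capacity=3, refill_rate=1.0)
print([bucket.allow() for _ in range(5)])  # [True, True, True, False, False]
```

Using `time.monotonic()` rather than `time.time()` keeps the refill arithmetic safe against wall-clock adjustments; a real deployment would also need per-client buckets and locking or a shared store such as Redis.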
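And a sketch of the Sliding Window Counter using the weighted-average estimate, again with illustrative names and assuming a single process (a distributed version would keep the two counters in Redis):

```python
import time


class SlidingWindowCounter:
    """Single-process sliding window counter sketch (illustrative only)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit                      # max requests per rolling window
        self.window = window                    # window length in seconds
        self.current_start = time.monotonic()   # start of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll forward. If more than one full window has passed,
            # the previous window saw no requests at all.
            windows_passed = int(elapsed // self.window)
            self.previous_count = self.current_count if windows_passed == 1 else 0
            self.current_count = 0
            self.current_start += windows_passed * self.window
            elapsed = now - self.current_start
        progress = elapsed / self.window  # 0..1, how far into the current window
        # Weighted estimate of requests in the rolling window.
        estimated = self.current_count + self.previous_count * (1 - progress)
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False


# A burst of 5 instant requests with limit=3 per 1-second window:
limiter = SlidingWindowCounter(limit=3, window=1.0)
print([limiter.allow() for _ in range(5)])  # [True, True, True, False, False]
```

Note the trade-off called out in the post: only two counters are stored, so memory is O(1), but the `previous_count * (1 - progress)` term assumes the previous window's requests were evenly spread, which is an approximation.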