
Requirement
- Limit the number of requests a user can send to an API within a time window: n requests/minute. Window size = 1 minute.
- The APIs are accessible through a cluster, so the rate limit should be considered across different servers.
- No substantial latency
Token bucket algorithm
- For each user, we keep a count representing the number of request users has made so far and a timestamp when we started counting. This data can be stored in a hashtable (in a Redis hash in practice):
Key : Value
UserId : { Count, Timestamp }
- Whenever a new request comes in:
if
userId is not in the hashtable, insert it:count = 1, timestamp = now
. Then allow the requestelse
find the value { Count, Timestamp } of key userId in the hashtable:if
(now - Timestamp > 1min): Timestamp = now, Count = 1, allow the requestelse
if
(Count >= n): reject the requestelse
: Count++, allow the request
Disadvantages:
- Fixed window problem

- For example, rate limit = 3 request / minute. We have 3 requests come in the first minute at: 0.4, 0.7, 0.8. Other 3 requests come in the second minute at: 1.4, 1.5, 1.6.
- The token bucket algorithm accepts all 6 requests, especially 5 requests in a window of 1 min
[0.7, 1.7]
=> which violates the rate limit.
- Atomicity
- We read from the hashtable, modify the value, the write back to hashtable. In a distributed environment, this
read-then-write
operator can create a race condition.

- We can resolve the race condition by using lock, but this will slow down the concurrent requests from the same user.
Memory usage
- for each user we need: 2 bytes for userId, 4 bytes for timestamp, 2 bytes for count (can count up to 2^(2*8) ~ 65000).
- for 1 million users:
(2+4+2) * 1000000 = 8MB
Sliding Window algorithm

- Sliding window maintains a list of request timestamp for each user.
- Whenever a request comes:
- we pop all outdated timestamps - timestamps that are older than the time window.
- push new request timestamp
- if list'size > rate limit: reject
- else: accept
- In practice, we can use a Redis Sorted Set for storing this list, and the atomic ZREMRANGEBYSCORE command for quickly removing outdated timestamps.
Key : Value
userId : {
timestamp: timestamp
}
123456 : {
1590225327717: 1590225327717
1590225333131: 1590225333131
1590225333417: 1590225333417
}
- Using this algorithm, we can solve the fixed window problem: for each request arrives at time
t
, the rate in window(t-1, t]
won't exceed the rate limit; as well as atomicity problem.
Disadvantages
- Even if the request is rejected, its request time is recored in the list, this leads to 2 problems:
- memory consumption becomes unbounded.
- if a user continually exceeds the rate limit, none of their actions will be allowed:
rate limit istime request id log list 1 1 {1} accept 2 2 {1, 2} accept 3 3 {1, 2, 3} accept 60 4 {1, 2, 3, 4} reject 61 5 {2, 3, 4, 5} reject 63 6 {4, 5, 6} accept
3 requests/min
. request #5 is rejected, even if in interval [2,61], there are only 2 accepted requests (#2 and #3).
Better Sliding Window algorithm
- The basic idea is to use the previous counter to extrapolate an accurate approximation of the request rate based on the assumption that the distribution of requests is even. In practice, this approximation proved to be good enough.

- Example: there are 9 requests in window [00:00:00, 00:01:00) and 5 reqeuests in window [00:01, 00:02) so far. For newly arrived request at 00:01:15, we can calculate the estimated number of request in [00:00:15, 00:01:15) =
9 x (60-15)/60) + 5 = 11.75
. - This way, we only need to store 2 counters: one for current window and another one for previous window.
Ref