API Rate Limiting: Strategies, Algorithms, and Free Tools
A complete guide to API rate limiting in 2025: Fixed Window, Sliding Window, Token Bucket, and Leaky Bucket algorithms explained, with implementation in Node.js, Python, and Go, Redis-based distributed limiting, and client-side backoff strategies.
API Rate Limiting: Strategies, Algorithms, and Free Tools
Rate limiting is one of those backend concerns that seems simple until you need to implement it correctly. The basic idea is straightforward: limit how many requests a client can make in a given time window. The implementation is where things get interesting — four major algorithms each make different tradeoffs between accuracy, memory usage, and burst tolerance.
This guide covers why rate limiting matters, how each algorithm works, how to implement them in Node.js, Python, and Go, how to handle rate limits gracefully on the client side, and when to choose which approach.
What Rate Limiting Is and Why It’s Critical
Rate limiting controls the rate at which clients can call your API. Without it:
Your API is vulnerable to abuse. A single misconfigured client or malicious actor can send millions of requests, consuming all your server resources and degrading service for everyone else. This is the baseline denial-of-service scenario, and it requires no sophisticated attack — just a for loop.
You can’t guarantee fair resource distribution. If one customer can consume 90% of your compute capacity, other customers suffer. Rate limiting ensures that no single client monopolizes shared infrastructure.
Your costs scale with abuse. If you’re paying per compute-second or per database query, unlimited requests mean unlimited costs. A runaway client job or a bot can generate an AWS bill in hours.
You can’t monetize tiers effectively. If there are no limits, there’s no reason to upgrade from a free plan. Rate limits define the difference between tiers.
Rate limiting belongs on every API that will be called by more than one client — which is effectively every API.
The Four Rate Limiting Algorithms
1. Fixed Window
How it works: Divide time into fixed windows (e.g., 1-minute buckets starting at :00 and :60). Count requests per client per window. Reset the count at the start of each new window.
|--- window 1 (0:00-1:00) ---|--- window 2 (1:00-2:00) ---|
req: 1 2 3 4 5 [LIMIT] 1 2 3 ...
Pros:
- Simple to implement
- Easy to reason about
- Low memory usage (just a counter per client per window)
Cons:
- Burst vulnerability at window boundaries. A client can make 100 requests at 0:59, the window resets at 1:00, and they immediately make 100 more. That’s 200 requests in 2 seconds despite a 100/minute limit.
Best for: Simple APIs, internal services, situations where the window-boundary burst is acceptable.
2. Sliding Window Log
How it works: Keep a timestamp log of each request. When a new request arrives, remove all timestamps older than the window duration, then count remaining timestamps. If under the limit, allow and add the new timestamp.
At time T, request comes in:
Remove timestamps older than (T - 60s)
Count = remaining timestamps
If count < limit: allow + add T
Else: reject
Pros:
- Precise — no boundary bursting
- Smoothly enforces the rate over any rolling window
Cons:
- Memory-intensive. You store a timestamp for every request per user. At high traffic with large limits, this becomes significant.
- Harder to implement efficiently
Best for: APIs where precision matters more than memory efficiency. Low-to-medium traffic APIs.
3. Sliding Window Counter (Hybrid)
A practical compromise between Fixed Window and Sliding Window Log. Instead of storing individual timestamps, approximate the sliding window using the current and previous fixed-window counters:
approximated_count = previous_window_count * (1 - elapsed_fraction) + current_window_count
Where elapsed_fraction is how far you are through the current window (0 to 1).
Pros:
- Much lower memory than Sliding Window Log
- More accurate than Fixed Window
- Good enough for most production use cases
Cons:
- Approximate (within ~1% accuracy for most patterns)
Best for: Production APIs that need better accuracy than Fixed Window but can’t afford the memory of Sliding Window Log.
4. Token Bucket
How it works: Each client gets a “bucket” with a maximum capacity of tokens. Tokens are added at a constant rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected.
Bucket: [TOKEN][TOKEN][TOKEN][TOKEN][TOKEN] (max 5)
+ 1 token/second (refill rate)
- 1 token/request (consumption rate)
Burst allowed up to bucket capacity
Sustained rate limited to refill rate
Pros:
- Allows controlled bursting. A client can accumulate tokens and use them in a burst — useful for APIs where occasional spikes are acceptable.
- Natural fit for APIs that want to allow bursts but constrain average rate
Cons:
- Requires tracking bucket state (token count + last refill time)
- Slightly more complex to implement than fixed window
Best for: APIs that want to allow burst traffic within bounds. Rate limiting third-party API consumers who occasionally have batch workloads.
5. Leaky Bucket
How it works: Requests enter a queue (the “bucket”). Requests are processed from the queue at a constant rate. If the queue is full, new requests are dropped.
Requests → [QUEUE (capacity 10)] → Processed at 5 req/sec → Response
↓ (if queue full)
Rejected
Pros:
- Produces a perfectly smooth, constant output rate
- Prevents traffic spikes from reaching downstream services
Cons:
- Adds latency — requests wait in queue
- Queue capacity requires tuning
- Not great for request-response APIs (clients wait unpredictably)
Best for: Background processing queues, message systems, anywhere you need to smooth traffic to a downstream service. Less suitable for interactive APIs.
Algorithm Comparison
| Algorithm | Burst Handling | Memory | Accuracy | Complexity |
|---|---|---|---|---|
| Fixed Window | Poor (boundary bursts) | Very Low | Low | Very Simple |
| Sliding Window Log | None (precise) | High | Exact | Moderate |
| Sliding Window Counter | Limited | Low | ~99% | Simple |
| Token Bucket | Controlled | Low | Exact | Moderate |
| Leaky Bucket | Smoothed (queue) | Medium | Exact | Moderate |
Rule of thumb: Token Bucket for most APIs. Sliding Window Counter when you need accuracy without memory overhead. Leaky Bucket for internal service-to-service traffic shaping.
Implementation: Node.js with express-rate-limit
npm install express-rate-limit
Basic Setup
import express from 'express';
import { rateLimit } from 'express-rate-limit';
const app = express();
// Global limiter: 100 requests per 15 minutes per IP
const globalLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100,
standardHeaders: 'draft-7', // Send RateLimit-* headers
legacyHeaders: false,
message: {
error: 'Too many requests, please try again later.',
retryAfter: 'See Retry-After header'
}
});
// Stricter limiter for auth routes
const authLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 10, // Only 10 login attempts per 15 minutes
message: { error: 'Too many authentication attempts' }
});
app.use(globalLimiter);
app.post('/auth/login', authLimiter, loginHandler);
app.post('/auth/register', authLimiter, registerHandler);
Custom Key Generator (by user ID, not IP)
const userLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 60,
keyGenerator: (req) => {
// Rate limit by authenticated user ID, fall back to IP
return req.user?.id || req.ip;
},
skip: (req) => {
// Don't rate limit admin users
return req.user?.role === 'admin';
}
});
Implementation: Python with slowapi
pip install slowapi
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
@app.get("/api/data")
@limiter.limit("30/minute")
async def get_data(request: Request):
return {"data": "your data here"}
@app.post("/auth/login")
@limiter.limit("5/minute")
async def login(request: Request):
return {"token": "..."}
# Custom key function: rate limit by API key, not IP
def get_api_key(request: Request) -> str:
return request.headers.get("X-API-Key") or get_remote_address(request)
limiter_by_key = Limiter(key_func=get_api_key)
@app.get("/api/premium")
@limiter_by_key.limit("1000/hour")
async def premium_endpoint(request: Request):
return {"data": "premium data"}
Python with Flask
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
app = Flask(__name__)
limiter = Limiter(
app=app,
key_func=get_remote_address,
default_limits=["200 per day", "50 per hour"]
)
@app.route("/api/data")
@limiter.limit("30 per minute")
def get_data():
return {"data": "ok"}
Implementation: Go
package main
import (
"net/http"
"golang.org/x/time/rate"
"sync"
)
type RateLimiter struct {
limiters map[string]*rate.Limiter
mu sync.RWMutex
r rate.Limit // requests per second
b int // burst size
}
func NewRateLimiter(r rate.Limit, b int) *RateLimiter {
return &RateLimiter{
limiters: make(map[string]*rate.Limiter),
r: r,
b: b,
}
}
func (rl *RateLimiter) getLimiter(ip string) *rate.Limiter {
rl.mu.Lock()
defer rl.mu.Unlock()
limiter, exists := rl.limiters[ip]
if !exists {
limiter = rate.NewLimiter(rl.r, rl.b)
rl.limiters[ip] = limiter
}
return limiter
}
func (rl *RateLimiter) Middleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
limiter := rl.getLimiter(r.RemoteAddr)
if !limiter.Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
func main() {
// 10 requests/second, burst of 20
limiter := NewRateLimiter(rate.Limit(10), 20)
mux := http.NewServeMux()
mux.HandleFunc("/api/data", handler)
http.ListenAndServe(":8080", limiter.Middleware(mux))
}
Go’s golang.org/x/time/rate package implements a Token Bucket algorithm natively.
Rate Limit Response Headers
A good rate limiting implementation communicates its state through HTTP headers. Three headers are standard:
X-RateLimit-Limit: 100 # Max requests in window
X-RateLimit-Remaining: 87 # Requests remaining in current window
X-RateLimit-Reset: 1711027200 # Unix timestamp when window resets
Retry-After: 47 # Seconds until next request is allowed (on 429)
The newer RateLimit header format (RFC 9110 draft) consolidates these:
RateLimit-Limit: 100
RateLimit-Remaining: 87
RateLimit-Reset: 47
When returning a 429 Too Many Requests response, always include Retry-After. Clients that respect this header will back off for the right amount of time, rather than retrying immediately and making the situation worse.
// Node.js: custom 429 response with headers
app.use((req, res, next) => {
res.on('finish', () => {
if (res.statusCode === 429) {
console.log(`Rate limit hit: ${req.ip} on ${req.path}`);
}
});
next();
});
Client-Side Handling: Exponential Backoff
When your code consumes a rate-limited API, you need to handle 429 responses gracefully. Retrying immediately after a 429 is counterproductive and can get your client banned.
Exponential backoff with jitter:
async function fetchWithRetry(url, options = {}, maxRetries = 5) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
const response = await fetch(url, options);
if (response.status !== 429) {
return response;
}
// Read Retry-After header if available
const retryAfter = response.headers.get('Retry-After');
const baseDelay = retryAfter
? parseInt(retryAfter) * 1000
: Math.pow(2, attempt) * 1000; // Exponential: 1s, 2s, 4s, 8s...
// Add jitter (±20%) to avoid thundering herd
const jitter = baseDelay * (0.8 + Math.random() * 0.4);
const delay = Math.min(jitter, 60000); // Cap at 60 seconds
console.log(`Rate limited. Retrying in ${Math.round(delay / 1000)}s (attempt ${attempt + 1})`);
await new Promise(resolve => setTimeout(resolve, delay));
}
throw new Error('Max retries exceeded');
}
Request queue pattern — for batch operations, pre-throttle requests to stay under the limit:
class RateLimitedQueue {
constructor(requestsPerSecond) {
this.queue = [];
this.interval = 1000 / requestsPerSecond;
this.lastRequest = 0;
}
async add(fn) {
return new Promise((resolve, reject) => {
this.queue.push({ fn, resolve, reject });
this.process();
});
}
async process() {
if (this.processing) return;
this.processing = true;
while (this.queue.length > 0) {
const now = Date.now();
const wait = this.lastRequest + this.interval - now;
if (wait > 0) await new Promise(r => setTimeout(r, wait));
const { fn, resolve, reject } = this.queue.shift();
this.lastRequest = Date.now();
try {
resolve(await fn());
} catch (e) {
reject(e);
}
}
this.processing = false;
}
}
// Usage: send 5 requests/second max
const queue = new RateLimitedQueue(5);
const results = await Promise.all(
userIds.map(id => queue.add(() => fetchUser(id)))
);
Redis-Based Distributed Rate Limiting
Single-server rate limiting breaks when you scale horizontally. If a client’s request can hit any of 10 servers, each server’s local counter only tracks 1/10th of the actual request count — effectively giving the client 10x the limit.
Redis solves this: all servers share a single rate limit state.
import { createClient } from 'redis';
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
async function isRateLimited(clientId, limit, windowSeconds) {
const key = `ratelimit:${clientId}`;
const now = Date.now();
const windowStart = now - windowSeconds * 1000;
// Sliding Window Log using Redis Sorted Sets
const pipeline = redis.multi();
pipeline.zRemRangeByScore(key, '-inf', windowStart); // Remove old entries
pipeline.zAdd(key, { score: now, value: `${now}-${Math.random()}` });
pipeline.zCard(key); // Count current window requests
pipeline.expire(key, windowSeconds);
const results = await pipeline.exec();
const count = results[2]; // zCard result
return count > limit;
}
// Usage in middleware
app.use(async (req, res, next) => {
const clientId = req.user?.id || req.ip;
const limited = await isRateLimited(clientId, 100, 60);
if (limited) {
res.status(429).json({ error: 'Rate limit exceeded' });
return;
}
next();
});
For production, use node-rate-limiter-flexible, which implements all four algorithms with Redis support:
npm install rate-limiter-flexible ioredis
import { RateLimiterRedis } from 'rate-limiter-flexible';
import Redis from 'ioredis';
const redisClient = new Redis(process.env.REDIS_URL);
const rateLimiter = new RateLimiterRedis({
storeClient: redisClient,
keyPrefix: 'middleware',
points: 100, // requests
duration: 60, // per 60 seconds
blockDuration: 60 // block for 60s if exceeded
});
app.use(async (req, res, next) => {
try {
const result = await rateLimiter.consume(req.ip);
res.setHeader('X-RateLimit-Remaining', result.remainingPoints);
res.setHeader('X-RateLimit-Reset', Math.ceil(result.msBeforeNext / 1000));
next();
} catch (rejRes) {
res.setHeader('Retry-After', Math.ceil(rejRes.msBeforeNext / 1000));
res.status(429).json({ error: 'Too Many Requests' });
}
});
Testing Rate Limits
Testing your rate limiting implementation requires sending many requests quickly and verifying the right ones are rejected. The API Tester at devplaybook.cc lets you inspect response headers directly in the browser — useful for verifying that X-RateLimit-* headers are being returned correctly.
For automated testing:
# Quick burst test with curl: send 20 requests in parallel
for i in {1..20}; do
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/data &
done
wait
# You should see a mix of 200 and 429 responses
// Integration test with Jest
test('rate limiting blocks excess requests', async () => {
const responses = await Promise.all(
Array.from({ length: 15 }, () =>
fetch('http://localhost:3000/api/data')
)
);
const statusCodes = responses.map(r => r.status);
const successCount = statusCodes.filter(s => s === 200).length;
const limitedCount = statusCodes.filter(s => s === 429).length;
expect(successCount).toBe(10); // Assuming limit of 10
expect(limitedCount).toBe(5);
// Verify headers on a 429 response
const limitedResponse = responses.find(r => r.status === 429);
expect(limitedResponse.headers.get('Retry-After')).toBeTruthy();
});
When to Choose Which Algorithm
Fixed Window — Use it when simplicity is the priority and you’re not worried about window-boundary bursts. Internal APIs, admin dashboards, webhooks.
Sliding Window Counter — The best default choice for most production APIs. Accurate enough, memory-efficient, straightforward to implement with Redis.
Token Bucket — Choose this when your API has legitimate use cases for burst traffic. API consumers who occasionally run batch jobs, or integrations that buffer requests and release them at once.
Leaky Bucket — Use it at the infrastructure layer to smooth traffic to downstream services, not at the API response layer. Think: smoothing writes to a database, messages to a queue.
Multi-tier limiting — For serious production APIs, layer multiple limiters: a per-second burst limit (Token Bucket) plus a per-hour sustained limit (Sliding Window). This allows brief spikes while preventing sustained abuse.
// Two-tier rate limiting example
const burstLimiter = rateLimit({ windowMs: 1000, max: 20 }); // 20/second burst
const sustainedLimiter = rateLimit({ windowMs: 3600000, max: 1000 }); // 1000/hour
app.use(burstLimiter, sustainedLimiter);
Rate limiting is one of those things that’s invisible when it works and catastrophic when it doesn’t. The right algorithm, combined with informative headers and well-designed client-side backoff, makes your API resilient without creating friction for legitimate users.
使用以下連結支持我們:Cloudflare | DigitalOcean
Free Newsletter
Level Up Your Dev Workflow
Get new tools, guides, and productivity tips delivered to your inbox.
Plus: grab the free Developer Productivity Checklist when you subscribe.
Found this guide useful? Check out our free developer tools.
Affiliate disclosure: Some links below are affiliate links — we may earn a small commission at no extra cost to you. Learn more.
Recommended Tools & Resources
DigitalOcean
$200 credit for new users. Simple, affordable cloud hosting for developers.
GitHub Student Pack
Free access to 100+ developer tools. Perfect for students and new devs.
Vercel
Deploy frontend apps instantly. Free tier is generous for side projects.
DevPlaybook Products
Boilerplates, scripts & AI toolkits to 10x your dev workflow.