Rate Limiting
A technique to control the number of API requests a client can make within a time window.
Rate limiting restricts how many API requests a client can make in a given time period. It protects your API from abuse, ensures fair usage, and prevents server overload.
Why Rate Limit
Prevent abuse: Without limits, one client can consume all server resources or scrape your entire database.
Ensure fairness: Limits prevent one heavy user from degrading experience for others.
Control costs: Each request costs compute, bandwidth, and third-party API calls.
Security: Rate limiting slows down brute force attacks and credential stuffing.
Common Rate Limiting Strategies
Fixed Window: Count requests per time window (e.g., 100 requests per minute). Simple but allows bursts at window boundaries.
Sliding Window: Smooths out the fixed-window boundary problem by counting requests in a rolling time period instead of discrete buckets.
Token Bucket: Bucket fills with tokens at fixed rate. Each request takes a token. Allows controlled bursts while maintaining average rate.
Leaky Bucket: Requests queue like water in a bucket with a hole. Processes at constant rate, smoothing traffic spikes.
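The token bucket strategy is straightforward to sketch. Below is an illustrative in-memory version (the `TokenBucket` class and its names are our own, not from any library):

```javascript
// Illustrative in-memory token bucket (not a library API)
class TokenBucket {
  constructor(capacity, refillRatePerSec) {
    this.capacity = capacity;               // max tokens (burst size)
    this.tokens = capacity;                 // start full
    this.refillRatePerSec = refillRatePerSec;
    this.lastRefill = Date.now();
  }

  // Add tokens earned since the last check, capped at capacity
  refill(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillRatePerSec
    );
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, consuming one token
  tryConsume(now = Date.now()) {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket with capacity 10 and a refill rate of 1 token/second allows a burst of 10 requests, then roughly one request per second on average.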
Choosing Limits
| User Type | Typical Limits |
|---|---|
| Anonymous | 60/hour - prevent scraping |
| Free tier | 100/hour - encourage upgrade |
| Paid tier | 1000/hour - based on plan |
| Enterprise | Custom - negotiated SLA |
Endpoint-specific limits:
- Read endpoints: Higher limits (1000/hour)
- Write endpoints: Lower limits (100/hour)
- Expensive operations: Very low (10/hour)
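One way to express per-endpoint limits is a simple lookup table consulted by the rate limiter. The routes and numbers below are illustrative, mirroring the tiers above:

```javascript
// Illustrative per-endpoint limits (requests per hour); routes and numbers are examples
const endpointLimits = [
  { method: 'GET',  pattern: /^\/users/,   limit: 1000 }, // reads: generous
  { method: 'POST', pattern: /^\/users/,   limit: 100 },  // writes: stricter
  { method: 'POST', pattern: /^\/reports/, limit: 10 },   // expensive: very low
];

const DEFAULT_LIMIT = 100;

// Resolve the limit for a given request
function limitFor(method, path) {
  const rule = endpointLimits.find(
    (r) => r.method === method && r.pattern.test(path)
  );
  return rule ? rule.limit : DEFAULT_LIMIT;
}
```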
Response Headers
Standard headers to communicate rate limit status:
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Max requests allowed | 100 |
| X-RateLimit-Remaining | Requests left | 95 |
| X-RateLimit-Reset | When limit resets (Unix timestamp) | 1640995200 |
| Retry-After | Seconds to wait (when limited) | 60 |
Always include these headers so clients can track their usage.
Handling Rate Limit Exceeded
Return 429 Too Many Requests: Standard HTTP status for rate limiting.
Include helpful body: Tell client when they can retry and how many requests they made.
Don't charge for 429 responses: Rate limited requests shouldn't count against quota.
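Putting these points together, a helpful 429 body can be built from the limit and window. The field names below are illustrative, not a standard:

```javascript
// Build an informative 429 payload; field names are illustrative
function buildRateLimitError(limit, windowSeconds, retryAfterSeconds) {
  return {
    error: 'Too Many Requests',
    message: `Rate limit of ${limit} requests per ${windowSeconds}s exceeded.`,
    retryAfterSeconds, // when the client can safely retry
  };
}
```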
Implementation Approaches
In-memory (single server): Fast but doesn't work with multiple servers. Lost on restart.
Redis: Shared state across servers. Fast, persistent. Industry standard.
Database: Works but slower. OK for low-volume APIs.
API Gateway: AWS API Gateway, Kong, etc. Offloads rate limiting from your code.
Rate Limiting by Identifier
By IP address: Simple but problematic. NAT means many users share IP. VPNs bypass it easily.
By API key: Best for authenticated APIs. Fair per-account limits.
By user ID: Good for user-facing apps. Ties to authenticated session.
Combined: IP for anonymous, user ID for authenticated. Different limits for each.
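The combined approach reduces to a small helper in middleware. The quota numbers here are illustrative:

```javascript
// Pick the rate-limit key and quota: user ID when authenticated, IP otherwise.
// The limits (1000 vs 100 per hour) are example values.
function identify(req) {
  if (req.user?.id) {
    return { key: `user:${req.user.id}`, limit: 1000 }; // authenticated
  }
  return { key: `ip:${req.ip}`, limit: 100 };           // anonymous
}
```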
Best Practices
Start strict, loosen later: Raising a limit never breaks anyone; reducing one breaks clients and forces apologies.
Document your limits: Publish rate limits in API docs. No surprises.
Provide rate limit headers: Clients need to know their status before hitting limits.
Different limits per endpoint: Expensive operations need stricter limits than simple reads.
Grace period for new limits: Warn before enforcing. Give clients time to adapt.
Common Mistakes
1. No rate limiting: Any public API without limits will be abused.
2. Rate limiting only by IP: Punishes shared networks. Easy to bypass with proxies.
3. No headers in response: Clients can't implement proper backoff without knowing their status.
4. Hard blocking when throttling would do: For some cases, consider slowing responses down instead of rejecting outright.
5. Same limits for all operations:
GET /users and POST /users have very different costs. Limit accordingly.
Client-Side Handling
Good API clients should:
- Read rate limit headers
- Implement exponential backoff
- Queue requests during limit periods
- Cache responses to reduce requests
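The backoff piece can be sketched as follows. The jitter formula is one common choice ("full jitter"), not the only one, and the base/cap values are illustrative:

```javascript
// Exponential backoff with full jitter: delay in ms for the nth retry (0-based).
// baseMs and maxMs are illustrative defaults.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp); // full jitter: uniform in [0, exp)
}

// Honor Retry-After when the server provides it, else back off exponentially
function delayFor(attempt, retryAfterHeader) {
  if (retryAfterHeader) {
    return Number(retryAfterHeader) * 1000; // header is in seconds
  }
  return backoffDelay(attempt);
}
```

A client would sleep for `delayFor(attempt, res.headers['retry-after'])` milliseconds after each 429 before retrying.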
Distributed Rate Limiting
With multiple servers, you need shared state:
- Redis with atomic increment
- Sliding window with sorted sets
- Approximate counting for high volume
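The simplest shared-state approach is a fixed-window counter built on an atomic increment. The sketch below mirrors the Redis INCR + EXPIRE pattern; a minimal in-memory stub stands in for a real client (in production, `client` would be an ioredis instance, whose `incr`/`expire` methods this follows):

```javascript
// Fixed-window counter using the atomic INCR + EXPIRE pattern.
// `client` is anything with incr/expire -- e.g. an ioredis instance in production.
async function checkFixedWindow(client, identifier, limit, windowSeconds) {
  const windowId = Math.floor(Date.now() / 1000 / windowSeconds);
  const key = `ratelimit:${identifier}:${windowId}`;
  const count = await client.incr(key);      // atomic across all servers
  if (count === 1) {
    await client.expire(key, windowSeconds); // first hit: start the window TTL
  }
  return { allowed: count <= limit, count };
}

// Minimal in-memory stub so the sketch is runnable without a Redis server
class FakeRedis {
  constructor() { this.store = new Map(); }
  async incr(key) {
    const next = (this.store.get(key) || 0) + 1;
    this.store.set(key, next);
    return next;
  }
  async expire(_key, _seconds) { /* no-op in the stub */ }
}
```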
Code Examples
Rate Limiting Middleware with Redis
import express from 'express';
import Redis from 'ioredis';

const app = express();
const redis = new Redis();

// Sliding window rate limiter
async function rateLimiter(req, res, next) {
  const identifier = req.user?.id || req.ip;
  const limit = req.user ? 1000 : 100; // Authenticated vs anonymous
  const window = 3600; // 1 hour in seconds
  const key = `ratelimit:${identifier}`;
  const now = Date.now();
  const windowStart = now - window * 1000;
  const member = `${now}-${Math.random()}`; // unique entry for this request

  // Remove old entries, count recent, add current -- atomically
  const multi = redis.multi();
  multi.zremrangebyscore(key, 0, windowStart);
  multi.zcard(key);
  multi.zadd(key, now, member);
  multi.expire(key, window);
  const results = await multi.exec();
  const requestCount = results[1][1]; // zcard result: requests before this one

  // Set headers so clients can track their usage
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, limit - requestCount - 1)));
  res.set('X-RateLimit-Reset', String(Math.floor((now + window * 1000) / 1000)));

  if (requestCount >= limit) {
    // Don't charge for 429s: remove the entry we just added
    await redis.zrem(key, member);
    res.set('Retry-After', String(window));
    return res.status(429).json({
      error: 'Too Many Requests',
      message: `Rate limit exceeded. Try again in ${window} seconds.`
    });
  }

  next();
}
app.use(rateLimiter);
Related Terms
API Versioning
Strategies for managing breaking changes in APIs while maintaining backward compatibility.
GraphQL
A query language for APIs that allows clients to request exactly the data they need.
REST API
An architectural style for building web APIs using HTTP methods and stateless communication.
Idempotency
A property where making the same API request multiple times produces the same result.