Rate Limiting

A technique to control the number of API requests a client can make within a time window.

Rate limiting restricts how many API requests a client can make in a given time period. It protects your API from abuse, ensures fair usage, and prevents server overload.

Why Rate Limit

Prevent abuse: Without limits, one client can consume all server resources or scrape your entire database.

Ensure fairness: Limits prevent one heavy user from degrading experience for others.

Control costs: Each request costs compute, bandwidth, and third-party API calls.

Improve security: Rate limiting slows down brute-force attacks and credential stuffing.

Common Rate Limiting Strategies

Fixed Window: Count requests per time window (e.g., 100 requests per minute). Simple, but allows bursts at window boundaries: a client can send 100 requests at 0:59 and 100 more at 1:01. A minimal sketch follows.
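
A minimal in-memory sketch of a fixed-window counter, assuming a plain Map as the store; the limit and window size are illustrative:

// Illustrative in-memory fixed-window counter (single server only).
const windows = new Map(); // "clientId:windowId" -> request count

function allowFixedWindow(clientId, limit = 100, windowMs = 60_000) {
  const windowId = Math.floor(Date.now() / windowMs); // which window we're in
  const key = `${clientId}:${windowId}`;
  const count = (windows.get(key) || 0) + 1;
  windows.set(key, count); // note: stale windows are never evicted in this sketch
  return count <= limit;   // true = allow, false = reject
}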

Sliding Window: Smooths out the fixed-window boundary problem by counting requests over a rolling time period. The Redis example under Code Examples below implements this approach.

Token Bucket: A bucket fills with tokens at a fixed rate, and each request consumes one token. Allows controlled bursts while maintaining an average rate.
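
A minimal token bucket sketch; the capacity (burst size) and refill rate are illustrative parameters:

// Illustrative token bucket: capacity bounds bursts, refillRate sets the average.
function createTokenBucket(capacity = 10, refillRate = 1 /* tokens per second */) {
  let tokens = capacity;
  let lastRefill = Date.now();

  return function allow() {
    const now = Date.now();
    // Refill in proportion to elapsed time, capped at capacity
    tokens = Math.min(capacity, tokens + ((now - lastRefill) / 1000) * refillRate);
    lastRefill = now;
    if (tokens >= 1) {
      tokens -= 1; // spend one token on this request
      return true;
    }
    return false;
  };
}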

Leaky Bucket: Requests queue like water in a bucket with a hole and are processed at a constant rate, smoothing traffic spikes.
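
A minimal leaky bucket sketch; the drain rate and queue capacity are illustrative:

// Illustrative leaky bucket: requests queue up and drain at a constant rate.
function createLeakyBucket(ratePerSec = 5, capacity = 20) {
  const queue = [];
  // Drain one queued request per tick (interval never cleared in this sketch)
  setInterval(() => {
    const job = queue.shift();
    if (job) job();
  }, 1000 / ratePerSec);

  return function enqueue(handler) {
    if (queue.length >= capacity) return false; // bucket is full: reject
    queue.push(handler);
    return true;
  };
}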

Choosing Limits

User Type     Typical Limits
Anonymous     60/hour - prevent scraping
Free tier     100/hour - encourage upgrade
Paid tier     1000/hour - based on plan
Enterprise    Custom - negotiated SLA

Endpoint-specific limits (a configuration sketch follows this list):

  • Read endpoints: Higher limits (1000/hour)
  • Write endpoints: Lower limits (100/hour)
  • Expensive operations: Very low (10/hour)
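
One way to express this is a per-endpoint configuration map; the routes and numbers below are illustrative, not prescriptive:

// Illustrative per-endpoint limits (requests per hour)
const endpointLimits = {
  'GET /users': 1000,          // cheap read
  'POST /users': 100,          // write
  'POST /reports/export': 10,  // expensive operation
};

function limitFor(method, path) {
  return endpointLimits[`${method} ${path}`] ?? 100; // assumed default
}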

Response Headers

Standard headers to communicate rate limit status:

Header                  Description                          Example
X-RateLimit-Limit       Max requests allowed                 100
X-RateLimit-Remaining   Requests left                        95
X-RateLimit-Reset       When limit resets (Unix timestamp)   1640995200
Retry-After             Seconds to wait (when limited)       60

Always include these headers so clients can track their usage.

Handling Rate Limit Exceeded

Return 429 Too Many Requests: Standard HTTP status for rate limiting.

Include a helpful body: Tell the client when it can retry and which limit it exceeded.
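
For example, a 429 body might look like the following; the field names are a common convention rather than a standard:

{
  "error": "Too Many Requests",
  "message": "Rate limit of 100 requests per hour exceeded.",
  "retryAfter": 60
}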

Don't charge for 429 responses: Rate-limited requests shouldn't count against the quota.

Implementation Approaches

In-memory (single server): Fast, but doesn't work across multiple servers, and state is lost on restart.

Redis: Shared state across servers. Fast, with optional persistence. The industry standard.

Database: Works but slower. OK for low-volume APIs.

API Gateway: AWS API Gateway, Kong, etc. Offloads rate limiting from your code.

Rate Limiting by Identifier

By IP address: Simple but problematic. NAT means many users can share one IP, and VPNs bypass it easily.

By API key: Best for authenticated APIs. Fair per-account limits.

By user ID: Good for user-facing apps. Ties to authenticated session.

Combined: IP for anonymous, user ID for authenticated. Different limits for each.

Best Practices

Start strict, loosen later: It's easier to raise limits than to apologize to clients you broke by lowering them.

Document your limits: Publish rate limits in API docs. No surprises.

Provide rate limit headers: Clients need to know their status before hitting limits.

Different limits per endpoint: Expensive operations need stricter limits than simple reads.

Grace period for new limits: Warn before enforcing. Give clients time to adapt.

Common Mistakes

1. No rate limiting: Any public API without limits will be abused.

2. Rate limiting only by IP: Punishes shared networks. Easy to bypass with proxies.

3. No headers in response: Clients can't implement proper backoff without knowing their status.

4. Hard blocking when throttling would do: For some cases, consider returning slower responses instead of rejecting requests outright.

5. Same limits for all operations: GET /users and POST /users have very different costs. Limit accordingly.

Client-Side Handling

Good API clients should (see the backoff sketch after this list):

  • Read rate limit headers
  • Implement exponential backoff
  • Queue requests during limit periods
  • Cache responses to reduce requests
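
A minimal client-side sketch that reads Retry-After and falls back to exponential backoff; the retry count is an illustrative choice:

// Illustrative client: honor Retry-After, else back off exponentially
async function fetchWithBackoff(url, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    const retryAfter = Number(res.headers.get('Retry-After'));
    const delayMs = retryAfter ? retryAfter * 1000 : 2 ** attempt * 1000;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error(`Still rate limited after ${maxRetries} retries`);
}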

Distributed Rate Limiting

With multiple servers, you need shared state:

  • Redis with atomic increment (a minimal sketch follows this list)
  • Sliding window with sorted sets
  • Approximate counting for high volume
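
A minimal sketch of the atomic-increment approach with ioredis; the key naming and limits are illustrative:

// Illustrative distributed fixed-window counter: INCR is atomic in Redis,
// so all servers sharing this instance see a consistent count.
async function allowDistributed(redis, clientId, limit = 100, windowSec = 60) {
  const windowId = Math.floor(Date.now() / (windowSec * 1000));
  const key = `rl:${clientId}:${windowId}`;
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, windowSec); // first request sets TTL
  return count <= limit;
}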

Code Examples

Rate Limiting Middleware with Redis

import express from 'express';
import Redis from 'ioredis';

const app = express();
const redis = new Redis();

// Sliding window rate limiter backed by a Redis sorted set:
// each request is stored as a member scored by its timestamp.
async function rateLimiter(req, res, next) {
  const identifier = req.user?.id || req.ip; // user ID if authenticated, else IP
  const limit = req.user ? 1000 : 100;       // authenticated vs anonymous
  const window = 3600;                       // 1 hour in seconds

  const key = `ratelimit:${identifier}`;
  const now = Date.now();
  const windowStart = now - window * 1000;
  const member = `${now}-${Math.random()}`;  // unique member for this request

  // Atomically: drop entries older than the window, count what's left,
  // record this request, and refresh the key's TTL.
  const results = await redis.multi()
    .zremrangebyscore(key, 0, windowStart)
    .zcard(key)
    .zadd(key, now, member)
    .expire(key, window)
    .exec();
  const requestCount = results[1][1]; // zcard result: requests already in window

  // Always set headers so clients can back off before hitting the limit
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, limit - requestCount - 1)));
  // Approximation: reports a full window from now rather than the exact
  // moment the oldest entry ages out
  res.set('X-RateLimit-Reset', String(Math.floor((now + window * 1000) / 1000)));

  if (requestCount >= limit) {
    // Don't charge for 429s: remove the entry we just added
    await redis.zrem(key, member);
    res.set('Retry-After', String(window));
    return res.status(429).json({
      error: 'Too Many Requests',
      message: `Rate limit exceeded. Try again in ${window} seconds.`
    });
  }

  next();
}

app.use(rateLimiter);