Latency

The time delay between a request being sent and a response being received.

In APIs, latency is usually measured in milliseconds (ms). Low latency means fast responses; high latency means users wait. Industry research (Amazon's internal experiments, Akamai's retail studies) has linked every additional 100ms of latency to measurable drops in sales and conversions.

Types of Latency

Network latency: Time for data to travel over the network. Affected by distance, routing, and network congestion.

Server latency: Time for the server to process the request. Affected by code efficiency, database queries, and server load.

Client latency: Time for the client to process the response. Affected by parsing, rendering, and client-side logic.

Total latency = Network (request) + Server + Network (response) + Client
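
The breakdown above is simple addition; a sketch, with illustrative field names:

```javascript
// Total latency is the sum of the network hops, server processing time,
// and client processing time.
function totalLatency({ requestNetworkMs, serverMs, responseNetworkMs, clientMs }) {
  return requestNetworkMs + serverMs + responseNetworkMs + clientMs;
}

// Example: 40ms out + 120ms processing + 45ms back + 15ms rendering
const total = totalLatency({
  requestNetworkMs: 40,
  serverMs: 120,
  responseNetworkMs: 45,
  clientMs: 15,
}); // 220
```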

Measuring Latency

| Metric | What It Measures | Use For |
| --- | --- | --- |
| Response time | Total time from request to response | User experience |
| TTFB | Time to First Byte | Server performance |
| P50 (median) | 50% of requests are faster | Typical experience |
| P95 | 95% of requests are faster | Catching slow outliers |
| P99 | 99% of requests are faster | Worst-case performance |

Why percentiles matter: Averages can hide problems. If 99 requests take 50ms and 1 takes 5 seconds, the average is roughly 100ms and looks fine, but the tail percentiles sit near 5000ms and reveal the issue.
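
A minimal sketch of computing percentiles from raw samples, using the "higher" nearest-rank convention (one of several common definitions; monitoring tools vary):

```javascript
// Compute a percentile from recorded latency samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // "higher" interpolation: round the fractional rank up so outliers surface
  const idx = Math.ceil((p / 100) * (sorted.length - 1));
  return sorted[idx];
}

// The scenario above: 99 fast requests and 1 very slow one.
const samples = [...Array(99).fill(50), 5000];
const avg = samples.reduce((a, b) => a + b, 0) / samples.length; // 99.5 — looks fine
const p50 = percentile(samples, 50); // 50
const p99 = percentile(samples, 99); // 5000 — exposes the outlier
```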

What Causes High Latency

Database queries: Slow queries, missing indexes, N+1 problems. Often the biggest contributor.
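
The N+1 pattern is easiest to see in code. A sketch, assuming a generic `db.query` client and hypothetical `users`/`orders` tables:

```javascript
// N+1: one query for the list, then one query per row — 101 round trips
// for 100 users, each paying full database latency.
async function getOrdersNPlusOne(db) {
  const users = await db.query('SELECT id FROM users LIMIT 100');
  const results = [];
  for (const user of users) {
    results.push(await db.query('SELECT * FROM orders WHERE user_id = $1', [user.id]));
  }
  return results;
}

// Batched: two queries total, regardless of row count.
async function getOrdersBatched(db) {
  const users = await db.query('SELECT id FROM users LIMIT 100');
  const ids = users.map((u) => u.id);
  return db.query('SELECT * FROM orders WHERE user_id = ANY($1)', [ids]);
}
```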

Network distance: A server in the US serving a user in Australia adds roughly 200ms of round-trip time at minimum, a floor set by the speed of light in fiber.

Cold starts: Serverless functions take time to initialize. First request is slow.

External API calls: Calling third-party APIs adds their latency to yours.

Large payloads: Sending 10MB response takes longer than 10KB.

Serialization: Converting objects to JSON takes CPU time, especially for large responses.

Latency Budgets

Set target latencies and track them:

| Endpoint Type | Target | Acceptable |
| --- | --- | --- |
| Health check | < 10ms | < 50ms |
| Simple read | < 50ms | < 200ms |
| List with filters | < 100ms | < 500ms |
| Complex query | < 200ms | < 1s |
| Write operation | < 100ms | < 500ms |

If an endpoint exceeds its budget, investigate and optimize.
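
The budget table can be encoded as a simple check; the endpoint-type keys here are illustrative:

```javascript
// Latency budgets per endpoint type (target vs. acceptable thresholds).
const budgets = {
  healthCheck: { targetMs: 10, acceptableMs: 50 },
  simpleRead: { targetMs: 50, acceptableMs: 200 },
  listWithFilters: { targetMs: 100, acceptableMs: 500 },
  complexQuery: { targetMs: 200, acceptableMs: 1000 },
  write: { targetMs: 100, acceptableMs: 500 },
};

function checkBudget(endpointType, observedMs) {
  const budget = budgets[endpointType];
  if (observedMs <= budget.targetMs) return 'ok';
  if (observedMs <= budget.acceptableMs) return 'over-target'; // investigate
  return 'over-budget'; // optimize now
}
```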

Reducing Latency

Database optimization:

  • Add indexes for frequent queries
  • Use connection pooling
  • Avoid N+1 queries with eager loading
  • Consider read replicas for heavy read loads
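
Connection pooling in miniature: this toy pool only shows why reuse helps (no connect/teardown cost per request); production code would use a library pool such as node-postgres's Pool instead.

```javascript
// A toy pool: hand out a fixed set of connections and make extra callers
// wait for a release instead of opening new connections.
class SimplePool {
  constructor(createConn, size) {
    this.idle = Array.from({ length: size }, () => createConn());
    this.waiters = [];
  }
  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    // Pool exhausted: wait until someone releases a connection
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn); // hand straight to the next waiter
    else this.idle.push(conn);
  }
}
```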

Caching:

  • Cache frequent reads (Redis, Memcached)
  • Use CDN for static content
  • Implement HTTP caching headers
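
The cache-aside pattern behind most of these techniques, sketched with an in-memory Map standing in for Redis or Memcached:

```javascript
// Cache-aside with a TTL: check the cache first, fall back to the slow
// load on a miss, and store the result for subsequent requests.
const cache = new Map();

async function cachedFetch(key, loadFn, ttlMs) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // hit: skip the slow path
  const value = await loadFn(); // miss: do the expensive work once
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```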

Geographic distribution:

  • Deploy servers closer to users
  • Use CDN with edge locations
  • Consider multi-region deployment

Code optimization:

  • Profile slow endpoints
  • Reduce unnecessary computation
  • Use async/parallel processing
  • Optimize serialization
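
Parallelizing independent calls is often the cheapest win. A sketch with hypothetical fetchA/fetchB calls:

```javascript
// Sequential awaits pay each call's latency in full.
async function sequential(fetchA, fetchB) {
  const a = await fetchA(); // e.g. 100ms
  const b = await fetchB(); // e.g. 100ms — total ≈ 200ms
  return [a, b];
}

// Promise.all overlaps independent calls, so total latency approaches
// the slowest single call instead of the sum.
async function parallel(fetchA, fetchB) {
  return Promise.all([fetchA(), fetchB()]); // total ≈ 100ms
}
```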

Payload reduction:

  • Return only needed fields
  • Use compression (gzip, brotli)
  • Paginate large lists
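
A sketch of combining field selection and pagination; the parameter names (fields, page, pageSize) are illustrative, not a standard:

```javascript
// Return only the requested fields and one page of results instead of
// the full collection, shrinking the payload on the wire.
function shapeResponse(items, { fields, page = 1, pageSize = 20 }) {
  const start = (page - 1) * pageSize;
  const pageItems = items.slice(start, start + pageSize);
  const picked = pageItems.map((item) =>
    Object.fromEntries(fields.map((f) => [f, item[f]]))
  );
  return { data: picked, page, pageSize, total: items.length };
}
```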

Monitoring Latency

What to track:

  • Response time distribution (histogram)
  • P50, P95, P99 percentiles
  • Latency by endpoint
  • Latency over time (trends)

Alerting:

  • Alert when P95 exceeds threshold
  • Alert on sudden latency increases
  • Track latency as part of SLOs

Tools:

  • APM: DataDog, New Relic, Dynatrace
  • Open source: Prometheus + Grafana
  • Built-in: Cloud provider monitoring

Common Mistakes

1. Only tracking averages: Average hides outliers. Track percentiles (P95, P99) to catch slow requests.

2. Measuring in the wrong place: Measure from the client's perspective, not just the server's. The network adds latency too.

3. Testing only locally: Local tests don't show network latency. Test from realistic distances.

4. Ignoring cold starts: Serverless first requests are slower. Include warm-up in your latency budget.

5. Over-optimizing: Don't spend weeks reducing 50ms to 40ms if users won't notice. Focus on slow endpoints first.

Best Practices

Set latency budgets: Define acceptable latency for each endpoint type. Monitor and alert on violations.

Profile before optimizing: Find the slow part before rewriting everything. Usually it's one slow query.

Consider perceived performance: Return partial results quickly. Show loading states. Users perceive responsiveness, not just speed.

Test under load: Latency increases under heavy load. Test with realistic concurrency.

Document latency expectations: API consumers should know expected response times. Include in API documentation.

Code Examples

Measuring and Logging API Latency

// Express middleware for latency tracking; `metrics` and `alerting`
// are assumed to be provided by your observability setup
const latencyMiddleware = (req, res, next) => {
  const start = process.hrtime.bigint();

  res.on('finish', () => {
    const end = process.hrtime.bigint();
    const latencyMs = Number(end - start) / 1_000_000;

    // Log with endpoint info
    console.log(JSON.stringify({
      method: req.method,
      path: req.route?.path || req.path,
      status: res.statusCode,
      latencyMs: latencyMs.toFixed(2),
      timestamp: new Date().toISOString()
    }));

    // Track in metrics system
    metrics.histogram('api.latency', latencyMs, {
      endpoint: req.route?.path || req.path,
      method: req.method,
      status: res.statusCode
    });

    // Alert on slow requests (alerting is likewise a placeholder client)
    if (latencyMs > 1000) {
      alerting.warn('Slow API response', {
        endpoint: req.path,
        latencyMs
      });
    }
  });

  next();
};

app.use(latencyMiddleware);

// Client-side latency measurement
async function fetchWithTiming(url) {
  const start = performance.now();

  const response = await fetch(url);
  const data = await response.json();

  const latency = performance.now() - start;
  console.log(`Request to ${url} took ${latency.toFixed(0)}ms`);

  return data;
}