Latency is the time between sending a request and receiving a response. In APIs, it's usually measured in milliseconds (ms). Low latency means fast responses; high latency means users wait. The business impact is real: Amazon's often-cited internal finding was that every 100ms of added latency cost roughly 1% in sales.
Latency
The time delay between a request being sent and a response being received.
Types of Latency
Network latency: Time for data to travel over the network. Affected by distance, routing, and network congestion.
Server latency: Time for server to process the request. Affected by code efficiency, database queries, and server load.
Client latency: Time for client to process the response. Affected by parsing, rendering, and client-side logic.
Total latency = Network (request) + Server + Network (response) + Client
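The sum above can be made concrete with a small sketch; the millisecond values below are hypothetical, just to show how the stages add up.

```javascript
// Hypothetical breakdown of one request's latency (values in ms)
const breakdown = {
  networkRequest: 40,  // client -> server
  server: 120,         // processing + database
  networkResponse: 40, // server -> client
  client: 30,          // parsing + rendering
};

// Total latency is simply the sum of every stage
const totalMs = Object.values(breakdown).reduce((sum, ms) => sum + ms, 0);
console.log(totalMs); // 230
```

Note that the server stage is usually the only one you fully control; the network stages are bounded by geography, and the client stage by the user's device.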
Measuring Latency
| Metric | What It Measures | Use For |
|---|---|---|
| Response time | Total time from request to response | User experience |
| TTFB | Time to First Byte | Server performance |
| P50 (median) | 50% of requests are faster | Typical experience |
| P95 | 95% of requests are faster | Catching slow outliers |
| P99 | 99% of requests are faster | Worst case performance |
Why percentiles matter: averages can hide problems. If 99 requests take 50ms and one takes 5 seconds, the average is about 100ms and looks fine. A P99 of 5000ms reveals the issue.
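The average-vs-percentile example above can be reproduced with a few lines. This uses a simple rank-based percentile (one of several common conventions; monitoring tools may interpolate differently):

```javascript
// Rank-based percentile over recorded latency samples
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(Math.floor((p / 100) * sorted.length), sorted.length - 1);
  return sorted[idx];
}

// 99 fast requests plus one 5-second outlier, as in the example above
const latencies = [...Array(99).fill(50), 5000];
const avg = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;

console.log(avg);                       // 99.5 -- looks fine
console.log(percentile(latencies, 50)); // 50   -- typical experience
console.log(percentile(latencies, 99)); // 5000 -- reveals the outlier
```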
What Causes High Latency
Database queries: Slow queries, missing indexes, N+1 problems. Often the biggest contributor.
Network distance: Server in the US, user in Australia means roughly a 200ms round trip at minimum, a floor set by the speed of light in fiber plus routing.
Cold starts: Serverless functions take time to initialize. First request is slow.
External API calls: Calling third-party APIs adds their latency to yours.
Large payloads: Sending 10MB response takes longer than 10KB.
Serialization: Converting objects to JSON takes CPU time, especially for large responses.
Latency Budgets
Set target latencies and track them:
| Endpoint Type | Target | Acceptable |
|---|---|---|
| Health check | < 10ms | < 50ms |
| Simple read | < 50ms | < 200ms |
| List with filters | < 100ms | < 500ms |
| Complex query | < 200ms | < 1s |
| Write operation | < 100ms | < 500ms |
If an endpoint exceeds its budget, investigate and optimize.
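A budget like the table above is easy to encode and check in a monitoring hook. This is a minimal sketch; the endpoint-type names and thresholds mirror the table and would come from your own config in practice.

```javascript
// Latency budgets per endpoint type (ms), mirroring the table above
const budgets = {
  healthCheck:  { target: 10,  acceptable: 50 },
  simpleRead:   { target: 50,  acceptable: 200 },
  complexQuery: { target: 200, acceptable: 1000 },
};

// Classify a measured latency against its budget
function checkBudget(type, latencyMs) {
  const { target, acceptable } = budgets[type];
  if (latencyMs <= target) return 'ok';
  if (latencyMs <= acceptable) return 'over-target';
  return 'violation'; // exceeds budget: investigate and optimize
}

console.log(checkBudget('simpleRead', 35));     // 'ok'
console.log(checkBudget('simpleRead', 120));    // 'over-target'
console.log(checkBudget('complexQuery', 1500)); // 'violation'
```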
Reducing Latency
Database optimization:
- Add indexes for frequent queries
- Use connection pooling
- Avoid N+1 queries with eager loading
- Consider read replicas for heavy read loads
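The N+1 fix is easiest to see by counting queries. The toy in-memory "database" below is purely illustrative; a real fix would use your ORM's eager loading or a JOIN / `IN (...)` batch query.

```javascript
// Toy data standing in for database tables
const authors = [{ id: 1 }, { id: 2 }, { id: 3 }];
const books = [
  { authorId: 1, title: 'A' },
  { authorId: 2, title: 'B' },
  { authorId: 3, title: 'C' },
];
let queryCount = 0;

// N+1: one query for authors, then one per author for their books
function loadNPlusOne() {
  queryCount++; // SELECT * FROM authors
  return authors.map((a) => {
    queryCount++; // SELECT * FROM books WHERE author_id = ?
    return { ...a, books: books.filter((b) => b.authorId === a.id) };
  });
}

// Batched: one query for authors, one IN (...) query for all their books
function loadBatched() {
  queryCount++; // SELECT * FROM authors
  queryCount++; // SELECT * FROM books WHERE author_id IN (1, 2, 3)
  const byAuthor = new Map();
  for (const b of books) {
    if (!byAuthor.has(b.authorId)) byAuthor.set(b.authorId, []);
    byAuthor.get(b.authorId).push(b);
  }
  return authors.map((a) => ({ ...a, books: byAuthor.get(a.id) ?? [] }));
}

queryCount = 0; loadNPlusOne(); console.log(queryCount); // 4 (1 + N)
queryCount = 0; loadBatched();  console.log(queryCount); // 2
```

With 3 authors the difference is small; with 1000 it is 1001 round trips versus 2.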
Caching:
- Cache frequent reads (Redis, Memcached)
- Use CDN for static content
- Implement HTTP caching headers
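The first bullet is the cache-aside pattern. Here is a minimal synchronous sketch: the `Map` stands in for Redis/Memcached, and `fetchFromDb` is a hypothetical stand-in for a slow database query (a real implementation would be async).

```javascript
const cache = new Map(); // key -> { value, expiresAt }
const TTL_MS = 60_000;   // entries expire after 60s

let dbCalls = 0;
function fetchFromDb(key) {
  dbCalls++; // each call here pays full database latency
  return `value-for-${key}`;
}

function getCached(key) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // hit: no DB latency
  const value = fetchFromDb(key); // miss: pay the cost once, then cache it
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}

getCached('user:42');
getCached('user:42'); // second read served from cache
console.log(dbCalls); // 1
```

The TTL is the trade-off knob: longer TTLs mean fewer slow reads but staler data.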
Geographic distribution:
- Deploy servers closer to users
- Use CDN with edge locations
- Consider multi-region deployment
Code optimization:
- Profile slow endpoints
- Reduce unnecessary computation
- Use async/parallel processing
- Optimize serialization
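The async/parallel bullet is worth illustrating: two independent 100ms calls cost ~200ms when awaited one after the other, but ~100ms with `Promise.all`. The `delay` helper below simulates an external call such as a database query or third-party API.

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Sequential: each await adds its full latency
async function sequential() {
  const start = Date.now();
  const user = await delay(100, 'user');     // ~100ms
  const orders = await delay(100, 'orders'); // +~100ms more
  return { user, orders, elapsed: Date.now() - start }; // ~200ms total
}

// Parallel: independent calls overlap
async function parallel() {
  const start = Date.now();
  const [user, orders] = await Promise.all([
    delay(100, 'user'),
    delay(100, 'orders'),
  ]);
  return { user, orders, elapsed: Date.now() - start }; // ~100ms total
}

parallel().then(({ elapsed }) => console.log(`parallel took ~${elapsed}ms`));
```

This only helps when the calls are genuinely independent; if the second call needs the first call's result, they must stay sequential.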
Payload reduction:
- Return only needed fields
- Use compression (gzip, brotli)
- Paginate large lists
Monitoring Latency
What to track:
- Response time distribution (histogram)
- P50, P95, P99 percentiles
- Latency by endpoint
- Latency over time (trends)
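A latency histogram is just observation counts in fixed buckets; that is what tools like Prometheus maintain for you, and percentiles and trends are derived from it. A minimal sketch (bucket bounds are illustrative):

```javascript
// Upper bounds of each bucket, in ms; Infinity catches everything else
const buckets = [50, 100, 250, 500, 1000, Infinity];
const counts = new Array(buckets.length).fill(0);

// Record one latency observation into the first bucket that fits
function observe(latencyMs) {
  counts[buckets.findIndex((bound) => latencyMs <= bound)]++;
}

[23, 80, 95, 310, 1200].forEach(observe);
console.log(counts); // [1, 2, 0, 1, 0, 1]
```

Histograms are cheap to store and aggregate across servers, which is why monitoring systems prefer them over keeping raw samples.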
Alerting:
- Alert when P95 exceeds threshold
- Alert on sudden latency increases
- Track latency as part of SLOs
Tools:
- APM: DataDog, New Relic, Dynatrace
- Open source: Prometheus + Grafana
- Built-in: Cloud provider monitoring
Common Mistakes
1. Only tracking averages: Average hides outliers. Track percentiles (P95, P99) to catch slow requests.
2. Measuring in wrong place: Measure from client perspective, not just server. Network adds latency too.
3. Testing only locally: Local tests don't show network latency. Test from realistic distances.
4. Ignoring cold starts: Serverless first requests are slower. Include warm-up in your latency budget.
5. Over-optimizing: Don't spend weeks reducing 50ms to 40ms if users won't notice. Focus on slow endpoints first.
Best Practices
Set latency budgets: Define acceptable latency for each endpoint type. Monitor and alert on violations.
Profile before optimizing: Find the slow part before rewriting everything. Usually it's one slow query.
Consider perceived performance: Return partial results quickly. Show loading states. Users perceive responsiveness, not just speed.
Test under load: Latency increases under heavy load. Test with realistic concurrency.
Document latency expectations: API consumers should know expected response times. Include in API documentation.
Code Examples
Measuring and Logging API Latency
// Express middleware for latency tracking.
// Note: `metrics` and `alerting` are placeholders for your own
// metrics and alerting clients (e.g. a StatsD/Prometheus wrapper).
const latencyMiddleware = (req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const end = process.hrtime.bigint();
    const latencyMs = Number(end - start) / 1_000_000;

    // Log with endpoint info
    console.log(JSON.stringify({
      method: req.method,
      path: req.route?.path || req.path,
      status: res.statusCode,
      latencyMs: latencyMs.toFixed(2),
      timestamp: new Date().toISOString()
    }));

    // Track in metrics system
    metrics.histogram('api.latency', latencyMs, {
      endpoint: req.route?.path,
      method: req.method,
      status: res.statusCode
    });

    // Alert on slow requests
    if (latencyMs > 1000) {
      alerting.warn('Slow API response', {
        endpoint: req.path,
        latencyMs
      });
    }
  });
  next();
};

app.use(latencyMiddleware);

// Client-side latency measurement
async function fetchWithTiming(url) {
  const start = performance.now();
  const response = await fetch(url);
  const data = await response.json();
  const latency = performance.now() - start;
  console.log(`Request to ${url} took ${latency.toFixed(0)}ms`);
  return data;
}

Related Terms
HTTP Status Codes
Standard response codes that indicate the result of an HTTP request.
N+1 Problem
A performance anti-pattern where an application makes N additional queries for N items.
WebSocket
A protocol enabling full-duplex, real-time communication between client and server.
CORS
Cross-Origin Resource Sharing - A security mechanism that controls how web pages can request resources from different domains.