Latency is the time between sending a request and receiving a response. In APIs, it's usually measured in milliseconds (ms). Low latency means fast responses; high latency means users wait. The business impact is real: Amazon's often-cited internal finding was that every 100ms of added latency cost roughly 1% in sales.
Latency
The time delay between a request being sent and a response being received.
Types of Latency
Network latency: Time for data to travel over the network. Affected by distance, routing, and network congestion.
Server latency: Time for server to process the request. Affected by code efficiency, database queries, and server load.
Client latency: Time for client to process the response. Affected by parsing, rendering, and client-side logic.
Total latency = Network (request) + Server + Network (response) + Client
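The sum above can be made concrete with a small sketch; the millisecond values below are hypothetical, just to show how the stages add up.

```javascript
// Hypothetical breakdown of one request's latency (values in ms)
const breakdown = {
  networkRequest: 40,  // client -> server
  server: 120,         // processing + database
  networkResponse: 40, // server -> client
  client: 30,          // parsing + rendering
};

// Total latency is simply the sum of every stage
const totalMs = Object.values(breakdown).reduce((sum, ms) => sum + ms, 0);
console.log(totalMs); // 230
```

Note that the server stage is usually the only one you fully control; the network stages are bounded by geography, and the client stage by the user's device.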
Measuring Latency
| Metric | What It Measures | Use For |
|---|---|---|
| Response time | Total time from request to response | User experience |
| TTFB | Time to First Byte | Server performance |
| P50 (median) | 50% of requests are faster | Typical experience |
| P95 | 95% of requests are faster | Catching slow outliers |
| P99 | 99% of requests are faster | Worst case performance |
Why percentiles matter: averages can hide problems. If 99 requests take 50ms and one takes 5 seconds, the average is about 100ms and looks fine. A P99 of 5000ms reveals the issue.
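The average-vs-percentile example above can be reproduced with a few lines. This uses a simple rank-based percentile (one of several common conventions; monitoring tools may interpolate differently):

```javascript
// Rank-based percentile over recorded latency samples
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(Math.floor((p / 100) * sorted.length), sorted.length - 1);
  return sorted[idx];
}

// 99 fast requests plus one 5-second outlier, as in the example above
const latencies = [...Array(99).fill(50), 5000];
const avg = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;

console.log(avg);                       // 99.5 -- looks fine
console.log(percentile(latencies, 50)); // 50   -- typical experience
console.log(percentile(latencies, 99)); // 5000 -- reveals the outlier
```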
What Causes High Latency
Database queries: Slow queries, missing indexes, N+1 problems. Often the biggest contributor.
Network distance: Server in the US, user in Australia means roughly a 200ms round trip at minimum, a floor set by the speed of light in fiber plus routing.
Cold starts: Serverless functions take time to initialize. First request is slow.
External API calls: Calling third-party APIs adds their latency to yours.
Large payloads: Sending 10MB response takes longer than 10KB.
Serialization: Converting objects to JSON takes CPU time, especially for large responses.
Latency Budgets
Set target latencies and track them:
| Endpoint Type | Target | Acceptable |
|---|---|---|
| Health check | < 10ms | < 50ms |
| Simple read | < 50ms | < 200ms |
| List with filters | < 100ms | < 500ms |
| Complex query | < 200ms | < 1s |
| Write operation | < 100ms | < 500ms |
If an endpoint exceeds its budget, investigate and optimize.
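A budget like the table above is easy to encode and check in a monitoring hook. This is a minimal sketch; the endpoint-type names and thresholds mirror the table and would come from your own config in practice.

```javascript
// Latency budgets per endpoint type (ms), mirroring the table above
const budgets = {
  healthCheck:  { target: 10,  acceptable: 50 },
  simpleRead:   { target: 50,  acceptable: 200 },
  complexQuery: { target: 200, acceptable: 1000 },
};

// Classify a measured latency against its budget
function checkBudget(type, latencyMs) {
  const { target, acceptable } = budgets[type];
  if (latencyMs <= target) return 'ok';
  if (latencyMs <= acceptable) return 'over-target';
  return 'violation'; // exceeds budget: investigate and optimize
}

console.log(checkBudget('simpleRead', 35));     // 'ok'
console.log(checkBudget('simpleRead', 120));    // 'over-target'
console.log(checkBudget('complexQuery', 1500)); // 'violation'
```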
Reducing Latency
Database optimization:
- Add indexes for frequent queries
- Use connection pooling
- Avoid N+1 queries with eager loading
- Consider read replicas for heavy read loads
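The N+1 fix is easiest to see by counting queries. The toy in-memory "database" below is purely illustrative; a real fix would use your ORM's eager loading or a JOIN / `IN (...)` batch query.

```javascript
// Toy data standing in for database tables
const authors = [{ id: 1 }, { id: 2 }, { id: 3 }];
const books = [
  { authorId: 1, title: 'A' },
  { authorId: 2, title: 'B' },
  { authorId: 3, title: 'C' },
];
let queryCount = 0;

// N+1: one query for authors, then one per author for their books
function loadNPlusOne() {
  queryCount++; // SELECT * FROM authors
  return authors.map((a) => {
    queryCount++; // SELECT * FROM books WHERE author_id = ?
    return { ...a, books: books.filter((b) => b.authorId === a.id) };
  });
}

// Batched: one query for authors, one IN (...) query for all their books
function loadBatched() {
  queryCount++; // SELECT * FROM authors
  queryCount++; // SELECT * FROM books WHERE author_id IN (1, 2, 3)
  const byAuthor = new Map();
  for (const b of books) {
    if (!byAuthor.has(b.authorId)) byAuthor.set(b.authorId, []);
    byAuthor.get(b.authorId).push(b);
  }
  return authors.map((a) => ({ ...a, books: byAuthor.get(a.id) ?? [] }));
}

queryCount = 0; loadNPlusOne(); console.log(queryCount); // 4 (1 + N)
queryCount = 0; loadBatched();  console.log(queryCount); // 2
```

With 3 authors the difference is small; with 1000 it is 1001 round trips versus 2.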
Caching:
- Cache frequent reads (Redis, Memcached)
- Use CDN for static content
- Implement HTTP caching headers
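The first bullet is the cache-aside pattern. Here is a minimal synchronous sketch: the `Map` stands in for Redis/Memcached, and `fetchFromDb` is a hypothetical stand-in for a slow database query (a real implementation would be async).

```javascript
const cache = new Map(); // key -> { value, expiresAt }
const TTL_MS = 60_000;   // entries expire after 60s

let dbCalls = 0;
function fetchFromDb(key) {
  dbCalls++; // each call here pays full database latency
  return `value-for-${key}`;
}

function getCached(key) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // hit: no DB latency
  const value = fetchFromDb(key); // miss: pay the cost once, then cache it
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}

getCached('user:42');
getCached('user:42'); // second read served from cache
console.log(dbCalls); // 1
```

The TTL is the trade-off knob: longer TTLs mean fewer slow reads but staler data.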
Geographic distribution:
- Deploy servers closer to users
- Use CDN with edge locations
- Consider multi-region deployment
Code optimization:
- Profile slow endpoints
- Reduce unnecessary computation
- Use async/parallel processing
- Optimize serialization
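The async/parallel bullet is worth illustrating: two independent 100ms calls cost ~200ms when awaited one after the other, but ~100ms with `Promise.all`. The `delay` helper below simulates an external call such as a database query or third-party API.

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Sequential: each await adds its full latency
async function sequential() {
  const start = Date.now();
  const user = await delay(100, 'user');     // ~100ms
  const orders = await delay(100, 'orders'); // +~100ms more
  return { user, orders, elapsed: Date.now() - start }; // ~200ms total
}

// Parallel: independent calls overlap
async function parallel() {
  const start = Date.now();
  const [user, orders] = await Promise.all([
    delay(100, 'user'),
    delay(100, 'orders'),
  ]);
  return { user, orders, elapsed: Date.now() - start }; // ~100ms total
}

parallel().then(({ elapsed }) => console.log(`parallel took ~${elapsed}ms`));
```

This only helps when the calls are genuinely independent; if the second call needs the first call's result, they must stay sequential.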
Payload reduction:
- Return only needed fields
- Use compression (gzip, brotli)
- Paginate large lists
Monitoring Latency
What to track:
- Response time distribution (histogram)
- P50, P95, P99 percentiles
- Latency by endpoint
- Latency over time (trends)
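A latency histogram is just observation counts in fixed buckets; that is what tools like Prometheus maintain for you, and percentiles and trends are derived from it. A minimal sketch (bucket bounds are illustrative):

```javascript
// Upper bounds of each bucket, in ms; Infinity catches everything else
const buckets = [50, 100, 250, 500, 1000, Infinity];
const counts = new Array(buckets.length).fill(0);

// Record one latency observation into the first bucket that fits
function observe(latencyMs) {
  counts[buckets.findIndex((bound) => latencyMs <= bound)]++;
}

[23, 80, 95, 310, 1200].forEach(observe);
console.log(counts); // [1, 2, 0, 1, 0, 1]
```

Histograms are cheap to store and aggregate across servers, which is why monitoring systems prefer them over keeping raw samples.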
Alerting:
- Alert when P95 exceeds threshold
- Alert on sudden latency increases
- Track latency as part of SLOs
Tools:
- APM: DataDog, New Relic, Dynatrace
- Open source: Prometheus + Grafana
- Built-in: Cloud provider monitoring
Common Mistakes
1. Only tracking averages: Average hides outliers. Track percentiles (P95, P99) to catch slow requests.
2. Measuring in wrong place: Measure from client perspective, not just server. Network adds latency too.
3. Testing only locally: Local tests don't show network latency. Test from realistic distances.
4. Ignoring cold starts: Serverless first requests are slower. Include warm-up in your latency budget.
5. Over-optimizing: Don't spend weeks reducing 50ms to 40ms if users won't notice. Focus on slow endpoints first.
Best Practices
Set latency budgets: Define acceptable latency for each endpoint type. Monitor and alert on violations.
Profile before optimizing: Find the slow part before rewriting everything. Usually it's one slow query.
Consider perceived performance: Return partial results quickly. Show loading states. Users perceive responsiveness, not just speed.
Test under load: Latency increases under heavy load. Test with realistic concurrency.
Document latency expectations: API consumers should know expected response times. Include in API documentation.
Code Examples
Measuring and Logging API Latency
// Express middleware for latency tracking.
// Note: `metrics` and `alerting` are placeholders for your own
// metrics and alerting clients (e.g. a StatsD/Prometheus wrapper).
const latencyMiddleware = (req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const end = process.hrtime.bigint();
    const latencyMs = Number(end - start) / 1_000_000;

    // Log with endpoint info
    console.log(JSON.stringify({
      method: req.method,
      path: req.route?.path || req.path,
      status: res.statusCode,
      latencyMs: latencyMs.toFixed(2),
      timestamp: new Date().toISOString()
    }));

    // Track in metrics system
    metrics.histogram('api.latency', latencyMs, {
      endpoint: req.route?.path,
      method: req.method,
      status: res.statusCode
    });

    // Alert on slow requests
    if (latencyMs > 1000) {
      alerting.warn('Slow API response', {
        endpoint: req.path,
        latencyMs
      });
    }
  });
  next();
};

app.use(latencyMiddleware);

// Client-side latency measurement
async function fetchWithTiming(url) {
  const start = performance.now();
  const response = await fetch(url);
  const data = await response.json();
  const latency = performance.now() - start;
  console.log(`Request to ${url} took ${latency.toFixed(0)}ms`);
  return data;
}

Related Terms
HTTP Status Codes
Standard response codes that indicate the result of an HTTP request.
N+1 Problem
A performance anti-pattern where an application makes N additional queries for N items.
WebSocket
A protocol enabling full-duplex, real-time communication between client and server.
CORS
Cross-Origin Resource Sharing - A security mechanism that controls how web pages can request resources from different domains.