Latency is the time between sending a request and receiving a response. In APIs, it's usually measured in milliseconds (ms). Low latency means fast responses; high latency means users wait. The business impact is well documented: Amazon famously found that every 100ms of added latency cost roughly 1% in sales, and a widely cited Akamai study linked a 100ms delay to as much as a 7% drop in conversions.
Latency
The time delay between a request being sent and a response being received.
Types of Latency
Network latency: Time for data to travel over the network. Affected by distance, routing, and network congestion.
Server latency: Time for server to process the request. Affected by code efficiency, database queries, and server load.
Client latency: Time for client to process the response. Affected by parsing, rendering, and client-side logic.
Total latency = Network (request) + Server + Network (response) + Client
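The breakdown above can be sketched with some hypothetical component timings (the numbers here are illustrative, not typical values):

```javascript
// Hypothetical component timings (ms) for one request; real values vary widely.
const components = {
  networkRequest: 20,  // request traveling client -> server
  server: 45,          // server processing time
  networkResponse: 20, // response traveling server -> client
  client: 15,          // client-side parsing and rendering
};

// Total latency is simply the sum of the four components.
function totalLatency(c) {
  return c.networkRequest + c.server + c.networkResponse + c.client;
}

console.log(totalLatency(components)); // 100
```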
Measuring Latency
| Metric | What It Measures | Use For |
|---|---|---|
| Response time | Total time from request to response | User experience |
| TTFB | Time to First Byte | Server performance |
| P50 (median) | 50% of requests are faster | Typical experience |
| P95 | 95% of requests are faster | Catching slow outliers |
| P99 | 99% of requests are faster | Worst case performance |
Why percentiles matter: Average can hide problems. If 99 requests take 50ms and 1 takes 5 seconds, the average is about 100ms, which looks fine. A P99 of 5000ms reveals the issue.
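The example above can be sketched in a few lines. Note that `percentile` here uses a simple nearest-rank variant (rounding up to the next observation); production metrics libraries may interpolate differently:

```javascript
// Nearest-rank percentile over observed latencies (ms).
// This variant rounds up, so P99 over 100 samples lands on the slowest one.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(Math.floor((p / 100) * sorted.length), sorted.length - 1);
  return sorted[idx];
}

// 99 requests at 50ms and one at 5 seconds, as in the example above.
const latencies = [...Array(99).fill(50), 5000];
const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;

console.log(avg);                       // 99.5 — "looks fine"
console.log(percentile(latencies, 50)); // 50
console.log(percentile(latencies, 99)); // 5000 — reveals the outlier
```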
What Causes High Latency
Database queries: Slow queries, missing indexes, N+1 problems. Often the biggest contributor.
Network distance: Server in the US, user in Australia means roughly 150-200ms round trip at minimum. The speed of light in fiber sets a hard floor that no code optimization can remove.
Cold starts: Serverless functions take time to initialize. First request is slow.
External API calls: Calling third-party APIs adds their latency to yours.
Large payloads: Sending 10MB response takes longer than 10KB.
Serialization: Converting objects to JSON takes CPU time, especially for large responses.
Latency Budgets
Set target latencies and track them:
| Endpoint Type | Target | Acceptable |
|---|---|---|
| Health check | < 10ms | < 50ms |
| Simple read | < 50ms | < 200ms |
| List with filters | < 100ms | < 500ms |
| Complex query | < 200ms | < 1s |
| Write operation | < 100ms | < 500ms |
If an endpoint exceeds its budget, investigate and optimize.
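As a sketch, the budget table above can be turned into a simple check. The type names and thresholds mirror the table; adapt them to your own endpoints:

```javascript
// Latency budgets (ms) per endpoint type, mirroring the table above.
const budgets = {
  healthCheck:     { target: 10,  acceptable: 50 },
  simpleRead:      { target: 50,  acceptable: 200 },
  listWithFilters: { target: 100, acceptable: 500 },
  complexQuery:    { target: 200, acceptable: 1000 },
  write:           { target: 100, acceptable: 500 },
};

// Classify an observed latency against its budget.
function checkBudget(type, latencyMs) {
  const b = budgets[type];
  if (latencyMs <= b.target) return 'ok';
  if (latencyMs <= b.acceptable) return 'degraded';
  return 'violation';
}

console.log(checkBudget('simpleRead', 30));  // 'ok'
console.log(checkBudget('simpleRead', 150)); // 'degraded'
console.log(checkBudget('simpleRead', 400)); // 'violation'
```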
Reducing Latency
Database optimization:
- Add indexes for frequent queries
- Use connection pooling
- Avoid N+1 queries with eager loading
- Consider read replicas for heavy read loads
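The N+1 point can be illustrated with a hypothetical async `db.query` client (the client and table names are stand-ins). The first version issues one query per user; the second fetches all related rows in a single extra query:

```javascript
// N+1: one query for the users, then one query PER user for their orders.
// With 100 users this is 101 round trips to the database.
async function nPlusOne(db) {
  const users = await db.query('SELECT * FROM users');
  for (const user of users) {
    user.orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
  }
  return users;
}

// Eager loading: two queries total, joined in memory.
async function eagerLoad(db) {
  const users = await db.query('SELECT * FROM users');
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id IN (?)',
    [users.map((u) => u.id)]
  );
  for (const user of users) {
    user.orders = orders.filter((o) => o.user_id === user.id);
  }
  return users;
}
```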
Caching:
- Cache frequent reads (Redis, Memcached)
- Use CDN for static content
- Implement HTTP caching headers
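For the HTTP caching headers point, here is a sketch of header choices by response type. The `max-age` values are illustrative starting points, not recommendations for every API:

```javascript
// Pick Cache-Control headers by response type (values are illustrative).
function cacheHeaders(kind) {
  switch (kind) {
    case 'static':   // fingerprinted assets, safe to cache "forever" on a CDN
      return { 'Cache-Control': 'public, max-age=31536000, immutable' };
    case 'api-read': // cacheable API reads; serve stale briefly while refreshing
      return { 'Cache-Control': 'public, max-age=60, stale-while-revalidate=30' };
    case 'private':  // user-specific data; must never be cached by a shared CDN
      return { 'Cache-Control': 'private, no-store' };
    default:         // force revalidation on every request
      return { 'Cache-Control': 'no-cache' };
  }
}

console.log(cacheHeaders('static')['Cache-Control']);
```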
Geographic distribution:
- Deploy servers closer to users
- Use CDN with edge locations
- Consider multi-region deployment
Code optimization:
- Profile slow endpoints
- Reduce unnecessary computation
- Use async/parallel processing
- Optimize serialization
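The async/parallel point in practice: independent sequential awaits add each call's latency, while `Promise.all` overlaps them, so total latency is the slowest call rather than the sum. `fetchUser` and `fetchOrders` below are simulated stand-ins for real I/O:

```javascript
// Simulated async calls: resolve `value` after `ms` milliseconds.
const delay = (ms, value) => new Promise((res) => setTimeout(() => res(value), ms));
const fetchUser = () => delay(100, { id: 1 });
const fetchOrders = () => delay(150, []);

async function sequential() {
  const user = await fetchUser();     // ~100ms
  const orders = await fetchOrders(); // ~150ms more -> ~250ms total
  return { user, orders };
}

async function parallel() {
  // Both requests in flight at once -> ~150ms total (the slowest one)
  const [user, orders] = await Promise.all([fetchUser(), fetchOrders()]);
  return { user, orders };
}
```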
Payload reduction:
- Return only needed fields
- Use compression (gzip, brotli)
- Paginate large lists
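The first and third points can be sketched as small helpers (compression is typically handled by middleware or a reverse proxy, so it's omitted here):

```javascript
// Return only the fields the client asked for (sparse fieldsets).
function pickFields(obj, fields) {
  return Object.fromEntries(
    fields.filter((f) => f in obj).map((f) => [f, obj[f]])
  );
}

// Return one page of a large list instead of the whole thing.
function paginate(items, page, pageSize) {
  const start = (page - 1) * pageSize;
  return {
    items: items.slice(start, start + pageSize),
    page,
    totalPages: Math.ceil(items.length / pageSize),
  };
}

const user = { id: 7, name: 'Ada', email: 'ada@example.com', bio: '...' };
console.log(pickFields(user, ['id', 'name'])); // { id: 7, name: 'Ada' }
```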
Monitoring Latency
What to track:
- Response time distribution (histogram)
- P50, P95, P99 percentiles
- Latency by endpoint
- Latency over time (trends)
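A minimal in-process sketch of a latency histogram with fixed bucket bounds, similar in spirit to what Prometheus client libraries maintain (real clients also expose quantile estimation and export formats):

```javascript
// Counts observations into fixed latency buckets (ms); the last bucket is +Inf.
class LatencyHistogram {
  constructor(bounds = [10, 50, 100, 250, 500, 1000]) {
    this.bounds = bounds;
    this.counts = new Array(bounds.length + 1).fill(0);
    this.sum = 0;
    this.total = 0;
  }
  observe(ms) {
    const i = this.bounds.findIndex((b) => ms <= b);
    this.counts[i === -1 ? this.bounds.length : i] += 1;
    this.sum += ms;
    this.total += 1;
  }
  mean() {
    return this.total === 0 ? 0 : this.sum / this.total;
  }
}

const h = new LatencyHistogram();
[8, 42, 95, 300, 1200].forEach((ms) => h.observe(ms));
console.log(h.counts); // [1, 1, 1, 0, 1, 0, 1] — last slot is the +Inf bucket
```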
Alerting:
- Alert when P95 exceeds threshold
- Alert on sudden latency increases
- Track latency as part of SLOs
Tools:
- APM: DataDog, New Relic, Dynatrace
- Open source: Prometheus + Grafana
- Built-in: Cloud provider monitoring
Common Mistakes
1. Only tracking averages: Average hides outliers. Track percentiles (P95, P99) to catch slow requests.
2. Measuring in wrong place: Measure from client perspective, not just server. Network adds latency too.
3. Testing only locally: Local tests don't show network latency. Test from realistic distances.
4. Ignoring cold starts: Serverless first requests are slower. Include warm-up in your latency budget.
5. Over-optimizing: Don't spend weeks reducing 50ms to 40ms if users won't notice. Focus on slow endpoints first.
Best Practices
Set latency budgets: Define acceptable latency for each endpoint type. Monitor and alert on violations.
Profile before optimizing: Find the slow part before rewriting everything. Usually it's one slow query.
Consider perceived performance: Return partial results quickly. Show loading states. Users perceive responsiveness, not just speed.
Test under load: Latency increases under heavy load. Test with realistic concurrency.
Document latency expectations: API consumers should know expected response times. Include in API documentation.
Code Examples
Measuring and Logging API Latency
```javascript
// Express middleware for latency tracking.
// `metrics` and `alerting` are assumed to be your metrics/alerting clients
// (e.g. a StatsD or DataDog wrapper); swap in whatever you use.
const latencyMiddleware = (req, res, next) => {
  const start = process.hrtime.bigint();

  res.on('finish', () => {
    const end = process.hrtime.bigint();
    const latencyMs = Number(end - start) / 1_000_000;

    // Log with endpoint info
    console.log(JSON.stringify({
      method: req.method,
      path: req.route?.path || req.path,
      status: res.statusCode,
      latencyMs: latencyMs.toFixed(2),
      timestamp: new Date().toISOString()
    }));

    // Track in metrics system
    metrics.histogram('api.latency', latencyMs, {
      endpoint: req.route?.path,
      method: req.method,
      status: res.statusCode
    });

    // Alert on slow requests
    if (latencyMs > 1000) {
      alerting.warn('Slow API response', {
        endpoint: req.path,
        latencyMs
      });
    }
  });

  next();
};

app.use(latencyMiddleware);
```
```javascript
// Client-side latency measurement
async function fetchWithTiming(url) {
  const start = performance.now();
  const response = await fetch(url);
  const data = await response.json();
  const latency = performance.now() - start;
  console.log(`Request to ${url} took ${latency.toFixed(0)}ms`);
  return data;
}
```

Related Terms
API Testing
The process of verifying that APIs work correctly, securely, and perform well.
API Mocking
Simulating API behavior to enable development and testing without real backend services.
Contract Testing
A testing approach that verifies API consumers and providers adhere to a shared contract, ensuring integration compatibility.
REST API
An architectural style for building web APIs using HTTP methods and stateless communication.