Load Balancers
Distributing network and application traffic across servers. Learn about Layer 4 vs Layer 7 load balancing, routing algorithms, session persistence, health checks, and scaling strategies.
Load Balancing
A single web server can only handle a limited number of concurrent connections. To scale beyond a single machine, we use Load Balancers (LBs). A load balancer acts as a traffic cop, routing incoming client requests across a pool of backend servers. This prevents any single server from becoming a bottleneck, removes individual backend servers as single points of failure, and enables horizontal scaling.
1. Multi-Tier Load Balancing Architecture
In large-scale systems, load balancing is not handled by a single device. Instead, it is structured in multiple layers to handle massive throughput:
- DNS Tier (Anycast/GeoDNS): Routes the client to the topologically closest data center.
- L4 Tier (IP/Port): A high-throughput hardware or low-level software load balancer (e.g., Linux Virtual Server (LVS), AWS Network Load Balancer (NLB)) handles millions of packets per second and forwards them to a cluster of L7 load balancers without reading application headers.
- L7 Tier (HTTP/HTTPS): A content-aware reverse proxy (e.g., NGINX, HAProxy, Envoy) performs SSL termination, inspects HTTP headers/cookies, and forwards requests to specific application instances.
2. Layer 4 vs. Layer 7 Load Balancing
The division between Layer 4 and Layer 7 corresponds to the OSI (Open Systems Interconnection) network model.
Layer 4 (L4) Load Balancing
L4 load balancers operate at the Transport Layer (TCP/UDP). They make routing decisions purely based on packet headers (IP addresses and port numbers) without opening or reading the message payload.
- Pros: Extremely fast. Minimal CPU/memory overhead since it does not parse application protocols.
- Cons: Cannot do smart routing (e.g., routing `/api/v1/payments` to a payments service and `/static` to a CDN). Cannot terminate SSL or handle cookies.
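To make "routing on headers alone" concrete, here is a minimal sketch that picks a backend by hashing a connection's 5-tuple (protocol, source and destination IP, source and destination port). The `FiveTuple` type, the FNV-1a hash, and the modulo mapping are illustrative assumptions; real L4 balancers such as LVS or NLB use their own flow-tracking and hashing schemes. The point is that the payload is never read, and every packet of the same flow maps to the same backend.

```typescript
// Hypothetical representation of the header fields an L4 balancer sees.
interface FiveTuple {
  protocol: "tcp" | "udp";
  srcIp: string;
  srcPort: number;
  dstIp: string;
  dstPort: number;
}

// 32-bit FNV-1a over a string's character codes (deterministic, non-cryptographic).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // interpret as unsigned 32-bit
}

// Choose a backend from header fields only; no payload bytes are inspected.
function pickL4Backend(flow: FiveTuple, backends: string[]): string {
  if (backends.length === 0) throw new Error("no backends available");
  const key = `${flow.protocol}|${flow.srcIp}|${flow.srcPort}|${flow.dstIp}|${flow.dstPort}`;
  return backends[fnv1a(key) % backends.length];
}

const backends = ["10.0.1.10", "10.0.1.11", "10.0.1.12"];
const flow: FiveTuple = {
  protocol: "tcp",
  srcIp: "203.0.113.7",
  srcPort: 51234,
  dstIp: "198.51.100.1",
  dstPort: 443,
};
console.log(pickL4Backend(flow, backends)); // the same flow always yields the same backend
```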
Layer 7 (L7) Load Balancing
L7 load balancers operate at the Application Layer (HTTP/HTTPS/gRPC). They terminate the network connection, decrypt SSL/TLS, and read the entire application message (URL, cookies, headers, query parameters).
- Pros: Content-aware routing. Can manage session cookies (sticky sessions), perform rate limiting, compress responses (Gzip/Brotli), and handle path-based routing.
- Cons: Higher CPU and memory usage than L4. Slower than L4, because every connection is terminated and every request is parsed.
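A minimal sketch of path-based routing as an L7 proxy might apply it once it has parsed the request line. The route table, pool names, fallback pool, and longest-prefix-wins rule are assumptions for illustration; NGINX, HAProxy, and Envoy each define their own matching semantics and configuration syntax.

```typescript
// Hypothetical route table: URL path prefix → backend pool name.
interface Route {
  prefix: string;
  pool: string;
}

const routes: Route[] = [
  { prefix: "/api/v1/payments", pool: "payments-service" },
  { prefix: "/api/", pool: "api-service" },
  { prefix: "/static/", pool: "static-content" },
];

// Match on path-segment boundaries so "/api/v1/paymentsX" does not hit the payments route.
function matchesPrefix(path: string, prefix: string): boolean {
  if (prefix.endsWith("/")) return path.startsWith(prefix);
  return path === prefix || path.startsWith(prefix + "/");
}

// The longest matching prefix wins; unmatched paths go to a fallback pool.
function routeRequest(path: string, table: Route[], fallback = "web-frontend"): string {
  let best: Route | undefined;
  for (const route of table) {
    if (matchesPrefix(path, route.prefix) && (!best || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.pool : fallback;
}

console.log(routeRequest("/api/v1/payments/42", routes)); // "payments-service"
```

An L4 balancer cannot make this decision, because the URL path exists only inside the (usually encrypted) HTTP payload.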
3. Load Balancing Algorithms
Load balancers use specific strategies to choose which server receives the next request:
1. Round Robin
Requests are distributed sequentially down the list of servers.
- Pros: Extremely simple; the only state needed is a single counter.
- Cons: Assumes all backend servers have the same capacity and that all requests consume the same amount of resources.
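```typescript
// Round robin: hand out servers in a fixed, repeating order.
class RoundRobin {
  private servers: string[];
  private index = 0; // position of the next server to return

  constructor(servers: string[]) {
    if (servers.length === 0) {
      throw new Error("server pool must not be empty");
    }
    this.servers = servers;
  }

  getNextServer(): string {
    const server = this.servers[this.index];
    // Advance and wrap around to the start of the list.
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}
```
2. Weighted Round Robin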
Each server is assigned a weight representing its processing capacity (e.g., CPU, RAM). Servers with higher weights receive more traffic.
- Pros: Handles heterogeneous server pools (e.g., combining 16-core servers with 4-core servers).
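```typescript
interface WeightedServer {
  host: string;
  weight: number; // relative capacity; must be a positive integer
}

// Interleaved weighted round robin: each pass over the list lowers a weight
// threshold, and only servers at or above the threshold are returned.
class WeightedRoundRobin {
  private servers: WeightedServer[];
  private currentIndex = -1; // start before index 0 so the first pass begins at server 0
  private currentWeight = 0; // current threshold
  private maxWeight: number;

  constructor(servers: WeightedServer[]) {
    if (servers.length === 0) {
      throw new Error("server pool must not be empty");
    }
    if (servers.some(s => !Number.isInteger(s.weight) || s.weight <= 0)) {
      throw new Error("weights must be positive integers");
    }
    this.servers = servers;
    this.maxWeight = Math.max(...servers.map(s => s.weight));
  }

  getNextServer(): string {
    while (true) {
      this.currentIndex = (this.currentIndex + 1) % this.servers.length;
      if (this.currentIndex === 0) {
        // Finished a pass: lower the threshold, resetting to maxWeight once it reaches 0.
        this.currentWeight -= 1;
        if (this.currentWeight <= 0) {
          this.currentWeight = this.maxWeight;
        }
      }
      if (this.servers[this.currentIndex].weight >= this.currentWeight) {
        return this.servers[this.currentIndex].host;
      }
    }
  }
}
```
3. Least Connections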
Directs traffic to the server with the fewest active, concurrent connections.
- Pros: Dynamically adapts. If Server A receives a slow request (e.g., a file upload) while Server B handles fast requests, B's connections close quickly and its active count stays lower, so the balancer routes subsequent requests to Server B.
- Best For: Long-lived connections (e.g., WebSockets, database connection pools) and workloads with widely varying request processing times.
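A minimal in-memory sketch of least-connections selection, assuming the balancer is told when each connection opens and closes. The `LeastConnections` class and its `acquire`/`release` methods are hypothetical names; production balancers track counts per worker and often combine them with weights.

```typescript
// Least connections: route each new connection to the server with the fewest active ones.
class LeastConnections {
  private active = new Map<string, number>(); // server → active connection count

  constructor(servers: string[]) {
    if (servers.length === 0) throw new Error("server pool must not be empty");
    for (const s of servers) this.active.set(s, 0);
  }

  // Pick the least-loaded server (ties go to the earliest in the list) and count the new connection.
  acquire(): string {
    let best = "";
    let bestCount = Infinity;
    for (const [server, count] of this.active) {
      if (count < bestCount) {
        best = server;
        bestCount = count;
      }
    }
    this.active.set(best, bestCount + 1);
    return best;
  }

  // Call when a connection (or request) on `server` finishes.
  release(server: string): void {
    const count = this.active.get(server);
    if (count === undefined || count === 0) throw new Error(`no active connections on ${server}`);
    this.active.set(server, count - 1);
  }

  activeCount(server: string): number {
    return this.active.get(server) ?? 0;
  }
}

// Replaying the Server A / Server B scenario:
const pool = new LeastConnections(["A", "B"]);
pool.acquire();               // "A": slow file upload, still open
const quick = pool.acquire(); // "B"
pool.release(quick);          // B's fast request finishes
console.log(pool.acquire());  // "B": 0 active on B vs. 1 on A
```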
4. IP Hash (Session Affinity)
Hashes the client's IP address and uses the result to select a server. As long as the client's IP and the server pool stay the same, a given user always lands on the same backend server.
- Pros: Provides session persistence without storing state in a shared database.
- Cons: If a backend server crashes, all users hashed to that server lose their session state. Causes uneven load distribution if many users share a gateway IP (NAT).
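A minimal sketch of IP hashing, assuming a simple string hash (djb2 here) and a `hash % poolSize` mapping; both are illustrative choices, not what any particular balancer uses. `pickByIpHash` is a hypothetical name.

```typescript
// djb2 string hash (hash * 33 + c), kept in the unsigned 32-bit range.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = (Math.imul(hash, 33) + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// IP hash: the same client IP maps to the same server while the pool is unchanged.
function pickByIpHash(clientIp: string, servers: string[]): string {
  if (servers.length === 0) throw new Error("server pool must not be empty");
  return servers[djb2(clientIp) % servers.length];
}

const pool = ["app1", "app2", "app3"];
const first = pickByIpHash("198.51.100.23", pool);
console.log(first === pickByIpHash("198.51.100.23", pool)); // true
```

Because the mapping uses `% servers.length`, adding or removing a server reassigns most clients, not only those on the failed server; consistent hashing is the usual way to limit that churn.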
4. Session Persistence: Sticky Sessions vs. Stateless Sessions
If your application server stores session state in local memory (stateful design), the user must hit the same server on every request.
- Sticky Sessions (Cookie-based): The load balancer inserts a cookie (e.g., `SERVERID=app1`) into the first HTTP response. Subsequent requests carry this cookie, and the load balancer uses it to route the user back to `app1`.
  - Downside: Hard to scale horizontally, because pinned users are not rebalanced onto newly added servers. If a server crashes, the session data held in its memory is lost.
- Shared Session State (Stateless): Backend servers do not store session data locally. Instead, they read and write it in a fast, shared distributed cache (e.g., Redis).
  - Upside: Highly scalable; any server can process any request.
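A minimal sketch of cookie-based stickiness, assuming the balancer parses the raw `Cookie` request header and knows which servers are currently healthy. `StickyRouter`, `readCookie`, and the cookie attributes (`Path=/; HttpOnly`) are hypothetical choices; the cookie name follows the `SERVERID=app1` example in the text.

```typescript
// A routing decision: the chosen server plus an optional Set-Cookie value to send back.
interface StickyDecision {
  server: string;
  setCookie?: string;
}

const COOKIE_NAME = "SERVERID"; // assumed cookie name

// Extract one cookie value from a raw Cookie header such as "theme=dark; SERVERID=app1".
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return undefined;
}

class StickyRouter {
  private healthy: string[];
  private next = 0; // round-robin cursor for clients without a usable cookie

  constructor(healthy: string[]) {
    if (healthy.length === 0) throw new Error("server pool must not be empty");
    this.healthy = healthy;
  }

  route(cookieHeader: string | undefined): StickyDecision {
    const pinned = readCookie(cookieHeader, COOKIE_NAME);
    // Honor the cookie only while the pinned server is still healthy.
    if (pinned !== undefined && this.healthy.includes(pinned)) {
      return { server: pinned };
    }
    // New client, or its server is gone: pick a server and pin the client to it.
    const server = this.healthy[this.next];
    this.next = (this.next + 1) % this.healthy.length;
    return { server, setCookie: `${COOKIE_NAME}=${server}; Path=/; HttpOnly` };
  }
}
```

The re-pinning branch is exactly where the stateful design hurts: the client moves to a new server, and any session data kept only in the old server's memory is gone.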
5. Health Checks & High Availability
To prevent routing traffic to dead servers, the load balancer continuously performs Health Checks.
```yaml
# Conceptual health check configuration (product-agnostic sketch)
health_check:
  path: "/healthz"
  interval: 10s        # Probe every 10 seconds
  timeout: 5s          # A probe fails if there is no response within 5 seconds
  unhealthy_limit: 3   # Mark the server down after 3 consecutive failures
  healthy_limit: 2     # Return it to service after 2 consecutive successes
```
Active vs. Passive Health Checks
- Active Health Checks: The load balancer proactively sends periodic ping/HTTP requests to a dedicated `/healthz` endpoint on every backend server.
- Passive Health Checks: The load balancer monitors real user traffic. If requests to a server start failing (connection errors, timeouts, or 5xx responses such as `502 Bad Gateway`), the load balancer marks it as offline.
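The `unhealthy_limit` / `healthy_limit` thresholds in the example configuration amount to a small state machine. The sketch below is one way to express it; `HealthTracker` and `record` are hypothetical names. A single tracker can be fed by both styles of check: each active probe result (a probe exceeding `timeout` counts as a failure) or each real request outcome observed passively.

```typescript
type HealthState = "healthy" | "unhealthy";

// Consecutive-threshold health tracking for one backend.
class HealthTracker {
  private state: HealthState = "healthy";
  private failures = 0;  // consecutive failed checks
  private successes = 0; // consecutive successful checks
  private readonly unhealthyLimit: number;
  private readonly healthyLimit: number;

  constructor(unhealthyLimit = 3, healthyLimit = 2) {
    this.unhealthyLimit = unhealthyLimit;
    this.healthyLimit = healthyLimit;
  }

  // Record one check result and return the (possibly updated) state.
  record(ok: boolean): HealthState {
    if (ok) {
      this.successes++;
      this.failures = 0; // any success breaks a failure streak
      if (this.state === "unhealthy" && this.successes >= this.healthyLimit) {
        this.state = "healthy";
      }
    } else {
      this.failures++;
      this.successes = 0; // any failure breaks a success streak
      if (this.state === "healthy" && this.failures >= this.unhealthyLimit) {
        this.state = "unhealthy";
      }
    }
    return this.state;
  }

  isHealthy(): boolean {
    return this.state === "healthy";
  }
}

const tracker = new HealthTracker();
for (const ok of [false, false, false]) tracker.record(ok);
console.log(tracker.isHealthy()); // false: three consecutive failures
```

Requiring consecutive results in both directions keeps a single slow probe from ejecting a server, and keeps a flapping server from being readmitted after one lucky response.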
6. SSL/TLS Management: Termination vs. Passthrough
Handling cryptographic handshakes for SSL/TLS is CPU-intensive. Load balancers handle this in two ways:
| SSL Strategy | Flow | Pros | Cons |
|---|---|---|---|
| SSL Termination (Offloading) | Client ──[HTTPS]──► LB ──[Plain HTTP]──► Servers | Offloads CPU decryption work from app servers. Enables L7 path routing. | Plaintext HTTP inside the internal network. Requires strict VPC firewall security. |
| SSL Passthrough | Client ──[HTTPS]──► LB ──[HTTPS]──► Servers | End-to-end encryption. The LB cannot read the data, ensuring high privacy. | App servers must consume CPU decrypting. No L7 smart routing possible (L4 only). |
7. Best Practices for System Designers
- Redundant Load Balancers: A single load balancer is itself a Single Point of Failure (SPOF). Always run an active/standby pair that shares a floating (virtual) IP, for example with Keepalived, which implements VRRP, so the backup takes over the IP if the primary fails.
- Choose L7 for Web, L4 for Scale: Use L7 load balancers (like NGINX/Envoy) for routing web traffic, SSL offloading, and security. Use L4 (like AWS NLB or LVS) at the ingress tier to distribute load among your NGINX cluster.
- Implement Graceful Shutdowns: When deploying code, configure the load balancer to stop sending new connections to a server but allow existing connections to finish processing (connection draining).
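A minimal sketch of the load-balancer side of connection draining, assuming the balancer tracks in-flight requests per backend. `DrainingPool`, `drain`, and the `onDrained` callback are hypothetical names. Managed balancers expose the same idea as a setting (for example, the deregistration delay on AWS target groups), and production versions also cap how long they wait.

```typescript
type BackendState = "active" | "draining";

interface Backend {
  state: BackendState;
  inFlight: number;          // requests currently being processed
  onDrained?: () => void;    // fired when a draining backend reaches zero in-flight
}

class DrainingPool {
  private backends = new Map<string, Backend>();
  private cursor = 0; // round-robin position among active backends

  constructor(hosts: string[]) {
    for (const h of hosts) this.backends.set(h, { state: "active", inFlight: 0 });
  }

  // New requests go only to active backends; draining ones are skipped.
  acquire(): string {
    const active = [...this.backends]
      .filter(([, b]) => b.state === "active")
      .map(([host]) => host);
    if (active.length === 0) throw new Error("no active backends");
    const host = active[this.cursor % active.length];
    this.cursor++;
    this.backends.get(host)!.inFlight++;
    return host;
  }

  // Call when a request on `host` finishes; completes the drain if it was the last one.
  release(host: string): void {
    const b = this.backends.get(host);
    if (!b || b.inFlight === 0) throw new Error(`nothing in flight on ${host}`);
    b.inFlight--;
    if (b.state === "draining" && b.inFlight === 0) b.onDrained?.();
  }

  // Stop sending new requests to `host`; `onDrained` runs once existing requests finish
  // (immediately if none are in flight), after which the host can be stopped or redeployed.
  drain(host: string, onDrained: () => void): void {
    const b = this.backends.get(host);
    if (!b) throw new Error(`unknown backend ${host}`);
    b.state = "draining";
    b.onDrained = onDrained;
    if (b.inFlight === 0) onDrained();
  }
}
```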