Load Balancers

Distributing network and application traffic across servers. Learn about Layer 4 vs Layer 7 load balancing, routing algorithms, session persistence, health checks, and scaling strategies.

Load Balancing

A single web server can only handle a limited number of concurrent connections. To scale beyond a single machine, we use Load Balancers (LBs). A load balancer acts as a traffic cop, routing incoming client requests across a pool of backend servers. This prevents any single server from becoming a bottleneck, removes any one backend server as a single point of failure, and enables horizontal scaling.


1. Multi-Tier Load Balancing Architecture

In large-scale systems, load balancing is not handled by a single device. Instead, it is structured in multiple layers to handle massive throughput:

  1. DNS Tier (Anycast/GeoDNS): Routes the client to the nearest data center, measured by network topology (Anycast) or by geography (GeoDNS).
  2. L4 Tier (IP/Port): A high-throughput, hardware or low-level software load balancer (e.g., Linux Virtual Server (LVS), AWS Network Load Balancer (NLB)) receives millions of raw packets and forwards them to a cluster of L7 load balancers without reading application headers.
  3. L7 Tier (HTTP/HTTPS): A content-aware reverse proxy (e.g., NGINX, HAProxy, Envoy) performs SSL termination, inspects HTTP headers/cookies, and forwards requests to specific application instances.

2. Layer 4 vs. Layer 7 Load Balancing

The division between Layer 4 and Layer 7 corresponds to the OSI (Open Systems Interconnection) network model.

Layer 4 (L4) Load Balancing

L4 load balancers operate at the Transport Layer (TCP/UDP). They make routing decisions purely based on packet headers (IP addresses and port numbers) without opening or reading the message payload.

  • Pros: Extremely fast. Minimal CPU/memory overhead since it does not parse application protocols.
  • Cons: Cannot do content-aware routing (e.g., routing /api/v1/payments to a payments service and /static to a CDN). Because it never parses the payload, a pure L4 balancer cannot read cookies or HTTP headers and does not terminate SSL/TLS.
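
To make the "headers only" idea concrete, here is a minimal sketch of connection-level forwarding. It assumes a simplified packet shape and a hypothetical `L4Balancer` class; real L4 balancers do this in the kernel or in hardware and also expire idle flow entries, which is omitted here.

```typescript
// Hypothetical, simplified view of a packet. Only the header fields are ever used.
interface Packet {
    srcIp: string;
    srcPort: number;
    dstIp: string;
    dstPort: number;
    protocol: "tcp" | "udp";
    payload: string; // never inspected by the balancer
}

class L4Balancer {
    private backends: string[];
    private next = 0;
    // Flow table: pins every packet of one connection to the same backend.
    private flows: { [key: string]: string | undefined } = {};

    constructor(backends: string[]) {
        if (backends.length === 0) {
            throw new Error("L4Balancer requires at least one backend");
        }
        this.backends = backends;
    }

    forward(packet: Packet): string {
        // The connection is identified purely by protocol, addresses, and ports.
        const key = `${packet.protocol}|${packet.srcIp}:${packet.srcPort}>${packet.dstIp}:${packet.dstPort}`;
        let backend = this.flows[key];
        if (backend === undefined) {
            // New connection: pick the next backend in rotation and remember the choice.
            backend = this.backends[this.next];
            this.next = (this.next + 1) % this.backends.length;
            this.flows[key] = backend;
        }
        return backend;
    }
}
```

Two packets with the same addresses and ports reach the same backend regardless of what the payload contains, which is why this tier cannot route on URLs or cookies.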

Layer 7 (L7) Load Balancing

L7 load balancers operate at the Application Layer (HTTP/HTTPS/gRPC). They terminate the network connection, decrypt SSL/TLS, and read the entire application message (URL, cookies, headers, query parameters).

  • Pros: Content-aware routing. Can manage session cookies (sticky sessions), perform rate limiting, compress responses (Gzip/Brotli), and handle path-based routing.
  • Cons: Higher CPU and memory usage than L4. Slower, because every connection is terminated and every request is parsed.
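
As a rough illustration of content-aware routing, the sketch below picks a backend pool from the request path using the longest matching prefix. The `Route` shape, the `routeByPath` function, and the pool names are assumptions made for this example, not the configuration model of any particular proxy.

```typescript
// Hypothetical routing rule: requests whose path starts with `prefix` go to `pool`.
interface Route {
    prefix: string;
    pool: string;
}

// Pick the pool whose prefix is the longest match for the request path.
function routeByPath(path: string, routes: Route[], defaultPool: string): string {
    let best: Route | undefined;
    for (const route of routes) {
        const matches = path.indexOf(route.prefix) === 0; // true when path starts with prefix
        if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
            best = route;
        }
    }
    return best === undefined ? defaultPool : best.pool;
}

const routes: Route[] = [
    { prefix: "/api/v1/payments", pool: "payments-service" },
    { prefix: "/static", pool: "cdn" },
];

routeByPath("/api/v1/payments/42", routes, "web-pool"); // "payments-service"
routeByPath("/static/logo.png", routes, "web-pool");    // "cdn"
routeByPath("/about", routes, "web-pool");              // "web-pool"
```

This decision needs the decrypted, parsed HTTP request, which is exactly the work an L4 balancer skips.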

3. Load Balancing Algorithms

Load balancers use specific strategies to choose which server receives the next request:

1. Round Robin

Requests are distributed sequentially down the list of servers.

  • Pros: Extremely simple; the only state is a single counter.
  • Cons: Assumes all backend servers have the same capacity and that all requests consume the same amount of resources.
Code
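class RoundRobin {
    private servers: string[];
    private index = 0; // position of the next server to hand out

    constructor(servers: string[]) {
        // An empty pool would make the modulo below produce NaN.
        if (servers.length === 0) {
            throw new Error("RoundRobin requires at least one server");
        }
        this.servers = servers;
    }

    // Return the current server, then advance the index, wrapping at the end of the list.
    getNextServer(): string {
        const server = this.servers[this.index];
        this.index = (this.index + 1) % this.servers.length;
        return server;
    }
}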

2. Weighted Round Robin

Each server is assigned a weight representing its processing capacity (e.g., CPU, RAM). Servers with higher weights receive more traffic.

  • Pros: Handles heterogeneous server pools (e.g., combining 16-core servers with 4-core servers).
Code
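interface WeightedServer {
    host: string;
    weight: number; // higher means a larger share of traffic
}

class WeightedRoundRobin {
    private servers: WeightedServer[];
    private currentIndex = -1; // start before the first server so the first pass begins at index 0
    private currentWeight = 0;
    private maxWeight = 0;

    constructor(servers: WeightedServer[]) {
        // Without at least one positive weight, the selection loop below would never terminate.
        if (servers.length === 0 || servers.every(s => s.weight <= 0)) {
            throw new Error("WeightedRoundRobin requires a server with a positive weight");
        }
        this.servers = servers;
        this.maxWeight = Math.max(...servers.map(s => s.weight));
    }

    getNextServer(): string {
        while (true) {
            this.currentIndex = (this.currentIndex + 1) % this.servers.length;
            // At the start of each pass, lower the threshold by one; reset to maxWeight once it reaches zero.
            if (this.currentIndex === 0) {
                this.currentWeight = this.currentWeight - 1;
                if (this.currentWeight <= 0) {
                    this.currentWeight = this.maxWeight;
                }
            }
            // Only servers whose weight meets the current threshold are eligible in this pass.
            if (this.servers[this.currentIndex].weight >= this.currentWeight) {
                return this.servers[this.currentIndex].host;
            }
        }
    }
}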

3. Least Connections

Directs traffic to the server with the fewest active, concurrent connections.

  • Pros: Dynamically adapts. If Server A receives a slow request (e.g., a file upload) and Server B receives fast requests, the balancer will route subsequent requests to Server B.
  • Best For: Long-lived connections (e.g., WebSockets, Database pools) and varying request processing times.
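
A minimal sketch of this strategy, in the same style as the classes above, might look as follows. The `LeastConnections` class and its `acquire`/`release` methods are illustrative names; it assumes the caller reports when each connection closes, and ties go to the server listed first.

```typescript
interface Backend {
    host: string;
    activeConnections: number;
}

class LeastConnections {
    private backends: Backend[];

    constructor(hosts: string[]) {
        if (hosts.length === 0) {
            throw new Error("LeastConnections requires at least one server");
        }
        this.backends = hosts.map(host => ({ host, activeConnections: 0 }));
    }

    // Pick the server with the fewest active connections and count the new one against it.
    acquire(): string {
        let best = this.backends[0];
        for (const backend of this.backends) {
            if (backend.activeConnections < best.activeConnections) {
                best = backend;
            }
        }
        best.activeConnections++;
        return best.host;
    }

    // Call when a connection to `host` closes so its count drops again.
    release(host: string): void {
        for (const backend of this.backends) {
            if (backend.host === host && backend.activeConnections > 0) {
                backend.activeConnections--;
                return;
            }
        }
    }
}
```

A server stuck on a slow upload keeps its count high because `release` has not been called for it yet, so new requests drift toward the other servers.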

4. IP Hash (Session Affinity)

Hashes the client's IP address and uses the result to select a server. As long as the client's IP address and the server pool stay the same, a given user always lands on the same backend server.

  • Pros: Provides session persistence without storing state in a shared database.
  • Cons: If a backend server crashes, all users hashed to that server lose their session state. Causes uneven load distribution if many users share a gateway IP (NAT).
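
The selection step can be sketched as a hash followed by a modulo. The example below uses djb2, a simple string hash chosen here only for brevity; the `IpHash` class is an illustrative name, and production balancers typically use stronger hash functions.

```typescript
// djb2 string hash, kept within the unsigned 32-bit range.
function djb2(input: string): number {
    let hash = 5381;
    for (let i = 0; i < input.length; i++) {
        // Equivalent to hash * 33 + charCode, modulo 2^32.
        hash = (((hash << 5) + hash) + input.charCodeAt(i)) >>> 0;
    }
    return hash;
}

class IpHash {
    private servers: string[];

    constructor(servers: string[]) {
        if (servers.length === 0) {
            throw new Error("IpHash requires at least one server");
        }
        this.servers = servers;
    }

    // The same IP always maps to the same index while the pool is unchanged.
    getServer(clientIp: string): string {
        return this.servers[djb2(clientIp) % this.servers.length];
    }
}
```

Because the index is computed modulo the pool size, every client behind one NAT gateway IP maps to the same server, and adding or removing a server changes the mapping for many clients at once.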

4. Session Persistence: Sticky Sessions vs. Stateless Sessions

If your application server stores session state in local memory (stateful design), every request from a given user must reach the same server. There are two common ways to deal with this:

  1. Sticky Sessions (Cookie-based): The load balancer inserts a cookie (e.g., SERVERID=app1) into the first HTTP response. On subsequent requests, the load balancer reads this cookie and routes the user back to app1.
    • Downside: Load can become uneven because pinned users cannot be rebalanced onto new servers, and a server crash loses every session stored on it.
  2. Shared Session State (Stateless): Backend servers do not store session data locally. Instead, they fetch it from a fast, shared distributed cache (e.g., Redis).
    • Upside: Highly scalable; any server can process any request.
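
The cookie-based flow can be sketched as below. The `StickyRouter` class and the `RoutingDecision` shape are assumptions for illustration; only the SERVERID cookie name comes from the example above, and new users are assigned round robin.

```typescript
interface RoutingDecision {
    server: string;
    setCookie?: string; // present only when the balancer must pin the client
}

class StickyRouter {
    private servers: string[];
    private index = 0;

    constructor(servers: string[]) {
        if (servers.length === 0) {
            throw new Error("StickyRouter requires at least one server");
        }
        this.servers = servers;
    }

    route(cookies: { [name: string]: string | undefined }): RoutingDecision {
        const pinned = cookies["SERVERID"];
        // Honor the cookie only if it names a server that is still in the pool.
        if (pinned !== undefined && this.servers.indexOf(pinned) !== -1) {
            return { server: pinned };
        }
        // First visit, or the pinned server is gone: assign a server and set the cookie.
        const server = this.servers[this.index];
        this.index = (this.index + 1) % this.servers.length;
        return { server, setCookie: `SERVERID=${server}` };
    }
}
```

When the pinned server has left the pool, the user is reassigned, but any session data that lived only in that server's memory is not recovered.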

5. Health Checks & High Availability

To prevent routing traffic to dead servers, the load balancer continuously performs Health Checks.

Code
# Conceptual health check configuration (YAML-style pseudo-config, not actual NGINX syntax)
health_check:
  path: "/healthz"
  interval: 10s       # Probe every 10 seconds
  timeout: 5s         # Fail if no response in 5 seconds
  unhealthy_limit: 3  # Mark dead after 3 consecutive failures
  healthy_limit: 2    # Return to service after 2 consecutive successes

Active vs. Passive Health Checks

  • Active Health Checks: The load balancer proactively sends periodic ping/HTTP requests to a specific /healthz endpoint on all backend servers.
  • Passive Health Checks: The load balancer monitors real user traffic. If a server starts returning 5xx errors or timing out on actual user requests, the load balancer marks it as offline.
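
Whether results come from active probes or from observed traffic, the consecutive-failure and consecutive-success thresholds in the configuration above can be tracked with a small state machine. This is a sketch with a hypothetical `HealthTracker` class; the defaults mirror the unhealthy_limit and healthy_limit values shown earlier.

```typescript
class HealthTracker {
    private healthy = true;
    private consecutiveFailures = 0;
    private consecutiveSuccesses = 0;
    private unhealthyLimit: number;
    private healthyLimit: number;

    constructor(unhealthyLimit = 3, healthyLimit = 2) {
        this.unhealthyLimit = unhealthyLimit;
        this.healthyLimit = healthyLimit;
    }

    // Record one probe (or one observed request) and return the resulting health state.
    record(success: boolean): boolean {
        if (success) {
            this.consecutiveFailures = 0;
            this.consecutiveSuccesses++;
            if (!this.healthy && this.consecutiveSuccesses >= this.healthyLimit) {
                this.healthy = true; // enough successes in a row: return to service
            }
        } else {
            this.consecutiveSuccesses = 0;
            this.consecutiveFailures++;
            if (this.healthy && this.consecutiveFailures >= this.unhealthyLimit) {
                this.healthy = false; // enough failures in a row: stop routing here
            }
        }
        return this.healthy;
    }

    isHealthy(): boolean {
        return this.healthy;
    }
}
```

Requiring several results in a row on both sides keeps a single dropped probe from removing a server and a single lucky response from restoring one.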

6. SSL/TLS Management: Termination vs. Passthrough

Handling cryptographic handshakes for SSL/TLS is CPU-intensive. Load balancers handle this in two ways:

| SSL Strategy | Flow | Pros | Cons |
|---|---|---|---|
| SSL Termination (Offloading) | Client ──[HTTPS]──► LB ──[Plain HTTP]──► Servers | Offloads CPU decryption work from app servers. Enables L7 path routing. | Plaintext HTTP inside the internal network. Requires strict VPC firewall security. |
| SSL Passthrough | Client ──[HTTPS]──► LB ──[HTTPS]──► Servers | End-to-end encryption. The LB cannot read the data, ensuring high privacy. | App servers must consume CPU decrypting. No L7 smart routing possible (L4 only). |

7. Best Practices for System Designers

  1. Redundant Load Balancers: A lone load balancer is itself a Single Point of Failure (SPOF). Always run at least a pair that share a floating (virtual) IP, for example via VRRP as implemented by Keepalived, so that a backup takes over if the primary fails.
  2. Choose L7 for Web, L4 for Scale: Use L7 load balancers (like NGINX/Envoy) for routing web traffic, SSL offloading, and security. Use L4 (like AWS NLB or LVS) at the ingress tier to distribute load among your NGINX cluster.
  3. Implement Graceful Shutdowns: When deploying code, configure the load balancer to stop sending new connections to a server but allow existing connections to finish processing (connection draining).
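
Connection draining from the last point can be sketched as a flag plus an in-flight counter. The `DrainableBackend` class and its method names are hypothetical; a real deployment would also apply a drain timeout, which is left out here.

```typescript
class DrainableBackend {
    private inFlight = 0;
    private draining = false;

    // Called by the balancer before sending a new connection to this server.
    tryAccept(): boolean {
        if (this.draining) {
            return false; // no new work once draining has started
        }
        this.inFlight++;
        return true;
    }

    // Called when an existing connection finishes.
    finish(): void {
        if (this.inFlight > 0) {
            this.inFlight--;
        }
    }

    // Called at the start of a deploy: stop new connections, let existing ones complete.
    startDraining(): void {
        this.draining = true;
    }

    // The server is safe to stop only when it is draining and nothing is in flight.
    canShutDown(): boolean {
        return this.draining && this.inFlight === 0;
    }
}
```

A deploy script following this sketch would call `startDraining()`, wait until `canShutDown()` returns true, and only then stop the process.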