Tautik Agrahari

Load Balancers: The Salt of System Design

A load balancer is like salt - we assume it's there by default. It's one of the most important components in a system, yet we may not necessarily draw it in our system design diagram; it's just assumed to be there. Like salt, it's in everything you use every day, and you barely notice it.

But the load balancer is one of the most crucial things out there, because load balancers make scaling easy. They abstract away the elasticity of the infrastructure. For example, because of the load balancer, you don't care how many API servers are behind it. You simply don't care. You add nodes, you remove nodes, it doesn't matter. The users don't need to know.

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/53pm-2.webp

Again, the framework of opposites: either the user knows the topology or the user does not. In this case, of course users don't need to know how many servers are running behind the scenes. They're just like, "hey, I want to publish this tweet" - post, done.

So the load balancer essentially abstracts away the topology behind it. Load balancers help you achieve two things:

  1. Fault tolerance - which means even if one of the servers is not available, your request can be routed to any other available server
  2. No overloaded servers - because the load is balanced

Absorb the patterns to achieve High Availability

Design Requirements

When designing a load balancer, we need:

  1. Fault tolerance - requests keep flowing even when backend servers die
  2. Even load distribution - no single server gets overloaded
  3. High availability of the load balancer itself - it must not become the single point of failure

Setting Up the Foundation: Terminology

Before we go into brainstorming, let me lay the foundation. We'll be having two types of servers in the system:

  1. LB servers - the machines that make up the load balancer itself
  2. Backend servers - the actual application servers sitting behind the load balancer

So you have your clients, there are LB servers, and multiple LB servers constitute the load balancer. So: client → LB server → backend server.

The classic load balancer implementation works like this: your LB server is the first point of contact. When a request comes in (imagine a simple HTTP request), the LB server sees "hey, I have three backend servers behind me," so it picks one of them using whatever load balancing algorithm you like (round robin, weighted round robin, hash, even random). It forwards the request to that server, gets the response, and sends the response back to the user.

This is what its job is. Now if I write it down, it would be:

  1. Get the request
  2. Find the backend server
  3. Forward the request
  4. Get response
  5. Send response to user

This is essentially what your load balancer is doing - these five steps. And the way we approach it is the usual one: pick one step at a time and think through the implementation and the challenges that come with it.
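To make those five steps concrete, here's a minimal sketch of them as a plain HTTP reverse proxy in Python. This is purely illustrative - the backend addresses, port, and round-robin choice are assumptions, not how any particular load balancer implements it.

# Minimal sketch of the five steps: accept a request, pick a backend
# (round robin here), forward it, and relay the response back.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
import requests

BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080", "http://10.0.0.13:8080"]
_next_backend = itertools.cycle(BACKENDS)          # 2. find the backend server

class LBHandler(BaseHTTPRequestHandler):
    def do_GET(self):                              # 1. get the request
        backend = next(_next_backend)
        resp = requests.get(backend + self.path)   # 3. forward the request, 4. get response
        self.send_response(resp.status_code)       # 5. send response to user
        self.end_headers()
        self.wfile.write(resp.content)

HTTPServer(("0.0.0.0", 8000), LBHandler).serve_forever()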

The Configuration Challenge

So for getting the request, of course we don't have to do anything - the load balancer got the request because you made a request to that machine. Pretty straightforward: an HTTP request came to the load balancer.

Now what does the load balancer do when it gets a request? It has to find a backend server. And for it to find a backend server, it needs to know which backend servers are behind it. Of course we need a database to hold this information - let's call it the config DB.

What Configuration Do We Need?

{
  "load_balancer_1": {
    "backend_servers": [
      "backend_server_1",
      "backend_server_2", 
      "backend_server_3"
    ],
    "health_endpoint": "/health",
    "poll_frequency": 5000,
    "algorithm": "round_robin"
  }
}

The configuration DB stores config per load balancer. A simple KV store would work.

But here's the catch - making a DB call for every incoming request would be catastrophic for response times. If your LB server is serving millions of requests, that's millions of DB lookups. Not acceptable.

Hence we keep a copy of the configuration in memory on the LB server.

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/08pm.webp

Now, as soon as you create a cached copy here, it introduces another problem: how do we keep the two in sync? When someone changes the config in the DB, how does that get reflected in the LB's cache?

Push vs Pull: The Configuration Sync Pattern

Picture your AWS customer using the LB console - the UI, a front-end package where you see it all combined in one place. The console makes a call and updates the database: "I want to add this machine behind this load balancer."

So as soon as the config changes, how do the LB servers find out? You have options:

Option 1: Push-Based Updates

The LB console pings the LB server saying "OK, there's an updated config, update your cache," and then the LB fetches from the DB.

Problems with pure push:

  • The console has to know about and reach every LB server; if an LB server is down or unreachable at that moment, it silently misses the update
  • You now have to deal with acknowledgments and retries for every push

Option 2: Pull-Based Updates

LB servers poll the DB periodically. Super simple. No problem whatsoever. You don't have to worry about pushing and them acknowledging.

Poll frequency: Say 30 seconds or a minute.

Problems with pure pull:

  • A config change can take up to a full poll interval (30 seconds to a minute) to reach the LB servers
  • Every LB server keeps hitting the DB on every poll, even when nothing has changed

Option 3: The Hybrid Approach (Best of Both Worlds)

You could do something like a queue - pub/sub. Redis or Kafka? Redis, not Kafka. Kafka is overkill for this use case.

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/55pm.webp

How it works:

  1. LB console updates the database
  2. LB console pushes an event to Redis pub/sub
  3. LB servers subscribed to Redis receive notification immediately
  4. LB servers refresh their cache from DB
  5. As a fallback, LB servers also poll every 30 seconds
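Here's a rough sketch of the LB-server side of this hybrid approach, assuming Redis pub/sub; the channel name, poll interval, and load_config_from_db() helper are hypothetical.

# Hybrid config sync on the LB server: subscribe for instant updates,
# and poll the DB on a timer as the fallback.
import time
import redis

POLL_INTERVAL = 30  # seconds - the fallback pull

def load_config_from_db():
    # placeholder: fetch this LB server's config from the config DB
    return {"backend_servers": ["backend_server_1", "backend_server_2"]}

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("lb-config-updates")

config = load_config_from_db()
last_poll = time.time()

while True:
    msg = pubsub.get_message(timeout=1.0)            # push path: near-instant refresh
    if msg and msg["type"] == "message":
        config = load_config_from_db()
    if time.time() - last_poll > POLL_INTERVAL:      # pull path: fallback every 30s
        config = load_config_from_db()
        last_poll = time.time()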

Why this is brilliant:

  • Config changes reach the LB servers almost instantly through pub/sub
  • Even if Redis goes down, the periodic poll still picks up the change - things just propagate a little slower

So now this becomes highly available. Even though we added another component, the system stays highly available and functional no matter which piece goes down.

The core ideology behind high availability is redundancy. You have to have a Plan B for a Plan B for a Plan B. If this goes down, what's your Plan B? If that goes down, what's your Plan B? Always thinking of a fallback for a fallback for a fallback is the secret sauce.

Health Checks: Knowing When Servers Are Down

Now what if a backend server goes down? What do you, as the load balancer, do? You first need to know that it went offline - that it's non-responsive. That's what heartbeats and health checks are for.

Three Approaches to Health Checks

Approach 1: LB Server Does Health Check

Pros: Simple to implement - just have a scheduled thread doing the polling
Cons: If you have 5 LB servers, each backend gets pinged 5 times as often. Your access logs would show /health /health /health getting called over and over. This isn't what happens in the real world.
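For completeness, a minimal sketch of that naive approach - a scheduled thread on each LB server polling every backend. The backend list, /health path, and 5-second interval are illustrative assumptions.

# Approach 1 sketch: every LB server runs its own scheduled health-check thread.
# With 5 LB servers, each backend sees 5x this /health traffic - hence the con above.
import threading
import requests

BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080", "http://10.0.0.13:8080"]
healthy = set(BACKENDS)

def check_backends():
    for backend in BACKENDS:
        try:
            ok = requests.get(backend + "/health", timeout=2).ok
        except requests.RequestException:
            ok = False
        (healthy.add if ok else healthy.discard)(backend)
    threading.Timer(5.0, check_backends).start()   # re-schedule every 5 seconds

check_backends()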

Approach 2: Backend Server Pushes Heartbeat

Why this is bad: Poor user experience. You're forcing customers to implement heartbeat logic just to use your load balancer. Imagine AWS telling you "to use our load balancer, modify your app to send heartbeats to these IPs." Absurd!

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/12pm.webp

Approach 3: Separate Component (Orchestrator) ✓

This is what actually happens. A separate component handles all health checks.

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/19pm.webp

The Orchestrator Pattern

Orchestrator Responsibilities

  1. Health monitoring - Keeps an eye on backend servers
  2. Configuration updates - If any backend server is unhealthy:
    • Updates the DB (removes unhealthy server from list)
    • Changes reach LB servers through pub/sub
  3. LB server monitoring - Monitors health of LB servers & scales them up/down
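A rough sketch of a worker's loop under these responsibilities, reusing the config DB and the Redis channel from the config-sync section. The get_registered_backends and remove_backend_from_config helpers, and the channel name, are hypothetical.

# Orchestrator worker sketch: health-check backends; on failure, update the
# config DB and notify LB servers through the same pub/sub channel.
import time
import requests
import redis

r = redis.Redis(host="localhost", port=6379)

def get_registered_backends():
    # placeholder: read this load balancer's backend list from the config DB
    return ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]

def remove_backend_from_config(backend):
    # placeholder: remove the backend from the load balancer's entry in the config DB
    pass

while True:
    for backend in get_registered_backends():
        try:
            is_healthy = requests.get(backend + "/health", timeout=2).ok
        except requests.RequestException:
            is_healthy = False
        if not is_healthy:
            remove_backend_from_config(backend)        # update the DB
            r.publish("lb-config-updates", backend)    # changes reach LBs via pub/sub
    time.sleep(5)  # matches poll_frequency from the config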

But wait - what about high availability of the orchestrator itself? If orchestrator goes down, who monitors it? Do we need a monitor for the monitor? And then a monitor for that monitor?

When you see this recursive pattern, that's when you need leader election.

Leader Election: Breaking the Recursion

This is the point where leader election takes place. Leader election is the end of the recursion. When you find yourself needing a load balancer for a load balancer, or a monitor for a monitor for a monitor - that's your indication to use leader election, because it makes things self-healing.

The Manager-Worker Pattern

Real world analogy - think of your workplace:

Orchestrator Master (Manager)

  • Doesn't do the grunt work itself - it assigns backend servers to the workers and keeps an eye on the workers

Orchestrator Workers (Employees)

  • Do the actual work: running the health checks and reporting what they find

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/23pm.webp

What happens when the manager (master) is absent? The senior-most person (or first to detect) becomes the temporary manager. That's leader election - bully algorithm, ring algorithm, whatever you want to use.
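A highly simplified sketch of that idea in the bully style: each worker has a numeric id, and when the leader stops responding, a worker only promotes itself if no higher-id peer is alive. The ids, peer addresses, and endpoints here are all hypothetical.

# Bully-style leader election sketch: higher id wins; a node becomes leader
# only if no higher-id node answers.
import requests

MY_ID = 2
PEERS = {1: "http://10.0.1.1:9000", 2: "http://10.0.1.2:9000", 3: "http://10.0.1.3:9000"}

def is_alive(url):
    try:
        return requests.get(url + "/ping", timeout=1).ok
    except requests.RequestException:
        return False

def on_leader_failure():
    higher = [addr for peer_id, addr in PEERS.items() if peer_id > MY_ID]
    if any(is_alive(addr) for addr in higher):
        return False        # someone more senior is alive and will take over
    for peer_id, addr in PEERS.items():
        if peer_id != MY_ID:
            try:
                # announce to everyone else that this node is the new leader
                requests.post(addr + "/coordinator", json={"leader": MY_ID}, timeout=1)
            except requests.RequestException:
                pass
    return True             # this node is now the leader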

Your system becomes self-healing. We live and breathe this every day.

This approach solves two problems:

  1. No standby nodes - Every machine is doing useful work (no wasted money)
  2. Shared infrastructure - This orchestrator cluster can manage multiple load balancers

Monitoring and Auto-Scaling

How will the orchestrator decide when to scale LB servers? It needs metrics: CPU, Memory, TCP connections.

Enter Prometheus:

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/46pm.webp

The orchestrator queries Prometheus for metrics, decides if scaling is needed, and updates the configuration. We can also show these metrics to customers through the LB console.
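A sketch of that decision loop against the standard Prometheus HTTP query API; the Prometheus address, the PromQL expression, the 80% threshold, and the scale_up_lb_servers helper are all assumptions for illustration.

# Orchestrator asks Prometheus for the average CPU across LB servers and
# adds capacity if it crosses a threshold.
import requests

PROMETHEUS = "http://prometheus:9090"
CPU_QUERY = 'avg(rate(node_cpu_seconds_total{mode!="idle",job="lb-servers"}[5m]))'

def scale_up_lb_servers():
    # placeholder: provision another LB server and register it in the config DB
    pass

resp = requests.get(PROMETHEUS + "/api/v1/query", params={"query": CPU_QUERY}, timeout=5)
result = resp.json()["data"]["result"]
avg_cpu = float(result[0]["value"][1]) if result else 0.0

if avg_cpu > 0.8:          # more than 80% average CPU across LB servers
    scale_up_lb_servers()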

Scaling Load Balancers: The DNS Solution

Now the only thing that remains: the LB server itself cannot be just one instance. So how do we scale it? What is that one thing that's shared and scales well? DNS.

When you have multiple LB servers, how would customers know which IP to connect to? You give your load balancer a domain name:

lb.payment.google.com → [10.0.0.1, 10.0.0.2, 10.0.0.3]

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/52pm.webp

That's why when you spin up a load balancer on AWS, GCP, they give you a domain name. They're doing DNS-based load balancing.
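From the client's side this is just a normal DNS lookup that returns multiple A records. A tiny sketch, using the lb.payment.google.com name from the example above (the IPs you get back depend entirely on the records you configure):

# Resolve the load balancer's domain and collect the A records.
# Which IP the client ends up using is up to the resolver/client.
import socket

infos = socket.getaddrinfo("lb.payment.google.com", 443, proto=socket.IPPROTO_TCP)
ips = sorted({info[4][0] for info in infos})
print(ips)   # e.g. ['10.0.0.1', '10.0.0.2', '10.0.0.3'], per the example records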

DNS Server Properties:

  • Holds the mapping from the load balancer's domain name to the list of LB server IPs
  • Returns that list to clients (typically rotating the order), and clients cache the answer for the record's TTL
  • Keep the TTL small so that adding or removing LB servers propagates quickly

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/15pm-1.webp

Making DNS Highly Available

We added DNS - but what about its availability? If you're AWS, you need this to never go down.

The Virtual IP Solution

Two DNS servers sharing the same Virtual IP address:

  • The primary actively holds the Virtual IP and answers DNS queries
  • The secondary stays on standby and keeps monitoring the primary

When the secondary detects the primary is down, it "takes over" the Virtual IP. Clients don't need to change any configuration because the Virtual IP stays the same.

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/32pm-2.webp

This uses VRRP (Virtual Router Redundancy Protocol):

vrrp_instance VI_1 {
    state MASTER             # this node starts out as the primary
    interface eth0           # network interface that carries the virtual IP
    virtual_router_id 51     # must be identical on primary and secondary
    priority 100             # highest priority wins; give the secondary a lower value
    advert_int 1             # advertise "I'm alive" every 1 second
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100        # the shared virtual IP
    }
}

The secondary constantly monitors the primary. If the primary fails, the secondary starts advertising the virtual IP. This is configured at the router/firewall level.

Three Types of Load Balancing

Here's your buffet - three flavors of load balancing:

1. DNS-Based Load Balancing

How it works: lb.payment.google.com → [IP1, IP2, IP3, ...]

Pros:

  • Dead simple - no extra hardware or hops in the data path
  • Scales naturally: add an LB server, add an A record

Cons:

  • Clients and resolvers cache DNS answers, so a dead LB server can keep receiving traffic until the TTL expires
  • You have little control over which IP a client picks, so the spread can be uneven

2. ECMP-Based Load Balancing

Equal Cost Multi-Path routing - allows a router to have multiple next-hop routes for the same destination.

Router configuration:

ip route add 10.0.0.100 \
  nexthop via 192.168.1.101 dev eth0 \
  nexthop via 192.168.1.102 dev eth0

On each LB server:

# Both LB1 and LB2 have same IP
ip addr add 10.0.0.100/32 dev lo
ip link set dev lo up

How it works:

  1. Both LB servers bind the same IP (10.0.0.100) on their loopback interface
  2. The router has two equal-cost next hops for that IP, so it spreads traffic across both LB servers
  3. The router hashes each flow (source/destination IP and port), so packets of one connection keep landing on the same LB server
  4. If one LB server dies, remove its next hop from the route and traffic shifts to the other

3. Anycast with BGP

How it works: Multiple machines advertise the same IP address globally

Key features:

  • The same IP is announced over BGP from multiple locations; routing delivers each client to the nearest one
  • Failover is automatic - if a site stops announcing the route, traffic flows to the remaining sites

Requirements:

  • Your own IP space and ASN, plus BGP sessions with upstream providers/ISPs
  • Software (or routers) that can make and withdraw those BGP announcements

Who uses this:

  • Large edge networks and public DNS resolvers - for example, Cloudflare's edge and Google Public DNS (8.8.8.8) are anycast

Final Architecture Summary

I ain't drawing whole shit again

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/30pm.webp

Remember: High availability is about having a Plan B for your Plan B for your Plan B. Every component should be able to fail without taking down the system. That's the secret sauce.