Load Balancers: The Salt of System Design
A load balancer is like salt: we assume it's there by default. It's one of the most important components in a system. We may not necessarily draw it in our system design diagram, but it's assumed to be there, the way salt is assumed to be in every meal.
And yet the load balancer is one of the most crucial pieces out there, because load balancers make scaling easy. They abstract away the elasticity of the infrastructure. Because of the load balancer, you don't care how many API servers are behind it. You simply don't care. You add nodes, you remove nodes, it doesn't matter. The users don't need to know.
Again, the framework of opposites: either the user knows the topology or the user does not. In this case, of course users don't need to know how many servers are running behind the scenes. They just say "hey, I want to publish this tweet" - post, done.
So the load balancer essentially abstracts away the topology behind it. Load balancers help you achieve two things:
- Fault tolerance - which means even if one of the servers is not available, your request can be routed to any other available server
- No overloaded servers - because the load is balanced
Absorb the patterns to achieve High Availability
Design Requirements
When designing a load balancer, we need:
- Balance the load - distribute requests evenly
- Tunable algorithm - support different distribution strategies
- Scaling beyond one machine - handle growth
Setting Up the Foundation: Terminology
Before we go into brainstorming, let me lay the foundation. We'll be having two types of servers in the system:
- Backend servers - the actual services (payment service and so on); whatever sits behind the load balancer, i.e. what you are load balancing across
- LB servers - the load balancer itself; the load balancer is also a process that has to run somewhere, so it needs servers of its own to handle the traffic
So you have your clients, then the LB servers, and multiple LB servers together constitute the load balancer. So: client → LB server → backend server.
The classic load balancer implementation works like this: your LB server is the first point of contact. When a request comes in (imagine a simple HTTP request), the LB server sees, say, three backend servers behind it and has to figure out which one to forward the request to. It picks one using whatever load balancing algorithm is configured (round robin, weighted round robin, hash, even random), forwards the request to that server, gets the response, and sends the response back to the user.
This is what its job is. Now if I write it down, it would be:
- Get the request
- Find the backend server
- Forward the request
- Get response
- Send response to user
This is essentially what your load balancer is doing - these five steps. And as always, we pick one step at a time and think through the implementation and the challenges that come with it.
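To make the five steps concrete, here is a minimal sketch in Python. The backend addresses, the request path, and the body are made-up placeholders, and plain round robin stands in for whatever algorithm you actually configure:

import itertools
import urllib.request

# Hypothetical backend servers sitting behind this LB server.
BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]
rr = itertools.cycle(BACKENDS)

def handle(path, body):                                    # 1. get the request
    backend = next(rr)                                     # 2. find the backend server (round robin)
    req = urllib.request.Request(f"http://{backend}{path}", data=body, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:   # 3. forward the request
        payload = resp.read()                              # 4. get the response
    return payload                                         # 5. send the response to the user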
The Configuration Challenge
For getting the request, of course, we don't have to do anything - the load balancer got the request because you made an HTTP request to that machine. Pretty straightforward.
Now what does the load balancer do once it has the request? It has to find the backend server. And to find the backend server, it needs to know which backend servers are behind it. We need a database to hold this information, so let's call it a simple config DB.
What Configuration Do We Need?
- Balancing algorithm - round robin, weighted, hash-based, etc.
- Backend server list - which servers to route to
- Health endpoint - how to check if servers are alive
- Poll frequency - how often to check health
Example:
{
  "load_balancer_1": {
    "backend_servers": [
      "backend_server_1",
      "backend_server_2",
      "backend_server_3"
    ],
    "health_endpoint": "/health",
    "poll_frequency": 5000,
    "algorithm": "round_robin"
  }
}
Configuration DB will store config per load balancer. A simple KV would work.
But here's the catch - making a DB call upon getting every request will be catastrophic to the response times. If your LB server is serving millions of requests, that's millions of DB lookups. Not acceptable.
Hence we keep a copy of the configuration in-memory of the LB server.
But as soon as you create a cached copy, you introduce another problem: how do we keep the two in sync? When someone changes the config in the DB, how does that change get reflected in the LB's cache?
Push vs Pull: The Configuration Sync Pattern
Picture your AWS customer using the LB console - the UI, a front-end package where all of this comes together. The customer makes a call through it and it updates the database: "I want to put this machine behind the load balancer. I want to add this machine behind this load balancer."
So as soon as the config changes, how does the change reach the LB servers? You have options:
Option 1: Push-Based Updates
The LB console pings the LB server saying "OK, there's an updated config, update your cache," and then the LB fetches from the DB.
Problems with pure push:
- If there are multiple LB servers, the LB console needs to know IP addresses of all of them
- These LB servers come up and go down at their own sweet time - that's a lot of metadata to manage
- What if some servers are not responding? How long would you wait?
- If you're constantly bringing up new machines, it will create a lot of lag
Option 2: Pull-Based Updates
LB servers poll the DB periodically. Super simple, no problem whatsoever - you don't have to worry about pushing updates and waiting for acknowledgements.
Poll frequency: Say 30 seconds or a minute.
Problems with pure pull:
- There is a lag between when an update happens and when the LBs pick it up - up to 30 seconds with this poll interval
- If you don't add jitter, all your LBs will end up calling the DB at the same time (see the small sketch after this list)
- Not acceptable when you're scaling rapidly and need immediate updates
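About that jitter: a tiny sketch of how an LB server could randomize its poll schedule, assuming a 30-second base interval (the interval and jitter range are arbitrary choices):

import random

POLL_INTERVAL = 30   # seconds, base poll interval

def next_poll_delay():
    # Each LB server waits a slightly different amount of time,
    # so the fleet's polls don't all hit the config DB together.
    return POLL_INTERVAL + random.uniform(0, 5)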
Option 3: The Hybrid Approach (Best of Both Worlds)
You could do something like a queue - pub/sub. Redis or Kafka? Redis, not Kafka. Kafka is overkill for this use case.
How it works:
- LB console updates the database
- LB console pushes an event to Redis pub/sub
- LB servers subscribed to Redis receive notification immediately
- LB servers refresh their cache from DB
- As a fallback, LB servers also poll every 30 seconds
Why this is brilliant:
- Best case: Instantaneous updates via push
- Worst case: 30-second delay if Redis is down
- High availability: System works even if Redis fails
So this setup is highly available. Even though we added another component, the system remains available and functional regardless of which piece goes down.
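As a minimal sketch of that hybrid loop on an LB server - the Redis host, the channel name, and fetch_config_from_db() are all hypothetical placeholders - it could look like this:

import time
import redis   # assumes the redis-py client is available

POLL_INTERVAL = 30   # seconds, the fallback poll

def fetch_config_from_db():
    # Hypothetical: read this LB server's config from the config DB.
    return {"backend_servers": [], "algorithm": "round_robin"}

def run_config_sync():
    pubsub = redis.Redis(host="redis.internal").pubsub()   # hypothetical host
    pubsub.subscribe("lb_config_updates")                  # hypothetical channel
    config = fetch_config_from_db()
    last_poll = time.monotonic()
    while True:
        try:
            # Best case: a push arrives on the channel and we refresh instantly.
            message = pubsub.get_message(ignore_subscribe_messages=True, timeout=1)
        except redis.ConnectionError:
            message = None   # Redis is down: rely on the periodic poll instead
        if message or time.monotonic() - last_poll > POLL_INTERVAL:
            config = fetch_config_from_db()
            last_poll = time.monotonic()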
The core ideology behind high availability is redundancy - a Plan B for a Plan B for a Plan B, that sort of stuff. If this goes down, what's your Plan B? If that goes down, what's your Plan B? Always thinking of a fallback for a fallback for a fallback is the secret sauce.
Health Checks: Knowing When Servers Are Down
Now what if a backend server goes down? What do you, as the load balancer, do? You first need to know that it went offline - that is what heartbeats (health checks) are for: detecting that a server has become non-responsive.
Three Approaches to Health Checks
Approach 1: LB Server Does Health Check
Pros: Simple to implement - just have a scheduled thread doing the polling
Cons: If you have 5 LB servers, each backend gets pinged 5 times as often. Your access logs would show /health, /health, /health being called over and over. This isn't how it's done in the real world.
Approach 2: Backend Server Pushes Heartbeat
Why this is bad: Poor user experience. You're forcing customers to implement heartbeat logic just to use your load balancer. Imagine AWS telling you "to use our load balancer, modify your app to send heartbeats to these IPs." Absurd!
Approach 3: Separate Component (Orchestrator) ✓
This is what actually happens. A separate component handles all health checks.
The Orchestrator Pattern
Orchestrator Responsibilities
- Health monitoring - Keeps an eye on backend servers
- Configuration updates - if any backend server is unhealthy:
  - Updates the DB (removes the unhealthy server from the list)
  - The change reaches the LB servers through pub/sub
- LB server monitoring - Monitors health of LB servers & scales them up/down
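The health-monitoring part, sketched minimally (the /health endpoint comes from the config shown earlier; the 2-second timeout is an arbitrary choice):

import urllib.request

def is_healthy(server, health_endpoint="/health", timeout=2):
    # One probe: a 200 from the health endpoint means the backend is alive.
    try:
        with urllib.request.urlopen(f"http://{server}{health_endpoint}", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def health_check_pass(backend_servers):
    # Returns the servers the orchestrator should remove from the config DB;
    # that change then reaches the LB servers through the pub/sub channel.
    return [s for s in backend_servers if not is_healthy(s)]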
But wait - what about high availability of the orchestrator itself? If orchestrator goes down, who monitors it? Do we need a monitor for the monitor? And then a monitor for that monitor?
When you see this recursive pattern, that's when you need leader election.
Leader Election: Breaking the Recursion
This is the point where leader election takes place. Leader election is the end of the recursion. When you find yourself thinking of a load balancer for a load balancer, or a monitor for a monitor for a monitor - that's your indication to use leader election, because it makes things self-healing.
The Manager-Worker Pattern
Real world analogy - think of your workplace:
Orchestrator Master (Manager)
- Monitors workers health
- Assigns mutually exclusive work to workers
- Redistributes work if a worker is down
Orchestrator Workers (Employees)
- Do the grunt work
- Perform health checks
- Update configurations
What happens when the manager (master) is absent? The senior-most person (or first to detect) becomes the temporary manager. That's leader election - bully algorithm, ring algorithm, whatever you want to use.
Your system becomes self-healing. We live and breathe this every day.
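As a heavily simplified, single-process sketch of the bully algorithm's core idea (real implementations exchange election messages over the network and handle timeouts):

class Node:
    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster   # every node in the orchestrator cluster
        self.alive = True
        self.leader_id = None

    def start_election(self):
        # Ask every node that outranks us (higher ID) to take over.
        higher = [n for n in self.cluster if n.node_id > self.node_id and n.alive]
        if not higher:
            # Nobody outranks us: announce ourselves as the new leader.
            for n in self.cluster:
                if n.alive:
                    n.leader_id = self.node_id
            return self.node_id
        # Defer to the highest-ranked alive node; it runs its own election.
        return max(higher, key=lambda n: n.node_id).start_election()

cluster = []
cluster.extend(Node(i, cluster) for i in (1, 2, 3))
cluster[2].alive = False             # the current leader (highest ID) goes down
print(cluster[0].start_election())   # 2 - the next-highest alive node takes over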
This approach solves two problems:
- No standby nodes - Every machine is doing useful work (no wasted money)
- Shared infrastructure - This orchestrator cluster can manage multiple load balancers
Monitoring and Auto-Scaling
How will the orchestrator decide when to scale LB servers? It needs metrics: CPU, Memory, TCP connections.
Enter Prometheus:
The orchestrator queries Prometheus for metrics, decides if scaling is needed, and updates the configuration. We can also show these metrics to customers through the LB console.
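A minimal sketch of that decision, assuming Prometheus is reachable at a hypothetical internal address, the LB servers expose CPU metrics via node_exporter under a hypothetical job label, and 70% average CPU is an arbitrary scale-out threshold:

import json, urllib.parse, urllib.request

PROM_URL = "http://prometheus.internal:9090/api/v1/query"   # hypothetical endpoint
QUERY = 'avg(1 - rate(node_cpu_seconds_total{mode="idle", job="lb-servers"}[5m]))'

def average_lb_cpu():
    # Ask Prometheus' HTTP query API for average CPU utilisation across LB servers.
    url = PROM_URL + "?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url, timeout=5) as resp:
        result = json.load(resp)["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

if average_lb_cpu() > 0.70:
    print("scale out: add an LB server and update the config DB")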
Scaling Load Balancers: The DNS Solution
Now the only thing that remains: the LB servers themselves cannot be just one instance. So how do we scale them? What is the one thing that's shared by everyone and scales well? DNS.
When you have multiple LB servers, how would customers know which IP to connect to? You give your load balancer a domain name:
lb.payment.google.com → [10.0.0.1, 10.0.0.2, 10.0.0.3]
That's why when you spin up a load balancer on AWS or GCP, they give you a domain name rather than a single IP. They're doing DNS-based load balancing.
DNS Server Properties:
- Very lightweight (just returns IPs for domain names)
- Can handle 32,000 requests/second on a 4GB machine
- Heavily cached at multiple layers
- Can balance resolution based on weights
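That last point - weight-based resolution - as a tiny sketch (the IPs and weights are made up):

import random

RECORDS = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 1}   # IP -> weight

def resolve():
    # Answer each query with one IP, chosen proportionally to its weight,
    # the way a weight-aware DNS load balancer would.
    ips, weights = zip(*RECORDS.items())
    return random.choices(ips, weights=weights, k=1)[0]

print(resolve())   # "10.0.0.1" comes back roughly 60% of the time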
Making DNS Highly Available
We added DNS - but what about its availability? If you're AWS, you need this to never go down.
The Virtual IP Solution
Two DNS servers sharing the same Virtual IP address:
- Primary Core DNS (active)
- Secondary Core DNS (passive backup)
When secondary detects primary is down, it "takes over" the Virtual IP. Clients don't need to change configuration because the Virtual IP remains the same.
This uses VRRP (Virtual Router Redundancy Protocol):
vrrp_instance VI_1 {
    state MASTER              # this node starts as the active (primary) one
    interface eth0            # NIC the VRRP advertisements go out on
    virtual_router_id 51
    priority 100              # highest priority wins the election
    advert_int 1              # advertise every second
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100         # the shared virtual IP clients point at
    }
}
The secondary constantly monitors the primary. If primary fails, secondary starts advertising the virtual IP. This is configured at router/firewall level.
Three Types of Load Balancing
Here's your buffet - three flavors of load balancing:
1. DNS-Based Load Balancing
How it works: lb.payment.google.com → [IP1, IP2, IP3, ...]
Pros:
- Easy, simple, flexible
- Supports various strategies (round-robin, geo-routing, weighted)
- Great for initial distribution
Cons:
- Changes take time due to heavy caching across layers
- No real-time load awareness
- DNS just returns IPs, doesn't know actual server load
2. ECMP-Based Load Balancing
Equal Cost Multi-Path routing - it allows a router to have multiple next-hop routes for the same destination.
Router configuration:
ip route add 10.0.0.100 \
nexthop via 192.168.1.101 dev eth0 \
nexthop via 192.168.1.102 dev eth0
On each LB server:
# Both LB1 and LB2 have same IP
ip addr add 10.0.0.100/32 dev lo
ip link set dev lo up
How it works:
- Multiple servers configured with same IP
- Router distributes traffic based on flow (using IPs, ports, protocol)
- Makes connections sticky to specific servers
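That flow-based distribution can be pictured as hashing the connection's 5-tuple; a rough sketch of the idea (real routers do this in hardware with their own hash functions):

import hashlib

def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops):
    # Hash the 5-tuple so every packet of the same flow picks the same hop.
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    index = int.from_bytes(hashlib.sha256(flow).digest()[:4], "big") % len(next_hops)
    return next_hops[index]

hops = ["192.168.1.101", "192.168.1.102"]
print(pick_next_hop("203.0.113.7", 51544, "10.0.0.100", 443, "tcp", hops))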
3. Anycast with BGP
How it works: Multiple machines advertise the same IP address globally
Key features:
- Routes to geographically nearest instance
- Leverages internet's shortest path routing
- BGP propagates info about which network is reachable via which path
Requirements:
- BGP authentication (only authorized nodes can advertise IPs)
- Proper prefix configuration
Who uses this:
- CDNs for edge servers
- DNS providers (Google's 8.8.8.8)
- Any service needing geo-distribution
Final Architecture Summary
Rather than redrawing the whole diagram, here's the gist: clients resolve the load balancer's DNS name (kept highly available with a virtual IP and VRRP), reach one of many LB servers, and the LB servers forward requests to the backend servers. A config DB plus Redis pub/sub keeps every LB server's cached config in sync, and an orchestrator cluster with leader election handles health checks, configuration updates, and scaling.
Remember: High availability is about having a Plan B for your Plan B for your Plan B. Every component should be able to fail without taking down the system. That's the secret sauce.