How Load Balancers Really Work (Without Cloud Buzzwords)
January 5, 2026

When people hear “load balancer”, they often imagine a magical box that “handles traffic at scale”. In reality, a load balancer is much simpler and much more powerful than that.
This post explains what load balancers actually do, why they exist, and how they behave under the hood, without mentioning any specific cloud provider or tool.
The Core Problem
Imagine you run a website on a single server.
At first:
- Everything works
- Traffic is low
- Life is good
Then traffic grows:
- Requests pile up
- Responses slow down
- The server crashes
You add a second server.
Now the real question appears:
How does a user know which server to talk to?
Users need one stable address.
Servers need the flexibility to scale.
This mismatch is why load balancers exist.
What a Load Balancer Is
A load balancer is:
A stable network endpoint that receives incoming traffic and forwards each request to one of many backend servers.
Key point:
- Clients never talk to backend servers directly
- Backend servers can change freely

The load balancer acts as a traffic switch.
Clients want:
ONE address
Systems want:
MANY servers
The load balancer bridges this contradiction.
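As a rough sketch of this “traffic switch” idea: one stable entry point, many interchangeable backends. The class and method names here are illustrative, and round-robin is just one of several possible selection strategies.

```python
# One stable entry point (the LoadBalancer object), many backends.
# Clients only ever see the entry point; which server answers is
# invisible to them.

class LoadBalancer:
    def __init__(self, backends):
        self.backends = list(backends)  # the "many servers"
        self._next = 0

    def pick_backend(self):
        """Return the backend that should handle the next request (round-robin)."""
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend

lb = LoadBalancer(["10.0.2.10:80", "10.0.2.11:80"])
print(lb.pick_backend())  # → 10.0.2.10:80
print(lb.pick_backend())  # → 10.0.2.11:80
```

The important part is not the strategy but the shape: clients hold one reference (`lb`), while the backend list behind it can grow or shrink freely.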
The Core Clarification
A single load balancer distributes traffic across multiple instances of the same service, not different microservices.
This is fundamental.
A load balancer answers:
“Which replica of this service should handle this request?”
Not:
“Which service should handle this request?”
Each microservice has its own set of replicas.
Client
→ LB (user-service) → user-1
→ user-2
→ LB (payment-service) → pay-1
→ pay-2
One LB per service boundary (or one routing layer in front, like ingress).
Load balancers assume:
“All targets are functionally identical.”
The Core Question
When a new server is created dynamically, how does a load balancer discover it and start sending traffic to it?
This sounds magical — but it’s not.
High-Level Truth
A load balancer does not “find” servers on its own.
Instead:
Some external control system explicitly tells the load balancer: “Here is a new backend. Start checking it.”
The load balancer is passive: it does not create instances, discover them, or decide when to scale. It simply routes traffic to a list of backends provided by an external control system. That is why discovery is orchestrated, not automatic.
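The passivity described above can be made concrete with a small sketch. The `register`/`deregister` methods are hypothetical names; the point is that only an external caller (the control plane) ever changes the backend list.

```python
# Sketch of a passive load balancer: its backend list changes only
# when an external control system calls register/deregister. The LB
# itself never goes looking for servers.

class PassiveLoadBalancer:
    def __init__(self):
        self.backends = []  # starts empty; the LB discovers nothing on its own

    def register(self, address):
        """Called BY the control plane; never initiated by the LB."""
        if address not in self.backends:
            self.backends.append(address)

    def deregister(self, address):
        """Also driven entirely from outside."""
        if address in self.backends:
            self.backends.remove(address)

lb = PassiveLoadBalancer()
lb.register("10.0.2.15:80")  # an explicit external action
```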
Behind the Scenes: The Three Actors
There are conceptually three logical components involved:
[ Control Plane ]
|
v
[ Load Balancer ] <——> [ Backend Servers ]
Let’s break their roles.
1. The Backend Server (New Instance)
When a new instance is created, it is assigned:
- An IP address
- Network connectivity
The instance does NOT announce itself, and it does NOT talk to the load balancer.
Servers are, in effect, dumb workers.
2. The Control Plane (The Brain)
This is the most important and least understood piece.
The control plane:
- Knows desired state
- Knows current state
- Reconciles differences
Examples (conceptually):
- Auto-scaler
- Orchestrator
- Cluster manager
Its job:
“I want N healthy servers behind this service. Let's make it happen.”
3. The Load Balancer (The Traffic Switch)
The load balancer:
- Maintains a routing table of registered, healthy, reachable backends
- Forwards incoming traffic to those backends
Step-by-Step: What REALLY Happens
Let’s walk through a real sequence.
Step 1: Scaling Decision Is Made
The control plane decides:
Current servers: 2
Desired servers: 3
→ Create 1 more server
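The scaling decision above is a reconciliation step: compare desired state with current state and act on the difference. A minimal sketch, where `create_server` is a stand-in for whatever actually provisions a VM or container:

```python
# Sketch of the control plane's reconcile step: close the gap
# between the current server list and the desired replica count.

def reconcile(current_servers, desired_count, create_server):
    """Return the server list after creating enough servers to reach desired_count."""
    servers = list(current_servers)
    while len(servers) < desired_count:
        servers.append(create_server())
    return servers

counter = iter(range(15, 100))
new = reconcile(["10.0.2.10", "10.0.2.11"], 3,
                lambda: f"10.0.2.{next(counter)}")
# new → ["10.0.2.10", "10.0.2.11", "10.0.2.15"]
```

Real control planes run this loop continuously, also handling scale-down and replacement of failed servers, but the core comparison is the same.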
Step 2: New Server Is Created
- VM/container is started
- IP is assigned
- App process starts
At this moment, the load balancer knows NOTHING about the new instance.
Step 3: Control Plane Registers the Server
The control plane now explicitly adds the newly created server to an internal list of available backends (often called a service registry or target list):
Register backend:
IP: 10.0.2.15
Port: 80
Service: backend-service
This could be:
- An API call
- A config update
- A dynamic config push
This service registry or target list is maintained by the control plane; the load balancer reads from it to learn which backends exist and should be health-checked.
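A registry of this kind can be pictured as a simple mapping from service name to targets. The class and method names below are illustrative only; real systems each have their own registration interface.

```python
# Sketch of a service registry: the control plane writes to it,
# and the load balancer reads from it when building its routing pool.

class ServiceRegistry:
    def __init__(self):
        self._targets = {}  # service name -> list of (ip, port)

    def register(self, service, ip, port):
        """Called by the control plane after a new server is created."""
        self._targets.setdefault(service, []).append((ip, port))

    def targets_for(self, service):
        """What the load balancer fetches for a given service."""
        return list(self._targets.get(service, []))

registry = ServiceRegistry()
registry.register("backend-service", "10.0.2.15", 80)
registry.targets_for("backend-service")  # → [("10.0.2.15", 80)]
```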
Step 4: Load Balancer Starts Health Checks
A load balancer must know which backend servers are alive and able to handle requests. To do this, it continuously verifies the health of every registered backend.
After a server is registered, the load balancer begins periodic probing. This process is called a health check.
Health checks are configured per service, so every instance of that service, existing or newly created, follows the same health check rule.
Most commonly, a health check is an HTTP or HTTPS request (for example, GET /health), sent at regular intervals.
GET /health
- The server returns a successful response (e.g., 200 OK).
Only servers that consistently pass health checks are considered eligible for traffic and added to the load balancer's routing pool.
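The “consistently pass” part is usually expressed as a threshold of consecutive successful probes. A simplified sketch, where `probe` stands in for the real HTTP GET /health request and the threshold value is an assumption:

```python
# Sketch of the health-check gate: a backend becomes eligible for
# traffic only after a number of consecutive successful probes.
# In this simplified version, a single failure means "not eligible yet".

def is_eligible(probe, backend, healthy_threshold=3):
    """Run healthy_threshold probes; the backend is eligible only if all pass."""
    for _ in range(healthy_threshold):
        if not probe(backend):
            return False
    return True

# A probe that always succeeds (think: 200 OK every time):
assert is_eligible(lambda b: True, "10.0.2.15:80") is True
# A probe that fails keeps the backend out of the routing pool:
assert is_eligible(lambda b: False, "10.0.2.15:80") is False
```

Real load balancers run these probes continuously, with separate thresholds for marking a backend healthy and unhealthy, but the gating idea is the same.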
Step 5: Traffic Begins Flowing
Only now is traffic routed to the new server.
From the client’s perspective:
- Nothing changed
- Same endpoint
- Same behavior
Summary
- A load balancer provides a single stable endpoint for many backend servers.
- It distributes traffic across replicas of the same service, not different services.
- Load balancers are not responsible for registration or deregistration of servers.
- Health checks determine which servers are healthy enough to receive traffic.
What’s Next
This post focused on how load balancers work at a fundamental level. In upcoming posts, I plan to cover:
- What Is a Control Plane and Why Load Balancers Don’t Discover Servers
- How Auto Scaling Fits into the Bigger Picture
- How Ingress Controllers Route Traffic to Different Services and How They Differ from Load Balancers
Each post builds on the previous one, starting from fundamentals and gradually moving toward more complex system design concepts.
More posts in this series coming soon.