How Load Balancers Really Work (Without Cloud Buzzwords)
January 5, 2026

When people hear “load balancer”, they often imagine a magical box that “handles traffic at scale”. In reality, a load balancer is much simpler and much more powerful than that.
This post explains what load balancers actually do, why they exist, and how they behave under the hood, without mentioning any specific cloud provider or tool.
The Core Problem
Imagine you run a website on a single server.
At first:
- Everything works
- Traffic is low
- Life is good
Then traffic grows:
- Requests pile up
- Responses slow down
- The server crashes
You add a second server.
Now the real question appears:
How does a user know which server to talk to?
Users need one stable address.
Servers need the flexibility to scale.
This mismatch is why load balancers exist.
What a Load Balancer Is
A load balancer is:
A stable network endpoint that receives incoming traffic and forwards each request to one of many backend servers.
Key point:
- Clients never talk to backend servers directly
- Backend servers can change freely

The load balancer acts as a traffic switch.
Clients want:
ONE address
Systems want:
MANY servers
The load balancer bridges this contradiction.
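As a rough sketch of this “traffic switch” idea: one stable entry point, many interchangeable backends. The class and method names here are illustrative, and round-robin is just one of several possible selection strategies.

```python
# One stable entry point (the LoadBalancer object), many backends.
# Clients only ever see the entry point; which server answers is
# invisible to them.

class LoadBalancer:
    def __init__(self, backends):
        self.backends = list(backends)  # the "many servers"
        self._next = 0

    def pick_backend(self):
        """Return the backend that should handle the next request (round-robin)."""
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend

lb = LoadBalancer(["10.0.2.10:80", "10.0.2.11:80"])
print(lb.pick_backend())  # → 10.0.2.10:80
print(lb.pick_backend())  # → 10.0.2.11:80
```

The important part is not the strategy but the shape: clients hold one reference (`lb`), while the backend list behind it can grow or shrink freely.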
The Core Clarification
A single load balancer distributes traffic across multiple instances of the same service, not different microservices.
This is fundamental.
A load balancer answers:
“Which replica of this service should handle this request?”
Not:
“Which service should handle this request?”
Each microservice has its own set of replicas.
Client
→ LB (user-service) → user-1
→ user-2
→ LB (payment-service) → pay-1
→ pay-2
One LB per service boundary (or one routing layer in front, like ingress).
Load balancers assume:
“All targets are functionally identical.”
The Core Question
When a new server is created dynamically, how does a load balancer discover it and start sending traffic to it?
This sounds magical — but it’s not.
High-Level Truth
A load balancer does not “find” servers on its own.
Instead:
Some external control system explicitly tells the load balancer: “Here is a new backend. Start checking it.”
The load balancer is passive: it does not create instances, discover them, or decide when to scale. It simply routes traffic to a list of backends provided by an external control system. That is why discovery is orchestrated, not automatic.
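The passivity described above can be made concrete with a small sketch. The `register`/`deregister` methods are hypothetical names; the point is that only an external caller (the control plane) ever changes the backend list.

```python
# Sketch of a passive load balancer: its backend list changes only
# when an external control system calls register/deregister. The LB
# itself never goes looking for servers.

class PassiveLoadBalancer:
    def __init__(self):
        self.backends = []  # starts empty; the LB discovers nothing on its own

    def register(self, address):
        """Called BY the control plane; never initiated by the LB."""
        if address not in self.backends:
            self.backends.append(address)

    def deregister(self, address):
        """Also driven entirely from outside."""
        if address in self.backends:
            self.backends.remove(address)

lb = PassiveLoadBalancer()
lb.register("10.0.2.15:80")  # an explicit external action
```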
Behind the Scenes: The Three Actors
There are conceptually three logical components involved:
[ Control Plane ]
|
v
[ Load Balancer ] <——> [ Backend Servers ]
Let’s break their roles.
1. The Backend Server (New Instance)
When a new instance is created, it is assigned:
- An IP address
- Network connectivity
The instance does NOT announce itself, and it does NOT talk to the load balancer.
Servers are, in effect, dumb workers.
2. The Control Plane (The Brain)
This is the most important and least understood piece.
The control plane:
- Knows desired state
- Knows current state
- Reconciles differences
Examples (conceptually):
- Auto-scaler
- Orchestrator
- Cluster manager
Its job:
“I want N healthy servers behind this service. Let's make it happen.”
3. The Load Balancer (The Traffic Switch)
The load balancer:
- Maintains a routing table of registered, healthy, reachable backends
- Forwards incoming traffic to those backends
Step-by-Step: What REALLY Happens
Let’s walk through a real sequence.
Step 1: Scaling Decision Is Made
The control plane decides:
Current servers: 2
Desired servers: 3
→ Create 1 more server
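The scaling decision above is a reconciliation step: compare desired state with current state and act on the difference. A minimal sketch, where `create_server` is a stand-in for whatever actually provisions a VM or container:

```python
# Sketch of the control plane's reconcile step: close the gap
# between the current server list and the desired replica count.

def reconcile(current_servers, desired_count, create_server):
    """Return the server list after creating enough servers to reach desired_count."""
    servers = list(current_servers)
    while len(servers) < desired_count:
        servers.append(create_server())
    return servers

counter = iter(range(15, 100))
new = reconcile(["10.0.2.10", "10.0.2.11"], 3,
                lambda: f"10.0.2.{next(counter)}")
# new → ["10.0.2.10", "10.0.2.11", "10.0.2.15"]
```

Real control planes run this loop continuously, also handling scale-down and replacement of failed servers, but the core comparison is the same.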
Step 2: New Server Is Created
- VM/container is started
- IP is assigned
- App process starts
At this moment, the load balancer knows NOTHING about the new instance.
Step 3: Control Plane Registers the Server
The control plane now explicitly adds the newly created server to an internal list of available backends (often called a service registry or target list):
Register backend:
IP: 10.0.2.15
Port: 80
Service: backend-service
This could be:
- An API call
- A config update
- A dynamic config push
This service registry or target list is maintained by the control plane; the load balancer reads from it to learn which backends exist and should be health-checked.
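A registry of this kind can be pictured as a simple mapping from service name to targets. The class and method names below are illustrative only; real systems each have their own registration interface.

```python
# Sketch of a service registry: the control plane writes to it,
# and the load balancer reads from it when building its routing pool.

class ServiceRegistry:
    def __init__(self):
        self._targets = {}  # service name -> list of (ip, port)

    def register(self, service, ip, port):
        """Called by the control plane after a new server is created."""
        self._targets.setdefault(service, []).append((ip, port))

    def targets_for(self, service):
        """What the load balancer fetches for a given service."""
        return list(self._targets.get(service, []))

registry = ServiceRegistry()
registry.register("backend-service", "10.0.2.15", 80)
registry.targets_for("backend-service")  # → [("10.0.2.15", 80)]
```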
Step 4: Load Balancer Starts Health Checks
A load balancer must know which backend servers are alive and able to handle requests. To do this, it continuously verifies the health of every registered backend.
After a server is registered, the load balancer begins periodic probing. This process is called a health check.
Health checks are configured per service, so every instance of that service, existing or newly created, follows the same health check rule.
Most commonly, a health check is an HTTP or HTTPS request (for example, GET /health), sent at regular intervals.
GET /health
- The server returns a successful response (e.g., 200 OK).
Only servers that consistently pass health checks are considered eligible for traffic and added to the load balancer's routing pool.
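The “consistently pass” part is usually expressed as a threshold of consecutive successful probes. A simplified sketch, where `probe` stands in for the real HTTP GET /health request and the threshold value is an assumption:

```python
# Sketch of the health-check gate: a backend becomes eligible for
# traffic only after a number of consecutive successful probes.
# In this simplified version, a single failure means "not eligible yet".

def is_eligible(probe, backend, healthy_threshold=3):
    """Run healthy_threshold probes; the backend is eligible only if all pass."""
    for _ in range(healthy_threshold):
        if not probe(backend):
            return False
    return True

# A probe that always succeeds (think: 200 OK every time):
assert is_eligible(lambda b: True, "10.0.2.15:80") is True
# A probe that fails keeps the backend out of the routing pool:
assert is_eligible(lambda b: False, "10.0.2.15:80") is False
```

Real load balancers run these probes continuously, with separate thresholds for marking a backend healthy and unhealthy, but the gating idea is the same.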
Step 5: Traffic Begins Flowing
Only now is traffic routed to the new server.
From the client’s perspective:
- Nothing changed
- Same endpoint
- Same behavior
Summary
- A load balancer provides a single stable endpoint for many backend servers.
- It distributes traffic across replicas of the same service, not different services.
- Load balancers are not responsible for registration or deregistration of servers.
- Health checks determine which servers are healthy enough to receive traffic.
What’s Next
This post focused on how load balancers work at a fundamental level. In upcoming posts, I plan to cover:
- What Is a Control Plane and Why Load Balancers Don’t Discover Servers
- How Auto Scaling Fits into the Bigger Picture
- How Ingress Controllers Route Traffic to Different Services and How They Differ from Load Balancers
Each post builds on the previous one, starting from fundamentals and gradually moving toward more complex system design concepts.
More posts in this series coming soon.