
Kubernetes Load Balancer Strategies for Maximum Availability and Scalability


Load balancing is a key component of Kubernetes container management. A load balancer distributes network traffic among multiple Kubernetes services, allowing you to use your containers more efficiently and maximize the availability of your services. Let’s take a closer look at how load balancing works, before comparing the most common Kubernetes load balancer strategies for maximizing availability and scalability.

How Does a Kubernetes Load Balancer Work?

First, we need to acknowledge that, in Kubernetes, “load balancer” can mean a number of different things. For the purposes of this blog, we’re focusing on two functions: exposing Kubernetes services to the outside world, and balancing network traffic loads to those services.

In Kubernetes, your containers that are related by function will be organized into pods. All your related pods are then organized into a service. Pods are not designed to be persistent—Kubernetes will automatically create and destroy pods as needed. Every new pod is assigned a new IP address, and since pods are not persistent, their IP addresses aren’t either.

Services (groups of pods), by contrast, are assigned a stable ClusterIP, which is accessible only within that Kubernetes cluster. Other Kubernetes containers can then access pods within a service through that ClusterIP. Because the ClusterIP is not reachable from outside the cluster, you need a load balancer to handle requests from outside the cluster and pass that traffic along to the services. The first two load balancers we’ll be discussing, NodePort and LoadBalancer, are concerned with this function.

The other kind of load balancer we’ll talk about involves true network traffic load balancing. This type of Kubernetes load balancer distributes network traffic to services according to predetermined routing rules or algorithms. The third Kubernetes load balancer in this blog post, Ingress, provides this functionality in addition to exposing pods to external traffic. There are several different load distribution strategies you can use with Ingress (or your external network load balancer of choice) depending on your unique environment and business goals. 

Cluster Access Strategies for Maximum Availability and Scalability

The first thing you’ll need to determine is how you’re going to expose your Kubernetes services to the outside world. We’ll discuss the three most popular options—NodePort, LoadBalancer, and Ingress.


NodePort

When you enable NodePort for a Kubernetes service, Kubernetes opens the same port on every node in the cluster, whether or not that node is running a pod for the service. When a node receives a request on that port, it forwards the traffic to the service, which routes it to one of its pods. NodePort is the easiest way to expose a service to external traffic, assuming your cluster only has one or two nodes and doesn’t need any advanced routing rules.

However, NodePort doesn’t provide any built-in way to track which ports you’ve exposed for which services, so you’ll need to keep track of this yourself. You can also only expose one service per port, and NodePort can only use ports in a limited range (30000 to 32767 by default). For these reasons, NodePort is generally recommended only for testing or development environments, not for production.
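To make the port-range limit concrete, here is a minimal sketch that builds a NodePort Service manifest as a plain Python dict and rejects ports outside the default range. The service name `web`, the `app` label, and the port numbers are hypothetical, and the range check assumes your cluster still uses the default 30000 to 32767 range.

```python
import json

# Default NodePort range; a cluster administrator may have changed it.
NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767


def nodeport_service(name, app_label, port, target_port, node_port):
    """Build a NodePort Service manifest as a plain dict (illustrative helper)."""
    if not NODE_PORT_MIN <= node_port <= NODE_PORT_MAX:
        raise ValueError(
            f"nodePort {node_port} is outside {NODE_PORT_MIN}-{NODE_PORT_MAX}"
        )
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            # Pods carrying this label receive the traffic.
            "selector": {"app": app_label},
            "ports": [
                {"port": port, "targetPort": target_port, "nodePort": node_port}
            ],
        },
    }


# Hypothetical service: node port 30080 -> service port 80 -> container port 8080.
manifest = nodeport_service("web", "web", 80, 8080, 30080)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied as-is. Tracking which node port belongs to which service is still up to you.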


LoadBalancer

Many cloud-based Kubernetes deployments prefer LoadBalancer because it supports multiple protocols and multiple ports per service. LoadBalancer works with external network load balancers to distribute traffic according to your preferred load balancing strategy. LoadBalancer works best with large public cloud providers because it can be configured to automatically provision and de-provision external IP addresses and load balancers for your services.

The downside of LoadBalancer is primarily the cost. By default, it assigns an individual external IP address to every service, and then each IP needs its own external load balancer configured in the cloud. This can feel like overkill, especially when you’re running multiple services on every cluster, which is basically the standard in Kubernetes. The costs of a large pool of IP addresses and load balancers will quickly add up as your Kubernetes environment grows, which can limit your scalability.
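The one-load-balancer-per-service pattern is easy to see in a short sketch. The helper below builds a LoadBalancer Service manifest with several ports, then generates one manifest for each of three hypothetical services (`web`, `api`, `auth`). The names, labels, and ports are all assumptions for illustration.

```python
import json


def loadbalancer_service(name, app_label, ports):
    """Build a LoadBalancer Service manifest (illustrative helper).

    ports: list of (port_name, port, target_port) tuples. Kubernetes
    requires each port to be named when a service exposes more than one.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [
                {"name": n, "protocol": "TCP", "port": p, "targetPort": t}
                for n, p, t in ports
            ],
        },
    }


# Three hypothetical services, each exposing HTTP and HTTPS.
service_names = ["web", "api", "auth"]
manifests = [
    loadbalancer_service(name, name, [("http", 80, 8080), ("https", 443, 8443)])
    for name in service_names
]

print(json.dumps(manifests[0], indent=2))
# By default, each manifest asks the cloud provider for its own external
# load balancer and IP address.
print(f"external load balancers requested: {len(manifests)}")
```

Every service added to `service_names` adds another external load balancer to the bill, which is the scaling cost described above.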


Ingress

Ingress is an API object that uses HTTP/HTTPS routing rules to manage external access to your Kubernetes services. It allows you to consolidate your routing rules into a single resource that runs as part of a Kubernetes cluster, rather than needing a separate external load balancer for each service. The Ingress API object provides the routing rules, and the Ingress controller is the actual load balancer that processes the instructions set by the API. There are a variety of Ingress controllers available, with the most popular including NGINX, Contour, and HAProxy.

Ingress is becoming the most popular load balancing method because it’s easily scalable and it simplifies and consolidates your Kubernetes service routing rules. Ingress routes traffic at layer 7 (application requests), so it can make decisions based on hostnames and URL paths, unlike the other two methods, which only work on layer 4 (TCP/IP). Some Ingress controllers can also handle layer 4 traffic through controller-specific configuration.
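As a rough illustration of consolidating routing rules into one resource, the sketch below builds an Ingress manifest that sends two URL paths on one host to two different services. The host `shop.example.com`, the `nginx` ingress class, and the service names and ports are hypothetical. The field names follow the `networking.k8s.io/v1` Ingress API.

```python
import json


def ingress(name, ingress_class, host, routes):
    """Build an Ingress manifest (illustrative helper).

    routes: list of (path_prefix, service_name, service_port) tuples.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # Selects which Ingress controller should act on these rules.
            "ingressClassName": ingress_class,
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": path,
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service,
                                        "port": {"number": port},
                                    }
                                },
                            }
                            for path, service, port in routes
                        ]
                    },
                }
            ],
        },
    }


# One Ingress routes /api to the "api" service and everything else to "web".
manifest = ingress(
    "shop",
    "nginx",
    "shop.example.com",
    [("/api", "api", 8080), ("/", "web", 80)],
)
print(json.dumps(manifest, indent=2))
```

Both services share a single entry point here, so adding a third service means adding one more route, not one more external load balancer.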

Load Balancing Strategies for a Kubernetes Service

To fully maximize the efficiency and availability of your Kubernetes services, you’ll need to decide how to balance the traffic to your pods. Some popular Kubernetes load balancer strategies include:

Round Robin

The round robin algorithm sends traffic to a sequence of eligible pods in a predetermined order. For example, if you had five pods in a round robin configuration, the load balancer would send the first request to pod 1, the second request to pod 2, and so on down the line in a repeating cycle. The round robin algorithm is static, which means it will not account for variables such as the current load on a particular pod. That’s why round robin works best when pods are identical and requests cost roughly the same, and why it is often chosen for testing environments over production traffic with uneven workloads.
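The five-pod example can be sketched in a few lines. This is a simplified illustration, not how any particular Ingress controller implements it, and the class and pod names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out pods in a fixed, repeating order, ignoring their load."""

    def __init__(self, pods):
        if not pods:
            raise ValueError("at least one pod is required")
        self._cycle = cycle(list(pods))

    def next_pod(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["pod-1", "pod-2", "pod-3", "pod-4", "pod-5"])
picks = [balancer.next_pod() for _ in range(7)]
print(picks)
# ['pod-1', 'pod-2', 'pod-3', 'pod-4', 'pod-5', 'pod-1', 'pod-2']
```

Nothing in `next_pod` looks at how busy a pod is, which is what "static" means here.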

Consistent Hash

The consistent hash load balancing strategy uses a hashing algorithm to send all requests from a given client or session to the same pod. This is useful for Kubernetes services that need to maintain per-client state. However, since client workloads may not be equal, evenly distributing the load between different servers can be challenging with a consistent hash algorithm. Also, at large scale, the computational cost of hashing algorithms can cause some latency.
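Here is a minimal sketch of a consistent hash ring, assuming the client is identified by some string key such as a session ID or client IP. The class name, the use of SHA-256, and the choice of 100 virtual nodes per pod are illustrative assumptions, not the settings of any specific load balancer.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map client keys to pods so the same key always lands on the same pod."""

    def __init__(self, pods, replicas=100):
        self._replicas = replicas  # virtual nodes per pod, to smooth the spread
        self._ring = []  # sorted list of (hash_value, pod_name)
        for pod in pods:
            self.add_pod(pod)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_pod(self, pod):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{pod}#{i}"), pod))

    def remove_pod(self, pod):
        self._ring = [entry for entry in self._ring if entry[1] != pod]

    def pod_for(self, client_key):
        if not self._ring:
            raise LookupError("no pods available")
        # Find the first virtual node at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(client_key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["pod-1", "pod-2", "pod-3"])
clients = [f"client-{n}" for n in range(20)]
before = {c: ring.pod_for(c) for c in clients}

# When a pod disappears, only the clients that were on it get remapped.
ring.remove_pod("pod-3")
after = {c: ring.pod_for(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
print(f"{len(moved)} of {len(clients)} clients moved")
```

The ring guarantees stickiness, not balance: a few heavy clients can still overload whichever pod they hash to.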

Resource Based/Least Load

The resource based, or least load, algorithm will send new HTTP requests to the Kubernetes pod with the lightest load. However, this algorithm is HTTP-specific, so it will default non-HTTP traffic to the “least connections” strategy.
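A simplified sketch of that selection logic might look like the following. The `PodStats` fields are hypothetical: we assume each pod reports a load score between 0.0 and 1.0 along with its active connection count, and that the balancer knows whether a request is HTTP.

```python
from dataclasses import dataclass


@dataclass
class PodStats:
    name: str
    load: float              # hypothetical utilization score, 0.0 to 1.0
    active_connections: int  # used for the non-HTTP fallback


def choose_pod(pods, is_http=True):
    """Pick the least-loaded pod for HTTP, else the pod with fewest connections."""
    if not pods:
        raise ValueError("at least one pod is required")
    if is_http:
        return min(pods, key=lambda p: p.load)
    return min(pods, key=lambda p: p.active_connections)


pods = [
    PodStats("pod-1", load=0.72, active_connections=3),
    PodStats("pod-2", load=0.31, active_connections=9),
    PodStats("pod-3", load=0.55, active_connections=1),
]

print(choose_pod(pods).name)                 # pod-2 (lowest load)
print(choose_pod(pods, is_http=False).name)  # pod-3 (fewest connections)
```

The two strategies can disagree: `pod-2` has the lightest load but the most open connections.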

Least Connections

Least connections is a dynamic load balancing algorithm that sends each client request to the pod with the fewest active connections. Because slower or unhealthy pods hold their connections open longer, they naturally receive fewer new requests. When all pods are equally healthy, the load will be distributed evenly.
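The sketch below shows the bookkeeping involved, assuming the balancer is told when each connection opens and closes. The class and method names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Send each new connection to the pod with the fewest open connections."""

    def __init__(self, pods):
        self._active = {pod: 0 for pod in pods}

    def acquire(self):
        # Ties go to the pod listed first.
        pod = min(self._active, key=self._active.get)
        self._active[pod] += 1
        return pod

    def release(self, pod):
        if self._active[pod] > 0:
            self._active[pod] -= 1


balancer = LeastConnectionsBalancer(["pod-1", "pod-2", "pod-3"])
first = balancer.acquire()   # pod-1
second = balancer.acquire()  # pod-2
third = balancer.acquire()   # pod-3

# pod-2 finishes quickly; pod-1 and pod-3 are still busy.
balancer.release(second)
fourth = balancer.acquire()
print(fourth)  # pod-2
```

A pod that is slow to finish its work keeps a higher count, so new connections flow to the faster pods.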

Choosing a Kubernetes Load Balancer Strategy

It’s important to note that some of these Kubernetes load balancing algorithms have variants that strengthen their utility, such as weighted round robin, which allows administrators to lower the priority of weaker pods so they receive fewer requests. Depending on which method you use to handle external requests, you may be limited in which load distribution algorithm you’re able to employ.
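One way to sketch weighted round robin is the "smooth" variant below, which spreads a heavier pod's turns across the cycle instead of sending them back to back. This is one of several ways to implement weighting, and the class name, pod names, and weights are hypothetical.

```python
from collections import Counter


class WeightedRoundRobinBalancer:
    """Smooth weighted round robin: higher weights get more turns."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        self._weights = dict(weights)
        self._current = {pod: 0 for pod in weights}
        self._total = sum(weights.values())

    def next_pod(self):
        # Every pod earns credit equal to its weight on each turn.
        for pod, weight in self._weights.items():
            self._current[pod] += weight
        # The pod with the most credit is chosen and pays back the total.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


# pod-1 is the strongest pod; pod-2 and pod-3 are weaker.
balancer = WeightedRoundRobinBalancer({"pod-1": 3, "pod-2": 1, "pod-3": 1})
counts = Counter(balancer.next_pod() for _ in range(10))
print(dict(counts))
```

Over ten requests, `pod-1` receives six and the two weaker pods receive two each, matching the 3:1:1 weights.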

That’s why it’s important to choose a Kubernetes load balancer strategy that can safely handle external connections according to your unique business requirements while allowing you to take advantage of the load distribution algorithm that makes the most sense for your applications.