Horizontal scaling of Kubernetes pods: how does it work?

Autoscaling is one of the main features of a Kubernetes cluster. When configured correctly, it saves administrators time, prevents performance bottlenecks, and helps avoid financial waste. With autoscaling, the cluster increases the number of pods as demand for a service rises and decreases it as demand falls.

One of the ways Kubernetes enables autoscaling is horizontal pod autoscaling (HPA). HPA can help applications scale out to meet increased demand and scale in when resources are no longer needed. This type of automatic scaling does not apply to objects that cannot be scaled, such as DaemonSets.

In this article, we’ll dive deeper into horizontal pod autoscaling in Kubernetes. We’ll define HPA, explain how it works, and provide a step-by-step tutorial for setting up HPA. But before that, let’s start by understanding what Kubernetes is.

So, without further ado, let’s get started!

What is Kubernetes?

Kubernetes is an open-source container management tool that automates container deployment, scaling, and load balancing. It schedules, runs, and manages isolated containers running on virtual, physical, and cloud machines.

Horizontal scaling of Kubernetes pods (HPA):

Kubernetes horizontal pod autoscaling automatically scales the number of pods in a replication controller, deployment, or replica set based on the CPU utilization of that resource.

Kubernetes can automatically scale pods based on observed CPU utilization, which is what horizontal pod scaling means. Scaling can only be applied to scalable objects such as replication controllers, deployments, and replica sets. HPA is implemented as a Kubernetes application programming interface (API) resource and as a controller.

The controller periodically adjusts the number of replicas in a deployment or replication controller so that the observed average CPU utilization matches the target specified by the user.
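Because HPA is an API resource, it can also be defined declaratively instead of created with kubectl. The following is a minimal sketch using the autoscaling/v1 API; the object name mydeploy-hpa and the target deployment name mydeploy are chosen here for illustration:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mydeploy-hpa                   # illustrative HPA object name
spec:
  scaleTargetRef:                      # the scalable object to control
    apiVersion: apps/v1
    kind: Deployment
    name: mydeploy
  minReplicas: 1                       # lower bound on replica count
  maxReplicas: 10                      # upper bound on replica count
  targetCPUUtilizationPercentage: 20   # target average CPU utilization

Later API versions (autoscaling/v2) also support memory and custom metrics, not just CPU.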

How does the Horizontal Pod Autoscaler work?

In simpler terms, HPA operates in a “check, update, check again” style loop. Here’s how each of the steps in this loop works:

1. The Horizontal Pod Autoscaler continuously polls the metrics server for resource usage.

2. HPA will calculate the required number of replicas based on the collected resource usage.

3. Next, HPA decides to scale the application to the number of replicas required.

4. After that, HPA will modify the desired number of replicas.

5. Since HPA is monitoring continuously, the process repeats from step 1.
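The replica count computed in step 2 follows a simple ratio, as documented by Kubernetes:

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, with 2 replicas, a current CPU utilization of 40%, and a target of 20%, the desired count is ceil(2 × 40 / 20) = 4 replicas.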


Configuring Horizontal Pod Autoscaling

Let’s create a simple deployment:

kind: Deployment                 # object type to create
apiVersion: apps/v1
metadata:
  name: mydeploy                 # deployment name
spec:
  replicas: 2                    # desired number of pods
  selector:                      # apply this deployment to pods with this label
    matchLabels:
      name: deployment
  template:
    metadata:
      name: testpod8             # pod name
      labels:
        name: deployment
    spec:
      containers:
      - name: c00                # container name
        image: httpd
        ports:
        - containerPort: 80      # container port exposed
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
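Save the manifest (for example as mydeploy.yaml; the filename is arbitrary) and apply it to the cluster:

  • kubectl apply -f mydeploy.yaml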

Now create the autoscaler:

  • kubectl autoscale deployment mydeploy --cpu-percent=20 --min=1 --max=10

Let’s check the HPA entries.
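To list the HPA objects in the current namespace and inspect the one just created:

  • kubectl get hpa
  • kubectl describe hpa mydeploy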


Final Thoughts

We hope this blog has helped you understand how Kubernetes horizontal pod autoscaling works and how to configure it. HPA lets you scale your applications based on different metrics. By dynamically adjusting the number of pods to match demand, you can run your application efficiently and economically.

If you still need help understanding how Horizontal Pod Autoscaling works or want to learn more about it, you can contact a reliable and trustworthy software development company. Experts and developers can guide you through the process and help you better understand the concept.


Maria D. Ervin