Tech Twitter: Dynamic Kubernetes Cluster Scaling on Application and Infrastructure Levels

An important part of running infrastructure is ensuring our cloud spending automatically scales with demand, both up and down. Our traffic fluctuates heavily every day, and our cloud footprint should scale dynamically to support this.

This article covers horizontal and vertical scaling at both the infrastructure and the Kubernetes application levels.

Scaling Kubernetes on Infrastructure Level

A Kubernetes cluster typically consists of a master (or a couple of them) and multiple nodes where application pods are scheduled. The math is quite simple here: the more applications you run in your cluster, the more resources (nodes) you need. Say you have a microservices application consisting of 3 services, each started as an individual pod that requests 1 GiB of RAM.

This means you will need a 4 GiB node (K8s components and the OS require some RAM too). What if you need additional RAM because of high load or potential memory leaks, or because you deploy more services to the cluster? You either need a larger node or have to add another node to the cluster.

In both cases you usually pay for the full amount of resources that come with the VM (i.e. you pay for, say, 3 GiB of RAM even if half of it is unused). That’s not the case with Infrastructure though.
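The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `node_ram_gib` and the 1 GiB overhead for K8s components and the OS are assumptions chosen to match the 4 GiB figure in the text, not measured values.

```python
import math


def node_ram_gib(pod_requests_gib, system_overhead_gib=1.0):
    """Smallest whole-GiB node size that fits all pod requests plus overhead.

    system_overhead_gib is an assumed allowance for K8s components and the OS.
    """
    return math.ceil(sum(pod_requests_gib) + system_overhead_gib)


# Three services, each requesting 1 GiB of RAM.
print(node_ram_gib([1, 1, 1]))
```

Deploying a fourth 1 GiB service pushes the result to 5 GiB, which is the point where a larger node or an additional node is needed.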

Vertical Scaling of Kubernetes Nodes

Let’s get back to the math. If the application needs roughly 3 GiB of RAM and there’s not much else going on in the cluster, you need just one node. Having some extra free RAM is always a good idea, so a 5 GiB node makes a lot of sense.

Again, not with Infrastructure. Instead, you can request a 3 GiB node and keep another 2 GiB available on demand. When your application (K8s pod) starts consuming more (which is configured on the K8s side too) or you simply deploy more pods (as in the chart below), those 2 extra GiB become immediately available, and you start paying for those resources only when they are used.

As a result, you can do some simple math and figure out the best cluster topology: say, 3 nodes, each with 4 GiB of reserved RAM and 3 GiB of dynamic resources.
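A rough way to reason about this pricing model is sketched below. It assumes that the reserved amount is always billed and that dynamic RAM is billed only for the portion actually used. The function name `billed_ram_gib` and that exact billing rule are illustrative assumptions, not a description of any provider's real pricing.

```python
def billed_ram_gib(used_gib, reserved_gib, dynamic_gib):
    """RAM billed for one node under the assumed reserved-plus-dynamic model.

    The reserved part is always paid for; dynamic RAM is paid for only
    while it is in use, and usage cannot exceed reserved + dynamic.
    """
    node_limit = reserved_gib + dynamic_gib
    return min(max(reserved_gib, used_gib), node_limit)


# A 3 GiB node with 2 GiB of dynamic RAM, at three usage levels.
for used in (2.0, 4.0, 6.0):
    print(used, billed_ram_gib(used, reserved_gib=3, dynamic_gib=2))
```

Summing this value over every node in a candidate topology gives a quick comparison against the cost of fixed-size VMs.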

Horizontal AutoScaling of Kubernetes Nodes

Having one huge node in a Kubernetes cluster is not a good idea since all deployments will be affected in case of an outage or any other major incident.

Keeping several nodes in stand-by mode is not cost-efficient. Can Kubernetes add a node only when it needs one? Yes, a Kubernetes cluster in Infrastructure can be configured with horizontal node auto-scaling.

New nodes will be added to a cluster when RAM, CPU, I/O or Disk usage reaches certain levels. Needless to say, you get billed for additional resources only when they are used. Newly added nodes will be created according to current topology, i.e. existing vertical scaling configurations will be applied.

The system will scale down as soon as resource consumption gets back to expected levels. Your Kubernetes cluster will not starve, yet you will not pay for unused resources.
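The scale-up and scale-down behaviour described above boils down to a threshold check. The sketch below is a simplified illustration: the 70% and 30% thresholds, the metric names, and the function `node_delta` are all assumptions, and a real trigger would also require the condition to hold for some period before acting.

```python
def node_delta(usage_percent, scale_up_at=70.0, scale_down_at=30.0):
    """Return +1 to add a node, -1 to remove one, 0 to do nothing.

    usage_percent maps a resource name (RAM, CPU, I/O, disk) to its
    current utilization in percent. Thresholds are hypothetical.
    """
    if any(value >= scale_up_at for value in usage_percent.values()):
        return 1  # any single saturated resource is enough to scale up
    if all(value <= scale_down_at for value in usage_percent.values()):
        return -1  # scale down only when every resource is idle enough
    return 0


print(node_delta({"ram": 82.0, "cpu": 40.0, "io": 10.0, "disk": 55.0}))
print(node_delta({"ram": 20.0, "cpu": 15.0, "io": 5.0, "disk": 25.0}))
```

Using a lower threshold for scaling down than for scaling up leaves a gap between the two, which keeps the cluster from adding and removing nodes in quick succession.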

Scaling Kubernetes on Application Level

Kubernetes has its own Horizontal Pod Autoscaler (HPA). In simple terms, HPA replicates chosen deployments based on CPU utilization. If the average CPU utilization across a deployment's pods rises above a target of, say, 70%, HPA adds more pods, and when CPU consumption returns to normal, the deployment is scaled back down toward its configured minimum number of replicas.
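The replica calculation that HPA performs can be approximated as follows. The core formula, the current replica count multiplied by the ratio of current to target utilization and rounded up, follows the algorithm described in the Kubernetes documentation. The function name, the default bounds, and the simplified handling of the 10% tolerance are assumptions of this sketch.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_cpu_percent / target_cpu_percent
    # Skip scaling when utilization is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))


# Two pods averaging 140% of their CPU request against a 70% target.
print(desired_replicas(2, 140, 70))
```

The real controller also accounts for pods that are not yet ready and applies stabilization windows, which this sketch leaves out.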

Why is this useful, and how does it work with automatic horizontal scaling of Kubernetes nodes? Say you have one node with a couple of running pods. All of a sudden, a service in one of the pods starts getting a lot of requests and performing CPU-intensive operations. RAM utilization does not grow, so without HPA there is no mechanism at this point to scale the application, which will soon become unresponsive.

With HPA in place, Kubernetes scales up the pods, and an internal K8s load balancer redirects requests to healthy pods. Those new pods require more resources, and this is where Infrastructure horizontal and vertical scaling comes into play. New pods are either placed on the same node and use dynamic RAM, or a new node is added (in case there are not enough resources on the existing ones).

On top of that, you can set resource caps on Kubernetes pods. For example, if you know that a particular service should never consume more than 1 GiB, and exceeding that indicates a memory leak, you can set a 1 GiB memory limit on its container. When the container exceeds the limit, Kubernetes kills it and, under the default restart policy, starts it again automatically. This gives you control over the resources your Kubernetes deployments use.
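As an illustration of such a cap, the sketch below builds a pod manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The `resources.requests` and `resources.limits` fields are the standard container fields. The pod name, the image reference, and the 512Mi request are placeholders invented for this example.

```python
import json


def container_with_memory_cap(name, image, request="512Mi", limit="1Gi"):
    """Container spec whose memory use is capped at `limit`."""
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"memory": request},  # used for scheduling decisions
            "limits": {"memory": limit},      # exceeding this gets the container killed
        },
    }


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "capped-service"},  # hypothetical name
    "spec": {
        "containers": [
            container_with_memory_cap("app", "example.com/app:1.0"),  # placeholder image
        ],
    },
}

print(json.dumps(pod, indent=2))
```

In a Deployment the same `resources` block goes inside the pod template's container entries.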

Cluster Autoscaler Improvement: Custom gRPC Expander

The most significant improvement we made to Cluster Autoscaler was to provide a new method for determining which node groups to scale. Internally, Cluster Autoscaler maintains a list of node groups, each of which is a candidate for scaling. If there are any Pending (unschedulable) pods, Cluster Autoscaler attempts to scale the cluster to accommodate them: it runs a scheduling simulation against the current set of Pending pods and filters out node groups that do not satisfy the pod scheduling requirements. The node groups that satisfy all pod requirements are passed to a component called the Expander.
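The filtering step can be pictured with the heavily simplified sketch below. It is not Cluster Autoscaler's actual code: the `NodeGroup` and `Pod` classes, the `fits` check (CPU, memory, and node selector only), and the rule that a group must fit every Pending pod are assumptions made to mirror the description above.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    name: str
    cpu: float          # CPU cores on a new node of this group
    memory_gib: float   # memory on a new node of this group
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    cpu: float
    memory_gib: float
    node_selector: dict = field(default_factory=dict)


def fits(pod, group):
    """Toy scheduling simulation: would the pod fit on a fresh node of this group?"""
    selector_ok = all(group.labels.get(k) == v for k, v in pod.node_selector.items())
    return selector_ok and pod.cpu <= group.cpu and pod.memory_gib <= group.memory_gib


def candidate_groups(pending_pods, groups):
    """Keep only node groups that satisfy the requirements of every Pending pod."""
    return [g for g in groups if all(fits(p, g) for p in pending_pods)]


groups = [
    NodeGroup("small", cpu=2, memory_gib=4),
    NodeGroup("large", cpu=8, memory_gib=32),
    NodeGroup("gpu", cpu=8, memory_gib=32, labels={"accelerator": "gpu"}),
]
pending = [Pod("api", cpu=4, memory_gib=8)]
print([g.name for g in candidate_groups(pending, groups)])
```

The surviving groups are what the Expander receives as its options.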

The Expander is responsible for further filtering the node groups based on operational requirements. Cluster Autoscaler has a number of built-in expander options, each with different logic. For example, the default is the random expander, which selects from the available options uniformly at random. Another option, and the one that Airbnb has historically used, is the priority expander, which chooses which node group to expand based on a user-specified tiered priority list.
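The two expander strategies mentioned above can be sketched as follows. This is a simplified illustration, not the built-in implementation: the function names are invented, tiers are given as plain lists of node group names ordered from most to least preferred, and falling back to a random choice when no tier matches is an assumption.

```python
import random


def random_expander(options, rng=random):
    """Pick one of the viable node groups uniformly at random."""
    return rng.choice(options)


def priority_expander(options, tiers, rng=random):
    """Pick a node group from the highest-priority tier that has a viable option.

    tiers is a list of lists of node group names, most preferred first.
    """
    for tier in tiers:
        matches = [option for option in options if option in tier]
        if matches:
            return rng.choice(matches)  # break ties within a tier at random
    return rng.choice(options)  # assumed fallback when no tier matches


viable = ["spot-a", "on-demand"]
tiers = [["reserved"], ["spot-a", "spot-b"], ["on-demand"]]
print(priority_expander(viable, tiers))
```

Here the "reserved" tier has no viable option, so the choice falls through to the second tier and "spot-a" is selected.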

Wednesday, October 19, 2022
