Wednesday, October 19, 2022

Dynamic Kubernetes Cluster Scaling on Application and Infrastructure Levels

An important part of running infrastructure is ensuring our cloud spending automatically scales with demand, both up and down. Our traffic fluctuates heavily every day, and our cloud footprint should scale dynamically to support this.

This article will shed some light on horizontal and vertical scaling, both on the infrastructure level and on the Kubernetes application level.

Scaling Kubernetes on the Infrastructure Level

  • A Kubernetes cluster typically consists of a master (or a couple of them) and multiple nodes where application pods are scheduled. The maths is quite simple here: the more applications you run in your cluster, the more resources (nodes) you need. Say, you have a microservices application consisting of 3 services, each started as an individual pod that requests 1 GiB of RAM (see the manifest sketch after this list).
  • It means you will need a 4 GiB node, since the K8s components and the OS require some RAM too. What if you need additional RAM because of high load or a memory leak, or if you deploy more services to the cluster? Then you either need a larger node or have to add another node to the cluster.
  • Usually, in both cases you pay for the full amount of resources that come with the VM (i.e. you pay for the whole 4 GiB even if half of it is unused). That’s not the case with Infrastructure though.
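For concreteness, here is a minimal manifest sketch for one of the three hypothetical services mentioned above. The service name, image, and CPU request are illustrative assumptions, not from the article; only the 1 GiB memory request mirrors the example.

```yaml
# Hypothetical Deployment for one of the three services ("service-a" is a placeholder name).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: example.org/service-a:latest   # placeholder image
          resources:
            requests:
              memory: "1Gi"   # each service reserves 1 GiB, as in the example above
              cpu: "250m"     # illustrative CPU request
```

With three such deployments, the scheduler needs roughly 3 GiB of requested RAM plus room for the OS and Kubernetes components, which is where the 4 GiB node estimate comes from.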

Vertical Scaling of Kubernetes Nodes

  • Let’s get back to the maths. If the application roughly needs 3 GiB of RAM and there’s not much going on in the cluster, you need just one node. Of course, having some extra free RAM is always a good idea, so a 5 GiB node makes a lot of sense.

  • Again, not with Infrastructure. What you can do is request a 3 GiB node and keep 2 GiB in reserve. When your application (a K8s pod) starts consuming more (which is also configured on the K8s side) or you simply deploy more pods (as in the chart below), those 2 extra GiB become available immediately, and you start paying for them only when they are actually used.
  • As a result, you can do some simple math and figure out the best cluster topology: say, 3 nodes, each with 4 GiB of reserved RAM and 3 GiB of dynamic resources.

[Chart: node RAM usage as more pods are deployed, with dynamic resources kicking in above the reserved amount]

Horizontal Auto-Scaling of Kubernetes Nodes

  • Having one huge node in a Kubernetes cluster is not a good idea since all deployments will be affected in case of an outage or any other major incident. 
  • Having several nodes on stand-by is not cost-efficient either. Can Kubernetes add a node only when it actually needs one? Yes, a Kubernetes cluster in Infrastructure can be configured with horizontal node auto-scaling.
  • New nodes are added to the cluster when RAM, CPU, I/O or disk usage reaches certain levels. Needless to say, you get billed for the additional resources only when they are used. Newly added nodes are created according to the current topology, i.e. the existing vertical scaling configuration is applied to them.
  • The system will scale down as soon as resource consumption gets back to expected levels. Your Kubernetes cluster will not starve, yet you will not pay for unused resources.

Scaling Kubernetes on the Application Level

  • Kubernetes has its own Horizontal Pod Autoscaler (HPA). In simple words, HPA replicates chosen deployments based on CPU utilization. If CPU consumption across the pods grows above, say, 70%, HPA schedules more pods, and when CPU consumption gets back to normal, the deployment is scaled back to the original number of replicas (see the manifest sketch after these bullets).
  • Why is this cool, and how does it work together with automatic horizontal scaling of Kubernetes nodes? Say, you have one node with a couple of running pods. All of a sudden, a particular service starts getting a lot of requests and performing some CPU-costly operations. RAM utilization does not grow, so node-level scaling is not triggered; without HPA there is no mechanism to scale the application, and it will soon become unresponsive.
  • Kubernetes HPA will scale up the pods, and internal K8s load balancing will redirect requests to healthy pods. Those new pods will require more resources, and this is where Infrastructure horizontal and vertical scaling comes into play: new pods will either be placed on the same node and use its dynamic RAM, or a new node will be added (in case there are not enough resources on the existing ones).
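As an illustration of the 70% example above, here is a minimal HPA manifest sketch. The target deployment name and replica bounds are assumptions (reusing the hypothetical service-a from earlier), and the target pods must declare CPU requests for utilization-based scaling to work.

```yaml
# Hypothetical HPA: scale service-a between 1 and 5 replicas on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU utilization exceeds 70%
```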

  • On top of that, you may set resource caps (limits) on Kubernetes pods, as sketched below. For example, if you know for sure that a particular service should not consume more than 1 GiB, and higher consumption means a memory leak, you instruct Kubernetes to kill and restart the pod when its RAM utilization reaches 1 GiB. A new pod starts automatically. This gives you control over the resources your Kubernetes deployments utilize.
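A rough sketch of such a cap as a container spec fragment; the service name and the request value are illustrative assumptions, while the 1 GiB limit mirrors the example above. Strictly speaking, it is the container that gets OOM-killed and restarted when the limit is exceeded.

```yaml
# Hypothetical container spec fragment: requests are reserved at scheduling time,
# limits are hard caps enforced at runtime.
containers:
  - name: service-b                       # illustrative name, not from the article
    image: example.org/service-b:latest   # placeholder image
    resources:
      requests:
        memory: "512Mi"   # reserved for scheduling (illustrative value)
      limits:
        memory: "1Gi"     # container is OOM-killed and restarted if it exceeds 1 GiB
```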
Cluster Autoscaler Improvement

Custom gRPC Expander

  • The most significant improvement we made to Cluster Autoscaler was to provide a new method for determining which node groups to scale. If there are any Pending (unschedulable) pods, Cluster Autoscaler attempts to scale the cluster to accommodate them.
  • Internally, Cluster Autoscaler maintains a list of node groups, which map to the different candidates for scaling, and it filters out node groups that do not satisfy pod scheduling requirements by running a scheduling simulation against the current set of Pending (unschedulable) pods. Any node groups that satisfy all pod requirements are passed to a component called the Expander.


  • The Expander is responsible for further filtering the node groups based on operational requirements. Cluster Autoscaler has a number of different built-in expander options, each with different logic. 
  • For example, the default is the random expander, which selects from the available options uniformly at random. Another option, and the one that Airbnb has historically used, is the priority expander, which chooses which node group to expand based on a user-specified tiered priority list (see the ConfigMap sketch after these bullets).
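For reference, the upstream priority expander is enabled with the --expander=priority flag and configured through a ConfigMap along these lines; the node-group name patterns below are placeholders, not Airbnb's actual configuration.

```yaml
# Sketch of a priority expander ConfigMap, based on the upstream Cluster Autoscaler docs.
# Keys are priorities (higher number wins); values are regexes matched against node group names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander   # name the priority expander looks for
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*preferred-node-group.*    # placeholder pattern, tried first
    10:
      - .*fallback-node-group.*     # placeholder pattern, used if no higher tier matches
```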

