This write-up is an insight into auto-scaling in cloud computing. What is it? How does it work? Why do I even care about it? Why is it so important to my workload running on the cloud? This write-up answers all your questions in detail.

So, without any further ado, let’s get on with it.


1. What is Auto Scaling?

Auto-scaling is the cloud platform’s ability to react to the variation in the live traffic load on the application/workload by spinning up or down server instances on the fly.

The ability to auto scale enables the cloud to add computing power to, or remove it from, the instance cluster based on demand, thus ensuring smooth handling of a traffic surge or slump.

So, when additional instances are added on the fly, they share the traffic load of the existing running servers, diminishing the risk of them crumbling under the heavy traffic load.

When the traffic subsides & the workload needs comparatively less computing power, the cloud auto-scaler obliges by shutting down the instances that are no longer needed.
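Here is a minimal sketch of that add/remove loop. The function name `reconcile`, the per-instance capacity of 200 requests per second and the traffic samples are all hypothetical, picked only to illustrate the idea; a real cloud auto-scaler is far more involved.

```python
import math

def reconcile(current: int, load_rps: float, capacity_per_instance: float = 200.0) -> int:
    """Return how many instances to add (positive) or remove (negative).

    capacity_per_instance is an assumed figure: the requests per second
    that a single instance can comfortably handle.
    """
    needed = max(1, math.ceil(load_rps / capacity_per_instance))
    return needed - current

fleet = 1
history = []
# Hypothetical traffic samples in requests per second: a surge, then a slump.
for load in [120, 480, 950, 400, 90]:
    delta = reconcile(fleet, load)
    fleet += delta
    history.append((load, delta, fleet))

for load, delta, size in history:
    print(f"load={load:>4} rps  change={delta:+d}  fleet={size}")
```

The fleet grows from 1 to 5 instances as the load climbs and shrinks back to 1 once it drops.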

Auto scaling in the cloud

Auto-scaling is also known as auto-provisioning. The cloud auto-scaler logic runs on a predefined set of rules or policies.

How do we set these rules? I’ll talk about it further down the article. Stay tuned.


2. Benefits of Autoscaling

There are several benefits to auto-scaling; the most important of them is high availability.

As I mentioned earlier, when instances get added to the server fleet on the fly, the risk of the running instances crumbling under the load is greatly reduced.

Even if a few instances go down, the others are still up to serve the user requests. The application stays up, unimpacted. We can also call this fault tolerance. With the help of auto-scaling, the platform is pretty well equipped to handle failures.

With auto-scaling, instances can also be spun up across different, geographically spread out availability zones to handle the traffic spikes.

Another upside of autoscaling is cost-effectiveness.

When the traffic subsides, the instances added earlier are terminated & removed from the fleet. Businesses only have to pay for the computing power they actually use.
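A quick back-of-the-envelope calculation shows where the saving comes from. The hourly demand figures and the price of 10 cents per instance-hour are made up for illustration; they are not real cloud prices.

```python
# Instances needed during each hour of a hypothetical eight-hour window.
hourly_demand = [2, 2, 3, 8, 10, 6, 3, 2]

# Assumed price per instance-hour, in cents (integers avoid float rounding).
PRICE_CENTS = 10

# Pre-provisioning: keep enough instances for the peak running the whole time.
fixed_cost = max(hourly_demand) * len(hourly_demand) * PRICE_CENTS

# Auto-scaling: pay only for the instances actually running each hour.
autoscaled_cost = sum(hourly_demand) * PRICE_CENTS

print(f"fixed fleet: {fixed_cost} cents")       # 800
print(f"auto-scaled: {autoscaled_cost} cents")  # 360
```

With these numbers the auto-scaled fleet costs less than half of what the always-on, peak-sized fleet costs. The actual ratio depends entirely on how spiky the traffic is.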

This is one solid reason why serverless & FaaS (Functions as a Service) got so popular. It’s a step towards further cutting down the deployment costs.


3. How Effective Is Auto Scaling in Comparison to the Regular Pre-provisioning of Servers/Instances on the Cloud?

Pokemon Go, the augmented reality game, was launched on Google Cloud. The engineering team predicted 1x user traffic with a worst-case scenario of 5x traffic load.

After launch, the game literally exploded & the traffic spiked up to 50x. This is a real-world scenario where the cloud’s ability to auto scale really shone.

If it weren’t for the auto-scaling ability of the cloud, the limited pre-provisioned instance fleet would have come crumbling down under the massive traffic load.

There are different ways to auto scale. What are they? Let’s find out next.


4. Types of Autoscaling

Scheduled Autoscaling

The auto-scaler runs on pre-defined rules & policies. But why? Simply because businesses, especially startups, have limited resources.

They can’t just keep adding server instances on the fly. Computing power costs serious money. We have to set the rules & configurations as per our budget.

Scheduled autoscaling is a proactive approach where we set up all the configuration upfront, typically for traffic patterns we already know about: for example, the maximum number of instances that can be added to the fleet, the CPU utilization thresholds & so on.

The scheduled autoscaling policy holds all the data which commands the autoscaler on how to react when hit with varied traffic patterns.
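To make this concrete, here is a rough sketch of what such a policy could look like as data. The class names `ScheduledRule` and `ScheduledPolicy`, the time windows and the instance counts are all hypothetical; every cloud provider has its own format for this.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class ScheduledRule:
    start: time             # window start (inclusive)
    end: time               # window end (exclusive)
    desired_instances: int  # fleet size wanted during the window

@dataclass(frozen=True)
class ScheduledPolicy:
    min_instances: int
    max_instances: int      # the budget cap: never go above this
    rules: tuple

    def desired_at(self, now: time) -> int:
        """Fleet size for the given time of day, clamped to the min/max limits."""
        for rule in self.rules:
            if rule.start <= now < rule.end:
                return min(self.max_instances,
                           max(self.min_instances, rule.desired_instances))
        # Outside every window, fall back to the minimum fleet.
        return self.min_instances

# Hypothetical policy: busier during office hours, busiest in the evening.
policy = ScheduledPolicy(
    min_instances=2,
    max_instances=10,
    rules=(
        ScheduledRule(time(9, 0), time(18, 0), 6),
        ScheduledRule(time(18, 0), time(22, 0), 12),
    ),
)

print(policy.desired_at(time(10, 30)))  # 6
print(policy.desired_at(time(19, 0)))   # 10, capped by max_instances
print(policy.desired_at(time(3, 0)))    # 2
```

Notice how the evening rule asks for 12 instances but the policy hands out only 10. That cap is what keeps the bill within the budget.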


Predictive Autoscaling

Predictive autoscaling makes use of machine learning to study the recent & historical traffic patterns & trends of the respective workloads.

Based on that study, the right number of instances is provisioned ahead of time to serve the anticipated traffic.
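The sketch below shows the shape of the idea, with one big caveat: it uses a plain moving average as a stand-in for the machine learning model, which a real predictive autoscaler would not do. The function names, the 20% headroom and the capacity of 200 requests per second per instance are assumptions made up for the example.

```python
import math
from statistics import mean
from typing import Sequence

def forecast_next(history: Sequence[float], window: int = 3) -> float:
    """Naive forecast: average of the last `window` observations.

    This is only a placeholder for a real ML model.
    """
    return mean(history[-window:])

def instances_for(forecast_rps: float,
                  capacity_per_instance: int = 200,
                  headroom_percent: int = 20) -> int:
    """Instances to provision ahead of time, with some safety headroom."""
    padded = forecast_rps * (100 + headroom_percent)
    return max(1, math.ceil(padded / (100 * capacity_per_instance)))

# Hypothetical peak-hour traffic (requests per second) from the last four days.
same_hour_last_days = [700, 900, 1000, 1250]

predicted = forecast_next(same_hour_last_days)
print(predicted)                 # 1050
print(instances_for(predicted))  # 7
```

The point is the order of events: the fleet is sized before the traffic arrives, rather than after a threshold is breached.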


Dynamic Autoscaling

Dynamic autoscaling is the method where instances are spun up on the fly based on several different metrics such as CPU utilization of the instances, load balancer utilization & monitoring metrics.

CPU Utilization

In the auto scale policy, a threshold is set for the CPU utilization of the cluster, say 75 or 80%, beyond which new instances start spinning up to share the load.
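A threshold rule like this can be sketched in a few lines. The function `scale_on_cpu` is hypothetical, and sizing the fleet in proportion to how far the average CPU is over the threshold is just one reasonable choice I am assuming here, not the behaviour of any specific cloud.

```python
import math
from typing import Sequence

def scale_on_cpu(current: int, cpu_percents: Sequence[float],
                 threshold: float = 75.0) -> int:
    """Return the new fleet size given per-instance CPU utilization (in %)."""
    avg = sum(cpu_percents) / len(cpu_percents)
    if avg <= threshold:
        # Under the threshold: leave the fleet alone.
        return current
    # Over the threshold: grow so the average would fall back to the threshold.
    return math.ceil(current * avg / threshold)

print(scale_on_cpu(4, [90, 85, 95, 90]))  # 5
print(scale_on_cpu(4, [60, 70, 65, 55]))  # 4
```

In the first call the average CPU is 90%, so the four-instance fleet grows to five. In the second it is 62.5%, so nothing changes.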

Load Balancer Utilization

Another trigger to spin up instances is the number of requests per second handled by the load balancer. Depending on the value set, instances can be spun up or terminated.
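Here is a small sketch of such a trigger. The function `scale_on_rps` and both thresholds (scale out above 200 requests per second per instance, scale in below 80) are assumptions for illustration. Keeping a gap between the two values is a common way to stop the fleet from flapping up and down.

```python
def scale_on_rps(current: int, total_rps: float,
                 scale_out_above: float = 200.0,
                 scale_in_below: float = 80.0,
                 min_instances: int = 1) -> int:
    """Add or remove one instance based on requests per second per instance."""
    per_instance = total_rps / current
    if per_instance > scale_out_above:
        return current + 1
    if per_instance < scale_in_below and current > min_instances:
        return current - 1
    return current

print(scale_on_rps(3, 900))  # 4: 300 rps per instance is too hot
print(scale_on_rps(4, 200))  # 3: 50 rps per instance is too idle
print(scale_on_rps(4, 600))  # 4: 150 rps per instance is within the band
```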

Monitoring Metrics

Besides the above two metrics, monitoring metrics such as container stats are also considered when setting up the auto scale policy.

All the above-stated auto scale types, policies & triggers are ideally used in conjunction with each other to achieve the best results.
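One simple way to combine them, sketched below, is to let every policy recommend a fleet size, go with the largest recommendation & then clamp it to the configured limits. Taking the maximum is a conservative choice I am assuming for this example; the function `combine` and the numbers are hypothetical.

```python
from typing import Mapping

def combine(recommendations: Mapping[str, int],
            min_instances: int, max_instances: int) -> int:
    """Pick the largest recommended fleet size, clamped to the allowed range."""
    wanted = max(recommendations.values())
    return max(min_instances, min(max_instances, wanted))

# Hypothetical recommendations from the different policy types.
normal_day = {"scheduled": 6, "predictive": 7, "cpu": 5}
spike = {"scheduled": 6, "predictive": 7, "cpu": 14}

print(combine(normal_day, min_instances=2, max_instances=10))  # 7
print(combine(spike, min_instances=2, max_instances=10))       # 10
```

Going with the largest recommendation favours availability over cost, while the upper limit still protects the budget.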




Guys, this is pretty much it on autoscaling. If you liked the article, do share it with your folks.

I’ll see you in the next writeup.
Until then…