This write-up offers an insight into auto-scaling in cloud computing. What is it? How does it work? Why should I care about it, and why is it so important for my workload running on the cloud? This write-up answers all these questions in detail.
So, without further ado, let’s get on with it.
1. What is Auto Scaling?
Auto-scaling is the cloud platform’s ability to react to variations in the live traffic load on an application/workload by spinning server instances up or down on the fly.
This ability enables the cloud to add computing power to, or remove it from, the instance cluster based on demand, ensuring that traffic surges and slumps are handled smoothly.
So, when additional instances are added on the fly, they share the traffic load of the existing servers, reducing the risk of those servers crumbling under heavy load.
When the traffic subsides and the workload needs comparatively less computing power, the cloud auto-scaler obliges by shutting down the surplus instances.
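To make the scale-out/scale-in idea concrete, here’s a minimal Python sketch of the decision an auto-scaler makes on each evaluation cycle. It assumes each instance can comfortably serve a fixed number of requests per second; `REQUESTS_PER_INSTANCE`, the min/max bounds and the function names are hypothetical values chosen for illustration, not any cloud provider’s API.

```python
import math

# Hypothetical policy values, chosen only for illustration.
REQUESTS_PER_INSTANCE = 500  # requests/sec one instance can comfortably serve
MIN_INSTANCES = 2            # floor kept running even when traffic is quiet
MAX_INSTANCES = 20           # ceiling the policy will never exceed


def desired_instances(current_rps: float) -> int:
    """Number of instances needed for the current load, within policy bounds."""
    needed = math.ceil(current_rps / REQUESTS_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))


def reconcile(running: int, current_rps: float) -> int:
    """One evaluation cycle: compare running vs. desired and scale out or in."""
    target = desired_instances(current_rps)
    if target > running:
        print(f"{current_rps:>7.0f} rps: scale out {running} -> {target}")
    elif target < running:
        print(f"{current_rps:>7.0f} rps: scale in  {running} -> {target}")
    return target  # the fleet size after this cycle


if __name__ == "__main__":
    fleet = MIN_INSTANCES
    # A traffic surge followed by a slump.
    for rps in [600, 3_000, 7_500, 12_000, 2_000, 300]:
        fleet = reconcile(fleet, rps)
```

Running it prints a scale-out for each step of the surge (capped at 20 instances at 12,000 rps) and a scale-in as the traffic slumps back down.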
Auto-scaling is also known as auto-provisioning. The cloud auto-scaler’s logic runs on a predefined set of rules or policies.
How do we set these rules? I’ll talk about that further down in the article. Stay tuned.
2. Benefits of Autoscaling
There are several benefits to auto-scaling; the most important of them is high availability.
As I’ve mentioned earlier, when instances are added to the server fleet on the fly, the risk of the running instances crumbling under the load is greatly reduced.
Even if a few instances go down, the others are still up to serve user requests, and the application keeps running. We can also call this fault tolerance: with the help of auto-scaling, the platform is well equipped to handle failures.
With auto-scaling, instances can also be spun up across different, geographically separated availability zones to handle traffic spikes.
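Here’s a rough sketch of how a fleet manager might combine both ideas: it drops instances that fail their health checks, launches replacements to get back to the desired count, and places each new instance in the zone that currently has the fewest instances. The zone names, the `Instance` type and `heal_and_scale` are hypothetical; real auto-scalers handle health checks and zone balancing with far more nuance.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count

# Hypothetical zone names; real providers use their own identifiers.
ZONES = ["zone-a", "zone-b", "zone-c"]
_next_id = count(1)  # simple source of unique instance IDs


@dataclass
class Instance:
    id: int
    zone: str
    healthy: bool = True


def least_populated_zone(fleet: list[Instance]) -> str:
    """Zone with the fewest instances; ties go to the earliest zone in ZONES."""
    per_zone = Counter(inst.zone for inst in fleet)
    return min(ZONES, key=lambda zone: per_zone[zone])


def heal_and_scale(fleet: list[Instance], desired: int) -> list[Instance]:
    """Drop unhealthy instances, then launch replacements until the fleet is
    back at the desired size, spreading new instances across zones."""
    survivors = [inst for inst in fleet if inst.healthy]
    while len(survivors) < desired:
        zone = least_populated_zone(survivors)
        survivors.append(Instance(next(_next_id), zone))
    return survivors


if __name__ == "__main__":
    fleet = heal_and_scale([], desired=6)  # two instances per zone
    fleet[0].healthy = False               # simulate a failure in zone-a
    fleet = heal_and_scale(fleet, desired=6)
    print(Counter(inst.zone for inst in fleet))
```

The replacement for the failed instance lands back in zone-a, because that zone is the one left short, so the fleet stays evenly spread.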
Another upside of autoscaling is cost-effectiveness.
When the traffic subsides, the additional instances added earlier are terminated and removed from the fleet, so businesses pay only for the computing power they actually use.
This is one solid reason why serverless and FaaS (Functions as a Service) became so popular: they take this a step further, cutting deployment costs even more.
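A quick back-of-the-envelope comparison shows where the savings come from. The sketch below uses a made-up 24-hour traffic profile, an assumed per-instance capacity and an assumed hourly price, and compares a fleet pre-provisioned for the daily peak against one that runs only what each hour needs.

```python
import math

# Made-up hourly demand (requests/sec): night, morning, midday peak, afternoon, evening.
HOURLY_RPS = [200] * 7 + [1_500] * 4 + [4_000] * 3 + [1_500] * 6 + [600] * 4
RPS_PER_INSTANCE = 500          # assumed capacity of one instance
PRICE_PER_INSTANCE_HOUR = 0.10  # assumed price in USD, not a real quote


def instances_for(rps: float) -> int:
    """Instances needed for a given load, keeping at least one running."""
    return max(1, math.ceil(rps / RPS_PER_INSTANCE))


def pre_provisioned_hours(demand: list[float]) -> int:
    """Fixed fleet sized for the peak hour, running all day."""
    peak = max(instances_for(rps) for rps in demand)
    return peak * len(demand)


def autoscaled_hours(demand: list[float]) -> int:
    """Fleet resized every hour to match that hour's demand."""
    return sum(instances_for(rps) for rps in demand)


if __name__ == "__main__":
    for label, hours in [("pre-provisioned", pre_provisioned_hours(HOURLY_RPS)),
                         ("auto-scaled", autoscaled_hours(HOURLY_RPS))]:
        print(f"{label:>15}: {hours} instance-hours, ${hours * PRICE_PER_INSTANCE_HOUR:.2f}/day")
```

With these made-up numbers, the pre-provisioned fleet uses 192 instance-hours a day versus 69 for the auto-scaled one; the real ratio depends entirely on how spiky your traffic is.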
3. How Effective Is Auto Scaling in Comparison to the Regular Pre-provisioning of Servers/Instances on the Cloud?
Consider a real-world case: after its launch, a game literally exploded in popularity, and its traffic spiked up to 50x. This is a scenario where the cloud’s ability to auto scale really shone.
If it weren’t for the cloud’s auto-scaling ability, the limited pre-provisioned instance fleet would have crumbled under the massive traffic load.
There are different ways to auto scale. What are they? Let’s find out next.
4. Types of Autoscaling
The auto-scaler runs on predefined rules and policies. But why? Simply because businesses, especially startups, have limited resources.
They can’t just keep adding server instances on the fly; computing power costs serious money. So we have to set the rules and configurations as per our budget.
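One simple way to turn a budget into a policy rule is to cap the fleet size. The sketch assumes a flat per-instance hourly price and a fleet that could, in the worst case, run at its maximum size all month; the function name and the numbers are hypothetical.

```python
import math

HOURS_PER_MONTH = 730  # average: 8,760 hours per year / 12


def max_instances_for_budget(monthly_budget: float, price_per_instance_hour: float) -> int:
    """Largest fleet that stays within budget even if it runs at full size all month.

    Hypothetical helper: assumes a flat hourly price and ignores storage,
    network and other charges.
    """
    if price_per_instance_hour <= 0:
        raise ValueError("price must be positive")
    monthly_cost_per_instance = price_per_instance_hour * HOURS_PER_MONTH
    return math.floor(monthly_budget / monthly_cost_per_instance)


if __name__ == "__main__":
    # e.g. a $500/month budget at an assumed $0.10 per instance-hour
    print(max_instances_for_budget(500, 0.10))
```

A cap like this becomes the policy’s maximum instance count. The auto-scaler can still run fewer instances when traffic is quiet, so the actual bill can land below the budget.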
Scheduled autoscaling is proactive: we set up all the configuration upfront, such as how many instances to run at given times, the maximum number of instances that can be summoned into the fleet, CPU utilization thresholds and so on.
The scheduled autoscaling policy holds all the data that tells the autoscaler how to react to varying traffic patterns.
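Below is a minimal sketch of a time-based schedule, assuming the policy is expressed as hour-of-day windows that set the minimum and maximum fleet size. `ScheduledAction`, the windows and the instance counts are hypothetical; real providers let you express schedules in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledAction:
    start_hour: int     # inclusive, 0-23
    end_hour: int       # exclusive, 1-24
    min_instances: int
    max_instances: int


# Hypothetical schedule: extra capacity in business hours and an evening peak.
SCHEDULE = [
    ScheduledAction(9, 18, min_instances=6, max_instances=20),
    ScheduledAction(18, 23, min_instances=10, max_instances=30),
]
DEFAULT = ScheduledAction(0, 24, min_instances=2, max_instances=8)


def bounds_at(hour: int) -> tuple[int, int]:
    """Return (min, max) fleet size for the given hour of the day."""
    for action in SCHEDULE:
        if action.start_hour <= hour < action.end_hour:
            return action.min_instances, action.max_instances
    return DEFAULT.min_instances, DEFAULT.max_instances


if __name__ == "__main__":
    for hour in (3, 9, 17, 18, 23):
        print(hour, bounds_at(hour))
```

Because the bounds change before the traffic does, the fleet is already warm when the busy window starts.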
Predictive autoscaling uses machine learning to study the recent and historical traffic patterns and trends of each workload.
Based on that analysis, the right number of instances is provisioned ahead of time to serve the anticipated traffic.
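Real predictive autoscalers train machine-learning models on this history. As a much simpler stand-in that shows the idea, the sketch below forecasts the load for a given hour as the average of that same hour over past days, adds an assumed 20% headroom, and converts the result into an instance count. The averaging method, the headroom and the per-instance capacity are all assumptions for illustration, not how any particular provider predicts traffic.

```python
import math
from statistics import mean


def forecast_rps(history: list[list[float]], hour: int) -> float:
    """Forecast tomorrow's load for `hour` as the average of that hour on past days.

    `history` holds one 24-element list of requests/sec per past day.
    A deliberately simple stand-in for an ML model.
    """
    return mean(day[hour] for day in history)


def instances_to_preprovision(history: list[list[float]], hour: int,
                              rps_per_instance: float = 500,
                              headroom: float = 1.2) -> int:
    """Instances to have ready before `hour`, with an assumed 20% safety margin."""
    expected = forecast_rps(history, hour) * headroom
    return max(1, math.ceil(expected / rps_per_instance))


if __name__ == "__main__":
    # Three made-up days with an evening peak at 20:00.
    history = [[300.0] * 20 + [peak] + [300.0] * 3 for peak in (3_000.0, 4_000.0, 5_000.0)]
    print(instances_to_preprovision(history, 20))
    print(instances_to_preprovision(history, 4))
```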
Dynamic autoscaling spins instances up or down on the fly based on several metrics, such as the CPU utilization of the instances, load balancer utilization and monitoring metrics.
CPU Utilization
In the auto scale policy, a CPU utilization threshold for the cluster is set, say 75% or 80%, beyond which new instances start spinning up to share the load.
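Here’s a minimal sketch of a threshold rule built around the 75% figure above. The 30% scale-in threshold, the step size and the fleet bounds are hypothetical; keeping a gap between the scale-out and scale-in thresholds stops the fleet from flapping up and down when CPU hovers near a single threshold.

```python
from statistics import mean

SCALE_OUT_CPU = 75.0  # scale-out threshold, as suggested in the text
SCALE_IN_CPU = 30.0   # hypothetical scale-in threshold
MIN_INSTANCES, MAX_INSTANCES = 2, 20  # hypothetical fleet bounds


def cpu_policy(cpu_per_instance: list[float], step: int = 1) -> int:
    """Return the new fleet size given each running instance's CPU utilization (%)."""
    if not cpu_per_instance:
        return MIN_INSTANCES
    running = len(cpu_per_instance)
    avg_cpu = mean(cpu_per_instance)
    if avg_cpu > SCALE_OUT_CPU:
        target = running + step   # cluster is hot: add capacity
    elif avg_cpu < SCALE_IN_CPU:
        target = running - step   # cluster is mostly idle: release capacity
    else:
        target = running          # inside the comfort band: do nothing
    return max(MIN_INSTANCES, min(MAX_INSTANCES, target))


if __name__ == "__main__":
    print(cpu_policy([82.0, 91.0, 78.0, 88.0]))
```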
Load Balancer Utilization
Another trigger to spin up instances is the number of requests per second handled by the load balancer. Depending on the configured value, instances can be spun up or terminated.
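A common way to use a load balancer metric is target tracking: pick a target number of requests per second per instance and size the fleet so the actual value stays close to it. The sketch below borrows the formula documented for Kubernetes’ Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), along with a tolerance band (0.1, HPA’s default) so small fluctuations don’t trigger scaling. The target value and the function name are assumptions for illustration.

```python
import math

TARGET_RPS_PER_INSTANCE = 400.0  # hypothetical target set in the policy
TOLERANCE = 0.1  # ignore deviations within 10% of target


def target_tracking(running: int, lb_total_rps: float) -> int:
    """Size the fleet so each instance serves roughly TARGET_RPS_PER_INSTANCE.

    `lb_total_rps` is the request rate the load balancer reports across the fleet.
    """
    if running <= 0:
        raise ValueError("need at least one running instance")
    current_per_instance = lb_total_rps / running
    ratio = current_per_instance / TARGET_RPS_PER_INSTANCE
    if abs(ratio - 1.0) <= TOLERANCE:
        return running  # close enough to target: leave the fleet alone
    # desired = ceil(current * currentMetric / targetMetric)
    return max(1, math.ceil(running * ratio))


if __name__ == "__main__":
    print(target_tracking(4, 3_200))  # 800 rps per instance, twice the target
```

The same function handles both directions: a ratio above 1 grows the fleet, and a ratio below 1 shrinks it.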
Monitoring Metrics
Besides the above two metrics, monitoring metrics such as container stats are also considered when setting up the auto scale policy.
Ideally, all of the above auto scale types, policies and triggers are used in conjunction with each other to achieve the best results.
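When several policies are active at once, they can disagree. One reasonable resolution, and the one Kubernetes’ Horizontal Pod Autoscaler applies across multiple metrics, is to follow whichever policy asks for the most capacity, then clamp the result to the configured bounds. The sketch below assumes each policy has already produced an instance count; the policy names and numbers are hypothetical.

```python
def combine(recommendations: dict[str, int], min_instances: int, max_instances: int) -> int:
    """Merge per-policy instance counts into one fleet size.

    Follows the largest recommendation so no policy is starved of capacity,
    then clamps to the bounds (e.g. from a schedule or a budget cap).
    """
    if min_instances > max_instances:
        raise ValueError("min_instances cannot exceed max_instances")
    desired = max(recommendations.values(), default=min_instances)
    return max(min_instances, min(max_instances, desired))


if __name__ == "__main__":
    # Hypothetical recommendations from predictive, CPU and load balancer policies.
    recs = {"predictive": 10, "cpu": 7, "load_balancer": 8}
    print(combine(recs, min_instances=6, max_instances=20))
```

Taking the maximum is deliberately conservative: the fleet only shrinks once every policy agrees it can.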
5. More On the Blog
Guys, this is pretty much it on autoscaling. If you liked the article, do share it with your folks.
You can follow scaleyourapp.com on social media. Consider subscribing to browser notifications to stay updated on new content on the blog.
I’ll see you in the next writeup.