Automatic scaling is a feature offered by many cloud providers, such as AWS and Google Cloud Platform (GCP), that creates and deletes servers in your network automatically so your application can keep up with changing load.
What is automatic scaling?
Suppose you have two servers behind a load balancer, each handling half of your traffic. If demand grows, you can add a third server. However, demand is often cyclical and peaks at the same times every day, so adding and removing servers by hand quickly becomes impractical.
Automatic scaling handles this, as the name suggests, automatically. You define a template that is used to start a copy of your server from scratch. When your network reaches a predefined load, such as 70% CPU usage, the autoscaler launches a new instance to spread the load. When traffic calms down, it reduces the number of instances again.
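To make the "launch a new instance to even things out" step concrete, here is a minimal sketch of a target-tracking rule. It assumes load spreads evenly across instances and picks the smallest group size that brings average CPU back to the target. This is a simplification for illustration, not GCP's exact algorithm, and `recommended_size` is a hypothetical helper.

```python
import math

def recommended_size(current_size: int, avg_cpu: float, target_cpu: float) -> int:
    """Simplified target-tracking rule (an illustrative assumption, not GCP's
    exact algorithm): smallest group size that brings average CPU down to the
    target, assuming load is spread evenly across instances."""
    if current_size < 1:
        raise ValueError("current_size must be at least 1")
    if not 0 < target_cpu <= 1:
        raise ValueError("target_cpu must be in (0, 1]")
    total_load = current_size * avg_cpu  # total load in "whole-instance" units
    return max(1, math.ceil(total_load / target_cpu))

# Two servers at 90% CPU with a 70% target: 1.8 / 0.7 = 2.57, so 3 servers
print(recommended_size(2, 0.90, 0.70))  # 3
# Traffic calms down, two servers at 20%: 0.4 / 0.7 = 0.57, so 1 server
print(recommended_size(2, 0.20, 0.70))  # 1
```

Rounding up rather than to the nearest integer errs on the side of keeping utilization at or below the target.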
Setting up this template is not always easy, but GCP has tools to simplify it, such as the option to use a container image in place of a machine image.
While automatic scaling lets you scale up to meet any demand, it can also save money by scaling down when capacity isn't needed. With traditional server management, you have to plan for the highest demand: if your server can't handle peak traffic, you need a bigger server. That is usually a waste of money, because during off-peak hours, when your application isn't under heavy load, you pay for more capacity than you need.
Even if you only run one or two servers, automatic scaling helps your network handle peak activity, and it is a useful feature for any high-availability setup.
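The cost argument comes down to simple arithmetic. The sketch below compares server-hours for one day when you provision for the peak around the clock versus when capacity follows demand hour by hour. The hourly demand profile is made up for illustration, and the model ignores startup lag and any minimum instance count.

```python
# Hypothetical demand for one day, in "servers needed" per hour (made-up numbers)
hourly_demand = [1] * 8 + [2] * 4 + [4] * 4 + [2] * 4 + [1] * 4  # 24 hours

def server_hours(demand):
    """Return (fixed, autoscaled) server-hours for a demand profile."""
    fixed = max(demand) * len(demand)  # peak capacity provisioned all day
    autoscaled = sum(demand)           # capacity follows demand each hour
    return fixed, autoscaled

fixed, autoscaled = server_hours(hourly_demand)
print(fixed, autoscaled)  # 96 44
```

With this profile, provisioning for the peak pays for 96 server-hours to do 44 server-hours of work.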
Set up a managed instance group
From the GCP Management Console, select Compute Engine > Instance Groups.
You'll need an instance template, which defines what data is loaded onto your server and how a new node in the autoscaled group starts up. If you already have one, select it here. If not, you can read our guide on how to set one up.
Below that, you'll find the autoscaling settings. By default, the group scales both out and in, but you can disable scale-in so the group only ever grows. You can also choose the metric used for autoscaling; the default is a target CPU utilization of 60%.
The cool-down period is essentially how long a new server takes to start up. If your server needs a minute or two to get everything set up, you don't want GCP to read its metrics during that time, because it may report unexpectedly high CPU usage while it is still being configured.
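The scaling mode decides what the autoscaler is allowed to do with a recommendation. Here is a small sketch, assuming three modes that mirror the console choices (scale out and in, scale out only, off). The `Mode` enum and `apply_mode` helper are hypothetical, not part of any GCP library.

```python
from enum import Enum

class Mode(Enum):
    ON = "on"                          # may add and remove instances
    ONLY_SCALE_OUT = "only-scale-out"  # may add instances, never removes them
    OFF = "off"                        # makes no changes

def apply_mode(current_size: int, recommended: int, mode: Mode) -> int:
    """Return the group size the autoscaler would actually move to."""
    if mode is Mode.OFF:
        return current_size
    if mode is Mode.ONLY_SCALE_OUT:
        return max(current_size, recommended)  # ignore scale-in recommendations
    return recommended

print(apply_mode(3, 1, Mode.ON))              # 1
print(apply_mode(3, 1, Mode.ONLY_SCALE_OUT))  # 3
```

Disabling scale-in trades cost for stability: the group never shrinks on its own, which can help when instances hold warm caches or long-lived connections.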
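The sketch below shows why the cool-down period matters: samples from instances younger than the cool-down are left out of the average, so a fresh server's startup spike doesn't trigger another scale-out. The `InstanceSample` type and `average_cpu_ignoring_new` helper are hypothetical, and the 60-second default in the signature is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InstanceSample:
    name: str
    age_seconds: float  # time since the instance was created
    cpu: float          # CPU utilization, 0.0 to 1.0

def average_cpu_ignoring_new(samples: List[InstanceSample],
                             cooldown_seconds: float = 60) -> Optional[float]:
    """Average CPU over instances older than the cool-down period.
    Returns None if no instance has finished its cool-down yet."""
    settled = [s.cpu for s in samples if s.age_seconds >= cooldown_seconds]
    if not settled:
        return None
    return sum(settled) / len(settled)

samples = [
    InstanceSample("web-1", 3600, 0.50),
    InstanceSample("web-2", 3600, 0.60),
    InstanceSample("web-3", 30, 0.98),  # still booting, CPU spike ignored
]
print(average_cpu_ignoring_new(samples, cooldown_seconds=90))  # about 0.55
```

Setting the cool-down too short lets startup noise leak in; setting it too long makes the group slower to react to real load.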
You can also change the minimum and maximum number of instances to ensure performance and limit costs, respectively.
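The minimum and maximum act as hard bounds on whatever size the autoscaler recommends. The dictionary below is meant to resemble the `autoscalingPolicy` fields of the Compute Engine autoscaler REST resource (`minNumReplicas`, `maxNumReplicas`, `coolDownPeriodSec`, `cpuUtilization.utilizationTarget`, `mode`); verify the names against the current API reference. The values and the `clamp_to_policy` helper are illustrative assumptions.

```python
# Example values only; field names intended to mirror the autoscaler resource
policy = {
    "minNumReplicas": 2,    # floor: keeps enough capacity for baseline traffic
    "maxNumReplicas": 10,   # ceiling: caps spend during a traffic spike
    "coolDownPeriodSec": 90,
    "cpuUtilization": {"utilizationTarget": 0.6},
    "mode": "ON",
}

def clamp_to_policy(recommended: int, policy: dict) -> int:
    """Keep a recommended group size within the policy's min/max bounds."""
    lo = policy["minNumReplicas"]
    hi = policy["maxNumReplicas"]
    if lo > hi:
        raise ValueError("minNumReplicas cannot exceed maxNumReplicas")
    return max(lo, min(hi, recommended))

print(clamp_to_policy(1, policy))   # 2
print(clamp_to_policy(25, policy))  # 10
```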
The last feature is Autohealing, which regularly runs health checks against the services on each instance. If an instance starts misbehaving, it can be replaced automatically. A load balancer will route traffic away from an unhealthy instance on its own, but without autohealing nothing fixes the instance itself. We recommend enabling this feature.
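Autohealing decisions are typically based on consecutive failed health checks rather than a single miss, so one slow response doesn't cause a replacement. The tracker below is a hypothetical sketch of that idea; the `AutohealTracker` class and the threshold of 3 are assumptions, not GCP's implementation.

```python
from collections import defaultdict

class AutohealTracker:
    """Flag an instance for recreation after N consecutive failed health checks."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.unhealthy_threshold = unhealthy_threshold
        self.failures = defaultdict(int)  # instance name -> consecutive failures

    def record(self, instance: str, healthy: bool) -> bool:
        """Record one check result; return True if the instance should be recreated."""
        if healthy:
            self.failures[instance] = 0  # any success resets the streak
            return False
        self.failures[instance] += 1
        if self.failures[instance] >= self.unhealthy_threshold:
            self.failures[instance] = 0  # replacement instance starts with a clean slate
            return True
        return False

tracker = AutohealTracker(unhealthy_threshold=3)
results = [tracker.record("web-2", healthy=False) for _ in range(3)]
print(results)  # [False, False, True]
```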
Click “Create” and the minimum number of instances will be created. You can manage them individually from the Compute Engine console or manage the instance template to edit the settings for the entire group.
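If you'd rather script the same setup than click through the console, the `gcloud` CLI has `compute instance-groups managed create` and `set-autoscaling` commands. The sketch below only builds and prints the argument lists, so nothing runs until you choose to execute them. The group, template, and zone names are hypothetical placeholders, and you should check the flags against the `gcloud` reference for your SDK version.

```python
import shlex

def autoscaling_commands(group, template, zone, size, min_replicas, max_replicas,
                         target_cpu, cool_down="90s", mode="on"):
    """Build (not run) gcloud commands that create a managed instance group
    and attach an autoscaling policy to it."""
    create = [
        "gcloud", "compute", "instance-groups", "managed", "create", group,
        f"--template={template}", f"--size={size}", f"--zone={zone}",
    ]
    autoscale = [
        "gcloud", "compute", "instance-groups", "managed", "set-autoscaling", group,
        f"--min-num-replicas={min_replicas}",
        f"--max-num-replicas={max_replicas}",
        f"--target-cpu-utilization={target_cpu}",
        f"--cool-down-period={cool_down}",
        f"--mode={mode}",
        f"--zone={zone}",
    ]
    return create, autoscale

create, autoscale = autoscaling_commands(
    group="web-group", template="web-template", zone="us-central1-a",
    size=2, min_replicas=2, max_replicas=10, target_cpu=0.6)
for cmd in (create, autoscale):
    # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute it
    print(shlex.join(cmd))
```

Keeping the commands as lists rather than one shell string avoids quoting problems if a name ever contains spaces or special characters.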