What is Auto-scaling?
Auto-scaling is the ability to automatically add or remove server resources from your cloud computing environment.
We all know how daunting it is to add physical servers in an on-premise environment: it involves an entire process of space planning, power supply, procurement, installation, configuration, and so on.
With the cloud, it has become possible to add resources on demand at the click of a button. Auto-scaling goes a step further by adding or removing resources according to the application's actual usage, which brings cost efficiency because you pay only for the resources your applications consume.
Is there a need for auto-scaling?
Organizations face many situations in which server resources must be increased or decreased to match business needs, such as:
- When the number of live users on your application grows during events, campaigns, or a certain period of the year (e.g. Black Friday or Christmas), it is normal to start the process of adding server resources, whether on on-premise hardware or in the cloud. Once demand decreases, say, after the festival season, it becomes challenging to remove the additional servers, which causes cost overruns.
- While designing a test lab for performance testing in product organizations, additional servers are needed to perform various tests such as soak tests, overload tests, spike tests etc., and these servers remain underutilized for the period when there is no load testing.
- When multiple scrum teams work in parallel, their performance testing needs are hard to plan, making it challenging to optimize the timelines and delivery cycles for performance testing.
- Companies now deploy new features daily to stay engaged with their customers, triggering the need to increase and decrease server resources on demand.
- The cyclic nature of business causes increased traffic in certain months of the year, e.g. increased traffic on university websites during admission months, or on travel sites when people plan holidays during specific months.
When your application needs additional resources (server, memory, etc.) to accommodate the higher demand generated by increased user traffic, auto-scaling adds these resources and releases them again when traffic drops. You achieve auto-scaling by configuring a threshold for a specific resource consumption metric; if demand rises above this pre-configured limit, the cloud service provider automatically adds pre-defined resources (CPU, memory, etc.) in steps. Conversely, if the load on the system falls below a specific value, the cloud service provider automatically releases resources in pre-defined steps.
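The threshold behaviour described above can be illustrated with a minimal sketch. It assumes average CPU utilization as the monitored metric and a fixed step size; the `ScalingRule` and `next_capacity` names and all the numbers are hypothetical and do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical rule: thresholds, step size, and capacity limits."""
    scale_out_above: float  # add capacity when the metric exceeds this value
    scale_in_below: float   # release capacity when the metric drops below this value
    step: int               # instances added or removed per decision
    min_instances: int
    max_instances: int


def next_capacity(current: int, cpu_percent: float, rule: ScalingRule) -> int:
    """Return the instance count after one scaling decision."""
    if cpu_percent > rule.scale_out_above:
        desired = current + rule.step
    elif cpu_percent < rule.scale_in_below:
        desired = current - rule.step
    else:
        desired = current  # inside the band: leave capacity unchanged
    # Never go below the minimum or above the maximum.
    return max(rule.min_instances, min(rule.max_instances, desired))


rule = ScalingRule(scale_out_above=70.0, scale_in_below=30.0,
                   step=2, min_instances=2, max_instances=10)

print(next_capacity(4, 85.0, rule))  # 6: load above the limit, two instances added
print(next_capacity(4, 20.0, rule))  # 2: load below the limit, two instances released
print(next_capacity(4, 50.0, rule))  # 4: no change
```

Keeping a gap between the two thresholds is a common design choice: it prevents the system from adding and removing instances repeatedly when the load hovers around a single value.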
Benefits of auto-scaling
- It optimizes cost by removing resources when the load on the application is lower.
- It helps achieve higher system availability: during peak load the system gets additional resources, so the application does not crash due to a resource crunch.
- It handles the variations of user traffic on the websites.
- It removes the risk of human error when increasing or decreasing cloud resources.
- Adding and removing resources is automatic and can be configured as per the organization’s needs, with real-time monitoring of traffic and resource usage.
Monitoring additional resources with the increase in load
There are various tools and platforms to monitor the cloud resources, such as:
CloudWatch Metrics: CloudWatch collects metrics for the critical resources in the cloud environment. You can create a CloudWatch alarm to track a specific metric and send notifications to configured email addresses. Metrics can be evaluated over short periods, such as every minute, including the data reported by the load balancer.
Access Logs: Access logs record the requests sent to the load balancer; they can be analyzed to study traffic patterns and to identify the root cause of a problem.
Request Tracing: The load balancer adds a trace identifier to each request it receives, which helps monitor and analyze HTTP requests as they pass between nodes.
CloudTrail Logs: CloudTrail logs record the operations performed by every user, providing an audit trail of what was changed and by whom.
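As an illustration of the CloudWatch alarm mentioned above, here is a minimal sketch that builds the parameters for an alarm on the load balancer's request count, evaluated every minute. It assumes an Application Load Balancer and the `boto3` SDK; the alarm name, load balancer identifier, SNS topic ARN, and threshold are hypothetical placeholders. Email delivery is assumed to be handled by subscribing the addresses to the SNS topic.

```python
def build_request_count_alarm(load_balancer: str, sns_topic_arn: str,
                              max_requests_per_minute: int) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": "high-request-count",          # hypothetical name
        "Namespace": "AWS/ApplicationELB",          # Application Load Balancer metrics
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",                         # total requests in each period
        "Period": 60,                               # evaluate every minute
        "EvaluationPeriods": 3,                     # breach must last three periods
        "Threshold": float(max_requests_per_minute),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],            # SNS topic that notifies subscribers
    }


def create_alarm(params: dict) -> None:
    """Send the alarm definition to AWS (needs boto3 and valid credentials)."""
    import boto3  # third-party SDK, imported here so the sketch runs without it
    boto3.client("cloudwatch").put_metric_alarm(**params)


params = build_request_count_alarm(
    load_balancer="app/my-load-balancer/0123456789abcdef",          # placeholder
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder
    max_requests_per_minute=6000,
)
print(params["MetricName"], params["Period"])
```

Calling `create_alarm(params)` would register the alarm; it is left out of the example run because it requires an AWS account.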
To get the best results from auto-scaling, use an automatic scaling policy that is based on a predetermined target metric.
Hence, before configuring auto-scaling in your cloud environment, it is recommended to find the peak requests-per-second (RPS) your application can handle in production, along with the request latency at that load.
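A short sketch shows how a measured peak RPS can feed a target-based policy. It assumes capacity scales roughly linearly with the number of instances and uses a simple proportional rule as an approximation of target tracking, not any provider's exact algorithm; the function names and figures are hypothetical.

```python
import math


def instances_for_peak(peak_rps: int, rps_per_instance: int,
                       target_utilization: float) -> int:
    """Instances needed so each one runs at the target share of its tested limit."""
    return math.ceil(peak_rps / (rps_per_instance * target_utilization))


def target_tracking_capacity(current_instances: int, observed_metric: float,
                             target_metric: float) -> int:
    """Proportional rule: scale capacity by observed / target, never below one."""
    return max(1, math.ceil(current_instances * observed_metric / target_metric))


# Assumed measurements: each instance sustains 200 RPS at acceptable latency,
# and the expected peak is 1200 RPS with instances kept at 75% of their limit.
print(instances_for_peak(1200, 200, 0.75))     # 8

# Four instances averaging 900 RPS each against a 600 RPS target: scale out.
print(target_tracking_capacity(4, 900, 600))   # 6

# The same fleet averaging 300 RPS each: scale in.
print(target_tracking_capacity(4, 300, 600))   # 2
```

Setting the target below the tested limit leaves headroom for the time new instances take to start serving traffic.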
You should test your auto-scaling configuration to ensure it works as expected for increasing/decreasing user traffic on your endpoints.
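One way to prepare such a test is to define a traffic profile that ramps up to the peak, holds, and ramps back down, so that both scale-out and scale-in are exercised. The sketch below only generates the per-interval RPS targets; the `ramp_profile` helper is hypothetical, and replaying the profile against your endpoints is left to whichever load-testing tool you use.

```python
from typing import List


def ramp_profile(peak_rps: int, ramp_steps: int, hold_steps: int) -> List[int]:
    """Return per-interval RPS targets: ramp up, hold at peak, ramp down to zero."""
    up = [round(peak_rps * i / ramp_steps) for i in range(1, ramp_steps + 1)]
    hold = [peak_rps] * hold_steps
    down = up[-2::-1] + [0]  # mirror the ramp, excluding the peak, then stop
    return up + hold + down


profile = ramp_profile(peak_rps=100, ramp_steps=4, hold_steps=2)
print(profile)  # [25, 50, 75, 100, 100, 100, 75, 50, 25, 0]
```

While the profile runs, compare the instance count reported by your monitoring with what the scaling policy should produce at each interval.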
Yatender has 20+ years of experience in software test engineering. As the head of Testing Practice at IGT Solutions, Yatender is actively involved in innovations related to test engineering covering new tools, technologies, and solutions, and in enabling IGT’s clients to achieve faster time to market, quality improvement, and optimization of developer effort across the SDLC. He is a result-oriented leader, proficient in delivering high customer value and achieving excellence in service delivery management, with proven skills in consulting and managing large and complex test programs. When away from work, he enjoys reading on a variety of topics and spending time with kids.