Autoscaling containers using local request counters

Question

I came across a description of the different approaches that were used to scale our webapp, one of which was scaling using local request counters. It also listed a drawback of this approach:

Each instance would reach the threshold almost at the same time and hence each would demand a new instance, leading to a large number of instances even though the number should have increased by one only

I was curious to know if there's a solution to this problem, or any workaround?

score 3 · Answer 1 · answered Apr 20 '17 at 08:17

The problem can easily be solved using the following components:

On the instances serving your webapp, continue to monitor the number of incoming requests, and anything else you see fit.
Publish the number of incoming requests to a monitoring system. If this is not yet implemented, this step will improve your monitoring capabilities and help you monitor the load on each host as well as how well the load is balanced across hosts.
From the incoming requests, deduce the estimated number of webapp servers needed to serve that workload, as well as the difference between the actual number and the estimated number. In the case you describe, it seems that the estimated number is just a scalar function of the total number of incoming requests over a recent period of time. On other systems, or after some time, more subtle strategies can be implemented. Monitoring these quantities and their difference makes the auto-scaling strategy easier to trace and lets you monitor its responsiveness.
Last, implement the auto-scaling itself. At this point, this is really just reading a number from your monitoring system and writing it to your scaling system.
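
As a rough illustration of the third step, here is a minimal Python sketch. The function names and the `capacity_per_instance` parameter are assumptions made for the example, not part of any particular monitoring or scaling system; the "scalar function" is taken to be a simple division of the total request rate by an assumed per-instance capacity.

```python
import math


def desired_instances(per_instance_rates, capacity_per_instance, min_instances=1):
    """Estimate how many servers the total load needs.

    per_instance_rates: recent request rates published by each instance.
    capacity_per_instance: assumed rate one instance can serve (hypothetical).
    """
    total_rate = sum(per_instance_rates)
    needed = math.ceil(total_rate / capacity_per_instance)
    return max(min_instances, needed)


def scaling_delta(per_instance_rates, capacity_per_instance):
    """Difference between the estimated and the actual number of instances."""
    actual = len(per_instance_rates)
    return desired_instances(per_instance_rates, capacity_per_instance) - actual
```

With three instances at 90, 95 and 100 requests per second and an assumed capacity of 80 each, the total of 285 calls for 4 instances, so the delta is +1 even though all three instances are above their individual capacity.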

score 2 · Answer 2 · answered Apr 20 '17 at 18:49

One possible approach is to allow such instances to make demands for new instances based on local request counters, but instead of directly reacting to those demands you would funnel them to a central instance creation logic.

That logic would immediately react to the first demand, but also start a "cool off" countdown timer. Any subsequent demand received while the timer is still active would be considered to be caused by the same traffic spike that triggered the first demand and would thus be ignored.
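
A minimal Python sketch of such a central logic follows. The class name, the `launch_instance` callback and the injectable `clock` are assumptions made for illustration; a real deployment would call its own provisioning system here.

```python
import threading
import time


class ScaleUpCoordinator:
    """Acts on the first demand, then ignores demands during the cool-off."""

    def __init__(self, cooldown_seconds, launch_instance, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.launch_instance = launch_instance  # hypothetical provisioning hook
        self.clock = clock
        self._cooldown_until = None
        self._lock = threading.Lock()

    def demand(self):
        """Called when an instance asks for more capacity.

        Returns True if a new instance was launched, False if the demand
        fell inside the cool-off period and was ignored.
        """
        with self._lock:
            now = self.clock()
            if self._cooldown_until is not None and now < self._cooldown_until:
                return False
            self._cooldown_until = now + self.cooldown_seconds
        self.launch_instance()
        return True
```

If three instances all call `demand()` within the same cool-off period, only the first call launches an instance; the other two return `False`.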

A similar logic could be used to gradually shut down idle instances, if the local counters remain below a minimum level.

Note: the local counters would need to operate in a leaky bucket manner or be periodically reset for such an approach to be possible, so that the demands are repeated if the traffic remains high.
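
One possible shape for a leaky-bucket local counter is sketched below in Python. The class name, the `leak_per_second` and `threshold` parameters and the injectable `clock` are assumptions for the example. The level rises by one per request and drains at a constant rate, so it stays above the threshold only while traffic remains high.

```python
import time


class LeakyBucketCounter:
    """Local request counter whose level drains at a constant rate."""

    def __init__(self, leak_per_second, threshold, clock=time.monotonic):
        self.leak_per_second = leak_per_second
        self.threshold = threshold
        self.clock = clock
        self.level = 0.0
        self._last = clock()

    def _leak(self):
        # Drain the bucket according to the time elapsed since the last update.
        now = self.clock()
        drained = (now - self._last) * self.leak_per_second
        self.level = max(0.0, self.level - drained)
        self._last = now

    def record_request(self):
        self._leak()
        self.level += 1.0

    def over_threshold(self):
        """True while the instance should keep demanding more capacity."""
        self._leak()
        return self.level > self.threshold
```

An instance would check `over_threshold()` periodically and send a demand each time it returns `True`; once the spike is over, the bucket empties and the demands stop on their own.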

Another possible approach is to just publish the local counters. A central piece of logic would periodically collect and aggregate these local counter values in order to decide on launching new instances or shutting down existing ones.

The advantage of this method is that there is no single global counter, which would require write-access locking (to prevent corruption) and would therefore limit scalability.

This method is described in more detail in Sharding counters.
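
The idea can be sketched in Python as follows. This is only an in-memory illustration under assumed names (`ShardedRequestCounter`, `decide`, and the two per-instance thresholds); in practice the shards would live in whatever store or monitoring system the instances publish to. Each instance writes only its own shard, and only the central reader sums them.

```python
class ShardedRequestCounter:
    """One shard per instance, so writers never contend on a shared value."""

    def __init__(self):
        self._shards = {}

    def publish(self, instance_id, requests_in_window):
        # Each instance overwrites only its own shard.
        self._shards[instance_id] = requests_in_window

    def total(self):
        # Only the central logic reads across all shards.
        return sum(self._shards.values())

    def instance_count(self):
        return len(self._shards)


def decide(counter, high_per_instance, low_per_instance):
    """Central decision based on the aggregated, not the local, load."""
    count = counter.instance_count()
    if count == 0:
        return "hold"
    average = counter.total() / count
    if average > high_per_instance:
        return "scale_up"
    if average < low_per_instance and count > 1:
        return "scale_down"
    return "hold"
```

Because the decision is made once per collection period on the aggregate, a spike that pushes every instance over its limit still results in a single scale-up decision.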
