One possible approach is to allow such instances to make demands for new instances based on local request counters, but instead of directly reacting to those demands you would funnel them to a central instance creation logic.
That logic would immediately react to the first demand, but also start a "cool off" countdown timer. Any subsequent demand received while the timer is still active would be considered to be caused by the same traffic spike that triggered the first demand and would thus be ignored.
Similar logic could be used to gradually shut down idle instances if the local counters remain below a minimum level.
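A minimal sketch of the central "cool off" logic, assuming a hypothetical `launch_instance` callback that starts one instance and an arbitrary 60-second cool-off period (neither comes from a real API):

```python
import time


class DemandFunnel:
    """Central point that receives scale-up demands from all instances."""

    def __init__(self, launch_instance, cool_off_seconds=60.0, clock=time.monotonic):
        self._launch_instance = launch_instance  # hypothetical callback, assumed to start one instance
        self._cool_off_seconds = cool_off_seconds  # illustrative value, tune for your traffic
        self._clock = clock
        self._cool_off_until = None  # no timer running yet

    def demand(self):
        """Handle one demand; return True if it triggered a launch."""
        now = self._clock()
        if self._cool_off_until is not None and now < self._cool_off_until:
            # Timer still active: attribute this demand to the same spike and ignore it.
            return False
        # First demand (or timer expired): react immediately and start the timer.
        self._cool_off_until = now + self._cool_off_seconds
        self._launch_instance()
        return True
```

The `clock` parameter is only there so the timer can be replaced in tests. A scale-down variant could reuse the same structure with a separate timer and a shutdown callback.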
Note: the local counters would need to operate in a leaky bucket manner or be periodically reset for such an approach to be possible - so that the demands are repeated if the traffic remains high.
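One way such a local counter might look, as a rough sketch: the class name, the leak rate and the threshold are all assumptions chosen for illustration. The level drains at a constant rate, so demands keep being raised only while requests arrive faster than the bucket leaks.

```python
import time


class LeakyBucketCounter:
    """Per-instance request counter whose level drains at a constant rate."""

    def __init__(self, leak_per_second, demand_threshold, clock=time.monotonic):
        self._leak_per_second = leak_per_second  # assumed drain rate
        self._demand_threshold = demand_threshold  # assumed level that signals overload
        self._clock = clock
        self._level = 0.0
        self._last_update = clock()

    def _leak(self):
        # Drain the bucket for the time elapsed since the last update, never below zero.
        now = self._clock()
        elapsed = now - self._last_update
        self._level = max(0.0, self._level - elapsed * self._leak_per_second)
        self._last_update = now

    def record_request(self):
        """Count one request; return True if a demand should be raised."""
        self._leak()
        self._level += 1.0
        return self._level > self._demand_threshold

    def level(self):
        """Return the current level after applying the leak."""
        self._leak()
        return self._level
```

An instance would call `record_request()` for each incoming request and send a demand to the central logic whenever it returns `True`.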
Another possible approach is to just publish the local counters. A central piece of logic would periodically collect and aggregate these local counter values in order to decide on launching new instances or shutting down existing ones.
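The periodic aggregation step could be sketched roughly as follows. The function name, the `target_per_instance` parameter and the sizing rule (keep the average load per instance at or below a target) are assumptions made for illustration; how the counters are published and collected is left out.

```python
def decide_scaling(local_counts, target_per_instance, min_instances=1):
    """Return the suggested change in instance count (+ to launch, - to shut down).

    local_counts: mapping of instance id -> requests counted in the last period.
    target_per_instance: assumed number of requests one instance should handle per period.
    """
    total = sum(local_counts.values())
    # Ceiling division: instances needed to keep each one at or below the target.
    needed = max(min_instances, -(-total // target_per_instance))
    return needed - len(local_counts)
```

For example, with counters of 90, 120 and 95 and a target of 100 requests per instance, the total of 305 suggests four instances, so the function returns `1`. With counters of 10, 5 and 0 it returns `-2`.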
The advantage of this method is that it avoids a single global counter, which would require write-access locking (to prevent corruption) and would therefore limit scalability.
This method is described in more detail in Sharding counters.