6

At the moment I have an AutoScaling Group (ASG) of GoCD build agents without any scaling policies. I have created some custom metrics that indicate how many build agents are currently idle and I'd like to scale based off of that. My concern is that when scaling down, the ASG may terminate instances that are in the middle of a build. This may result in failed builds and delayed builds.

How can I scale down an ASG without terminating instances that are in use?

030
user2640621

3 Answers

3

Auto Scaling groups have a useful feature for this, named lifecycle hooks.

Workflow, taken from the documentation above:

[Figure: Lifecycle workflow]

As you can see, there is a scale-in step that puts the instance into the Terminating:Wait state and sends a notification that the instance is about to be terminated. The instance can then finish its work and, once done, signal that it is OK to terminate it.
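As a rough sketch of how such a hook could be registered with boto3: the helper names, the hook name `drain-build-agent` and the one-hour timeout below are illustrative assumptions, not values from the question. `put_lifecycle_hook` is the boto3 Auto Scaling client call; the client is passed in so the sketch does not depend on credentials.

```python
def terminating_hook_params(asg_name, hook_name="drain-build-agent",
                            heartbeat_timeout=3600):
    """Build the arguments for a scale-in (terminating) lifecycle hook.

    hook_name and heartbeat_timeout are hypothetical defaults for illustration.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        # Pause the instance in Terminating:Wait before it is terminated.
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        # Seconds the instance stays in the wait state unless a heartbeat
        # is recorded or the action is completed.
        "HeartbeatTimeout": heartbeat_timeout,
        # Result applied if the timeout elapses without a completion signal.
        "DefaultResult": "CONTINUE",
    }


def register_hook(asg_client, asg_name, **overrides):
    """Register the hook using a boto3 'autoscaling' client (or a stand-in)."""
    params = terminating_hook_params(asg_name, **overrides)
    asg_client.put_lifecycle_hook(**params)
    return params
```

With real credentials this might be called as `register_hook(boto3.client("autoscaling"), "gocd-agents")`, where `gocd-agents` is a placeholder group name.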

If the task can take longer than the lifecycle hook's HeartbeatTimeout parameter, you can reset the timeout (quoting again from the same page):

Restart the timeout period by recording a heartbeat, using the record-lifecycle-action-heartbeat command or the RecordLifecycleActionHeartbeat operation. This increments the heartbeat timeout by the timeout value specified when you created the lifecycle hook. For example, if the timeout value is 1 hour, and you call this command after 30 minutes, the instance remains in a wait state for an additional hour, or a total of 90 minutes.

So in your case, the lifecycle notification should start a script or program that will:

  • prevent this build agent from receiving new builds
  • loop periodically to check whether the ongoing builds are done
  • if a build is still in progress and the timeout is near, reset the timer
  • if no builds are in progress anymore, signal that termination can proceed
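The steps above could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions: `disable_agent` and `is_building` are hypothetical callables you would implement against your GoCD server, `hook` is assumed to be a dict holding `LifecycleHookName`, `AutoScalingGroupName` and `InstanceId` from the notification, and the heartbeat is sent every few polls rather than by measuring how close the timeout is. `record_lifecycle_action_heartbeat` and `complete_lifecycle_action` are the boto3 Auto Scaling client calls.

```python
import time


def drain_and_release(asg_client, hook, is_building, disable_agent,
                      poll_seconds=30, polls_per_heartbeat=10,
                      sleep=time.sleep):
    """Drain a build agent, then let the Auto Scaling group terminate it.

    asg_client: a boto3 'autoscaling' client (or a stand-in with the same methods).
    is_building / disable_agent: hypothetical callbacks, not part of any library.
    Returns the number of polls during which a build was still running.
    """
    # Step 1: stop this agent from being assigned new builds.
    disable_agent()

    polls = 0
    # Step 2: poll until the ongoing builds are done.
    while is_building():
        polls += 1
        # Step 3: restart the hook timeout periodically so the instance is
        # not terminated while a build is still in progress.
        if polls % polls_per_heartbeat == 0:
            asg_client.record_lifecycle_action_heartbeat(**hook)
        sleep(poll_seconds)

    # Step 4: no builds left, signal that termination can proceed.
    asg_client.complete_lifecycle_action(LifecycleActionResult="CONTINUE", **hook)
    return polls
```

Choose `poll_seconds * polls_per_heartbeat` comfortably below the hook's HeartbeatTimeout, so the heartbeat is always recorded before the wait state expires.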
Tensibai
2

My concern is that when scaling down, the ASG may terminate instances that are in the middle of a build.

If builds are running, then scaling down should be prevented. Once the builds have completed, the group can scale down, as the resources are no longer required.

030
0

I think there is an answer here: this can be managed using the GoCD API. https://amaysim.engineering/auto-scaling-build-agents-for-gocdtags-a10f12d5b77c