
This is my basic setup:

  • I run a server (DL380 G7, Linux 3.13 kernel) that hosts ~10 virtual machines
  • It is set for automatic power-on
  • I use NUT for UPS management
  • A graceful shutdown of the host (including shutting down the VMs first) takes ~8-10 minutes
  • Total runtime of the UPS units (I have two, each powering one PSU in the server and one PSU in the attached storage) on fully charged batteries is ~75 minutes.
  • I have configured UPS/NUT so that the critical level (LOWBATT), i.e. the point at which shutdown is initiated, is reached at 15 minutes of remaining runtime (I dare not go lower)

The following scenario has happened to me twice in the last 12 months:

  • Power loss, the UPS units take over just fine
  • Power remains off for about 1 hour -> shutdown initiated, as it should be
  • The server stops the VMs and begins its shutdown procedure
  • --> sometime here power comes back
  • Server completes shutdown and powers off
  • Server does not come back online: the UPS units have power again and the server never actually lost power (it was supplied by the UPS units throughout), so to the server it looks like an intentional graceful shutdown.
  • As soon as I become aware, I remotely power on the server via iLO [the last time this happened was today at 03:46 am :-), which is why I am asking]

As ewwhite has pointed out, the specific UPS models would be helpful:

  • Eaton 5PX 2200VA, with +1 EBM
  • Roline Prosecure II, 1500VA RM2U, with +1 EBM

Have any of you run into the same problem? Is there an out of the box solution with some UPSes?

So far I have considered setting up some low-power Linux device (Raspberry Pi?) to take over the monitoring; it would check the UPS units for sufficient battery charge and input power status and then restart the server via iLO/IPMI.
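The monitoring idea could be sketched roughly as below. This is only an illustration under several assumptions: that `upsc` on the monitoring device can reach a `upsd` that stays up while the server is off, that IPMI over LAN is enabled on the management processor, and that `ipmitool` is installed. The UPS names, the host `ilo.example.lan`, the credentials and the 50% charge threshold are all hypothetical placeholders.

```python
import subprocess

def parse_upsc(text):
    """Parse `upsc` output ("key: value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values

def ups_is_healthy(values, min_charge=50.0):
    """True if the UPS is on line power (OL) and charged to at least min_charge percent."""
    flags = values.get("ups.status", "").split()
    try:
        charge = float(values.get("battery.charge", "0"))
    except ValueError:
        return False
    return "OL" in flags and charge >= min_charge

def should_power_on(ups_outputs, chassis_status, min_charge=50.0):
    """Power on only if the chassis is off and every UPS is healthy."""
    if not ups_outputs:
        return False
    chassis_off = "is off" in chassis_status.lower()
    return chassis_off and all(
        ups_is_healthy(parse_upsc(text), min_charge) for text in ups_outputs
    )

def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def main():
    # Hypothetical names, host and credentials: replace with your own.
    ups_names = ["eaton@localhost", "roline@localhost"]
    ipmi = ["ipmitool", "-I", "lanplus", "-H", "ilo.example.lan",
            "-U", "monitor", "-P", "secret"]
    outputs = [run(["upsc", name]) for name in ups_names]
    status = run(ipmi + ["chassis", "power", "status"])
    if should_power_on(outputs, status):
        run(ipmi + ["chassis", "power", "on"])
```

Calling `main()` from a cron job every few minutes would be one way to use it. Requiring every UPS to be on line power with a minimum charge avoids restarting the server into a second outage with nearly empty batteries.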

Is any automatic solution just too much bother (for my case and in general) and should I just go with manual intervention when and if it happens?

regards

martin

ewwhite
martin

3 Answers


This is a case where you shouldn't be using two UPS units, where each feeds a power supply. That may be a big part of your problem, as a single UPS can restore the previous power state following an outage (this is the default in the HP ProLiant BIOS as well). Having two seems to mess up this logic.

Are you connected to the UPS via serial or USB cable?

See the specific suggestions at:
How to wake a server after UPS Shuts it down when Mains power is restored?

This should be easy to test, but to be honest, I spend very little time dealing with these edge cases. Server room power is one of the easiest things to plan for, in that you can spec x-hours of battery runtime and be able to ride through power-loss scenarios like this.

If the outages are longer, I just make sure I can remote in and handle things manually.

ewwhite

An alternative solution that requires no hardware change is to set up the shutdown process to reboot if the UPS has power after all the VMs have shut down. This involves figuring out where in the shutdown process to put your init script, and you need to make sure that NUT isn't stopped beforehand, since you need it to communicate with your UPS.
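The decision such a late shutdown hook has to make can be sketched as follows. This is a minimal illustration, not a drop-in init script: it assumes the hook can still obtain `upsc` output for each UPS at that point, and where it is wired into the shutdown sequence is left open. The function name `final_action` is hypothetical.

```python
def ups_flags(upsc_output):
    """Extract the ups.status flags (e.g. ["OL", "CHRG"]) from `upsc` output."""
    for line in upsc_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "ups.status":
            return value.split()
    return []

def final_action(upsc_outputs):
    """Return "reboot" if every UPS is on line power and not low, else "poweroff"."""
    if not upsc_outputs:
        return "poweroff"
    for output in upsc_outputs:
        flags = ups_flags(output)
        # OL = on line power, LB = low battery (standard NUT status flags).
        if "OL" not in flags or "LB" in flags:
            return "poweroff"
    return "reboot"
```

Treating a missing or unreadable status as "poweroff" keeps the original behaviour whenever the UPS state is uncertain.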

Are you sending a shutdown command to the UPS at the end of the server shutdown? If not, you could consider doing that. You can then set the delay before the UPS shuts down so that your server has really finished its shutdown, and also a timeout between the UPS going down and it powering the server back up once power is back. If power returns before the shutdown has completed, your server will still be powered down completely by the shutdown command, but it will be brought back up after the timeout.

Check the NUT upscmd shutdown.return and the associated timeouts.
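As a rough sketch of how those pieces fit together, the helper below only builds the `upsrw` and `upscmd` command lines and does not execute anything. It assumes the driver for your UPS exposes `ups.delay.shutdown` and `ups.delay.start` as writable variables and supports `shutdown.return`, which is worth checking with `upsrw` and `upscmd -l` first. The UPS name, user, password, and the margin and start delay values are hypothetical.

```python
def upsrw_set(ups, variable, value, user, password):
    """Build an `upsrw` command line that sets one UPS variable."""
    return ["upsrw", "-s", f"{variable}={value}", "-u", user, "-p", password, ups]

def upscmd_run(ups, command, user, password):
    """Build an `upscmd` command line that issues one instant command."""
    return ["upscmd", "-u", user, "-p", password, ups, command]

def plan_shutdown_return(ups, user, password, host_shutdown_s,
                         margin_s=120, start_delay_s=60):
    """Commands to make the UPS cut power after the host is down and return later.

    host_shutdown_s is the time the host still needs to finish shutting down
    once the command is sent; margin_s is extra safety time on top of that.
    """
    off_delay = host_shutdown_s + margin_s
    return [
        upsrw_set(ups, "ups.delay.shutdown", off_delay, user, password),
        upsrw_set(ups, "ups.delay.start", start_delay_s, user, password),
        upscmd_run(ups, "shutdown.return", user, password),
    ]

# Example: the host needs up to 10 minutes (600 s) to shut down.
for cmd in plan_shutdown_return("eaton@localhost", "admin", "secret", 600):
    print(" ".join(cmd))
```

The shutdown delay has to exceed the remaining host shutdown time, otherwise the UPS cuts power while the VMs are still stopping.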

Baruch Even

A ghetto solution, but it works. Get a small MikroTik router or a Linux board and set it up to send Wake-on-LAN packets. Power the device directly from the mains, without UPS backup, and configure it to send a Wake-on-LAN packet every 30 seconds to 1 minute. While it has no power it sends nothing, but as soon as power is back it sends WOL packets at that interval. That way your server never stays shut down while mains power is on.
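For a Linux board, the sender could look something like this sketch. It assumes Wake-on-LAN is enabled on the server's NIC and in its firmware settings, and that the board sits in the same broadcast domain; the MAC address shown in the usage note is a placeholder.

```python
import socket

def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF followed by the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send one magic packet as a UDP broadcast."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_wol("00:11:22:33:44:55")` once a minute from cron gives the behaviour described above. A running server simply ignores the packets.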

Cory Knutson