
I am testing and diagnosing my UPSs as well as the power supplies (PSUs) of my server. To do that, I power the server down "the hard way", by unplugging it from the wall to simulate a power loss.

This approach has helped me find which UPSs are not working properly and which PSUs need replacing (if the server shuts down, something needs replacing; otherwise everything is OK). However, I am starting to worry that constantly unplugging my server and "killing" it the hard way may damage it or my data.

This leads me to my question: is there an alternative way of performing these tests that minimizes the chance of damaging the server or its parts? Or is there no problem with what I am currently doing?

Again, I am trying to determine which power supplies are defective (that is, the UPS is OK but the server dies anyway when unplugged). I can test the UPSs on their own, without the server attached, but I can't figure out how to test whether my PSUs can handle fluctuations and spikes without actually trying them on a live server. Any guidance is greatly appreciated.


The server in question: an HP ProLiant DL380 G7 with an Intel Xeon. Its HDDs are configured as RAID 1, and it runs Ubuntu 16.04.3 LTS from its SSDs.

3 Answers


You have an HP ProLiant DL380 G7. Look at the following:

The Systems Insight Display (SID) shows the health of the internal components.

If you have an amber light for either of the power supplies, shown on the SID or on the physical units themselves, there's a problem.

You can also log into the server's iLO to check the Integrated Management Log. If you lose power suddenly, there may be entries in the log indicating something like:

- Server reset.
- Server power removed. 
- Server power restored.
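
If the log is exported to plain text, a short filter can pull out the power-related entries after each test. This is only a minimal sketch: it assumes a hypothetical export with one entry per line, and the sample lines are the illustrative messages above, not real iLO output. The keyword list and the `power_events` name are my own.

```python
# Hypothetical keywords, taken from the illustrative entries above.
POWER_KEYWORDS = ("server reset", "power removed", "power restored")


def power_events(lines):
    """Return the log lines that mention a power-related event."""
    hits = []
    for line in lines:
        # Drop surrounding whitespace and a leading list dash, if any.
        text = line.strip().lstrip("- ").strip()
        if any(keyword in text.lower() for keyword in POWER_KEYWORDS):
            hits.append(text)
    return hits


# Assumed sample input; a real export will look different.
sample = [
    "- Server reset.",
    "- Server power removed. ",
    "- Server power restored.",
    "- Some unrelated entry.",
]

for entry in power_events(sample):
    print(entry)
```

Matching on keywords rather than exact strings is deliberate, since the real wording of the entries may differ from the examples.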

You have the option of not connecting both power supplies to the same UPS. Connect one to the power mains and observe the behavior.

Check the firmware on your system. G7 servers are old now, and because you're running Ubuntu, you're probably missing the HP reporting and management agents (they're optimized for RHEL/CentOS/VMware/Windows). You can download the full set of firmware for this model using this HP bootable DVD.

ewwhite

Do not unplug your UPS from the wall. I asked a similar question 9 years ago on this site and got the following answer from Evan Anderson:

The UPS is losing its electrical ground when you unplug it from the wall. While it's unlikely that anything would go wrong, the UPS designers "expect" that path to ground to remain available at all times, and if something did short during your test you might see sparks (smoke, flame, etc) when the electricity takes another path to ground. I've unplugged UPSs from the wall for testing before, but seeing a flash of "lightning" and hearing a loud "bang" coming out of a UPS during one such test gave me "religion" about not doing that again.

So if you are on a switched outlet, switch it off. If you're not on a switched outlet, consider flipping the breaker instead, so that the ground path stays connected.

As for disconnecting your servers by pulling the plugs, you shouldn't be doing any physical damage to the machines by doing that. You may corrupt non-battery-backed RAID arrays or disrupt in-flight writes, which can cause messy file systems and data loss, but your physical servers should be fine.

As for your actual problem, which is that you still lose your servers during brownouts, blackouts, or surges even though they are behind a UPS, there are a few things that might cause this:

  1. If you have dual power supplies in your servers and one of them is on the UPS and one is not (which is common enough), you may have a fault in the PSU switching inside the server.
  2. Again, if you have dual power supplies, perhaps one of them is overloading and the server is shutting down for safety reasons.
  3. Depending on the type of UPS you have, it may no longer be functioning correctly. I once had a site with constant brownouts: 20 to 30 times a day the power would drop below 200 V (normally 230 V) and the UPS would go into boost mode, and sometimes the voltage would spike to 250 V and the UPS would go into buck mode. This shortened the life of the traditional UPS dramatically (I typically got around a year out of each UPS). We switched to a double-conversion UPS (also called an online UPS), which solved this issue.
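
To make the boost/buck behavior in point 3 concrete, here is a rough sketch that classifies voltage readings and counts how often the UPS would change mode. The 200 V and 250 V thresholds are taken from the anecdote above, not from any UPS datasheet, and the readings are made up for illustration; real units use their own transfer thresholds.

```python
def ups_mode(voltage, boost_below=200.0, buck_at=250.0):
    """Classify a line voltage (nominal 230 V) into an assumed UPS mode."""
    if voltage < boost_below:
        return "boost"   # input sagging, UPS raises the output voltage
    if voltage >= buck_at:
        return "buck"    # input too high, UPS lowers the output voltage
    return "normal"


def count_transfers(readings):
    """Count how many times the mode changes across a series of readings."""
    modes = [ups_mode(v) for v in readings]
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)


# Hypothetical readings in volts, invented for this example.
readings = [230, 195, 228, 251, 231, 198, 230]
print([ups_mode(v) for v in readings])
print("mode changes:", count_transfers(readings))
```

Every mode change is a transfer the UPS has to perform, which is why a site with dozens of sags and spikes a day is hard on a UPS that regulates this way.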

Two notes:

First, the best way to connect the UPS is to take advantage of the dual power supplies in your servers. Then, if either the mains power or the UPS (battery) fails, everything stays up.

Second, apart from what was said about losing ground, it's not bad to unplug a server (if you don't care about data corruption), with the exception of the SSDs. Depending on which SSDs you have, they may have a supercapacitor to deal with sudden power loss. Without one, losing power could damage blocks that are being erased or written.

Edit about the dual power supply: the correct way is to plug one power supply into the wall and the other into the UPS. It would be wrong to connect only one power supply, or to connect both through the UPS. If you do that, a failed UPS self-test will interrupt power to the server, and you can't turn the UPS off to replace its battery.

Of course, one doesn't have that luxury with servers without dual PSU.
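
The wiring argument can be checked with a toy model. This is a simplified sketch under my own assumptions: the server stays up if at least one PSU has a live feed, a mains outage leaves the UPS output live (on battery), and a UPS fault kills the UPS output while the wall stays live. All names here are illustrative.

```python
# Each wiring lists the feed that every PSU is plugged into.
WIRINGS = {
    "one PSU on wall, one on UPS": ("wall", "ups"),
    "both PSUs on UPS": ("ups", "ups"),
    "single PSU on UPS": ("ups",),
}

# Each scenario says which feeds are still live (assumed, simplified).
SCENARIOS = {
    "mains outage": {"wall": False, "ups": True},
    "UPS fault": {"wall": True, "ups": False},
}


def server_stays_up(feeds, live):
    """The server survives if any of its PSUs still has a live feed."""
    return any(live[feed] for feed in feeds)


for wiring, feeds in WIRINGS.items():
    for scenario, live in SCENARIOS.items():
        state = "up" if server_stays_up(feeds, live) else "DOWN"
        print(f"{wiring:30} | {scenario:12} | {state}")
```

In this model only the split wiring survives both scenarios, which matches the recommendation above. It ignores battery runtime and a simultaneous failure of both feeds.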

Halfgaar