CentOS 5.10 / VMware ESXi 5.1

I've got an older email server running CentOS 5.10 (with Sendmail) that is experiencing intermittent hangs in which the system becomes completely unresponsive. During these times, I can't connect to it at all, and the virtual console doesn't respond either.

The strange part is that our VMware admin group isn't seeing any obvious resource or load spikes that would indicate insufficient resources. Furthermore, when I examine the system logs (e.g. maillog, messages), there's a noticeable absence of ALL log activity during the time of the hang, which suggests that these outages are severe enough to prevent logging (or perhaps there's a filesystem/disk issue).
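To pin down exactly when each hang starts and ends, the silent windows can be pulled out of a log by comparing consecutive timestamps. This is only a rough sketch: it assumes the traditional syslog timestamp format (`Jan 30 06:09:24`), English month names, and a log that doesn't cross a year boundary. The function names, the 5-minute threshold, and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_syslog_timestamp(line, year):
    """Return the leading 'Mon DD HH:MM:SS' stamp as a datetime, or None."""
    # Traditional syslog lines carry no year, so the caller supplies one.
    # Collapse the double space used for single-digit days ("Feb  3").
    stamp = " ".join(line[:15].split())
    try:
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # continuation line or unexpected format


def find_log_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where the log was silent
    for at least `threshold`."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_syslog_timestamp(line, year)
        if stamp is None:
            continue
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


# Made-up sample lines, not taken from the real server.
SAMPLE = [
    "Jan 30 06:09:24 mail sendmail[2101]: example entry",
    "Jan 30 06:09:30 mail sendmail[2101]: example entry",
    "Jan 30 06:41:02 mail sendmail[2188]: example entry",
]

for start, end in find_log_gaps(SAMPLE, year=2014):
    print(f"no log entries between {start} and {end}")
```

Running the same function over both maillog and messages and comparing the windows would show whether every log went quiet at the same moment, which would point at the whole system (or its disk) rather than at one daemon.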

The one abnormality is that sendmail logging on the box was pretty high (98 instead of the usual level 9). I'm going to set it back to normal shortly.

I'm stumped on where I can go for more info here. Is there a thread dump that would tell me what the OS was working on during the hang?

Additional information:

  • Kernel version is: 2.6.18-371.4.1.el5 #1 SMP Thu Jan 30 06:09:24 EST 2014 i686 i686 i386 GNU/Linux
  • The storage is handled on a shared SAN.
  • VMware Tools is not installed on the system, as per internal policy. However, we've been running without it for a long time, so we don't think its absence is necessarily the root cause.
  • Specific version of VMware is: VMware ESXi 5.1.0 build-2000251
  • Hardware is IBM 3850 M2, Model 7233AC1
Mike B

2 Answers

So, 32-bit CentOS 5.10... That's not necessarily a problem...

But you should always have the VMware tools installed when running an operating system supported by VMware. This can be extremely helpful when vSphere/ESXi host memory gets constrained, plus it adds the memory balloon driver, better NIC interface options (for your EL5 system) and power management.

In general, look at what the SAN is doing at the time these issues occur. Also, if you're not using VMware tools, there's a good chance that ESXi isn't on a stable revision level. Please report back on the ESXi build number. You'll see it at the top of the vSphere Client when connected to the host.


Edit:

Since this is a vSphere cluster, can you have the team check memory allocation? I've seen Linux VMs hang or lock up because of bad memory configuration. This can include a RAM limit set in the vSphere client for the VM in question. It can also include situations where the cluster is too overcommitted on RAM and/or where the VMs have been allocated too much RAM.

See: vSphere education - What are the downsides of configuring VMs with *too* much RAM?

Any deeper analysis would require seeing some of the VMware cluster/resource status screens.

ewwhite

I just wanted to close the loop on this one. The mysterious hangs stopped occurring after we scaled back Sendmail logging from 99 down to 9 (the default). Admittedly, that was a really high log-level setting, but I've never seen that completely grind a server to a halt. I also have no idea how long it had been set that way.

My guess is that the intermittent nature of this stemmed from a combination of mediocre disk I/O speeds and occasional SMTP load spikes.
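For anyone who wants to check a similar guess on their own box, counting log lines per minute is a cheap way to see whether logging bursts line up with the hangs. This is a minimal sketch, assuming traditional syslog timestamps at the start of each line; the function names and sample lines are hypothetical, and I didn't run this against the original server.

```python
import re
from collections import Counter

# Matches the leading 'Mon DD HH:MM:SS ' stamp of a traditional syslog line.
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s(\d{2}:\d{2}):\d{2}\s")


def lines_per_minute(lines):
    """Count log lines per 'Mon DD HH:MM' bucket."""
    counts = Counter()
    for line in lines:
        match = STAMP.match(line)
        if match:
            month, day, minute = match.groups()
            counts["{} {:>2} {}".format(month, day, minute)] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` busiest minutes as (minute, line_count) pairs."""
    return lines_per_minute(lines).most_common(top)


# Made-up sample lines for illustration only.
SAMPLE = [
    "Jan 30 06:09:24 mail sendmail[2101]: example entry",
    "Jan 30 06:09:25 mail sendmail[2101]: example entry",
    "Jan 30 06:09:59 mail sendmail[2102]: example entry",
    "Jan 30 06:10:03 mail sendmail[2103]: example entry",
]

for minute, count in busiest_minutes(SAMPLE, top=2):
    print(minute, count)
```

If the busiest minutes sit right before the silent stretches in the logs, that would be consistent with the log volume plus slow disk I/O explanation, though it wouldn't prove it.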

Thanks everyone for your help.

Mike B