
I have a Linux server with 16 GB of RAM and 8 cores. It never goes into swap and CPU usage never exceeds ~1.5, so I believe it is safe to say there is plenty of capacity.

Occasionally I get warnings such as: [warn] mod_fcgid: process 28341 graceful kill fail, sending SIGKILL.

This is Apache/2.2.15 (CentOS 6.3) with mod_fcgid/2.3.7. None of the mod_fcgid settings below are set, so they are all at their defaults:

FcgidMinProcessesPerClass
FcgidMaxProcessesPerClass
FcgidMaxProcesses
FcgidIdleTimeout
FcgidProcessLifeTime
FcgidIdleScanInterval
FcgidOutputBufferSize

I want to identify which vhost's processes are getting SIGKILLed. So I loaded mod_status and turned ExtendedStatus On. I set log_server_status to run every minute, since I can't afford to manually reload the /server-status/ page while also keeping an eye on the logs all day, waiting for a SIGKILL to happen.

But the output of log_server_status is not very helpful. This is all I see in the logs created by the script:

180131::::
all the way to
235501::::
235601::::
235701::::
235801::::
235901::::

I want to track down the vhosts responsible for the SIGKILLs. How do I go about it? Am I doing something wrong with regard to log_server_status? The output seems useless...

Gaia

2 Answers


You seem to be running PHP through mod_fcgid. As long as the same wrapper is used to start the PHP interpreter for all vhosts, the processes spawned by mod_fcgid are shared across vhosts, since you appear to have no vhost-specific fcgid directives. They keep running after startup and are re-used to run whatever PHP code is passed to them for processing (which is the whole point of mod_fcgid, by the way). Refer to the mod_fcgid documentation for details.

There is a documented bug that breaks this behavior: under certain conditions, PHP processes might be spawned for each vhost, disregarding any defined per-class limits. That bug only applies to the old 2.3.6 version of the module and has been fixed in 2.3.7.

Other than that, the log warnings you are seeing are not due to resource exhaustion; they are normal mod_fcgid activity. mod_fcgid terminates its running processes periodically (after an idle timeout, after a certain lifetime, or after a certain number of requests). It does so by sending SIGTERM to the process. If the process does not handle the SIGTERM in time for some reason (it might be too busy, or it might simply be catching and ignoring SIGTERM), it is ended forcibly with SIGKILL. This is what the warning is about.
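The SIGTERM-then-SIGKILL sequence described above is a general pattern, and a small sketch may make it concrete. This is not mod_fcgid's actual code; it only illustrates the idea with Python's `subprocess` module on a POSIX system. The helper names `stop_gracefully` and `spawn_child` and the grace period are assumptions made for the illustration.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the signal that actually ended the process.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The "graceful kill" failed, so end the process forcibly.
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"


def spawn_child(ignore_sigterm):
    """Start a hypothetical long-running child, optionally ignoring SIGTERM."""
    code = (
        "import signal, time\n"
        + ("signal.signal(signal.SIGTERM, signal.SIG_IGN)\n" if ignore_sigterm else "")
        + "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has finished its setup
    proc.stdout.close()
    return proc
```

A child started with `spawn_child(True)` ignores SIGTERM, so `stop_gracefully` has to escalate to SIGKILL, which corresponds to the situation the warning reports. A child started with `spawn_child(False)` exits on the first signal.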

If you are unhappy with the timing of the process terminations, just adjust the respective parameters with the FcgidIdleTimeout, FcgidProcessLifeTime and FcgidMaxRequestsPerProcess directives.

the-wabbit

I had to manually comb the Apache error logs, daily, for entries that occurred at the same time the SIGKILL messages were being logged in syslog. This allowed me to find which vhosts' processes were getting SIGKILLed. I started to monitor (manually) which files were being accessed at those timestamps on those vhosts, and after a few days I had enough data to track down which PHP files were generating errors.
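The manual combing could probably be scripted. The following is only a minimal sketch under several assumptions: that the SIGKILL warnings and the per-vhost errors are both available in Apache 2.2 style error-log lines (a `[Sat Feb 02 18:01:31 2013]` prefix), that each vhost writes to its own error log, and that a 60-second window is a reasonable match. The function names `parse_stamp` and `correlate` are made up for this example. Syslog lines use a different timestamp format and would need their own parser.

```python
import re
from datetime import datetime

# Assumption: Apache 2.2 style error-log prefix, e.g. "[Sat Feb 02 18:01:31 2013] [warn] ..."
STAMP_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]")
# Message text taken from the warning quoted in the question.
KILL_MARKER = "graceful kill fail, sending SIGKILL"


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has no prefix."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    # %a and %b are locale-dependent; this assumes English day and month names.
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def correlate(kill_log_lines, vhost_logs, window_seconds=60):
    """Map each vhost to its log lines written close to a SIGKILL warning.

    vhost_logs is a dict of {vhost name: iterable of error-log lines}.
    Vhosts with no matching lines are left out of the result.
    """
    kill_times = []
    for line in kill_log_lines:
        if KILL_MARKER in line:
            stamp = parse_stamp(line)
            if stamp is not None:
                kill_times.append(stamp)

    suspects = {}
    for vhost, lines in vhost_logs.items():
        for line in lines:
            stamp = parse_stamp(line)
            if stamp is None:
                continue
            # Keep the line if it falls within the window of any SIGKILL warning.
            if any(abs((stamp - k).total_seconds()) <= window_seconds
                   for k in kill_times):
                suspects.setdefault(vhost, []).append(line)
    return suspects
```

To use it, pass the lines of the log that contains the warnings as the first argument and a dict of open per-vhost error logs as the second. The vhosts that show up repeatedly in the result over a few days are the ones worth inspecting by hand.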

The problem is solved and I am not getting any more SIGKILL warnings.

As a side note, which applies only to my specific case: the warnings came from Magento cron entries that were not able to finish within the maximum allowed script execution time. So I increased the execution time to 180 seconds (for a couple of days) and those cron jobs started to finish successfully. I then reduced the maximum allowed time again, and they can now finish in under 60 seconds. The long execution time was because a few jobs had not run in a long time and had a larger than usual load to deal with.

Gaia