
We're running about a dozen WordPress sites on this server, and performance is really slow, with 'Service Unavailable' messages popping up from time to time. Restarting Apache and PHP-FPM solves that for a short while, but it grinds down again, sometimes in a few days, sometimes in only a handful of hours.

We're also running Memcached - default config except for raising the cache size to 256M.

I'm monitoring the www pool log, and it keeps suggesting raising pm.start_servers and the pm.min/max_spare_servers settings. Raising them eliminates the warning, but it doesn't solve the problem.

Currently those are set to start = 70, min = 50, max = 90, and pm.max_requests = 1000.
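
For reference, assuming those map to the usual dynamic pm settings, the relevant lines in the pool file (Ubuntu's /etc/php/8.3/fpm/pool.d/www.conf) look like this:

    ; /etc/php/8.3/fpm/pool.d/www.conf
    pm = dynamic
    pm.start_servers = 70
    pm.min_spare_servers = 50
    pm.max_spare_servers = 90
    pm.max_requests = 1000
    ; pm.max_children not shown here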

This is a 6-core, 16GB Linode. Running htop shows all 6 cores mostly pinned at 100%.

Memory usage is 11GB out of 16GB, with swap at 374M out of 512M.

I had noticed some OOM entries when running journalctl -u php8.3-fpm, so I have backed the start/min/max numbers down to their current levels (they were 80/60/100).

I've been reading posts about configuration for days, and everything pushes me toward increasing the start/min/max settings. That isn't making the situation better, and may be making it worse, so I suspect I'm either looking in the wrong place or adjusting in the wrong direction.

The primary goal is stability - this box has enough resources to be bulletproof. The secondary goal is performance - it should be extremely responsive, so I want to be sure I'm getting the most out of the available RAM without compromising stability.

Any insight or suggestions would be greatly appreciated.

Update: The 'Tuning Apache' articles assume the use of mpm_prefork, whereas with PHP-FPM we're on mpm_event. There's also the addition of Memcached, making this a bit of a three-body problem. The article suggested is also 5 years old.

While I understand that configurations depend on individual application loads, there has to be a generally accepted, average configuration for a 16GB Ubuntu server running Apache2, PHP-FPM, and Memcached. Hosting companies aren't tuning individual server instances for the specific customer sites that happen to get load-balanced onto an instance; they must use a base, general-purpose configuration, and I can't find that documented anywhere. Maybe those configs are proprietary secret sauce, but a good starting point shouldn't be black magic.

The mpm_event configuration in Apache is currently:

    StartServers              6
    MinSpareThreads          25
    MaxSpareThreads          75
    ThreadLimit              64
    ThreadsPerChild          25
    MaxRequestWorkers        500
    MaxConnectionsPerChild   0

My understanding is that on a 6-core server, StartServers should be 6. Can anyone confirm that?

1 Answer


Computer system capacity planning is not a mature science. It is not like, say, civil engineering, where a licensed named person runs standard formulas, puts their stamp on public plans, and is liable if things go wrong. Over in IT land, there is no mandate that response times be good, and given the many variables, workloads are unfortunately poorly understood.

Presumably your Memcached is there to cache WordPress requests. Confirm it is actually configured to be used, and document the caching plugins and their versions. Do a sanity check that cached objects are being stored, by reviewing stats from memcached, or by listing generated files if it is a static-file type of cache plugin.
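
Assuming memcached is listening on its default 127.0.0.1:11211, a quick check of the counters over its text protocol:

    # nonzero get_hits means WordPress is actually reading from the cache
    printf 'stats\nquit\n' | nc 127.0.0.1 11211 | grep -E 'cmd_get|get_hits|get_misses|curr_items'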

Currently a Linode 6 CPU 16 GB instance is on the shared tier. Those CPUs are oversubscribed, which hurts response time when other customers demand resources at the same instant. Experiment with the Linode dedicated tier, or with multiple shared hosts, and measure the difference. Yes, this is more expensive; predictable latency costs more.

Find out if there are obvious demands on resources. On Linux, watch the "better load average" called pressure stall information in /proc/pressure/{cpu,io,memory}. Not all tools are aware of PSI; two that are: htop is interactive, and netdata is a web-based view of a time-series database.
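
For example, a sustained avg10 well above zero on cpu confirms tasks are waiting for a CPU:

    # "avg10" is the share of the last 10 seconds some task stalled on that resource
    cat /proc/pressure/cpu /proc/pressure/io /proc/pressure/memory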

Memory at 69% utilized is a good place to be: the memory is being used, with some margin. Aggressive memory reclaim when memory is undersized can be very bad for performance. Still, keep memory among the metrics you monitor, especially MemAvailable from /proc/meminfo.
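
That one number is easy to watch:

    # the kernel's estimate of memory usable before reclaim or swapping kicks in
    grep MemAvailable /proc/meminfo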

If these components are in their own cgroups, an accounting of their memory use is easy. On systemd systems, systemd-cgtop -m will print units by memory use; note down httpd and php in particular. Divide by the number of processes to establish how many MB each of these costs. Also collect memory figures during degraded performance, and check whether any service is using significantly more memory than at baseline.
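
A sketch of that arithmetic, assuming Ubuntu's unit and process names (apache2.service, php8.3-fpm.service, php-fpm8.3):

    # control groups ordered by memory use, one snapshot
    systemd-cgtop -m --depth=2 -n 1

    # average RSS per php-fpm worker in MB; RSS double-counts shared
    # pages such as opcache, so treat it as an upper bound
    ps -o rss= -C php-fpm8.3 | awk '{sum += $1; n++} END {printf "%.0f MB\n", sum/n/1024}'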

CPU sustained at 100% is almost certainly a limiting factor. Fortunately, sampling what exactly is on CPU is relatively simple. On Linux, run perf top to see which functions in which programs are most often running. Learn a little about which is PHP code, which are kernel functions, and what else is running. In general, rendering a fancy PHP page is expensive in CPU, while cached or static content is already rendered, costing little more than the I/O to send it.
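
On Ubuntu, perf comes from the linux-tools packages:

    # system-wide sampling of on-CPU functions, heaviest first
    sudo perf top -g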

After surveying what fraction of your CPU is spent on PHP, drill down into where exactly the code is spending most of its time. Several PHP profilers exist; consider one that is low-impact enough to enable in production and can visualize the data in a useful way. Flame graphs are an excellent visualization I recommend, and they can also be produced from the perf_events data mentioned earlier. Rarely will there be the expertise and the time to turn profiling data into efficiency savings in code. Even so, knowing which user workflow or plugin code is expensive focuses your investigation.
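
A sketch of the perf route, assuming Brendan Gregg's FlameGraph scripts are cloned into the working directory:

    # 30 seconds of system-wide stack samples at 99 Hz
    sudo perf record -F 99 -a -g -- sleep 30
    sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg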

Before going too deep on tuning the number of workers, check if you have data on the number of concurrent users, from analytics tracking or request log analysis. Chasing hundreds of concurrent connections is easier to justify when there is evidence it will improve the response time of a few hundred real people at any given moment.
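
For example, assuming the stock combined log format, the busiest seconds in the access log put a rough ceiling on real concurrency:

    # requests per second, busiest seconds first ($4 is the timestamp field)
    awk '{print $4}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head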

In httpd, enable mod_status with ExtendedStatus On. Under heavy load, count the number of workers in each state. Most threads doing work will be in Write or Keepalive states; a lot in Reading implies large file uploads or clients unusually holding connections open. Tune MaxRequestWorkers such that at least a few threads are in "waiting for connection" for the best response times. These threads are relatively lightweight in memory, given they share a lot with their thread group. Note that with a default ServerLimit of 16, to get more than 16 * 25 = 400 workers you would also need to increase ServerLimit: to 20, if 500 is indeed the maximum of concurrent connections desired. StartServers 6, or one per CPU, is a decent guess at starting processes, but httpd will start more once the needed connections exceed 6 * 25 = 150, minus the 25 reserved by MinSpareThreads.
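
A minimal sketch of both changes; on Ubuntu the status module is enabled with a2enmod status, and the MPM settings live in mods-available/mpm_event.conf:

    # mods-available/status.conf
    ExtendedStatus On
    <Location "/server-status">
        SetHandler server-status
        Require local
    </Location>

    # mods-available/mpm_event.conf: make 500 workers actually reachable
    ServerLimit              20
    MaxRequestWorkers        500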

And then there are the PHP processes. Regarding that warning message: in dynamic mode, php-fpm only knows that the max children limit has been reached. It does not know whether increasing the limits is a good idea. You might be near memory limits, where more children cause out-of-memory kills, a bad user experience, and spending from your error budget.

Consider enabling php-fpm's status path to get statistics somewhat like httpd's scoreboard, including a second opinion on how much memory it uses. A consistent backlog of requests could mean some requests are extra slow for some reason, or just that there is a large number of concurrent connections.
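
Enabling it is one line in the pool file; how you then fetch it depends on how Apache routes PHP requests (e.g. via mod_proxy_fcgi):

    ; /etc/php/8.3/fpm/pool.d/www.conf
    pm.status_path = /fpm-status

    # once the path is routed through to php-fpm:
    curl 'http://localhost/fpm-status?full'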

Your current maximum of 90 is not a lot compared to 400 for httpd, but perhaps httpd is also serving static content that should not go through PHP. If you do establish exactly how many processes you can afford, setting pm = static will spawn them all right at startup, causing less delay while servicing requests, and at least silencing the warning.
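
As a worked example with made-up numbers, not a recommendation: if each worker measures around 60 MB and you budget 6 of the 16 GB for PHP, that is roughly 100 workers:

    ; hypothetical sizing; substitute your measured per-worker RSS
    pm = static
    pm.max_children = 100    ; ~6144 MB / 60 MB per worker
    pm.max_requests = 1000   ; keep recycling workers to bound any leaks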

Regarding php-fpm tuning, see also related questions on Server Fault.

Even after all of this monitoring and optimization, the quickest mitigation may be to add more CPU or RAM.

John Mahowald