
We have a projects UI page where users can search projects. I noticed that search queries are slow on the first run (up to 10 seconds) and faster on subsequent runs (around 2 seconds). Could this explain the 2.28% disk hit rate? That seems like a high percentage, right? If we want to ensure these queries are always fast, could we use pg_prewarm to keep them in memory? Would we need to run pg_prewarm periodically? The projects table is 15 GB and we have over 100 GB of server memory on Heroku.

           table name            | disk hits | % disk hits | % cache hits |  total hits
---------------------------------+-----------+-------------+--------------+--------------
 all                             | 730484143 |        0.32 |        99.68 | 225556991278
 projects                        |  34777074 |        2.28 |        97.72 |   1526319404
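
For reference, per-table cache-hit numbers like the ones above can be derived from pg_statio_user_tables. This is a sketch, not necessarily the exact query that produced the table:

    -- per-table heap cache hit ratios (heap blocks only, not indexes)
    SELECT relname                                               AS table_name,
           heap_blks_read                                        AS disk_hits,
           round(100.0 * heap_blks_read
                 / nullif(heap_blks_read + heap_blks_hit, 0), 2) AS pct_disk_hits,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_read + heap_blks_hit, 0), 2) AS pct_cache_hits,
           heap_blks_read + heap_blks_hit                        AS total_hits
    FROM   pg_statio_user_tables
    ORDER  BY heap_blks_read + heap_blks_hit DESC;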

I'm also trying to understand the pg_prewarm setting autoprewarm_interval. What is it used for? Does it mean the projects table will be unloaded from memory after that interval, for example?
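
For reference, these are the settings I'm looking at, as they would appear in postgresql.conf (a sketch with default values; I'm not sure which of these Heroku exposes):

    # loads the pg_prewarm library and its autoprewarm background worker (requires restart)
    shared_preload_libraries = 'pg_prewarm'
    pg_prewarm.autoprewarm = on            # default
    pg_prewarm.autoprewarm_interval = 300  # default, in seconds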

Here are the memory numbers from the logs (https://devcenter.heroku.com/articles/heroku-postgres-metrics-logs):

6 GB - sample#memory-postgres: Approximate amount of memory used by your database’s Postgres processes in kB. This includes shared buffer cache as well as memory for each connection.

125 GB - sample#memory-total: Total amount of server memory available.

875 MB - sample#memory-free: Amount of free memory available in kB.

114 GB - sample#memory-cached: Amount of memory being used by the OS for page cache, in kB.

Todd

1 Answer


Could this explain the 2.28% disk hit?

Certainly could. Disk hits are the reason for the slow first run each time.

could we use pg_prewarm to keep them in memory? Would we need to run pg_prewarm periodically?

Yes, pg_prewarm can populate the cache. But nothing keeps the OS (or Postgres - two different caches) from evicting pages from cache if the space is needed for other data. So how long pages remain in cache depends on the competing activity.
If your queries don't have side effects (purely SELECT) and don't include expensive computations, you might also just run the queries of interest from time to time. It may even be better to just pull the right data pages from the table and its indexes into cache, since neither has to be cached as a whole. That depends on what you know about the expected queries.
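
A minimal sketch of warming the table and one index by hand (the index name is made up; substitute your actual indexes):

    -- one-time setup per database
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- read the whole table into shared buffers; returns the number of blocks read
    SELECT pg_prewarm('projects');

    -- indexes are warmed separately; 'projects_search_idx' is a hypothetical name
    SELECT pg_prewarm('projects_search_idx');

    -- alternatively, 'prefetch' mode warms the OS page cache instead of shared buffers
    SELECT pg_prewarm('projects', 'prefetch');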


This projects table is 15 GB and we have over 100 GB of server memory on Heroku.

That's not the whole picture.

  • How much of the 100 GB RAM is actually available for caching? (Postgres buffer cache and OS page cache.) work_mem, maintenance_work_mem, and other things compete for the same resource.
  • The table is 15 GB. What about its indexes? (See the size check sketched below.)
  • Surely, this will not be the only table in use?
  • Heroku may be doing Heroku-things to cache memory that I am not aware of.
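
A quick way to see how much data would need to stay cached for this one table (a sketch; run it against your database):

    -- table (including TOAST) vs. index sizes for 'projects'
    SELECT pg_size_pretty(pg_table_size('projects'))          AS table_size,
           pg_size_pretty(pg_indexes_size('projects'))        AS index_size,
           pg_size_pretty(pg_total_relation_size('projects')) AS total_size;
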
Erwin Brandstetter