
We face issues with the execution of pg_basebackup once in a while. One cure seems to be to increase wal_keep_size. If you go down that rabbit hole, the question of which value is suitable pops up naturally.

As the Cybertec blog is one of the sources of truth I consult the most, I stumbled over this blog post on the matter. It says:

When you for example see that your ‘find pg_xlog/ -cmin -60’ (file attributes changed within last hour) yields 3, you’ll know that you’re writing ca 1.2GB (3*16*24) per day and can set wal_keep_segments accordingly.

I think that means: if you find 3 files changed in the last 60 minutes, a value of 3 * 24 * 16 MB would be suitable for wal_keep_size.

Is that a correct interpretation? In one particular case I get 47 WAL segments from find pg_wal -cmin -60 -type f ! -empty ! -name '*.backup' | wc -l, which translates into wal_keep_size = 17.6 GiB (47 * 24 * 16 / 1024).
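For reference, here is roughly how I arrive at that number as a small shell sketch (the pg_wal path is a placeholder for your data directory, and 16 MB is the default wal_segment_size):

```shell
# Count WAL segments whose attributes changed in the last hour,
# extrapolate to a full day, multiply by the 16 MB default segment size.
count=$(find /path/to/pg_wal -cmin -60 -type f ! -empty ! -name '*.backup' | wc -l)
wal_keep_mb=$(( count * 24 * 16 ))   # MB of WAL written per day at this rate
echo "suggested wal_keep_size = ${wal_keep_mb}MB"
```

With my observed count of 47, that works out to 18048 MB, i.e. about 17.6 GiB.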

That looks like a lot to me, but maybe it's the reality we have to face.

  • The data_directory measures 247 GiB.
  • lsmem shows Total online memory: 31G.
  • There are 6 other Postgres instances (most of them with significantly smaller data_directory sizes than the one in question: 2.5 GiB, 100 MiB, ...) sharing the same machine.

Can anybody kindly comment on whether my understanding, and the conclusion I draw from it, are reasonable?

vrms

1 Answer


If you are facing "issues" with pg_basebackup that get fixed by increasing wal_keep_size, it must be that you are using the option --wal-method=fetch. Unless you are also using the option --format=tar, a simple solution would be to use --wal-method=stream instead; that would avoid having to keep all WAL around.
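A sketch of such an invocation (host, user, and target directory are placeholders; the options themselves are standard pg_basebackup flags):

```
# With --wal-method=stream, pg_basebackup opens a second replication
# connection and streams the WAL generated during the backup, so the
# server does not have to retain it via wal_keep_size.
pg_basebackup -h db.example.com -U replicator \
    -D /var/backups/pg \
    --wal-method=stream \
    --checkpoint=fast \
    --progress
```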

Your calculation is fundamentally correct, but you don't have to keep old WAL around for a whole day. You only need to keep it around for the duration of the backup. So if your backup takes an hour, set wal_keep_size so that it retains WAL for two hours or so.
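As a sketch of that sizing, using the numbers from the question (roughly 47 segments per hour, 16 MB segments) and assuming a one-hour backup, which you should replace with your measured duration:

```shell
seg_per_hour=47   # observed via the find command in the question
backup_hours=1    # measured duration of one pg_basebackup run (assumption)
seg_mb=16         # default wal_segment_size
# retain roughly two backup-durations' worth of WAL, as suggested above
wal_keep_mb=$(( 2 * backup_hours * seg_per_hour * seg_mb ))
echo "ALTER SYSTEM SET wal_keep_size = '${wal_keep_mb}MB';"
```

That yields 1504 MB here, a far cry from the 17.6 GiB a whole day's worth of WAL would require.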

Laurenz Albe