PostgreSQL backup with ZFS snapshots: Is pg_start_backup()/pg_stop_backup() necessary?

Question

The title says it all. I found a post from 10 years ago saying that if all the database data is in the same snapshot, then pg_start_backup() isn't needed: PostgreSQL will start from the snapshot as it would after a typical crash.

But what if there isn't a single snapshot for the whole database? What if some tablespaces are in other datasets and WAL is in its own dataset too? In that case the snapshots could be out of sync by a very small period of time. Would this make it necessary to run pg_start_backup() to ensure no data corruption?

I also found a post from 8 years ago by someone testing exactly this: whether PostgreSQL would start again after an intentional delay between the WAL and data snapshots, though using virtual machine snapshot technology. So it seems that it can work; the question is, will it always work?

In fact, going one step further, why would pg_start_backup() be needed in any circumstance? Isn't WAL replay capable of fixing the internal inconsistencies of a non-instantaneous backup?

Best regards.

score 1 · Answer 1 · answered Nov 12 '22 at 17:40

Yes, it is still needed if that is how you want to do your backups and you want them to be robust. If you don't do it, then you could get lucky and it would work, or you could get kind of lucky and it would just obviously blow up when you try to restore it, or you could get unlucky and it would claim to work while leaving your data subtly corrupted.

In fact, going one step further, why would pg_start_backup() be needed in any circumstance? Isn't WAL replay capable of fixing the internal inconsistencies of a non-instantaneous backup?

It can if it knows to do it. And that is why pg_start_backup() is needed. Without that, it might start WAL recovery from the wrong spot, because it doesn't know what the right spot is. (Also, if pg_wal is not located in the last snapshot taken and you aren't using a WAL archive, then it might be missing some of the WAL it needs. But it won't know that, and will just produce silently corrupted data. However, pg_start_backup() doesn't do anything to fix this problem, so that is not really what your question is about...)
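For illustration, here is a minimal Python sketch of the order of operations when the snapshots are wrapped in backup mode. The dataset names, the snapshot name, and the helper plan_snapshot_backup are hypothetical; the function only builds the list of steps and does not run anything. It assumes the non-exclusive API: pg_backup_start()/pg_backup_stop() on PostgreSQL 15 and later, or pg_start_backup()/pg_stop_backup() with exclusive set to false on older versions.

```python
import shlex
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, command), where kind is "sql" or "shell"


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def plan_snapshot_backup(datasets: List[str], snap_name: str,
                         modern: bool = True) -> List[Step]:
    """Return the ordered steps for a snapshot-based base backup.

    modern=True  -> PostgreSQL 15+ function names.
    modern=False -> older names, called in non-exclusive mode.
    Dataset and snapshot names are placeholders chosen by the caller.
    """
    if not datasets:
        raise ValueError("at least one dataset is required")
    label = sql_literal(snap_name)
    if modern:
        start = f"SELECT pg_backup_start({label}, true);"
        stop = "SELECT lsn, labelfile, spcmapfile FROM pg_backup_stop(true);"
    else:
        start = f"SELECT pg_start_backup({label}, true, false);"
        stop = ("SELECT lsn, labelfile, spcmapfile "
                "FROM pg_stop_backup(false, true);")

    steps: List[Step] = [("sql", start)]
    # Every dataset is snapshotted while backup mode is active, so small
    # time differences between the snapshots fall inside the backup window.
    for dataset in datasets:
        argv = ["zfs", "snapshot", f"{dataset}@{snap_name}"]
        steps.append(("shell", shlex.join(argv)))
    steps.append(("sql", stop))
    return steps


if __name__ == "__main__":
    for kind, command in plan_snapshot_backup(
            ["tank/pgdata", "tank/pgwal"], "nightly"):
        print(f"{kind:5} {command}")
```

Both SQL statements have to be issued from the same database session, which must stay open while the snapshots are taken. The labelfile text returned by the stop call needs to be saved, because it has to be placed as backup_label in the data directory when restoring; that file is what tells recovery where to start.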

score 0 · Answer 2 · answered Nov 12 '22 at 03:34

(xfer comment to answer)

A lot has changed in 9 years, including the development of pg_basebackup and the removal of exclusive backup mode in PG15. No point setting up a deprecated backup method.

I would set up point-in-time recovery with a WAL archive and a backup-only standby. You get coverage against both pool failure and user mistakes, and most importantly, these are supported solutions with guarantees of recovery, if implemented properly.
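As a rough sketch of the pg_basebackup route, the Python helper below builds a command line for taking a base backup from a standby. The host name, user name, and target directory are placeholders, and basebackup_argv is a hypothetical helper; the flags themselves are standard pg_basebackup options. It only constructs the command and does not execute it.

```python
import shlex
from typing import List


def basebackup_argv(host: str, user: str, target_dir: str) -> List[str]:
    """Build a pg_basebackup command line (all argument values are placeholders)."""
    return [
        "pg_basebackup",
        f"--host={host}",         # server to copy from, e.g. the backup-only standby
        f"--username={user}",     # a role allowed to make replication connections
        f"--pgdata={target_dir}", # directory that receives the base backup
        "--wal-method=stream",    # stream the WAL needed by the backup alongside it
        "--checkpoint=fast",      # request an immediate checkpoint instead of a spread one
        "--progress",             # report progress while copying
    ]


if __name__ == "__main__":
    argv = basebackup_argv("standby.example", "replicator", "/backups/base")
    print(shlex.join(argv))
```

Here pg_basebackup handles the backup start and stop bookkeeping itself, so nothing has to be done by hand to tell recovery where to begin.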
