64

I have a 3GB database that is constantly modified and I need to make backups without stopping the server (Postgres 8.3).

My pg_dump runs for 5 minutes. What if the data is modified during the process? Do I get consistent backups? I don't want to find out when disaster strikes.

The Postgres documentation (http://www.postgresql.org/docs/8.3/static/app-pgdump.html) doesn't say anything about this.
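
For reference, the backup itself is nothing fancy, roughly a plain pg_dump run against the live database like this (the database name and output path here are just placeholders):

    pg_dump -Fc -f /backups/mydb.dump mydb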

Derek Downey
Roman

4 Answers

77

From the manual:

It makes consistent backups even if the database is being used concurrently.

So yes, you can trust the backup. And of course it's PostgreSQL: you can trust your data in PostgreSQL.

Frank Heikens
30

The dump begins by setting TRANSACTION ISOLATION LEVEL SERIALIZABLE.

Please read the manual chapter on Transaction Isolation and, for example, the user list discussion about this.

Quote:

The backup will only see transactions that were committed before the isolation level was set.

Basically, everything committed after pg_dump sets the transaction isolation level is not part of the dump.

Read/write operations are not affected (locked) during that time.
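
As a rough sketch of what that snapshot behaviour means (this is not pg_dump's exact command sequence, and some_table is a placeholder name): inside one serializable transaction, every query sees the same snapshot, so rows committed by other sessions after the first query simply don't show up.

    -- run inside one psql session on the live database
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM some_table;  -- the snapshot is taken at this first query
    -- other sessions keep inserting, updating and committing meanwhile
    SELECT count(*) FROM some_table;  -- still returns the same count as above
    COMMIT;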

András Váczi
Dennis Nolte
15

pg_dump starts a transaction, similar to the way any other long-running query works. The consistency guarantees come from the MVCC implementation. The dump will always be self-consistent within those rules.

All the fuzzy parts of MVCC are around things like the order in which UPDATE transactions become visible to other clients and how locks are acquired. pg_dump is strict about the ordering and acquires a read lock on the whole database to dump it. For most people that's what they expect, and the mechanism used never causes any trouble. The main concurrency risk is that clients trying to change the database structure will be blocked while the dump is running. That doesn't impact the quality of the dump, though.
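
A small sketch of that last point (some_table is a placeholder): while a dump-style read transaction is open, ordinary reads and writes go through, but a schema change has to wait for it.

    -- session 1: simulate a long-running dump transaction
    BEGIN;
    SELECT count(*) FROM some_table;   -- takes an ACCESS SHARE lock on the table
    -- leave this transaction open for a while...

    -- session 2, while session 1 is still open:
    INSERT INTO some_table DEFAULT VALUES;            -- proceeds immediately
    ALTER TABLE some_table ADD COLUMN extra integer;  -- blocks until session 1 commits

    -- back in session 1:
    COMMIT;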

Greg Smith
-1

First, sorry for answering on this old thread, but from my experience I cannot confirm the statement that you can trust pg_dump.

I switched a project to Postgres some months ago, and of course I made all backups with pg_dump from the live system, as suggested. But after the first backup check I noticed that the dumps all had different sizes. After restoring some of them at random, I found that some tables were missing. So I started to analyze why this was happening, and figured out that when the application on top (e.g. the app servers) locks some tables, pg_dump does not wait until they are released. I had to stop the server for the backup period every time, which is not a good solution.

So I am still looking for a fix, but as I said, I totally disagree with the statement that you can trust pg_dump.
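
If anyone runs into something like this, before resorting to stopping the server it is worth checking whether the dump (or anything else) is actually waiting on a lock while it runs. A sketch, using the 8.3 catalog/view column names:

    -- sessions currently waiting on a lock
    SELECT procpid, usename, waiting, current_query
    FROM pg_stat_activity
    WHERE waiting;

    -- lock requests that have not been granted yet
    SELECT pid, locktype, relation::regclass AS relation, mode
    FROM pg_locks
    WHERE NOT granted;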