24

The PostgreSQL built-in backup mechanism isn't always very suitable. Sometimes, you want to put the application in a quiescent state, because it has external data with which you want to backup at the same time you back up the PG data. But the only way to put the application in a quiescent state is to "lock" the database also. PG lacks a database-wide or cluster-wide locking mechanism. Putting PG into a read-only state would be a piece in the following solution:

  1. Quiesce application data (disable logins)
  2. Quiesce database (by making it read-only)
  3. Perform a PG checkpoint or pg_xlog_switch()
  4. Create a snapshot of the App and Data volumes
  5. Resume the database (make it RW again)
  6. Resume the application
  7. Backup the snapshots
blong
  • 115
  • 1
  • 6
Otheus
  • 654
  • 1
  • 4
  • 13

2 Answers2

21

After culling answers elsewhere on the internet, I devised a solution. The other answers were in and of themselves, incomplete. So I am presenting an answer here in hopes it will benefit others.

The Strategy

  1. Disable connections to the database (not the cluster).
  2. Set the database's default_transaction_read_only setting to true.
  3. Terminate the existing connections to that database.
  4. Re-enable (read-only) connections.

Once that is done, you would (in my solution):

  1. Perform the CHECKPOINT (I think this is the safest, but a pg_xlog_switch() would be appropriate for very high-load servers)
  2. Take the volume snapshot
  3. Reverse the previous steps. (But this is tricky!)

Pitfalls

  1. Terminating connections while they are mid-transaction is probably a bad idea. Better to kill idle connections, wait for a few seconds, then kill idle ones, wait a few more, repeat until they are all gone.
  2. At some point, you'll have to kill open/hung queries or abort the backup.
  3. At the start of a transaction of session, Postgresql takes a kind of a snapshot of the process table. You have to reset this snapshot every time you go to check if unwanted processes are still there. See pg_stat_clear_snapshot()
  4. Restoring the read-write state is not so simple. If read-only connections now exist, you must terminate them in order for the new read-write status to take effect. But new connections might arrive while killing existing ones. So again, you must

    1. Disable connections to the database
    2. change default_transaction_read_only status to false
    3. kill the existing connections
    4. Re-enable (r/w) connections to the database

Alternate strategy

Another strategy is to change the permissions on the role used by the application. This can be quite messy and is less general.

For instance, you'd have to revoke/re-grant on not just tables, but sequences, large objects, and probably the schema itself. Further, what exactly is the behavior of existing connections when you change the access? Probably no impact, which means you also need to kill those backends. Finally, let's say the application has read-write access to most tables, but not to others in the schema. You'd have to make sure your re-granting doesn't include those objects as well.

Another possibility is to LOCK all the tables, by querying the catalog and performing a dynamic query. That seemed to perilous for my tastes.

Implementation

Pause_service

The database instance name is 'gitlabhq' and the username of the application is 'gitlab'. Replace it with your own:

  psql -Upostgres  <<'PAUSE_DB'
    -- 1. disable new connections
    alter database gitlabhq_production with allow_connections = off;
    -- 2. Make DB read-only
    alter database gitlabhq set default_transaction_read_only = true;
    -- 3. Inobtrusively but safely terminate current connections
    DO $X$ BEGIN
        -- kill open idle connections, try up to 9x. Last time, kill regardless
        FOR i IN 1..10 LOOP
          PERFORM pg_terminate_backend(pid) from pg_stat_activity where usename = 'gitlab'
            and (i >= 10 OR state in ('idle', 'disabled' ));
          PERFORM pg_stat_clear_snapshot();
          EXIT WHEN NOT EXISTS ( select pid from pg_stat_activity where usename = 'gitlab' );
          RAISE NOTICE 'pg backends still open: sleeping 2 seconds';
          PERFORM pg_sleep(2);
          PERFORM pg_stat_clear_snapshot();
        END LOOP;
        -- send notice if still open connections
        IF EXISTS ( select pid from pg_stat_activity where usename = 'gitlab' ) THEN
            RAISE NOTICE 'Hung backends. Backup might not be 100%% consistent';
        END IF;
    END;$X$;
    -- 4. Allow read-only connections while checkpointing/snapshotting
    alter database gitlabhq with allow_connections = on;
    CHECKPOINT;

Resume

    alter database gitlabhq_production with allow_connections = off;
    alter database gitlabhq set default_transaction_read_only = false;
    SELECT pg_stat_clear_snapshot();
    SELECT pg_terminate_backend(pid) from pg_stat_activity where usename = 'gitlab';
    alter database gitlabhq with allow_connections = on;

There's a possibility that in this last step you will kill long-running read-only/SELECT queries, but in my experience, such long-running queries can last minutes if not hours, and it's acceptable to kill these in order to ensure uptime for everyone else.

Otheus
  • 654
  • 1
  • 4
  • 13
10

I think it'd be desirable to have this functionality as an official PostgreSQL feature, personally.

Doing it simply

If you don't want to get your hands dirty with C coding for a PostgreSQL extension, you can just put a connection pooler in front of PostgreSQL. Like pgBouncer.

pgBouncer has the ability to pause application activity built-in. Though to make it very useful you need to connect directly (not via pgbouncer) and cancel active connections once you've paused new ones from coming in. Just select pg_terminate_backend(pid) from pg_stat_activity where pid <> pg_backend_pid().

Doing it right

If you're willing to get your hands dirty, though, it can be done with a C extension. The extension must:

  • Be loaded in shared_preload_libraries so it can register a small static shared memory segment with a boolean flag like db_is_locked.

  • Register a ProcessUtility_hook and ExecutorStart_hook that tests the is-locked flag in shmem and, if set, sleeps in a WaitLatch loop until it sees the flag has been cleared again. (You could possibly use a parser hook instead).

  • Write two SQL-callable functions in C. One sets the flag. Another clears the flag and iterates through PGPROC setting the latch of all user processes, so they know to wake up immediately.

  • Optionally write a 3rd function that, if the flag is set, iterates through PGXACT to find open write transactions and signals them to terminate.

All this has already been implemented as part of the BDR extension, but it's part of a much larger system. You could quite likely extract the relevant parts into your own extension. See bdr_locks.c, bdr_commandfilter.c, bdr_executor.c, bdr.c, etc.

Note that this won't make PostgreSQL read-only on disk - the checkpointer will continue to run, the bgwriter will continue to run, the archiver will still run, etc. So it's not enough to let you make a DB backup without an atomic file system snapshot or pg_start_backup() / pg_stop_backup(). But it's fine for your use case, pausing application actiivity at the DB.

Craig Ringer
  • 57,821
  • 6
  • 162
  • 193