
I have a PostgreSQL 9.3 database running on an Ubuntu server.
About a month ago we had some hardware problems on the server, which were resolved by our VPS hosting company.

The problem was resolved fairly quickly and everything seemed to be working fine.
We have backups running with Barman on a different server, and backups and restores were working (I checked).

The first sign of data corruption came a few days ago: I decided to do a full pg_dump of our DB, as I do every once in a while, and it failed (invalid page header in block...). The data seemed to have been corrupted a long time ago, around the time of the hardware problem (that was the date on the corrupted record). I ended up finding the corrupted record, deleting it and restoring it manually.

After that I was able to do a full pg_dump.
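For reference, one way to narrow such corruption down to a single table is to dump table by table; a rough sketch (the database name mydb and the schema filter are placeholders):

    # dump each user table separately and report the ones that cannot be read
    for t in $(psql -Atc "SELECT schemaname || '.' || tablename FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" mydb); do
        pg_dump --table="$t" mydb > /dev/null || echo "read failure in $t"
    done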

In order to check for additional corruption, I set up a different DB server from a backup and ran pg_repack on all the tables to verify that I'm able to rebuild all the indexes and tables.
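A rough sketch of the commands involved (Barman server name, backup id, paths and database name are placeholders):

    # restore the latest Barman backup onto a scratch server
    barman recover main-db latest /var/lib/postgresql/9.3/restore_test

    # after starting PostgreSQL on the restored data directory,
    # rebuild every table and index, which forces a full read of the data
    psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS pg_repack;"
    pg_repack -d mydb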

My questions are:
1. How can I be certain that I don't have any additional corruption in my DB?
2. What can I do to periodically check my data integrity?
3. What else can I do to verify the integrity of our DB besides dumping the whole DB and re-indexing it (which I already did)?

P.S. I don't have block checksums enabled.

Rubinsh

3 Answers


If you have initialized your cluster with checksums, you can verify them during backups with pgBackRest.

PostgreSQL has supported page-level checksums since 9.3. If page checksums are enabled pgBackRest will validate the checksums for every file that is copied during a backup.
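A minimal sketch of such a checksum-validating backup, assuming a stanza named main has already been configured:

    pgbackrest --stanza=main --type=full backup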

You will find more information about checksums here: https://www.postgresql.org/docs/current/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS
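Note that on these versions checksums can only be turned on when the cluster is initialized; a sketch, with an example data directory path:

    initdb --data-checksums -D /var/lib/postgresql/9.3/main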

You can also run a pg_dump to /dev/null (if you don't have enough space to keep a logical dump). This ensures that every table's data blocks can be read.
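For example (the database name is a placeholder; a non-zero exit status means something could not be read):

    pg_dump mydb > /dev/null
    echo $?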

PGDG had a discussion this year because some wanted to activate checksums by default. You can find it here: http://www.postgresql-archive.org/Checksums-by-default-td5940158.html

Arkhena

If I may, there are now newer tools for this:

  • pg_verify_checksums (shipped with PostgreSQL 11)
  • pg_checksums (shipped with PostgreSQL 12 and later)

Both can verify checksums on an offline cluster, and pg_checksums can additionally enable or disable them. If you are worried about corruption you can run them on the whole cluster or on just portion(s) of it. Note that, from personal experience, pg_verify_checksums works fine on older versions and pg_checksums is expected to behave the same way.
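A sketch of how they are run (the cluster must be shut down cleanly first; data directory paths are examples):

    # PostgreSQL 11
    pg_verify_checksums -D /var/lib/postgresql/11/main

    # PostgreSQL 12 and later
    pg_checksums --check  -D /var/lib/postgresql/12/main
    pg_checksums --enable -D /var/lib/postgresql/12/main   # turn checksums on for an existing cluster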

Also, even though it is now quite old, I would advise reading this post: http://blog.ringerc.id.au/2012/10/avoiding-postgresql-database-corruption.html


As stated elsewhere, there are no corruption-specific tools in core PostgreSQL similar to those found in SQL Server (DBCC CHECKDB) or Oracle. That does not mean you cannot watch for corruption and try to prevent it as much as possible. A few thoughts:

  • Use good hardware. This hardware should properly honor fsync. Not tunable in your case, but important nevertheless.
  • Use checksums when you initialize your cluster. Sure, there is a little penalty for doing so, but it really is worth the hassle. You could create a new cluster, dump and reload your database into this new instance, and enjoy a more robust environment. Checksums have been available since 9.3, so that works for you.
  • Run regular, automated logical backups with pg_dump. It's good to have an automated physical backup with Barman, but this will not check for corruption. Only pg_dump (which will read all rows in all tables) will give you insurance against table corruption. This is something you should definitely automate (see the cron sketch after this list). If you don't have the space, redirect pg_dump to /dev/null. Corrupted indexes can always be rebuilt if you have a correct table. I think there are some index corruption finding tools out there.
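As an illustration, a possible crontab entry for that automated read check (database name and logging method are assumptions):

    # 03:00 every night: read every row of every table; log any failure to syslog
    0 3 * * * pg_dump mydb > /dev/null || echo "pg_dump integrity check failed" | logger -t pg-integrity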
KookieMonster