4

I am using Postgres to store a variety of application data for a webapp. Part of the application involves storing and retrieving user uploaded files. I am storing the files in the filesystem with some associated metadata in the database.

I am trying to come up with a backup and archive strategy so that I can effectively backup and archive/restore the database and the linked files. Here are the things I want to accomplish.

  1. Perform routine backups that can be used for recovery from failures and which include all DB data and the linked files. Ideally, this backup would be done while the app is running. Live backup is certainly possible with a DB but I am not sure how to keep the linked files consistent with the database during the backup process

  2. Archive chunks of data as they become "old". These chunks must includes the database data plus any linked files. It should be possible to put the archived data back into production again. It would be ideal if it were easy to determine which ranges of objects were stored in each chunk.

Do you have any advice for how to accomplish these goals? If the files were in the database as BLOBS these tasks would be much easier since normal database backup and restore functionality would handle this. I am not sure how to accomplish the same thing when file data is linked to database rows.

Note: I asked this question on ServerFault but I did not get any response there. The link to that question is: https://serverfault.com/q/284218/85719

busyspin
  • 43
  • 3

2 Answers2

2

If you store your files by hash to and link with the hash to the database row, you can simplify maintaining consistency.

In that case you can either:

  • keep on incrementally backing up the filesystem (eg with rsync) and never delete

    or

  • if you need to free up space, keep an audit of link count and when the last link was deleted - you can then periodically purge the filesystem of files not linked to in your recovery window. Something like rsnapshot might help you archive these long-term in a space efficient way
Jack Douglas
  • 40,517
  • 16
  • 106
  • 178
1

LVM + snapshots may be your best bet, if you want a clean and easy restore. It doesn't provide Point-in-time recovery, as the changes that are reverted or overwritten between snapshots are invisible, but at least your files and database are backed up in consistent state.

http://tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html

http://www.howtoforge.com/linux_lvm_snapshots

Bryan Agee
  • 801
  • 1
  • 8
  • 16