Dealing with data, when your database is under version control

Question

Currently my database is not under some kind of vcs, we can get deltas but that's about it. I would like to try and make product deployments more automated, and less time consuming. I understand that placing a db's schema files under source control allow you to manage versions, and that these files are basically for dropping the old tables/indexes/etc and then adding the new versions.

My question is, what about the data that's already there? By dropping everything we'd lose all of the data. So, we would have to do a data dump before updating the database, and then re-load the data back after the update has been done. Problem is, some of our largest databases have 80+ GB of data, and we probably have a total of 20 sets of databases (6 DBs per set).

I'm sure that this would work, but given the size of everything, is there a simpler solution that would cut out the need to dump and reload everything each time a schema update took place? And, if not, wouldn't we have to dump the data such that reloading it took in to account the new schema?

score 2 · Answer 1 · answered Aug 14 '14 at 16:49

What we used in a company I used to work at is Liquibase. It is a solution in Java, but since you only ever need to run it once per DB update it will likely work for other languages as well. I'm pretty sure there are other solutions that work in a similar way, so I'm just going to describe how it functions in general so you can search for something that works for you:

You have a file (or set of files) with changelogs - commands that need to be run on a DB to update it from version a to version b. The DB itself then has a special table that tracks which of these commands have been executed against it along with checksums, timestamps and other misc info.

Then when you need to make an update to the schema - add the commands that need to be run to update the DB without dropping the data (likely UPDATE statements). Then when you actually run the update, the updater will find the version of the DB schema, find all updates that need to be run to get from its current version to the latest version and run only those in correct order.

That way you do not recreate the schema every time and don't need to reload the data. It's also automated and fairly quick compared to doing it manually or recreating the DB completely

Dealing with data, when your database is under version control

1 Answers1

Linked