How do you sync changes to the data in your database?

Question

I am new to a team developing a very large web application. Prior to my start, the team had 0 automation in place and just did everything by hand. This was possible because they were only a team of 2. I am now gradually changing this but obviously I can't just make everyone start from scratch and "do it right".

A problem I'm struggling with are changes to data (not schema) in the database. There are quite a few things that are configured in the database, such as translations, navigations, roles/permissions and mail templates. Luckily these are usually just defined by the team and pushed to staging/production, not both ways.

Our setup is every dev has their own dev environment and their own database. Then there is a staging and production environment. The idea is that a dev can create his new navigation entries, permissions and translations on his dev box then somehow apply these changes to a) the other dev's boxes and b) staging and production.

The obvious way to monitor these changes is to store them as code or SQL files, however this gets quite tricky considering a mail template stores it's data in 3-4 tables and has foreign keys. Another way would be to create export/import functionality, but that would be quite time consuming.

I know this topic has been discussed before but usually the discussion ends with "track your changes in sequential sql or code files". I'm hoping to hear what other teams might be doing and maybe there's even a really great tool for handling this somewhat generically?

score 9 · Answer 1 · answered Aug 18 '15 at 12:46

SQL scripts are the right way to go:

The information was originally created via a sql statements, most reliable repeat-ability would come from the same script
Import/export can also get tricky if two devs are modifying the same table.
It's hard to code review an export file.
It's hard to debug an export file

We have 5 database environments in our world: dev, staging, alpha, beta, production. Developers only run their own scripts against dev. They are responsible for checking these scripts into source control, just like the rest of our source. We have an automated deployment system which retrieves those scripts from source control and runs them against each successive environment.

score 4 · Answer 2 · edited Apr 12 '17 at 07:31

As you figured out by yourself, SQL is the way to go, but if you put the SQL commands into scripts or embed them in code does actually not make a big difference. What matters is that you make the upgrade process robust. Therefore, I would recommend to consider the following improvement over

track your changes in sequential sql or code files

Instead, track your changes (in sql and/or code files) which work even when they are not executed in the original sequence, or when they are executed twice. Make them idempotent. In one of our systems, we use a flag table for this purpose, where each change to the data of the database is marked by a flag. The pseudocode for upgrading the database then looks like this

 ' part added by developer A during development of feature "Foo"
 IF NOT CheckFlagTableIfUpgradeFooAlreadyApplied() THEN
     IF ApplyUpgradeFoo() = SUCCESS THEN
        MarkUpgradeFooInFlagTable()
        COMMIT
     ELSE
        ROLLBACK
        WriteToLog("UpgradeFooFailed")
     END IF
 END IF

 ' part added by developer B during development of feature "Bar"
 IF NOT CheckFlagTableIfUpgradeBarAlreadyApplied() THEN
     IF ApplyUpgradeBar() = SUCCESS THEN
         MarkUpgradeBarInFlagTable()
         COMMIT
     ELSE
        ROLLBACK
        WriteToLog("UpgradeBarFailed")
     END IF
 END IF

That way, you can always take an old database with some changes applied, and others not, and upgrade it to the newest state. For example, the above script will work when the changes "Foo" and "Bar" are missing, when only the change "Foo" was applied, only the change "Bar", or both. It also very easy to add postconditions that way, to deal with cases of interdependent updates. Schema changes can be handled similar, but you often do not even need a flag table for this: just implement a test method like DoesColumnXYZExist() for verifying if a specific column was already added to the schema.

This makes it much easier and more reliable when people work in parallel to integrate they database changes at different times to all their different development databases, and to staging, testing or production databases as well.

See also this former question on Programmers, where a tool named Liquibase was mentioned for implementing this process.

How do you sync changes to the data in your database?

2 Answers2