52

Background

I'm working in a team that's looking to implement zero-downtime deployments. We're planning on using a blue/green deployment strategy in order to achieve this. One of the things I'm realising in doing the research is how complicated it becomes to make database changes. A simple operation like renaming a column can take 3 full release cycles until it's completed!

It seems to me that having the full rollout of a change take multiple release cycles introduces a lot of potential for human error. In the linked article it shows that code changes are necessary for 2 releases and a database migration is needed for 3 releases.

What I'm looking for

Currently, if we want to remember to do something, we can create a ticket in our issue management system, which creates clutter and also might get moved to a later sprint or the backlog by management; or we can create a TODO comment, which will probably be forgotten about completely.

What I'm looking for is a way that a TODO comment can have a deadline against it, and our Continuous Integration system (current undecided which we'll use) would reject the build if this deadline was expired.

For example, if we rename a column we could create the initial migration for it, and then two TODO comments to ensure that the remaining two migrations are created:

// TODO by v55: Create migration to move constraints to new column, remove references to old column in app
// TODO by v56: Create migration to drop old column

This seems fairly simple to implement, but I'm wondering if something like this already exists, because I don't want to re-invent the wheel.

Additional thoughts

I feel like I might be suffering from XY problem here, given that rolling deployments and blue/green deployments are considered a best-practice it seems strange that I can't find a solution for making database updates less painful. If you think I'm looking into the wrong thing entirely, please let me know in a comment! That said, the database example I gave is just one example, and I think TODO comments with due dates would be useful in other situations too, so even if I'm approaching this specific situation all wrong I'd really like to answers to my actual question too. Thanks!

EDIT: I just thought of another situation where this could be helpful. If you use Feature Toggles to turn on parts of your app when they are ready, you have to be careful to clean them up, otherwise you may end up with Toggle Debt. Comments with deadlines could be a good way of remembering this.

7 Answers7

54

This question is really two questions in one.

Todo comments

Of all the ways to track action items, this is the worst. TODO comments are good during active work or as a way of suggestion to a maintainer, "here is something that could maybe be improved on in the future". But if you rely on TODO comments for getting work done, you're doomed to fail.

What to do about it

TODO comments are basically technical debt, so they should be handled like any other technical debt. Either tackle them right away, if you have time, or put them in the backlog so they can be tracked and prioritized.

Generally speaking, and this is totally opinionated and open for debate, TODO comments could be considered a code smell. If a TODO comment makes it as far as being checked into version control, you have to ask yourself, are you actually going to follow through on it right now? If not, that's ok. Just be honest with yourself and put it in the backlog.

How you manage this backlog comes down to business process, company politics, and perhaps some personal autonomy. But you still need a tracked and prioritized backlog to make sure it happens.

Database changes

Yes, database changes are tricky with a zero-downtime policy. Some tricks to help make it less painful:

Post-deploy process

Create a post-deploy process that runs as part of the same release. However you want it to work. On the last system I worked on, I designed a 4-phase deployment:

  1. preapp database scripts
  2. web apps
  3. postapp database scripts
  4. maintenance window database scripts

The idea was that wherever possible, we would put as much of the database changes into preapp as possible.

Postapp was reserved for the unusual cases where we needed to make incompatible schema changes. In those cases, preapp would make enough of a change to make the new application code compatible (maybe creating a temporary view for compatibility), and postapp would cleanup any such temporary artifacts.

Maintenance window phase was reserved for changes which truly required downtime or where the risk or cost of a live deployment was not worth it. For example, scripts that change massive amounts of data may need to lock an entire table.

Deploy frequently

If you deploy new releases frequently enough, you can reach a point where carrying a change across 2 or 3 releases is trivial. Long release cycles amplify the cost of database changes.

Brandon
  • 4,575
24

Do not use TODOs. You already have a TODO list in your project. It's called the issue tracker.

I think the real problem is in this sentence:

we can create a ticket in our issue management system, which creates clutter and also might get moved to a later sprint or the backlog by management.

If your issue tracker creates to much clutter, find ways to fix that. Maybe a special issue type/tag that involves less ceremony. Maybe sub-issues. Maybe less ceremony altogether. We can't really tell. But if your issue tracker creates so much work, that people rather formulate an elaborate question on a public forum than to just add that issue, something is seriously wrong.

If your management unduly delays the last part of a task you have two options:

  1. talk to your management why this is a bad idea.

  2. handle it as a single task. This might be the gold standard solution. In a perfect world you should be able to make the three changes needed in each step. Apply one to the master branch, let it build and deploy. In the meantime apply the second to the master branch, let it build and deploy and so on so that everything happens in the same sprint, and if it doesn't it's not done. Maybe even something automatic makes sense where you logically do one deployment, but it is actually split into 3.

9

What I'm looking for is a way that a TODO comment can have a deadline against it, and our Continuous Integration system (current undecided which we'll use) would reject the build if this deadline was expired.

What your asking for is doable if you're willing to do the work and follow through.

// TODO by v55: Create migration to move constraints to new column, remove references to old column in app // TODO by v56: Create migration to drop old column

grep for //TODO by v55 when it's time to deploy v55. Deploy build runs a script that does that as an integration test.

You can tie 55 into your version tracking or just prompt for it.

It gets interesting if you want to check for //TODO by v54 when doing 55. Rather then search the code base 55 times just search for //TODO by. Then filter that result for 1 to 55. Now 56 won't trigger a fail.

You might think "oh we won't need that. We'll fix these every time so long as we have the check". No. No you won't.

candied_orange
  • 119,268
4

We had a very similar problem in our team. To solve this we wrote a static analysis check which handles these TODO's by checking the JIRA issue or Git Issue that they reference. Our build fails when the specified Issue moves past the "In Development" column.

Therefore we can comfortably have TODO's without worrying about them getting forgotten about.

I created an open-source implementation of this, in Java. Yes, a disclaimer is that I wrote this, but like I said it is completely open sourced and licensed.

The tool is called Westie and an example of the the Jira issue checker is on the README.md. See also the GitIssueAnalyser.

To prevent self promoting if you have any further questions, send me a message. If you decide to use it and have any suggestions, please raise any issues on github.

Algamest
  • 149
4

Don't ToDo. Do it now.

TLDR: Write (and test) your DB scripts now, not later; just code them so their execution is contingent on DB version.

Example

For an example, let's imagine that you wish to change a column name from SSN to TaxID, a common requirement when going international.

To make this happen, maybe you will temporarily have both a TaxID and an SSN column. And while supporting both versions, you will have a trigger to update one from the other. But you don't want to keep that trigger around indefinitely, so later, when backward compatiblity is no longer needed, you want that trigger removed (and the SSN column dropped). We are going to code all that up front without any need for ToDo items.

In our example, we will be deploying build 102 (which has the new column) while maintaining compatibility with build 101 (which doesn't).

Here are the steps.

1. Set up versioning table

  1. Add a single table called Configuration with two columns, Name and Value.

  2. Add a row with a Name of "TargetVersion" and set the Value to the version of the new build to be deployed.

  3. Add a row with a Name of "CompatibleWith" and set the Value to the minimum version number with which the deployment must be compatible.

Inspect and update these rows before every deployment.

2. Modify deployment scripts

  1. Add a script that creates a new column of TaxID, side by side with SSN, and populates it from the SSN column. Enclose this code in an If statement that checks TargetVersion; if the target version is too low (i.e. the TaxID isn't needed yet), skip.

    SELECT @TargetVersion = TargetVersion FROM Configuration
    IF @TargetVersion < '102' THEN RETURN
    ALTER TABLE Customer ADD COLUMN taxID VarChar(12) NOT NULL
    UPDATE Customer SET TaxID = SSN
    
  2. Add a script that creates a trigger that populates TaxID when inserting or updating SSN and vice versa. Enclose this code in an If statement that checks the target version and compatible version; skip if TargetVersion is too low (the TaxID isn't needed) or if the CompatibleWith version is too high (the SSN field isn't needed).

    SELECT @TargetVersion  = TargetVersion,
           @CompatibleWith = CompatibleWith 
    FROM Configuration
    IF @TargetVersion  < '102' THEN RETURN
    IF @CompatibleWith > '101' THEN RETURN
    CREATE TRIGGER SSNAndTaxIDTrigger ON Customer etc.
    
  3. Add a script to remove the SSN column. Enclose in an If statement that removes the column only if the CompatibleWith version is high enough (SSN is no longer needed).

    SELECT @CompatibleWith = CompatibleWith FROM Configuration
    IF @CompatibleWith <= '101' THEN RETURN
    IF OBJECT_ID('SSNAndTaxIDTrigger') IS NOT NULL DROP TRIGGER SSNAndTaxIDTrigger
    IF EXISTS (SELECT * FROM syscolumns c JOIN sysobject o ON o.id = c.is WHERE o.Name = 'Custeomr' AND c.Name = 'SSN') BEGIN
        ALTER TABLE Customer DROP COLUMN SSN
    END
    

3. Testing

Be sure to test your deployment with any combination of Blue/Green version numbers that you wish to be able to support in production. You can test as soon as the code is ready, by manipulating the Configuration table in your QA environment.

4. In your deployment playbook

Add a step for an engineer to update the CompatibleWith version and TargetVersion rows. If you are deploying to Blue, set TargetVersion to Blue's version number and the CompatibleWith version to Green's version number; reverse them if you're deploying Green.

Pitfalls

It's OK for your deployment scripts to reference and rely on those version numbers held in that DB table. NOT runtime code.

If you start writing your runtime code to inspect version numbers, you are introducing a new level of complexity in your application that could potentially become a huge maintainability problem. Each runtime execution path must be tested; if you carry around these conditions going forward, QA will have to put together a matrix of pain to validate them with each and every release. My advice is to keep conditions like these in deployment scripts only.

The outcome of all this

In the end, you should be able to write all the code up front (and test it, too) without fear that it will be executing too early. Also, the code will clean up the backward-compatibility trigger when the time comes without you needing to worry about it further.

This way you can write and test all the code up front, when you're thinking about it, and you don't need to deal with those messy ToDo comments.

John Wu
  • 26,955
1

You're getting a lot of pushback on your TODO idea, but I personally see no problem with it. In the end, the best (and easiest) way to make sure the migration goes into production is by failing a unit test if it doesn't. It will literally take you less than a minute to stub out an empty migration function that throws an exception if the version is 55 or more (or whatever the requirements are).

Then if you try to release it, you'll end up with a failed test, and someone will have to turn that exception into an actual migration code.

Eternal21
  • 1,604
-2

No one seems to be focusing on the root of his complaint which is the fact that database changes can take too many release cycles. He wants to carry on with his blue/green deployment schedule and the solution should already be there but unless I'm missing something his description seems to indicate that there is only one database which is shared by both systems. Not a true Blue/Green system if that is the case. Since it appears that the database is the long pole in the tent it should be duplicated as well so that no matter how long or how many release cycles it takes to implement the database changes on the offline system they don't go live until completed and fully tested. In the interim offline system scripts can keep the offline database fully updated daily.