67

I've been reading Martin Fowler's article on Continuous Integration, and he lists "Everyone Commits To the Mainline Every Day" as a must.

I do not like to commit code unless the section I'm working on is complete, so in practice I commit my code every three days: one day to investigate/reproduce the task and make some preliminary changes, a second day to complete the changes, and a third day to write the tests and clean it up^ for submission. I would not feel comfortable submitting the code sooner.

Now, I pull changes from the repository and integrate them locally usually twice a day, but I do not commit that often unless I can carve out a smaller piece of work.

Question: is committing every day such a good practice that I should change my workflow to accommodate it, or is it not that advisable?

Edit: I guess I should have clarified that I meant "commit" in the CVS sense (i.e. "push"), since that is likely what Fowler meant in 2006 when he wrote this.

^ The order is fairly arbitrary and depends on the task; my point was to illustrate the time span and activities, not the exact sequence.

Thomas Owens
Sled

9 Answers

111

I commit code several times a day. Whenever I reach a point where the code is complete enough to compile and doesn't break other things, it goes in.

You should look at breaking up your work so you can safely check in a few times a day.

There are two reasons for this:

  1. Any work that is not checked in may be lost - your computer may have a catastrophic failure. In this case, the longer you wait, the more work you lose.
  2. The more work you do without checking in, the more code others will need to integrate when you finally decide that it is baked. This introduces more chances of conflicts and merge issues.
N4TKD
Oded
46

I do not agree with this rule and I agree with what Mason Wheeler said. I would like to add a few ideas.

I try to commit every time I have a meaningful change to commit: this can be several times a day if I fix several small bugs, or once a week if I am working on a larger piece of software that cannot be used by the rest of the code in any meaningful way until it reaches a consistent state.

Also, I interpret committing as publishing a meaningful revision that contributes new functionality to the code base. I think one should try to clean up the code before committing so that other developers can understand the meaning and the purpose of the change when they look at the revision history. The fewer changes other developers see in the history, the better: when I look at the revision history I want to see increments that add some meaningful functionality; I am not interested in every small idea each developer had and wanted to try out before they reached the solution.

Furthermore, I do not think it is a good idea to use the SVN server (or whatever version control system) as a backup facility to which the current snapshot of the code is committed (provided that it compiles): you can use a USB stick, an external USB drive or a network disk to mirror your current code so that it does not get lost if your computer breaks down. Revision control and data backup are two different things. Publishing a revision is not the same as saving a snapshot of your code.

Finally, I think that it should not be a problem to commit only every now and then (i.e. only when one is really satisfied with the current state of the code), and avoiding merge conflicts is not a good justification for committing (too) often. Many merge conflicts happen when different people work on the same files at the same time, which is a bad practice (see e.g. this article, point 7). Merge conflicts should be reduced by splitting a project into modules with clear interfaces and as few dependencies as possible, and by coordinating the work of developers so that the code they work on overlaps as little as possible.

Just my 2 cents.

EDIT

Another reason against premature commits that came to my mind is that a (very) buggy version cannot be tested. If you are committing on the trunk and your test team is testing every day, they might have no testable version for a few hours (or for a day). Even if you do not try to fix the bug and just revert your changes, a rebuild can take a couple of hours. With, say, five testers working in your team, you have wasted 5 x 2 = 10 hours of the team's time due to inactivity. It happened to me once, so I really try to avoid premature commits in the name of "commit as soon as possible".

Giorgio
41

Slavishly adhering to any methodology or practice without understanding the reasons behind it is never a good idea. That's where cargo-cult programming comes from.

Therefore, "I should commit every day because Martin Fowler said so" is just stupid. And sometimes it's impractical too. If you're working on a complicated new feature, you might not reach a point where it's worth checking in until you've already worked on it for a few days.

This doesn't mean you should make sure everything's perfect before checking it in. That's a good way to lose work if something goes wrong. The correct thing to do is to develop and use good judgment on the matter. Rules of thumb can only help you so much.

Mason Wheeler
15

Oded gave two important reasons to commit code as frequently as possible. I'll add a few more:

  1. While you are working on your piece of code, others might need some functions from it. They shouldn't have to wait 6 days to get them. In this case my colleagues usually create a prototype in my piece of code and commit it, then I add the body and commit it again. This is usually done in a few hours.

  2. The 'common' code exists so that everyone can see every change as soon as possible. If the piece of code you're working on is totally separate from others' work and nobody will be waiting on it, then it is recommended to create a branch to work on and then, if everything is successful, merge it into the mainline.

superM
8

I'm a strong believer in committing every logical change that is worth keeping. Commit often, and if code isn't worth keeping, revert it to a clean state. The longer you wait to push/publish your code, the harder it is to integrate, and the more problems you'll run into. You'll also get feedback about your contributions a lot quicker:

  • do they break the build?
  • are you duplicating another team member's efforts?
  • are you doing something incorrect?
  • or are people waiting on things from you?

Small changes are a lot easier to manage.

Also, it's worth noting the difference between version control systems. Some, such as Git (distributed), allow you to commit and control your entire history locally, only pushing when you are ready to publish. Others, like SVN (centralized), combine the two steps, making small commits very inefficient.
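To make the commit/push distinction concrete, here is a minimal sketch in Python that drives the git command line through subprocess. It assumes git is installed and on PATH; the helper names (git, local_commits_then_push), the throwaway repository names (shared.git, work), the branch name main and the placeholder identity are hypothetical and chosen only for illustration. It makes several small local commits, checks that the shared repository has seen nothing, then publishes the whole series with a single push.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed stdout."""
    # Identity and signing are fixed per command so the sketch does not
    # depend on the user's global git configuration (placeholder values).
    cmd = ["git", "-c", "user.name=Example Dev",
           "-c", "user.email=dev@example.com",
           "-c", "commit.gpgsign=false", *args]
    result = subprocess.run(cmd, cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def local_commits_then_push(steps: int = 3) -> tuple[int, str, int]:
    """Make `steps` local commits, then publish them all with one push.

    Returns (commits in the local clone, refs visible in the shared
    repository before the push, commits on its main branch after the push).
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared.git"  # hypothetical stand-in for the team's central repo
        work = root / "work"          # hypothetical developer working copy
        shared.mkdir()
        work.mkdir()
        git(shared, "init", "--bare")
        git(work, "init")
        git(work, "remote", "add", "origin", str(shared))

        # Each small logical step is committed locally; nothing is published.
        for i in range(steps):
            (work / "feature.txt").write_text(f"step {i}\n")
            git(work, "add", "feature.txt")
            git(work, "commit", "-m", f"Step {i} of the feature")

        local_count = int(git(work, "rev-list", "--count", "HEAD"))
        refs_before_push = git(shared, "for-each-ref")  # empty: nothing shared yet

        # Publish the whole series at once when it is ready for the mainline.
        git(work, "push", "origin", "HEAD:refs/heads/main")
        published = int(git(shared, "rev-list", "--count", "refs/heads/main"))
        return local_count, refs_before_push, published


if __name__ == "__main__":
    print(local_commits_then_push())
```

Calling local_commits_then_push() should report three local commits, no refs in the shared repository before the push, and three commits on its main branch afterwards: the local history stays fine-grained while publishing remains a separate, deliberate step.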

Don't forget that your commits are essentially change documentation. When things go wrong, you'll be glad to have more history rather than too little. A single commit for a week's work seems useless to me. I'd just end up reading every single line of code changed rather than the summary of each logical chunk.

5

I think most of the answers here miss one of the main points in Martin Fowler's statement, which is related to Continuous Integration: code that isn't checked in (pushed/published/merged) into the mainline isn't tested.

This should not be read as an encouragement to commit whatever code you have on your local machine whenever it's time to leave the office. As several others here have pointed out, that would be bad: it would break the build and cause an unstable mainline.

However, it is an encouragement to try to make your changes in small steps that can be checked in to the mainline without causing problems. This encourages evolution of the code instead of ripping it all apart and rewriting.

Now, what's good about this way of working?

  1. Not committing large chunks of code or revolutionary changes reduces the chance of breaking the build.
  2. If your commit breaks the build it is fairly trivial to identify what the problems are, to revert it and then commit a fixed version quickly.
  3. By making sure all tests run on every small change in the code, you ensure that you don't introduce subtle bugs or regressions that can come from having code grow outside of the continuous integration scheme.

Of course not all changes lend themselves to this approach. As others pointed out, no rule is absolute. However, for changes that are expected to stay out of the mainline for a long time, set up an alternative mainline with its own continuous integration scheme and follow the same approach with it. With today's distributed VCSs that's a fairly easy thing to do.

harald
3

Arguments for checking in every day:

  • Code is stored and backed up against hard drive failure
  • Activity can be recorded in commit notes (what did I do on Thursday...?)
  • Integration with existing code base happens earlier and in smaller chunks, hopefully identifying conflicts or merge issues sooner
  • Your team have visibility of what you have been working on
  • Your colleagues can work against your interfaces sooner, giving them more time to integrate with your 'big complex bit of code'
  • Your code will be real-world tested sooner, or at least exposed to more use than you will give it, leading to earlier identification of bugs or omissions.

Arguments against checking in every day:

  • Don't need to or don't want to
  • Haven't 'cleaned up' my code yet, it's a mess
  • Don't have time

I don't believe there's any good reason to check in less than daily apart from laziness or disorganisation. There's nothing worse than finding that the code running in the development environment doesn't match the code in the development branch because someone 'hasn't finished yet' and thus hasn't checked in.

I'd love to be wrong on this, so please let me know of any legitimate argument against daily check-in.

2

If you're meaning "commit" as "merge into mainline", then you definitely should not be doing that everyday on a software project that's being released to the customers. You should be merging changes that are done and tested, so that the mainline is always working and releasable, and not in some broken state with half-finished features.

However, the luxury of working with today's distributed version control is that you can both keep mainline stable, and at the same time do your git/hg/whatever commit every time you feel you want to preserve the state of things. I do this once every few hours and definitely at the end of every day.

With DVCS, you can publish your work, collaborate on it with others in your team, and keep it up to date with changes in the mainline branch. You can do all this without polluting the stability of code your customers and/or other teams depend on.

In the days when Subversion was the latest technology and there was no way to fork and merge feature branches without extreme pain, having a mainline where several different features were under simultaneous construction might have been the best approach. But that advantage no longer holds after 2010.

che
2

In Team Foundation Server you can 'Shelve', which is not the same as a check-in; it just makes a backup of your code so that if your machine dies you have not lost the changes.

I have also seen software houses that have a 'developer line' and a 'mainline'. Devs are free to check in to the developer line whenever they deem fit and only the team leader has access to the mainline so they are responsible for copying code from dev to main when it is production ready.