306

I am trying to understand the benefits of a distributed version control system (DVCS).

I found Subversion Re-education and this article by Martin Fowler very useful.

Mercurial and other DVCSs promote a new way of working on code with changesets and local commits. It prevents merging hell and other collaboration issues.

We are not affected by this, as we practice continuous integration and working alone in a private branch is not an option unless we are experimenting. We use a branch for every major version, in which we fix bugs merged from the trunk.

Mercurial allows you to have lieutenants

I understand this can be useful for very large projects like Linux, but I don't see the value in small and highly collaborative teams (5 to 7 people).

Mercurial is faster, takes less disk space, and the full local copy allows faster log and diff operations.

I'm not concerned by this either, as I haven't noticed speed or space problems with SVN, even with the very large projects I'm working on.

I'm looking for personal experiences and/or opinions from former SVN geeks, especially regarding the changeset concept and any overall performance boost you measured.

UPDATE (12th Jan): I'm now convinced that it is worth a try.

UPDATE (12th Jun): I kissed Mercurial and I liked it. The taste of his cherry local commits. I kissed Mercurial just to try it. I hope my SVN Server don't mind it. It felt so wrong. It felt so right. Don't mean I'm in love tonight.

FINAL UPDATE (29th Jul): I had the privilege of reviewing Eric Sink's next book, Version Control by Example. It finished convincing me. I'll go for Mercurial.

10 Answers

337

Note: See "EDIT" for the answer to the current question


First of all, read Subversion Re-education by Joel Spolsky. I think most of your questions will be answered there.

Another recommendation, Linus Torvalds' talk on Git: http://www.youtube.com/watch?v=4XpnKHJAok8. This other one might also answer most of your questions, and it is quite an entertaining one.

BTW, something I find quite funny: even Brian Fitzpatrick & Ben Collins-Sussman, two of the original creators of subversion said in one google talk "sorry about that" referring to subversion being inferior to mercurial (and DVCSs in general).

Now, IMO and in general, team dynamics develop more naturally with any DVCS, and an outstanding benefit is that you can commit offline because it implies the following things:

  • You don't depend on a server and a connection, meaning faster times.
  • You are not a slave to places where you can get internet access (or a VPN) just to be able to commit.
  • Everyone has a backup of everything (files, history), not just the server. Meaning anyone can become the server.
  • You can commit compulsively if you need to without messing up others' code. Commits are local. You don't step on each other's toes while committing. You don't break others' builds or environments just by committing.
  • People without "commit access" can commit (because committing in a DVCS does not imply uploading code), which lowers the barrier for contributions. As an integrator, you can decide whether or not to pull their changes.
  • It can reinforce natural communication since a DVCS makes this essential... in subversion what you have instead are commit races, which force communication, but by obstructing your work.
  • Contributors can team up and handle their own merging, meaning less work for integrators in the end.
  • Contributors can have their own branches without affecting others' (but being able to share them if necessary).

About your points:

  • Merging hell doesn't exist in DVCSland, so it doesn't need to be handled. See next point.
  • In DVCSs, everyone represents a "branch", meaning there are merges every time changes are pulled. Named branches are another thing.
  • You can keep using continuous integration if you want. It's not necessary IMHO though (why add complexity?); just keep your testing as part of your culture/policy.
  • Mercurial is faster at some things, git is faster at others. That's not really down to DVCSs in general, but to their particular implementations AFAIK.
  • Everyone will always have the full project, not only you. The "distributed" part means that you commit/update locally; sharing with or taking from outside your computer is called pushing/pulling.
  • Again, read Subversion Re-education. DVCSs are easier and more natural, but they are different; don't assume that cvs/svn === the base of all versioning.

I was contributing some documentation to the Joomla project to help preach a migration to DVCSs, and I made some diagrams there to illustrate centralized vs distributed.

Centralized

[Diagram: centralized workflow]

Distributed in general practice

[Diagram: distributed workflow in general practice]

Distributed to the fullest

[Diagram: fully distributed workflow]

You can see in the diagram that there is still a "centralized repository", and this is one of centralized versioning fans' favourite arguments: "you are still being centralized". Nope, you are not, since the "centralized" repository is just a repository you all agree on (e.g. an official github repo), and this can change any time you need.

Now, this is the typical workflow for open-source projects (e.g. a project with massive collaboration) using DVCSs:

[Diagram: typical DVCS workflow for an open-source project]

Bitbucket.org is somewhat of a github equivalent for mercurial. They offer unlimited private repositories with unlimited space, and if your team is smaller than five you can use it for free.

The best way to convince yourself to use a DVCS is to try one out. Every experienced DVCS developer who has used svn/cvs will tell you that it is worth it and that they don't know how they survived all that time without it.


EDIT: To answer your second edit I can just reiterate that with a DVCS you have a different workflow, I'd advise you not to look for reasons not to try it because of best practices, it feels like when people argue that OOP is not necessary because they can get around complex design patterns with what they always do with paradigm XYZ; you can benefit anyways.

Try it, you'll see how working in "a private branch" is actually a better option. One reason this is true is that you lose the fear of committing, which lets you commit whenever you see fit and work in a more natural way.

Regarding "merging hell", you say "unless we are experimenting", I say "even if you are experimenting + maintaining + working on a revamped v2.0 at the same time". As I was saying earlier, merging hell doesn't exist, because:

  • Every time you commit you generate an unnamed branch, and every time your changes meet other people's changes, a natural merge occurs.
  • Because DVCSs gather more metadata for each commit, fewer conflicts occur during merging... so you could even call it an "intelligent merge".
  • When you do bump into merge conflicts, this is what you can use:

[Image: merge conflict resolution tool]
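As a rough illustration of the first two points above: the extra metadata is, as far as I know, mainly the parent link(s) stored with every commit. With those links the tool can find the common ancestor of two lines of work and merge relative to it, instead of comparing two file trees blindly. The Python sketch below is a toy model only, not how Mercurial or git actually implement it; the commit ids and the names `parents`, `ancestors` and `merge_base` are made up for the example.

```python
from collections import deque

# Toy history: each commit id maps to the ids of its parents.
# All ids are invented for illustration.
parents = {
    "base": [],
    "alice1": ["base"],
    "alice2": ["alice1"],
    "bob1": ["base"],
    "merge": ["alice2", "bob1"],  # a merge commit has two parents
}


def ancestors(commit):
    """Return the commit itself plus everything reachable through parent links."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def merge_base(a, b):
    """Return the nearest ancestor of b that is also an ancestor of a."""
    reachable_from_a = ancestors(a)
    seen, queue = set(), deque([b])
    while queue:
        current = queue.popleft()
        if current in reachable_from_a:
            return current
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return None


# Alice and Bob both committed on top of "base" without coordinating.
base = merge_base("alice2", "bob1")
print(base)
```

Alice's two commits and Bob's one commit form two unnamed branches that fork at "base". A three-way merge only has to reconcile what each side changed since that point, which is one reason conflicts tend to be smaller.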

Also, project size doesn't matter. When I switched from subversion I was already seeing the benefits while working alone; everything just felt right. The changesets (not exactly a revision, but a specific set of changes to the specific files you include in a commit, isolated from the state of the codebase) let you visualize exactly what you meant by doing what you were doing to a specific group of files, not the whole codebase.

Regarding how changesets work and the performance boost, I'll try to illustrate it with an example I like to give: the mootools project's switch from svn, as illustrated in their github network graph.

Before

[Image: mootools github network graph before the switch]

After

[Image: mootools github network graph after the switch]

What you are seeing is developers being able to focus on their own work while committing, without the fear of breaking others' code. They only have to worry about breaking others' code after pushing/pulling (DVCSs: first commit, then push/pull, then update), but since merging is smarter here, they often never do. Even when there is a merge conflict (which is rare), you only spend 5 minutes or less fixing it.

My recommendation is to look for someone who knows how to use mercurial/git and ask him/her to explain it to you hands-on. I spent about half an hour at the command line with some friends, using mercurial with our desktops and bitbucket accounts, showing them how to merge and even fabricating conflicts so they could see how to fix them in a ridiculously short amount of time. That was enough to show them the true power of a DVCS.

Finally, I'd recommend mercurial+bitbucket instead of git+github if you work with windows folks. Mercurial is also a tad simpler, but git is more powerful for more complex repository management (e.g. git rebase).

Some additional recommended readings:

Kyralessa
dukeofgaming
58

What you are saying, among other things, is that if you essentially stay on a single branch, then you don't need distributed version control.

That is true, but isn't it a needlessly strong restriction on your way of working, and one that doesn't scale well to multiple locations in multiple timezones? Where should the central subversion server be located, and should everybody go home if that server is down for some reason?

DVCSes are to Subversion, what Bittorrent is to ftp

(technically, not legally). Perhaps if you think that over, you might understand why it is such a large leap forward?

For me, our switch to git immediately resulted in:

  • Our backups being easier to do (just "git remote update" and you're done)
  • Easier to commit small steps when working without access to the central repository. You just work, and synchronize when you come back to the network hosting the central repository.
  • Faster Hudson builds. Much, much faster to use git pull than updating.

So, consider why bittorrent is better than ftp, and reconsider your position :)


Note: It has been mentioned that there are use-cases where ftp is quicker than bittorrent. This is true in the same way that the backup file your favorite editor maintains is quicker to use than a version control system.

47

The killer feature of distributed version control systems is the distributed part. You don't checkout a "working copy" from the repository, you clone an entire copy of the repository. This is huge, as it provides powerful benefits:

  • You can enjoy the benefits of version control, even when you don't have internet access such as... This one is unfortunately overused and overhyped as a reason DVCS is awesome---it is just not a strong selling point as many of us find ourselves coding without internet access about as often as it starts raining frogs.

  • The real reason having a local repository is killer is that you have total control over your commit history before it gets pushed to the master repository.

Ever fixed a bug and ended up with something like:

r321 Fixed annoying bug.
r322 Argh, unexpected corner case to annoying bug in r321!
r323 Ok, really fixed corner case in r322
r324 Oops, forgot to remove some debugging code related to r321
...

And so on. History like that is messy---there was really only one fix, but now the implementation is spread between many commits that contain unwanted artifacts such as the addition and removal of debugging statements. With a system like SVN, the alternative is to not commit (!!!) until everything is working in order to keep the history clean. Even then, mistakes slip by and Murphy's Law is waiting to brutalize you when significant amounts of work are not protected by version control.

Having a local clone of the repository, that you own, fixes this as you can re-write history by continuously rolling the "fix it" and "oops" commits into the "bug fix" commit. At the end of the day, one clean commit gets sent to the master repository that looks like:

r321 Fixed annoying bug.

Which is the way it should be.
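A toy Python model of that squashing step, in case it helps. It assumes each commit stores a full snapshot of the tracked files, so collapsing a run of local commits means keeping the first message and the final snapshot. `Commit`, `squash` and the file names are hypothetical; with git, the real thing would typically be done with an interactive rebase rather than code like this.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    snapshot: dict  # path -> file content after this commit


def squash(commits):
    """Collapse a run of local commits into one: first message, final snapshot."""
    if not commits:
        raise ValueError("nothing to squash")
    return Commit(message=commits[0].message, snapshot=dict(commits[-1].snapshot))


# Local, unpublished history with the messy intermediate steps.
local = [
    Commit("Fixed annoying bug.", {"app.py": "fix v1"}),
    Commit("Argh, unexpected corner case", {"app.py": "fix v2", "debug.log": "trace"}),
    Commit("Ok, really fixed corner case", {"app.py": "fix v3", "debug.log": "trace"}),
    Commit("Oops, forgot to remove debugging code", {"app.py": "fix v3"}),
]

published = squash(local)
print(published.message, sorted(published.snapshot))
```

The debugging file that appeared and disappeared along the way leaves no trace in what gets pushed, because only the end state survives.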

The ability to re-write history is even more powerful when combined with the branching model. A developer can do work that is entirely isolated within a branch and then when it's time to bring that branch into the trunk you have all sorts of interesting options:

  • Do a plain vanilla merge. Brings everything in warts and all.

  • Do a rebase. Allows you to sort through the branch history, re-arrange the order of commits, throw commits out, join commits together, re-write commit messages---even edit commits or add new ones! A distributed version control system has deep support for code review.

Once I learned how local repositories allowed me to edit my history for the sake of the sanity of my fellow programmers and my future self, I hung up SVN for good. My Subversion client is now git svn.

Allowing developers and managers to exercise editorial control over commit history results in better project history and having clean history to work with really helps my productivity as a programmer. If all this talk about "rewriting history" scares you, don't worry as that is what central, public or master repositories are for. History may (and should!) be re-written up to the point where someone brings it into a branch in a repository that other people are pulling from. At that point the history should be treated as though carved upon a stone tablet.

Sharpie
18

dukeofgaming's answer is probably about as good as it can get, but I want to approach this from a different direction.

Let us assume that what you say is absolutely true, and that by applying good practices you can avoid the problems that DVCS was designed to fix. Does this mean that a DVCS would offer you no advantage? People stink at following best practices. People are going to mess up. So why would you avoid the software that is designed to fix a set of problems, choosing instead to rely on people to do something that you can predict in advance they are not going to do?

philosodad
9

Yes, it hurts when you have to merge large commits in subversion. But this is also a great learning experience, making you do everything possible to avoid merging conflicts. In other words, you learn to check in often. Early integration is a very good thing for any co-located project. As long as everyone is doing that, using subversion shouldn't be much of a problem.

Git, for instance, was designed for distributed work and encourages people to work on their own projects and create their own forks for an (eventual) merge later. It was not specifically designed for continuous integration in "small and highly collaborative teams", which is what the OP is asking about. It is rather the opposite, come to think of it. You won't have any use for its fancy distributed features if all you're doing is sitting in the same room working together on the same code.

So for a co-located, CI-using team, I really don't think it matters much if you use a distributed system or not. It boils down to a matter of taste and experience.

8

Because you should continuously challenge your own knowledge. You are fond of subversion, and I can understand because I used it for many years, and was very happy about it, but that doesn't mean that it is still the tool that would suit you best.

I believe that when I started using it, it was the best choice at the time. But other tools come up over time, and now I prefer git, even for my own spare time projects.

And subversion does have some shortcomings. E.g. if you rename a directory on disk, it is not renamed in the repository. File moves are not supported: a move becomes a copy/delete operation, which makes merging changes difficult when files have been moved or renamed. And merge tracking is not really built into the system, but rather implemented in the form of a workaround.

Git does solve these problems (including automatically detecting that a file has been moved; you don't even need to tell it).
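To give an idea of how a tool can notice a move without being told: as far as I understand it, git does not record renames at all, it infers them afterwards by comparing file contents. The Python sketch below is a simplified stand-in that only catches files whose content is byte-for-byte identical before and after; real rename detection also copes with files that changed a little. The function names and file paths are hypothetical, and the plain SHA-1 of the content is just a convenient fingerprint for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Fingerprint a file by its content (a stand-in for a real object id)."""
    return hashlib.sha1(data).hexdigest()


def detect_exact_renames(before: dict, after: dict) -> dict:
    """Map old path -> new path for files that vanished and reappeared unchanged."""
    removed = {path: content_id(data) for path, data in before.items() if path not in after}
    added = {}
    for path, data in after.items():
        if path not in before:
            # Keep the first new path seen for each distinct content.
            added.setdefault(content_id(data), path)
    return {old: added[digest] for old, digest in removed.items() if digest in added}


before = {"src/util.py": b"def helper(): pass\n", "README": b"hello\n"}
after = {"lib/util.py": b"def helper(): pass\n", "README": b"hello\n"}

renames = detect_exact_renames(before, after)
print(renames)
```

Because the move is inferred from content, a change made to the file under its old path can still be matched up with the file under its new path at merge time.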

On the other hand git does not allow you to branch on individual directory levels like subversion does.

So my answer is: you should investigate alternatives, see if they fit your needs better than what you are familiar with, and then decide.

Pete
7

With respect to performance, Git or any other DVCS has a big advantage over SVN when you have to switch from one branch to another, or to jump from one revision to another. As everything is stored locally, things are much quicker than for SVN.

This alone could make me switch!

Xavier Nodet
3

Instead of hanging onto the idea that "by applying best practices you don't need a DVCS", why not consider that the SVN workflow is one workflow, with one set of best practices, and the Git/Hg workflow is a different workflow, with a different set of best practices?

git bisect (and all of its implications on your main repository)

In Git, a very important principle is that you can find bugs using git bisect. To do this, you take the last version you ran that was known to work, and the first version you ran that was known to fail, and you perform (with Git's help) a binary search to figure out which commit caused the bug. To do this, your entire revision history has to be relatively free of other bugs that can interfere with your bug search (believe it or not, this actually works pretty well in practice, and Linux kernel developers do this all the time).
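The search itself is an ordinary binary search over the commits between the known-good and known-bad versions. Here is a minimal Python sketch of the idea, assuming a linear history in which every commit after the culprit is also bad; real git bisect also has to deal with merges and commits that cannot be tested. The function name, the `c0`..`c9` commit ids and the `is_bad` check are all invented for the example.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad]


# Hypothetical history of ten commits; the bug first appears in "c6".
history = [f"c{i}" for i in range(10)]
tested = []


def is_bad(commit: str) -> bool:
    """Stand-in for building and running the failing test at this commit."""
    tested.append(commit)
    return int(commit[1:]) >= 6


culprit = first_bad_commit(history, is_bad)
print(culprit, tested)
```

Each test halves the remaining range, so the number of builds you need grows with the logarithm of the number of commits. That is also why a stray broken commit in the middle of the history is so harmful: it makes the "good or bad?" answer unreliable.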

To achieve git bisect capability, you develop a new feature on its own feature branch, rebase it and clean up the history (so you don't have any known non-working revisions in your history -- just a bunch of changes that each get you partway to fixing the problem), and then when the feature is done, you merge it into the main branch with working history.

Also, to make this work, you have to have discipline about which version of the main branch you start your feature branch from. You can't just start from the current state of the master branch because that may have unrelated bugs -- so the advice in the kernel community is to start work from the latest stable version of the kernel (for big features), or to start work from the latest tagged release candidate.

You can also back up your intermediate progress by pushing the feature branch to a server in the meantime. Likewise, you can push the feature branch to a server to share it with someone else and elicit feedback before the feature is complete, and before you have to turn the code into a permanent feature of the codebase that everybody in your project has to deal with.

The gitworkflows man page is a good introduction to the workflows that Git is designed for. Also Why Git is Better than X discusses git workflows.

Large, distributed projects

Why do we need lieutenants in a highly collaborating team with good practices and good design habits?

Because in projects like Linux, there are so many people involved who are so geographically distributed that it's hard to be as highly collaborating as a small team that shares a conference room. (I suspect that for developing a large product like Microsoft Windows, even if people are all located in the same building, the team is just too large to maintain the level of collaboration that makes a centralized VCS work without lieutenants.)

Ken Bloom
2

Why not use them in tandem? On my current project we are forced to use CVS. However, we also keep local git repositories in order to do feature development. This is the best of both worlds imo, because you can try various solutions and keep versions of what you're working on, on your own machine. This allows you to roll back to previous versions of your feature or try several approaches without getting into trouble when you mess up your code. Having a central repository then gives you the benefits of having a centralized repository.

Vadim
1

I have no personal experience with DVCS, but from what I gather from the answers here and some linked documents, the most fundamental difference between DVCS and CVCS is the working model used.

DVCS

The working model of DVCS is that you are doing isolated development. You are developing your new feature/bugfix in isolation from all other changes until the moment you decide to release it to the rest of the team. Until that time, you can do whatever check-ins you like, because nobody else is going to be bothered with it.

CVCS

The working model of CVCS (in particular Subversion) is that you are doing collaborative development. You are developing your new feature/bugfix in direct collaboration with all the other team members and all changes are immediately available to all.
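To make the contrast between the two working models concrete, here is a small Python toy, based only on my reading of the answers here. A repository is reduced to a list of commit messages; `Repository`, `clone` and `push` are invented names that mimic the concepts and are not any tool's API. In this toy a push only succeeds when the shared repository has nothing the local one lacks.

```python
class Repository:
    """Toy repository: just an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def commit(self, message):
        self.commits.append(message)

    def clone(self):
        """Copy the full history, as a DVCS clone does."""
        return Repository(self.commits)

    def push(self, remote):
        """Publish local commits, but only if remote history is a prefix of ours."""
        if remote.commits != self.commits[:len(remote.commits)]:
            raise RuntimeError("remote has commits you lack; pull and merge first")
        remote.commits = list(self.commits)


shared = Repository(["initial import"])

# Centralized model: a commit lands in the shared repository immediately.
shared.commit("half of feature A")

# Distributed model: commits stay in the clone until they are pushed.
mine = shared.clone()
mine.commit("feature B, step 1")
mine.commit("feature B, step 2")
visible_before_push = list(shared.commits)

mine.push(shared)
visible_after_push = list(shared.commits)
print(visible_before_push, visible_after_push)
```

Before the push, the rest of the team sees none of the "feature B" steps; after it, they see both at once. The refused push in the diverged case is the point where a DVCS makes you pull and merge, which is roughly where the centralized model makes you update before committing.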

Other differences

Other differences between svn and git/hg, such as revisions vs changesets, are incidental. It is entirely possible to create a DVCS based on revisions (as Subversion has them) or a CVCS based on changesets (as Git/Mercurial have them).

I am not going to recommend any particular tool, because it mostly depends on the working model that you (and your team) are most comfortable with.
Personally, I have no problems with working with a CVCS.

  • I have no fear of checking in stuff, as I have no problems getting it into an incomplete, but compilable state.
  • When I experienced merge-hell, it was in situations where it would have occurred in both svn and git/hg. For example, V2 of some software was being maintained by a different team, using a different VCS, while we were developing V3. Occasionally, bugfixes would have to be imported from the V2 VCS to the V3 VCS, which basically meant doing a very large check-in on the V3 VCS (with all bugfixes in a single changeset). I know it was not ideal, but it was a management decision to use different VCS systems.