120

My office is trying to figure out how we handle branch splits and merges, and we've run into a big problem.

Our issue is with long-term sidebranches -- the kind where you've got a few people working a sidebranch that splits from master, we develop for a few months, and when we reach a milestone we sync the two up.

Now, IMHO, the natural way to handle this is, squash the sidebranch into a single commit. master keeps progressing forward; as it should - we're not retroactively dumping months of parallel development into master's history. And if anybody needs better resolution for the sidebranch's history, well, of course it's all still there -- it's just not in master, it's in the sidebranch.

Here's the problem: I work exclusively with the command line, but the rest of my team uses GUIs. And I've discovered the GUIs don't have a reasonable option to display history from other branches. So if you reach a squash commit, saying "this development squashed from branch XYZ", it's a huge pain to go see what's in XYZ.

On SourceTree, as far as I'm able to find, it's a huge headache: If you're on master, and you want to see the history from master+devFeature, you either need to check master+devFeature out (touching every single file that's different), or else scroll through a log displaying ALL your repository's branches in parallel until you find the right place. And good luck figuring out where you are there.

My teammates, quite rightly, do not want to have development history so inaccessible. So they want these big, long development-sidebranches merged in, always with a merge commit. They don't want any history that isn't immediately accessible from the master branch.

I hate that idea; it means an endless, unnavigable tangle of parallel development history. But I'm not seeing what alternative we have. And I'm pretty baffled; this seems to block off most everything I know about good branch management, and it's going to be a constant frustration to me if I can't find a solution.

Do we have any option here besides constantly merging sidebranches into master with merge-commits? Or, is there a reason that constantly using merge-commits is not as bad as I fear?

Zanon
Standback

7 Answers

245

Even though I use Git on the command line – I have to agree with your colleagues. It is not sensible to squash large changes into a single commit. You are losing history that way, not just making it less visible.

The point of source control is to track the history of all changes. When did what change why? To that end, every commit contains pointers to parent commits, a diff, and metadata like a commit message. Each commit describes the state of the source code and the complete history of all changes that led up to that state. The garbage collector may delete commits that are not reachable.

Actions like rebasing, cherry-picking, or squashing delete or rewrite history. In particular, the resulting commits no longer reference the original commits. Consider this:

  • You squash some commits and note in the commit message that the squashed history is available in original commit abcd123.
  • You delete[1] all branches or tags that include abcd123 since they are merged.
  • You let the garbage collector run.

[1]: Some Git servers allow branches to be protected against accidental deletion, but I doubt you want to keep all your feature branches for eternity.

Now you can no longer look up that commit – it just doesn't exist.
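The scenario above can be reproduced in a throwaway repository. This is only a sketch: the file name, the branch name `feature`, and the identity settings are invented for illustration, and the reflog is expired by hand because it would otherwise keep the old commits alive for a while.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo one >> file.txt; git commit -q -am "feature: step 1"
echo two >> file.txt; git commit -q -am "feature: step 2"
orig=$(git rev-parse HEAD)                # the commit the squash message will cite

git checkout -q master
git merge -q --squash feature             # stages the combined change, no commit yet
git commit -q -m "Squashed feature (history in $orig)"

git branch -q -D feature                  # the branch counts as merged, so it is deleted
git reflog expire --expire=now --all      # drop reflog entries that still reference it
git gc -q --prune=now                     # collect unreachable objects immediately

if git cat-file -e "$orig" 2>/dev/null; then result="still present"; else result="gone"; fi
echo "commit $orig is $result"
```

After the last step the hash quoted in the squash commit message should no longer resolve to anything in this repository.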

Referencing a branch name in a commit message is even worse, since branch names are local to a repo. What is master+devFeature in your local checkout might be doodlediduh in mine. Branches are just moving labels that point to some commit object.

Of all history rewriting techniques, rebasing is the most benign because it duplicates the complete commits with all their history, and just replaces a parent commit.

That the master history includes the complete history of all branches that were merged into it is a good thing, because that represents reality.[2] If there was parallel development, that should be visible in the log.

[2]: For this reason, I also prefer explicit merge commits over the linearized but ultimately fake history resulting from rebasing.

On the command line, git log tries hard to simplify the displayed history and keep all displayed commits relevant. You can tweak history simplification to suit your needs. You might be tempted to write your own git log tool that walks the commit graph, but it is generally impossible to answer “was this commit originally committed on this or that branch?”. The first parent of a merge commit is the previous HEAD, i.e. the commit in the branch that you are merging into. But that assumes that you didn't do a reverse merge from master into the feature branch and then fast-forward master to the merge.
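The first-parent rule can be checked directly. A small sketch, assuming a scratch repository with one made-up branch called `feature`; the `commit` helper is hypothetical and only there to keep the script short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b feature
commit "feature work" feature.txt
feature_tip=$(git rev-parse HEAD)

git checkout -q master
commit "master work" main.txt
master_before=$(git rev-parse HEAD)       # HEAD of the branch being merged into

git merge -q --no-ff -m "Merge feature" feature

first_parent=$(git rev-parse HEAD^1)      # first parent of the merge commit
second_parent=$(git rev-parse HEAD^2)     # second parent of the merge commit
echo "first parent:  $first_parent (master before merge: $master_before)"
echo "second parent: $second_parent (feature tip:         $feature_tip)"
```

If the merge had been done the other way round (master merged into feature, then master fast-forwarded), the two parents would swap roles, which is why the graph alone cannot tell you where a commit was originally made.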

The best solution to long-term branches I've encountered is to avoid branches that are only merged after a couple of months. Merging is easiest when the changes are recent and small. Ideally, you'll merge at least once per week. Continuous integration (as in Extreme Programming, not as in “let's set up a Jenkins server”) even suggests multiple merges per day, i.e. not maintaining separate feature branches but sharing a development branch as a team. Merging before a feature is QA'd requires that the feature is hidden behind a feature flag.

In return, frequent integration makes it possible to spot potential problems much earlier, and helps to keep a consistent architecture: far-reaching changes are possible because these changes are quickly included in all branches. If a change breaks some code, it will only break a couple of days' work, not a couple of months'.

History rewriting can make sense for truly huge projects when there are multiple millions of lines of code and hundreds or thousands of active developers. It is questionable why such a large project would have to be a single git repo instead of being divided into separate libraries, but at that scale it is more convenient if the central repo only contains “releases” of the individual components. E.g. the Linux kernel employs squashing to keep the main history manageable. Some open source projects require patches to be sent via email, instead of a git-level merge.

amon
112

I like Amon's answer, but I felt one small part needed a lot more emphasis: You can easily simplify history while viewing logs to meet your needs, but others cannot add history while viewing logs to meet their needs. This is why keeping the history as it occurred is preferable.

Here's an example from one of our repositories. We use a pull-request model, so every feature looks like your long-running branches in history, even though they usually only run a week or less. Individual developers sometimes choose to squash their history before merging, but we often pair up on features, so that's relatively unusual. Here's the top few commits in gitk, the GUI that comes bundled with git:

[Screenshot: standard gitk view]

Yes, a bit of a tangle, but we also like it because we can see precisely who had what changes at what time. It accurately reflects our development history. If we want to see a higher-level view, one pull request merge at a time, we can look at the following view, which is equivalent to the git log --first-parent command:

[Screenshot: gitk view with --first-parent]

git log has many more options designed to give you precisely the views you want. gitk can take any arbitrary git log argument to build a graphical view. I'm sure other GUIs have similar capabilities. Read the docs and learn to use it properly, rather than enforcing your preferred git log view on everyone at merge time.
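For anyone who wants to try the two views without gitk, here is a rough command-line equivalent in a scratch repository. Branch and file names are invented, and the `commit` helper is a hypothetical shortcut.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b feature
commit "feature: part 1" feature.txt
commit "feature: part 2" feature.txt
git checkout -q master
commit "master work" main.txt
git merge -q --no-ff -m "Merge pull request: feature" feature

echo "--- full history ---"
git log --oneline --graph
echo "--- one line per merge into master ---"
git log --oneline --first-parent

all=$(git rev-list --count HEAD)                    # every reachable commit
mainline=$(git rev-list --count --first-parent HEAD) # only the first-parent chain
echo "all commits: $all, first-parent commits: $mainline"
```

The same repository serves both audiences: nothing is thrown away, and the short view is just a different way of reading it.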

Karl Bielefeldt
34

Our issue is with long-term sidebranches -- the kind where you've got a few people working a sidebranch that splits from master, we develop for a few months, and when we reach a milestone we sync the two up.

My first thought is - don't even do this unless absolutely necessary. Your merges must be challenging sometimes. Keep branches independent and as short-lived as possible. It's a sign that you need to break your stories up into smaller implementation chunks.

In the event that you have to do this, then it is possible to merge in git with the --no-ff option so that the histories are kept distinct on their own branch. The commits will still appear in the merged history but can also be seen separately on the feature branch so that at least it's possible to determine which line of development they were part of.
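A small sketch of the difference, assuming a scratch repository where master has not moved since the branch point (the case where git would otherwise fast-forward). The branch names `feature` and `ff-demo` and the `commit` helper are made up for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git branch ff-demo                         # second copy of the base, for comparison
git checkout -q -b feature
commit "feature work" feature.txt
feature_tip=$(git rev-parse HEAD)

# Fast-forward merge (--ff is the default, spelled out here): the label just moves.
git checkout -q ff-demo
git merge -q --ff feature
ff_tip=$(git rev-parse HEAD)

# --no-ff merge: a real merge commit with two parents is created.
git checkout -q master
git merge -q --no-ff -m "Merge feature" feature
noff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "ff-demo tip equals feature tip: $( [ "$ff_tip" = "$feature_tip" ] && echo yes || echo no )"
echo "parents of master's tip after --no-ff: $noff_parents"
```

With the fast-forward there is nothing in the graph that marks where the feature started or ended; with --no-ff the merge commit marks it.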

I have to admit when I first started using git I found it a little strange that the branch commits appeared in the same history as the main branch after the merge. It was a little disconcerting because it didn't seem like those commits belonged in that history. But in practice, it's not something that's really painful at all, if one considers that the integration branch is just that - its whole purpose is to combine the feature branches. In our team, we don't squash, and we do frequent merge commits. We use --no-ff all the time to ensure that it's easy to see the exact history of any feature should we want to investigate it.

12

Let me answer your points directly and clearly:

Our issue is with long-term sidebranches -- the kind where you've got a few people working a sidebranch that splits from master, we develop for a few months, and when we reach a milestone we sync the two up.

You usually do not want to leave your branches unsynced for months.

Your feature branch has branched off of something depending on your workflow; let's just call it master for the sake of simplicity. Now, whenever you commit to master, you can and should git checkout long_running_feature ; git rebase master. This means that your branches are, by design, always in sync.

git rebase is also the correct thing to do here. It is not a hack or something weird or dangerous, but completely natural. You lose one bit of information, which is the "birthday" of the feature branch, but that's it. If somebody finds that to be important, it could be provided by saving it somewhere else (in your ticket system, or, if the need is great, in a git tag...).
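As a sketch of that sync step in a scratch repository: the branch name `long_running_feature` is taken from the command above, everything else (file names, the `commit` helper) is invented for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b long_running_feature
commit "feature work" feature.txt

git checkout -q master
commit "master moves on" main.txt          # master gains a commit the feature lacks

# Bring the feature branch back in sync by replaying it on top of master.
git checkout -q long_running_feature
git rebase -q master

fork_point=$(git merge-base master long_running_feature)
master_tip=$(git rev-parse master)
echo "feature now forks from: $fork_point"
echo "current master tip:     $master_tip"
```

After the rebase the feature branch starts from the current master tip, so its original fork point (the "birthday") is the one thing that is no longer visible.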

Now, IMHO, the natural way to handle this is, squash the sidebranch into a single commit.

No, you absolutely do not want that, you want a merge commit. A merge commit also is a "single commit". It does not, somehow, insert all the individual branch commits "into" master. It is a single commit with two parents - the master head and the branch head at the time of the merge.

Be sure to specify the --no-ff option, of course; merging without --no-ff should, in your scenario, strictly be forbidden. Unfortunately, --no-ff is not the default; but I believe there is an option you can set that makes it so. See git help merge for what --no-ff does (in short: it activates the behaviour I described in the previous paragraph), it is crucial.
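The setting in question is, as far as I know, `merge.ff`; with it set to `false`, a plain `git merge` behaves as if --no-ff had been given. A minimal sketch in a scratch repository (names and the `commit` helper are made up; the setting is applied per repository here, and `--global` would apply it everywhere):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config merge.ff false                  # make --no-ff the default in this repo
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b feature
commit "feature work" feature.txt

git checkout -q master
git merge -q -m "Merge feature" feature    # no --no-ff on the command line

parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of master's tip: $parents"
```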

we're not retroactively dumping months of parallel development into master's history.

Absolutely not - you are never dumping something "into the history" of some branch, especially not with a merge commit.

And if anybody needs better resolution for the sidebranch's history, well, of course it's all still there -- it's just not in master, it's in the sidebranch.

With a merge commit, it is still there. Not in master, but in the sidebranch, clearly visible as one of the parents of the merge commit, and kept for eternity, as it should be.
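To illustrate, here is a sketch in a scratch repository where the branch label is deleted after the merge and the sidebranch commits are still listed via the merge commit's second parent. Branch and file names and the `commit` helper are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b feature
commit "feature: part 1" feature.txt
commit "feature: part 2" feature.txt
git checkout -q master
commit "master work" main.txt
git merge -q --no-ff -m "Merge feature" feature

git branch -q -d feature                   # the label goes away, the commits do not
git gc -q --prune=now                      # nothing to collect: all still reachable

# Everything that came in through the second parent of the merge commit:
git log --oneline HEAD^1..HEAD^2
side_commits=$(git rev-list --count HEAD^1..HEAD^2)
echo "sidebranch commits still reachable from master: $side_commits"
```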

See what I've done? All things you describe for your squash commit are right there with the merge --no-ff commit.

Here's the problem: I work exclusively with the command line, but the rest of my team uses GUIs.

(Side remark: I almost exclusively work with the command line as well (well, that's a lie, I usually use emacs magit, but that's another story - if I am not in a convenient place with my individual emacs setup, I prefer the command line as well). But please do yourself a favour and try at least git gui once. It is so much more efficient for picking lines, hunks etc. for adding/undoing adds.)

And I've discovered the GUIs don't have a reasonable option to display history from other branches.

That is because what you are trying to do is totally against the spirit of git. git builds from the core on a "directed acyclic graph", which means, a lot of information is in the parent-child-relationship of commits. And, for merges, that means true merge commits with two parents and one child. The GUIs of your colleagues will be just fine as soon as you use no-ff merge commits.

So if you reach a squash commit, saying "this development squashed from branch XYZ", it's a huge pain to go see what's in XYZ.

Yes, but that is not a problem of the GUI, but of the squash commit. Using a squash means you leave the feature branch head dangling and create a whole new commit on master. This breaks the structure on two levels, creating a big mess.

So they want these big, long development-sidebranches merged in, always with a merge commit.

And they are absolutely right. But they are not "merged in", they are just merged. A merge is a truly balanced thing, it has no preferred side that is merged "into" the other (git checkout A ; git merge B is exactly the same as git checkout B ; git merge A except for minor visual differences like the branches being swapped around in git log etc.).

They don't want any history that isn't immediately accessible from the master branch.

Which is completely correct. At a time when there are no unmerged features, you would have a single branch master with a rich history encapsulating all feature commit lines there ever were, going back to the git init commit from the beginning of time (note that I specifically avoided using the term "branches" in the latter part of that paragraph because the history at that time is not "branches" anymore, although the commit graph would be quite branchy).

I hate that idea;

Then you are in for a bit of pain, since you are working against the tool you are using. The git approach is very elegant and powerful, especially in the branching/merging area; if you do it right (as alluded to above, especially with --no-ff) it is by leaps and bounds superior to other approaches (e.g., the subversion mess of having parallel directory structures for branches).

it means an endless, unnavigable tangle of parallel development history.

Endless, parallel - yes.

Unnavigable, tangle - no.

But I'm not seeing what alternative we have.

Why not work just like the inventor of git, your colleagues and the rest of the world do, every day?

Do we have any option here besides constantly merging sidebranches into master with merge-commits? Or, is there a reason that constantly using merge-commits is not as bad as I fear?

No other options; not as bad.

AnoE
10

Squashing a long term sidebranch would make you lose a lot of information.

What I would do is try to rebase the long-term sidebranch onto master before merging the sidebranch into master. That way you keep every commit in master, while making the commit history linear and easier to understand.

If I couldn't do that easily at each commit, I would let it be non-linear, in order to keep the development context clear. In my opinion, if I have a problematic merge while rebasing the sidebranch onto master, it means the non-linearity had real-world significance. That means it will be easier to understand what happened in case I need to dig into the history. I also get the immediate benefit of not having to do a rebase.

1

Personally I prefer to do my development in a fork, then pull requests to merge into the primary repository.

That means that if I want to rebase my changes on top of master, or squash some WIP commits, I can totally do that. Or I can just request that my whole history be merged in as well.

What I like to do is do my development on a branch but frequently rebase against master/dev. That way I get the most recent changes from master without having a bunch of merge commits into my branch, or having to deal with a whole load of merge conflicts when it's time to merge to master.

To explicitly answer your question:

Do we have any option here besides constantly merging sidebranches into master with merge-commits?

Yes - you can merge them once per branch (when the feature or fix is "complete") or, if you don't like having the merge commits in your history, you can simply do a fast-forward merge on master after doing a final rebase.
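The second option might look like this in a scratch repository. It is only a sketch: the branch name `my-fix`, the file names, and the `commit` helper are made up, and `--ff-only` is used so the merge refuses to proceed if a fast-forward is not possible.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b my-fix
commit "fix: part 1" fix.txt
commit "fix: part 2" fix.txt
git checkout -q master
commit "master work" main.txt

git checkout -q my-fix
git rebase -q master                       # final rebase onto the current master
git checkout -q master
git merge -q --ff-only my-fix              # master simply moves forward

merge_commits=$(git rev-list --merges --count HEAD)
echo "merge commits in master's history: $merge_commits"
git log --oneline --graph
```

The resulting history is a straight line with every individual commit kept, at the cost of no longer showing that the work happened in parallel.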

Wayne Werner
-1

Revision control is garbage-in garbage-out.

The problem is the work-in-progress on the feature branch can contain a lot of "let's try this ... no that didn't work, let's replace it with that" and all the commits except the final "that" just end up polluting the history uselessly.

Ultimately, the history should be kept (some of it might be of some use in the future), but only a "clean copy" should be merged.

With Git, this can be done by first creating a copy of the feature branch (the original keeps all the history), then (interactively) rebasing that copy onto master and merging the rebased copy.
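A rough sketch of that workflow in a scratch repository. The branch name `feature-clean`, the file names, and the `commit` helper are made up. An interactive rebase normally opens an editor; so that the script can run unattended, `GIT_SEQUENCE_EDITOR` is pointed at a `sed` command that turns every commit after the first into a `fixup`, which is just one possible way of cleaning up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" main.txt
git checkout -q -b feature
commit "try approach A" feature.txt
commit "no, try approach B" feature.txt
commit "final approach" feature.txt
git checkout -q master
commit "master work" main.txt

# Copy the feature branch; the original `feature` keeps the full history.
git checkout -q -b feature-clean feature

# Scripted stand-in for editing the todo list by hand in `git rebase -i`.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i master

full_count=$(git rev-list --count master..feature)         # untouched original
clean_count=$(git rev-list --count master..feature-clean)  # cleaned-up copy

git checkout -q master
git merge -q --no-ff -m "Merge feature (clean copy)" feature-clean
echo "commits on feature: $full_count, commits on feature-clean: $clean_count"
```

Note that the full history only survives for as long as the original `feature` branch (or a tag on it) is kept around.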