In git, how to do versioning for a dozen libraries all worked on in parallel

Question

We are doing projects, but we reuse a lot of code between the projects and have lots of libraries that contain our common code. As we implement new projects we find more ways to factor out common code and put it into libraries. The libraries depend on each other, and the projects depend on the libraries. Each project, and all libraries used in that project, need to use the same version of all the libraries they are referring to. If we release a piece of software we will have to fix bugs and maybe add new features for many years, sometimes for decades. We have about a dozen libraries, changes often cut across more than two, and several teams work on several projects in parallel, making concurrent changes to all these libraries.

We have recently switched to git and set up repositories for each library and each project. We use Stash as a common repository, do new work on feature branches, then make pull requests and merge them only after review.

Many of the issues we have to deal with in projects require us to make changes across several libraries and the project's specific code. These often include changes to library interfaces, some of which are incompatible. (If you think this sounds fishy: We interface with hardware, and hide specific hardware behind generic interfaces. Almost every time we integrate some other vendor's hardware we run into cases our current interfaces did not anticipate, and so have to refine them.) For example, imagine a project P1 using the libraries L1, L2, and L3. L1 also uses L2 and L3, and L2 uses L3 as well. The dependency graph looks like this:

   <-------L1<--+
P1 <----+  ^    |
   <-+  |  |    |
     |  +--L2   |
     |     ^    |
     |     |    |
     +-----L3---+

Now imagine a feature for this project requires changes in P1 and L3 which change the interface of L3. Now add projects P2 and P3 into the mix, which also refer to these libraries. We cannot afford to switch them all to the new interface, run all the tests, and deploy the new software. So what's the alternative?

implement the new interface in L3
make a pull request for L3 and wait for the review
merge the change
create a new release of L3
start working on the feature in P1 by making it refer to L3's new release, then implement the feature on P1's feature branch
make a pull request, have this reviewed, and merged

(I just noticed that I forgot to switch L1 and L2 to the new release. And I don't even know where to stick this in, because it would need to be done in parallel with P1...)

This is a tedious, error-prone, and very long process to implement this feature. It requires two independent reviews (which makes it much harder to review), does not scale at all, and is likely to put us out of business because we get so bogged down in process we never get anything done.

But how do we employ branching and tagging in order to create a process that allows us to implement new features in new projects without too much overhead?

dagnelies · Accepted Answer · 2015-06-19T15:27:50.970

Kind of pointing out the obvious here, but maybe it's worth mentioning.

Usually, git repos are tailored per lib/project because they tend to be independent. You update your project, and don't care about the rest. Other projects depending on it will simply update their lib whenever they see fit.

However, your case seems highly dependent on correlated components, so that one feature usually affects many of them. And the whole has to be packaged as a bundle. Since implementing a feature/change/bug fix often requires adapting many different libraries/projects at once, perhaps it makes sense to put them all in the same repo.

There are strong advantages/drawbacks to this.

Advantages:

Traceability: the branch shows everything changed in every project/lib related to this feature/bug.
Bundling: just pick a tag, and you'll get all the sources right.

Drawbacks:

Merging: ...it's sometimes already tough with a single project. With different teams working on shared branches, be ready to brace for impact.
Dangerous "oops" factor: if one employee messes up the repository by making some mistake, it might impact all projects & teams.

It's up to you to know if the price is worth the benefit.

EDIT:

It would work like this:

Feature X must be implemented
Create branch feature_x
All developers involved work on this branch in parallel, probably in dedicated directories related to their project/lib
Once it's over, review it, test it, package it, whatever
Merge it back in the master ...and this may be the tough part since in the meantime feature_y and feature_z may have been added too. It becomes a "cross-team" merge. This is why it is a serious drawback.

Just for the record: I think this is in most cases a bad idea and should be done cautiously, because the merge cost is usually higher than the cost of dependency management / proper feature tracking.

score 4 · Answer 2 · answered Jun 10 '15 at 18:01

The solution you are looking for is a dependency management tool in coordination with git submodules

Tools such as:

Maven
Ant
Composer

You can use those tools to define dependencies of a project.

You can require a submodule to be newer than a given version (> 2.x.x), denote a range of compatible versions (= 2.2.*), or require it to be older than a particular version (< 2.2.3).

Whenever you release a new version of one of the packages, you can tag it with the version number. That way you can pull that specific version of the code into all other projects.
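To make the idea of version constraints concrete, here is a minimal sketch in Python. The constraint syntax is a simplified illustration modelled on the examples above, not the actual syntax of Maven or Composer, and the helper names (`parse_version`, `satisfies`, `pick_release`) are made up for this example. It assumes release tags are plain dotted numbers with the same number of components.

```python
import re

def parse_version(text):
    """Turn a dotted version string such as '2.2.3' into the tuple (2, 2, 3)."""
    return tuple(int(part) for part in text.split("."))

def satisfies(version, constraint):
    """Check one version string against one constraint string.

    Supported (illustrative) forms: '>2.0.0', '>=2.0.0', '<2.2.3',
    '<=2.2.3', '=2.2.3' and the wildcard form '=2.2.*'.
    """
    match = re.fullmatch(r"\s*(>=|<=|>|<|=)\s*([0-9.*]+)\s*", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, wanted = match.groups()
    have = parse_version(version)
    if wanted.endswith(".*"):
        if op != "=":
            raise ValueError("wildcards are only supported with '='")
        prefix = parse_version(wanted[:-2])
        # '=2.2.*' matches every version whose leading components are 2.2
        return have[:len(prefix)] == prefix
    want = parse_version(wanted)
    return {
        ">": have > want,
        ">=": have >= want,
        "<": have < want,
        "<=": have <= want,
        "=": have == want,
    }[op]

def pick_release(tags, constraints):
    """Return the highest tagged version that meets every constraint, or None."""
    candidates = [t for t in tags if all(satisfies(t, c) for c in constraints)]
    return max(candidates, key=parse_version, default=None)

# Hypothetical release tags of one library
tags = ["2.1.0", "2.2.0", "2.2.2", "2.2.3", "3.0.0"]
print(pick_release(tags, [">2.0.0", "<2.2.3"]))  # 2.2.2
print(pick_release(tags, ["=2.2.*"]))            # 2.2.3
```

A real dependency manager does the same kind of resolution across the whole dependency graph, so each project can keep its own constraints and move to a new library release when it is ready.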

coredump · Answer 3 · 2015-06-19T14:31:49.023

Submodules

You should give git submodules a try, as suggested in one comment.

When project P1 refers to the three submodules L1, L2 and L3, it actually stores a reference to particular commits in all three repositories: those are the working versions of each library for that project.

So multiple projects can work with multiple submodules: P1 might refer to the old version of library L1 while project P2 uses the new version.

What happens when you deliver a new version of L3?

implement new interface in L3
commit, test, make pull request, review, merge, ... (you cannot avoid this)
ensure L2 works with L3, commit, ...
ensure L1 works with new L2, ...
ensure P1 works with the new versions of all libraries:
- inside P1's local working copy of L1, L2 and L3, fetch the changes you are interested in.
- git add L1 L2 L3, then commit, to record the new references to the submodules
- pull request for P1, test, review, merge ...
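The order of the steps above (L3, then L2, then L1, then P1) follows directly from the dependency graph in the question. As a rough sketch, it can be computed with a topological sort. The function name `update_order` is made up for this example, and the dictionary is simply the question's graph written down by hand.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each module maps to the modules it depends on (the graph from the question).
DEPENDENCIES = {
    "P1": {"L1", "L2", "L3"},
    "L1": {"L2", "L3"},
    "L2": {"L3"},
    "L3": set(),
}

def update_order(dependencies, changed):
    """List `changed` and everything that depends on it, dependencies first."""
    affected = {changed}
    grew = True
    while grew:  # keep adding modules that depend on an already affected module
        grew = False
        for module, deps in dependencies.items():
            if module not in affected and deps & affected:
                affected.add(module)
                grew = True
    # static_order() yields every module after all of its dependencies
    full_order = TopologicalSorter(dependencies).static_order()
    return [module for module in full_order if module in affected]

print(update_order(DEPENDENCIES, "L3"))  # ['L3', 'L2', 'L1', 'P1']
print(update_order(DEPENDENCIES, "L1"))  # ['L1', 'P1']
```

Something like this could drive a script that walks the repositories in the right order, but the review and testing at each step still need people.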

Methodology

This is a tedious, error-prone, and very long process to implement this feature. It requires two independent reviews (which makes it much harder to review), does not scale at all, and is likely to put us out of business because we get so bogged down in process we never get anything done.

Yes, it requires independent reviews, because you change:

the library
libraries that depend on it
projects that depend on multiple libraries

Would you be put out of business because you deliver crap? (Maybe not, actually). If yes, then you need to perform tests and review changes.

With appropriate git tools (even gitk), you can easily see which versions of the libraries each project uses, and you can update them independently according to your needs. Submodules are perfect for your situation and won't slow your process down.

Maybe you can find a way to automate part of this process, but most of the steps above require human brains. The most effective way to cut time would be to ensure your libraries and projects are easy to evolve. If your codebase can handle new requirements gracefully, then code reviews will be simpler and take little of your time.

(Edit) Another thing that might help you is to group related code reviews. You commit all changes and wait until you have propagated those changes down to all the libraries and projects that use them before submitting pull requests (or before you take care of them). You end up doing a bigger review for the whole dependency chain. Maybe this can help you save time if each local change is small.

score 0 · Answer 4 · answered Jun 19 '15 at 16:44

So what I understand is that for P1 you want to change the L3 interface, but you don't want P2 and P3, which also depend on the L3 interface, to have to change right away. This is a typical case of backward compatibility. There is a nice article on this: Preserving Backward Compatibility.

There are several ways you can solve this:

Create a new interface each time, which can extend the old interface.

OR

If you want to retire an old interface after some time, you can keep several versions of the interface and remove the older ones once all dependent projects have moved.
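As a small sketch of the first option, here is how a new interface can extend the old one so that existing callers keep working. All names (`SensorV1`, `SensorV2`, `VendorXSensor` and the two project functions) are invented for illustration; the question does not show its real interfaces.

```python
from abc import ABC, abstractmethod

class SensorV1(ABC):
    """Original interface; existing projects keep coding against this."""

    @abstractmethod
    def read(self) -> float:
        ...

class SensorV2(SensorV1):
    """Extended interface: adds a capability without changing read()."""

    @abstractmethod
    def read_with_unit(self) -> tuple:
        ...

class VendorXSensor(SensorV2):
    """Hypothetical driver for new hardware, implementing both interfaces."""

    def read(self) -> float:
        return 21.5

    def read_with_unit(self) -> tuple:
        return (self.read(), "C")

def old_project_code(sensor: SensorV1) -> float:
    # Written against V1 only; it never needs to know V2 exists.
    return sensor.read()

def new_project_code(sensor: SensorV2) -> str:
    value, unit = sensor.read_with_unit()
    return f"{value} {unit}"

sensor = VendorXSensor()
print(old_project_code(sensor))  # 21.5
print(new_project_code(sensor))  # 21.5 C
```

With this shape, P2 and P3 can stay on the old interface until they are ready to move, at the price of carrying both interfaces in L3 for a while.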

soru · Answer 5 · 2015-06-19T21:07:36.387

If I am getting your problem right:

you have 4 inter-related modules, P1 and L1 to L3
you need to make a change to P1 which ultimately will affect L1 to L3
it counts as a process failure if you have to change all 4 together
it counts as a process failure if you have to change them all 1 by 1.
it counts as a process failure if you have to identify in advance the chunks in which changes have to be made.

So the goal is you can do P1 and L1 in one go, and then a month later do L2 and L3 in another.

In the Java world, this is trivial, and perhaps the default way to work:

everything goes in one repository with no relevant use of branching
modules are compiled + linked together by maven based on version numbers, not the fact that they are all in the same directory tree.

So you can have the code on your local disk for L3 that wouldn't compile if it was compiling against the copy of the P1 in the other directory on your disk; luckily it isn't doing so. Java can straightforwardly do this because compiling/linking takes place against compiled jar files, not source code.

I'm not aware of a pre-existing widely-used solution to this problem for the C/C++ world, and I'd imagine you hardly want to switch languages. But something could easily be hacked together with make files that did the equivalent thing:

installed libraries + headers to known directories with embedded version numbers
changed compiler paths per module to the directory for the appropriate version numbers
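A rough sketch of what such a setup could generate, written in Python rather than make for brevity. The install prefix `/opt/libs`, the `<library>-<version>` directory naming, the version numbers and the `PINS` table are all assumptions made up for this example; only `-I`, `-L` and `-l` are the usual compiler and linker flags.

```python
from pathlib import PurePosixPath

# Hypothetical prefix where every library release is installed side by side,
# e.g. /opt/libs/L3-3.1.0/include and /opt/libs/L3-3.1.0/lib
INSTALL_ROOT = PurePosixPath("/opt/libs")

# Hypothetical pins: which library versions each module builds against.
PINS = {
    "P1": {"L1": "1.4.0", "L2": "2.0.1", "L3": "3.1.0"},
    "P2": {"L1": "1.3.2", "L2": "2.0.1", "L3": "3.0.0"},
}

def compiler_flags(module):
    """Build the include and link flags for one module from its pinned versions."""
    flags = []
    for lib, version in sorted(PINS[module].items()):
        base = INSTALL_ROOT / f"{lib}-{version}"
        flags += [f"-I{base / 'include'}", f"-L{base / 'lib'}", f"-l{lib}"]
    return flags

print(" ".join(compiler_flags("P1")))
```

Because each module names the versions it compiles against, P1 and P2 can build against different releases of L3 on the same machine, which is the equivalent of what maven does with versioned jar files.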

You could even use the C/C++ support in maven, although most C developers would look at you strangely if you did...

score -1 · Answer 6 · answered Jun 18 '15 at 12:48

There is a simple solution: cut release branches across the whole repository, and merge all fixes to all actively shipped releases (it is easy in ClearCase and should be possible in git).

All alternatives will create a horrible mess over time and with project growth.
