How to keep the trunk stable when tests take a long time?

Question

We have three sets of test suites:

A "small" suite, taking only a couple of hours to run
A "medium" suite that takes multiple hours, usually run every night (nightly)
A "large" suite that takes a week+ to run

We also have a bunch of shorter test suites, but I'm not focusing on them here.

The current methodology is to run the small suite before each commit to the trunk. The medium suite then runs every night; if it turns out in the morning that it failed, we try to isolate which of yesterday's commits was to blame, roll back that commit, and rerun the tests. A similar process, at a weekly rather than nightly frequency, is used for the large suite.

Unfortunately, the medium suite does fail pretty frequently. That means that the trunk is often unstable, which is extremely annoying when you want to make modifications and test them. It's annoying because when I check out from the trunk, I cannot know for certain it's stable, and if a test fails I cannot know for certain if it's my fault or not.

My question is: is there some known methodology for handling these kinds of situations that will always leave the trunk in top shape? For example, "commit into a special precommit branch which will then periodically update the trunk every time the nightly passes".

And does it matter if it's a centralized source control system like SVN or a distributed one like git?

By the way, I am a junior developer with a limited ability to change things. I'm just trying to understand if there's a way to handle this pain I am experiencing.

Joris Timmermans · Answer 1 · 2012-11-20T11:07:00.057

I know you're trying to avoid this, but the real insight here is to realize that something is seriously wrong with your codebase: you need to run a full suite of tests that takes a week just to be sure your code is stable!

The most advantageous way to fix this problem is to start separating your code base and tests into (independent) sub-units.
There are huge advantages to this:

The tests for each of those units will run faster (there are simply fewer of them), and they will not break if something goes wrong in one of the independent or downstream units.
A failed test will be pinpointed to a particular unit which will make it much easier to find the source of the problem.
You can separate the VCS locations of the different units so that your "stable" branch can be a pick 'n mix of the latest successfully tested build of each unit, so that a broken unit or two doesn't destabilize your "stable" version.
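The "pick 'n mix" idea in the last point can be sketched in a few lines. This is only an illustration, assuming each unit's test results are available as simple records; the unit names, revision numbers, and the `BuildResult` type are hypothetical and not tied to any particular VCS or CI tool.

```python
from typing import Dict, List, NamedTuple


class BuildResult(NamedTuple):
    unit: str       # name of an independent sub-unit (hypothetical)
    revision: int   # revision of that unit's own repository
    passed: bool    # did the unit's own test suite pass at this revision?


def latest_green_revisions(results: List[BuildResult]) -> Dict[str, int]:
    """Return, per unit, the highest revision whose tests passed."""
    stable: Dict[str, int] = {}
    for r in results:
        if r.passed and r.revision > stable.get(r.unit, -1):
            stable[r.unit] = r.revision
    return stable


results = [
    BuildResult("parser", 101, True),
    BuildResult("parser", 102, False),   # broken tip: not picked up
    BuildResult("storage", 57, True),
    BuildResult("storage", 58, True),
]
print(latest_green_revisions(results))
# {'parser': 101, 'storage': 58}
```

A broken revision in one unit leaves the "stable" combination on that unit's previous good revision, while the other units keep moving forward.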

On the flip side, management of your VCS structure will get more complicated, but at a full week for your full test, I think you can take the pain!

I still recommend using a "stable" and "development" branch strategy in some form, but there are many ways to go about that, and you can pick the one that works best for your organization (meta-repositories with fixed revisions pointing to separate repositories for each unit, a stable branch and a dev branch, feature branches, ...).

Doc Brown · Answer 2 · 2012-11-20T13:50:27.380

IMHO this has nothing to do with the VCS you are using. Using an "under test" branch may be a solution, and it can be realized with either a centralized or a distributed VCS. But honestly, I think the best thing in your situation is to try to optimize the medium test suite (it seems to contain the most important tests) so that it runs much faster and you can use it for pre-commit-to-trunk tests, just like you do now with your "small" suite.

score 1 · Answer 3 · answered Nov 20 '12 at 09:55

For SVN, I don't know of anything like a "pre-commit". I think it's likely to produce commits and rollbacks whenever a test fails. As doc-brown says, the only way there is to commit to a temporary branch and merge it into trunk later on.

With a distributed system like git or mercurial, I think it would be possible, using a "testing" repository and a "stable" repository. You push to the testing repo, test it nightly, and if everything runs fine, you push from testing to stable. Otherwise, you roll back the testing repo. I'm a bit unsure what the version history would look like when you push from testing to stable, but I think it's possible to exclude the broken, rolled-back commits when doing so. A bit of experimenting first would be the safest.

An alternative would also be to test each person's local trunk nightly. Then, the people with passed tests are allowed to push it to the central server in the morning.

score 1 · Answer 4 · edited Nov 20 '12 at 16:03

The failing medium tests: Is it true that most of the time the same tests fail?

If there is a failure, is it always the same group of related tests that fails?

If true: Maybe you can selectively pick some medium tests that often fail (one test for every class of error) and execute them within the small set.
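Picking "one test for every class of error" could be done from the nightly failure history. A rough sketch, assuming you can export past failures as (test name, error class) pairs; the test names and error classes below are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def pick_canaries(failures: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """For each error class, return the test that failed most often.

    `failures` holds one (test_name, error_class) pair per observed
    nightly failure.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for test_name, error_class in failures:
        counts[error_class][test_name] += 1
    return {cls: c.most_common(1)[0][0] for cls, c in counts.items()}


history = [
    ("test_import_large_file", "timeout"),
    ("test_import_large_file", "timeout"),
    ("test_export_csv", "timeout"),
    ("test_schema_migration", "db"),
]
print(pick_canaries(history))
# {'timeout': 'test_import_large_file', 'db': 'test_schema_migration'}
```

The selected tests are the candidates to move into the small pre-commit suite.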

Are most of the tests integration tests that use a real database? If so, is it possible to replace them with unit tests that use a mocked database?
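As an illustration of the mocked-database idea, here is a minimal sketch using Python's standard `unittest.mock`. The function `count_active_users` and the `db.query` interface are hypothetical stand-ins for whatever your code actually calls.

```python
from unittest.mock import Mock


def count_active_users(db) -> int:
    """Code under test: only needs an object with a query() method."""
    rows = db.query("SELECT id FROM users WHERE active = 1")
    return len(rows)


def test_count_active_users():
    # The mock replaces the real database, so no connection is opened
    # and the test runs in milliseconds.
    fake_db = Mock()
    fake_db.query.return_value = [(1,), (2,), (3,)]

    assert count_active_users(fake_db) == 3
    fake_db.query.assert_called_once_with(
        "SELECT id FROM users WHERE active = 1"
    )


test_count_active_users()
```

Tests like this check the logic around the database, not the database itself, so a few real integration tests are still worth keeping in the slower suites.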

score 1 · Accepted Answer · answered Nov 20 '12 at 15:39

The only way to fix the root cause of the instability is to decouple the code so changes are more isolated, as other answers have suggested.

However, as an individual developer, if you want a more stable build for you personally to work on, that's relatively easy to solve. Instead of working off of the tip, you only pull the last build that passed the overnight test suite into your working tree. If you can create feature branches for each change, then branch off of the last stable build.

Yes, your tree will be a few days behind, but most of the time that doesn't matter. Do your work against the stable build, so you know your changes are the ones that broke any tests, then before you check in, update to the latest and do your normal integration. Then after you check in, back up to the last stable build again.
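Finding "the last build that passed" can be automated if the nightly results are recorded somewhere. A minimal sketch, assuming a log of (revision, passed) pairs; the revision numbers are invented.

```python
from typing import List, Optional, Tuple


def last_stable_revision(nightly_log: List[Tuple[int, bool]]) -> Optional[int]:
    """Return the newest revision that passed the nightly suite, if any."""
    passing = [rev for rev, passed in nightly_log if passed]
    return max(passing) if passing else None


nightly_log = [(1200, True), (1215, False), (1231, True), (1248, False)]
print(last_stable_revision(nightly_log))
# 1231
```

With SVN, the returned number could then be passed to `svn update -r`; with git, the equivalent would be checking out or branching from the corresponding commit.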

You still have to do the messy integration work, but what I like about this method is it isolates the integration work to a time more convenient for me, and gives me a stable code base for development when it isn't convenient. I have a much better idea when it's my changes that likely broke the build versus someone else's.

score 1 · Answer 6 · answered Nov 20 '12 at 16:09

You need to make your tests run faster; there is no other way to square this circle.

Consider the problem: you want to be sure that when you check out, you have working code. Sure, you can delay commits and do branching until before the release, but that will only delay the onset of the problem until integration. As in, will you have to run the week-long suite after every merge? Methodology is not the solution, the solution is purely technical.

Here is what I suggest:

1) Make the tests as atomic as possible, and maximize environment reuse.

2) Get a test-suite farm to run them. If rather than 8 big modules you end up with 50, you can spin up a bunch of Amazon EC2 spot instances and run the whole suite in parallel. I'm sure this will cost some money, but it will save huge amounts of developer time.
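To give an idea of how atomic tests could be spread over a farm, here is a sketch of a greedy "longest test first" assignment. It assumes you have rough per-test durations from earlier runs; the test names and minutes are hypothetical, and a real setup would also have to provision and drive the machines.

```python
import heapq
from typing import Dict, List


def shard_tests(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Assign each test, longest first, to the currently least-loaded worker."""
    if workers < 1:
        raise ValueError("need at least one worker")
    heap = [(0.0, i) for i in range(workers)]   # (total minutes, worker index)
    heapq.heapify(heap)
    shards: List[List[str]] = [[] for _ in range(workers)]
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, minutes in ordered:
        load, i = heapq.heappop(heap)
        shards[i].append(name)
        heapq.heappush(heap, (load + minutes, i))
    return shards


durations = {"t_a": 90, "t_b": 60, "t_c": 45, "t_d": 30, "t_e": 15}
shards = shard_tests(durations, workers=2)
print(shards)
# [['t_a', 't_d'], ['t_b', 't_c', 't_e']]
print(max(sum(durations[t] for t in shard) for shard in shards))
# 120
```

Run serially, these five tests take 240 minutes; split over two workers, the wall-clock time drops to 120. The same reasoning scales to more, smaller modules and more machines.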

score 0 · Answer 7 · answered Nov 20 '12 at 15:47

The key thing you are taking for granted in your question is that all commits must pass tests. While this is a nice rule to follow and it seems to make some sense, sometimes it's not practical. Your case is an example (although MadKeithV does make a point), and I can imagine that keeping a VCS branch so pristine could be difficult if there isn't sufficient cooperation among developers.

In reality what you want is to somehow know which commits pass or fail. A "pre-commit branch" as you suggested would work, but that might require extra effort from developers when they make commits, which might be hard to sell.

A similar approach that could be easier is to leave the trunk for people to break as they please, and have a branch for commits that aren't broken. An automated script could go through commits as they are made to the trunk, run the tests on them and add them to the branch if they pass.

Or you could be absurdly simplistic and have a script that lists the passing commits in a text file (which may or may not itself be version controlled).

Or have a batch system that accepts requests for branches/revisions to test (from anywhere in the tree), and tests them and commits them to the trunk (or another branch) if they pass.
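The automated-script idea above might look roughly like this. It is only a sketch: `run_tests` is a placeholder for whatever actually checks out a revision and runs the suite, and the revision numbers are invented. The list of passing revisions is what you would write to the text file or use to advance the "not broken" branch.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def track_passing(
    revisions: Iterable[int],
    run_tests: Callable[[int], bool],
) -> Tuple[List[int], Optional[int]]:
    """Test trunk revisions in order; return the passing ones and the newest of them."""
    passing: List[int] = []
    for rev in revisions:
        if run_tests(rev):
            passing.append(rev)
    stable_head = passing[-1] if passing else None
    return passing, stable_head


# Stand-in for a real test run: pretend revision 1003 broke the build.
broken = {1003}
passing, stable_head = track_passing(
    [1001, 1002, 1003, 1004],
    run_tests=lambda rev: rev not in broken,
)
print(passing)      # [1001, 1002, 1004]
print(stable_head)  # 1004
```

Developers who want a known-good starting point would then check out `stable_head` instead of the trunk tip.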
