12

My employer runs a monthly unit-testing-day competition. One entire day is dedicated to writing unit tests -- obviously we do more testing throughout the month, but this is an entire day dedicated to it -- and the "winner" of the competition is given a prize. However, we are finding it hard to determine who the winner is.

We were assigning points for each test case. So if you wrote a unit test like this...

for (int i = 0; i < 100; i++) {
  assertEquals(i * i, square(i));
}

you would be given 100 points. Obviously this is a simplistic example but it demonstrates the problems with assigning "points" to each test case.

We're primarily a Java & JavaScript shop, so I suggested counting the number of code branches tested as a metric. For the Java code we can easily count the branches tested via a code coverage tool (such as EclEmma). However, I'm not sure how we would do this for our Selenium tests, or how to get code coverage on the JavaScript sources (any ideas?).

Does anyone have any suggestions on how we could better determine the winner of this competition?

Edit

I know how to write unit tests, and I know how to write effective unit tests; I don't need help determining what to test. I have no control over this competition -- it will go on regardless. So I can either add some input to make it better, or carry on gaming the tests (yes, I game them. Of course I game them. There are prizes to be won).

Edit

This question here is obviously not a duplicate: though it contains useful information about how to find good test cases, it does not provide any useful metrics for evaluating the competition.

Shaun
  • 249

3 Answers

15

Does anyone have any suggestions on how we could better determine the winner of this competition?

The only approach that makes sense to me is voting: every dev assigns points to every other dev's tests (never their own) -- say, 3 points for the test they consider the most effective, 2 points for the second, and 1 for the third. The test with the most points wins. The results may be even better if the points are assigned without knowing beforehand who wrote each test.
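The tallying itself is simple. Here is a minimal sketch, assuming votes are collected as (voter, ranked picks) pairs and that each voter's own tests have already been excluded; the class, method, and developer names are all made up for illustration:

```java
import java.util.*;

public class TestVoteTally {
    // Borda-style points for a voter's 1st, 2nd and 3rd choice.
    static final int[] POINTS = {3, 2, 1};

    // votes: voter -> that voter's top-three picks, best first.
    // A voter's own tests are assumed to be excluded before tallying.
    static Map<String, Integer> tally(Map<String, List<String>> votes) {
        Map<String, Integer> scores = new TreeMap<>();
        for (List<String> picks : votes.values()) {
            for (int rank = 0; rank < picks.size() && rank < POINTS.length; rank++) {
                scores.merge(picks.get(rank), POINTS[rank], Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, List<String>> votes = new LinkedHashMap<>();
        votes.put("alice", List.of("bobTest", "carolTest", "daveTest"));
        votes.put("bob", List.of("carolTest", "aliceTest", "daveTest"));
        votes.put("carol", List.of("bobTest", "aliceTest", "daveTest"));
        // prints {aliceTest=4, bobTest=6, carolTest=5, daveTest=3}
        System.out.println(tally(votes));
    }
}
```

Anonymizing the ballots then only requires mapping test names to numbers before the vote and back afterwards.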

As a bonus, you will get all your tests peer reviewed.

Doc Brown
  • 218,378
6

So if you wrote a unit test like this...

for (int i = 0; i < 100; i++) {
  assertEquals(i * i, square(i));
}

you would be given 100 points.

I would give this person 0 points (even if the test were checking something actually relevant), because assertions within a loop make little sense: the first failing iteration aborts the whole test and hides the remaining cases, and tests with multiple asserts (especially in the form of a loop or a map) are difficult to work with.
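In JUnit 5 the idiomatic fix would be a @ParameterizedTest, which reports every input as its own case. The same idea can be shown in a dependency-free sketch -- the square() method here is a hypothetical stand-in for the code under test:

```java
// Instead of asserting inside a loop, each input becomes its own labelled
// check, so one failure neither hides the failing value nor stops the rest.
public class SquareChecks {
    // Hypothetical implementation under test.
    static int square(int x) {
        return x * x;
    }

    // Runs one check and reports it individually, like one test case would.
    static boolean checkSquare(int input) {
        boolean ok = square(input) == input * input;
        System.out.println("square(" + input + "): " + (ok ? "PASS" : "FAIL"));
        return ok;
    }

    public static void main(String[] args) {
        boolean allOk = true;
        for (int input : new int[] {0, 1, -3, 7, 100}) {
            // The loop only drives independent, labelled checks; it does not
            // chain assertions that would abort on the first failure.
            allOk &= checkSquare(input);
        }
        System.out.println(allOk ? "all checks passed" : "some checks failed");
    }
}
```

However the cases are run, the point is the same: one failing input should produce one precise failure report, not silence about the other 99.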

The underlying problem is to find a metric which cannot [easily] be gamed. A metric based exclusively on the number of asserts is exactly the same as paying developers per LOC written. Just as pay-by-LOC leads to huge, impossible-to-maintain code, your current company policy leads to useless and possibly badly written tests.

If the number of asserts is irrelevant, the number of tests is irrelevant as well. The same goes for many other metrics (including combined ones) one could imagine for this sort of situation.

Ideally, you would apply a systemic approach. In practice, this can hardly work in most software development companies, so I can suggest a few alternatives:

  1. Use peer reviews for tests, and track something similar to the "WTFs per minute" metric.

  2. Measure the impact of those tests over time on the number of bugs. This has several benefits:

    • Seems fair,
    • Can actually be measured if you collect enough data about bug reports and their fate,
    • Is actually worth it!
  3. Use branch coverage, but combine it with other metrics (as well as a review). Branch coverage has its benefits, but testing CRUD code just to get a better grade is not the best way to spend developers' time.

  4. Decide all together what metrics you want to enforce for the moment (such decisions may not be welcome, or even possible, in some companies and teams). Review and change the metrics often, picking the ones which have become more relevant, and make sure everyone clearly understands what is measured and how.
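To make the branch-coverage point (3) concrete: the hypothetical method below has two branches, so a single happy-path test would report at most 50% branch coverage under a tool like EclEmma, even though it executes most of the lines:

```java
public class Discounts {
    // Two branches: the if taken, and the if not taken.
    static double price(double base, boolean loyalCustomer) {
        if (loyalCustomer) {
            return base * 0.9;  // 10% loyalty discount
        }
        return base;
    }

    public static void main(String[] args) {
        // Exercising only one argument value leaves the other branch
        // untested; both calls together fully cover price()'s branches.
        System.out.println(price(100.0, true));   // 90.0
        System.out.println(price(100.0, false));  // 100.0
    }
}
```

Branch coverage rewards the second call here, which is fair -- but it rewards trivially covering a plain getter just as much, which is why it should not be the only metric.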

5

I suppose your employer organizes this unit testing day to give you incentives to find bugs, to achieve greater code coverage, and to end up with more tests, which remain useful forever after.

So, I would think it would make sense that the winner should be the developer who finds the most bugs, or the developer whose tests achieve the greatest increase in code coverage.

A test earns you a point if it causes a new entry to be opened in your issue / bug / defect tracking system. If an entry is already open for that issue, it does not count. Also, as suggested in the comments, bugs in your own code do not count; only bugs in other people's code should. Unfortunately, this approach does not deliver instant gratification, because it may take a few days until all the failing tests are sifted through and the corresponding issues are opened. It also may not always work: as your system matures, it may become exceedingly rare to discover bugs by adding tests.

The increase in code coverage might provide a more objective measurement of the improvement brought by the new tests. First, the total code coverage has to be recorded the day before the competition. Then each developer needs to somehow show the increase in code coverage that results from their tests alone, without taking into account the increase resulting from tests written by other developers. This means you will probably need a referee who goes to each developer's machine and records the new code coverage before anyone's tests have been committed.
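Once the referee has a baseline and one measurement per developer, the scoring itself is simple arithmetic. A sketch with entirely made-up names and coverage numbers:

```java
import java.util.*;

public class CoverageDelta {
    // Percentage-point increase over the pre-competition baseline.
    static double increase(double baselinePct, double withDevTestsPct) {
        return withDevTestsPct - baselinePct;
    }

    public static void main(String[] args) {
        double baseline = 61.4;  // recorded the day before (made-up number)

        // Coverage measured on each developer's machine, their tests only.
        Map<String, Double> measured = new LinkedHashMap<>();
        measured.put("shaun", 64.9);
        measured.put("doc", 63.1);
        measured.put("mike", 66.0);

        // Rank developers by percentage-point increase, best first.
        measured.entrySet().stream()
                .sorted((a, b) -> Double.compare(
                        increase(baseline, b.getValue()),
                        increase(baseline, a.getValue())))
                .forEach(e -> System.out.printf("%s: +%.1f pp%n",
                        e.getKey(), increase(baseline, e.getValue())));
    }
}
```

Using percentage points rather than raw line counts keeps the score comparable even if developers work in modules of very different sizes.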

Incidentally, taking code coverage into consideration provides fair reward to people who write real tests, instead of doing silly things like the example that you provided in the question.

Mike Nakis
  • 32,803