48

Programming languages like Scheme (R5RS) and Python (see this Question) round towards the nearest even integer when a value is exactly halfway between the surrounding integers.

What is the reasoning behind this?
Is there a mathematical idea that makes following calculations easier to reason about?

(R5RS references the IEEE floating point standard as the source of this behaviour.)

2 Answers

57

It's called banker's rounding. The idea is to minimize the cumulative error from many rounding operations.

Let's say you always rounded .5 down. Think of all those little interest payments, the bank pocketing half a cent each time...

Let's say you always rounded .5 up. Accounting is going to scream because you're paying out more interest than you should.
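To put numbers on that, here is a minimal sketch using Python's standard `decimal` module. The `payments` list and the `drift` helper are made up for illustration: 1,000 hypothetical amounts that each land exactly on a half cent. It compares the total after rounding with the exact total under each tie rule.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_HALF_EVEN

CENT = Decimal("0.01")

# Hypothetical payments: 0.005, 0.015, 0.025, ..., 9.995 (all exact half cents).
payments = [Decimal("0.005") + CENT * k for k in range(1000)]


def drift(mode):
    """Total after rounding each payment to the cent, minus the exact total."""
    rounded_total = sum(p.quantize(CENT, rounding=mode) for p in payments)
    return rounded_total - sum(payments)


print("half up:  ", drift(ROUND_HALF_UP))    # every payment gains half a cent
print("half down:", drift(ROUND_HALF_DOWN))  # every payment loses half a cent
print("half even:", drift(ROUND_HALF_EVEN))  # gains and losses cancel out

# Python 3's built-in round() also sends ties to the even neighbour.
print(round(0.5), round(1.5), round(2.5))
```

With this deliberately worst-case input, always rounding up or always rounding down drifts by half a cent per payment, while rounding ties to even leaves the total unchanged.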

41

A while ago I constructed a test program for successive rounding, because it's basically a worst-case stress test for a rounding algorithm.

For each number from 0 to 9,999 it first rounds to the nearest 10, then to the nearest 100, then to the nearest 1000. (You could also think of this as 10,000 points in [0,1) being rounded to 3 places, then to 2, then to 1.) This set of numbers has a mean value of 4999.5.
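The test described above might be reproduced with a sketch like the following. This is not the original test program; the function names and the `tie` parameter are assumptions made for illustration. It rounds every integer from 0 to 9,999 to the nearest 10, then 100, then 1000, and reports the mean and histogram for four tie-breaking rules.

```python
from collections import Counter


def round_to_multiple(n, step, tie):
    """Round a non-negative integer n to a multiple of step.

    `tie` chooses what happens when n is exactly halfway between two
    multiples: "up", "down", "even" or "odd".
    """
    q, r = divmod(n, step)
    if 2 * r < step:
        return q * step
    if 2 * r > step:
        return (q + 1) * step
    # Exact tie: the remainder is half of step.
    if tie == "up":
        return (q + 1) * step
    if tie == "down":
        return q * step
    if tie == "even":
        return (q + q % 2) * step        # move up only if q is odd
    if tie == "odd":
        return (q + 1 - q % 2) * step    # move up only if q is even
    raise ValueError(f"unknown tie rule: {tie}")


def successive(n, tie):
    """Round to the nearest 10, then 100, then 1000."""
    for step in (10, 100, 1000):
        n = round_to_multiple(n, step, tie)
    return n


def histogram(tie):
    return Counter(successive(n, tie) for n in range(10_000))


def mean(tie):
    hist = histogram(tie)
    return sum(value * count for value, count in hist.items()) / 10_000


for tie in ("up", "down", "odd", "even"):
    print(tie, mean(tie), sorted(histogram(tie).items()))
```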

If all three roundings are done using the method "round half up", then the results are as follows (first column is the rounding result, second column is how many numbers rounded to that result — i.e. it's a histogram).

0     445
1000  1000
2000  1000
3000  1000
4000  1000
5000  1000
6000  1000
7000  1000
8000  1000
9000  1000
10000 555

The result differs from a single "round half up" to the nearest thousand 550 times out of 10,000 and the average rounded value is 5055 (higher than the original average by 55.5).

If all three roundings are done by "round half down", then the results are:

0     556
1000  1000
2000  1000
3000  1000
4000  1000
5000  1000
6000  1000
7000  1000
8000  1000
9000  1000
10000 444

The result differs from a single "round half down" to the nearest thousand 550 times out of 10,000 and the average rounded value is 4944 (too low by 55.5).

If all three roundings are done using "round half odd", the result is:

0     445
1000  1111
2000  889
3000  1111
4000  889
5000  1111
6000  889
7000  1111
8000  889
9000  1111
10000 444

The result differs from a single "round half odd" to the nearest thousand 550 times out of 10,000 and the average rounded value is 4999.5 (correct).

Finally, if all three roundings are done using "round half even", the results are:

0     546
1000  909
2000  1091
3000  909
4000  1091
5000  909
6000  1091
7000  909
8000  1091
9000  909
10000 545

The result differs from a single "round half even" to the nearest thousand 450 times out of 10,000 and the average rounded value is 4999.5 (correct).

I think it's obvious that round half up and round half down bias the rounded values, so that the average of the rounded values no longer has the same expectation as the average of the original values. "Round half even" and "round half odd" remove that bias by treating 5 one way half the time and the other way the other half. Successive rounding multiplies the bias.

Round half even and round half odd introduce their own kind of bias to the distribution: a bias towards even and odd digits, respectively. In both cases, again, this bias is multiplied by successive rounding, but it's worse for round half odd. I think that the explanation in this case is simple: 5 is an odd number, so round half odd has more results ending in 5 than round half even — and therefore, more results that will have to be handled specially by the next rounding.
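One way to check that explanation is to count how many values become exact ties for the second rounding (that is, end in 50) after the first rounding to the nearest 10. The sketch below does this for the same 0 to 9,999 range; the helper names are hypothetical and only the two unbiased rules are compared.

```python
def round_half(n, step, prefer_odd):
    """Round a non-negative integer n to a multiple of step.

    Ties go to the odd multiple if prefer_odd is True, else to the even one.
    """
    q, r = divmod(n, step)
    if 2 * r < step:
        return q * step
    if 2 * r > step:
        return (q + 1) * step
    q_is_odd = q % 2 == 1
    # On a tie, keep q if it already has the wanted parity, otherwise go up.
    return (q if q_is_odd == prefer_odd else q + 1) * step


def ties_after_first_pass(prefer_odd):
    """Count values that sit exactly halfway between hundreds after rounding to 10."""
    first_pass = (round_half(n, 10, prefer_odd) for n in range(10_000))
    return sum(1 for value in first_pass if value % 100 == 50)


print("round half odd: ", ties_after_first_pass(True))
print("round half even:", ties_after_first_pass(False))
```

Under round half odd, both 45 and 55 land on 50, so each value ending in 50 collects eleven inputs; under round half even they land on 40 and 60 instead, leaving nine.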

So anyway, of the four choices, only two are unbiased, and of the two unbiased choices, round half even gives the best-behaved distribution when subject to repeated rounding.

hobbs