21

I have a number of functions that are pretty close to the mathematical definition of a function. For example, a simplified version of one of these functions may look like:

int foo(int a, int b) {
    return a * b;
}

I'd like to unit test them, but I'm not sure of the best way to approach those tests, as these functions have very large ranges of parameter values. Every approach I've come up with feels lacking:

  • If I test with a discrete pre-defined subset of possible values, I fear the set I choose may not be representative of the whole, leading to a potentially endless cycle of adding new values to the subset every time an unexpected value is found. This seems to defeat the point of having a unit test in the first place.
  • If I test with a discrete subset of possible values that's randomly determined, I inherit all of the problems of picking the subset manually as well as the real chance that tests will fail randomly every run.
  • If I test with every possible value (e.g. 0 to 2^64), the tests will take forever.

My final thought on how to approach the problem was to avoid testing the implementation of the function entirely, either by:

  • Only testing that the function accepts two integers and returns an integer, or
  • Not testing the function at all

But if I'm not testing the functions at all, whether in practice or in principle, I'm not sure how I can guarantee their correctness in the case of an eventual refactor.

What is the best way to test functions that have completely deterministic output, but can operate on a large range of possible inputs?

Evan
  • 1,265

5 Answers

19

Create one or two typical test cases, and then focus your remaining testing on boundary conditions.

For example, how does your function behave with zeroes? with ones? with int.MaxValue? What are the largest positive or negative numbers that can be used in the function before it overflows or underflows? What happens when an overflow or underflow occurs?

And so forth.
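To make that concrete, here is a minimal sketch of such boundary tests for the foo from the question, using plain C asserts (the framework choice and the particular non-overflowing extremes are my assumptions):

#include <assert.h>
#include <limits.h>

int foo(int a, int b) {
    return a * b;
}

int main(void) {
    assert(foo(3, 4) == 12);              /* typical case */
    assert(foo(0, 12345) == 0);           /* zeroes */
    assert(foo(1, -7) == -7);             /* ones */
    assert(foo(INT_MAX, 1) == INT_MAX);   /* largest value, no overflow */
    assert(foo(INT_MIN, 1) == INT_MIN);   /* smallest value, no overflow */
    assert(foo(-1, INT_MAX) == -INT_MAX); /* sign flip at the extreme */
    /* foo(INT_MAX, 2) would overflow; signed overflow is undefined
       behaviour in C, so decide what should happen there before
       writing an assertion for it. */
    return 0;
}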

Robert Harvey
  • 200,592
13

It's very important to have reproducible failing or succeeding tests, so forget about any purely random input approach. Say you test the sin(x) function: you know its result always lies in the range [-1, 1], so you test 100 random numbers and assert exactly that. Your assertion is mathematically right, but what if some quick-and-dirty sin(x) implementation occasionally produces 1.00000000000001? Then your test fails roughly every thousandth run.

Try to find good input parameters:

  • values producing nice round results
  • inflection points
  • relative and absolute borders

Parametrized functions sometimes keep those properties when only a single parameter changes (for example, a zero crossing that survives a change of the scaling parameter), so also test some odd parameter values.

And reduce boilerplate code with unit-test data providers, if your framework has them.
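Where the framework has no data providers, a table-driven test gives the same effect: each case is one row, so adding a new input is one line. A minimal sketch in C, assuming the foo from the question:

#include <assert.h>

int foo(int a, int b) {
    return a * b;
}

struct test_case { int a; int b; int expected; };

int main(void) {
    struct test_case cases[] = {
        {  0, 5,  0 },  /* absolute border: zero */
        {  1, 9,  9 },  /* identity */
        { -3, 3, -9 },  /* sign change */
        {  7, 6, 42 },  /* nice round result */
    };
    int n = sizeof cases / sizeof cases[0];
    for (int i = 0; i < n; i++)
        assert(foo(cases[i].a, cases[i].b) == cases[i].expected);
    return 0;
}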

Aitch
  • 729
12

Mathematical functions are especially amenable to property-based testing.

With property-based testing, you declaratively define high-level invariants that your API must satisfy. The test framework generates a large number of test cases and tries to find a counterexample to your invariants.

Let's write some properties for your example - a function * which multiplies two numbers. Property-based testing originated in the functional programming community, so I'm going to use QuickCheck (a Haskell-based tool), but similar test frameworks are now available for most imperative languages.

We define properties by writing functions which take arbitrary values as arguments. For each of these functions, QuickCheck will randomly generate 100 combinations of arguments and check that the property holds for each of them.

prop_commutative x y = (x * y) == (y * x)

prop_associative x y z = (x * y) * z == x * (y * z)

prop_distributive x y z = x * (y + z) == (x * y) + (x * z)

prop_identity x = (1 * x) == x

prop_zero x = (0 * x) == 0

Our tests look like mathematical definitions! They say "For all x and y, x * y equals y * x" and so on. If, say, you were developing a set of trigonometric functions, you could write a similar set of properties testing things like the double-angle formulae. Property-based testing really comes into its own for mathematical programming.

The output of running these tests looks like this:

=== prop_commutative ===
+++ OK, passed 100 tests.

=== prop_associative ===
+++ OK, passed 100 tests.

=== prop_distributive ===
+++ OK, passed 100 tests.

=== prop_identity ===
+++ OK, passed 100 tests.

=== prop_zero ===
+++ OK, passed 100 tests.

When a property is untrue, QuickCheck finds a counter-example. This property doesn't hold for numbers less than 1:

prop_multiplyingMakesItBigger x y = (x * y) > x

QuickCheck spots our mistake and prints out the values of x and y which disproved our property:

=== prop_multiplyingMakesItBigger ===
*** Failed! Falsifiable (after 1 test):  
0
0

QuickCheck tests your properties with values you hadn't anticipated, which helps you find bugs.

You can also use property-based testing to test against a model. If you have a naïve implementation of a function which you know is correct (but perhaps is not fast enough), you can use it as a model for regression testing of an improved version. Here's an example where we check that * is equivalent to repeated adding:

prop_repeatedAddingModel (NonNegative (Small x)) y = (x * y) == (foldr (+) 0 (replicate x y))

Here I'm also demonstrating how to use QuickCheck to constrain test values. Our model doesn't work for negative x, and is prohibitively slow for large x, so we tell QuickCheck that we want x to be small and non-negative.

Real World Haskell has a great chapter on QuickCheck, with lots of examples.

2

Perfect is definitely the enemy of good when it comes to testing. Testing is rarely easy. You have to really understand what your function/formula is supposed to do. Even something as simple as multiplying two numbers has several properties we can check.

You may find that some of them hold up only if you exclude certain edge cases.

// False assumption: the product is always greater than the sum
foo(a, b) > a + b
* fails on (2, 2), where both sides are 4
// Updated assumption (still only holds for a, b >= 2)
foo(a, b) >= a + b

// A negative times a negative is a positive
isneg(a) and isneg(b) implies ispos(foo(a, b))

// Test for zero
foo(a, 0) = 0
// Test for one
foo(a, 1) = a
// Commutative property
foo(a, b) = foo(b, a)

// Associative property
foo(foo(a, b), c) = foo(a, foo(b, c))

Being able to do this means you don't have to test every single parameter combination.
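Translated into executable checks, those properties might look like the following sketch in C, looping over a small sample grid instead of every combination (the sample values are my own choice, kept small to avoid overflow):

#include <assert.h>

int foo(int a, int b) {
    return a * b;
}

int main(void) {
    int samples[] = { -7, -2, -1, 0, 1, 2, 7 };
    int n = sizeof samples / sizeof samples[0];
    for (int i = 0; i < n; i++) {
        int a = samples[i];
        assert(foo(a, 0) == 0);  /* zero */
        assert(foo(a, 1) == a);  /* one */
        for (int j = 0; j < n; j++) {
            int b = samples[j];
            assert(foo(a, b) == foo(b, a));  /* commutative */
            if (a < 0 && b < 0)
                assert(foo(a, b) > 0);       /* negative times negative */
            for (int k = 0; k < n; k++) {
                int c = samples[k];
                /* associative */
                assert(foo(foo(a, b), c) == foo(a, foo(b, c)));
            }
        }
    }
    return 0;
}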

These kinds of things get caught manually in the business world all the time. Anyone with enough domain knowledge can help with making these assumptions. I may not know the quarterly report figures, but it's a safe bet they will never exceed the year-to-date figures.

JeffO
  • 36,956
2

You mention the dilemma of testing a fixed set of cases (which might just happen to "miss" the values which your function fails on) or testing many randomly generated cases (which can pass one time and fail the next, and is slow to boot).

Both of these techniques are useful, and they are not mutually exclusive. It can be useful to have one (fast) test suite that uses a fixed set of cases, and another (slow) suite that generates many, many random cases and checks 1) whether the outputs obey certain invariants and 2) whether any input triggers a failed assertion or an uncaught exception.

Run the fast suite on every commit. Run the slow (randomized, generative) suite as often as you want to, perhaps before each release. If the slow suite discovers an input which "breaks" your function, add that input as a new test case in the fast suite.

When the randomized suite starts, it should generate a new random seed and print it to the console. It should also provide a way to manually set the seed, for cases where you want to repeat a previous run exactly.
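A minimal sketch of that seed handling in C (the invariant being checked and the value ranges are just placeholders):

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int foo(int a, int b) {
    return a * b;
}

int main(int argc, char **argv) {
    /* Take the seed from the command line if given; otherwise pick
       one and print it so a failing run can be replayed exactly. */
    unsigned seed = (argc > 1) ? (unsigned)strtoul(argv[1], NULL, 10)
                               : (unsigned)time(NULL);
    printf("random seed: %u\n", seed);
    srand(seed);

    for (int i = 0; i < 100000; i++) {
        int a = rand() % 1000 - 500;  /* small values: no overflow */
        int b = rand() % 1000 - 500;
        assert(foo(a, b) == foo(b, a));  /* example invariant */
    }
    return 0;
}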

Alex D
  • 1,308