70

Having worked on complex solutions that had Unit Tests and Integration Tests in the CI/CD pipeline, I recall having a tough time with tests that failed randomly (either because random values were injected or because of the async nature of the process being tested, which from time to time resulted in some weird race condition). Having this random behavior in the CI pipeline was not a good experience: we could never say for sure whether the change a developer committed was really causing the build issue.

I was recently introduced to AutoFixture, which helps with test creation by randomly generating values. Surprisingly, I was the only one who did not feel it was a great idea to introduce it into all the tests of our CI pipeline.

I understand fuzz testing, monkey testing, etc., but I believe this should be done outside the CI/CD pipeline, which is where I want to ensure my business requirements are being met by sturdy, solid, to-the-point tests. Non-deterministic tests like these (and load testing, black-box testing, penetration testing, etc.) should be run outside the build pipeline, or at least should not be directly linked to code changes.

If these side tests ever find unexpected behavior, a fix should be created and a new concrete, repeatable test case should be added to prevent a regression.

Am I missing something?

5 Answers

81

Yes, I agree that randomness shouldn't be part of a testing suite. What you want is to mock any real randomness, to create deterministic tests.

Even if you genuinely need bulk random data, more than you can be bothered generating by hand, you should generate it randomly once, and then use that (now set in stone) data as the "random" input for your tests. The source of the data may have been random but because the same data is reused for each run, the test is deterministic.
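
A minimal sketch of that idea in C# (the seed value and data shape are arbitrary, purely for illustration):

using System;
using System.Linq;

// Deterministic "random" data: a hard-coded seed means every run produces
// exactly the same sequence, so the test stays repeatable.
var rng = new Random(20240101);
var bulkInput = Enumerable.Range(0, 1000)
                          .Select(_ => rng.Next(0, 10_000))
                          .ToArray();

// ...feed bulkInput into the code under test and assert on the result...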

However, what I said so far applies to running the tests you knowingly wrote and want to run. Fuzz testing, and therefore AutoFixture, has a separate value: bug discovery. Here, randomization is actually desirable because it can help you find edge cases that you hadn't even anticipated.

This does not replace deterministic testing; it adds an additional layer to your test suite.

Some bugs are discovered through serendipity rather than by intentional design. AutoFixture can help cycle through a wide range of values for a given input, in order to find edge cases that you likely wouldn't have stumbled on with a limited set of hand-crafted test data.

If and when fuzz tests discover a bug, you should use that as the inspiration to write a new deterministic test to now account for this new edge case.

In short, think of fuzz tests as a dedicated QA engineer who comes up with the craziest inputs to stress test your code, instead of just testing with expected or sensible data.

Flater
  • 58,824
55

I've worked on projects that use anywhere from no randomness to extensive randomness in tests, and I'm generally in favour of it.

The most important thing to remember is that the randomness must be repeatable. In the current project we use pytest-randomly with a seed based on the pipeline run ID in CI, so it's trivial to repeat a failing run identically, even though each pipeline run is different. This may be a showstopper if you want to run tests in parallel, because I could not find a (pytest) framework which will split tests into parallel runs reproducibly.

The randomness is used in two ways:

First, tests are run in random order. This virtually guarantees that any test interdependencies will eventually be discovered. It avoids those situations where a test fails when running the whole test suite, but succeeds when run on its own. When that happens, it can take much longer to find and fix the actual issue: you're effectively debugging two things at once, you can't be sure whether the failure is caused by a bad test or bad production code, and each test run to check a potential fix can take a long time.

Second, we use generator functions for any inputs which are irrelevant to the test result. Basically we have a bunch of functions like any_file_contents (returns bytes), any_past_datetime (datetime.datetime), any_batch_job_status (enum member) and any_error_message (str), and we use them to provide any required input which should not affect the test results. This can surface some interesting issues with both tests and production code, such as inputs being relevant when you thought they weren't, data not being escaped, escaped in the wrong way, or double escaped, and even ten-year-old core library bugs showing up in third party libraries. It is also a useful signal to whoever reads the test, telling them exactly which inputs are relevant for the result.
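
Since the question's context is .NET, the same pattern might look roughly like this in C# (the Any helper class and the BatchJob type are hypothetical, mirroring the names above):

using System;
using System.Text;

static class Any
{
    // Shared generator; in CI the seed could come from the pipeline run ID
    // (a hypothetical environment variable) so failing runs stay reproducible.
    private static readonly Random Rng = new Random();

    public static byte[] FileContents() =>
        Encoding.UTF8.GetBytes(Guid.NewGuid().ToString());

    public static DateTime PastDateTime() =>
        DateTime.UtcNow.AddDays(-Rng.Next(1, 3650));

    public static string ErrorMessage() =>
        $"error-{Rng.Next(1000, 9999)}";
}

// Usage in a test, signalling that these inputs must not affect the outcome:
// var job = new BatchJob(Any.FileContents(), Any.PastDateTime());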

While this approach is not a replacement for fuzz testing, it is a much cheaper way to achieve what I would expect are similar results. It's not going to be sufficient if your software requires much more extensive fuzzing, such as a parsing library, but it should be a simple way to improve run-of-the-mill tests.

I don't have the numbers to back this up, but I believe this is the single most reliable test suite (in terms of false positives or negatives) I've worked on.

l0b0
  • 11,547
12

No. Random values in unit tests make them non-repeatable. As soon as a test passes on one run and fails on the next without any change, people lose confidence in the tests, undermining their value. Printing a reproduction script is not enough.

That said, randomized edge case testing and fuzz testing can provide value. They’re just not unit tests at that point. And personally, I like linking them to CI even if they don’t block a deployment or run on every commit.

Telastyn
  • 110,259
7

I would recommend covering "obvious" edge cases with explicit test data inputs, rather than hoping the fuzz testing will catch them. E.g. for a function that operates on arrays, handle empty arrays, single-entry arrays, and arrays with multiple (e.g. 5) items.
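
For example (a sketch using xUnit; ProcessItems is a placeholder for whatever function is under test):

using Xunit;

public class ProcessItemsTests
{
    [Theory]
    [InlineData(new int[] { })]                // empty array
    [InlineData(new int[] { 42 })]             // single entry
    [InlineData(new int[] { 1, 2, 3, 4, 5 })]  // several items
    public void HandlesObviousEdgeCases(int[] items)
    {
        var result = ProcessItems(items);      // stand-in for the code under test
        Assert.NotNull(result);
    }

    private static object ProcessItems(int[] items) => items;  // placeholder implementation
}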

This way, your fuzz tests are strictly additive to a baseline level of solid test coverage.

One way to help reduce the pain is to ensure that your CI logs contain enough information to fully reproduce a test case locally.

To quote the question: "Having this random behavior in the CI pipeline was not a good experience: we could never say for sure whether the change a developer committed was really causing the build issue."

Think of the flip side: if the fuzz testing weren't there, nothing else would have caught the bug, so you'd have a false green. Sure, it won't disturb your development/shipping experience, but it will disturb production instead.

Alexander
  • 5,185
0

To quote AutoFixture:

"...designed to minimize the 'Arrange' phase of your unit tests in order to maximize maintainability. Its primary goal is to allow developers to focus on what is being tested rather than how to setup the test scenario.."

So I can see why you wouldn't want a test such as:

var x = new Random().Next();       // random input value
var actual = SquareRoot(x);
Assert(actual * actual == x);      // the root squared should give back the input

You would want to explicitly test max int, negative numbers, etc., and be sure that the test is repeatable.

However, this isn't what AutoFixture is proposing. It is more interested in tests like:

var x = new Customer();
x.FirstName = ...;
x.LastName = ...;
x.MiddleName = ...;
x.Address = new Address();
x.Address.Street = ...;
// ...
x.Account = new Account();
// ...and so on for every other field
repo.Save(x);
var actual = repo.Load(x.Id);
Assert(actual == x);

Now you can see that this test is unlikely to fail because of the particular values you assign to the Customer's fields and those of its sub-objects. That's not really what you are testing.

But it would save you a lot of typing and unimportant code if you could auto-populate all those fields.
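
That auto-population is exactly what AutoFixture's Fixture class offers; a minimal sketch (Customer, the repository and its methods are placeholders carried over from the example above):

using AutoFixture;
using Xunit;

public class CustomerRepositoryTests
{
    [Fact]
    public void SavedCustomerCanBeLoadedAgain()
    {
        var fixture = new Fixture();
        var customer = fixture.Create<Customer>();  // Customer, Address, Account and all
                                                    // their fields get auto-populated values
        var repo = new CustomerRepository();        // placeholder for the repo in the example

        repo.Save(customer);
        var actual = repo.Load(customer.Id);

        Assert.Equal(customer.Id, actual.Id);
    }
}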

Ewan
  • 83,178