
Sometimes there are functions that return complicated data and cannot be divided any further, e.g. in the area of signal processing, or when reading a bytestring and decoding it into another format. How am I supposed to create stubs (e.g. for the bytestring) so that I can assert equality between the expected data and the returned data, without getting into trouble with complicated stub generation?

In the following example I want to test two functions. One writes my_package objects to disk, and the other reads them from disk into an object dictionary. The open dependency is mocked away. How can I define the stub_binary() function?

def read(filename):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return _decode(f.read())

def write(filename, **objs):
    """Write any compatible my_package objects to disk."""
    with open(filename, 'wb') as f:
        f.write(_encode(objs))

from unittest.mock import mock_open, patch

import my_package as mp  # A and B below are compatible my_package value objects

@patch('my_package.open', new_callable=mock_open)
def test_read(m_open):
    # arrange
    m_enter = m_open.return_value.__enter__.return_value
    m_enter.read.side_effect = lambda: stub_binary()
    # act
    obj_dict = mp.read('./obj_A_and_B')
    # assert
    assert obj_dict == {'A': A(), 'B': B()}


@patch('my_package.open', new_callable=mock_open)
def test_write(m_open):
    # arrange
    m_enter = m_open.return_value.__enter__.return_value
    # act
    mp.write('./obj_A_and_B', A=A(), B=B())
    # assert
    m_enter.write.assert_called_with(stub_binary())
  1. stub_binary() could return a hard-coded bytestring, but that easily gets messy:
def stub_binary():
    return (
        b'\x93NUMPY\x01\x00v\x00{"descr": "<f8", "fortran_order": False, '
        b'"shape": (1, 3), }                                             '
        b'             \n\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00'
        b'\x00\x00@\x00\x00\x00\x00\x00\x00\x08@')
  2. Or stub_binary() could read the above byte string from a file previously generated with mp.write:
def stub_binary():
    # gen_data is the path of a file previously generated with mp.write
    with open(gen_data, 'rb') as f:
        return f.read()
  3. Or just replace stub_binary() with the following:
import io

def stub_binary():
    # Assumes mp.write also accepts a file-like object instead of a filename.
    with io.BytesIO() as buffer:
        mp.write(buffer, A=A(), B=B())
        return buffer.getvalue()  # grab the bytes before the buffer is closed

I am tempted to create the data with my production code mp.write(), but that doesn't seem right to me. How would you approach this?

Agent49

3 Answers


There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare

One thing to consider is whether your design can be simplified to the point that other measures of evaluating its correctness are more suitable.

def read(filename):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return _decode(f.read())

What stands out to me in this code is that it is already pretty simple. Its correct behavior depends primarily on functions that are provided to you by your runtime.

And the kicker: it is unlikely to change. The only piece of this implementation that looks at all volatile is the _decode function itself.

About the only reasonable change I could see making here would be to make the decode function configurable:

def read(filename, decode):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return decode(f.read())

What I expect is that, having written this code (in either form), it is likely to survive many builds and releases without changing further.

Which is to say, you aren't going to derive much value from subjecting this code to repeated measurements to ensure that the behavior is consistent, because the implementation itself is unchanging (at least so long as the underlying runtime is stable).

More generally: if you have a design that separates the bits that are hard to test from the parts of the design that are unstable or complicated, then you don't also need to invest in automated tests that are specific to the hard-to-test bits.
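For instance, with decode injectable, the remaining I/O shell can be exercised end to end with a trivial stand-in decoder. A minimal sketch using pytest's tmp_path fixture (the identity lambda is just a stand-in, not a real decoder):

def test_read_passes_file_contents_to_decode(tmp_path):
    path = tmp_path / 'data.bin'
    path.write_bytes(b'payload')
    # With an identity decoder, read should hand back the raw file contents.
    assert read(str(path), decode=lambda raw: raw) == b'payload'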


It seems to me that in this design, the complicated part that needs testing is making sure that _decode understands the bits written by _encode, and potentially ensuring that _decode is compatible with previously released implementations of _encode.

But, given that your design already has those two functions isolated from the I/O, those tests are relatively inexpensive to write and maintain. So if you expect to change the design of those functions often, it might make sense to invest in those tests.
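Such a round-trip test could look like the following sketch, assuming _encode and _decode are importable from my_package and that A and B instances compare by value:

from my_package import _decode, _encode

def test_decode_understands_encode():
    objs = {'A': A(), 'B': B()}
    # Whatever _encode produces, _decode must reconstruct.
    assert _decode(_encode(objs)) == objs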

VoiceOfUnreason

Generate a reference file.

Simply commit the result of mp.write on mock values to the repository as a file; then you can very easily test that the encode and decode functions work by running them on the same mock values and comparing the result against the file. Your option 2, basically.
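A sketch of both halves, where the path is hypothetical and _decode is assumed to be reachable from the tests:

# One-off generator: run once, commit the output file to the repository.
import my_package as mp

mp.write('tests/data/obj_A_and_B.ref', A=A(), B=B())

# The decode test then runs against the committed reference file:
def test_decode_reference_file():
    with open('tests/data/obj_A_and_B.ref', 'rb') as f:
        assert mp._decode(f.read()) == {'A': A(), 'B': B()}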

The advantages over calling mp.write on the fly are multiple, the most important being isolation: you won't break your decode test by breaking your encode function. The advantage over a raw encoded string is that it keeps the tests clean and readable.

It's tempting to craft tests based on production data, but there are several pitfalls: long execution times if the dataset is not minimal, incomplete coverage of the function's scope, and data that is harder to reverse-engineer.

Diane M

Generating complicated test data does not have a general solution. Sometimes it is useful to return a hard-coded string, if the behavior under test is very targeted. Sometimes it is useful to read an existing file, because a hard-coded string gets messy to maintain.

VoiceOfUnreason recommended splitting the decoding function out into its own parameter so you can isolate that behavior in its own tests. I echo that recommendation, but will focus on how to generate the test data.

Test fixtures are a general way to stub test data. Way too general. For your case I would build a class containing methods that return the test data. Naming this class is up to you, but I recommend giving it a name that represents the kind of data it returns. You don't give us much to name things with, but something like FooSignalFixtures, where you replace "Foo" with a noun or phrase representing the kind of signal data your code handles.

Method names in this class should guide test writers (and readers) as to why the test data exists:

class FooSignalFixtures:
    @staticmethod
    def invalid_header():
        ...  # Open file with invalid header and return contents

    @staticmethod
    def invalid_foo_column():
        ...  # Open file with invalid value in column "foo" and return file contents

    @staticmethod
    def empty_file():
        return b''

    @staticmethod
    def header_with_no_contents():
        return b'...'  # Headers only, as a hard-coded string

Focus most of your testing effort on the decoder, which should accept a bytestring as a parameter. Then your decoder tests would use the test fixtures:

def given_empty_contents_when_decoding_then_error_is_thrown():
    file_contents = FooSignalFixtures.empty_file()
    try:
        decoder.decode(file_contents)
        assert False  # test fails: no error was raised
    except InvalidSyntaxError:
        pass  # test passes
    except RuntimeError:
        raise AssertionError('wrong error type raised')  # test fails

Coming back to this test, you can see that it tests the behavior when the file is empty, which is further reinforced by the FooSignalFixtures.empty_file() method. Test readers do not need to understand how the test data is generated; if they do need to know, you have a class that specializes in exactly that. The class can mix options 1 and 2, depending on the complexity of the test data.

This also gives you a way to write a test for a very specific production defect found only in one file, and to keep that file in your test fixtures to guard against the defect creeping back into your code.
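A sketch of such a regression fixture, where the file path and defect number are hypothetical:

class FooSignalFixtures:
    # ... existing fixture methods ...

    @staticmethod
    def file_from_defect_report_1234():
        # The exact file that triggered the production defect, kept in the repo.
        with open('tests/data/defect_1234.bin', 'rb') as f:
            return f.read()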