
Let's assume you are trying to refactor legacy code to make it easier to understand and more testable. How can you do that?

In the book "Working Effectively with Legacy Code", a characterization/regression test is recommended. First, you write a test that invokes the part of the code you want to refactor (e.g. a big method). What this test tells you is: for that input, you should expect that output. Therefore, if you refactor and the output changes, something was broken.
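For illustration, here is a minimal sketch of such a test in Python. The `legacy_price` function and its values are hypothetical stand-ins for whatever legacy code you are actually pinning down:

```python
import unittest

# Hypothetical stand-in for the legacy code under test; imagine a
# large, hard-to-read method you do not yet fully understand.
def legacy_price(quantity, customer_type):
    price = quantity * 10
    if customer_type == "gold":
        price = price * 0.9
    return price


class CharacterizationTest(unittest.TestCase):
    def test_gold_customer_price(self):
        # 90.0 was recorded by running the code once, not derived from
        # a spec -- that is what makes this a characterization test.
        self.assertEqual(legacy_price(10, "gold"), 90.0)


if __name__ == "__main__":
    unittest.main()
```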

The assumption here is that the legacy code is harder to understand and thus harder to test. You can find this case explained in another question here: "Writing tests for code whose purpose I don't understand".

Now you have refactored code that you can start writing unit tests for (e.g. you broke the big method into smaller, testable ones).
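Continuing the hypothetical example, the big method might be split into small pieces that can each be unit tested directly:

```python
import unittest

# The hypothetical big method, broken into smaller testable pieces.
def base_price(quantity):
    return quantity * 10


def apply_discount(price, customer_type):
    return price * 0.9 if customer_type == "gold" else price


def legacy_price(quantity, customer_type):
    return apply_discount(base_price(quantity), customer_type)


class PricingUnitTests(unittest.TestCase):
    def test_base_price(self):
        self.assertEqual(base_price(10), 100)

    def test_gold_discount(self):
        self.assertEqual(apply_discount(100, "gold"), 90.0)

    def test_regular_pays_full_price(self):
        self.assertEqual(apply_discount(100, "regular"), 100)


if __name__ == "__main__":
    unittest.main()
```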

Unit tests might help you uncover bugs that you need to fix. Fixing those bugs will make your unit tests pass but will break the characterization/regression tests. Those tests expect a certain output: the output the code produced before the bug was found and fixed.
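To make the conflict concrete, suppose the refactoring revealed that "silver" customers were wrongly receiving the gold discount (a purely hypothetical bug). After the fix, the unit test encoding the intended behavior passes, while the characterization test, which pinned the recorded buggy output, fails:

```python
import unittest

def apply_discount(price, customer_type):
    # Fixed version: only gold customers get the discount.
    # (The buggy original also discounted "silver".)
    return price * 0.9 if customer_type == "gold" else price


class UnitTest(unittest.TestCase):
    def test_silver_pays_full_price(self):
        # Encodes the *intended* behavior -- passes after the fix.
        self.assertEqual(apply_discount(100, "silver"), 100)


class CharacterizationTest(unittest.TestCase):
    def test_silver_price(self):
        # Encodes the *recorded* (buggy) behavior -- fails after the fix.
        self.assertEqual(apply_discount(100, "silver"), 90.0)


if __name__ == "__main__":
    unittest.main()
```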

So what is the appropriate course of action here? Can I just remove/comment out the test? I have seen in the book "Working Effectively with Legacy Code" an example of a test that was removed after refactoring. Does this apply here?

A.Shoman

2 Answers


Won't a characterization/regression test fail when a bug is fixed?

Yes, usually.

So what is the appropriate course of action here? Can I just remove/comment out the test?

"It depends".

You shouldn't be thinking of the characterization test as something that lives forever. That style of test is brittle when measuring a system whose intended behavior is changing over time. So we want to retire (deprecate) those tests when they are no longer providing value -- which is to say, when other (less brittle) tests have made the characterization test redundant.

If a deprecated test fails, then you of course remove it.

The problem case occurs when the characterization test fails (because of an intended change in the behavior) and that characterization test is not entirely duplicated by our stable tests.

The right thing: The "golden master" is, in effect, a prediction about what output we should see from the test subject under specific conditions. The thing that we are really doing with the test is checking that the current behavior of the system matches our current prediction.

(It just happens, by accident, that our existing prediction is that the system under test will behave the same way it did yesterday.)

So when you are intending to change the behavior, the correct thing to do is treat it as a test-first problem: update the golden master (by hand, if necessary) so that it predicts the new behavior that you want, verify that the new prediction is not satisfied by the old implementation, patch the bug, then verify that the new prediction is satisfied by the new implementation.
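As a sketch of that workflow, again using the hypothetical pricing example: the master value is edited by hand first, and only then is the implementation changed to satisfy it:

```python
import unittest

def legacy_price(quantity, customer_type):
    # Step 3: the patched implementation -- the gold discount is
    # corrected to 15% (a hypothetical intended change).
    price = quantity * 10
    if customer_type == "gold":
        price *= 0.85
    return price


class GoldenMasterTest(unittest.TestCase):
    def test_gold_price_matches_prediction(self):
        # Step 1: the master was edited by hand from the recorded 90.0
        # to 85.0, the behavior we *want*.
        # Step 2: run against the old implementation -- it fails,
        # proving the new prediction is not satisfied by the old code.
        # Step 3: patch the bug (above); the test passes again.
        self.assertEqual(legacy_price(10, "gold"), 85.0)


if __name__ == "__main__":
    unittest.main()
```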

Because you have made changes to the prediction by hand, the prediction acts as a check against other unintended changes in the behavior. In effect, it acts as a way to measure how well you can predict the real effects of your change.

In practice, you are more likely to see people fix the implementation (hopefully adding new tests for the specific change), generate a new copy of the output, and then run an audit on the differences between the old master and the new -- when those differences are finally satisfactory, you replace the master and use the updated characterization test until you can finally retire it.
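A rough sketch of that audit workflow in Python, using `difflib` to review the differences before promoting the new output (file names and the report function are hypothetical):

```python
import difflib
import pathlib

MASTER = pathlib.Path("golden_master.txt")


def generate_report():
    # Stand-in for running the fixed system and capturing its output.
    return "header\ntotal: 85.0\nfooter\n"


def audit_and_maybe_promote():
    if not MASTER.exists():
        # Old recorded output, created here only to keep the sketch
        # self-contained; in practice it already exists on disk.
        MASTER.write_text("header\ntotal: 90.0\nfooter\n")
    diff = difflib.unified_diff(
        MASTER.read_text().splitlines(),
        generate_report().splitlines(),
        "old master", "new output", lineterm="")
    print("\n".join(diff))  # review every difference by hand
    # Once every difference is confirmed intentional, promote the new
    # output as the master for the updated characterization test:
    # MASTER.write_text(generate_report())


if __name__ == "__main__":
    audit_and_maybe_promote()
```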

VoiceOfUnreason

You are looking at the wrong problem. “My test failed” is not a problem; it’s a feature. “My code changed its behaviour” is the problem.

With legacy code, changing the behaviour can be a problem. Even changing behaviour by fixing a bug can be a problem. You need to carefully investigate whether the change is acceptable. And then you change the behaviour and the test, or you don’t.

gnasher729