
Let's say I want to add tests to a piece of software that has some flaws/bugs.

I want to add the tests before I fix the bugs. And I want to actually merge them into the main branch, not just have them in a feature branch or bugfix branch somewhere.
(EDIT: See below as to why I might want to do this.)

In case it is relevant, I am working with PHPUnit, but I am looking for more general answers.

I see the following options:

  1. Create failing tests that describe the desired behavior, and merge those into the main branch. This means any QA pipeline that treats failing tests as a blocker has to be suppressed or configured to tolerate them.
  2. Create passing tests that describe the current (flawed) behavior, and merge those into the main branch. Add @todo comments if possible. Update them later as part of the bug fix. (A sketch of this follows the list.)
  3. Use a fancy mechanism that allows switching between "desired" mode and "current" mode. Running the tests in "desired" mode would make them fail, but running them in "current" mode would make them pass.
  4. Don't merge the tests until the bugs are fixed.
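
For illustration, a minimal PHPUnit sketch of option 2 might look like this (the Slugifier class, the expected slug values, and the issue number are all made up for the example):

<?php
use PHPUnit\Framework\TestCase;

final class SlugifierTest extends TestCase
{
    /**
     * Documents the CURRENT (flawed) behavior: trailing separators are not trimmed.
     *
     * @todo Change the expectation to 'hello-world' once issue #123 is fixed.
     */
    public function testSlugifyKeepsTrailingSeparator(): void
    {
        // Passes today, so the QA pipeline stays green.
        $this->assertSame('hello-world-', Slugifier::slugify('Hello World!'));
    }
}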

Personally, for my own projects, I like options 2 and 3.

But when working in a team, I want to be able to justify my strategies with online references, instead of just personal opinion.

So, is there any named "best practice" or "pattern" that matches option 2 or 3 above? Or am I missing something, and there are even better strategies available?

EDIT: Why merge the tests before fixing the bug?

(I am adding this section to avoid having to put all of it into the comments.)

Why would I want to merge the tests into the main branch before the bug fix? This has been a point of contention in the comments, so I am summarizing possible reasons here.

Possible reasons why we can't fix the bug now:

  • The bug fix might be risky or disruptive, perhaps we want to postpone it for a new major version.
  • The bug fix, unlike the test, needs dedicated effort by the maintainer, whereas the new test can be developed by a contributor.
  • There is no budget / resources to fix the bug now.
  • The existing behavior is not fully understood, and we don't really understand the implications that fixing it would have.
  • We know the current behavior is wrong, but we don't have a precise spec for the desired behavior. Perhaps we need feedback from the client or management.
  • The bug has been discovered while doing something else, and fixing it now is out of scope.

So why would we want to merge the tests anyway?

(assuming option 2 from above)

  • The tests cover more than just this one bug.
  • We want to reward and credit contributors who provide tests, even before fixing a bug.
  • We want to provide a starting point for contributors who attempt to fix the bug.
  • We want a git diff of the fix to give a precise overview of behavior change.
  • We want to detect when the bug might be fixed as a side effect of other changes.
  • We want to detect when the wrong behavior changes to a different flavor of wrongness, as a side effect of other changes.
  • If a user of the software package discovers something that looks broken, they can look into the tests to find the behavior documented and marked as a "known issue", with a link to the issue queue.
  • The git history will reveal more information about the difference in behavior before and after the fix. It will also reveal in which version the problem was fixed.

5 Answers


Add the tests at the same time that you fix the bug (your option 4). The final tests would then describe the desired, non-buggy behavior because they would pass. This is pretty typical for bug-fixing.

Unless you're dealing with a massive bug, I can't see much reason to spend effort writing and merging tests that you know are incorrect, only to have to come back at some point in the future and update those tests again. There's also the possibility that you look at the passing tests and close the bug ticket as OBE (overtaken by events) or not reproducible or some such - maybe not likely, sure, but why chance it?

Edit to clarify: you can certainly write the tests with the current, buggy behavior; fix the bug; and update the tests to be sure you're not breaking anything else (although those other things have tests, too, right?). But don't spend the effort merging tests with incorrect expected results into your main branch. And definitely don't merge failing tests into your main branch!

mmathis

Most of the time I agree with mmathis: submit the updated tests along with the bug fix. But there are cases where you might find a defect and not have time to fix it, or fixing that defect is out of scope for your current task. Not every test needs to pass or fail. Many test frameworks come with a third outcome called "inconclusive", "pending" or some similar name: basically a test that neither passes nor fails.

I work mostly in .NET using MS Test. I have written failing unit tests before, but instead of letting them fail, I call Assert.Inconclusive("should begin passing when bug #1234 is fixed") at the beginning of the test. A quick search suggests that for PHPUnit, $this->markTestIncomplete("should begin passing when bug #1234 is fixed") might be a good choice.
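
A PHPUnit version of that idea might look like the sketch below (the Invoice class and the bug number are invented for the example; markTestIncomplete() stops the test before the assertions run, so they only document the intent until the call is removed):

<?php
use PHPUnit\Framework\TestCase;

final class InvoiceTest extends TestCase
{
    public function testTotalAppliesDiscount(): void
    {
        // Reported as "incomplete" instead of failing the build.
        $this->markTestIncomplete('Should begin passing when bug #1234 is fixed.');

        // Desired behavior, currently broken; not reached until the line above is removed.
        $invoice = new Invoice(100.00, 0.10);
        $this->assertSame(90.00, $invoice->total());
    }
}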

This usually doesn't fail the build, but the test run might not appear "green", nor will it appear "red". When encountering these "in between" test outcomes, some test runners will report that the tests neither passed nor failed, and the run appears "yellow". This might require some explanation to teammates or management.

The real challenge is to eventually fix the application and get that test running and passing, but in the meantime at least you have a test that will start failing as soon as a single line of code (the Inconclusive/Incomplete call) is commented out. That gives the future bug fixer an easy way to jump into correcting the code.


Short answer: tests can do both.

Slightly longer short answer:

  • Regression Tests describe the current system,
  • Progression Tests describe the target system.

Long Answer:

Option 1 is called Progression Tests. Most useful at the end-to-end/integration/use-case level, these tests serve as goals to be achieved. It's understood that these tests may keep failing for a long time, and the QA part of the build pipeline should be aware of them, i.e. no build with a failing progression test goes to production, but it can go out for local testing.

Option 2 is refactoring. It is particularly good in areas of code that have descended into murkiness. Write a suite of tests, and (very importantly) refactor the code back into legibility, before doing any code fixes.

Option 3 ... is horrible. Please don't do this. How are you supposed to know which "special" mode the test is running in? How many times do you have to re-run the test to ensure each of its modes works? How will you even know to re-run the test? If there is an anti-pattern here, this is it.

But I could be misreading you; see option 5 below.

Option 4 is what general development is all about. Make the changes to the code and to the tests necessary to describe the behavior change. When it is working, check it in.

There is an option 5:

Option 5 is useful when you want to preserve old behaviour, even if only for a short amount of time. Write separate tests for each behaviour in each of the supported modes. Each test is responsible for configuring the feature toggles to set up the mode it is testing.

Should you no longer support a mode, delete the tests and the code behind the toggle. Simple.
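
A rough PHPUnit sketch of option 5 (the PriceFormatter class, its constructor option, and the expected values are assumptions made up for the example):

<?php
use PHPUnit\Framework\TestCase;

final class PriceFormatterTest extends TestCase
{
    public function testLegacyModeTruncates(): void
    {
        // Each test configures the toggle for the mode it describes.
        $formatter = new PriceFormatter(['rounding' => 'legacy']);
        $this->assertSame('9.99', $formatter->format(9.996));
    }

    public function testHalfUpModeRounds(): void
    {
        $formatter = new PriceFormatter(['rounding' => 'half-up']);
        $this->assertSame('10.00', $formatter->format(9.996));
    }
}

When the legacy mode is retired, delete testLegacyModeTruncates() together with the code behind the toggle.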

Kain0_0

This kind of test is often called a "characterization test". Its nature is to describe what your system currently does, not what it's supposed to do. In the past I have worked with several huge legacy projects that had few tests (or none at all). Looking at the specs is often not helpful, as edge cases might be documented only in your ticket system, on some network drive, or not at all. This makes it practically impossible to judge the correctness of your system.

Chances are that what you consider a bug is actually a feature for someone. This particularly applies to framework functions that are used by dozens of clients, with many different expectations.

A recent example of this comes from my own codebase: a small C# function that should convert a value from a database to a number, or to null if that's not possible.

public static int? ToIntegerOrNull(object value)
{
    int? returnValue;

    try
    {
        returnValue = Convert.ToInt32(value);
    }
    catch
    {
        returnValue = null;
    }

    return returnValue;
}

There are of course better ways to write this in C#, but that's not the point here. What's important is the fact that Convert.ToInt32 simply returns zero when given a null value. So although the function is called ...OrNull, it won't return null when you pass one in.

This function - as you can imagine - has hundreds of uses in many different applications. I have no way to estimate the impact on the rest of my system if I change the behaviour. Chances are someone relies on the function returning 0 when passed null, and then I would break the system.

I usually create a test that only describes the current behaviour, assuming that what the system currently does is what someone asked for somewhere in the past. This changes my thinking about testing software. Tests are no longer the "gold standard", but a way to keep track of changes in the system's behaviour. Also, your safety net, and therefore your trust in your system, becomes stronger with every test: not because your system moves closer to a correct state, but to a deterministic one.


You have “current designed behaviour” and “future designed behaviour”, each with possible bugs.

You should never test for behaviour with bugs. Fix the bugs. If you don't have time, live with failing tests, which will increase the priority of fixing the bugs. If you decide the bug is real but unimportant and the test failure interferes with development, check whether there is a way to disable the tests.

The other case is when the expected behaviour changes. I'd go through all relevant tests and find which ones you expect to break when the changed behaviour is implemented correctly. Those tests get disabled. Then start implementing the new behaviour, not caring about tests at this point. Every time some part of the new behaviour is deemed working, you re-enable the old tests and fix them as necessary, or write new tests. Failures are now bugs that need fixing. Finally, when your changed behaviour is complete without test failures, carefully check all the tests you had disabled.
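
In PHPUnit, one way to do the "disable, then re-enable" step is to tag the affected tests with a group and exclude that group while the new behaviour is under construction (the TaxCalculator class, the expected value, and the group name are made up for this sketch):

<?php
use PHPUnit\Framework\TestCase;

final class TaxCalculatorTest extends TestCase
{
    /**
     * Pins the old rounding rule; expected to break when the new behaviour lands.
     * Excluded from the build in the meantime with:
     *
     *   phpunit --exclude-group pending-behaviour-change
     *
     * @group pending-behaviour-change
     */
    public function testGrossUsesOldRoundingRule(): void
    {
        // Old rule: the gross amount is truncated to whole cents.
        $this->assertSame(120, (new TaxCalculator(0.19))->grossInCents(101));
    }
}

Re-enabling is then just a matter of dropping the @group annotation (or the --exclude-group flag) once the new behaviour is in place.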

gnasher729