8

People say that "talking about TDD hardly works; if you want to convince someone to use TDD, show them results". However, I'm already getting great results without TDD. Showing me that people who use TDD get good results won't be convincing; I want to see that developers who have worked both with and without TDD get better results with TDD.

Despite all of this, I'm interested in giving TDD a try. However, I'm not convinced I will gain anything from it. If it does prove useful, I will try to push it to the rest of my team.

My main question is this: would TDD serve any purpose for code whose correctness I can already prove?

Obviously, neither one is a silver bullet. Your proof might be wrong because you missed a detail, and your test might fail to spot a bug you didn't think to test for. In the end, we're human; nobody can write 100% bug-free code forever. We can only strive to get as close as possible.

However, would TDD actually save any time on code that has had its correctness proven? That is, code where the developer has identified all valid states (and their ranges) of the state machine the code operates on, all of them are accounted for, and the error checking is designed whitelist-style: every exception is passed to an upper handler, so that nothing unexpected can leak without both displaying a reasonably relevant message to the client and sending a log notification to an admin.
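To make that concrete, here is a minimal sketch of what I mean by whitelist-style checking with an upper handler (Python, with hypothetical names):

    # Minimal sketch of whitelist-style error checking (hypothetical names).
    # Only explicitly recognized states are handled; anything else is
    # escalated to an upper handler instead of leaking silently.

    class InvalidStateError(Exception):
        """Raised when input falls outside the whitelisted states."""

    VALID_STATES = {"idle", "running", "stopped"}  # every state accounted for

    def transition(state: str, command: str) -> str:
        if state not in VALID_STATES:
            raise InvalidStateError(f"unrecognized state: {state!r}")
        if state == "idle" and command == "start":
            return "running"
        if state == "running" and command == "stop":
            return "stopped"
        raise InvalidStateError(f"no whitelisted transition for ({state!r}, {command!r})")

    def handle_request(state: str, command: str) -> str:
        # The upper handler: nothing unexpected gets past this point without
        # both a client-facing message and an admin notification.
        try:
            return transition(state, command)
        except InvalidStateError as err:
            notify_admin(err)  # hypothetical admin-logging hook
            return f"Request rejected: {err}"

    def notify_admin(err: Exception) -> None:
        print(f"[admin log] {err}")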

Answers with real-life examples would be better.


Some clarifications:

  • This question is not about whether you can prove code correctness or not. Let's assume by default that not all code can be proven correct within a reasonable timeframe, but that some pieces of code can be. For example, it's very easy to prove the correctness of a FizzBuzz module (see the sketch after this list); not so easy for a cloud-based data syncing service.

  • Within these confines, the question assumes the following: the codebase is divided into two parts: [I] parts that have been proven correct, and [II] parts that have not been proven correct but have been manually tested to work.

  • I want to apply TDD practices to this codebase, which did not have them until now. The question asks the following: should TDD be applied to every single module, or would it be enough to apply it only to modules that were not proven correct?

  • "Proven correct" means that you can consider this module completely functional-style, i.e., it does not rely on any global or outer state outside itself, and has entirely its own API for I/O that other modules that interact with it must follow. It is not possible to "break this module" by changing code outside the module, at worst you can misuse it and get formatted error messages returned to you.

  • Obviously, every rule has exceptions: compiler bugs in new compiler versions may introduce bugs into this module, but the same bugs could be introduced into the tests that exercised it, resulting in a false sense of safety from tests that no longer work as intended. The bottom line is that tests are not a magical solution; they are another layer of protection, and this question discusses whether that layer of protection is worth the effort in the specific case of a module that was proven correct (assume that it indeed was).
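The FizzBuzz sketch referenced above, as an example of the easy-to-prove end of the spectrum; its correctness argument is a four-way case split on (n mod 3, n mod 5), small enough to check exhaustively by hand:

    # FizzBuzz as a pure function: no outer state, defined for every
    # positive integer. Proving it correct is a four-way case analysis.
    def fizzbuzz(n: int) -> str:
        if n % 15 == 0:
            return "FizzBuzz"
        if n % 3 == 0:
            return "Fizz"
        if n % 5 == 0:
            return "Buzz"
        return str(n)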

Kylee

7 Answers

20

Yes.

Proofs are fine when they're available, but even at the best of times they only prove that a single bit of code will work as expected (for all inputs? accounting for interruptions in the middle of any operation? what about running out of memory? disk failure? network failure?).

What happens when it changes?

Tests are great because they serve as an implied contract about what the code should do. They provide some scaffolding so that your new intern can go in and make changes with some level of confidence. All via quick, clear results: pass or fail.
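For example, a contract sketched as tests might look like this (hypothetical module and names; any test runner works the same way):

    import unittest

    from billing import apply_discount  # hypothetical module under test

    class ApplyDiscountContract(unittest.TestCase):
        # Each test is one clause of the implied contract: change the
        # behavior and the intern finds out immediately, pass or fail.
        def test_discount_is_applied(self):
            self.assertEqual(apply_discount(price=100.0, percent=10), 90.0)

        def test_negative_discount_is_rejected(self):
            with self.assertRaises(ValueError):
                apply_discount(price=100.0, percent=-5)

    if __name__ == "__main__":
        unittest.main()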

And frankly, I can coach an intern to write viable unit tests in a few months. I doubt that anyone on my team (myself included) could create proofs that guarantee anything meaningful for non-trivial code, let alone do it quickly and accurately.

Telastyn
5

We don't know. We cannot answer your question.

While you spend a lot of time explaining that the process you have now seems to work to everyone's satisfaction, you are telling us only a small sliver of what is actually happening.

In my experience, what you are describing is extremely rare, and I'm skeptical that your process and approach to coding are actually the cause of the low bug count in your applications. There may be many other factors influencing your applications, and you are telling us nothing about those factors.

So, in the face of not knowing your exact development environment and culture, we don't know whether TDD will help you or not. We could spend days discussing and arguing about it.

There is only one recommendation we can give you: try it out. Experiment. Learn it. I know you are trying to spend the least amount of effort possible on deciding, but that is not possible. If you really want to know whether TDD will work in your context, the only way to find out is to actually do TDD. If you learn it and apply it to your application, you can compare it with your non-TDD process. It may turn out that TDD actually has advantages, and you decide to keep it. Or it may turn out that TDD doesn't bring anything new and only slows you down, in which case you can fall back to your previous process.

Euphoric
5

The main purpose of (unit) tests is safeguarding code: making sure it will not break unnoticed because of later changes. When the code is first written, it will get a lot of attention and scrutiny. And you may have some superior system for that.

Six months later, when someone else is working on something seemingly unrelated, it may break, and your super-duper code-correctness prover will not notice. An automated test will.
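A sketch of what that safety net looks like (hypothetical names): the test pins down the behavior that was verified when the code was written, so a later "unrelated" change fails loudly instead of silently.

    # Hypothetical regression guard: pins down behavior that was correct
    # when written, so a later change cannot alter it unnoticed.
    def parse_version(text: str) -> tuple:
        major, minor = text.split(".", 1)
        return int(major), int(minor)

    def test_parse_version_still_behaves():
        # Fails if someone later "simplifies" this to float parsing,
        # where "2.10" would collapse to 2.1.
        assert parse_version("2.10") == (2, 10)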

Martin Maat
5

I want to apply TDD practices to this codebase, which did not have them until now.

This is the hardest way to learn TDD. The later you test, the more it costs to write tests and the less you get out of writing them.

I'm not saying it's impossible to retrofit tests into an existing code base. I'm saying doing so isn't likely to make anyone into a TDD believer. This is hard work.

It's actually best to practice TDD the first time on something new and at home. That way you learn the real rhythm. Do this right and you'll find it addictive.

The question asks the following: should TDD be applied to every single module,

That is structural thinking. You shouldn't say things like "test every function, class, or module". Those boundaries are not important to testing, and they should be free to change anyway. TDD is about establishing a testable behavioral need without caring how it's satisfied. If it weren't, we couldn't refactor.

or would it be enough to apply it only to modules that were not proven correct?

It's enough to apply them where you find a need for them. I'd start with new code. You'll get much more back from testing early than from testing late. Don't do this at work until you've practiced enough to master it at home.

When you've shown TDD is effective with the new code at work and feel confident enough to take on the old code, I'd start with the proven code. The reason is that you'll be able to see right away whether the tests you're writing are taking the code in a good direction.

My main question is this: would TDD serve any purpose for code whose correctness I can already prove?

Tests don't just prove correctness. They show intent. They show what is needed. They point out a path to change. A good test says there are several ways to write this code and get what you want. They help new coders see what they can do without breaking everything.

Only once you have that down should you wander into the unproven code.

A warning against zealots: You sound like you've achieved success and so will be unlikely to jump in headfirst. But others looking to prove themselves will not be so reserved. TDD can be overdone. It's amazingly easy to create a suite of tests that actually hurts refactoring because it locks down trivial and meaningless stuff. How does this happen? Because people looking to show off tests just write tests and never refactor. Solution? Make them refactor. Make them deal with feature changes. The sooner the better. That will expose the useless tests quickly. You prove flexibility by flexing.

A warning against structural categorizing: Some people will insist that a class is a unit. Some will call any test with two classes an integration test. Some will insist that you can't cross boundary x and call it a unit test. Rather than care about any of that, I advise you to care about how your test behaves. Can it run in a fraction of a second? Can it be run in parallel with other tests (side-effect free)? Can it be run without starting up or editing other things to satisfy dependencies and preconditions? I put these considerations ahead of whether it talks to a DB, file system, or network. Why? Because those last three are only problems because they cause the other problems. Group your tests together based on how you can expect them to behave, not the boundaries they happen to cross. Then you know what you can expect each test suite to do.
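As a sketch of grouping by behavior rather than by boundary (pytest markers here, but any runner that supports tagging works):

    import pytest

    # Grouped by how the test behaves, not by which boundary it crosses.
    # (Custom markers should be registered in pytest configuration to
    # avoid warnings.)

    @pytest.mark.fast  # sub-second, side-effect free: run on every save
    def test_totals_are_summed():
        assert sum([1, 2, 3]) == 6

    @pytest.mark.slow  # touches real I/O: run before merging
    def test_report_round_trips_through_disk(tmp_path):
        report = tmp_path / "report.txt"
        report.write_text("42")
        assert report.read_text() == "42"

Running "pytest -m fast" then gives you exactly the suite you can trust to be quick and parallel-safe.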


candied_orange
1

Test-driven development is more about prototyping and brainstorming an API than about testing. The tests created are often of poor quality and eventually have to be thrown out. The main advantage of TDD is determining how an API will be used before writing the API implementation. This advantage can also be obtained in other ways, for example by writing the API documentation before the implementation.
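For example (a hypothetical sketch), the first test is really a drawing of the API you wish you had, written before any implementation exists:

    # Written before the implementation exists: this is API brainstorming.
    # The shape of these calls decides what RateLimiter will look like.
    def test_rate_limiter_allows_burst_then_blocks():
        limiter = RateLimiter(max_calls=2, per_seconds=60)  # hypothetical API
        assert limiter.allow("client-1")
        assert limiter.allow("client-1")
        assert not limiter.allow("client-1")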

Correctness proofs are always more valuable than tests; tests don't prove anything. However, in order to use correctness proofs productively, it helps to have an automated proof checker, and you will need to work with contracts of some sort (design by contract).
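A minimal design-by-contract sketch (plain assertions here; dedicated contract-checking tools exist but are not assumed):

    def integer_sqrt(n: int) -> int:
        # Precondition: the contract the caller must satisfy.
        assert n >= 0, "precondition violated: n must be non-negative"
        root = 0
        while (root + 1) * (root + 1) <= n:
            root += 1
        # Postcondition: the guarantee a correctness proof would establish.
        assert root * root <= n < (root + 1) * (root + 1)
        return root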

In the past, when working on critical sections of code, I would attempt manual correctness proofs. Even informal proofs are more valuable than any automated tests. But you still need the tests, unless you can automate your proofs, because people will break your code in the future.

Automated tests do not imply TDD.

0

A) Reading the code and convincing yourself that it's correct isn't remotely close to proving it correct. Otherwise, why write tests at all?

B) When you change the code, you want tests that run and demonstrate whether the code is still correct.

Josh
-1

I will caveat by saying that once you are used to using TDD effectively, it will save you time in the end-game. It takes practice to learn how to use TDD effectively, and it doesn't help when you are under a time crunch. When learning how to best make use of it, I recommend starting on a personal project where you have more leeway and less schedule pressure.

You'll find that your initial progress is slower while you are experimenting more and getting your API written. Over time, your progress will be quicker as new tests start passing without code changes, and you have a very stable base to build from. In the late game, code that was not built using TDD requires you to spend far more time in the debugger than should be necessary, trying to figure out what is going wrong. You also run a greater risk that new changes break something that used to work. Assess the effectiveness of TDD versus not using it by total time to completion.

That said, TDD is not the only game in town. You can use BDD, which provides a standard way of expressing the behavior of a full-stack application, and assess the correctness of the API from there.

Your whole argument hinges on "proving code correctness", so you need something that defines code correctness. If you aren't using an automated tool to define what "correct" means, then the definition is very subjective. If your definition of correct is based on the consensus of your peers, it can change on any given day. Your definition of correct needs to be concrete and verifiable, which means it should be possible to evaluate it with a tool. Why not use one?

The #1 win from using automated testing of any sort is that you can quickly and efficiently verify your code remains correct even when OS patches are applied. Run your suite to make sure everything is passing, then apply the patch and run the suite again. Even better, make it part of your automated build infrastructure. Then you can also verify your code remains correct after merging code from multiple developers.

My experience using TDD has led me to the following conclusions:

  • It's great for new code, difficult for changing legacy systems
  • You still have to know what you are trying to accomplish (i.e. have a plan)
  • Slow to start, but saves time later
  • Forces you to think about how to validate correctness and debug from a user perspective

My experience using BDD has led me to the following conclusions:

  • It works for both legacy and new code
  • It validates the whole stack, and defines the specification
  • Slower to get up and running (helps to have someone who knows the toolset)
  • Fewer behaviors need to be defined than with unit tests

Definition of correct: your code complies with the requirements. This is best verified with BDD, which provides a means of expressing those requirements in a human-readable fashion and verifying them at run time.

I am not talking about correctness in terms of mathematical proofs, which is not possible. And I am tired of having that argument.