35

Should I write unit tests for complex regular expressions in my application?

  • On the one hand: they are easy to test because the input and output formats are often simple and well-defined, and they can become so complex that tests targeting them specifically are valuable.
  • On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes.
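To make the second option concrete, here is a minimal sketch of testing a regex only through the interface of the unit that uses it. The names `parse_version` and `_VERSION_RE`, and the pattern itself, are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical internal detail: callers never see this pattern.
_VERSION_RE = re.compile(r"^v?([0-9]+)\.([0-9]+)\.([0-9]+)$")


def parse_version(text):
    """Public interface: return (major, minor, patch) or raise ValueError."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def test_parse_version():
    # These assertions exercise the regex only through the public function.
    assert parse_version("1.2.3") == (1, 2, 3)
    assert parse_version("v10.0.7") == (10, 0, 7)
    assert parse_version(" 1.2.3 ") == (1, 2, 3)
    for bad in ["1.2", "1.2.3.4", "a.b.c", ""]:
        try:
            parse_version(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


test_parse_version()
```

The test never mentions the pattern, so the regex could be replaced by hand-written parsing without touching the test. The trade-off is that a failing case points at `parse_version` as a whole rather than at the regex.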

EDIT:

I agree with Doc Brown who in his comment notes that this is a special case of unit testing of internal components.

But as internal components regexes have a few special characteristics:

  1. A single line regex can be really complex without really being a separate module.
  2. Regexes map input to output without any side effects and hence are really easy to test separately.
Lii
  • 470

6 Answers

103

Testing dogmatism aside, the real question is whether unit testing complex regular expressions provides value. It seems pretty clear that it does (regardless of whether the regex is part of a public interface) if the regex is complex enough, since the tests let you find and reproduce bugs and guard against regressions.

JacquesB
  • 61,955
  • 21
  • 135
  • 189
21

Regex can be a powerful tool, but it is not a tool you can trust to keep working if you make even minor changes to a complex regex.

So create lots of tests that document the cases it should cover. And, if it is used for validation, create lots of tests that document the cases where it should fail.
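A minimal sketch of what such a pair of case lists might look like. The pattern `DATE_RE` and the helper `is_valid_date` are hypothetical examples, not taken from any real code base; the pattern checks only the shape and field ranges of a date, not whether the day exists in that month.

```python
import re

# Hypothetical validation pattern: ISO-style dates such as 2024-02-29.
# It checks shape and field ranges only, not calendar correctness.
DATE_RE = re.compile(r"([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])")

# Cases the regex should cover.
SHOULD_MATCH = ["2024-01-01", "1999-12-31", "2024-02-29"]

# Cases it should reject when used for validation.
SHOULD_NOT_MATCH = [
    "2024-1-01",    # month needs two digits
    "2024-13-01",   # month out of range
    "2024-00-10",   # month zero
    "2024-01-32",   # day out of range
    "2024-01-01 ",  # trailing whitespace
    "x2024-01-01",  # leading junk
    "",             # empty input
]


def is_valid_date(text):
    # fullmatch requires the whole string to match, not just a prefix.
    return DATE_RE.fullmatch(text) is not None


def test_accepts_valid_dates():
    for text in SHOULD_MATCH:
        assert is_valid_date(text), f"should match: {text!r}"


def test_rejects_invalid_dates():
    for text in SHOULD_NOT_MATCH:
        assert not is_valid_date(text), f"should not match: {text!r}"


test_accepts_valid_dates()
test_rejects_invalid_dates()
```

When the regex has to change, the new case goes into one of the two lists first, and the existing entries show whether the change broke anything.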

Whenever you need to change your regexes you add the new cases as tests, modify your regex and hope for the best.

If I were in an organization that in general didn't use unit tests, I would still write a test program to test any regex we used. I would even do it on my own time if I had to; my hair does not need to lose any more colour.

Bent
  • 2,596
3

Regular expressions are code along with the rest of your application. You should test that the code overall does what you expect it to do. This has several purposes:

  • Tests are runnable documentation. They clearly demonstrate what you need the code to do. If it is tested, it is important.
  • Future maintainers can be certain that if they modify it, the tests will ensure that the behavior is unchanged.
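One way to get "runnable documentation" in Python is a doctest, where the examples in the docstring are also the tests. This is only a sketch: `HEX_COLOR_RE` and `is_hex_color` are hypothetical names, and the pattern is an assumed example.

```python
import doctest
import re

# Hypothetical pattern: a '#' followed by 3 or 6 hexadecimal digits.
HEX_COLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}")


def is_hex_color(text):
    """Return True if text is a CSS-style hex colour.

    >>> is_hex_color("#fff")
    True
    >>> is_hex_color("#1A2b3C")
    True
    >>> is_hex_color("#ffff")
    False
    >>> is_hex_color("fff")
    False
    """
    return HEX_COLOR_RE.fullmatch(text) is not None


# Runs the docstring examples; failures are reported on standard output.
doctest.run_docstring_examples(is_hex_color, {"is_hex_color": is_hex_color})
```

A maintainer reading the function sees what the regex is supposed to accept and reject, and the same lines fail loudly if a later edit changes that behavior.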

Because a regex is code in a different language embedded in the rest of your code, it is an extra hurdle for maintainers, so it most likely deserves extra attention for the benefit of maintenance.

1

In short, you should test your application, period. Whether you test your regex with automated tests that run it in isolation, as part of a bigger black box or if you just fiddle around with it by hand is secondary to the point that you need to make sure it works.

The main advantage of unit tests is that they save time. They let you test the thing as many times as you like, now or at any point in the future. If there's any reason at all to believe that your regex will at some point be refactored, tweaked, given more constraints, etc., then yeah, you probably want some regression tests for it. Otherwise, when you do change it, you'll have to spend an hour thinking through all the edge cases to make sure you didn't break it. That, or you learn to live with being scared of your code and simply never change it.

sara
  • 2,579
-1

"On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes."

I think with this you answered it yourself. Regexes in a unit are most likely an implementation detail.

What goes for testing your SQL probably also goes for regexes. When you change a piece of SQL, you probably run it through some SQL client by hand to see if it yields what you expect. The same goes for regexes: when I change one, I run it in some regex tool with sample input to see if it does what I expect.

What I find useful is a comment near the regex with a sample of text which it should match.
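A sketch of what such a comment could look like, using Python's `re.VERBOSE` flag so the parts of the pattern can carry their own notes. The pattern `LOG_LINE_RE` and the log format are hypothetical, invented for this example.

```python
import re

# Matches log lines such as:
#   2024-03-01 12:00:05 ERROR disk full
# Does not match lines without a level, such as:
#   2024-03-01 12:00:05 disk full
LOG_LINE_RE = re.compile(
    r"""
    (?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})    # date, e.g. 2024-03-01
    \s+
    (?P<time>[0-9]{2}:[0-9]{2}:[0-9]{2})    # time, e.g. 12:00:05
    \s+
    (?P<level>DEBUG|INFO|WARNING|ERROR)     # log level
    \s+
    (?P<message>.*)                         # rest of the line
    """,
    re.VERBOSE,
)

# The samples from the comment, checked by hand once.
match = LOG_LINE_RE.fullmatch("2024-03-01 12:00:05 ERROR disk full")
assert match is not None
assert match.group("level") == "ERROR"
assert match.group("message") == "disk full"
assert LOG_LINE_RE.fullmatch("2024-03-01 12:00:05 disk full") is None
```

A comment on its own is never executed, so it can drift from the pattern; the assertions at the end show how little it takes to turn the same samples into a check.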

-5

If you have to ask, the answer is yes.

Suppose some FNG comes along and thinks he can "improve" your regex. Now, he's an FNG, so automatically an idiot. Exactly the kind of person who should not touch your precious code under any circumstances, ever! But maybe he's related to the PHB or something, so there's nothing you can do.

Except you know the PHB is going to drag you kicking and screaming back to this project to "maybe give the guy some pointers about how you made this mess" when everything goes bad. So you write down all the cases that you have carefully considered when building your beautiful masterwork of expressiondom.

And since you've written them all down, you're two-thirds of the way to having a set of test cases, since - let's face it - regex test cases are dead easy to run once you've got the framework built.

So now, you have a set of edge conditions, alternatives, and expected results. And suddenly the test cases are the documentation just as promised in all those me-too Agile blog posts. You just point out to the FNG that if his "improvement" doesn't pass the existing test cases, it's not much of an improvement, is it? And where are his proposed new test cases that demonstrate some problem with the original code, which since it works he doesn't need to be modifying, ever!!!

aghast
  • 117