12

I read an article about unit testing by Vladimir Khorikov, written in 2020. Here is the article:

https://vkhorikov.medium.com/dont-mock-your-database-it-s-an-implementation-detail-8f1b527c78be

It said that you shouldn't mock an (internal) application database because it's a managed dependency. Here is the relevant part:

Managed dependencies — out-of-process dependencies you have full control over. These dependencies are only accessible through your application; interactions with them aren’t visible to the external world. A typical example is the application database. External systems don’t access your database directly; they do that through the API your application provides.

And here is the part where it said that managed dependencies should use real instances:

Only unmanaged dependencies should be replaced with mocks. Use real instances of managed dependencies in tests.

However, in Khorikov's book "Unit Testing Principles, Practices, and Patterns", also from 2020, the author says that unit tests should be isolated from each other and should not reach out to shared state.

Here's the part of the book where the author said it:

Isolating tests from each other means it’s fine to exercise several classes at once as long as they all reside in the memory and don’t reach out to a shared state

So, which one is right? How can you use real instances of managed dependencies such as the database when unit tests are supposed to be isolated from each other?

Is there any mistake in my understanding?

Doc Brown

9 Answers

42

After finally getting access to a copy of the book, I found the citation

Isolating tests from each other means it’s fine to exercise several classes at once as long as they all reside in the memory and don’t reach out to a shared state

in chapter 2 "What's a unit test?" (specifically in 2.1.2 "The isolation issue: The classical take").

The explanation about "not mocking an application database because it's a managed dependency", however, is in chapter 10, which belongs to part 3, "Integration testing".

This clear distinction has been lost in that Medium article, and it should answer your question: when writing "classical" unit tests, it is fine to mock your database, but when writing integration tests, try to treat internal databases as implementation details.

Hence, my recommendation: don't take any such blog article too literally without going through the details of their primary sources (even when both come from the same author).

Doc Brown
13

The source material you're reading is plain wrong. Unit tests mock external dependencies. Who designed this dependency is irrelevant. If it's external to the unit under test, it shouldn't be included as a real component in those unit tests.

The most optimistic interpretation of this guide might be that they got confused between unit and integration tests. But that is such a straightforward, easy-to-spot mistake, and it is really egregious in a guide that otherwise attempts to rigorously define everything and then give you specific rules about every specific thing.

I've never seen the value in distinguishing between the different kinds of fake dependency, but this article goes to great lengths to define them all. I disagree with that take: not so much the exercise of naming things in and of itself (to each their own), but the undertone that each of these fake dependency variants calls for a very different treatment.

In my opinion, it's an act of creating an arbitrary distinction only to then be able to sell your guide that helps you navigate said arbitrarily complex maze of rules. The article's content feels bloated with the intention of appearing to convey more information than is actually relevant in practice.
The article ending itself on a cliffhanger that suggests you should read (and therefore buy) their book seems to further support this interpretation of the article's content.

For this article to take such a declarative stance with such granularity, and then plainly fail on the advice on what should be faked in a unit test, makes it a really bad article. Them doubling down on the "because you control/manage it" argument further compounds how misguided this article is.

I'm not saying everything in this article is wrong, but I wouldn't trust any of it based on this article alone.

Flater
10

I think the inconsistency you see is just a misunderstanding. When he says you shouldn't mock the database, he doesn't mean that all tests should run against a real persistent database. If that were the case, a test might change data in such a way that it would affect the following tests. This is a no-go.

More likely he means something like using the real database engine, but creating fresh data for each test and resetting it afterwards. For example, you could have an SQL script that drops all tables and then recreates them in a known state before each test. This is a common pattern for automated tests.

When he talks about shared state in the third quote, it is about isolating tests against each other, i.e. a test should not affect shared state in such a way that it might affect the outcome of other tests.

Regarding the question "is it okay to mock a database when writing unit tests?": yes, it is OK in the sense that you won't get arrested for it. But as a general rule, the less you mock the better. Using mocks easily leads you down the road where your tests are just testing the mocks rather than the actual code.

JacquesB
4

Important lesson: your confusion is not your fault. The literature sucks.


So part of the problem you are running into is that there are many definitions of "unit test", which aren't all consistent with each other. Furthermore, there are multiple reasons that someone might want a "unit test", and the different reasons that you might want them introduce different constraints on the design of the test.

Design is what we do to get more of what we want than we get by just doing it.

The constraints of "unit tests" follow from the "what we want".


For example, if your "unit tests" are tests written by developers for developers (for example, when practicing Test Driven Development), then one of the important constraints is that the suite of tests should be eyeblink fast (because they are going to be run so often, and we need to ensure that running the tests doesn't introduce a delay long enough to trigger a context shift).

So when Michael Feathers wrote in 2005:

A test is not a unit test if:

  • It talks to the database....

Feathers is working specifically in a context where the developer's attention is on improving the internal quality of the design; the extra overhead of talking to a live database, even a managed one, puts the flow in jeopardy.


On the other hand, if you are creating unit tests for testing (for example, to catch errors before your changes are automatically deployed to production), then different constraints are in play -- we don't have to worry about our test automation getting distracted by testing delays, and this eases some of the constraints that impact the design of our "unit tests".


As for why Khorikov's answers seem to be inconsistent... well, the generous answers are that he learned something during the interval between writing those things, or there may have been some subtle difference in context that changes his recommendations.

Less generously: perhaps he's just wrong. It happens.

VoiceOfUnreason
2

If you mock the DB in your test, make that clear.

Don’t expect me to know that because you called it a unit test. I’ve seen things called a unit test for no better reason than that they were run with JUnit.

If you mock something out you aren’t testing it. If you don’t test it don’t act surprised if what it does surprises you.

Maybe it’s not yours. Maybe someone else tested it and you trust them. Fine. Just make that clear.

Maybe you worked real hard to make sure the code under test was deterministic so the test would be reliable. Fine. Make that clear.

Just because a test exists doesn’t mean it has to test anything besides what it cares about. Just make clear what it cares about. That way we know what has and hasn’t been tested.

That’s what’s important. Not this unit vs integration stuff. That is literally arguing semantics.

candied_orange
0

[...] Here is the relevant part:

[...] A typical example is the application database. External systems don’t access your database directly; they do that through the API your application provides.

It's actually difficult to fully make sense of what the book author Khorikov is saying, or even what context he is working in and where his remarks might make contextual sense.

An "application database" is in general accessed by multiple clients or multiple users at once, concurrently.

Whilst database engines can be pressed into all kinds of service, a single-user, single-thread database application is the exception rather than the rule. And such usage is the simplest possible usage of a database engine which involves the least amount of design complexity and requires the least amount of professional advice.

And database engines already inherently provide an API, and independent management tools, and direct backup and restore, and so on.

Whilst it's true (and common) that you can provide a further, application-specific API in front of the general API which a database engine provides, and whilst it may be true that nothing but instances of your own application would be accessing the database so that no "external systems" access it directly (save for supervisor activity occurring directly against the database via its own API and management tools), it's a very big stretch to characterise any part of the client application plus the database as together being a "unit" in the context of unit testing.

There's actually a risk here that Khorikov is talking so much nonsense and using terms in such strange ways that we're completely disoriented by it in Goebbelsian style, and can't even readily point out how it is false.

A more charitable interpretation might be that there is some kind of context missing - either not reproduced in the question because Khorikov has been selectively quoted, or because Khorikov has tackled something too complicated for his own capability as an author, so that legitimate thinking comes across in a confused way to the reader.

But any road, there seems enough to say that Khorikov can't be taken at face value, and that what he means is simply not clear.

The sky won't fall in just because you are "unit testing" things that are not in fact units in the conventional sense. But you might end up with a confused idea of what other people mean by "units", and you might end up struggling to apply other advice about "unit tests" which assume a conventional meaning of the word.

Steve
-1

Database state can be isolated

Isolating tests from each other means it’s fine to exercise several classes at once as long as they all reside in the memory and don’t reach out to a shared state

While the database as a whole is shared state, one can isolate the unit tests from each other. To use an analogy, the filesystem is shared state, but if you have unit tests which need to read and write files, each test can create its own temp directories and temp files to isolate itself from the others.
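The filesystem analogy can be sketched with Python's `tempfile` module. The `write_report` function is a hypothetical piece of code under test, not something from the book or the article:

```python
import tempfile
from pathlib import Path


def write_report(directory: Path, text: str) -> Path:
    """Hypothetical code under test: writes a report file into `directory`."""
    path = directory / "report.txt"
    path.write_text(text, encoding="utf-8")
    return path


def test_write_report_creates_file() -> None:
    # Each test gets a private directory that is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), "hello")
        assert path.read_text(encoding="utf-8") == "hello"


def test_directory_starts_empty() -> None:
    # Nothing written by the other test is visible here.
    with tempfile.TemporaryDirectory() as tmp:
        assert list(Path(tmp).iterdir()) == []


test_write_report_creates_file()
test_directory_starts_empty()
```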

Similarly, unit tests which must read and write to a database can isolate themselves using transactions. One could make a new database and load a fresh schema for every unit test, but this rapidly gets expensive. For efficiency, the test suite begins by creating and initializing a schema. Unit tests can share the schema because it (usually) will not be changed; you can ensure this by removing the test's permission to make schema changes. Unit tests then isolate their database changes with transactions and other means.

Use the Repository Pattern to isolate database queries

If I'm using an ORM and wish to test a method on a model representing a table, it makes little sense to mock the database. An ORM model is intimately tied to the database. The whole point is whether or not the queries work. Eventually, one has to test the real database calls.

But what if you have a query in a method that's not intimately tied to the database? For example, one that displays the top 10 users. This might feature a database call, but it isn't really about the call. This is a good candidate for extracting that query into its own method. You could make it part of a model, but this runs the risk of fat models.

Better would be to use the repository pattern to add a layer between the data and the application. This isolates all your queries in one place, and it decouples the design from the database schema. The application layer asks repository classes to speak to the database. Unit testing the application mocks the repository classes.

Only unit tests for the repository classes use a real database; they must because the repository classes are intimately tied to the database and schema.

The result is you never have to mock the database itself, only the repository classes.
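A small sketch of that arrangement, using `unittest.mock` from the Python standard library. The `UserRepository` class, its `top_users` method, and `format_leaderboard` are hypothetical names picked to match the "top 10 users" example:

```python
from dataclasses import dataclass
from typing import List
from unittest.mock import Mock


@dataclass
class User:
    name: str
    score: int


class UserRepository:
    """Hypothetical repository: the only place that would hold real queries."""

    def top_users(self, limit: int) -> List[User]:
        raise NotImplementedError("would run a real query against the database")


def format_leaderboard(repo: UserRepository, limit: int = 10) -> List[str]:
    """Application-layer code: asks the repository, then formats the result."""
    users = repo.top_users(limit)
    return [f"{rank}. {u.name} ({u.score})" for rank, u in enumerate(users, start=1)]


def test_format_leaderboard() -> None:
    # The repository is mocked; the database is never involved.
    repo = Mock(spec=UserRepository)
    repo.top_users.return_value = [User("alice", 90), User("bob", 75)]

    assert format_leaderboard(repo, limit=2) == ["1. alice (90)", "2. bob (75)"]
    repo.top_users.assert_called_once_with(2)


test_format_leaderboard()
```

The tests for `UserRepository` itself would be the ones that run against a real database.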

Schwern
-1

Terminology, and personal preferences/project guidelines aside, it depends.

Let's take two scenarios:

Function A

This function is expected to do some complex calculations and then perform a simple CRUD operation on a table. The "brains" are in the client code and the database only serves to save state.

A mock, if commonly used in the project, is reasonable enough.

Function B

This function either calls a stored procedure or it launches one or several complex queries in order to compute some results. The "brains" are more in the queries themselves.

If your app is database-centric, it doesn't make as much sense to use a mock because most of the actual real life complexity of the application resides in the database.

Other factors

  • Are the database or queries read-only? That would lessen the need for Mocks

  • How "heavy" is your database? Dealing with a multi-GB ERP test database with millions of rows in hundreds of tables is not the same thing as using Django tools to quickly set up a few dozen fake tables and data derived from its ORM-backed schema. That's true both for the cost of managing system refreshes between tests and for the likelihood that using mocks will hide problems until much later.

  • Do you/your project practice a lot of Mocking? Unit testing vs Integration testing?

Some well-regarded developers are on the record as dismissing mocks for hiding and/or inducing complexity. Others tend to favor integration testing (exercising a number of components to produce an externally observable unit of work) over unit testing (exercising a sub-component's internal logic).

Both forms of tests could be using the same testing frameworks, say the JUnit family, and just differ in emphasis. Still, unit testing leans more towards mocking, while integration testing leans more towards interaction with actual databases.

  • Even projects that are more focused on integration testing can also find value in granular unit testing: if you have a particularly hairy function you may want to exercise with numerous calls with different arguments and edge cases. Those don't really benefit all that much from repeatedly calling a database to really update it. A Mock would be preferable in that case.

  • That would doubly apply if you planned to stress it some more with automated calls from a fuzzer.
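The "hairy function exercised with many arguments" case from the list above could look something like this sketch. The function `apply_discount` and the `save_price` method on the database object are invented for the example. The point is only that the database write is trivial and the calculation is what deserves the many cases:

```python
from unittest.mock import Mock


def apply_discount(price_cents: int, percent: int, db) -> int:
    """Hypothetical function: validates, computes, then does one simple write."""
    if price_cents < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    discounted = price_cents * (100 - percent) // 100
    db.save_price(discounted)  # db is any object with a save_price method
    return discounted


# (price_cents, percent, expected result) including edge cases
CASES = [
    (1000, 0, 1000),
    (1000, 100, 0),
    (999, 50, 499),
    (0, 25, 0),
    (1, 99, 0),
]


def test_apply_discount_cases() -> None:
    for price, percent, expected in CASES:
        db = Mock()  # fresh mock per case, no real database round trip
        assert apply_discount(price, percent, db) == expected
        db.save_price.assert_called_once_with(expected)


def test_invalid_input_never_touches_db() -> None:
    for price, percent in [(-1, 10), (100, -5), (100, 101)]:
        db = Mock()
        try:
            apply_discount(price, percent, db)
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
        db.save_price.assert_not_called()


test_apply_discount_cases()
test_invalid_input_never_touches_db()
```

A fuzzer could drive `apply_discount` the same way, since no call ever reaches a real database.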

JL Peyret
-2

Forget the article, do not mock a database. Fake a database instead.

A database needs a flow: when you save something, you need to be able to read it afterwards. Mocks do not allow that unless you specify it yourself, which is why mocks are not suitable for a database. Fake the database with a simple structure that lets you read your data after you save it and that can perform the logic of your queries in code (filtering, pagination, etc.).
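One possible shape for such a fake, as a minimal sketch. The `FakeUserStore` class and its `save`/`get`/`find` methods are a made-up interface, assumed to mirror whatever the real data-access class exposes:

```python
from typing import Callable, Dict, List, Optional


class FakeUserStore:
    """In-memory stand-in for a user table (hypothetical interface)."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}
        self._next_id = 1

    def save(self, name: str, age: int) -> int:
        user_id = self._next_id
        self._next_id += 1
        self._rows[user_id] = {"id": user_id, "name": name, "age": age}
        return user_id

    def get(self, user_id: int) -> Optional[dict]:
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None  # copy, like a fresh read

    def find(
        self,
        predicate: Callable[[dict], bool] = lambda row: True,
        offset: int = 0,
        limit: Optional[int] = None,
    ) -> List[dict]:
        """Filtering and pagination done in plain code instead of SQL."""
        matches = [dict(r) for r in self._rows.values() if predicate(r)]
        matches.sort(key=lambda r: r["id"])
        end = None if limit is None else offset + limit
        return matches[offset:end]


store = FakeUserStore()
alice_id = store.save("alice", 31)
store.save("bob", 17)
store.save("carol", 45)

# What was saved can be read back, which a bare mock would not give you.
assert store.get(alice_id)["name"] == "alice"

# Second page (size 1) of the adults, ordered by id.
adults = store.find(lambda r: r["age"] >= 18, offset=1, limit=1)
assert [r["name"] for r in adults] == ["carol"]
```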

(maybe off-topic, but someone may read this question/answers and remember 'always mock a database in unit tests', which is wrong)

Shadov