How do you test a function whose sole purpose is to query an external API, but the API uses a complex query syntax?

Question

The only real logic is in the query syntax for the external API. I don't want to test whether it queries the api, I want to test that it queries it in such a way that the correct data will be returned. For example, some pseudo-code:

function retrieve_related_data(id)
{
  query = "[potentially long, syntactically complex query that
            uses param id to get some data]";
  results = api_wrapper.query(query);
  return results;
}

A more concrete example with a made up API:

function retrieveLifeSupportingObjectsWithinRegion(id)
{
  query = "
    within region(" + id + ") as r
    find objects matching hydration>0 and temp_range has 75
    send name, id, relative(position, r)        
  ";
  results = astronomicalObjectApiWrapper.query(query);
  return results;
}

The query is in a syntax custom to the API and is complex and there are multiple ways to achieve the same or similar results. The purpose of the function is not to get data identified by id but to find a subset of other data based on a fuzzy relationship to the data identified by id that also meets a few other requirements. The other requirements are always the same regardless of id but may change over time as the system is modified. For example, if the example api added support for gravity information, we may want to change the query to also use gravity to refine the results. Or maybe we come up with a more efficient way to check the temp range, but it doesn't change the results.

What I want to test is that for a given input id the correct set of data is returned. I want to test this so that if someone messes up the query such that it is no longer returning the correct data based on id that it will fail, but I also want people to be able to modify the query to refine it without needing to also modify the test.

Options I've considered:

1. I could stub the api, but that would either be too simple (check that the id is present in the query and then return an expected set of data if it is or an unexpected set if not), too brittle (check that the query string is exactly what is in the function), or too complex (check that the query used is syntactically correct and will result in the correct data being returned).
2. I could submit the query to the real api, but the expected results could change over time as the data in the external system changes, outside of the control of the test system.
3. I could look at setting up a test install of the real api in order to control the data it has, but that is a lot of effort.

I'm leaning towards #2: making this more of an integration test that doesn't get run often, and seeing how often changes in the data of the external system cause the test to break. I think that would be simplest for now, but I am wondering if there are alternatives that I am not thinking of or better ways to tackle this issue. Any advice would be appreciated.

Laiv · Accepted Answer · 2023-06-27T22:45:03.787

It may seem that validating the API response tests your function, but it does not. You would be testing the API and its environment.

Your tests should focus on validating the behaviour of the code you have written, not code written by third parties. To some extent, you have to trust others' code. For the same reason you don't test frameworks and libraries, you don't test external services.

What I want to test is that for a given input id the correct set of data is returned

What would you be testing? As you said, the data and its correctness are not under your control, so the success of the tests would depend on an external agent over which you have no influence. In these circumstances, tests become non-deterministic, and you definitely don't want that to happen.

You could, instead, test contracts¹, which don't involve your code. A simple homemade implementation is possible with Postman and Newman. Once the requests are recorded and the tests are written, all that remains is to schedule their execution. The idea is to run these tests in a separate pipeline, at a different time and pace than the rest of the code.
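As a rough illustration of what a contract check looks at, here is a minimal Python sketch (not Postman/Newman, just the same idea in script form). The field names come from the made-up query in the question; the field types and the helper name `contract_violations` are assumptions for the example.

```python
# Assumed response shape for the made-up API: each record carries
# a name, an id and a position relative to the region.
REQUIRED_FIELDS = {"name": str, "id": int, "position": list}


def contract_violations(record):
    """Return a list of human-readable problems found in one API record."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

A scheduled job could run this over a recorded request's response and alert when the list is non-empty. It says nothing about whether your query is right, only whether the API still answers in the shape you rely on.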

I want to test this so that if someone messes up the query such that it is no longer returning the correct data based on id that it will fail

What if the query is fine, but the data changed or is wrong due to bugs in the API? Not only the data is out of your control; the logic is too. Moreover, what you consider "correct" may only hold eventually: the data might reach its "correct" state later, without you knowing when, how, or why.

Implementing functional and end-to-end tests may help here. You could design these tests so that, if the API returns wrong data, you notice it through misbehaviour in your own code.

But I also want people to be able to modify the query to refine it without needing to also modify the test.

I would suggest implementing an instrumental test. Instrumental tests are written and executed as test classes, but they aren't included in the test plan.

Shell scripts or Python scripts can be good alternatives too. One way or another, make it an instrument: one that anyone can run at any time, without having to know the rest of the source code.
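A minimal sketch of such an instrument as a Python script, assuming the made-up query from the question. `EchoApi` is a hypothetical stand-in that only echoes the query back; in practice you would pass in your real API wrapper instead.

```python
import argparse
import json


def retrieve_life_supporting_objects(region_id, api):
    """Python rendering of the question's example function."""
    query = (
        f"within region({region_id}) as r\n"
        "find objects matching hydration>0 and temp_range has 75\n"
        "send name, id, relative(position, r)"
    )
    return api.query(query)


class EchoApi:
    """Hypothetical stand-in for the real wrapper: returns the query it got."""

    def query(self, query):
        return [{"query": query}]


def main(argv=None, api=None):
    """Run the query for the region id given on the command line."""
    parser = argparse.ArgumentParser(
        description="Run the life-support query for one region and print the results."
    )
    parser.add_argument("region_id")
    args = parser.parse_args(argv)
    results = retrieve_life_supporting_objects(args.region_id, api or EchoApi())
    # Print as JSON so a person can eyeball or diff the output.
    print(json.dumps(results, indent=2))
    return results
```

Calling `main()` from an `if __name__ == "__main__":` guard turns this into a command anyone can run with a region id, without reading the rest of the code base.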

Related questions:

Integration testing : testing service to service

¹ You might be interested in @DocBrown's answer regarding this topic.

score 3 · Answer 2 · answered Apr 03 '17 at 11:25

I have seen unit checks which check the generated query string matches an expected value.

However, this was, in my opinion, of limited use. The query syntax was complicated and possibly buggy, so (A) there were endless possibilities to check and (B) even if the string was 'correctly' generated, unexpected results could be returned in the live environment.
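To make the limitation concrete, here is a minimal Python sketch of that kind of check, using the made-up query from the question. `build_query` is a hypothetical helper that holds only the string construction.

```python
def build_query(region_id):
    """Build the question's made-up query for a given region id."""
    return (
        f"within region({region_id}) as r\n"
        "find objects matching hydration>0 and temp_range has 75\n"
        "send name, id, relative(position, r)"
    )


def test_query_matches_exactly():
    # Brittle: any refinement of the query, even one that returns
    # the same data, forces this expected string to be updated too.
    expected = (
        "within region(42) as r\n"
        "find objects matching hydration>0 and temp_range has 75\n"
        "send name, id, relative(position, r)"
    )
    assert build_query(42) == expected


def test_id_reaches_query():
    # Loose: survives refinements, but proves only that the id is
    # interpolated, not that the query returns the right data.
    assert "region(42)" in build_query(42)
```

Neither test tells you whether the API will return the right objects, which is why these checks add little on their own.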

I think you are right to go for your option 2: run integration tests against the live instance.

As long as they are non-destructive, these are the first tests you should write, as they will catch, although not identify the cause of, any error.

Going for option 3, 'deploy a test instance with dummy data', is superior, but it doesn't affect how you write the tests: you can point the same tests at the test server if and when deploying one becomes a good use of time.
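One way to keep the tests unchanged when the target moves is to read the endpoint from configuration. A minimal sketch, assuming a hypothetical environment variable `ASTRO_API_URL` and placeholder URLs:

```python
import os

# Placeholder for the live endpoint; not a real address.
DEFAULT_API_URL = "https://live.example.invalid/query"


def api_url(environ=None):
    """Return the endpoint the integration tests should talk to.

    Falls back to the live instance unless ASTRO_API_URL (an assumed
    variable name) points at a test server.
    """
    environ = os.environ if environ is None else environ
    return environ.get("ASTRO_API_URL", DEFAULT_API_URL)
```

The integration tests would build their client from `api_url()`, so switching to a test instance later is a matter of setting one variable in the pipeline.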

score 0 · Answer 3 · answered Apr 03 '17 at 13:03

It depends on the API, but if possible, go for option #3 (private testing instance).

Stubbing the API (option #1) is the worst option, because of the reasons you mentioned, and going this route will probably do more harm than good (lots of wasted time).

Running against the real API (option #2) makes the tests flaky and unreliable, and after a few false positives people will just stop using them. Not only can the data change, but the service might be down as well. In my opinion, this is akin to having no tests for the queries, and relying on integration/system tests to find the issues. That said, if the API data rarely changes and the API itself is almost always up, then this might be a viable option. Most APIs don't fit this description.

Eventually it comes down to how important and complex these queries are: if there are more than a handful, and some of them are complex enough that you feel the need to test them, I would invest the effort of setting up a private instance for testing. It will pay for itself just like other unit tests.
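With a private instance holding known data, the test can assert on the returned objects rather than on the query text. A minimal sketch, where `assert_returns_expected` is a hypothetical helper and the expected ids are whatever you seeded the instance with:

```python
def assert_returns_expected(retrieve, region_id, expected_ids):
    """Check that retrieve(region_id) yields exactly the expected object ids.

    Order is ignored, so a query refinement that returns the same
    objects passes without any change to the test.
    """
    actual_ids = {obj["id"] for obj in retrieve(region_id)}
    missing = set(expected_ids) - actual_ids
    unexpected = actual_ids - set(expected_ids)
    assert not missing and not unexpected, (
        f"missing ids: {sorted(missing)}, unexpected ids: {sorted(unexpected)}"
    )
```

Here `retrieve` would be the function under test, wired to the private instance.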
