11

From my experience, sql code changes almost always tend to be NOT incremental: someone creates a new stored procedure, or modifies an entire embedded sql query for optimization purposes, or creates a brand new table. When I receive one of these code review requests, I could not find any way except understanding the entire sql query. Often times these are really long nested queries, sometimes calling other procedures. Comprehending these changes and verifying them becomes a huge task. Then I am left with three options:

  1. approve without reviewing carefully;
  2. spend a significant amount of time to go line by line, understand the data model, run the query on some test system to see what it produces;
  3. ask the person to walk me through the changes.

I do not want to do 1 or 3. However, I also do not want to spend hours figuring out what the whole query is doing and whether the new version is equivalent but works faster.

Are there any technologies or tools or methodologies to ease this process? How do others perform such reviews without going through too much pain?

One fundamental difference from regular code changes seems like sql changes more frequently become (at least conceptually) entire rewrites which makes the problem more frequent to encounter - for me almost every sql review I do every single day.

Any recommendations are greatly appreciated.

Update:

You can reject a review or tell the author rewrite the change in more clear way, or add more comments or tests. However, what I am looking for is a way as a reviewer to actually take care of the review as it is. Most of the answers so far about improving the code change so that I can review more easily. That is obvious. On the other hand, the practical issue here is that I need to address the review without pushing it back to the author, without changing author behavior or restructure the organization.

I am also hoping for a sql specific recommendation, not some generic code review recommendation. I am hoping that may be there are tools that help the reviewer no matter how bad the code base is and how bad the code writing practices are.

Personally, I worked on many big and small companies as a developer close to 20 years. I maintain open source projects and I have a PhD in Computer Science. I would appreciate some help that says more than “change others and your world gets better” type of non pragmatic answer.

CEGRD
  • 235

4 Answers4

11

You may not like this, but:

This is not a problem to be solved easily by additional technologies or tools.

An SQL which contains "long nested queries, sometimes calling other procedures" and cannot be easily understood should at least have proper indentation and comments. Otherwise, it will become an unmaintainable mess. So if it is really that hard to understand the query line-by-line, reject the change and ask the author for making it more readable.

And asking authors to explain their code during a review is not the worst thing , quite the opposite. You could both together go through the code and add the missing comments together.

Of course, the technological solution to this would be to avoid writing SQL queries at all and switch to something like an ORM which generates most queries automatically. But unfortunately, this is not a realistic solution for many real-world systems, since it could mean to rewrite large parts of the system new from the ground (and often ORMs introduce their own class of new problems, sometimes they introduce more problems than they solve).

Doc Brown
  • 218,378
3

SQL queries are complex and brittle code that I experienced myself to be quite difficult to review. We do have a good portion of our codebase written as SQL templates (for good reasons I won't elaborate on), so I completely understand how the review process can become difficult (and yet, especially valuable), despite our continuous efforts to code quality.

What we did already have and helps a bit is to have code standards for SQL queries : normalized indentation levels, table naming conventions and so on.

But what did eased the process tremendously in our case, is to introduce unit testing of all modified/new SQL queries in our codebase alongside with code to be reviewed. This gives you a quick insight to what the query should be doing and confidence that it is correct in all cases. If the test suite doesn't give you this guarantee, the code should be rejected anyway.

Right now, I read SQL queries mostly to ensure conventions are followed and performance looks decent. Code correction and robustness is almost always better covered by an up-to-date test suite.

Diane M
  • 2,116
2

I guess my thoughts on reading your question are

  1. Are you defining what a code review is supposed to check?
  2. Why do you perceive a difference between incremental and non-incremental changes?
  3. omg stop putting business logic in sprocs

For me a code review checks a defined list of things, are there tests, is the CI pipeline configured correctly, is the style guide being followed, does the work actually do what was asked etc etc. Whatever, but its a list of checks that anyone can do on any code.

So, if your SQL change has tests, the tests pass, the test cover enough edge cases and the code does what the ticket asked for... Code Review passed!

In terms of understanding the impact of the change, sure you change the sproc you are calling but not the sproc code thats a big change that you can't understand from looking at the code.

But if even a small code change can break everything, a missing bracket or null check, an integer overflow etc Or a business logic change, which would code wise would seem correct either way but one way is required.

This is why you have tests. So you don't have to follow the code through and work out if it right. You run the test and it passes or doesnt and also the test explains what the code is supposed to do. The sproc updates the order, the select returns office around the world etc. You don't have to work out what the code is attempting because it's right there Assert.AreEqual(results, listOfOfficesinTheWorld)

My guess is that you have too much logic in sprocs which are maybe not even in source control and don't have tests, including performance tests for them.

I would suggest that you start to move logic out of the data layer, keep your sql simple, adding and removing data and have the manipulation logic in code with lots and lots of unit tests.

Update in response to changes:

I still think my answer applies. You want a :

tools or methodologies to ease this process (point 2)

Automating point 2 is equivalent to writing tests.

understand the data model,

Compare the before change and after change data models. If you have a test the commit to that test will show the incremental, or ground up change

run the query on some test system to see what it produces;

Run the sql, compare the result against expected. If you automate that there's your test right there and the commit showing any change in the expected result shows you the incremental or group up change. Naming and comments in the test explain the change EnsureCustomerIsAlwaysRight() SharesCanBeSoldInUnder100ms() etc

Expecting a tool to write the test for you is asking a bit much. If your code follows a pattern you might be able to template some basic stuff. But the expected result of your processes is going to be bespoke to your application. You are always going to have to fill in the hard bit manually.

Either fail the review on the basis that these tests aren't present, or as part of your current manual testing process, write the tests yourself. Then at least next time your job is easier.

This is the pragmatic answer. It's hard work and will take time, but it's simple to implement and understand, can be implemented incrementally, forces you to move towards separation of concerns and it's the normal solution to fix this common problem.

Ewan
  • 83,178
1

Gotta be honest; I'm not a fan of these kinds of systems. They are very difficult to maintain, for several reasons which you have already articulated. Your review process is not the problem; your architecture is the problem.

As I see it, your choices are twofold:

  1. Begin having architectural reviews on a regular basis so that everyone stays familiar with your tables and queries, and provide adequate documentation and training that explains how it all works, or

  2. Shift your simple (CRUD) queries to the front end or middleware application and do away with the stored procedures, reserving those mostly for batch and server operations.

Robert Harvey
  • 200,592