2

In the article 45 Ways To Speed Up Firebird Database, it states that

  1. Use derived tables to optimize SELECT with ORDER BY/GROUP BY

Another way to optimize SQL query with sorting is to use derived tables to avoid unnecessary sort operations. Instead of

SELECT FIELD_KEY, FIELD1, FIELD2, ... FIELD_N
FROM T
ORDER BY FIELD2

use the following modification:

SELECT T.FIELD_KEY, T.FIELD1, T.FIELD2, ... T.FIELD_N FROM
    (SELECT FIELD_KEY FROM T ORDER BY FIELD2) T2
JOIN T ON T.FIELD_KEY = T2.FIELD_KEY

I assume T.FIELD_KEY means a primary key column.

In other words, it shunts the sorting to a subquery, and then 'self-joins' to get the rest of the necessary columns.

As I understand it, this will let Firebird do the sorting on a narrower resultset which would use less memory and be more cache-friendly, but surely it shouldn't give that much a boost? Especially when it needs to fetch the other columns afterwards, potentially in a random I/O pattern too.

On the other hand, this reeks of undefined behaviour in that there's no guarantee (in standard SQL) that the engine would retain the order after the join -- it could just join the other way, wouldn't it?

Are my guesses correct? Am I missing something?

oals
  • 121
  • 2

1 Answers1

1

This is very much undefined behaviour, and exploits how the optimizer in Firebird works (or possibly, worked). Personally, I wouldn't rely on such hacks, without having tests in place that verify this when migrating to newer Firebird versions (and even then, it's still risky, as the optimizer might make different choices depending on statistics).

It relies on the fact that - at least usually - the optimizer picks the first (left-hand) table from an inner join as the driving table, and then retrieves rows from the second (right-hand) table ("nested loop join" algorithm). However, with new join algorithms ("merge join" and "hash join") introduced in recent Firebird versions, I'm not 100% sure it will still always do this.

To be honest, I'm not entirely sure why it would be a lot faster than the original query though. Maybe it has to do with the amount of memory needed for the entire result set before sorting, which probably spills over into a temporary file (especially with the very conservative default settings), while for the second query, the derived table can probably be held in memory.

Additionally, for the first query, if the in-memory image of the row is very big (larger than InlineSortThreshold setting, which defaults to 1000 bytes), then Firebird will throw away all columns not in the ORDER BY, and refetch them later. I'm not sure if this happens immediately after sorting or is delayed until fetch, but if immediately after sorting, that could also explain the additional execution time.

The fetching of the joined data in the second query would - probably - be delayed until actual fetch (which comes after the execute completes), so you don't see the cost of that random IO in the query execution itself.

Mark Rotteveel
  • 1,236
  • 7
  • 17