4

Consider this as example:

SELECT TOP 1 FROM customers
WHERE firstname LIKE 'John%'
   OR id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'john.doe@%' )

Does query optimizer manage to prioritize the first clause, avoiding to do the subquery if the simpler clause is true for at least 1 record (from using Top 1)?

I am mainly asking this because the above query can be rewritten as Join, which technically would be faster according to Join vs. sub-query. However if the subquery is avoided, the above example is clearly superior to a Join one in situations where the first clause is likely to be true.

2 Answers2

7

Does query optimizer manage to prioritize the first clause, avoiding to do the subquery if the simpler clause is true for at least 1 record.

It can do

create table customers(id int identity primary key, firstname varchar(10)) ;
insert into customers values ('John'),  ('Johnathan'), ('George');
create table customer_emails(id_customer int, email varchar(100));
insert into customer_emails values (1, 'John@example.com'),  (2, 'Johnathan@example.com');

In the plan I got the first row matched the LIKE 'John%' predicate and the scan against customer_emails was not executed at all.

enter image description here

However your question is phrased as

if the simpler clause is true for at least 1 record

That would imply that the simpler WHERE clause is evaluated in its entirety and only if that fails is the second one evaluated.

For

SELECT TOP 1 *
FROM customers
WHERE firstname LIKE 'George%'
   OR id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'George.doe@%' );

Three rows were processed before one was found matching LIKE 'George%'and there were two ensuing scans on customer_emails

Rewriting as follows...

create index ix on customers(firstname) include(id)

SELECT TOP 1 *
FROM customers with(forceseek)
WHERE firstname LIKE 'George%'
   OR  firstname NOT LIKE 'George%' AND  id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'George.doe@%' );

... happens to give a plan where the operators in the shaded area for the IN part don't get executed if the simpler predicate returns a row but this plan isn't guaranteed.

enter image description here

Martin Smith
  • 87,941
  • 15
  • 255
  • 354
1

You said join and I don't think this can be done with an inner join

Top 1 is non deterministic without an order by

Typically you want to let the query optimizer do its job
It will use statistics and other logic to build an efficient plan

I think this is how I would write it
The way to test is to look at the query plans

SELECT distinct c.* 
FROM customers c 
LEFT JOIN customer_emails ce 
  on ce.id_customer = c.ID
 and ce.email like 'john.doe@%'
WHERE c.firstname LIKE 'John%'
   OR ce.ID is not null 

Or

SELECT c.* 
FROM customers c 
WHERE c.firstname LIKE 'John%'
   OR exist ( select 1 
              from customer_emails ce 
                on ce.id_customer = c.ID
               and ce.email like 'john.doe@%' )
paparazzo
  • 5,048
  • 1
  • 19
  • 32