Query performance

Question

Why

SELECT Barraportfolioname
FROM   portfolio
WHERE  id IN (SELECT DISTINCT i1.portfolioid
              FROM   Import i1
                     LEFT OUTER JOIN Import i2
                       ON i1.PortfolioID = i2.PortfolioID
                          AND i2.ImportSetID = 82
              WHERE  i1.ImportSetID = 83
                     AND i2.ID IS NULL)

is taking 0 seconds whereas following query is taking 5 seconds.

SELECT DISTINCT p.BarraPortfolioName AS name
FROM   Import i1
       INNER JOIN Portfolio p
         ON p.ID = i1.PortfolioID
       LEFT OUTER JOIN Import i2
         ON i1.PortfolioID = i2.PortfolioID
            AND i2.ImportSetID = 82
WHERE  i1.ImportSetID = 83
       AND i2.ID IS NULL;

I'm using SQL Server. Both the tables have indexes on all the columns which are being used in query i.e. portfolioid, id,importsetid.

Edit by gbn, based on OP comment

They said:

This works much better than previous two queries.

select
   BarraPortfolioName
from 
   (
   select distinct p.BarraPortfolioName,portfolioid
   from
        import i1
        inner join 
        portfolio p on p.id=i1.portfolioid
   where 
        importsetid = ?
   ) as p1
   left outer join
   (
   select distinct portfolioid 
   from import 
   where importsetid = ?
   ) as p2 on p1.portfolioid = p2.portfolioid
where 
   p2.portfolioid is null

score 9 · Answer 1 · edited May 23 '17 at 12:40

This assumes that both give the same results

The 1st is a "semi-join" because of the IN (subquery) (the DISTINCT isn't needed)
This means the subquery can "short circuit"
The 2nd is an outer join then restriction, followed by a DISTINCT aggregate.
This is 3 main discrete operations

This "short circuit" is the main reason for the difference even with the outer join in the sub query.

For simpler queries, the 2nd query would be optimised to the same plan as the 1st because it is semantically the same. More likely the later the version etc

See this for more (same logic, just reversed): The use of NOT logic in relation to indexes

And this about "IN vs. JOIN vs. EXISTS" from SO user Quassnoi on his site

And a similar SO example: https://stackoverflow.com/a/7221395/27535

score 4 · Answer 2 · answered Dec 12 '11 at 04:13

Have a try with:

SELECT
   (select p.BarraPortfolioName from Portfolio p where p.ID = i1.PortfolioID) AS name  
FROM Import i1  
WHERE i1.ImportSetID = 83 
group by i1.PortfolioID  
HAVING NOT EXISTS     
    (SELECT * FROM Import i2 
    WHERE i1.PortfolioID = i2.PortfolioID 
    AND i2.ImportSetID = 82);

Because what you're wanting to do is find the distinct i1.PortfolioIDs, filter that list for ones that weren't in importset 82, and show the name for those ids.

score 3 · Answer 3 · answered Dec 08 '11 at 11:47

If you have no unique index on BarraPortfolioName it needs to check through them all for duplicates, which is unnecessary in the first query because your IN clause handles that for you.

Try creating a unique index (or constraint) and seeing if this does the trick. Or else change the second script by dropping the distinct and putting GROUP BY p.id, p.BarraPortfolioName at the end. This should let it short-circuit the distinctification process.

Query performance

3 Answers3