2

so the simplification in the algebrizer is supposed to convert some of the subqueries to inner or outer joins. for example, i tried this

DBCC TRACEON(3604)
USE ContosoRetailDW
BEGIN TRAN 
SELECT ProductKey
FROM DimProduct 
WHERE ProductKey IN (SELECT ProductKey 
                     FROM FactSales
                     WHERE SALESKEY <2) 
OPTION(RECOMPILE,QUERYTRACEON 8605)
COMMIT TRAN 

the output was

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimProduct].ProductKey
    LogOp_Select

        LogOp_Get TBL: DimProduct DimProduct TableID=117575457 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

        ScaOp_SomeComp 2

            ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimProduct].ProductKey

            LogOp_Project

                LogOp_Select

                    LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                    ScaOp_Comp x_cmpLt

                        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

                AncOp_PrjList 

    AncOp_PrjList 


if I re-write it like

BEGIN TRAN;
SELECT DISTINCT dp.ProductKey
FROM DimProduct dp
JOIN FactSales fs ON dp.ProductKey = fs.ProductKey
WHERE fs.SalesKey < 2
OPTION (RECOMPILE, QUERYTRACEON 8605);
COMMIT TRAN;

I get

*** Converted Tree: ***
    LogOp_GbAgg OUT(QCOL: [dp].ProductKey,) BY(QCOL: [dp].ProductKey,)
    LogOp_Project

        LogOp_Project QCOL: [dp].ProductKey

            LogOp_Select

                **LogOp_Join**

                    LogOp_Get TBL: DimProduct(alias TBL: dp) DimProduct TableID=117575457 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                    LogOp_Get TBL: FactSales(alias TBL: fs) FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                    ScaOp_Comp x_cmpEq

                        ScaOp_Identifier QCOL: [dp].ProductKey

                        ScaOp_Identifier QCOL: [fs].ProductKey

                ScaOp_Comp x_cmpLt

                    ScaOp_Identifier QCOL: [fs].SalesKey

                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

            AncOp_PrjList 

        AncOp_PrjList 

    AncOp_PrjList 


So, the question is: How can I get the LogOp_Join in the first query?

I tried creating a similar table with 50 million rows still did not happen. What are the conditions that make that happen?

Paul White
  • 94,921
  • 30
  • 437
  • 687
Suleyman Essa
  • 167
  • 1
  • 8

1 Answers1

8

Your first query contains a subquery, which is unnested (RemoveSubqInSel) to an apply semi join, then transformed (ApplyHandler) to a semi join.

*** Simplified Tree: ***
LogOp_LeftSemiJoin
    LogOp_Get TBL: DimProduct DimProduct
    LogOp_Select
        LogOp_Get TBL: FactSales FactSales
        ScaOp_Comp x_cmpLt
            ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
    ScaOp_Comp x_cmpEq
        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimProduct].ProductKey
        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].ProductKey

A later cost-based transformation considers converting that to a join (via introducing a grouping/distincting operation).

The second query is quite different as it starts with a logical join.

Simplification does not occur in the algebrizer. There are many stages of early transformation and, optionally, cost-based exploration.

The output you showed was just at an early stage of a complex process.

Paul White
  • 94,921
  • 30
  • 437
  • 687