11

I'm writing on an upcoming blog post of mine on ranking and aggregate window functions, specifically the Segment and Sequence Project iterators. The way I understand it is that Segment identifies rows in a stream that constitute the end/beginning of a group, so the following query:

SELECT ROW_NUMBER() OVER (PARTITION BY someGroup ORDER BY someOrder)

Will use Segment to tell when a row belongs to a different group other than the previous row. The Sequence Project iterator then does the actual row number calculation, based on the output of the Segment iterator's output.

But the following query, using that logic, shouldn't have to include a Segment, because there's no partition expression.

SELECT ROW_NUMBER() OVER (ORDER BY someGroup, someOrder)

However, When I try this hypothesis both these queries use a Segment operator. The only difference is that the second query does not need a GroupBy on the Segment. Doesn't that eliminate the need for a Segment in the first place?

Example

CREATE TABLE dbo.someTable (
    someGroup   int NOT NULL,
    someOrder   int NOT NULL,
    someValue   numeric(8, 2) NOT NULL,
    PRIMARY KEY CLUSTERED (someGroup, someOrder)
);

--- Query 1:
SELECT ROW_NUMBER() OVER (PARTITION BY someGroup ORDER BY someOrder)
FROM dbo.someTable;

--- Query 2:
SELECT ROW_NUMBER() OVER (ORDER BY someGroup, someOrder)
FROM dbo.someTable;
James Rhoat
  • 1,577
  • 4
  • 12
  • 32
Daniel Hutmacher
  • 9,173
  • 1
  • 27
  • 52

2 Answers2

12

I found this 6 year old blog post mentioning the same behavior.

It looks like ROW_NUMBER() always includes a segment operator, whether PARTITION BY is used or not. If I had to guess I would say this is because it makes creating a query plan easier on the engine.

If the segment is needed in most cases, and in the cases where it's not needed it's essentially a zero-cost non-operation, it's a lot simpler to just always include it in the plan when a windowing function is used.

Glorfindel
  • 2,205
  • 5
  • 19
  • 26
JNK
  • 18,064
  • 6
  • 63
  • 98
11

According to the showplan.xsd for the execution plan, GroupBy appears without minOccurs or maxOccurs attributes which therefore default to [1..1] making the element compulsory, not necessarily content. The child element ColumnReference of type (ColumnReferenceType) has minOccurs 0 and maxOccurs unbounded [0..*], making it optional, hence the allowed empty element. If you manually attempt to remove the GroupBy and force the plan you get the expected error:

Msg 6965, Level 16, State 1, Line 29
XML Validation: Invalid content. Expected element(s): '{http://schemas.microsoft.com/sqlserver/2004/07/showplan}GroupBy','{http://schemas.microsoft.com/sqlserver/2004/07/showplan}DefinedValues','{http://schemas.microsoft.com/sqlserver/2004/07/showplan}InternalInfo'. Found: element '{http://schemas.microsoft.com/sqlserver/2004/07/showplan}SegmentColumn' instead. Location: /*:ShowPlanXML[1]/*:BatchSequence[1]/*:Batch[1]/*:Statements[1]/*:StmtSimple[1]/*:QueryPlan[1]/*:RelOp[1]/*:SequenceProject[1]/*:RelOp[1]/*:Segment[1]/*:SegmentColumn[1].

Interestingly I found you can manually remove the Segment operator to get a valid plan for forcing which looks like this:

enter image description here

However when you run with that plan (using OPTION ( USE PLAN ... ) ) the Segment Operator magically reappears. Just goes to show the optimizer only takes the XML plans as a rough guide.

My test rig:

USE tempdb
GO
SET NOCOUNT ON
GO
IF OBJECT_ID('dbo.someTable') IS NOT NULL DROP TABLE dbo.someTable
GO
CREATE TABLE dbo.someTable (
    someGroup   int NOT NULL,
    someOrder   int NOT NULL,
    someValue   numeric(8, 2) NOT NULL,
    PRIMARY KEY CLUSTERED (someGroup, someOrder)
);
GO

-- Generate some dummy data ;WITH cte AS ( SELECT TOP 1000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn FROM master.sys.columns c1 CROSS JOIN master.sys.columns c2 CROSS JOIN master.sys.columns c3 ) INSERT INTO dbo.someTable ( someGroup, someOrder, someValue ) SELECT rn % 333, rn % 444, rn % 55 FROM cte GO

-- Try and force the plan SELECT ROW_NUMBER() OVER (ORDER BY someGroup, someOrder) FROM dbo.someTable OPTION ( USE PLAN N'<?xml version="1.0" encoding="utf-16"?> <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.2000.8" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementCompId="1" StatementEstRows="1000" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.00596348" StatementText="SELECT ROW_NUMBER() OVER (ORDER BY someGroup, someOrder)&#xD;&#xA;FROM dbo.someTable" StatementType="SELECT" QueryHash="0x193176312402B8E7" QueryPlanHash="0x77F1D72C455025A4" RetrievedFromCache="true"> <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" /> <QueryPlan DegreeOfParallelism="1" CachedPlanSize="16" CompileTime="0" CompileCPU="0" CompileMemory="88"> <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="131072" EstimatedPagesCached="65536" EstimatedAvailableDegreeOfParallelism="4" /> <RelOp AvgRowSize="15" EstimateCPU="8E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1000" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Sequence Project" EstimatedTotalSubtreeCost="0.00596348"> <OutputList> <ColumnReference Column="Expr1002" /> </OutputList> <SequenceProject> <DefinedValues> <DefinedValue> <ColumnReference Column="Expr1002" /> <ScalarOperator ScalarString="row_number"> <Sequence FunctionName="row_number" /> </ScalarOperator> </DefinedValue> </DefinedValues>

            &lt;!-- Segment operator completely removed from plan --&gt;
            &lt;!--&lt;RelOp AvgRowSize=&quot;15&quot; EstimateCPU=&quot;2E-05&quot; EstimateIO=&quot;0&quot; EstimateRebinds=&quot;0&quot; EstimateRewinds=&quot;0&quot; EstimatedExecutionMode=&quot;Row&quot; EstimateRows=&quot;1000&quot; LogicalOp=&quot;Segment&quot; NodeId=&quot;1&quot; Parallel=&quot;false&quot; PhysicalOp=&quot;Segment&quot; EstimatedTotalSubtreeCost=&quot;0.00588348&quot;&gt;
              &lt;OutputList&gt;
                &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someGroup&quot; /&gt;
                &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someOrder&quot; /&gt;
                &lt;ColumnReference Column=&quot;Segment1003&quot; /&gt;
              &lt;/OutputList&gt;
              &lt;Segment&gt;
                &lt;GroupBy /&gt;
                &lt;SegmentColumn&gt;
                  &lt;ColumnReference Column=&quot;Segment1003&quot; /&gt;
                &lt;/SegmentColumn&gt;--&gt;


                &lt;RelOp AvgRowSize=&quot;15&quot; EstimateCPU=&quot;0.001257&quot; EstimateIO=&quot;0.00460648&quot; EstimateRebinds=&quot;0&quot; EstimateRewinds=&quot;0&quot; EstimatedExecutionMode=&quot;Row&quot; EstimateRows=&quot;1000&quot; LogicalOp=&quot;Clustered Index Scan&quot; NodeId=&quot;0&quot; Parallel=&quot;false&quot; PhysicalOp=&quot;Clustered Index Scan&quot; EstimatedTotalSubtreeCost=&quot;0.00586348&quot; TableCardinality=&quot;1000&quot;&gt;
                  &lt;OutputList&gt;
                    &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someGroup&quot; /&gt;
                    &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someOrder&quot; /&gt;
                  &lt;/OutputList&gt;
                  &lt;IndexScan Ordered=&quot;true&quot; ScanDirection=&quot;FORWARD&quot; ForcedIndex=&quot;false&quot; ForceSeek=&quot;false&quot; ForceScan=&quot;false&quot; NoExpandHint=&quot;false&quot; Storage=&quot;RowStore&quot;&gt;
                    &lt;DefinedValues&gt;
                      &lt;DefinedValue&gt;
                        &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someGroup&quot; /&gt;
                      &lt;/DefinedValue&gt;
                      &lt;DefinedValue&gt;
                        &lt;ColumnReference Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Column=&quot;someOrder&quot; /&gt;
                      &lt;/DefinedValue&gt;
                    &lt;/DefinedValues&gt;
                    &lt;Object Database=&quot;[tempdb]&quot; Schema=&quot;[dbo]&quot; Table=&quot;[someTable]&quot; Index=&quot;[PK__someTabl__7CD03C8950FF62C1]&quot; IndexKind=&quot;Clustered&quot; Storage=&quot;RowStore&quot; /&gt;
                  &lt;/IndexScan&gt;
                &lt;/RelOp&gt;

            &lt;!--&lt;/Segment&gt;
            &lt;/RelOp&gt;--&gt;
          &lt;/SequenceProject&gt;
        &lt;/RelOp&gt;

      &lt;/QueryPlan&gt;
    &lt;/StmtSimple&gt;
  &lt;/Statements&gt;
&lt;/Batch&gt;

</BatchSequence> </ShowPlanXML>' )

Chop out the XML plan from the test rig and save it as a .sqlplan to view the plan minus the Segment.

PS I wouldn't spend too much time chopping around SQL plans manually as if you know me you would know I regard it as time-eating busy work and something I would never do. Oh hang on!? : )

Glorfindel
  • 2,205
  • 5
  • 19
  • 26
wBob
  • 10,420
  • 2
  • 25
  • 44