12

(It is more a question of documentation rather than behavior. It has been migrated from Stack overflow as was suggested there.)

While researching the answer to another question that required extracting XML nodes in the original element order, I came across several answers (here, here, and here) that used expressions of the form ROW_NUMBER() OVER (ORDER BY xml.node), and asserted or implied that the assigned row number values would be assigned in XML document order.

However, I cannot find anywhere that defines the behavior of ORDER BY xml.node. Although it seems to be allowed in the OVER() clause, the documentation does not specifically mention XML nodes.

For example, given:

DECLARE @xml XML = '<root>
<node>One</node>
<node>Two</node>
<node>Three</node>
<node>Four</node>
</root>'

SELECT ROW_NUMBER() OVER(ORDER BY xml.node) AS rn, xml.node.value('./text()[1]', 'varchar(255)') AS value FROM @xml.nodes('*/node') xml(node) ORDER BY ROW_NUMBER() OVER(ORDER BY xml.node)

The following results are returned:

rn | value
----------
1  | One
2  | Two
3  | Three
4  | Four

Question: Are these results guaranteed anywhere in the documentation? Is this accepted as guaranteed, but undocumented behavior? Or is this another case like ORDER BY (SELECT NULL) that appears to work anecdotally for small seemingly pre-ordered source data sets, but may eventually fail when scaled up? The reason I am asking is that I would prefer not to recommend use of a technique whose behavior and reliability is not supported by the documentation.

Interestingly, although an XML node is allowed in a windowed ORDER BY, it is not allowed in an ordinary SELECT ... ORDER BY. When used in an ordinary select order-by clause, ORDER BY xml.node yields the following error:

Msg 493 Level 16 State 1 Line 7

The column 'node' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.

The above error message does not list the window-function OVER(ORDER BY ...) as an allowed use.

See this db<>fiddle.

T N
  • 231
  • 1
  • 7

1 Answers1

12

No, it's not documented and therefore not guaranteed.

In practice, it probably is guaranteed as I described in a Stack Overflow answer (reproduced below) but that doesn't meet your stated desire for official documentation.

You could ask them to document it by opening a documentation issue.

I would personally use it because the evidence is acceptable to me, but I'm not going to indemnify you against your losses if you do the same.


My Stack Overflow answer

There's no way to see it explicitly in an execution plan, but the id column returned by the nodes() method is a varbinary(900) OrdPath, which does encapsulate the original xml document order.

The solution offered by Mikael Eriksson on the related question Does the nodes() method keep the document order? relies on the OrdPath to provide an ORDER BY clause necessary to determine how identity values are assigned for the INSERT.

A slightly more compact usage follows:

CREATE TABLE #T 
(
    ID integer IDENTITY, 
    Fruit nvarchar(10) NOT NULL
);

DECLARE @xml xml = N' <Fruits> <Apple /> <Banana /> <Orange /> <Pear /> </Fruits> ';

INSERT #T (Fruit) SELECT N.n.value('local-name(.)', 'nvarchar(10)') FROM @xml.nodes('/Fruits/*') AS N (n) ORDER BY ROW_NUMBER() OVER (ORDER BY N.n);

SELECT T.ID, T.Fruit FROM #T AS T ORDER BY T.ID;

db<>fiddle

Using the OrdPath this way is presently undocumented, but the technique is sound in principle:

  1. The OrdPath reflects document order.
  2. The ROW_NUMBER computes sequence values ordered by OrdPath*.
  3. The ORDER BY clause uses the row number sequence.
  4. Identity values are assigned to rows as per the ORDER BY.

To be clear, this holds even if parallelism is employed. As Mikael says, the dubious aspect is using id in the ROW_NUMBER since id is not documented to be the OrdPath.


* The ordering is not shown in plans, but optimizer output using TF 8607 contains:

ScaOp_SeqFunc row_number order[CALC:QCOL: XML Reader with XPath filter.id ASC]
Paul White
  • 94,921
  • 30
  • 437
  • 687