10

Consider the following simple XML:

<xml>
  <customer name="Max">
    <email address="me@you.com" />
  </customer>
  <customer name="Erik">
    <email address="erik@your-mom.com" />
  </customer>
  <customer name="Brent">
    <email address="brentcom" />
  </customer>
</xml>

I want to get a list of <Customer> sequences where the address attribute of the <email> item does not contain an @.

So, I want output that looks like:

<customer name="Brent">
  <email address="brentcom" />
</customer>

mcve:

DECLARE @x XML = '<xml>
<customer name="Max"><email address="me@you.com" /></customer>
<customer name="Erik"><email address="erik@your-mom.com" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';

This query:

SELECT WithValidEmail = @x.query('/xml/customer/email[contains(@address, "@")]')
    , WithInvalidEmail = @x.query('/xml/customer/email[contains(@address, "@")] = False');

Returns:

╔═══════════════════════════════════════╦══════════════════╗
║            WithValidEmail             ║ WithInvalidEmail ║
╠═══════════════════════════════════════╬══════════════════╣
║ <email address="me@you.com" />        ║                  ║
║ <email address="erik@your-mom.com" /> ║ false            ║
╚═══════════════════════════════════════╩══════════════════╝

This query:

SELECT WithInValidEmail = @x.query('/xml/customer/email')
WHERE @x.exist('/xml/customer/email[contains(@address, "@")]') = 0;

Returns:

╔══════════════════╗
║ WithInValidEmail ║
╚══════════════════╝
    (no results)

The WHERE clause in the query above is eliminating the entire set of XML because at least a single sequence exists where the email address contains an "@" sign.

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323

2 Answers2

11

An easy way to do this is to use the nodes method to get right to the address attribute and check for your @ sign.

The problem with the way you're looking now is that it's only checking that any email address has an @ in it. Parsing the XML nodes out lets you check individual emails for it.

DECLARE @x XML
    = '<xml>
<customer name="Max"><email address="me@you.com" /></customer>
<customer name="Erik"><email address="erik@your-mom.com" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';


SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM   @x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

If you need to query an actual table with an XML column like this, you'd just CROSS APPLY the nodes method like thusly:

SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

If you want to bring all the <customer>...</customer> XML for that "row" back, you can walk the axis back. Just be aware that walking back can make performance a bit woogy for large XML blocks.

SELECT x.c.query('..')
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

Another way of doing it is:

SELECT @x.query('/xml/customer[email/@address[not(contains(., "@"))]]') answer

Moving the square brackets to wrap around the email node effectively make that the WHERE clause applied to the customer node. Translating this XQuery to English looks like:

Get me all xml/customer nodes with an email node which has an address attribute which does not contain the @ symbol

wBob
  • 10,420
  • 2
  • 25
  • 44
Erik Reasonable Rates Darling
  • 45,549
  • 14
  • 145
  • 532
4

You were oh so close. You were definitely on the right track with using the .query() function and using the contains XQuery function. What you got wrong was:

  1. Putting the = False outside of the [...] (meaning, it was not part of the contains() expression)
  2. Using the word False instead of the function false()
  3. Not specifying the parent node by adding /.. to the end of the path (so that the result will include the <customer> element and not just the <email> element)

Correcting those three things results in the following XQuery expression that gets you what you are wanting:

'/xml/customer/email[contains(@address, "@") = false()]/..'

Putting that into your original example from the question gives you:

DECLARE @x XML = '<xml>
<customer name="Max"><email address="me@you.com" /></customer>
<customer name="Erik"><email address="erik@your-mom.com" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';

SELECT
@x.query('/xml/customer/email[contains(@address, "@")]/..') AS [WithValidEmail],
@x.query('/xml/customer/email[contains(@address, "@")=false()]/..') AS [WithInvalidEmail;

That query returns the following result set of a single row with two XML fields:

WithValidEmail                            |     WithInvalidEmail
<customer name="Max">                     |     <customer name="Brent">
  <email address="me@you.com" />          |       <email address="brentcom" />
</customer>                               |     </customer>
<customer name="Erik">                    |
  <email address="erik@your-mom.com" />   |
</customer>                               |

This is probably more efficient than breaking the document out with the .nodes() function since it can parse the XML in a single shot and not need to start and stop the parser per each node.

The other benefit of keeping it within .query() is that you get a single XML document returned. So, if you receive an XML document / value containing multiple nodes worth of stuff, you can maintain the scalar value approach of it being a single entity without having to reconstruct the resulting nodes back into a document again. This also lets you use it in a subquery / CTE without changing the number of expected rows being returned.

Paul White
  • 94,921
  • 30
  • 437
  • 687
Solomon Rutzky
  • 70,048
  • 8
  • 160
  • 306