Finding repeating sets of data by foreign key

Question

I have a table, ResultItem:

CREATE TABLE ResultItem (
    IDResultItem INT PRIMARY KEY IDENTITY(1,1),
    ItemValue VARCHAR(20) NOT NULL,
    ResultID INT FOREIGN KEY REFERENCES Result(IDResult)
)

I need to find all the most recent records in Result that have the same group of values in ResultItem in a previous Result, and concatenate the ItemValues in another column; for example:

INSERT INTO ResultItem 
    (ItemValue, ResultID)
VALUES
    ('a', 1),
    ('b', 1),
    ('a', 2),
    ('b', 2),
    ('c', 3),
    ('d', 3),
    ('c', 4),
    ('d', 4),
    ('a', 5),
    ('d', 5),
    ('a', 6),
    ('a', 7) ;

the query should return:

| 2 | 'ab' |
| 4 | 'cd' |
| 7 | 'a'  |

as 1 and 2 are the same, 3 and 4 are the same, and 6 and 7 are the same. 5 has not yet been repeated so I do not want it. How would I go about doing this?

score 2 · Accepted Answer · edited Apr 13 '17 at 12:42

This is a hard one. I'll add a solution that will only work in vNext version, that has the new STRING_AGG() function for string concatenation:

WITH g AS
  ( SELECT 
        ResultID, 
        Items = STRING_AGG (ItemValue, ',') 
                    WITHIN GROUP (ORDER BY ItemValue)
    FROM dbo.ResultItem 
    GROUP BY ResultID
  )
SELECT 
    ResultId = MAX(ResultId),
    Items
FROM g
GROUP BY Items
HAVING COUNT(*) > 1 ;

The separator (',') should be a character that doesn't appear anywhere in ItemValue column.

For current and older versions, I've adapted the FOR XML PATH method, found in many articles (see Aaron Bertrand's blog post Grouped Concatenation in SQL Server which also has many other methods to do string concatenation):

WITH g AS
  ( SELECT 
        ResultID = r.IDResult, 
        Items = STUFF( 
                   ( SELECT N',' + ri.ItemValue 
                     FROM ResultItem AS ri
                     WHERE ri.ResultID = r.IDResult 
                     ORDER BY ri.ItemValue
                     FOR XML PATH(N'')
                   ), 1, 1, N'')
    FROM Result AS r
  ) 
SELECT 
    ResultId = MAX(ResultId),
    Items
FROM g
GROUP BY Items
HAVING COUNT(*) > 1 ;

Tested in rextester.com

See also the answer by Martin Smith from a similar problem: How to find parent rows that have identical sets of child rows?, where he adds an temporary table (appropriately indexed) to improve efficiency of this type of query.

Finding repeating sets of data by foreign key

1 Answers1