8

I need to calculate sequence numbers for consecutive values. That sounds like a job for ROW_NUMBER()!

DECLARE @Data TABLE
    (
    Sequence    TINYINT NOT NULL PRIMARY KEY,
    Subset  CHAR(1) NOT NULL
    )
INSERT INTO @Data (Sequence, Subset) VALUES
    (1, 'A'),
    (2, 'A'),
    (3, 'A'),
    (4, 'B'),  -- New subset
    (5, 'B'),
    (6, 'A')   -- New subset

SELECT
    Sequence, Subset,
    ROW_NUMBER() OVER (PARTITION BY Subset ORDER BY Sequence) AS SeqWithinGroup
FROM
    @Data

I expected the PARTITION clause to make the count reset at each change in Subset, but instead SQL Server collects all values for a given Subset value and numbers them. Here's what I expected, and what I got:

Sequence Subset Expected Actual
-------- ------ -------- -----
1        A      1        1
2        A      2        2
3        A      3        3
4        B      1        1
5        B      2        2
6        A      *1*      *4*

When SQL reaches line #6, it resumes numbering subset "A", whereas I see it as the first line of a new subset that just happens to also be named "A".

Is there a way to make ROW_NUMBER() partition strictly, rather than the default behavior?

There are a number of questions here and elsewhere about counting consecutive values with SQL. However, I have not yet seen one that addresses repeated values in the PARTITION BY field(s). Most deal only with increasing values, often dates.

ypercubeᵀᴹ
  • 99,450
  • 13
  • 217
  • 306
Jon of All Trades
  • 5,987
  • 7
  • 48
  • 63

3 Answers3

4

What we do here is,

  1. Calculate resets (column rst in code)
  2. sum() to get groups (column grp in code)
  3. Get the row_number() from the grouping.

Code,

SELECT row_number() OVER (PARTITION BY grp ORDER BY sequence) AS number,
  sequence,
  subset
FROM (
  SELECT count(rst) OVER (ORDER BY sequence) AS grp, *
  FROM (
    SELECT CASE WHEN subset != lag(subset) OVER (ORDER BY sequence) THEN 1 END AS rst, *
    FROM foo
  ) AS t1
) AS t2;

You can see the DBFiddle with results here

Evan Carroll
  • 65,432
  • 50
  • 254
  • 507
3

I was able to address this using LAG():

SELECT
    Sequence, Subset,
    CASE WHEN Sequence = 1 OR Subset <> LAG(Subset, 1) OVER (ORDER BY Sequence)
        THEN 'New subset'
        ELSE 'Continuation'
        END
FROM
    @Data

This returns "New subset" for records #1, #4, and #6. Apparently LAG() partitions slightly differently from ROW_NUMBER().

Obviously this doesn't provide row numbers, but it helped me reach the goal of identifying consecutive sequences of numbers, when the subset identifier can repeat.

Jon of All Trades
  • 5,987
  • 7
  • 48
  • 63
1

I used my answer from this post and modified it for your problem.

--Demo setup

;DECLARE @Data TABLE
    (
    Sequence    TINYINT NOT NULL PRIMARY KEY,
    Subset  CHAR(1) NOT NULL
    )
;INSERT INTO @Data (Sequence, Subset) VALUES
    (1, 'A'),
    (2, 'A'),
    (3, 'A'),
    (4, 'B'),  -- New subset
    (5, 'B'),
    (6, 'A')   -- New subset

--The solution
--Create a grouping for consecutive values called grp
;WITH _cte
AS (
    SELECT *
        ,DATEADD(DAY, - ROW_NUMBER() OVER (
                PARTITION BY Subset ORDER BY [Sequence]
                ), [Sequence]) AS grp
    FROM @Data
    )
   ,AddedRn    --add a row number for each entry in the group
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            PARTITION BY grp ORDER BY sequence
            ) AS rn
    FROM _cte
    )
--Select result ordering by Sequence
SELECT Sequence, Subset, grp, rn as SeqWithinGroup    
FROM AddedRn
order by Sequence

| Sequence | Subset | grp                     | SeqWithinGroup |
|----------|--------|-------------------------|----------------|
| 1        | A      | 1900-01-01 00:00:00.000 | 1              |
| 2        | A      | 1900-01-01 00:00:00.000 | 2              |
| 3        | A      | 1900-01-01 00:00:00.000 | 3              |
| 4        | B      | 1900-01-04 00:00:00.000 | 1              |
| 5        | B      | 1900-01-04 00:00:00.000 | 2              |
| 6        | A      | 1900-01-03 00:00:00.000 | 1              |
ypercubeᵀᴹ
  • 99,450
  • 13
  • 217
  • 306
Scott Hodgin - Retired
  • 24,062
  • 2
  • 29
  • 52