Aggregate overlapping date intervals

Question

I have a table with some accounts, and their subscription's start and end dates. However, these subscriptions overlap sometimes and I need the start and end dates of every connected subscription period. Like in the example image.

I tried to merge the subscription periods with a date reference table and marked the dates if there was a subscription. However, the code was getting quite complex. I guess there must be a simpler solution.

IF OBJECT_ID('tempdb..#Subscriptions') IS NOT NULL DROP TABLE #Subscriptions
CREATE TABLE #Subscriptions (
    account_id varchar(1)
    ,start_date date
    ,end_date date
)
INSERT INTO #Subscriptions (account_id, start_date, end_date) values
('A','2019-06-20','2019-06-29'),
('A','2019-06-25','2019-07-25'),
('A','2019-07-20','2019-08-26'),
('A','2019-12-25','2020-01-25'),
('A','2021-04-27','2021-07-27'),
('A','2021-06-25','2021-07-14'),
('A','2021-07-10','2021-08-14'),
('A','2021-09-10','2021-11-12'),
('B','2019-07-13','2020-07-14'),
('B','2019-06-25','2019-08-26')

score 8 · Accepted Answer · answered Aug 25 '22 at 17:39

Directly:

WITH
cte1 AS (
    SELECT account_id, start_date the_date, 1 weight
    FROM Subscriptions
    UNION ALL
    SELECT account_id, end_date, -1
    FROM Subscriptions
),
cte2 AS (
    SELECT account_id, 
           the_date, 
           SUM(weight) OVER (PARTITION BY account_id 
                             ORDER BY the_date, weight DESC) weight
    FROM cte1
),
cte3 AS (
    SELECT account_id, 
           the_date,
           SUM(CASE WHEN weight = 0 
                    THEN 1
                    ELSE 0
                    END) OVER (PARTITION BY account_id 
                               ORDER BY the_date DESC) group_no
    FROM cte2
)
SELECT account_id, 
       MIN(the_date) start_date,
       MAX(the_date) end_date
FROM cte3
GROUP BY account_id, group_no
ORDER BY 1,2

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=82f72d57e6c527c3ce59d166c675e48c

score -2 · Answer 2 · answered Jan 03 '24 at 12:42

Just assign value 1 to each row where the start is greater than the previous end (ie. there is a gap in dates) then you can sum the values in order to give each range a group id and get min and max of each range. Simple!

with date_ranges as
(
   select 'one' as partition_columns, cast('2023-01-01' as date) as date_start, cast('2023-01-30' as date) as date_end
   union all
   select 'one' as partition_columns, cast('2023-01-15' as date) as date_start, cast('2023-02-20' as date) as date_end
   union all
   select 'two' as partition_columns, cast('2023-02-15' as date) as date_start, cast('2023-02-28' as date) as date_end
   union all
   select 'two' as partition_columns, cast('2023-03-01' as date) as date_start, cast('2023-03-10' as date) as date_end
   union all
   select 'two' as partition_columns, cast('2023-03-10' as date) as date_start, cast('2023-03-25' as date) as date_end
)
select partition_columns, min(date_start) as date_start, max(date_end) as date_end
from (
    select partition_columns, date_start, date_end, sum(start_flag) over (partition by partition_columns order by date_start) range_group
    from (
        select partition_columns, date_start, date_end,
               case when lag(date_end,1) over (partition by partition_columns order by date_start) >= date_start then 0 else 1 end as start_flag
        from date_ranges
    ) c
) g
group by partition_columns, range_group
order by 1, 2, 3

Aggregate overlapping date intervals

2 Answers2