1

Problem

When working with SCD type 2 data it's easy to see the state of a table "as of" a given point in time by using your date columns, eg: valid_from and valid_to. For example:

select * from table
where '2023-11-01' between valid_from and valid_to

You can then group by or use window functions to aggregate.

But what if I want to do that repeatedly, for each date in a range (eg: daily)? I do not need to aggregate across these dates, just within each one.

Example

Let's say I have a table that tracks the quantity of people for each reservation_id. Both the quantity of people and reservation_status can change over time, as tracked by the table. Each reservation is attached to an event_id. event_date is included to help limit scope if needed (see assumptions below).

reservation_id reservation_status quantity event_id event_date valid_from valid_to
1 active 4 100 2023-05-25 2023-01-01 2023-01-02
2 active 2 200 2024-01-07 2023-01-01 2023-01-03
3 active 7 100 2023-05-25 2023-01-02 9999-12-31
4 active 1 200 2024-01-07 2023-01-02 9999-12-31
1 active 5 100 2023-05-25 2023-01-03 9999-12-31
5 active 8 100 2023-05-25 2023-01-03 9999-12-31
2 cancelled 2 200 2024-01-07 2023-01-04 9999-12-31
6 active 3 100 2023-05-25 2023-01-06 9999-12-31

PostgreSQL since BigQuery is more difficult to test – db-fiddle / SQL:

CREATE TABLE Reservations (
  "reservation_id" INTEGER,
  "reservation_status" VARCHAR(9),
  "quantity" INTEGER,
  "event_id" INTEGER,
  "event_date" DATE,
  "valid_from" DATE,
  "valid_to" DATE
);

INSERT INTO Reservations ("reservation_id", "reservation_status", "quantity", "event_id", "event_date", "valid_from", "valid_to") VALUES ('1', 'active', '4', '100', '2023-05-25', '2023-01-01', '2023-01-02'), ('2', 'active', '2', '200', '2024-01-07', '2023-01-01', '2023-01-03'), ('3', 'active', '7', '100', '2023-05-25', '2023-01-02', '9999-12-31'), ('4', 'active', '1', '200', '2024-01-07', '2023-01-02', '9999-12-31'), ('1', 'active', '5', '100', '2023-05-25', '2023-01-03', '9999-12-31'), ('5', 'active', '8', '100', '2023-05-25', '2023-01-03', '9999-12-31'), ('2', 'cancelled', '2', '200', '2024-01-07', '2023-01-04', '9999-12-31'), ('6', 'active', '3', '100', '2023-05-25', '2023-01-06', '9999-12-31');

While this is ultimately for BigQuery, answer will be accepted in any dialect as long as it's somewhat generic.

Assumptions
  • "As of" dates can be a list or range based off valid_from min/max
  • valid_to of 9999-12-31 is the most recent data
  • All reservations for a given event will be between event_date - INTERVAL '2 years' and event_date. This doesn't change anything for this example, but maybe is useful for scaling(?)

Desired output

I'd like to know the sum of quantity grouped by event_id and reservation_status as of each interval (day).

as_of_date event_id reservation_status sum_quantity
2023-01-01 100 active 4
2023-01-01 200 active 2
2023-01-02 100 active 11
2023-01-02 200 active 2
2023-01-03 100 active 20
2023-01-03 200 active 3
2023-01-04 100 active 20
2023-01-04 200 active 1
2023-01-04 200 cancelled 2
2023-01-06 100 active 23

Rough estimate of row values. Will be different if using complete range of dates.

I essentially want to do the following:

/* Invalid SQL, just for conceptual purposes */

-- Given a list of dates, for each "date": select event_id, reservation_status, sum(quantity) from table where {{date}} between valid_from and valid_to group by event_id, reservation_status

I believe this can be done using procedural language, such as a for loop, but I feel like I am overthinking that, and just having trouble combining simpler concepts.

camtech
  • 143
  • 6

1 Answers1

0

Turns out I was very much overthinking it. Thank you to this Stack Overflow post for reminding me of the basics, particularly that you can just join on and with comparisons, and SergeyA's comment: just make a list of dates and join to it.

Here is one complete solution (db-fiddle):

with

-- Generate range of days based on valid_from -- Or use a calendar table/date dimension all_dates as ( select day::date from generate_series( (select min(valid_from) from Reservations), (select max(valid_from) from Reservations), '1 day' ) day ),

-- Quantities as of each day quantity_as_of as ( select day as as_of_date, Reservations.event_id, Reservations.reservation_status, sum(Reservations.quantity) as sum_quantity from all_dates as ad join Reservations on valid_from <= ad.day and ad.day < valid_to group by ad.day, Reservations.event_id, Reservations.reservation_status order by as_of_date, event_id )

select * from quantity_as_of

as_of_date event_id reservation_status quantity
2023-01-01 100 active 4
2023-01-01 200 active 2
2023-01-02 100 active 7
2023-01-02 200 active 3
2023-01-03 100 active 20
2023-01-03 200 active 1
2023-01-04 100 active 20
2023-01-04 200 cancelled 2
2023-01-04 200 active 1
2023-01-05 100 active 20
2023-01-05 200 active 1
2023-01-05 200 cancelled 2
2023-01-06 100 active 23
2023-01-06 200 cancelled 2
2023-01-06 200 active 1
camtech
  • 143
  • 6