Context
I have a query that is something like where I want to query all of the days that each product was on sale at at least 1 store. This query runs as part of a larger query and runs whenever a user loads a specific page.
TL;DR Schema and query
(data is for example purposes only)
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=0ad70a90a446e811e08aa1bd779550d1
Schema
I have a table sales_periods that tracks the date periods that products were on sale at each store.
| product_id | since | till | store_id |
|---|---|---|---|
| 42 | Aug 12, 2016 | Jan 27, 2018 | 19 |
| 42 | Jan 1, 2020 | Jan 27, 2021 | 19 |
| 43 | Feb 14, 2019 | Jan 27, 2022 | 20 |
Query
WITH dates(day) AS(
SELECT day FROM calendar WHERE day BETWEEN '2021-04-14'::date AND '2022-04-13'::date
)
SELECT
sales_periods.product_id,
COUNT(DISTINCT dates.day)
FROM
sales_periods
INNER JOIN dates ON dates.day >= sales_periods.since
AND dates.day <= sales_periods.till
GROUP BY
sales_periods.product_id
The problem
There could be 100s of stores and 10,000s of products. And I want to look at an entire year! From what I think I see from EXPLAIN, Postgres will do the JOIN (creating a lot of data in memory = 100 stores x 10,000 products x 365 days) and then it will reduce the data with the group (max 10,000 products x 365 days).
Goal
I would like the query to run <100ms. I think <<50ms is possible if the query can be made properly. I can see what it's doing is wrong but I don't know how to do better!
What I've tried
- Combining the sales_periods <- Takes too much time compared with the amount of time saved
- Use DISTINCT ON instead of GROUP BY <- slower
- Switching to daterange (https://dbfiddle.uk/?rdbms=postgres_14&fiddle=55fa23c2bbb7d44374ef6ab2acc570a2) <- slower
- I played around a lot with the indexes, including trying out GIST indexes, sorted index, etc. <- no change, sometimes worse
What I think I want
If there's a way to tell Postgres to stop as soon as it knows that each product is available for purchase from at least 1 store on each day.
Environment
Postgres 13
Constraints
It would be very hard to change the column structure of the sales_periods table but it could be done. Adding and removing columns, adding index and views are all easy.