Joining with generate_series for missing dates with derived date column

Question

There are a few similar questions to this (e.g.https://dba.stackexchange.com/questions/72419/filling-in-missing-dates-in-record-set-from-generate-series) but the solution does not appear to work in my case... Essentially I am trying to generate zero entries for dates not present in a series but I suspect the issue is that I am having to extract the date value from a timestamp? I've used SQL for years but very new to postgres - impressed so far, though.. Have tried both a left and right join here but no joy...

Here is a little test case (are sql fiddles still encouraged?):

-- temp test table - works as expected
WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select * from incomplete_data;
-- generated dates - work as expected
select * FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
;
-- join - failing to do what I was hoping..
WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select count(payment_id), date_trunc('day',payment_date)::date as time
FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
right  JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where payment_date BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP  BY date_trunc('day',payment_date)
ORDER  BY date_trunc('day',payment_date);
count |    time
-------+------------
     2 | 2024-09-06
     1 | 2024-09-07
     3 | 2024-09-08
     1 | 2024-09-10
(4 rows)

I was hoping to get a row for every day in the month with zeroes for unpopulated days. The background is that this is for populating a grafana query.

Can anyone suggest what I am doing wrong or am I failing to grasp a bigger issue here? My version is: PostgreSQL 15.9 (Debian 15.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

UPDATE
jjanes answer below helped clarify the sequence of joining and filtering for me - this is the required select:

WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select count(payment_id), d.day as time
FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
left JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where d.day BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP  BY d.day
ORDER  BY d.day
;

score 0 · Accepted Answer · answered Nov 25 '24 at 17:54

The first problem is that you need a LEFT join, not a RIGHT join.

But then when payment_date is NULL, like it will be for the NULL-extended rows from the left join, it obviously can't satisfy the BETWEEN. That NULL value will also do bad things to your GROUP BY, ORDER BY, and the select list. So you need to refer to d.day, not payment_date, in all of those places.

Joining with generate_series for missing dates with derived date column

1 Answers1