PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Hello All,
I'm a bit stuck on this one.
Working SQL
Currently I am using this query:
SELECT DATE_PART('year', created_at::timestamp) AS year,
DATE_PART('week', created_at::timestamp) AS week_number,
DATE_TRUNC('week', created_at::timestamp) AS week_starting,
name,
SUM(seconds) as total_seconds
FROM my_table
WHERE name = 'jeff'
AND created_at::date >= '2021-01-01'
AND created_at::date <= '2021-10-01'
GROUP BY year, week_number, week_starting, name
ORDER BY name;
This gives me:
| "year" |"week_number" | "week_starting" | "name" | "total_seconds" |
| "2021" | "18" | "2021-05-03 00:00:00" | "jeff" | "11" |
| "2021" | "19" | "2021-05-10 00:00:00" | "jeff" | "40" |
| "2021" | "23" | "2021-06-07 00:00:00" | "jeff" | "7" |
What I would like is a result with the gaps filled in:
| "year" |"week_number" | "week_starting" | "name" | "total_seconds" |
| "2021" | "18" | "2021-05-03 00:00:00" | "jeff" | "11" |
| "2021" | "19" | "2021-05-10 00:00:00" | "jeff" | "40" |
| "2021" | "20" | "2021-05-17 00:00:00" | "jeff" | "0" |
| "2021" | "21" | "2021-05-24 00:00:00" | "jeff" | "0" |
| "2021" | "22" | "2021-05-31 00:00:00" | "jeff" | "0" |
| "2021" | "23" | "2021-06-07 00:00:00" | "jeff" | "7" |
Setup DB table
CREATE TABLE IF NOT EXISTS my_table (
id INT GENERATED ALWAYS AS IDENTITY,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP,
seconds INT,
PRIMARY KEY(id)
);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-05-03 15:28', 10);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-05-03 15:29', 1);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-05-10 11:01', 20);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-05-11 13:23', 20);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-06-08 10:04', 2);
INSERT INTO my_table(name, created_at, seconds) VALUES ('jeff', '2021-06-09 06:27', 5);
What I have tried so far
So there is a lot of posts around about using GENERATE_SERIES
- Fill missing dates within groups
- https://ubiq.co/database-blog/fill-missing-dates-using-postgresql-generate_series/
- https://stackoverflow.com/questions/13100445/adding-missing-date-in-a-table-in-postgresql
And I can create a series of weeks starting using:
SELECT DATE_TRUNC('week', GENERATE_SERIES(
'2021-01-01'::timestamp,
'2021-10-01'::timestamp,
'1 week'::interval
)) as created_at
I have had a good look at using this and in my mind I have tried to create an "empty" weeks temporary table and then merge that into my_table to give the desired output of filling the gaps.
However the below query just gives me back the data thats in my_table and does not fill the gaps. Can someone explain what am I missing here?
Working query
SELECT empty_weeks.created_at,
DATE_PART('year', empty_weeks.created_at::timestamp) AS year,
DATE_PART('week', empty_weeks.created_at::timestamp) AS week_number,
DATE_TRUNC('week', empty_weeks.created_at::timestamp) AS week_starting,
t.name,
SUM(t.seconds) as total_seconds
FROM (
SELECT DATE_TRUNC('week', GENERATE_SERIES(
'2021-01-01'::timestamp,
'2021-10-01'::timestamp,
'1 week'::interval
)) as created_at,
'jeff' as name,
0 as seconds
) empty_weeks
LEFT JOIN my_table t
ON DATE_TRUNC('week', t.created_at) = empty_weeks.created_at
WHERE t.name = 'jeff'
AND empty_weeks.created_at::date >= '2021-01-01'
AND empty_weeks.created_at::date <= '2021-10-01'
GROUP BY year, week_number, week_starting, empty_weeks.created_at, t.name
ORDER BY t.name;
Provides
| "year" |"week_number" | "week_starting" | "name" | "total_seconds" |
| "2021" | "18" | "2021-05-03 00:00:00" | "jeff" | "11" |
| "2021" | "19" | "2021-05-10 00:00:00" | "jeff" | "40" |
| "2021" | "23" | "2021-06-07 00:00:00" | "jeff" | "7" |
What I would like
| "year" |"week_number" | "week_starting" | "name" | "total_seconds" |
| "2021" | "1" | "2020-12-28 00:00:00" | "jeff" | "0"
...
| "2021" | "18" | "2021-05-03 00:00:00" | "jeff" | "11" |
| "2021" | "19" | "2021-05-10 00:00:00" | "jeff" | "40" |
| "2021" | "20" | "2021-05-17 00:00:00" | "jeff" | "0" |
| "2021" | "21" | "2021-05-24 00:00:00" | "jeff" | "0" |
| "2021" | "22" | "2021-05-31 00:00:00" | "jeff" | "0" |
| "2021" | "23" | "2021-06-07 00:00:00" | "jeff" | "7" |
...
| "2021" | "40" | "2021-09-27 00:00:00" | "jeff" | "0"
Any help would be hugely appreciated but also an explanation would be great. I have had a good read up on LEFT JOIN and I'm expecting it to provide empty rows of data.
As always thanks in advance
Solution
For completeness and thanks to @mustaccio I changed the query to:
SELECT empty_weeks.created_at,
DATE_PART('year', empty_weeks.created_at::timestamp) AS year,
DATE_PART('week', empty_weeks.created_at::timestamp) AS week_number,
DATE_TRUNC('week', empty_weeks.created_at::timestamp) AS week_starting,
COALESCE(t.name, 'jeff'),
COALESCE(SUM(t.seconds), 0) as total_seconds
FROM (
SELECT DATE_TRUNC('week', GENERATE_SERIES(
'2021-01-01'::timestamp,
'2021-10-01'::timestamp,
'1 week'::interval
)) as created_at
) empty_weeks
LEFT JOIN my_table t
ON DATE_TRUNC('week', t.created_at) = empty_weeks.created_at
AND t.name = 'jeff'
AND empty_weeks.created_at::date >= '2021-01-01'
AND empty_weeks.created_at::date <= '2021-10-01'
GROUP BY year, week_number, week_starting, empty_weeks.created_at, t.name
ORDER BY week_number;