How could one write a Postgresql query that gives both daily and cumulative counts without gaps? E.g. if there is no data for a given day, it would show 0 as a daily count for that that day and the same cumulative sum as the previous day?
I think I'd need GENERATE_SERIES, but I don't know how to do it. I'm also not entirely sure if order by day asc rows between unbounded preceding and current row would work correctly always, but maybe not the biggest issue here.
I have tried to craft the query with PARTITION BY and
Let's say I write a table and add data such as
create table test
(
id int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
data int4,
created_at timestamptz NOT NULL DEFAULT now()
);
insert into test(data, created_at) values(1, '2021-04-01');
insert into test(data, created_at) values(2, '2021-04-01');
insert into test(data, created_at) values(3, '2021-04-02');
insert into test(data, created_at) values(4, '2021-04-03');
insert into test(data, created_at) values(5, '2021-04-05');
insert into test(data, created_at) values(6, '2021-04-07');
and then create a queries such as
SELECT
created_at as "Date",
count(1) as "Daily count"
FROM test
WHERE created_at >= '2021-04-01'
AND created_at <= '2021-04-30'
GROUP BY 1
giving
| Date | Daily count |
|---|---|
| 2021-04-01 00:00:00 | 2 |
| 2021-04-02 00:00:00 | 1 |
| 2021-04-03 00:00:00 | 1 |
| 2021-04-05 00:00:00 | 1 |
| 2021-04-07 00:00:00 | 1 |
with data as (
select
date_trunc('day', created_at) as day,
count(1)
from test
group by 1
)
select
day,
running_total(count) over (order by day asc rows between unbounded preceding and current row)
from data
| day | running_total |
|---|---|
| 2021-04-01 00:00:00 | 2 |
| 2021-04-02 00:00:00 | 3 |
| 2021-04-03 00:00:00 | 4 |
| 2021-04-05 00:00:00 | 5 |
| 2021-04-07 00:00:00 | 6 |
But as noted, how could these two be combined without gaps on daily values? Somehow it feels I get close but bump into some (syntax) problem. Maybe those two queries are the simplest and cleanest examples of what I'm thinking.