13

I need to get the rolling sum over a period of 7 days for each row (1 row per day).

For example:

| Date       | Count | 7-Day Rolling Sum |
------------------------------------------
| 2016-02-01 | 1     | 1
| 2016-02-02 | 1     | 2
| 2016-02-03 | 2     | 4
| 2016-02-04 | 2     | 6
| 2016-02-05 | 2     | 8
| 2016-02-06 | 2     | 10
| 2016-02-07 | 2     | 12
| 2016-02-08 | 2     | 13 --> here we start summing from 02-02
| 2016-02-09 | 2     | 14 --> here we start summing from 02-03
| 2016-02-10 | 5     | 17 --> here we start summing from 02-04

I need this in one query that returns the rows with 7-day rolling sum and the date of the last day of the range of the sum. For example, day=2016-02-10, sum 17.

So far I have this but it's not fully working:

DO
$do$
DECLARE 
    curr_date date;
    num bigint;
BEGIN
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d)
LOOP 
    SELECT curr_date, SUM(count)
    FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
    LEFT JOIN m.ping AS p ON p.date = d
    LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
    LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
    WHERE
        pt.url_slug = 'active' AND
        pf.url_slug = 'weekly';
END LOOP;
END
$do$;

I am using PostgreSQL 9.4.5. There could be multiple rows with the same date. If there is a gap (a day is missing), the 7-consecutive-day range will still be followed.

Paul White
  • 94,921
  • 30
  • 437
  • 687
josesigna
  • 233
  • 1
  • 2
  • 6

3 Answers3

16

By far the cleanest solution is to use window function sum with rows between:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum((random()*5)::integer) num
        FROM days
        -- left join other tables here to get counts, I'm using random
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from counts
order by day;

The important part is to generate the timeframe in days CTE and join onto it in order not to miss any days for which there is no data.

Example

For example, if I create some test data with 20 records in last 14 days:

SELECT (current_date - ((random()*14)::integer::text || 'days')::interval)::date as day, (random()*7)::integer as num
into test_data from generate_series(1, 20);;

And also add a value before that:

insert into test_data values ((current_date - '25 days'::interval), 5);

Then use the query above:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum(t.num) num
        FROM days
        left join test_data t on t.day = days.day
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day rows between 6 preceding and current row)
from counts
order by day;

And get the results for whole month:

    day     | num | sum 
------------+-----+-----
 2016-01-31 |     |    
 2016-02-01 |     |    
 2016-02-02 |     |    
 2016-02-03 |     |    
 2016-02-04 |     |    
 2016-02-05 |     |    
 2016-02-06 |   5 |   5
 2016-02-07 |     |   5
 2016-02-08 |     |   5
 2016-02-09 |     |   5
 2016-02-10 |     |   5
 2016-02-11 |     |   5
 2016-02-12 |     |   5
 2016-02-13 |     |    
 2016-02-14 |     |    
 2016-02-15 |     |    
 2016-02-16 |     |    
 2016-02-17 |     |    
 2016-02-18 |   2 |   2
 2016-02-19 |   5 |   7
 2016-02-20 |     |   7
 2016-02-21 |   4 |  11
 2016-02-22 |  15 |  26
 2016-02-23 |   1 |  27
 2016-02-24 |   1 |  28
 2016-02-25 |   2 |  28
 2016-02-26 |   4 |  27
 2016-02-27 |   9 |  36
 2016-02-28 |   5 |  37
 2016-02-29 |  11 |  33
 2016-03-01 |   5 |  37
(31 rows)
hruske
  • 630
  • 4
  • 11
0

Ended up using a FOR LOOP, a TEMP table and SELECT on the temp table once the for loop is done:

DO
$do$
DECLARE 
    curr_date DATE;
BEGIN

-- Create temp table to hold results
DROP TABLE IF EXISTS rolling_7day_sum;
CREATE TEMP TABLE rolling_7day_sum (
    date DATE,
    count BIGINT
);

-- Iterate dates and get 7 day rolling sum for each
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(
    -- Get earliest date from table
    (
        SELECT date FROM m.ping AS p
            LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
            LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
        WHERE
            pt.url_slug = 'active' AND
            pf.url_slug = 'weekly'
        ORDER BY date ASC
        LIMIT 1
    ), CURRENT_DATE-1, '1 day'::interval) d)
LOOP
    INSERT INTO rolling_7day_sum 
        SELECT curr_date, SUM(count)
            FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
                LEFT JOIN m.ping AS p ON p.date = d
                LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
                LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
            WHERE
                pt.url_slug = 'active' AND
                pf.url_slug = 'weekly';
END LOOP;
END
$do$;

SELECT date, count FROM rolling_7day_sum ORDER BY date ASC;

But I imagine there is a cleaner way of doing a 7-consecutive-rolling-sum than this.

josesigna
  • 233
  • 1
  • 2
  • 6
0

A recursive SQL query that goes 7 in depth may work, but I don't know how efficient it would be.

WITH RECURSIVE totals(start_day, end_day, total, depth) AS (
    SELECT date, date, count, 1 FROM table
  UNION ALL
    SELECT
      t.start_day,
      t.start_day + INTERVAL '1 day',
      total + COALESCE((SELECT count FROM table WHERE date = t.start_day + INTERVAL '1 day'), 0),
      t.depth + 1
    FROM totals t
) SELECT
  *
FROM totals
WHERE end_day = '2016-03-01' AND depth = 7;

Not tested for syntax or anything.

Joishi Bodio
  • 3,508
  • 2
  • 17
  • 20