2

I have a table like this:

Date Group name value
2022-01-01 A 1.0
2022-01-15 A 0.5
2022-01-31 A 0.2

but this contains only 3 days. I need a table with full-calendar dates, a row for every day of the year where the value is the last available.
For example, all records from 1 Jan 2022 to 14 Jan have value 1.0 (scoped to group_name 'A').

I have tried using LAST_VALUE() but it's not working.

WITH dates AS (
SELECT 
    date::date
FROM generate_series ( '2022-01-01'::timestamp, '2022-01-31'::timestamp, '1 day'::interval) date

), incomplete_table AS ( SELECT * FROM (VALUES ('2022-01-01'::date, 'a', 1), ('2022-01-15'::date, 'a', 0.5), ('2022-01-31'::date, 'a', 0.2), ('2022-01-02'::date, 'b', 0.1), ('2022-01-10'::date, 'b', 0.15), ('2022-01-20'::date, 'b', 0.15) ) AS t (date,group_name, value) ) SELECT dates.date, group_name, value, LAST_VALUE(value) OVER (ORDER BY dates.date DESC) as last_value_window FROM dates LEFT JOIN incomplete_table ON incomplete_table.date = dates.date ORDER BY dates.date DESC;

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
sparkle
  • 73
  • 4

1 Answers1

4

Assuming you want one row per day and group name:

WITH incomplete_table(date, group_name, value) AS (
   VALUES 
     ('2022-01-01'::date, 'a', 1)
   , ('2022-01-15'::date, 'a', 0.5)
   , ('2022-01-31'::date, 'a', 0.2)
   , ('2022-01-02'::date, 'b', 0.1)
   , ('2022-01-10'::date, 'b', 0.15)
   , ('2022-01-20'::date, 'b', 0.15)
   )
SELECT d.date, g.group_name, i.value
FROM  (
   SELECT date::date
   FROM   generate_series (timestamp '2022-01-01'
                         , timestamp '2022-01-31'
                         , interval '1 day') date
   ) d
CROSS  JOIN (SELECT DISTINCT group_name FROM incomplete_table) g  -- ①
LEFT   JOIN LATERAL (
   SELECT i.group_name, i.value
   FROM   incomplete_table i
   WHERE  i.group_name = g.group_name
   AND    i.date <= d.date
-- AND    i.date >= timestamp '2022-01-01'  -- ? ②
   ORDER  BY i.date DESC
   LIMIT  1
   ) i ON true
ORDER  BY g.group_name, d.date DESC;

db<>fiddle here

① If available, use a table "groups" providing distinct group names. Faster. Else, if the table is big, consider an emulated index-skip scan. See:

② The search for the latest value is not limited to the given time range unless you spell that out in the LATERAL subquery. About LATERAL:

You still get value IS NULL where no earlier value is found.

If incomplete_table is big, an index on (group_name, date) will help performance (a lot). Possibly even a "covering" index, adding column value. See:

Very similar case with more explanation:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633