SQL (Postgres) Filling up missing rows with values from previous row and calculating on result set

Question

I have got this dataset:

 id | block_number | value | name 
----+--------------+-------+------
  2 |           47 |   100 | one
  3 |           52 |   200 | one
  4 |           58 |   120 | one
  5 |           60 |   180 | one

created from

create table test (id serial, block_number int, value int, name varchar);

and

insert into test (block_number, value, name) values (47, 100, 'one'), (52, 200, 'one'), (58, 120, 'one'), (60, 180, 'one');

I would like to fill up the rows (not actually add them to the table but just use them for further processing), so that the column 'block_number' shows up consecutively using the value of the previous row for 'value' like so:

block_number | value | name 
-------------+-------+------
          47 |   100 | one
          48 |   100 | one
          49 |   100 | one
          50 |   100 | one
          51 |   100 | one
          52 |   200 | one
          53 |   200 | one
          54 |   200 | one
          55 |   200 | one
          56 |   200 | one
          57 |   200 | one
          58 |   120 | one
          59 |   120 | one
          60 |   180 | one

Then I would like to multiply each of these rows with a given number and sum up the total of that, for example given number would be 2 then the result of the above would be 4,240.

I haven't gotten very far with this, only thing I could come up with is a statement that generates the missing rows. However, I am failing to find out how I could set the value of each row to the one of the previous row:

SELECT x.block_number, x.value, t.value
           FROM (
              SELECT generate_series(min(block_number), max(block_number)) AS block_number
              FROM   my_table
               ) x
           LEFT   JOIN my_table t USING (block_number)
           ORDER  BY x.block_number;

Any help greatly appreciated.

UPDATE:

I tried to reconcile this with @mustaccio's hint here: https://dba.stackexchange.com/posts/327530/edit#:~:text=Carry%20over%20long%20sequence%20of%20missing%20values%20with%20Postgres and I came up with this:

SELECT block_number
     , first_value(value) OVER (PARTITION BY block_number, grp ORDER BY block_number) AS value
FROM  (
   SELECT 
        count(value) OVER (PARTITION BY block_number ORDER BY block_number) AS grp
        , value
        , block_number
   FROM   (
SELECT x.block_number, t.value
           FROM (
              SELECT generate_series(min(block_number), max(block_number)) AS block_number
              FROM   test
               ) x
           LEFT   JOIN test t USING (block_number)
           ORDER  BY x.block_number
) test
) sub;

but it still doesn't get me the missing values:

 block_number | value 
--------------+-------
           47 |   100
           48 |      
           49 |      
           50 |      
           51 |      
           52 |   200
           53 |      
           54 |      
           55 |      
           56 |      
           57 |      
           58 |   120
           59 |      
           60 |   180

Any suggestions?

UPDATE 2

Thanks to @mustaccio's answer the first part is solved. I tried to amend the statement so that it does the calculation as mentioned above: Multiplying the 'value' in each row with a given number (here: 2) and then summing up all those values. What I tried:

SELECT SUM(a.value * 2)
FROM (
SELECT  block_number, value, first_value(value) OVER (PARTITION BY grp) 
FROM (
 SELECT 
  sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp
  , value
  , block_number
 FROM   (
   SELECT x.block_number, t.value
   FROM (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM   test
   ) x
   LEFT   JOIN test t USING (block_number)
 ) y
) z
) a;

However, that gives me a result of 1,200 instead of the expected 4,060.

Any ideas?

mustaccio · Accepted Answer · 2023-05-26T14:34:00.437

You were almost there:

SELECT  
 block_number, 
 first_value(value) OVER (PARTITION BY grp ORDER BY block_number) AS last_good_value
FROM (
 SELECT 
  sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp
  , value
  , block_number
 FROM   (
   SELECT x.block_number, t.value
   FROM (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM   test
   ) x
   LEFT   JOIN test t USING (block_number)
 ) y
) z
ORDER BY block_number

In order to make use of the first_value() function, we need to identify the group (partition) where that first value should come from. After you fill in missing block_numbers from generate_series, each group comprises some non-NULL value followed by NULL values, ordered by block_number; the next group begins with the next non-NULL value.

The sum() OVER ... window function calculates the sum of values from the beginning of the range to the current row, inclusive; since we don't include the PARTITION BY clause here, the range means the entire row set.

CASE WHEN value IS NOT NULL THEN 1 END ensures that we only increment the sum when we encounter the next non-NULL value. As a result, we assign consecutive numbers (grp) to each group of block_numbers.

Now that we can identify each group, we can use first_value(value) OVER (PARTITION BY grp) to assign the "last good" value to all rows in the same grp partition.

Fiddle

SQL (Postgres) Filling up missing rows with values from previous row and calculating on result set

UPDATE:

UPDATE 2

1 Answers1