0

I have got this dataset:

 id | block_number | value | name 
----+--------------+-------+------
  2 |           47 |   100 | one
  3 |           52 |   200 | one
  4 |           58 |   120 | one
  5 |           60 |   180 | one

created from

create table test (id serial, block_number int, value int, name varchar);

and

insert into test (block_number, value, name) values (47, 100, 'one'), (52, 200, 'one'), (58, 120, 'one'), (60, 180, 'one');

I would like to fill up the rows (not actually add them to the table but just use them for further processing), so that the column 'block_number' shows up consecutively using the value of the previous row for 'value' like so:

block_number | value | name 
-------------+-------+------
          47 |   100 | one
          48 |   100 | one
          49 |   100 | one
          50 |   100 | one
          51 |   100 | one
          52 |   200 | one
          53 |   200 | one
          54 |   200 | one
          55 |   200 | one
          56 |   200 | one
          57 |   200 | one
          58 |   120 | one
          59 |   120 | one
          60 |   180 | one

Then I would like to multiply each of these rows with a given number and sum up the total of that, for example given number would be 2 then the result of the above would be 4,240.

I haven't gotten very far with this, only thing I could come up with is a statement that generates the missing rows. However, I am failing to find out how I could set the value of each row to the one of the previous row:

SELECT x.block_number, x.value, t.value
           FROM (
              SELECT generate_series(min(block_number), max(block_number)) AS block_number
              FROM   my_table
               ) x
           LEFT   JOIN my_table t USING (block_number)
           ORDER  BY x.block_number;

Any help greatly appreciated.

UPDATE:

I tried to reconcile this with @mustaccio's hint here: https://dba.stackexchange.com/posts/327530/edit#:~:text=Carry%20over%20long%20sequence%20of%20missing%20values%20with%20Postgres and I came up with this:

SELECT block_number
     , first_value(value) OVER (PARTITION BY block_number, grp ORDER BY block_number) AS value

FROM ( SELECT count(value) OVER (PARTITION BY block_number ORDER BY block_number) AS grp , value , block_number FROM (

SELECT x.block_number, t.value FROM ( SELECT generate_series(min(block_number), max(block_number)) AS block_number FROM test ) x LEFT JOIN test t USING (block_number) ORDER BY x.block_number ) test

) sub;

but it still doesn't get me the missing values:

 block_number | value 
--------------+-------
           47 |   100
           48 |      
           49 |      
           50 |      
           51 |      
           52 |   200
           53 |      
           54 |      
           55 |      
           56 |      
           57 |      
           58 |   120
           59 |      
           60 |   180

Any suggestions?

UPDATE 2

Thanks to @mustaccio's answer the first part is solved. I tried to amend the statement so that it does the calculation as mentioned above: Multiplying the 'value' in each row with a given number (here: 2) and then summing up all those values. What I tried:

SELECT SUM(a.value * 2)
FROM (

SELECT block_number, value, first_value(value) OVER (PARTITION BY grp) FROM ( SELECT sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp , value , block_number FROM ( SELECT x.block_number, t.value FROM ( SELECT generate_series(min(block_number), max(block_number)) AS block_number FROM test ) x LEFT JOIN test t USING (block_number) ) y ) z ) a;

However, that gives me a result of 1,200 instead of the expected 4,060.

Any ideas?

1 Answers1

0

You were almost there:

SELECT  
 block_number, 
 first_value(value) OVER (PARTITION BY grp ORDER BY block_number) AS last_good_value
FROM (
 SELECT 
  sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp
  , value
  , block_number
 FROM   (
   SELECT x.block_number, t.value
   FROM (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM   test
   ) x
   LEFT   JOIN test t USING (block_number)
 ) y
) z
ORDER BY block_number

In order to make use of the first_value() function, we need to identify the group (partition) where that first value should come from. After you fill in missing block_numbers from generate_series, each group comprises some non-NULL value followed by NULL values, ordered by block_number; the next group begins with the next non-NULL value.

The sum() OVER ... window function calculates the sum of values from the beginning of the range to the current row, inclusive; since we don't include the PARTITION BY clause here, the range means the entire row set.

CASE WHEN value IS NOT NULL THEN 1 END ensures that we only increment the sum when we encounter the next non-NULL value. As a result, we assign consecutive numbers (grp) to each group of block_numbers.

Now that we can identify each group, we can use first_value(value) OVER (PARTITION BY grp) to assign the "last good" value to all rows in the same grp partition.

Fiddle

mustaccio
  • 28,207
  • 24
  • 60
  • 76