1

I have a table of the following structure, where all columns except the last are non-null:

ID receiver send_time recv_time
1 A 00:00:00 00:00:01
2 A 00:00:01 NULL
3 A 00:00:02 NULL
4 A 00:00:03 NULL
5 A 00:00:04 00:00:05
6 B 00:00:00 00:00:01
7 B 00:00:01 NULL
8 B 00:00:02 00:00:03
9 A 00:00:05 NULL
10 B 00:00:03 NULL
11 A 00:00:06 00:00:07

I want to select (and ultimately delete, but I think selecting would be the first step) all rows, where the recv_time column is NULL for n or more consecutive rows, partitioned by the receiver column and ordered by the send_time column.

So from the above example, if n=2, I want to select/delete the rows with ID 2, 3 and 4, but not rows 7, 9, or 10.

I think a count with partition is needed, but I can't figure out how to "reset" the count of null values as soon as a non-null row is encountered. This is what I've got so far:

SELECT 
    id, receiver, send_time, recv_time, 
    COUNT(CASE WHEN recv_time IS NULL THEN 1 END) OVER (PARTITION BY receiver ORDER BY send_time)
FROM test 
ORDER BY id

which gives me:

ID receiver send_time recv_time count
1 A 00:00:00 00:00:01 0
2 A 00:00:01 NULL 1
3 A 00:00:02 NULL 2
4 A 00:00:03 NULL 3
5 A 00:00:04 00:00:05 3
6 B 00:00:00 00:00:01 0
7 B 00:00:01 NULL 1
8 B 00:00:02 00:00:03 1
9 A 00:00:05 NULL 4
10 B 00:00:03 NULL 2
11 A 00:00:06 00:00:07 4

But I want the count to reset at rows 5, 8, 11. I'm also not sure how I would proceed after that, because I want to capture all rows in the "group" of consecutive NULL rows, but obviously the count for row 2 in this example is smaller than n.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Edgxxar
  • 13
  • 2

1 Answers1

1

To just select:

SELECT *
FROM  (
   SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
   FROM  (
      SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
      FROM   test
      ) sub
   ) d
WHERE  d.grp_count > 2
AND    d.recv_time IS NULL;

Basically, the exact opposite of what you tried: we want the count to stop for rows with null values. Meaning, you were close, with one step in the logic inverted.

The "trick" is to do a running count(recv_time). Null values don't count, so the count stalls for those rows. Candidates for removal end up in groups (grp) with more than 2 members for n = 2.

To delete:

DELETE FROM test t
USING (  -- same as above
   SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
   FROM  (
      SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
      FROM   test
      ) sub
   ) d
WHERE  d.grp_count > 2
AND    d.recv_time IS NULL
AND    t.id = d.id;

fiddle

Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633