0

I have a simple table with 75 million rows (and roughly 47 million unique ref_ids)

CREATE TABLE tbl (
    id serial NOT NULL PRIMARY KEY,
    ref_id text NOT NULL,
    ts timestamp NOT NULL
);

And I am trying to run the following query on the table (essentially, delete rows where there is an existing row with the same ref_id and a newer ts :

DELETE FROM tbl t 
WHERE EXISTS (
    SELECT 1 FROM tbl t2 WHERE t.id < t2.id AND t.ref_id = t2.ref_id AND t.ts <= t2.ts
)

If I try to run this query manually (i.e. via PGAdmin), it will run for several hours before I give up on it. However, if I put this query in a procedure and schedule it to run 5 minutes later via pg_cron, it will complete in ~70 seconds. There is no traffic to this table during either the manual or cron runs.

To confirm, a count of the rows after the pg_cron run shows 47 million rows, where before the run it had 75.

I have also experienced similar behavior when running other queries, and I am trying to figure out what could be causing such a massive performance difference between pg_cron and manually running queries.

I am running postgres 14.4.

perennial_
  • 101
  • 1

0 Answers0