
I have the query below, which I need to run on a table with 100 million records, but it is extremely slow (it has been running for 5 hours so far).

I am not sure how to optimize it and would be grateful for any help.

The table has an index on (DID, week_no), an indexed primary key (id), and several other columns that are not indexed.

DELETE FROM test 
WHERE "DID" IN (SELECT "DID" 
                FROM test 
                GROUP BY "DID" 
                having count(distinct week_no) < 4 );
roy naufal

1 Answer


A big part of the performance problem is the DELETE itself: the old row values must be saved so they can be restored after a crash or a ROLLBACK, and for 100 million rows that undo log becomes enormous.

If there is a composite INDEX(DID, week_no) (in that order), there is some hope. With that index you can walk through the table a few DIDs at a time: put the DELETE in a loop that gradually advances through the range of DID values, deleting one chunk per iteration.

See this for a discussion of ways to do the walking, especially with a composite index.

The overall time taken by chunking may actually be less, and even if not, each chunk commits quickly and keeps the undo log small. If you need to run this query frequently, other techniques to prepare for that need are worth discussing.
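The loop described above can be sketched as follows. This is a minimal illustration using Python's sqlite3 on a tiny in-memory table, not the exact SQL you would run in production; the table and column names match the question, while the chunk size (1000 DIDs per pass) and the helper variable names are arbitrary choices for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, DID INTEGER, week_no INTEGER)")
# The composite index the answer relies on, in (DID, week_no) order.
cur.execute("CREATE INDEX idx_did_week ON test (DID, week_no)")

# Sample data: DID 1 has 4 distinct weeks (kept); DIDs 2 and 3 have
# fewer than 4 distinct weeks (deleted).
rows = [(1, w) for w in (1, 2, 3, 4)] + [(2, w) for w in (1, 2)] + [(3, 1)]
cur.executemany("INSERT INTO test (DID, week_no) VALUES (?, ?)", rows)
conn.commit()

CHUNK = 1000   # how many DIDs to examine per pass
last_did = -1  # walk upward through the DID range
while True:
    # Next chunk of DIDs, read in index order.
    dids = [r[0] for r in cur.execute(
        "SELECT DISTINCT DID FROM test WHERE DID > ? ORDER BY DID LIMIT ?",
        (last_did, CHUNK))]
    if not dids:
        break
    placeholders = ",".join("?" * len(dids))
    # Same HAVING condition as the original query, restricted to this chunk.
    cur.execute(f"""
        DELETE FROM test
        WHERE DID IN (SELECT DID FROM test
                      WHERE DID IN ({placeholders})
                      GROUP BY DID
                      HAVING COUNT(DISTINCT week_no) < 4)""", dids)
    conn.commit()  # commit per chunk, keeping each transaction small
    last_did = dids[-1]

remaining = [r[0] for r in cur.execute("SELECT DISTINCT DID FROM test ORDER BY DID")]
print(remaining)  # only DID 1 survives
```

The per-chunk commit is the point of the exercise: each DELETE touches only a bounded number of rows, so no single transaction has to hold the undo information for the whole table.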

Rick James