
I am trying to optimize an IN query in Postgres 11.

I tested 3 approaches:

Case 1:

DELETE FROM mytable WHERE 
id1='fffe9411-3b9d-40dc-9cc6-14407785be8b' 
and 
id2 IN ('00000140-1ae9-41f7-9614-453c063cee52'...'0000693d-0570-41e2-81e9-288261b3b2e5');

Case 2:

DELETE FROM mytable 
USING  
unnest ('{00000140-1ae9-41f7-9614-453c063cee52..0000693d-0570-41e2-81e9-288261b3b2e5}'::text[]) unnestid 
WHERE 
id2 = unnestid 
and 
mytable.id1='fffe9411-3b9d-40dc-9cc6-14407785be8b';

Case 3:

DELETE FROM mytable WHERE 
id1='fffe9411-3b9d-40dc-9cc6-14407785be8b' 
and id2 
IN (VALUES ('00000140-1ae9-41f7-9614-453c063cee52')..('0000693d-0570-41e2-81e9-288261b3b2e5'));
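
(For reference, a plan like the ones below can be captured without actually deleting any rows by wrapping the statement in a transaction and rolling back. This is just a sketch with the IN list shortened, not necessarily how the plans shown here were produced:)

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM mytable
WHERE id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'
  AND id2 IN ('00000140-1ae9-41f7-9614-453c063cee52');  -- list shortened for the sketch
ROLLBACK;  -- undoes the DELETE, keeping the data intact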

Query Plan for case 1:

 Delete on mytable  (cost=0.56..7.18 rows=1 width=6) (actual time=0.601..0.602 rows=0 loops=1)
   ->  Index Scan using mytable_pkey on mytable  (cost=0.56..7.18 rows=1 width=6) (actual time=0.073..0.453 rows=6 loops=1)
         Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)
         Filter: (id2 = ANY ('{00000140-1ae9-41f7-9614-453c063cee52,00005327-2400-40cc-bd22-39cc7fc6744e,00005a71-3cad-4253-9afe-dc3fe5609dc1,000062be-95ae-4485-800d-d969a232ebf1,0000663e-7675-4b93-86a5-742de1bab70d,0000693d-0570-41e2-81e9-288261b3b2e5}'::text[]))
         Rows Removed by Filter: 95

Query Plan for case 2:

Delete on mytable  (cost=7.21..8.47 rows=1 width=62) (actual time=0.691..0.693 rows=0 loops=1)
   ->  Hash Join  (cost=7.21..8.47 rows=1 width=62) (actual time=0.650..0.656 rows=6 loops=1)
         Hash Cond: (unnestid.unnestid = mytable.id2)
         ->  Function Scan on unnest unnestid  (cost=0.00..1.00 rows=100 width=88) (actual time=0.019..0.022 rows=6 loops=1)
         ->  Hash  (cost=7.15..7.15 rows=5 width=43) (actual time=0.594..0.595 rows=101 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 16kB
               ->  Index Scan using mytable_pkey on mytable  (cost=0.56..7.15 rows=5 width=43) (actual time=0.087..0.554 rows=101 loops=1)
                     Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)

Query Plan for case 3:

Delete on mytable  (cost=0.71..7.31 rows=1 width=62) (actual time=0.649..0.651 rows=0 loops=1)
   ->  Hash Semi Join  (cost=0.71..7.31 rows=1 width=62) (actual time=0.140..0.598 rows=6 loops=1)
         Hash Cond: (followers.accountid = "*VALUES*".column1)
         ->  Index Scan using mytable_pkey on mytable  (cost=0.56..7.15 rows=5 width=43) (actual time=0.094..0.529 rows=101 loops=1)
               Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)
         ->  Hash  (cost=0.08..0.08 rows=6 width=88) (actual time=0.019..0.020 rows=6 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6 width=88) (actual time=0.010..0.013 rows=6 loops=1)

I am not sure which approach is more efficient.
Constraints:
1. My IN list can have at most 100 elements.
2. mytable has an index on id1 and id2.

I am gravitating toward approaches 1 and 3.
(In case 3, the hash is built over the IN list, which has at most 100 elements.)

Reference: Optimizing a Postgres query with a large IN

As per the above link, case 2 and case 3 should be better, but what confuses me is: isn't the Filter less expensive than doing a hash join?

A hash join has to scan both the inner and the outer input.

But option 1 only needs the index scan on mytable plus a filter for the IN list.

1 Answer


Two revealing bits in the first query plan:

Index Scan using mytable_pkey on mytable  (cost=0.56..7.18 rows=1 width=6)
                                          (actual time=0.073..0.453 rows=6 loops=1)

rows=6. So it hardly matters how you apply the second filter. The first filter already did almost all the work! You are done here.
If the filter on the first UUID wasn't so selective, a multicolumn index on (id1, id2) might be useful.
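
For illustration, such an index could look like the sketch below (the index name is made up, and it would be redundant if mytable_pkey already covers both columns):

-- Multicolumn index covering both filter columns from the question.
CREATE INDEX mytable_id1_id2_idx ON mytable (id1, id2);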

Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)

You are storing UUID values as text, which is a big waste. Use the proper type uuid to make storage a lot smaller and everything a lot faster. (And to enforce valid UUIDs.)
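
A minimal sketch of that conversion, assuming both columns currently hold only valid UUID text (the USING clause casts the existing values; the statement rewrites the table and takes an exclusive lock while it runs):

ALTER TABLE mytable
  ALTER COLUMN id1 TYPE uuid USING id1::uuid,   -- fails if any value is not a valid UUID
  ALTER COLUMN id2 TYPE uuid USING id2::uuid;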

My old answer you refer to still isn't wrong. But it was written in 2015 for Postgres 9.4. We have Postgres 14 now, which has gotten a lot smarter.

Erwin Brandstetter