
I've got the SQL query below that runs extremely slowly. The slowness is due to the "ORDER BY" clause, since Postgres scans the changes table by "counter", which can have millions of values. Removing the "ORDER BY" clause makes the query fast.

I previously optimised a similar query by creating an index on two fields. For this query, however, I'm not sure what the right index would be. I tried an index on (item_id, counter), but it didn't help at all, and I don't know what else I could try. Any suggestions?
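
The index I tried was roughly the following (the index name here is just a placeholder):

CREATE INDEX changes_item_id_counter_index ON changes (item_id, counter);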

Slow SQL query:

SELECT "id", "item_id", "item_name", "type", "updated_time", "counter"
FROM "changes"
WHERE counter > -1
AND type = 2
AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'xxxx')
ORDER BY "counter" ASC
LIMIT 200;

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) result:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1001.15..27628.99 rows=200 width=99) (actual time=98730.912..116273.818 rows=200 loops=1)
   Buffers: shared hit=78113369 read=3224064 dirtied=3
   I/O Timings: read=137436.119
   ->  Gather Merge  (cost=1001.15..10431526.45 rows=78343 width=99) (actual time=98730.911..116273.783 rows=200 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=78113369 read=3224064 dirtied=3
         I/O Timings: read=137436.119
         ->  Nested Loop  (cost=1.13..10421483.70 rows=32643 width=99) (actual time=98493.185..112919.559 rows=75 loops=3)
               Buffers: shared hit=78113369 read=3224064 dirtied=3
               I/O Timings: read=137436.119
               ->  Parallel Index Scan using changes_pkey on changes  (cost=0.56..5949383.56 rows=6197986 width=99) (actual time=1.076..42523.117 rows=4075591 loops=3)
                     Index Cond: (counter > '-1'::integer)
                     Filter: (type = 2)
                     Rows Removed by Filter: 10370914
                     Buffers: shared hit=18993521 read=2672415
                     I/O Timings: read=85551.814
               ->  Index Scan using user_items_item_id_index on user_items  (cost=0.56..0.72 rows=1 width=23) (actual time=0.017..0.017 rows=0 loops=12226772)
                     Index Cond: ((item_id)::text = (changes.item_id)::text)
                     Filter: ((user_id)::text = 'xxxx'::text)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=59119848 read=551649 dirtied=3
                     I/O Timings: read=51884.305
 Settings: effective_cache_size = '16179496kB', jit = 'off', work_mem = '100000kB'
 Planning Time: 1.465 ms
 Execution Time: 116273.929 ms
(26 rows)

Indexes:

"changes_pkey" PRIMARY KEY, btree (counter)
"changes_id_index" btree (id)
"changes_id_unique" UNIQUE CONSTRAINT, btree (id)
"changes_item_id_index" btree (item_id)
"changes_user_id_counter_index" btree (user_id, counter)
"changes_user_id_index" btree (user_id)
laurent

3 Answers


You should rewrite your query to:

SELECT "id", c."item_id", "item_name", "type", "updated_time", "counter"
FROM "changes" c JOIN (SELECT item_id FROM user_items WHERE user_id = 'xxxx') ui
ON c.item_id = ui.item_id
WHERE counter > -1
AND type = 2
ORDER BY "counter" ASC
LIMIT 200;

With the indexes

  changes (type, item_id, counter) INCLUDE (id, item_name, updated_time)
  user_items (user_id)

that should give the query some speed.

The join is usually faster than the IN subquery.

A combined index on changes that includes the three columns used in the ON and WHERE clauses should increase the speed on its own.

The same goes for user_items, where user_id should also have an index if it doesn't already have one.
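
As a sketch, assuming PostgreSQL 11 or later for INCLUDE and with illustrative index names, the suggested indexes could be created like this:

  CREATE INDEX changes_type_item_id_counter_idx
      ON changes (type, item_id, counter)
      INCLUDE (id, item_name, updated_time);

  CREATE INDEX user_items_user_id_idx
      ON user_items (user_id);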

nbk

The cause of the problem is the following: the optimizer thinks that enough rows in changes are related to a user_items row with the correct user_id that it can quickly find 200 results by scanning changes in counter order and discarding rows that don't satisfy the condition. However, it has to scan over 10 million rows before it has enough results, which takes very long. The cause might well be that all matching changes have rather high counter values.

There is very little you can do about that:

  • You can speed up the inner index scan as much as possible, like the other answers suggest.

  • You can change the ORDER BY so that PostgreSQL cannot use its preferred strategy:

    ORDER BY counter + 0
    

    Perhaps the resulting execution plan is faster.
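
Applied to the query in the question, that trick would look like this (a sketch; only the ORDER BY changes, and the "+ 0" keeps the planner from using the changes_pkey index to deliver the sort order):

SELECT "id", "item_id", "item_name", "type", "updated_time", "counter"
FROM "changes"
WHERE counter > -1
AND type = 2
AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'xxxx')
ORDER BY "counter" + 0 ASC
LIMIT 200;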

Laurenz Albe

It looks like the following indexes would work for you.

The idea is to put the equality predicates first, then the join/sorting/inequality predicates, and then add the remaining columns as INCLUDE.

changes (type, counter) INCLUDE (id, item_id, item_name, updated_time)
user_items (user_id, item_id)
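
Written out as DDL (index names are just examples), that would be roughly:

CREATE INDEX changes_type_counter_idx
    ON changes (type, counter)
    INCLUDE (id, item_id, item_name, updated_time);

CREATE INDEX user_items_user_id_item_id_idx
    ON user_items (user_id, item_id);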

Another option, depending on the cardinality of the join (how many rows match), is:

changes (type, item_id, counter) INCLUDE (id, item_name, updated_time)
Charlieface