2

I'm using PostgreSQL 11. I have a table with ~500к records and its size is around 2GB. If I'm running this query:

SELECT DISTINCT point.id
FROM "buildings_point" as point
    LEFT OUTER JOIN "equipment_equipment" as equipment ON (point.equipment_id = equipment.id)
WHERE (
    equipment.id = 4
    OR point.id = 4
);

And I do have both indexes in the buildings_point table:

"buildings_point_pkey" PRIMARY KEY, btree (id)
"buildings_point_997b9956" btree (equipment_id)

For the equipment_equipment table I do have index by id:

"equipment_equipment_pkey" PRIMARY KEY, btree (id)

But the scheduler ran: Parallel Seq Scan on buildings_point point which took about 18 sec to walk through all 500к records. If I remove any of equipment.id = 4 or point.id = 4 query will take around 5 ms. Why does the scheduler not use both indexes?

Explain analyze output:

   ->  Sort  (cost=175133.54..175133.59 rows=19 width=4) (actual time=19163.682..19181.748 rows=3 loops=1)
         Sort Key: point.id
         Sort Method: quicksort  Memory: 25kB
         ->  Gather  (cost=2307.62..175133.14 rows=19 width=4) (actual time=3958.040..19181.729 rows=3 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Hash Left Join  (cost=1307.62..174131.24 rows=8 width=4) (actual time=8205.581..19155.693 rows=1 loops=3)
                     Hash Cond: (point.equipment_id = equipment.id)
                     Filter: ((equipment.id = 4) OR (point.id = 4))
                     Rows Removed by Filter: 218243
                     ->  Parallel Seq Scan on buildings_point point  (cost=0.00..172111.33 rows=271333 width=8) (actual time=4.661..18838.772 rows=218244 loops=3)
                     ->  Parallel Hash  (cost=1116.56..1116.56 rows=15285 width=4) (actual time=52.361..52.364 rows=12229 loops=3)
                           Buckets: 65536  Batches: 1  Memory Usage: 1984kB
                           ->  Parallel Index Only Scan using buildings_equipment_pkey on equipment_equipment equipment  (cost=0.29..1116.56 rows=15285 width=4) (actual time=0.848..34.102 rows=12229 loops=3)
                                 Heap Fetches: 176
 Planning Time: 0.275 ms
 Execution Time: 19182.162 ms
valex
  • 123
  • 5

3 Answers3

2

To put it all together, based on these assumptions:

  • Referential integrity is enforced with a FK constraint, so you don't need to join to equipment_equipment at all.

  • Both predicates are selective enough to benefit from an index. (Since buildings_point.id is the PK, that's at least true for id = 4, which returns exactly one row.)

Rewrite the ugly OR with UNION to make it faster:

SELECT id
FROM   buildings_point
WHERE  id = 4

UNION SELECT id FROM buildings_point WHERE equipment_id = 4;

UNION removes a duplicate id 4 if there should be one.

About the "ugly OR":

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
1

You can just change equipment.id = 4 into equipment_id = 4, and then you wouldn't need the join at all.

jjanes
  • 42,332
  • 3
  • 44
  • 54
0

Solution to such cases (OR in WHERE clause) can be UNION operator - in that case query planner will use Index scan for both queries.

valex
  • 123
  • 5