I am seeing some curious behavior in PostgreSQL 9.6, mainly with queries of this shape:
SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
a.field1 = '' AND
b.field2 > ..
ORDER BY a.timestamp DESC
LIMIT 10
EXPLAIN ANALYZE on this query returns:
Limit (cost=0.86..13968.24 rows=10 width=24) (actual time=14933.751..14933.792 rows=10 loops=1)
  ->  Nested Loop (cost=0.86..6309063.89 rows=4517 width=24) (actual time=14933.751..14933.791 rows=10 loops=1)
        ->  Index Scan Backward using idx_timestamp on stac (cost=0.43..5229344.35 rows=988799 width=24) (actual time=0.007..2885.512 rows=3535779 loops=1)
              Filter: ((field2 > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (field1 = 4))
              Rows Removed by Filter: 168
        ->  Index Scan using id_idx on scene_data (cost=0.43..1.08 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=3535779)
              Index Cond: (a.id = b.a_id)
              Filter: ((another_field >= '14000'::double precision) AND (another_field = 'Some string'::text))
              Rows Removed by Filter: 1
Planning time: 0.322 ms
Execution time: 14933.836 ms
Three ways this query became significantly faster:
- Fencing, i.e. wrapping the query in a subselect (an alternative fence is sketched after the plan below):
SELECT * FROM (SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
b.field_a = '...'
AND b.field_b >= 14000
AND a.field_a = 4
AND a.field_b > '2020-01-01T23:59:59.999999Z'
ORDER BY a.timestamp DESC) a
ORDER BY a.timestamp DESC LIMIT 10
EXPLAIN ANALYZE:
Limit (cost=300441.32..300442.57 rows=10 width=353) (actual time=323.171..325.616 rows=10 loops=1)
  ->  Gather Merge (cost=300441.32..300731.00 rows=2519 width=353) (actual time=323.169..325.610 rows=10 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort (cost=299441.31..299447.60 rows=2519 width=353) (actual time=314.241..314.244 rows=10 loops=2)
              Sort Key: table_a.timestamp DESC
              Sort Method: top-N heapsort  Memory: 27kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 35kB
              ->  Nested Loop (cost=0.99..299299.00 rows=2519 width=353) (actual time=0.117..297.406 rows=41558 loops=2)
                    ->  Parallel Index Scan using idx_table_b_field_a on table_b (cost=0.43..28730.77 rows=37301 width=16) (actual time=0.067..24.940 rows=41558 loops=2)
                          Index Cond: (b.field_a = '....'::text)
                          Filter: (b.field_b >= '14000'::double precision)
                          Rows Removed by Filter: 174
                    ->  Index Scan using table_a_pkey on table_a (cost=0.56..7.25 rows=1 width=353) (actual time=0.006..0.006 rows=1 loops=83116)
                          Index Cond: (table_a.id = table_b.a_id)
                          Filter: ((a.field_b > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (a.field_a = 4))
Planning Time: 0.728 ms
Execution Time: 325.703 ms
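From what I have read, the inner ORDER BY apparently keeps the subquery from being flattened into the outer query, so it is planned without knowledge of the LIMIT. The idiom more commonly used for such a fence is OFFSET 0 inside the subquery; a sketch of that variant, using the same anonymized column names as above:

SELECT *
FROM (SELECT a.*
      FROM a
      INNER JOIN b ON a.id = b.a_id
      WHERE b.field_a = '...'
        AND b.field_b >= 14000
        AND a.field_a = 4
        AND a.field_b > '2020-01-01T23:59:59.999999Z'
      OFFSET 0) sub   -- OFFSET 0 blocks subquery pull-up, so the subquery
                      -- is planned without seeing the outer LIMIT
ORDER BY sub.timestamp DESC
LIMIT 10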
- Adding a second field to the ORDER BY (a matching composite index is sketched after the query below):
SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
a.field1 = '' AND
b.field2 > ..
ORDER BY a.timestamp DESC, a.id DESC LIMIT 10
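My guess is that the extra sort column keeps the planner from matching the single-column idx_timestamp, so it stops walking that index backward. If we ever wanted an index-backed plan for the two-column ordering, a matching composite index would presumably look like this (hypothetical index name; a sketch, not something we have tested):

CREATE INDEX CONCURRENTLY idx_a_timestamp_id
    ON a ("timestamp" DESC, id DESC);   -- matches ORDER BY a.timestamp DESC, a.id DESC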
- Increasing the LIMIT to 500, which produces a different sort method/key:
Limit (cost=511622.95..511624.20 rows=500 width=24) (actual time=385.317..385.383 rows=500 loops=1)
  ->  Sort (cost=511622.95..511634.24 rows=4517 width=24) (actual time=385.315..385.348 rows=500 loops=1)
        Sort Key: a.timestamp DESC
        Sort Method: top-N heapsort  Memory: 64kB
        ->  Nested Loop (cost=1.12..511397.87 rows=4517 width=24) (actual time=0.028..374.960 rows=83116 loops=1)
              ->  Index Scan using index_1 on b (cost=0.56..37701.50 rows=68004 width=16) (actual time=0.019..35.351 rows=83116 loops=1)
                    Index Cond: (field_1 = 'some string'::text)
                    Filter: (field_3 >= '14000'::double precision)
                    Rows Removed by Filter: 349
              ->  Index Scan using a_pkey on stac (cost=0.56..6.96 rows=1 width=24) (actual time=0.004..0.004 rows=1 loops=83116)
                    Index Cond: (a.id = b.a_id)
                    Filter: ((field_b > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (field_a = 4))
Planning time: 0.382 ms
Execution time: 385.440 ms
But why? Why does forcing "more complexity" into the query make the planner decide "let's use a top-N heapsort instead", and end up so much quicker? I would like to understand why these variants produce faster plans before implementing any of them. My team and I also intend to migrate to a newer PostgreSQL version, but we are unsure where to focus our efforts first to get concrete, understandable results.
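One clue I can already see in the first plan: the backward index scan was estimated at rows=988799 but actually produced 3535779 rows before the join found its 10 matches. A quick way to measure how far off the combined estimate on a is, before touching the query (a sketch using the anonymized column names above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM a
WHERE a.field_a = 4
  AND a.field_b > '2020-01-01T23:59:59.999999Z';
-- compare the planner's estimated rows with the actual rows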
Edit:
SELECT *
FROM a
INNER JOIN b
    ON b.a_id = a.id                              -- 1:1 relationship
WHERE b.field_a = '...'                           -- ~83465 rows matching in b
  AND b.field_b >= 14000                          -- ~2938319 rows matching in b
  AND a.field_a = 4                               -- ~3868810 rows matching in a
  AND a.field_b > '2020-01-01T23:59:59.999999Z'   -- ~3818223 rows matching in a
ORDER BY a.timestamp DESC LIMIT 10 OFFSET 0
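Given those counts, the rows=988799 estimate for the combined filters on a in the first plan looks like a large underestimate. What the planner believes about each column can be inspected via pg_stats (a sketch, assuming the anonymized names above):

SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 'a'
  AND attname IN ('field_a', 'field_b');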
I also noticed a significant loss of performance when running simple counts on these fields once the two tables are joined, for example:
SELECT count(*) FROM a WHERE a.field_a = 4;                                -- ~0.5 s
SELECT count(*) FROM a INNER JOIN b ON a.id = b.a_id WHERE a.field_a = 4;  -- ~15 s
Also note that table_a.id and table_b.a_id (the FK) are UUID columns.
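To see where the extra ~14.5 seconds go in the joined count (presumably millions of per-row index lookups on the UUID FK), comparing both statements with buffer statistics should make it visible (a sketch):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM a WHERE a.field_a = 4;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM a INNER JOIN b ON a.id = b.a_id WHERE a.field_a = 4;
-- a much larger shared hit/read count in the second plan would point
-- at the per-row UUID index lookups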