My query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    work_token_assignment.work_id
FROM
    work_token_assignment
    JOIN token ON token.id = work_token_assignment.token_id
WHERE
    work_token_assignment.work_id = 207064
ORDER BY
    token.added_at,
    token.id
LIMIT 1;
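For reference, the plans below mention three indexes; here is a minimal schema sketch consistent with them (the column types are my guesses, not the real DDL):

-- Sketch of the schema as inferred from the index names in the plans;
-- types and constraints are assumptions:
CREATE TABLE token (
    id       bigint PRIMARY KEY,          -- token_pkey
    added_at timestamptz NOT NULL
);
CREATE INDEX ix_token_added_at ON token (added_at);

CREATE TABLE work_token_assignment (
    work_id  bigint NOT NULL,
    token_id bigint NOT NULL REFERENCES token (id),
    PRIMARY KEY (work_id, token_id)       -- work_token_assignment_pkey
);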
For most work_id values, the plan is (this particular example used work_id = 18345):
Limit  (cost=901.37..1801.70 rows=1 width=16) (actual time=49.917..49.918 rows=1 loops=1)
  Buffers: shared hit=55473
  ->  Incremental Sort  (cost=901.37..8371230.78 rows=9297 width=16) (actual time=49.915..49.916 rows=1 loops=1)
        Sort Key: token.added_at, token.id
        Presorted Key: token.added_at
        Full-sort Groups: 1  Sort Method: top-N heapsort  Average Memory: 25kB  Peak Memory: 25kB
        Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory: 25kB  Peak Memory: 25kB
        Buffers: shared hit=55473
        ->  Nested Loop  (cost=0.99..8370812.42 rows=9297 width=16) (actual time=48.799..49.852 rows=180 loops=1)
              Buffers: shared hit=55473
              ->  Index Scan using ix_token_added_at on token  (cost=0.56..1702792.66 rows=14940404 width=12) (actual time=0.015..10.580 rows=15884 loops=1)
                    Buffers: shared hit=7819
              ->  Index Only Scan using work_token_assignment_pkey on work_token_assignment  (cost=0.42..0.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=15884)
                    Index Cond: ((work_id = 18345) AND (token_id = token.id))
                    Heap Fetches: 0
                    Buffers: shared hit=47654
Planning:
  Buffers: shared hit=15
Planning Time: 0.405 ms
Execution Time: 49.956 ms
And that is pretty fast.
But for some ids, seemingly at random, the plan is:
Limit  (cost=817.04..1633.05 rows=1 width=16) (actual time=23821.261..23821.262 rows=1 loops=1)
  Buffers: shared hit=32475418 read=90739 dirtied=13663
  ->  Incremental Sort  (cost=817.04..8370622.85 rows=10257 width=16) (actual time=23821.260..23821.260 rows=1 loops=1)
        Sort Key: token.added_at, token.id
        Presorted Key: token.added_at
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
        Buffers: shared hit=32475418 read=90739 dirtied=13663
        ->  Nested Loop  (cost=0.99..8370161.28 rows=10257 width=16) (actual time=23819.997..23821.234 rows=2 loops=1)
              Buffers: shared hit=32475415 read=90739 dirtied=13663
              ->  Index Scan using ix_token_added_at on token  (cost=0.56..1702682.64 rows=14939188 width=12) (actual time=0.017..7344.302 rows=8228792 loops=1)
                    Buffers: shared hit=7789036 read=90739 dirtied=13663
              ->  Index Only Scan using work_token_assignment_pkey on work_token_assignment  (cost=0.42..0.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=8228792)
                    Index Cond: ((work_id = 207064) AND (token_id = token.id))
                    Heap Fetches: 2
                    Buffers: shared hit=24686379
Planning:
  Buffers: shared hit=15
Planning Time: 0.265 ms
Execution Time: 23821.286 ms
Which takes forever!
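The difference is visible in the scan on ix_token_added_at: the fast case read rows=15884 before the LIMIT was satisfied, the slow one read rows=8228792. My working theory (unverified) is that for these work_ids, the earliest assigned token sits very deep in added_at order, so the nested loop has to probe the assignment pkey for millions of tokens before finding a match. A query along these lines should measure that depth:

-- How many tokens precede the first token assigned to this work_id,
-- in added_at order? A large count would explain the slow nested loop.
SELECT count(*)
FROM token
WHERE added_at <= (
    SELECT min(t.added_at)
    FROM token t
    JOIN work_token_assignment wta ON wta.token_id = t.id
    WHERE wta.work_id = 207064
);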
Setting enable_incremental_sort TO off changes the broken query plan to yet another one, using a Bitmap Heap Scan - and this one is fast, because it searches the assignment table first (a per-transaction way to apply the setting is sketched after the plan):
Limit  (cost=42633.16..42633.28 rows=1 width=16) (actual time=38.210..40.668 rows=1 loops=1)
  Buffers: shared hit=50202
  ->  Gather Merge  (cost=42633.16..43630.50 rows=8548 width=16) (actual time=38.208..40.666 rows=1 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=50202
        ->  Sort  (cost=41633.14..41643.82 rows=4274 width=16) (actual time=12.396..12.397 rows=0 loops=3)
              Sort Key: token.added_at, token.id
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=50202
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Nested Loop  (cost=256.48..41611.77 rows=4274 width=16) (actual time=0.233..11.650 rows=3333 loops=3)
                    Buffers: shared hit=50174
                    ->  Parallel Bitmap Heap Scan on work_token_assignment  (cost=255.92..5821.58 rows=4274 width=8) (actual time=0.226..0.864 rows=3333 loops=3)
                          Recheck Cond: (work_id = 207064)
                          Heap Blocks: exact=124
                          Buffers: shared hit=171
                          ->  Bitmap Index Scan on work_token_assignment_pkey  (cost=0.00..253.35 rows=10257 width=0) (actual time=0.608..0.609 rows=10000 loops=1)
                                Index Cond: (work_id = 207064)
                                Buffers: shared hit=47
                    ->  Index Scan using token_pkey on token  (cost=0.56..8.37 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=10000)
                          Index Cond: (id = work_token_assignment.token_id)
                          Buffers: shared hit=50003
Planning:
  Buffers: shared hit=15
Planning Time: 0.296 ms
Execution Time: 40.709 ms
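If turning incremental sort off turns out to be the right workaround, it can be scoped to a single transaction with SET LOCAL rather than changed globally; a minimal sketch:

BEGIN;
-- Affects only this transaction; the setting reverts at COMMIT/ROLLBACK.
SET LOCAL enable_incremental_sort = off;
SELECT work_token_assignment.work_id
FROM work_token_assignment
JOIN token ON token.id = work_token_assignment.token_id
WHERE work_token_assignment.work_id = 207064
ORDER BY token.added_at, token.id
LIMIT 1;
COMMIT;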
Postgres seems to estimate correctly that there are about 10K rows with the queried work_id in the work_token_assignment table (in all the cases described above, that number is comparable).
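That estimate can be checked against the planner's statistics directly, e.g.:

-- Per-value frequency estimates the planner keeps for work_id:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'work_token_assignment'
  AND attname = 'work_id';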
Most rows in token do not have a corresponding entry in work_token_assignment; only a small minority do. And there should not be many duplicate added_at values (both claims can be verified with the query below).
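-- Sanity check (scans both tables, so it is not free):
-- how many tokens are assigned at all, and how many added_at duplicates exist.
SELECT
    (SELECT count(DISTINCT token_id) FROM work_token_assignment)  AS assigned_tokens,
    (SELECT count(*) FROM token)                                  AS total_tokens,
    (SELECT count(*) - count(DISTINCT added_at) FROM token)       AS duplicate_added_at;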