EDIT: The query plans in the question body come from EXPLAIN, but as @jjanes suggested, EXPLAIN (ANALYZE, BUFFERS) may be more useful. Since the output is very large I have uploaded them here: https://gist.github.com/vr2262/ab3cfb69ac758b5161e27d9cb77ad05f
I have a query that selects records from a table by an indexed bigint column using WHERE ... IN. Up to a certain number (7835, as it happens), the query is fast (about 150 milliseconds for sequential IDs, and about 1 second for random IDs), but adding a single one more results in a different query plan, and the query takes around 150 seconds. I looked around for other answers, and the solution suggested in https://dba.stackexchange.com/a/91254 (and elsewhere) is to insert the values into an indexed temporary table and join on it. However, that actually made it a little slower.
Here is the original query:
SELECT
my_table.id AS my_table_id,
my_table.joined_table_2_id AS my_table_joined_table_2_id,
my_table.big_where_id AS my_table_big_where_id,
ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
joined_table_1.id AS joined_table_1_id,
FROM
my_table
LEFT OUTER JOIN
joined_table_a AS joined_table_1 ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN
joined_table_b AS joined_table_2 ON joined_table_2.id = my_table.joined_table_2_id
WHERE
my_table.joined_table_2_id = 1
AND my_table.big_where_id IN (1, 2, 3, ..., 7835);
... and the associated fast query plan:
Gather (cost=36576.06..15864926.71 rows=44743 width=139)
Workers Planned: 2
-> Hash Left Join (cost=35576.06..15859452.41 rows=18643 width=139)
Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
-> Nested Loop Left Join (cost=35574.99..15854534.26 rows=18643 width=246)
-> Parallel Bitmap Heap Scan on my_table (cost=35574.42..89742.05 rows=2845 width=201)
Recheck Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
-> Bitmap Index Scan on my_table_joined_table_2_id_big_where_id_key (cost=0.00..35572.71 rows=6829 width=0)
Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
-> Index Scan using ix_joined_table_a_my_table_id on joined_table_a joined_table_1 (cost=0.57..5512.89 rows=2834 width=53)
Index Cond: (my_table_id = my_table.id)
-> Hash (cost=1.05..1.05 rows=1 width=14)
-> Seq Scan on joined_table_b joined_table_2 (cost=0.00..1.05 rows=1 width=14)
Filter: (id = 1)
With one additional value of big_where_id, the query plan changes to:
Hash Left Join (cost=50982.39..15870462.06 rows=44750 width=139)
Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
-> Hash Right Join (cost=50981.33..15858658.19 rows=44750 width=246)
Hash Cond: (joined_table_1.my_table_id = my_table.id)
-> Seq Scan on joined_table_a joined_table_1 (cost=0.00..14184914.72 rows=618195072 width=53)
-> Hash (cost=50895.95..50895.95 rows=6830 width=201)
-> Index Scan using my_table_joined_table_2_id_big_where_id_key on my_table (cost=0.57..50895.95 rows=6830 width=201)
Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
-> Hash (cost=1.05..1.05 rows=1 width=14)
-> Seq Scan on joined_table_b joined_table_2 (cost=0.00..1.05 rows=1 width=14)
Filter: (id = 1)
I tried using a temporary table like so:
CREATE TEMPORARY TABLE temp_table (id INTEGER PRIMARY KEY);
INSERT INTO temp_table (id) SELECT generate_series(1, 7836);
SELECT
my_table.id AS my_table_id,
my_table.joined_table_2_id AS my_table_joined_table_2_id,
my_table.big_where_id AS my_table_big_where_id,
ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
joined_table_1.id AS joined_table_1_id,
FROM
my_table
LEFT OUTER JOIN
joined_table_a AS joined_table_1 ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN
joined_table_b AS joined_table_2 ON joined_table_2.id = my_table.joined_table_2_id
JOIN
temp_table ON my_table.big_where_id = temp_table.id
WHERE
my_table.joined_table_2_id = 1;
...but as stated it was a little slower than before. Here is the query plan (using EXPLAIN on the SELECT):
Hash Left Join (cost=126858.69..28741416.19 rows=138238 width=139)
Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
-> Hash Right Join (cost=126857.60..28706108.24 rows=138238 width=246)
Hash Cond: (joined_table_1.my_table_id = my_table.id)
-> Seq Scan on joined_table_a joined_table_1 (cost=0.00..14184914.72 rows=618195072 width=53)
-> Hash (cost=125995.86..125995.86 rows=21099 width=201)
-> Nested Loop (cost=0.57..125995.86 rows=21099 width=201)
-> Seq Scan on temp_table (cost=0.00..159.75 rows=11475 width=4)
-> Index Scan using ix_my_table_big_where_id on my_table (cost=0.57..10.95 rows=2 width=201)
Index Cond: (big_where_id = temp_table.id)
-> Hash (cost=1.04..1.04 rows=4 width=14)
-> Seq Scan on joined_table_b joined_table_2 (cost=0.00..1.04 rows=4 width=14)
Maybe a regular JOIN on the temporary table isn't right? I haven't had any luck trying other joins either, though.