I use Postgres 13 and have a table defined with the following DDL:
CREATE TABLE item_codes (
code bytea NOT NULL,
item_id bytea NOT NULL,
time TIMESTAMP WITH TIME ZONE NOT NULL,
PRIMARY KEY (item_id, code)
);
CREATE INDEX ON item_codes (code, time, item_id);
I use the following query:
SELECT DISTINCT time, item_id
FROM (
(SELECT time, item_id
FROM item_codes
WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
ORDER BY time, item_id
LIMIT 100)
UNION ALL
(SELECT time, item_id
FROM item_codes
WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
ORDER BY time, item_id
LIMIT 100)
) AS items
ORDER BY time, item_id
LIMIT 100;
The code is generated dynamically and the number of subqueries with UNION ALL depends on how many different code values are needed. It can get pretty long.
Naively rewriting the query to, what I think, is equivalent
SELECT DISTINCT time, item_id
FROM item_codes
WHERE code IN ('\x3965623166306238383033393437613338373162313934383034366139653239',
'\x3836653432356638366638636338393364373935343938303233343363373561')
ORDER BY time, item_id
LIMIT 100
makes it many times slower and unacceptable.
Two main questions:
Is it possible to rewrite the original query in a more concise way without duplicating subqueries for each
codevalue while still keeping a fast execution plan?Why can Postrgres not optimize the second query? Am I missing something and it's not an equivalent?
Query plan of the original query with UNIONs:
Limit (cost=1.12..7.33 rows=100 width=41)
-> Merge Append (cost=1.12..13.53 rows=200 width=41)
Sort Key: btc_tx_addresses.tx_time, btc_tx_addresses.tx_id"
-> Limit (cost=0.56..4.76 rows=100 width=41)
-> Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses (cost=0.56..59576.94 rows=1417576 width=41)
Index Cond: (address = '\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
-> Limit (cost=0.56..4.76 rows=100 width=41)
-> Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses btc_tx_addresses_1 (cost=0.56..60389.61 rows=1436923 width=41)
Index Cond: (address = '\x3836653432356638366638636338393364373935343938303233343363373561'::bytea)
Query plan of the slow query:
Limit (cost=411977.37..411978.97 rows=100 width=41)
-> Unique (cost=411977.37..433386.12 rows=1338843 width=41)
-> Sort (cost=411977.37..419113.62 rows=2854500 width=41)
Sort Key: time, item_id
-> Index Only Scan using item_codes_code_time_item_id_idx on item_codes (cost=0.56..105906.37 rows=2854500 width=41)
Index Cond: (code = ANY ('{"\\x3965623166306238383033393437613338373162313934383034366139653239","\\x3836653432356638366638636338393364373935343938303233343363373561"}'::bytea[]))
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true