6

I use Postgres 13 and have a table defined with the following DDL:

CREATE TABLE item_codes (
    code    bytea                    NOT NULL,
    item_id bytea                    NOT NULL,
    time    TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (item_id, code)
);

CREATE INDEX ON item_codes (code, time, item_id);

I use the following query:

SELECT DISTINCT time, item_id
FROM (
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
       ORDER BY time, item_id
       LIMIT 100)
       UNION ALL
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
       ORDER BY time, item_id
       LIMIT 100)
     ) AS items
ORDER BY time, item_id
LIMIT 100;

The code is generated dynamically and the number of subqueries with UNION ALL depends on how many different code values are needed. It can get pretty long.

Naively rewriting the query to, what I think, is equivalent

SELECT DISTINCT time, item_id
FROM item_codes
WHERE code IN ('\x3965623166306238383033393437613338373162313934383034366139653239',
                  '\x3836653432356638366638636338393364373935343938303233343363373561')
ORDER BY time, item_id
LIMIT 100

makes it many times slower and unacceptable.

Two main questions:

  1. Is it possible to rewrite the original query in a more concise way without duplicating subqueries for each code value while still keeping a fast execution plan?

  2. Why can Postrgres not optimize the second query? Am I missing something and it's not an equivalent?

Query plan of the original query with UNIONs:

Limit  (cost=1.12..7.33 rows=100 width=41)
       ->  Merge Append  (cost=1.12..13.53 rows=200 width=41)
        Sort Key: btc_tx_addresses.tx_time, btc_tx_addresses.tx_id"
        ->  Limit  (cost=0.56..4.76 rows=100 width=41)
              ->  Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses  (cost=0.56..59576.94 rows=1417576 width=41)
                    Index Cond: (address = '\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
        ->  Limit  (cost=0.56..4.76 rows=100 width=41)
              ->  Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses btc_tx_addresses_1  (cost=0.56..60389.61 rows=1436923 width=41)
                    Index Cond: (address = '\x3836653432356638366638636338393364373935343938303233343363373561'::bytea)

Query plan of the slow query:

Limit  (cost=411977.37..411978.97 rows=100 width=41)
     ->  Unique  (cost=411977.37..433386.12 rows=1338843 width=41)
        ->  Sort  (cost=411977.37..419113.62 rows=2854500 width=41)
              Sort Key: time, item_id
              ->  Index Only Scan using item_codes_code_time_item_id_idx on item_codes  (cost=0.56..105906.37 rows=2854500 width=41)
                    Index Cond: (code = ANY ('{"\\x3965623166306238383033393437613338373162313934383034366139653239","\\x3836653432356638366638636338393364373935343938303233343363373561"}'::bytea[]))
JIT:
  Functions: 4
  Options: Inlining false, Optimization false, Expressions true, Deforming true
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633

1 Answers1

9

Q1. Alternative without repeating subqueries

Possible. Provide a set of input values, and then attach a LATERAL subquery:

SELECT DISTINCT time, item_id
FROM   unnest('{\\x3965623166306238383033393437613338373162313934383034366139653239
              , \\x3836653432356638366638636338393364373935343938303233343363373561}'::bytea[]) c(code)
CROSS  JOIN LATERAL (
   SELECT time, item_id
   FROM   item_codes ic
   WHERE  ic.code = c.code
   ORDER  BY 1, 2
   LIMIT  100
   ) ic
ORDER  BY 1, 2
LIMIT  100;

I am unnesting an input array to provide the set. As an aside: escape \ with \ inside the array literal, or use an ARRAY constructor instead:

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'
    , '\x3836653432356638366638636338393364373935343938303233343363373561']::bytea[]

Or:

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
    , '\x3836653432356638366638636338393364373935343938303233343363373561']

Alternatively, a VALUES expression does the trick, too:

SELECT DISTINCT time, item_id
FROM  (
   VALUES
     ('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
   , ('\x3836653432356638366638636338393364373935343938303233343363373561')
   ) c(code)
CROSS  JOIN LATERAL ( ...

Q2. Why?

Because Postgres just doesn't have this kind of index skip scan implemented as query plan, yet. So we have to shoehorn it in.

Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633