My use case is I am trying to select rows based on a list of known text values. For example, I have known text values as 'Frasier' and 'Cheers' and I want to select some rows that match the values.
EXPLAIN
WITH lcaseid AS (
SELECT lower(icaseid)
FROM (
VALUES ('Frasier'),
('Big brother')) AS i (icaseid)
)
SELECT *
FROM products.catalog
WHERE lower(id)
in (
SELECT *
FROM lcaseid);
Hash Semi Join (cost=0.10..12.76 rows=2 width=587)
Hash Cond: (lower(catalog.id) = lcaseid.lower)
CTE lcaseid
-> Values Scan on ""*VALUES*"" (cost=0.00..0.03 rows=2 width=32)
-> Seq Scan on catalog (cost=0.00..12.29 rows=129 width=587)
-> Hash (cost=0.04..0.04 rows=2 width=32)
-> CTE Scan on lcaseid (cost=0.00..0.04 rows=2 width=32)
Why is Postgres doing Sequential Scan here? There is an index on lower(id):
CREATE UNIQUE INDEX lower_case_id ON products.catalog ((lower(id)));
When I do this instead, Postgres shows an Index Scan:
EXPLAIN
SELECT
*
FROM
products.catalog
WHERE
lower(id)
in('frasier');
Index Scan using lower_case_id on catalog (cost=0.14..8.16 rows=1 width=587)
Index Cond: (lower(id) = 'frasier'::text)
The problem with the second approach is that I have to have the text values to be in lowercase because I can use them in the in clause. The first approach solves this problem by lower-casing when creating the temporary table lcaseid.
My guess is the temporary table is to blame for the sequential scan? Because Postgres does not have statistics on the temporary table, it simply does a sequential scan.
Also how should I go about solving the problem of selecting based on lowercase text values optimally?
Table definition:
CREATE TABLE "products"."catalog" (
"id" text NOT NULL,
"is_original" bool NOT NULL,
"image_url_1" text NOT NULL,
"image_url_2" text NOT NULL,
"image_url_3" text NOT NULL,
"image_url_4" text NOT NULL,
PRIMARY KEY ("id")
);
CREATE UNIQUE INDEX lower_case_id ON products.catalog ((lower(id)));
Postgres version: 9.6.12