4

In PostgreSQL 9.4, with following schema:

CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT,
    junk CHAR(1000)
);

INSERT INTO people(id, name)
 SELECT generate_series(1,100000), md5(random()::text);

CREATE INDEX ON people (name text_pattern_ops);

if I search by name, the index is used:

test=# explain analyze select id, name from people where name like 'a%';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on people  (cost=248.59..1160.92 rows=6061 width=37) (actual time=2.412..8.340 rows=6271 loops=1)
   Filter: (name ~~ 'a%'::text)
   Heap Blocks: exact=834
   ->  Bitmap Index Scan on people_name_idx  (cost=0.00..247.08 rows=6266 width=0) (actual time=2.123..2.123 rows=6271 loops=1)
         Index Cond: ((name ~>=~ 'a'::text) AND (name ~<~ 'b'::text))
 Planning time: 0.600 ms
 Execution time: 8.991 ms

But if I replace TEXT with CITEXT:

CREATE EXTENSION CIText;

CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name CITEXT,
    junk CHAR(1000)
);

The index is not used anymore:

test=# explain analyze select id, name from people where name like 'a%';
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..2084.00 rows=500 width=36) (actual time=5.700..152.572 rows=6305 loops=1)
   Filter: (name ~~ 'a%'::citext)
   Rows Removed by Filter: 93695
 Planning time: 0.764 ms
 Execution time: 153.046 ms

According to CITEXT PostgreSQL documentation, the behavior should be as with TEXT:

Otherwise, it behaves almost exactly like text.

How can I tell PostgreSQL to use the index?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
José Luis
  • 215
  • 1
  • 3
  • 7

1 Answers1

7

Index usage with text_pattern_ops (as well as with the default operator class when using the C locale) depends on the binary representation of character data. citext stores original values with the case preserved, so there must be a problem with that ...

Like you commented, the actual reason is burried in collation support.

Either way, citext or text, you can make it work with an expression index:

CREATE INDEX people_name_idx ON people (lower(name) text_pattern_ops);

And a corresponding query:

SELECT id, name FROM people WHERE lower(name) LIKE 'abc%';

Note that lower(name) returns data type text, even when feeding it citext.

Alternatively, you could use a trigram index, which is more expensive to maintain, but also offers more capabilities:

Aside: your test case is suboptimal because your dummy values are all lower case to begin with and the pattern 'a%' is often not selective enough to use an index at all. And char(1000) doesn't make sense (even though irrelevant to the test).

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633