36

Today Seven Databases in Seven Weeks introduced me to per-operator indexes.

You can index strings for pattern matching the previous queries by creating a text_pattern_ops operator class index, as long as the values are indexed in lowercase.

CREATE INDEX moves_title_pattern ON movies (
    (lower(title) text_pattern_ops);

We used the text_pattern_ops because the title is of type text. If you need to index varchars, chars, or names, use the related ops: varchar_pattern_ops, bpchar_pattern_ops, and name_pattern_ops.

I find the example really confusing. Why is it useful to do this?

If the column is type text, wouldn't the other types (varchar, char, name) be cast to to text before being used as a search value?

How does that index behave differently from one using the default operator?

CREATE INDEX moves_title_pattern ON movies (lower(title));
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Iain Samuel McLean Elder
  • 2,408
  • 5
  • 26
  • 39

2 Answers2

35

The documentation often gives you an answer to such questions. Like in this case, too:

The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this:

CREATE INDEX test_index ON test_table (col varchar_pattern_ops);

Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes.

The documentation goes on to say:

If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.

You can check your locale as follows (it is likely to be UTF8 rather than "C"):

postgres=> show lc_collate;
 lc_collate
-------------
 en_GB.UTF-8
Jack Douglas
  • 40,517
  • 16
  • 106
  • 178
András Váczi
  • 31,778
  • 13
  • 102
  • 151
2

Just to give a concrete example of what András Váczi mentioned, if you do:

drop table if exists tmp;
create table tmp(
    s text not null collate "en_US"
);
create index tmp_s on tmp using btree(s);
explain select * from tmp where s like 'a%';
explain select * from tmp where s ilike 'a%';
explain select * from tmp where s = 'a';

then we get:

                     QUERY PLAN                      
-----------------------------------------------------
 Seq Scan on tmp  (cost=0.00..27.00 rows=7 width=32)
   Filter: (s ~~ 'a%'::text)
(2 rows)
                   QUERY PLAN                       

Seq Scan on tmp (cost=0.00..27.00 rows=1360 width=32) Filter: (s ~~* 'a%'::text) (2 rows)

                         QUERY PLAN                             

Bitmap Heap Scan on tmp (cost=4.21..14.35 rows=7 width=32) Recheck Cond: (s = 'a'::text) -> Bitmap Index Scan on tmp_s (cost=0.00..4.21 rows=7 width=0) Index Cond: (s = 'a'::text) (4 rows)

meaning that the LIKE and ILIKE prefix queries were not sped up by that index, only the equality = was.

To also speed up LIKE (but not ILIKE) we can either add text_pattern_opts at:

create index tmp_s on tmp using btree(s text_pattern_ops);

or change the collation to C:

s text not null collate "C"

so this is where text_pattern_ops can make a difference, in accelerating LIKE prefix queries.

If you don't set COLLATION explicitly on the column, then it comes from your database default. You can check the collation of an existing column with:

\d table_name

which give something like:

               Table "public.tmp"
 Column | Type | Collation | Nullable | Default 
--------+------+-----------+----------+---------
 s      | text | en_US     | not null |

Tested on PostgreSQL 16, Ubuntu 24.04.

Related: How to optimize my PostgreSQL DB for prefix search?