0

I am trying to optimize a text search that involves two large tables, persons and emails, and currently uses ILIKE operators. I have those two indexes:

CREATE INDEX idx_persons_full_name ON persons 
USING gin (full_name gin_trgm_ops, to_iso_date(date_created) gin_trgm_ops);

CREATE INDEX idx_emails_gin_trgm_idx ON emails USING gin (email gin_trgm_ops);

This query uses the GIN index:

explain analyze select p.id, e.id from persons p left join emails e on p.email_id=e.id 
where email ilike '%john%';
->  Bitmap Heap Scan on emails e  (cost=97.80..1000.32 rows=1265 width=4) (actual time=0.884..2.662 rows=89 loops=1)
      Recheck Cond: (email ~~* '%john%'::text)
      Heap Blocks: exact=81
      ->  Bitmap Index Scan on idx_emails_gin_trgm_idx  (cost=0.00..97.49 rows=1265 width=0) (actual time=0.829..0.829 rows=89 loops=1)
            Index Cond: (email ~~* '%john%'::text)

... however when I add a second ilike clause I end up with a sequential scan on emails:

explain analyze select p.id, e.id from persons p left join emails e on p.email_id=e.id 
where p.full_name ilike '%john%' or e.email ilike '%john%';

Hash Left Join (cost=2254.88..4527.57 rows=1598 width=8) (actual time=30.053..97.757 rows=138 loops=1) Hash Cond: (p.email_id = e.id) Filter: ((p.full_name ~~* '%john%'::text) OR (e.email ~~* '%john%'::text)) Rows Removed by Filter: 78569 -> Seq Scan on persons p (cost=0.00..2066.07 rows=78707 width=23) (actual time=0.016..4.569 rows=78707 loops=1) -> Hash (cost=1472.17..1472.17 rows=62617 width=27) (actual time=29.364..29.364 rows=62459 loops=1) Buckets: 65536 Batches: 1 Memory Usage: 4148kB -> Seq Scan on emails e (cost=0.00..1472.17 rows=62617 width=27) (actual time=0.049..13.182 rows=62459 loops=1)

I can think a a few ways to address the issue, what would you recommend?

  1. Changing something about the existing queries or indexes to get the indexes to be used
  2. bringing the e.email into the persons table (using triggers, or in a separate materialized view) and using a single gin index on that denormalized table
  3. Replacing the ILIKEs with trigram searches

The final query looks like this, with pagination on date_created & id

SELECT
  p.id AS id, p.is_staff AS is_staff, p.first_name AS first_name, p.last_name AS last_name, p.email_id AS email_id, p.address_id AS address_id, p.is_instructor AS is_instructor, p.date_created AS date_created, p.birthdate AS birthdate, p.phone AS phone, p.full_name AS full_name, p.external_id AS external_id, p.venue_id AS venue_id, p.ssn AS ssn, p.gender AS gender, e.id AS id, e.email AS email, e.confirmed AS confirmed, e.owner_id AS owner_id, e.account_id AS account_id
FROM
  persons p 
  LEFT JOIN emails e ON p.email_id = e.id
WHERE
  p.date_archived IS NULL
  AND p.date_created < '2023-06-13T08:51:00Z'::timestampwithtimezone
  AND p.id < 321482
  AND p.is_staff = false
  AND (
    p.full_name ILIKE UNACCENT('%john%')
    OR e.email ILIKE '%john%'
    OR TO_ISO_DATE(p.date_created) ILIKE '%john%'
  )
ORDER BY
  p.date_created DESC
LIMIT 40

Thanks! Franck

Franck
  • 33
  • 7

0 Answers0