I have configured trigram index for fuzzy search in Postgres. I inserted 8500000 rows to this table with 4 words text.
CREATE DATABASE people;
\connect people;
CREATE TABLE person
(
id bigserial NOT NULL,
name text NULL
);
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION pg_trgm;
CREATE INDEX name_trigram_idx ON person USING gin (name gin_trgm_ops);
INSERT INTO person (name)
SELECT CONCAT(g.name, ' ', g.name, ' ', g.name, ' ', (SELECT array_to_string(
array(select substr(
'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
((random() * (36 - 1) + 1)::integer),
1)
from generate_series(1, 10)), '')))
FROM generate_series(1, 8500000) AS g (name);
Now I do a fuzzy search:
SELECT * FROM person WHERE name % '1200000'
And have results:
id | name
---------+------------------------------------
12000 | 12000 12000 12000 CLXBHQVEKC
120000 | 120000 120000 120000 CLXBHQVEKC
1200000 | 1200000 1200000 1200000 CLXBHQVEKC
1202000 | 1202000 1202000 1202000 CLXBHQVEKC
1212000 | 1212000 1212000 1212000 CLXBHQVEKC
(5 rows)
It takes about 1 second. That's good but I would like to have it faster (300ms).
What's are possible ways for optimization the speed?
Here is explain-analyze request:
EXPLAIN (ANALYZE) SELECT * FROM person WHERE name % '1200000'
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on person (cost=201.87..24804.13 rows=8499 width=42) (actual time=45.135..972.583 rows=5 loops=1)
Recheck Cond: (name % '1200000'::text)
Rows Removed by Index Recheck: 147980
Heap Blocks: exact=25286
-> Bitmap Index Scan on name_trigram_idx (cost=0.00..199.74 rows=8499 width=0) (actual time=40.833..40.840 rows=147985 loops=1)
Index Cond: (name % '1200000'::text)
Planning Time: 1.538 ms
Execution Time: 973.046 ms
(8 rows)
And a link to my test repo.
Generally speaking, I would like to have a typo-proof search for a text with several words which has about 9 million rows and it shouldn't respond longer than 300ms.
P.S. for this experiment I'm running Postgres in docker with 6 CPUs, 8GB memory and 1GB of swap.