
I have an SQL query that executes slowly on PostgreSQL 13.15 with 128GB of memory. The query mainly performs a Bitmap Heap Scan, and I’ve noticed that many reads are coming from the heap instead of the cache. Here’s a link to the query and execution plan.

An index that frequently appears in the plan looks like this:

create index ix_contacts__addresses_value__normalized
    on contacts__addresses using gin (company_id, field_name, field_id, value__normalized);

My memory settings:

•   shared_buffers = 32GB
•   work_mem = 64MB
•   effective_cache_size = 96GB
•   maintenance_work_mem = 1320MB

Questions:

  1. Why is the query reading so much from the heap?
  2. How can I configure PostgreSQL to better use memory (shared_buffers, work_mem, and other settings) to avoid unnecessary heap reads?
  3. Should I consider modifying the indexes to reduce execution time?

I would greatly appreciate any advice on optimizing caching and overall performance.

1 Answer


Your first two questions don't make sense, for reasons outlined by mustaccio.

For your 3rd question, multicolumn GIN indexes are not like multicolumn btree indexes. Each column has to be handled individually, and then the result combined internally. So instead of jumping just to the rows which meet all 4 conditions, it first needs to make lists of all rows satisfying each of the separate conditions and then determine which rows are on all four lists. That is a lot of work if any of the conditions are not very selective.

You would probably be better served by a multicolumn btree index over (company_id, field_name, field_id), and then possibly a GIN index over just value__normalized, although I suspect that last one is not really needed.
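As a sketch, using the table and column names from the question (the index names here are illustrative):

```sql
-- Btree index on the three equality columns, replacing the 4-column GIN:
-- the planner can descend directly to rows matching all three conditions.
create index ix_contacts__addresses_btree
    on contacts__addresses (company_id, field_name, field_id);

-- Optionally, a single-column GIN index for containment (@>) searches
-- on the JSONB value -- only if some queries still need it.
create index ix_contacts__addresses_value__normalized_gin
    on contacts__addresses using gin (value__normalized);
```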

But it might be better to incorporate the JSONB lookup into an expression btree index like (company_id, field_name, field_id, (value__normalized->>'country')). This would require you to rewrite the query to test with ->> rather than with @>.
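For example, assuming the query filters on a 'country' key inside value__normalized (the key and the literal values below are illustrative):

```sql
-- Expression btree index; note the extra parentheses required
-- around the ->> expression in the index definition.
create index ix_contacts__addresses_country
    on contacts__addresses (company_id, field_name, field_id,
                            (value__normalized->>'country'));

-- The query must then use the identical expression, e.g.:
--   where company_id = 123
--     and field_name = 'shipping'
--     and field_id = 45
--     and value__normalized->>'country' = 'US'
-- instead of:  value__normalized @> '{"country": "US"}'
```

The planner only considers an expression index when the query's WHERE clause uses the same expression, which is why the @> predicate has to be rewritten.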

jjanes