Fulltext queries against this database (storing RT (Request Tracker) tickets) seem to be taking a very long time to execute. The attachments table (containing the fulltext data) is about 15GB.
The database schema is as follows, it's about 2 million rows:
rt4=# \d+ attachments
Table "public.attachments"
Column | Type | Modifiers | Storage | Description
-----------------+-----------------------------+----------------------------------------------------------+----------+-------------
id | integer | not null default nextval('attachments_id_seq'::regclass) | plain |
transactionid | integer | not null | plain |
parent | integer | not null default 0 | plain |
messageid | character varying(160) | | extended |
subject | character varying(255) | | extended |
filename | character varying(255) | | extended |
contenttype | character varying(80) | | extended |
contentencoding | character varying(80) | | extended |
content | text | | extended |
headers | text | | extended |
creator | integer | not null default 0 | plain |
created | timestamp without time zone | | plain |
contentindex | tsvector | | extended |
Indexes:
"attachments_pkey" PRIMARY KEY, btree (id)
"attachments1" btree (parent)
"attachments2" btree (transactionid)
"attachments3" btree (parent, transactionid)
"contentindex_idx" gin (contentindex)
Has OIDs: no
I can query the database on it's own very quickly (<1s) with a query such as:
select objectid
from attachments
join transactions on attachments.transactionid = transactions.id
where contentindex @@ to_tsquery('frobnicate');
However, when RT runs a query that's supposed to perform a fulltext index search on the same table, it usually takes hundreds of seconds to complete. The query analyze output is as follows:
Query
SELECT COUNT(DISTINCT main.id)
FROM Tickets main
JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = 'RT::Ticket' )
AND ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (main.Status != 'deleted')
AND ( ( ( Attachments_2.ContentIndex @@ plainto_tsquery('frobnicate') ) ) )
AND (main.Type = 'ticket')
AND (main.EffectiveId = main.id);
EXPLAIN ANALYZE output
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=51210.60..51210.61 rows=1 width=4) (actual time=477778.806..477778.806 rows=1 loops=1)
-> Nested Loop (cost=0.00..51210.57 rows=15 width=4) (actual time=17943.986..477775.174 rows=4197 loops=1)
-> Nested Loop (cost=0.00..40643.08 rows=6507 width=8) (actual time=8.526..20610.380 rows=1714818 loops=1)
-> Seq Scan on tickets main (cost=0.00..9818.37 rows=598 width=8) (actual time=0.008..256.042 rows=96990 loops=1)
Filter: (((status)::text 'deleted'::text) AND (id = effectiveid) AND ((type)::text = 'ticket'::text))
-> Index Scan using transactions1 on transactions transactions_1 (cost=0.00..51.36 rows=15 width=8) (actual time=0.102..0.202 rows=18 loops=96990)
Index Cond: (((objecttype)::text = 'RT::Ticket'::text) AND (objectid = main.id))
-> Index Scan using attachments2 on attachments attachments_2 (cost=0.00..1.61 rows=1 width=4) (actual time=0.266..0.266 rows=0 loops=1714818)
Index Cond: (transactionid = transactions_1.id)
Filter: (contentindex @@ plainto_tsquery('frobnicate'::text))
Total runtime: 477778.883 ms
As far as I can tell, the issue appears to be that it's not using the index created on the contentindex field (contentindex_idx), rather it's doing a filter on a large number of matching rows in the attachments table. The row counts in the explain output also appear to be wildly inaccurate, even after a recent ANALYZE: estimated rows=6507 actual rows=1714818.
I'm not really sure where to go next with this.