We currently have a PostgreSQL table that stores audit logs similar to the following:
| id | portal_id | created_at | action | caused_by_id | caused_by_type |
|---|---|---|---|---|---|
| 1 | 1 | 2019-09-19 09:39:48.827924 | user-logged-in | 1 | User |
| 1 | 1 | 2019-09-19 09:39:48.827924 | send-email | 1 | System |
The table has another 8 columns and the interface that we have allows the user to specify any combination of filters, i.e., created_at with action, action only, action with caused_by_id, etc
The problem that we have is with count(*) or count(1). We have around 8M rows and growing and it currently needs around 1 second and 900ms to respond on a 16vCPU server with 128 GB ram and around 6000 IOPS.
All other queries are fine since we have pagination in place that the user is not able to paginate more than 200 records.
We got a slow query and we optimized it to the level that EXPLAIN (ANALYZE, BUFFERS) shows that an INDEX ONLY SCAN has been used.
Please keep in mind that:
- Every time that a user searches with a filter we should have in place the appropriate
btreeindex to achieve an index only scan other wise the query is beyond slow. Creating all combinations as index is only impossible due to the fact of column ordering while creating the index. - Our index might end up having exactly the same size as the main table.
Questions:
- Is there a way to optimize the aggregation
COUNTquery to give us the results faster or we should move to a different infrastructure? - How do big companies like Salesforce / Freshdesk give you results even on million of records in sub-second time while applying any kind of filters? Are they using a different kind of DB?
Additional details:
PostgreSQL Version: 13.7
Server: 16vCPU - 128 GB RAM - 6400 IOPS
Platform: Microsoft Azure Flexible Server
Table audit
create table audit (
id bigint,
portal_id integer,
action varchar(255),
caused_by_id integer,
caused_by_type varchar(255),
created_at timestamp
);
Indexes:
We tried multiple combinations. A sample will be the following:
create index audit_caused_by_id_portal_id
on audit_2 (caused_by_id, portal_id);
Query:
select count(1)
from (select "portal_id"
from "crm"."audit"
where (("portal_id" = 1) and (not (lower("action") = 'send-email')))
and (not ("caused_by_id" in (1)))) x2;
EXPLAIN (ANALYZE, BUFFERS) result here.