I have a table structured as follows:
| legal_entity_id (text) | solution_string (text) | total_score (bigint) |
|---|---|---|
| asjY4474dgd | Server | 9 |
I need to determine, for each solution and company provided by users, how many companies have expressed interest in that solution (you can assume that if a company is listed in this table, it has interest in at least one solution). Here’s an example of the desired output:
| solution_string | count |
|---|---|
| Server | 315 |
| Automotive | 256 |
The table contains approximately 2.5 million rows, and I need the query to execute in under 100ms for 95% of the cases, when querying for 25,000 companies and 20 solutions. Indexes are already in place on both legal_entity_id and solution_string columns. However, there is no primary key for this table (edit: a primary key has been added to legal_entity_id, solution_name). The current query being used is as follows:
SELECT solution_string, COUNT(*)
FROM intent_signals
WHERE legal_entity_id IN (id1, id2, ...) AND solution_string in (solution_1, solution_2, ...)
GROUP BY solution_string;
The query is extremely slow, and we have no idea how to optimize it. Currently, it takes an average of 2.8 seconds to complete with 5000 legal_entity_ids.
Edit: additional info
PostgreSQL 10.22 (Ubuntu 10.22-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
| QUERY PLAN |
|---|
| Finalize GroupAggregate (cost=4888296.66..4888317.51 rows=27 width=26) (actual time=5830.528..5848.222 rows=2 loops=1) |
| Group Key: solution_string |
| -> Gather Merge (cost=4888296.66..4888316.83 rows=81 width=26) (actual time=5830.303..5848.206 rows=8 loops=1) |
| Workers Planned: 3 |
| Workers Launched: 3 |
| -> Partial GroupAggregate (cost=4887296.62..4887307.28 rows=27 width=26) (actual time=5807.789..5807.904 rows=2 loops=4) |
| Group Key: solution_string |
| -> Sort (cost=4887296.62..4887300.08 rows=1385 width=18) (actual time=5807.701..5807.759 rows=955 loops=4) |
| Sort Key: solution_string |
| Sort Method: quicksort Memory: 116kB |
| -> Parallel Seq Scan on intent_signals (cost=0.00..4887224.35 rows=1385 width=18) (actual time=0.149..5807.172 rows=955 loops=4) |
| Filter: ((solution_string = ANY ('{Automotive,"Banking Other",Financial}'::text[])) AND (legal_entity_id = ANY ('{0010800002nSsILAA0,}'::text[]))) |
| Rows Removed by Filter: 600911 |
| Planning time: 36.703 ms |
| Execution time: 5848.318 ms |