I have a table with about 80 million records. I want to find all the activities of the lists and workspaces that a user has access to. So I first get the ids of those lists and workspaces, and then I run the following query:

select *, COALESCE("origin_created_at", "created_at") AS "created_at",
  COALESCE("updated_at", "origin_updated_at") AS "updated_at" 
from "activities" 
where ("listId" in (310,214088,219,220,271,222,28434,36046,43233,38236,
  1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,
  395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,
  587679,841589,722320,551,170392,421035,197071,632736,632742,632755,
  632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,
  58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,
  158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,
  643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,
  886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,
  226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,
  323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,
  267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,
  74018) 
 or "workspaceId" in (137, 81, 111, 424284, 425935, 430658, 84, 163840, 
  3, 4, 281105, 57, 64642, 96660, 38739, 273574, 295312, 79, 213, 
  240478, 424760, 65, 36989)) 
and (("isBulk" = false or "activities"."type" = 0) 
       and "activities"."deprecated_at" is null) 
order by COALESCE("origin_created_at", "created_at") DESC, "id" desc
limit 40;

And this is the execution plan:

 Limit  (cost=2446886.55..2446886.65 rows=40 width=1002) (actual time=44452.393..44452.418 rows=40 loops=1)
   ->  Sort  (cost=2446886.55..2449439.67 rows=1021250 width=1002) (actual time=44452.391..44452.401 rows=40 loops=1)
         Sort Key: (COALESCE(origin_created_at, created_at)) DESC, id DESC
         Sort Method: top-N heapsort  Memory: 37kB
         ->  Bitmap Heap Scan on activities  (cost=37546.04..2414605.20 rows=1021250 width=1002) (actual time=1043.663..43916.385 rows=568891 loops=1)
               Recheck Cond: (("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[])) OR ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[])))
               Rows Removed by Index Recheck: 9072392
               Filter: ((deprecated_at IS NULL) AND ((NOT "isBulk") OR (type = 0)))
               Rows Removed by Filter: 113630
               Heap Blocks: exact=41259 lossy=271838
               ->  BitmapOr  (cost=37546.04..37546.04 rows=1350377 width=0) (actual time=1032.769..1032.769 rows=0 loops=1)
                     ->  Bitmap Index Scan on activities_list_id_index  (cost=0.00..17333.10 rows=617933 width=0) (actual time=118.412..118.412 rows=507019 loops=1)
                           Index Cond: ("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[]))
                     ->  Bitmap Index Scan on activities_workspace_id_index  (cost=0.00..19702.32 rows=732444 width=0) (actual time=914.355..914.355 rows=682628 loops=1)
                           Index Cond: ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[]))
 Planning time: 2.882 ms
 Execution time: 44452.871 ms
(17 rows)

As the plan shows, PostgreSQL uses a "Bitmap Heap Scan" to scan the activities, and the query is slow even though both columns are indexed. In total, there are 4 indices on the table, one for each of the following columns: type, listId, workspaceId, organizationId.

How can I make the query faster? Or is there a better way to rewrite the query?

Erwin Brandstetter
Alan
2 Answers

That query will get faster if you increase work_mem (because then there will be no more "lossy" blocks).
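
A minimal sketch of how that could be tried, assuming a session-level change is acceptable. The value 256MB is only an illustrative assumption and has to be sized against available RAM and the number of concurrent sessions; the id lists are abbreviated here for readability.

```sql
-- Check the current setting first.
SHOW work_mem;

-- Raise it for the current session only (256MB is an assumed, illustrative value).
SET work_mem = '256MB';

-- Re-run the query and check the "Heap Blocks" line of the plan:
-- the goal is that it no longer reports any lossy blocks.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   activities
WHERE  ("listId" IN (310, 214088, 219)          -- abbreviated list
        OR "workspaceId" IN (137, 81, 111))     -- abbreviated list
AND    ("isBulk" = false OR type = 0)
AND    deprecated_at IS NULL
ORDER  BY COALESCE(origin_created_at, created_at) DESC, id DESC
LIMIT  40;
```

If the plan still shows lossy blocks, the bitmap did not fit into the new work_mem either.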

The idea to first select all ids from one table and then select rows from another table based on these ids is fundamentally wrong. You should instead join the two tables and do the same work with a single query.
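
A rough sketch of what a single query could look like. The question does not show where the access information lives, so the tables list_members and workspace_members, their columns, and the user id 42 are purely hypothetical placeholders. The join is written here as two IN subqueries (semi-joins), which is one possible form, not necessarily the best one for this schema.

```sql
-- Hypothetical access tables (assumptions, not from the question):
--   list_members(list_id, user_id)
--   workspace_members(workspace_id, user_id)
SELECT a.*
FROM   activities a
WHERE  (a."listId" IN (SELECT lm.list_id
                       FROM   list_members lm
                       WHERE  lm.user_id = 42)
        OR a."workspaceId" IN (SELECT wm.workspace_id
                               FROM   workspace_members wm
                               WHERE  wm.user_id = 42))
AND    (a."isBulk" = false OR a.type = 0)
AND    a.deprecated_at IS NULL
ORDER  BY COALESCE(a.origin_created_at, a.created_at) DESC, a.id DESC
LIMIT  40;
```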

Laurenz Albe

For your query with ...

  • a very small LIMIT 40
  • and not very selective WHERE conditions ①

... this partial, multicolumn expression index might work wonders:

CREATE INDEX foo ON activities (COALESCE(origin_created_at, created_at) DESC, id DESC)
WHERE ("isBulk" = false OR type = 0) AND deprecated_at IS NULL;

① Currently, after doing a lot more work than necessary (much of it due to your undersized work_mem setting), rows=568891 qualify after the Bitmap Heap Scan, the Recheck and the Filter step. After sorting, only 40 (!) of those are returned. On average, out of 80 million rows, every 140th row qualifies.

With the new index, Postgres can just traverse the index, which matches the sort order of the query, until 40 qualifying rows are found. Postgres has to read 140 * 40 = 5600 rows on average, which should be substantially faster.
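
The estimate can be reproduced from the numbers in the question and its plan (about 80 million rows in the table, 568891 rows qualifying). This is only back-of-the-envelope arithmetic and assumes that qualifying rows are spread evenly across the sort order, which may not hold for real data.

```sql
-- Integer division, as in the rough estimate above.
SELECT 80000000 / 568891        AS rows_per_match,   -- 140
       80000000 / 568891 * 40   AS rows_to_read;     -- 5600
```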

It might pay to append "listId" and "workspaceId" to the index. That makes the index bigger (bad), but Postgres can then filter index tuples before going to the heap, where only some odd dead tuples may still have to be filtered, thus reducing heap access to an absolute minimum:

CREATE INDEX foo ON activities (COALESCE(origin_created_at, created_at) DESC, id DESC, "listId", "workspaceId")
WHERE ("isBulk" = false OR type = 0) AND deprecated_at IS NULL;

All of this might fall flat though if you are not telling the whole story.

Asides

COALESCE(updated_at, origin_updated_at) AS updated_at? Shouldn't that be COALESCE(origin_updated_at, updated_at) AS updated_at to match the logic of COALESCE(origin_created_at, created_at) AS created_at?

Consider legal, lower-case identifiers without double-quotes in Postgres to make your life easier.
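
As a sketch, the mixed-case columns seen in the question could be renamed like this. The new names are only suggestions, and any application queries that still use the quoted names would have to be updated at the same time.

```sql
-- Suggested snake_case names (assumptions); run during a maintenance window.
ALTER TABLE activities RENAME COLUMN "listId"      TO list_id;
ALTER TABLE activities RENAME COLUMN "workspaceId" TO workspace_id;
ALTER TABLE activities RENAME COLUMN "isBulk"      TO is_bulk;
```

After that, the columns can be written without double quotes, in any case, for example WHERE list_id = 310.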

Erwin Brandstetter