1

I asked this question a few years ago and got an amazing answer that solved all of my problems.

However, I also have a very similar query on my Postgres 9.6.3 data that does not perform nearly as well.

Here's the query that is doing great:

EXPLAIN ANALYZE SELECT posts.* FROM unnest('{39303,39302,39304,70255,70256}'::int[]) s(source_id), LATERAL (SELECT "posts".* FROM "posts" WHERE (source_id = s.source_id) AND posts.deleted_at IS NULL ORDER BY posts.position, posts.external_created_at desc) posts ORDER BY posts.position, posts.external_created_at desc;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=356674.93..356771.23 rows=192600 width=1061) (actual time=2.715..2.731 rows=357 loops=1)
   Sort Key: posts."position", posts.external_created_at
   Sort Method: quicksort  Memory: 257kB
   ->  Nested Loop  (cost=2880.92..289343.79 rows=192600 width=1061) (actual time=0.545..1.938 rows=357 loops=1)
         ->  Function Scan on unnest s  (cost=0.00..0.30 rows=100 width=4) (actual time=0.007..0.008 rows=5 loops=1)
         ->  Sort  (cost=2880.92..2881.88 rows=1926 width=1061) (actual time=0.359..0.362 rows=71 loops=5)
               Sort Key: posts."position", posts.external_created_at
               Sort Method: quicksort  Memory: 27kB
               ->  Index Scan using index_posts_on_source_id on posts  (cost=0.11..2859.90 rows=1926 width=1061) (actual time=0.048..0.253 rows=71 loops=5)
                     Index Cond: (source_id = s.source_id)
                     Filter: (deleted_at IS NULL)
                     Rows Removed by Filter: 165
 Planning time: 0.207 ms
 Execution time: 2.793 ms
(14 rows)

And here's the similar query that I think needs an index:

EXPLAIN ANALYZE SELECT posts.* FROM unnest('{70256, 70255, 39304, 39303, 39302}'::int[]) s(source_id), LATERAL (SELECT  "posts".* FROM "posts" WHERE (source_id = s.source_id) AND ("posts"."deleted_at" IS NOT NULL) AND "posts"."rejected_at" IS NULL ORDER BY posts.external_created_at desc LIMIT 100) posts ORDER BY posts.external_created_at desc LIMIT 100 OFFSET 0;
                                                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15879.66..15879.71 rows=100 width=1061) (actual time=325.798..325.814 rows=84 loops=1)
   ->  Sort  (cost=15879.66..15884.66 rows=10000 width=1061) (actual time=325.798..325.806 rows=84 loops=1)
         Sort Key: posts.external_created_at
         Sort Method: quicksort  Memory: 68kB
         ->  Nested Loop  (cost=0.11..15803.23 rows=10000 width=1061) (actual time=1.647..325.634 rows=84 loops=1)
               ->  Function Scan on unnest s  (cost=0.00..0.30 rows=100 width=4) (actual time=0.010..0.015 rows=5 loops=1)
               ->  Limit  (cost=0.11..157.43 rows=100 width=1061) (actual time=36.246..65.110 rows=17 loops=5)
                     ->  Index Scan using index_posts_on_source_id_and_external_created_at on posts  (cost=0.11..841.76 rows=535 width=1061) (actual time=36.244..65.105 rows=17 loops=5)
                           Index Cond: (source_id = s.source_id)
                           Filter: (rejected_at IS NULL)
                           Rows Removed by Filter: 148
 Planning time: 12.438 ms
 Execution time: 325.893 ms

As for my setup:

CREATE TABLE posts (
    id integer NOT NULL,
    source_id integer,
    message text,
    image text,
    external_id text,
    created_at timestamp without time zone,
    updated_at timestamp without time zone,
    external text,
    like_count integer DEFAULT 0 NOT NULL,
    comment_count integer DEFAULT 0 NOT NULL,
    external_created_at timestamp without time zone,
    deleted_at timestamp without time zone,
    poster_name character varying(255),
    poster_image text,
    poster_url character varying(255),
    poster_id text,
    position integer,
    location character varying(255),
    description text,
    video text,
    rejected_at timestamp without time zone,
    deleted_by character varying(255),
    height integer,
    width integer
);

CREATE INDEX index_posts_on_source_id_and_external_created_at 
    ON posts USING btree (source_id, external_created_at DESC) 
    WHERE deleted_at IS NOT NULL;

Postgres memory settings:

name, setting, unit
'default_statistics_target','100',''
'effective_cache_size','16384','8kB'
'maintenance_work_mem','16384','kB'
'max_connections','100',''
'random_page_cost','4',NULL
'seq_page_cost','1',NULL
'shared_buffers','16384','8kB'
'work_mem','1024','kB'

Database stats:

Total Posts: 20,997,027
Posts where deleted_at is null: 15,665,487
Distinct source_id's: 22,245
Max number of rows per single source_id: 1,543,950
Min number of rows per single source_id: 1
Most source_ids in a single query: 21
Distinct external_created_at: 11,146,151

Any recommendations as to how to structure my index?

goddamnyouryan
  • 239
  • 2
  • 9

1 Answers1

2

This index might help:

CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts 
  (source_id, external_created_at)
  WHERE deleted_at IS NOT NULL AND rejected_at IS NULL

will probably help.

If you have other similar queries but where the rejected_at test is inverted, then maybe:

CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts 
  (source_id, rejected_at, external_created_at)
  WHERE deleted_at IS NOT NULL
jjanes
  • 42,332
  • 3
  • 44
  • 54