5

We use an Amazon RDS instance with

PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

I have a simple classic top-1-per-group query. I need to get the latest item in the history for each creativeScheduleId.

Here is a table and index definitions:

CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
  id serial PRIMARY KEY,
  "creativeScheduleId" text NOT NULL,
  -- other columns
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);

When a query orders by id ASC the engine reads only the index and does not do any extra sorts:

EXPLAIN (ANALYZE) 
SELECT history.id, history."creativeScheduleId"
FROM  (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id ASC) AS rn  -- !
    FROM creative_schedule_status_histories as cssh
    ) AS history
WHERE history.rn = 1;
"Subquery Scan on history  (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"
"  Filter: (history.rn = 1)"
"  Rows Removed by Filter: 4579766"
"  ->  WindowAgg  (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
"        ->  Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh  (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
"              Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"

I expected to see exactly the same plan for a query when I order by id DESC, but there is an explicit sort in the plan which spills to disk and obviously everything is just slower.

EXPLAIN (ANALYZE) 
SELECT history.id, history."creativeScheduleId"
FROM  (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id DESC) AS rn  -- !
    FROM creative_schedule_status_histories as cssh
    ) AS history
WHERE history.rn = 1;
"Subquery Scan on history  (cost=1267132.63..1438582.84 rows=26377 width=41) (actual time=11974.827..15840.338 rows=709046 loops=1)"
"  Filter: (history.rn = 1)"
"  Rows Removed by Filter: 4579802"
"  ->  WindowAgg  (cost=1267132.63..1372640.45 rows=5275391 width=49) (actual time=11974.825..15529.679 rows=5288848 loops=1)"
"        ->  Sort  (cost=1267132.63..1280321.11 rows=5275391 width=41) (actual time=11974.814..13547.038 rows=5288848 loops=1)"
"              Sort Key: cssh.""creativeScheduleId"", cssh.id DESC"
"              Sort Method: external merge  Disk: 263992kB"
"              ->  Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh  (cost=0.56..353550.90 rows=5275391 width=41) (actual time=0.015..1386.310 rows=5288848 loops=1)"
"                    Heap Fetches: 2508"
"Planning Time: 0.078 ms"
"Execution Time: 15949.877 ms"

I expected that the given index would be equally useful in both variants of the query.
Postgres can't scan an index backwards here?
What am I missing here?


When I make a query for a specific given creativeScheduleId, then Postgres uses index equally well for both ASC and DESC sort order. There is no explicit sort in any variant:

EXPLAIN (ANALYZE) 
SELECT id, "creativeScheduleId"
FROM   creative_schedule_status_histories AS cssh
WHERE  "creativeScheduleId" = '24238370-a64c-4b30-ac8e-27eb2b693aca'
ORDER  BY id DESC  -- or ASC, no sort
LIMIT  1
"Limit  (cost=0.56..0.71 rows=1 width=41) (actual time=0.022..0.022 rows=1 loops=1)"
"  ->  Index Only Scan Backward using idx_creativescheduleid_id on creative_schedule_status_histories cssh  (cost=0.56..14.06 rows=86 width=41) (actual time=0.021..0.021 rows=1 loops=1)"
"        Index Cond: (""creativeScheduleId"" = '24238370-a64c-4b30-ac8e-27eb2b693aca'::text)"
"        Heap Fetches: 0"
"Planning Time: 0.064 ms"
"Execution Time: 0.033 ms"

Here we actually see Index Only Scan Backward, so Postgres is capable of it. But not for the whole table.

Any ideas how to encourage the engine to scan the whole index backwards for the first query that reads the whole table?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Vladimir Baranov
  • 4,720
  • 4
  • 27
  • 42

3 Answers3

3

Because of the discussed limitation, we can't make Postgres scan the index backwards for the particular use case. However ...

Clean test case

I removed the noise from the test case:

CREATE TABLE tbl (
  id   int PRIMARY KEY
, part int NOT NULL
, ballast text  -- possibly big column(s)?
);

CREATE UNIQUE INDEX tbl_part_id_idx ON tbl (part, id);

Get rid of the expensive sort step

In Postgres 14 or later I see an Incremental Sort on top of an Index Only Scan.
In Postgres 11 or later the additional sort goes away with this workaround:

SELECT id, part
FROM  (
   SELECT *
        , CASE WHEN part = lead(part) OVER (ORDER BY part, id ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
   FROM   tbl
   ) sub
WHERE  qualified;
Subquery Scan on sub (cost=0.42..10293.58 rows=89880 width=8) (actual time=1.759..115.799 rows=67 loops=1)
  Filter: sub.qualified
  Rows Removed by Filter: 179692
  -> WindowAgg (cost=0.42..8495.99 rows=179759 width=41) (actual time=0.022..106.609 rows=179759 loops=1)
        -> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4900.81 rows=179759 width=8) (actual time=0.017..29.208 rows=179759 loops=1)
              Heap Fetches: 0
Planning Time: 0.114 ms
Execution Time: 115.850 ms

That's based on Paul's query It's better than than my first idea to compare row number and count per partition. I adapted to get the greatest id per group, simplified, and switched to ROWS mode for better performance. See:

It scans the index forwards. To see an actual Index Scan backwards:

...
          CASE WHEN part = lag(part) OVER (ORDER BY part DESC, id DESC ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
...

The first variant is just slightly shorter and faster.

While you are stuck with your original query, you can at least make the sort happen in RAM. Your query plan says Disk: 263992kB. Increase work_mem by 300 MB (if you can afford that) to achieve that. Possibly just in your session for the big query. See:

What you really want

Your current query never wins any competition.
To make your query fast (even without additional index):

For few rows per group (and sufficient work_mem), use DISTINCT ON. It's fastest with matching index, and maybe even without:

SELECT DISTINCT ON (part) id, part
FROM   tbl
ORDER  BY part, id DESC;
Unique (cost=33480.91..34380.55 rows=67 width=8) (actual time=96.726..131.735 rows=67 loops=1)
  -> Sort (cost=33480.91..33930.73 rows=179929 width=8) (actual time=96.724..118.292 rows=179929 loops=1)
        Sort Key: part, id DESC
        Sort Method: external merge Disk: 3184kB
        -> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4903.35 rows=179929 width=8) (actual time=0.019..25.871 rows=179929 loops=1)
              Heap Fetches: 0
Planning Time: 0.102 ms
Execution Time: 132.208 ms

For many rows per group (like in this test), DISTINCT ON is not ideal, but typically not that bad, either.

For more than a few rows per group, we'd want an index skip scan. Considerable effort has been made but, unfortunately, that didn't make it into Postgres 15. We can still emulate the technique to great effect with a recursive CTE:

EXPLAIN ANALYZE
WITH RECURSIVE cte AS (
   (
   SELECT part, id
   FROM   tbl
   ORDER  BY part DESC, id DESC
   LIMIT  1
   )

UNION ALL SELECT l.* FROM cte c CROSS JOIN LATERAL ( SELECT t.part, t.id FROM tbl t WHERE t.part < c.part ORDER BY t.part DESC, t.id DESC LIMIT 1 ) l ) TABLE cte;

'CTE Scan on cte  (cost=40.74..42.76 rows=101 width=8) (actual time=0.013..0.304 rows=34 loops=1)'
'  CTE cte'
'    ->  Recursive Union  (cost=0.29..40.74 rows=101 width=8) (actual time=0.012..0.297 rows=34 loops=1)'
'          ->  Limit  (cost=0.29..0.33 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)'
'                ->  Index Only Scan Backward using tbl_part_id_idx on tbl  (cost=0.29..3410.28 rows=100000 width=8) (actual time=0.010..0.011 rows=1 loops=1)'
'                      Heap Fetches: 1'
'          ->  Nested Loop  (cost=0.29..3.84 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
'                ->  WorkTable Scan on cte c  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=34)'
'                ->  Limit  (cost=0.29..0.34 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
'                      ->  Index Only Scan Backward using tbl_part_id_idx on tbl t  (cost=0.29..1714.08 rows=33333 width=8) (actual time=0.007..0.007 rows=1 loops=34)'
'                            Index Cond: (part < c.part)'
'                            Heap Fetches: 33'
'Planning Time: 0.115 ms'
'Execution Time: 0.324 ms'  -- !!!

One or the other is typically (much) faster than your original query.

fiddle for Postgres 15 with 3000 rows per group
fiddle for Postgres 11
fiddle for Postgres 11 with 8 rows per group (~ your distribution) & sufficient work_mem
fiddle for Postgres 11 with 2 rows per group & sufficient work_mem

See:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
3

As a workaround, consider rows sorted in descending index order:

id creativeScheduleId
10 d
9 c
8 c
7 b
6 b
5 b
4 a
3 a
2 a
1 a

The rows you want (in bold) are the ones where the previous row doesn't have a matching value for "creativeScheduleId":

EXPLAIN (ANALYZE) 
SELECT 
    q1.id, 
    q1."creativeScheduleId" 
FROM 
(
    SELECT
        cssh.*,
        CASE
            WHEN cssh."creativeScheduleId" = 
                LAST_VALUE(cssh."creativeScheduleId") OVER (
                    ORDER BY cssh."creativeScheduleId" DESC, cssh.id DESC
                        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
            THEN 0
            ELSE 1
    END AS qualified
    FROM public.creative_schedule_status_histories AS cssh
) AS q1
WHERE
    q1.qualified = 1;
Subquery Scan on q1 (cost=0.15..104.48 rows=6 width=36) (actual time=0.014..0.014 rows=0 loops=1)
  Filter: (q1.qualified = 1)
  -> WindowAgg (cost=0.15..88.60 rows=1270 width=40) (actual time=0.013..0.014 rows=0 loops=1)
        -> Index Only Scan Backward using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.15..63.20 rows=1270 width=36) (actual time=0.011..0.011 rows=0 loops=1)
              Heap Fetches: 0
Planning Time: 0.415 ms
Execution Time: 0.076 ms

db<>fiddle


In a comment, you expressed an interest in how SQL Server handles this.

It can use an index backward scan, but needs a little assistance:

SELECT
    Q1.id, 
    Q1.creativeScheduleId
FROM 
(
    SELECT 
        CSSH.id, 
        CSSH.creativeScheduleId,
        rn = ROW_NUMBER() OVER (
            PARTITION BY CSSH.creativeScheduleId
            ORDER BY CSSH.id DESC)
    FROM dbo.creative_schedule_status_histories AS CSSH
) AS Q1
WHERE
    Q1.rn = 1
-- Encourage optimizer
ORDER BY
    Q1.creativeScheduleId DESC,
    Q1.id DESC;
 |--Filter(WHERE:([Expr1001]=(1)))
       |--Sequence Project(DEFINE:([Expr1001]=row_number))
            |--Segment
                 |--Index Scan([idx_creativescheduleid_id]), ORDERED BACKWARD)

Index backward scan; no sort

db<>fiddle

Paul White
  • 94,921
  • 30
  • 437
  • 687
2

While PostgreSQL knows how to read an index backwards in general, there are some cases which exceed its grasp and this window function with both a partitioning and an ordering is one of them.

You could also imagine this query being resolved using hash tables rather than sorting/ordering of any kind (in the specific case of rn=1) , but that, too, is not implemented.

jjanes
  • 42,332
  • 3
  • 44
  • 54