We use an Amazon RDS instance with
PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
I have a simple classic top-1-per-group query. I need to get the latest item in the history for each creativeScheduleId.
Here is a table and index definitions:
CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
id serial PRIMARY KEY,
"creativeScheduleId" text NOT NULL,
-- other columns
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id
ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);
When a query orders by id ASC the engine reads only the index and does not do any extra sorts:
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
ORDER BY cssh.id ASC) AS rn -- !
FROM creative_schedule_status_histories as cssh
) AS history
WHERE history.rn = 1;
"Subquery Scan on history (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579766"
" -> WindowAgg (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
" Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"
I expected to see exactly the same plan for a query when I order by id DESC, but there is an explicit sort in the plan which spills to disk and obviously everything is just slower.
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
ORDER BY cssh.id DESC) AS rn -- !
FROM creative_schedule_status_histories as cssh
) AS history
WHERE history.rn = 1;
"Subquery Scan on history (cost=1267132.63..1438582.84 rows=26377 width=41) (actual time=11974.827..15840.338 rows=709046 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579802"
" -> WindowAgg (cost=1267132.63..1372640.45 rows=5275391 width=49) (actual time=11974.825..15529.679 rows=5288848 loops=1)"
" -> Sort (cost=1267132.63..1280321.11 rows=5275391 width=41) (actual time=11974.814..13547.038 rows=5288848 loops=1)"
" Sort Key: cssh.""creativeScheduleId"", cssh.id DESC"
" Sort Method: external merge Disk: 263992kB"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353550.90 rows=5275391 width=41) (actual time=0.015..1386.310 rows=5288848 loops=1)"
" Heap Fetches: 2508"
"Planning Time: 0.078 ms"
"Execution Time: 15949.877 ms"
I expected that the given index would be equally useful in both variants of the query.
Postgres can't scan an index backwards here?
What am I missing here?
When I make a query for a specific given creativeScheduleId, then Postgres uses index equally well for both ASC and DESC sort order. There is no explicit sort in any variant:
EXPLAIN (ANALYZE)
SELECT id, "creativeScheduleId"
FROM creative_schedule_status_histories AS cssh
WHERE "creativeScheduleId" = '24238370-a64c-4b30-ac8e-27eb2b693aca'
ORDER BY id DESC -- or ASC, no sort
LIMIT 1
"Limit (cost=0.56..0.71 rows=1 width=41) (actual time=0.022..0.022 rows=1 loops=1)"
" -> Index Only Scan Backward using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..14.06 rows=86 width=41) (actual time=0.021..0.021 rows=1 loops=1)"
" Index Cond: (""creativeScheduleId"" = '24238370-a64c-4b30-ac8e-27eb2b693aca'::text)"
" Heap Fetches: 0"
"Planning Time: 0.064 ms"
"Execution Time: 0.033 ms"
Here we actually see Index Only Scan Backward, so Postgres is capable of it. But not for the whole table.
Any ideas how to encourage the engine to scan the whole index backwards for the first query that reads the whole table?
