Using Postgres 9.4.
I have a table cartests with 5.5M rows. Each row is a car test:
\d log.cartests;
Table "log.cartests"
Column | Type | Modifiers
-------------+--------------------------+-----------
carid | integer | not null
timestamp | timestamp with time zone | not null
exhaust | varchar |
brakes | varchar
Indexes:
"cartests_pkey" PRIMARY KEY, btree (id, "position", "timestamp")
"cartests_carid_idx" btree (carid)
"cartests_timestamp_idx" btree ("timestamp")
I need to retrieve the latest log for each car.
SELECT DISTINCT ON (carid) carid, timestamp, exhaust, brakes
FROM log.cartests
ORDER BY carid, timestamp DESC;
This one takes ~ 6 seconds. Running EXPLAIN ANALYZE gives:
Unique (cost=933787.06..961490.34 rows=472 width=16) (actual time=4951.347..6072.534 rows=476 loops=1) -> Sort (cost=933787.06..947638.70 rows=5540656 width=16) (actual time=4951.345..5782.466 rows=5540656 loops=1) Sort Key: carid, "timestamp" Sort Method: external merge Disk: 162504kB -> Seq Scan on cartests (cost=0.00..123810.56 rows=5540656 width=16) (actual time=0.009..620.988 rows=5540656 loops=1) Planning time: 0.107 ms Execution time: 6095.598 ms
Why must it sort if I have an index on both carid and timestamp?
Is there a better method to use here?