How can I speed up my query that involves `max(timestamp)` in a huge table? I already added an index to every field

Question

I have a huge table that has fields ip, mac, and timestamp.

Neither of the three fields is unique, but the combination of all three is.

Table is automatically populated, with newer records added all the time. The field timestamp refers to when a row was added. Records are never UPDATEd.

Here's the table description:

  Column   |            Type             | Nullable | Default
-----------+-----------------------------+----------+--------
 event     | text                        | not null |
 ip        | inet                        | not null |
 mac       | macaddr8                    | not null |
 timestamp | timestamp without time zone | not null | now()
Indexes:
    "ip_idx" btree (ip)
    "mac_idx" btree (mac)
    "time_idx" btree ("timestamp")
    "timestamp_ip_event_key" UNIQUE CONSTRAINT, btree ("timestamp", ip, event)

I have this very slow query, causing the website to take very long time to load

How can I speed it up?

Is it possible to take advantage of the fact that the table is basically ordered by timestamp?

I do not have access to the script that adds records.

Executed SQL
select ip,
       max(timestamp)
  from my_table
 WHERE ip << inet '10.38.69.0/24'
 group by ip
 order by ip;
Time
2151.577594000173 ms
Database
default
QUERY PLAN
Finalize GroupAggregate  (cost=291919.94..292190.52 rows=1068 width=15) (actual time=696.246..704.698 rows=151 loops=1)
  Group Key: ip
  ->  Gather Merge  (cost=291919.94..292169.16 rows=2136 width=15) (actual time=696.220..704.558 rows=429 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=290919.92..290922.59 rows=1068 width=15) (actual time=679.313..679.325 rows=143 loops=3)
              Sort Key: ip
              Sort Method: quicksort  Memory: 31kB
              Worker 0:  Sort Method: quicksort  Memory: 31kB
              Worker 1:  Sort Method: quicksort  Memory: 31kB
              ->  Partial HashAggregate  (cost=290855.52..290866.20 rows=1068 width=15) (actual time=679.192..679.233 rows=143 loops=3)
                    Group Key: ip
                    Batches: 1  Memory Usage: 81kB
                    Worker 0:  Batches: 1  Memory Usage: 81kB
                    Worker 1:  Batches: 1  Memory Usage: 81kB
                    ->  Parallel Bitmap Heap Scan on my_table  (cost=12023.68..289019.89 rows=367126 width=15) (actual time=67.898..580.432 rows=312819 loops=3)
                          Filter: (ip << '10.38.69.0/24'::inet)
                          Rows Removed by Filter: 1716854
                          Heap Blocks: exact=15528 lossy=18121
                          ->  Bitmap Index Scan on my_table_ip_idx  (cost=0.00..11803.41 rows=881097 width=0) (actual time=62.721..62.721 rows=938457 loops=1)
                                Index Cond: ((ip > '10.38.69.0/24'::inet) AND (ip <= '10.38.69.255'::inet))
Planning Time: 1.049 ms
JIT:
  Functions: 30
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.180 ms, Inlining 0.000 ms, Optimization 1.470 ms, Emission 29.303 ms, Total 32.952 ms
Execution Time: 726.126 ms

Erwin Brandstetter · Answer 1 · 2023-08-12T01:22:46.290

Not enough `work_mem`

The lion share of the cost is accrued here:

                    ->  Parallel Bitmap Heap Scan on my_table  (cost=12023.68..289019.89 rows=367126 width=15) (actual time=67.898..580.432 rows=312819 loops=3)
                          Filter: (ip << '10.38.69.0/24'::inet)
                          Rows Removed by Filter: 1716854
                          Heap Blocks: exact=15528 lossy=18121

Note lossy=18121, meaning there was not enough work_mem to store tuple pointers of matching index entries. Postgres only remembers data pages. Consequently, all tuples of those "lossy" pages have to be rechecked, and most of them filtered out: Rows Removed by Filter: 1716854, which is expensive.

Increase your work_mem setting - if you can afford that. Maybe just for your session or just this query if it's exceptionally expensive. Increase to an amount where you get no more "lossy" pages. See:

"Recheck Cond:" line in query plans with a bitmap index scan

Better query

That said, you probably won't need additional work_mem with this much more efficient query emulating an index skip scan:

WITH RECURSIVE cte AS (
   (
   SELECT ip, timestamp
   FROM   my_table
   WHERE  ip << inet '10.38.69.0/24'
   ORDER  BY ip, timestamp DESC
   LIMIT  1
   )
   UNION ALL
   SELECT t.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT t.*
      FROM   my_table t
      WHERE  t.ip > c.ip
      AND    ip << inet '10.38.69.0/24'
      ORDER  BY ip, timestamp DESC
      LIMIT  1
      ) t
   )
TABLE  cte;

While the query also works with the index on just (ip), it isn't typically much faster than your original. You need this matching multicolumn index to make it fast:

CREATE INDEX my_table_ip_time_stamp_idx ON my_table(ip, timestamp DESC);

See:

Remarkably similar question & answer from just today:

PostgreSQL fast query to get most recent rows

How can I speed up my query that involves `max(timestamp)` in a huge table? I already added an index to every field

1 Answers1

Not enough work_mem

Better query

Not enough `work_mem`