4

I have table events with fields:

id
user_id
time_start
time_end
...

And have b-tree index on (time_start, time_end).

SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
       (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id);
Group  (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
  Group Key: user_id
  Buffers: shared hit=224848
  ->  Gather Merge  (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=224848
        ->  Sort  (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
              Sort Key: user_id
              Sort Method: quicksort  Memory: 27kB
              Worker 0:  Sort Method: quicksort  Memory: 27kB
              Worker 1:  Sort Method: quicksort  Memory: 28kB
              Buffers: shared hit=224848
              ->  Partial HashAggregate  (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
                    Group Key: user_id
                    Buffers: shared hit=224834
                    ->  Parallel Seq Scan on events  (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
                          Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
                          Rows Removed by Filter: 708728
                          Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms

Postgres uses Seq Scan with filter:

Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))

But when I leave one condition for time_start and time_end it starts to use Index Scan.

How can I change condition to make Postgres use Index Scan over Seq Scan?

I don't want to use UNION like:

SELECT user_id
FROM events
WHERE (
     (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id)
UNION (SELECT user_id
       FROM events
       WHERE (
           (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
   GROUP BY user_id
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Dmitro
  • 175
  • 4

1 Answers1

6

Expression index

A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.

CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));

Rewrite your query with the "range contains" operator @> and match the indexed expression (exactly equivalent to your original):

SELECT user_id
FROM   events
WHERE  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
    OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP  BY user_id;

You'll get a query plan like this:

HashAggregate  (cost=9.90..10.00 rows=10 width=4)
  Group Key: user_id
  ->  Bitmap Heap Scan on events  (cost=2.57..9.88 rows=10 width=4)
        Recheck Cond: ((tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone) OR (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone))
        ->  BitmapOr  (cost=2.57..2.57 rows=10 width=0)
              ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                    Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone)
              ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                    Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone)

Should be much faster.

Range types assume inclusive lower and exclusive upper bound unless instructed otherwise. tsrange(time_start, time_end) is the same as tsrange(time_start, time_end), '[)').
Since you operate with >= and <=, include both bounds with tsrange(time_start, time_end, '[]').

Related:

Alternatively, store range column in table

Should be a bit faster, yet, as plain (not expression) index.
You can add a timestamp range column to your table, like:

ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;

See:

Or, more radically, replace time_start and time_end with the range column. Then index and query are a bit simpler:

CREATE INDEX events_right_idx ON events USING spgist (ts_range);

SELECT user_id FROM events WHERE ts_range @> timestamp '2021-08-24T15:30:00' OR ts_range @> timestamp '2021-08-24T15:59:00' GROUP BY user_id;

But the tsrange column occupies more space than two timestamp columns. Weigh cost and benefits.

Asides

Postgres 14 (currently beta) even allows a covering SP-GiST index. The release notes:

Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)

But I don't think you can get index-only scans for your particular query.

If you have to make do with your B-tree index for some reason, this fixed UNION query shouldn't be too bad:

SELECT user_id
FROM   events
WHERE  '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM   events
WHERE  '2021-08-24T15:59:00' BETWEEN time_start AND time_end

Notably, no GROUP BY. UNION already does all the work.
And simplify with BETWEEN (no effect on performance).

Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And name it "time" to add to the confusion. Typically timestamptz is the better choice. See:

Last, but not least, this indicates inaccurate column statistics, leading to a sub-optimal query plan:

 ->  Parallel Seq Scan on events  (cost=0.00..242553.74 rows=45952 width=4)
                                  (actual time=104.085..183.994 rows=64 loops=3)

Run

ANALYZE events;

And retry. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
And then maybe tune your autovacuum and statistics settings to avoid bad statistics in the future. See:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633