UPDATE
I've posted a newer more focused question How to use index for simple time range join?
I'll keep this for reference.
ORIGINAL
I have a large (~100 million rows) timeseries table t_16 where the primary key is a field abs_date_time of type timestamp.
I want to join this table to a small (<1000 rows) ranges table, which only has one field range of type tsrange.
In the production example, range is not really a table, but a CTE (i.e. a query in the WITH clause.
I managed to reproduce the issue with a "real" table, like so:
For my tests I create a new ranges table and insert only one range.
create table ranges
(
time_range tsrange
);
In this case the query is very slow: it makes a full table scan of the timeseries table:
EXPLAIN
SELECT t_16.*
FROM t_16, ranges
WHERE (abs_date_time >= LOWER(ranges.time_range)
AND abs_date_time < UPPER(ranges.time_range));
Explain output - note the Seq Scan on t_16:
Nested Loop (cost=0.00..2645938802.76 rows=13056002418 width=80)
Join Filter: ((t_16.abs_date_time >= lower(ranges.time_range)) AND (t_16.abs_date_time < upper(ranges.time_range)))
-> Seq Scan on t_16 (cost=0.00..2098286.16 rows=86400016 width=80)
-> Materialize (cost=0.00..30.40 rows=1360 width=32)
-> Seq Scan on ranges (cost=0.00..23.60 rows=1360 width=32)
When I now analyse the ranges table and run the query again, the index is used and the query is fast:
analyse ranges;
EXPLAIN
SELECT t_16.*
FROM t_16, ranges
WHERE (abs_date_time >= LOWER(ranges.time_range)
AND abs_date_time < UPPER(ranges.time_range));
Explain result: note that t_16_pkey is used:
Nested Loop (cost=127355.89..3147692.66 rows=9600002 width=80)
-> Seq Scan on ranges (cost=0.00..1.01 rows=1 width=22)
-> Bitmap Heap Scan on t_16 (cost=127355.89..3051691.63 rows=9600002 width=80)
Recheck Cond: ((abs_date_time >= lower(ranges.time_range)) AND (abs_date_time < upper(ranges.time_range)))
-> Bitmap Index Scan on t_16_pkey (cost=0.00..124955.89 rows=9600002 width=0)
Index Cond: ((abs_date_time >= lower(ranges.time_range)) AND (abs_date_time < upper(ranges.time_range)))
Questions:
- why is a full table scan used (before the
analyse)? Since the join is only on the primary key field, I hoped that this was always used. - Is there a way that I can rewrite the query, so that the index is used, even without
analyse(which I cannot use in a CTE)?
Additional Info:
Here is a version that actually uses a CTE:
EXPLAIN
WITH cte AS (
select * from ranges
)
SELECT t_16.*
FROM t_16, cte
WHERE (abs_date_time >= LOWER(cte.time_range)
AND abs_date_time < UPPER(cte.time_range));
This also does not use the index (before analyse), but makes a full-table scan ont the time-series-table.
Explain output:
Nested Loop (cost=23.60..3527218976.16 rows=13056002418 width=80)
Join Filter: ((t_16.abs_date_time >= lower(cte.time_range)) AND (t_16.abs_date_time < upper(cte.time_range)))
CTE cte
-> Seq Scan on ranges (cost=0.00..23.60 rows=1360 width=32)
-> Seq Scan on t_16 (cost=0.00..2098286.16 rows=86400016 width=80)
-> CTE Scan on cte (cost=0.00..27.20 rows=1360 width=32)