Why does a CTE cause a full table scan?

Question

UPDATE

I've posted a newer more focused question How to use index for simple time range join?
I'll keep this for reference.

ORIGINAL

I have a large (~100 million rows) timeseries table t_16 where the primary key is a field abs_date_time of type timestamp.

I want to join this table to a small (<1000 rows) ranges table, which only has one field range of type tsrange.

In the production example, range is not really a table, but a CTE (i.e. a query in the WITH clause.
I managed to reproduce the issue with a "real" table, like so:

For my tests I create a new ranges table and insert only one range.

create table ranges
(
    time_range tsrange
);

In this case the query is very slow: it makes a full table scan of the timeseries table:

EXPLAIN
SELECT t_16.*
FROM t_16, ranges
WHERE (abs_date_time >= LOWER(ranges.time_range)
    AND abs_date_time <  UPPER(ranges.time_range));

Explain output - note the Seq Scan on t_16:

Nested Loop  (cost=0.00..2645938802.76 rows=13056002418 width=80)
  Join Filter: ((t_16.abs_date_time >= lower(ranges.time_range)) AND (t_16.abs_date_time < upper(ranges.time_range)))
  ->  Seq Scan on t_16  (cost=0.00..2098286.16 rows=86400016 width=80)
  ->  Materialize  (cost=0.00..30.40 rows=1360 width=32)
        ->  Seq Scan on ranges  (cost=0.00..23.60 rows=1360 width=32)

When I now analyse the ranges table and run the query again, the index is used and the query is fast:

analyse ranges;

EXPLAIN
SELECT t_16.*
FROM t_16, ranges
WHERE (abs_date_time >= LOWER(ranges.time_range)
    AND abs_date_time <  UPPER(ranges.time_range));

Explain result: note that t_16_pkey is used:

Nested Loop  (cost=127355.89..3147692.66 rows=9600002 width=80)
  ->  Seq Scan on ranges  (cost=0.00..1.01 rows=1 width=22)
  ->  Bitmap Heap Scan on t_16  (cost=127355.89..3051691.63 rows=9600002 width=80)
        Recheck Cond: ((abs_date_time >= lower(ranges.time_range)) AND (abs_date_time < upper(ranges.time_range)))
        ->  Bitmap Index Scan on t_16_pkey  (cost=0.00..124955.89 rows=9600002 width=0)
              Index Cond: ((abs_date_time >= lower(ranges.time_range)) AND (abs_date_time < upper(ranges.time_range)))

Questions:

why is a full table scan used (before the analyse)? Since the join is only on the primary key field, I hoped that this was always used.
Is there a way that I can rewrite the query, so that the index is used, even without analyse (which I cannot use in a CTE)?

Additional Info:

Here is a version that actually uses a CTE:

EXPLAIN
WITH cte AS (
    select * from ranges
)
SELECT t_16.*
FROM t_16, cte
WHERE (abs_date_time >= LOWER(cte.time_range)
    AND abs_date_time <  UPPER(cte.time_range));

This also does not use the index (before analyse), but makes a full-table scan ont the time-series-table.
Explain output:

Nested Loop  (cost=23.60..3527218976.16 rows=13056002418 width=80)
  Join Filter: ((t_16.abs_date_time >= lower(cte.time_range)) AND (t_16.abs_date_time < upper(cte.time_range)))
  CTE cte
    ->  Seq Scan on ranges  (cost=0.00..23.60 rows=1360 width=32)
  ->  Seq Scan on t_16  (cost=0.00..2098286.16 rows=86400016 width=80)
  ->  CTE Scan on cte  (cost=0.00..27.20 rows=1360 width=32)

score 0 · Answer 1 · answered Nov 27 '19 at 15:55

You are making unwarranted assumptions here. You don't need an ANALYZE for a CTE, a CTE will use the stats of the base table(s), not its own stats.

However, before version 12, CTEs are optimization "fences". The CTE will always run to completion almost as if it were an entirely separate query.

Why does a CTE cause a full table scan?

1 Answers1

Linked