4

In Postgres 9.1. I am attempting to do a single query to get the timezone of a given postal code. I have data loaded in my database, and I am using PostGIS to store the coordinates of each postal code. Here are the two queries that I need to do in order to get the data I need:

SELECT coord FROM public.postal_code WHERE postal_code = 'T1K0T4' LIMIT 1

query time: 6ms

This gives me the coordinate of the center of the postal_code area. Then I use this coordinate to find which timezone it intersects:

SELECT *
FROM public.timezones as tz
WHERE ST_Intersects(ST_GeomFromText('POINT(-112 49)',4326),geom)

query time: 36ms

Now when I combine the 2 queries, the query time jumps to 7-8 seconds. This is my query:

SELECT *
FROM public.timezones as tz
WHERE ST_Intersects((SELECT coord FROM taduler.postal_code
                     WHERE postal_code = 'T1K0T4' LIMIT 1),geom)

I have a spatial index on the coord column in the postal_code table, and also on the geom column in the timezone table, but it seems like it isn't being used for the subquery.

Does anyone know of a better way to optimize this query? I have tried several variations of this query, like joining the tables and such, but they have all resulted in the same query speed.

Output from EXPLAIN ANALYZE:

Seq Scan on timezones tz  (cost=8.37..167.47 rows=136 width=335547) (actual time=4606.136..7274.428 rows=1 loops=1)
  Filter: st_intersects($0, (geom)::geography)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..8.37 rows=1 width=128) (actual time=0.011..0.011 rows=1 loops=1)
          ->  Index Scan using postal_code_idx on postal_code  (cost=0.00..8.37 rows=1 width=128) (actual time=0.010..0.010 rows=1 loops=1)
                Index Cond: ((postal_code)::text = 'T1K0T4'::text)
Total runtime: 7274.448 ms

EXPLAIN ANALYZE for the following query:

SELECT *
FROM public.timezones as tz
JOIN taduler.postal_code as pc on ST_Intersects(pc.coord, tz.geom)
WHERE pc.postal_code = 'T1K0T4'

Output:

Nested Loop  (cost=0.00..174.61 rows=1 width=335714) (actual time=4870.908..7572.723 rows=1 loops=1)
  Join Filter: ((pc.coord && (tz.geom)::geography) AND (_st_distance(pc.coord, (tz.geom)::geography, 0::double precision, false) < 1e-05::double precision))
  ->  Index Scan using postal_code_idx on postal_code pc  (cost=0.00..8.37 rows=1 width=167) (actual time=0.012..0.019 rows=1 loops=1)
        Index Cond: ((postal_code)::text = 'T1K0T4'::text)
  ->  Seq Scan on timezones tz  (cost=0.00..56.08 rows=408 width=335547) (actual time=0.002..2.795 rows=408 loops=1)
Total runtime: 7572.787 ms

1 Answers1

4

You are running in a weakness of the query planner: That index cannot be used in a plain join. Had a similar problem here:

In Postgres 9.3 or later use a LATERAL join instead:

SELECT *
FROM  (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    ) pc
LEFT JOIN LATERAL (
    SELECT *
    FROM   public.timezones tz
    WHERE  ST_Intersects(pc.coord, tz.geom)
   ) tz ON true;

See:

In Postgres 9.1, it might help to encapsulate the first query in a CTE, but I am not sure:

WITH pc AS (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    )
SELECT *
FROM   pc
JOIN   public.timezones tz ON ST_Intersects(pc.coord, tz.geom);

A PL/pgSQL function encapsulating two separate queries certainly does the trick:

CREATE OR REPLACE FUNCTION f_get_tz(_pc text)
  RETURNS SETOF public.timezones
  LANGUAGE plpgsql AS
$func$
DECLARE
   _coord geom;
BEGIN
   SELECT coord
   INTO  _coord
   FROM   taduler.postal_code
   WHERE  postal_code = _pc
   LIMIT  1;

RETURN QUERY SELECT * FROM public.timezones tz WHERE ST_Intersects(_coord, tz.geom); END $func$;

Call:

SELECT * FROM f_get_tz('T1K0T4');
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633