2

tl;dr: When to use array_agg over a lateral join with array subquery?

Context:

create table t_zipcode ( zipcode text primary key, location geography );
create table t_user ( id text primary key, zipcode text references t_zipcode);

I want to design a query that gets all 'nearby' zipcodes to each user, say in a range of 20 km. My first attempt was like this:

select u.id, z.zipcode, z.location, array_agg(zz.zipcode order by zz.zipcode) as nearby_zipcodes
from t_user u
         join t_zipcode z on u.zipcode = z.zipcode
         left join t_zipcode zz on st_dwithin(z.location, zz.location, 20000)
group by u.id, z.zipcode, z.location
order by u.id;

which returns results like this:

+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id                                  |zipcode|location                                          |nearby_zipcodes                                                                                                                                                                                                                                                                                                                            |
+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|0233684d-d2d8-4fdc-863f-08c6aac79d92|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|0e59fd58-00a5-442e-a21e-16ab1f4d2f65|85226  |0101000020E6100000506EDBF7A8005CC016A243E048A04040|{85034,85040,85041,85042,85044,85045,85048,85121,85202,85224,85225,85226,85248,85282,85283,85284,85286,85339}                                                                                                                                                                                                                              |
|13f26eca-bda0-4b8f-bcff-c95cceee3421|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|2d69e6bd-e0f0-40aa-ba3a-0e157fff46ff|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|4bd8e913-7a75-4028-8254-faebf043f629|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|627897e0-595c-4d21-b1b1-8dd43f21dee7|85033  |0101000020E6100000DAC69FA86C0D5CC08E78B29B19BF4040|{85003,85004,85006,85007,85009,85012,85013,85014,85015,85016,85017,85019,85020,85021,85023,85029,85031,85033,85034,85035,85037,85040,85041,85043,85051,85053,85064,85301,85302,85303,85304,85305,85306,85307,85308,85309,85323,85335,85340,85345,85351,85353,85363,85378,85379,85381,85382,85392,85395}                                    |
|6353efe7-b0e0-42a2-b5fd-f38b7c8e27ad|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|6efad24b-10a7-4ac7-b91a-8842bca70c06|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|7f1fe1d1-e2cd-44b9-85f0-b3b888823a20|85305  |0101000020E6100000BF7D1D3867105CC0D2C6116BF1C34040|{85003,85004,85007,85009,85012,85013,85014,85015,85017,85019,85020,85021,85023,85029,85031,85033,85035,85037,85043,85051,85053,85080,85301,85302,85303,85304,85305,85306,85307,85308,85309,85323,85335,85340,85345,85351,85353,85355,85363,85373,85374,85375,85378,85379,85381,85382,85388,85392,85395}                                    |
|9c238c4a-712b-4b10-a91d-6c3548ae59bc|86001  |0101000020E6100000054F2157EAE85BC0D6AA5D13D2A44140|{86001,86011,86015}                                                                                                                                                                                                                                                                                                                        |
|ace88100-9ecd-4931-a617-b8d8b091470e|85007  |0101000020E6100000CB30EE06D1055CC0AA9D616A4BB94040|{85003,85004,85006,85007,85008,85009,85012,85013,85014,85015,85016,85017,85018,85019,85020,85021,85028,85029,85031,85033,85034,85035,85037,85040,85041,85042,85043,85044,85045,85048,85051,85064,85251,85253,85257,85281,85282,85283,85284,85301,85302,85303,85304,85305,85353}                                                            |
|bf96321b-be2e-4ef9-801e-9ee82ce693f4|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|d787537e-a3fa-4267-b313-71b2c07027fe|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|ea8d074d-adb1-4c45-bd02-b6b57f5a6604|41091  |0101000020E61000005B0A48FB1F2F55C063D009A183744340|{41005,41018,41030,41042,41051,41080,41091,41092,41094,41095,47020,47038,47040}                                                                                                                                                                                                                                                            |
|f925949e-6a5e-4e2c-9591-60f05f42ccb1|81001  |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025}                                                                                                                                                                                                                                                                                                            |
|fc47120d-d3da-4847-8134-0eca87708467|98102  |0101000020E61000001C78B5DC99945EC09A7CB3CD8DD14740|{98004,98005,98006,98007,98008,98011,98020,98021,98028,98033,98034,98036,98039,98040,98043,98052,98056,98057,98072,98101,98102,98103,98104,98105,98106,98107,98108,98109,98110,98112,98115,98116,98117,98118,98119,98121,98122,98125,98126,98133,98134,98136,98144,98146,98154,98155,98164,98168,98174,98177,98178,98195,98199,98342,98353}|
+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This is the analyzed plan with just 16 users AND using an index on the zipcode location column:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|GroupAggregate  (cost=85365.87..86252.54 rows=35467 width=112) (actual time=1.932..2.474 rows=16 loops=1)                                                      |
|  Output: u.id, z.zipcode, z.location, array_agg(zz.zipcode ORDER BY zz.zipcode)                                                                               |
|  Group Key: u.id, z.zipcode, z.location                                                                                                                       |
|  ->  Sort  (cost=85365.87..85454.53 rows=35467 width=112) (actual time=1.887..1.913 rows=277 loops=1)                                                         |
|        Output: u.id, z.zipcode, z.location, zz.zipcode                                                                                                        |
|        Sort Key: u.id, z.zipcode, z.location                                                                                                                  |
|        Sort Method: quicksort  Memory: 53kB                                                                                                                   |
|        ->  Nested Loop Left Join  (cost=0.69..81507.59 rows=35467 width=112) (actual time=0.197..1.641 rows=277 loops=1)                                      |
|              Output: u.id, z.zipcode, z.location, zz.zipcode                                                                                                  |
|              ->  Nested Loop  (cost=0.29..719.73 rows=1070 width=80) (actual time=0.043..0.136 rows=16 loops=1)                                               |
|                    Output: u.id, z.zipcode, z.location                                                                                                        |
|                    Inner Unique: true                                                                                                                         |
|                    ->  Seq Scan on pg_temp.t_user u  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.017..0.021 rows=16 loops=1)                         |
|                          Output: u.id, u.zipcode                                                                                                              |
|                    ->  Index Scan using test_idx_user_zip on pg_temp.t_zipcode z  (cost=0.29..0.65 rows=1 width=64) (actual time=0.006..0.006 rows=1 loops=16)|
|                          Output: z.zipcode, z.location                                                                                                        |
|                          Index Cond: (z.zipcode = u.zipcode)                                                                                                  |
|              ->  Index Scan using test_idx_zip_geo on pg_temp.t_zipcode zz  (cost=0.40..75.47 rows=3 width=64) (actual time=0.040..0.089 rows=17 loops=16)    |
|                    Output: zz.zipcode, zz.location                                                                                                            |
|                    Index Cond: (zz.location && _st_expand(z.location, '20000'::double precision))                                                             |
|                    Filter: st_dwithin(z.location, zz.location, '20000'::double precision, true)                                                               |
|                    Rows Removed by Filter: 8                                                                                                                  |
|Query Identifier: -2358650375080684395                                                                                                                         |
|Planning Time: 0.421 ms                                                                                                                                        |
|Execution Time: 2.649 ms                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I then tried a similar solution using a lateral join, which yields the exact same results:

select u.id, z.zipcode, z.location, t.nearby_zipcodes
from t_user u
         join t_zipcode z on u.zipcode = z.zipcode
         cross join lateral (select array(select zz.zipcode
                                          from t_zipcode zz
                                          where st_dwithin(z.location, zz.location, 20000)
                                          order by zz.zipcode) as nearby_zipcodes
    ) t
order by u.id;

which to my surprise, has slightly better performance:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                                                   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Sort  (cost=85823.88..85826.55 rows=1070 width=112) (actual time=1.828..1.830 rows=16 loops=1)                                                               |
|  Output: u.id, z.zipcode, z.location, ((SubPlan 1))                                                                                                         |
|  Sort Key: u.id                                                                                                                                             |
|  Sort Method: quicksort  Memory: 30kB                                                                                                                       |
|  ->  Nested Loop  (cost=0.29..85770.04 rows=1070 width=112) (actual time=0.319..1.809 rows=16 loops=1)                                                      |
|        Output: u.id, z.zipcode, z.location, (SubPlan 1)                                                                                                     |
|        Inner Unique: true                                                                                                                                   |
|        ->  Seq Scan on pg_temp.t_user u  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.012..0.016 rows=16 loops=1)                                   |
|              Output: u.id, u.zipcode                                                                                                                        |
|        ->  Index Scan using test_idx_user_zip on pg_temp.t_zipcode z  (cost=0.29..0.65 rows=1 width=64) (actual time=0.006..0.006 rows=1 loops=16)          |
|              Output: z.zipcode, z.location                                                                                                                  |
|              Index Cond: (z.zipcode = u.zipcode)                                                                                                            |
|        SubPlan 1                                                                                                                                            |
|          ->  Sort  (cost=79.48..79.49 rows=3 width=32) (actual time=0.101..0.102 rows=17 loops=16)                                                          |
|                Output: zz.zipcode                                                                                                                           |
|                Sort Key: zz.zipcode                                                                                                                         |
|                Sort Method: quicksort  Memory: 25kB                                                                                                         |
|                ->  Index Scan using test_idx_zip_geo on pg_temp.t_zipcode zz  (cost=0.40..79.45 rows=3 width=32) (actual time=0.032..0.080 rows=17 loops=16)|
|                      Output: zz.zipcode                                                                                                                     |
|                      Index Cond: (zz.location && _st_expand(z.location, '20000'::double precision))                                                         |
|                      Filter: st_dwithin(z.location, zz.location, '20000'::double precision, true)                                                           |
|                      Rows Removed by Filter: 8                                                                                                              |
|Query Identifier: 8776498431209157273                                                                                                                        |
|Planning Time: 0.257 ms                                                                                                                                      |
|Execution Time: 1.892 ms                                                                                                                                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

I was under the impression that lateral joins should be avoided in general when possible, is there maybe a demerit I'm failing to notice?

ed__
  • 23
  • 5

1 Answers1

2

I was under the impression that lateral joins should be avoided in general when possible ...

That's generalizing a bit much. LATERAL join is a tool like any other: use when appropriate. It all depends.

ORDER BY in the aggregate function is typically comparatively expensive. That is one thing to avoid when possible. And it's possible here either way:

Query for few users

16 users is very few.
A LATERAL subquery is probably simplest and fastest. And an array constructor is typically cheapest in this case. See:

Your second query is already spot on. The array constructor always returns a row, so CROSS JOIN does not eliminate users without nearby locations.
Since the lateral subquery always returns exactly one row, and to squeeze out the last bit of performance, you might use a correlated subquery instead of the LATERAL subquery. Typically slightly faster:

SELECT u.id, zipcode, z.location
     , ARRAY(SELECT zz.zipcode
             FROM   t_zipcode zz   
             WHERE  st_dwithin(z.location, zz.location, 20000)
             ORDER  BY zz.zipcode
             ) AS nearby_zipcodes
FROM   t_user    u
JOIN   t_zipcode z USING (zipcode)
ORDER  BY u.id;

The beauty: no GROUP BY at all.

Related:

Query for many users

The above does not scale well for two reasons:

  1. A single aggregate step is faster than one per user (or one subquery with array constructor per user).
  2. Typically, many users share the same zib code. Aggregating once per qualifying zib code scales better. Join each result to 1-n users.
SELECT u.id, zipcode, n.location, n.nearby_zipcodes
FROM   t_user u
LEFT   JOIN (
   SELECT zipcode, location, array_agg(nearby_zipcode) AS nearby_zipcodes
   FROM  (
      SELECT z.zipcode, z.location, zz.zipcode AS nearby_zipcode
      FROM  (SELECT DISTINCT zipcode FROM t_user) u  -- distinct user zib codes
      JOIN   t_zipcode z USING (zipcode)
      JOIN   t_zipcode zz ON st_dwithin(z.location, zz.location, 20000)
      ORDER  BY z.zipcode, zz.zipcode
      ) sub
   GROUP  BY zipcode, location
   ) n USING (zipcode);

That's more like your first query, but ordered once in a subquery. The gain is proportional to the number of users and the ratio of duplicate zib codes.

MATERIALIZED VIEW

Zib codes and geo locations don't usually change. If the search radius is constant, too - or for a small number of admissible radii, a MATERIALIZED VIEW suggests itself. Materialize nearby_zipcodes (one per search radius) and reuse the result. Much faster, yet.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633