
I'm trying to optimize a geolocation query over a table of ~862k addresses, searching within a radius using earth_box. My initial query is not too terrible:

explain analyze SELECT
    id
FROM
   location
WHERE
    earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
  ;
                                                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on location  (cost=46.97..3385.08 rows=863 width=16) (actual time=11.430..49.406 rows=29407 loops=1)
   Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
   Heap Blocks: exact=22479
   ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.942..7.943 rows=29407 loops=1)
         Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
 Planning Time: 0.711 ms
 Execution Time: 51.018 ms

However, when I refine the search with earth_distance to get more accurate results, the execution time increases by roughly 10x:

=> explain analyze SELECT
    id
FROM
   location
WHERE
    earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
  AND earth_distance(ll_to_earth(40.65130101, -73.83367812),
                     ll_to_earth(latitude, longitude)) < 25000 ;
                                                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on location  (cost=46.83..3820.75 rows=288 width=16) (actual time=11.264..537.018 rows=24898 loops=1)
   Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
   Filter: (sec_to_gc(cube_distance('(1347317.9013552233, -4647693.797459679, 4155081.692815027)'::cube, (ll_to_earth(latitude, longitude))::cube)) < '25000'::double precision)
   Rows Removed by Filter: 4509
   Heap Blocks: exact=22479
   ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.358..7.358 rows=29407 loops=1)
         Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
 Planning Time: 0.901 ms
 Execution Time: 539.113 ms
(9 rows)

My table schema (some columns excluded):

                               Table "provider.location"
      Column      |           Type           | Collation | Nullable |      Default
------------------+--------------------------+-----------+----------+-------------------
 id               | uuid                     |           | not null |
 created_at       | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 updated_at       | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 can_delete       | boolean                  |           |          |
 name             | text                     |           |          |
 address          | text                     |           |          |
 address_line_1   | text                     |           |          |
 address_line_2   | text                     |           |          |
 city             | text                     |           |          |
 state            | text                     |           |          |
 street           | text                     |           |          |
 zip              | text                     |           |          |
 confidence       | int                      |           |          |
 google_maps_link | text                     |           |          |
 is_coe           | boolean                  |           |          |
 latitude         | double precision         |           |          |
 longitude        | double precision         |           |          |

I also have a GiST index created for the lat/lon:

  "location_gist_lat_lon_idx" gist (ll_to_earth(latitude, longitude))

What is it that I'm missing that makes the additional distance filter increase the execution time by 10x?

My postgres 13.5 instance has the following specs:

CPU: 4vCPU
Memory: 16GB
SSD: 250GB

This question is related to How can I speed-up my query on geo-location processes; however, following the suggested answer there didn't seem to improve my performance.


1 Answer


I don't know what you are missing, other than some perspective.

earth_distance is kind of slow. So computing it 29407 times is slow.

I don't know what you tried to do from that linked post. The suggestion there was to create an index, which you already had. References are great, but you still need to tell us what you actually did.

earth_distance (the module) is mostly for demonstration purposes. Any serious work should be done with PostGIS. In my hands, PostGIS is about 5 times faster at the recheck/filter task than earth_distance is. If you were willing to do some C programming, you could probably make earth_distance quite a bit faster, but I would say that is overkill when PostGIS already exists.
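
To give a sense of what that would look like, here is a rough PostGIS sketch, not tested against your data; the index name, the SRID 4326, and the choice of a geography expression index are my own, not something from your schema:

    CREATE EXTENSION IF NOT EXISTS postgis;

    -- Expression index over a geography point built from the stored lat/lon columns.
    CREATE INDEX location_geog_idx
        ON location
        USING gist ((ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography));

    -- ST_DWithin does the bounding-box index search and the exact
    -- great-circle distance check in one call.
    SELECT id
    FROM location
    WHERE ST_DWithin(
        ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography,
        ST_SetSRID(ST_MakePoint(-73.83367812, 40.65130101), 4326)::geography,
        25000);

With geography arguments, ST_DWithin measures in meters, so the 25000 here corresponds to the same 25 km radius as your earth_box/earth_distance query.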

Another thing you could do is get the query to run in parallel. The only way to do this (that I could find) is alter table location set (parallel_workers = 4);. I don't know why the planner doesn't pick a parallel plan on its own without that. You might also need to increase max_parallel_workers_per_gather.
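
Concretely, something like this (the value 4 simply mirrors the 4 vCPUs you mentioned; tune it to your hardware):

    -- Let the planner consider up to 4 parallel workers for scans of this table.
    ALTER TABLE location SET (parallel_workers = 4);

    -- Per-query cap on workers; the default is 2.
    SET max_parallel_workers_per_gather = 4;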
