2

I have a table called locations with ~70000 records which includes a geometry column of points and a spatial index as follows:

UPDATE locations SET geom = ST_GeomFromText(CONCAT( 'POINT(', latitude, ' ', longitude, ')' ), 4326);
ALTER TABLE locations MODIFY geom GEOMETRY NOT NULL;
ALTER TABLE locations ADD SPATIAL INDEX(geom);

I have another table called location_boundaries with ~350 records and a geometry column for country/region boundaries converted from a geojson string of polygons/multipolygons as follows:

UPDATE location_boundaries SET geom = ST_GeomFromGeoJSON(geo_json, 1, 4326);
ALTER TABLE location_boundaries MODIFY geom GEOMETRY NOT NULL;
ALTER TABLE location_boundaries ADD SPATIAL INDEX(geom);

I have a table called jobs and I have the following query to return all jobs within a particular region - in this case North West England (note the geojson boundry for this region was taken from the following site: https://nominatim.openstreetmap.org/search?q=North%20West%20England&countrycodes=gb&polygon_geojson=1&format=json

The goal of this query is to generate a paginated result set. However the query takes forever to execute (well over 40 mins before I had to cancel it). To filter through ~6000 jobs is far too slow.

SET @g = (SELECT geom 
   FROM location_boundaries 
WHERE NAME = 'North West England');

SELECT DISTINCT jobs.*
      FROM  jobs
INNER JOIN locations
        ON locations.id = jobs.location_id
     WHERE St_contains(@g, locations.geom)
       AND jobs.start_at <= now()
       AND jobs.end_at >= now()
       AND jobs.status_id = 4
 ORDER  BY Coalesce(IF(jobs.product_id = 4 AND jobs.refreshed_at <= UTC_TIMESTAMP(), jobs.refreshed_at, jobs.start_at)) DESC
        , jobs.id DESC
     LIMIT 25 offset 0 

This is the explain:

+----+-------------+-----------+------------+--------+-------------------------------------------------------------------------------+---------    +---------+-------------------------+------+----------+----------------------------------------------+
| id | select_type | table     | partitions | type   | possible_keys                                                                 | key     | key_len | ref                     | rows | filtered | Extra                                        |
+----+-------------+-----------+------------+--------+-------------------------------------------------------------------------------+---------+---------+-------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | jobs      | NULL       | ALL    | jobs_location_id_foreign,jobs_status_id_start_at_index,jobs_status_id_foreign | NULL    | NULL    | NULL                    | 5731 |    33.33 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | locations | NULL       | eq_ref | PRIMARY,geom                                                                  | PRIMARY | 4       | testdb.jobs.location_id |    1 |     5.00 | Using where; Distinct                        |
+----+-------------+-----------+------------+--------+-------------------------------------------------------------------------------+---------+---------+-------------------------+------+----------+----------------------------------------------+

Why arnt the spatial indexes being used? How can I optimize this query. Any help appreciated.

Strangely when switch the boundary to a different multipolygon say to country level as follows I can get the query to execute in 32 seconds which is still too slow.

SET @g = (SELECT geom 
   FROM location_boundaries 
WHERE NAME = 'United Kingdom');

When you validate the geojson for Northwest england in qgis it is valid

adam78
  • 155
  • 1
  • 8

0 Answers0