9

I need to fetch records from an InnoDB table by distance (it does not have to be exact) and sort them by distance. The table has 10 million records.

My best time so far is 8 seconds (3 seconds without the ORDER BY on distance), which makes this unusable. How could I improve it?

I have a POINT column defined with SRID 4326. I'm using MySQL 8.0.12.

SELECT mp.hash_id, 
ROUND(ST_Distance(ST_SRID(POINT(8.53955, 47.37706), 4326), mp.geo_pt), 2) AS distance
  FROM member_profile mp 
  WHERE
    MBRCONTAINS(ST_GeomFromText(
      CONCAT('POLYGON((', ST_X(POINT (8.53955, 47.37706)) - 0.43415340086831, ' ',
        ST_Y(POINT (8.53955, 47.37706)) - 0.43415340086831, ',',
        ST_X(POINT (8.53955, 47.37706)) + 0.43415340086831, ' ',
        ST_Y(POINT (8.53955, 47.37706)) - 0.43415340086831, ',',
        ST_X(POINT (8.53955, 47.37706)) + 0.43415340086831, ' ',
        ST_Y(POINT (8.53955, 47.37706)) + 0.43415340086831, ',',
        ST_X(POINT (8.53955, 47.37706)) - 0.43415340086831, ' ',
        ST_Y(POINT (8.53955, 47.37706)) + 0.43415340086831, ',',
        ST_X(POINT (8.53955, 47.37706)) - 0.43415340086831, ' ',
        ST_Y(POINT (8.53955, 47.37706)) - 0.43415340086831, ')) ')
           , 4326), geo_pt)
-- ST_Distance(ST_GeomFromText('POINT (8.53955 47.37706)', 4326), mp.geo_pt) <= 25000 -- need 16 sec
-- order by distance -- need 8 sec with MBRContains, 100 sec with ST_Distance
LIMIT 50;

A spatial index was created:

CREATE SPATIAL INDEX geo_pt_index ON member_profile (geo_pt);

EXPLAIN shows that the geo_pt_index index is used.

my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
innodb_buffer_pool_size = 12G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
key_buffer_size = 1G
secure-file-priv = ""

The server is dedicated to this database and has no load except when I execute a query. The instance has 16 GB of memory and a fast NVMe SSD (there is no IOPS bottleneck); 30% of the disk is used. innodb_buffer_pool_size is sized to hold the whole dataset in memory.

SHOW GLOBAL STATUS Output: https://pastebin.com/EMeNL8yT

SHOW GLOBAL VARIABLES Output: https://pastebin.com/yxzYn10E

MySQL Tuner Output: https://pastebin.com/NRWFQDMQ

Today I updated from 8.0.11 to 8.0.12, and I have followed most of the earlier MySQL Tuner recommendations. The update was done because of a bug fix related to spatial search; the speed was the same before it.

SHOW WARNINGS (after Query execute):

Level,Code,Message
Note,1003,/* select#1 */ select `***`.`mp`.`member_id` AS `member_id`,round(st_distance(st_pointfromtext('POINT(8.53955 47.37706)',4326),`***`.`mp`.`geo_pt`),2) AS `distance` from `***`.`member_profile` `mp` where mbrcontains(<cache>(st_geomfromtext(concat('POLYGON((',(st_x(point(8.53955,47.37706)) - 0.43415340086831),' ',(st_y(point(8.53955,47.37706)) - 0.43415340086831),',',(st_x(point(8.53955,47.37706)) + 0.43415340086831),' ',(st_y(point(8.53955,47.37706)) - 0.43415340086831),',',(st_x(point(8.53955,47.37706)) + 0.43415340086831),' ',(st_y(point(8.53955,47.37706)) + 0.43415340086831),',',(st_x(point(8.53955,47.37706)) - 0.43415340086831),' ',(st_y(point(8.53955,47.37706)) + 0.43415340086831),',',(st_x(point(8.53955,47.37706)) - 0.43415340086831),' ',(st_y(point(8.53955,47.37706)) - 0.43415340086831),')) '),4326)),`***`.`mp`.`geo_pt`) order by `distance` limit 50

EXPLAIN:

id,select_type,table,partitions,type,possible_keys,key,
key_len,ref,rows,filtered,Extra
1,SIMPLE,mp,\N,range,geo_pt_index,geo_pt_index,34,\N,23,100.00,Using where; Using filesort

CREATE TABLE:

CREATE TABLE `member_profile` (
  `member_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `hash_id` varchar(32)
        CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `geo_pt` point NOT NULL /*!80003 SRID 4326 */,
  PRIMARY KEY (`member_id`),
  UNIQUE KEY `hash_id` (`hash_id`),
  SPATIAL KEY `geo_pt_index` (`geo_pt`)
) ENGINE=InnoDB AUTO_INCREMENT=10498210
            DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

SHOW INDEX FROM:

Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,
Cardinality,Sub_part,
Packed,Null,Index_type,Comment,Index_comment,Visible

member_profile,0,PRIMARY,1,member_id,A,9936492,\N,\N,,BTREE,,,YES
member_profile,0,hash_id,1,hash_id,A,9936492,\N,\N,YES,BTREE,,,YES
member_profile,1,geo_pt_index,1,geo_pt,A,9936492,32,\N,,SPATIAL,,,YES
nenad007

5 Answers

5

"I have a point column defined as SRID 4326. I'm using MySQL 8.0.12."

I had a similar problem, and changing the SRID to 0 improved performance significantly. I don't know whether the side effects are acceptable for you, but it is at least worth trying! Don't forget the reversed order of lat and lon if you do that ;)
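If you try it, here is a rough migration sketch (table and column names taken from the question; test on a copy first). The axis swap is the part that is easy to get wrong: SRID 4326 points in MySQL 8 are (latitude, longitude), while SRID 0 points are plain (x, y).

-- Hedged sketch only. MySQL rejects MODIFY ... SRID 0 while rows still
-- hold SRID 4326 values, so populate a new column and swap it in.
ALTER TABLE member_profile ADD COLUMN geo_pt0 POINT SRID 0;

UPDATE member_profile
   SET geo_pt0 = POINT(ST_Longitude(geo_pt), ST_Latitude(geo_pt));

ALTER TABLE member_profile
  DROP INDEX geo_pt_index,
  DROP COLUMN geo_pt,
  CHANGE COLUMN geo_pt0 geo_pt POINT NOT NULL SRID 0,
  ADD SPATIAL INDEX geo_pt_index (geo_pt);

Note that with SRID 0, ST_Distance returns a Cartesian result in degrees rather than meters, so a 25000 m radius has to be approximated in degrees.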

KR Pete

0

Suggestions to consider for your my.cnf [mysqld] section

max_connect_errors=10  # from 100, why give a hacker/cracker so many chances?
thread_cache_size=30  # from 9 since MySQL needs 8 to get started
innodb_io_capacity_max=60000  # from 2000, use that NVMe for performance
innodb_io_capacity=30000  # from 200, why stick with a low limit with NVMe
key_buffer_size=16M  # from 1G conserve RAM for more useful purpose
innodb_buffer_pool_dump_pct=90  # from 25 to reduce WARM up time
innodb_change_buffer_max_size=15  # from 25% for your low chg,del,ins need
innodb_lru_scan_depth=128  # from 1025 to conserve CPU every SECOND
innodb_read_io_threads=64  # from 4 see dba.stackexchange Question 5666
innodb_write_io_threads=64  # from 4 see 9/12/11 RolondaMySQLDBA info

Please review my profile and Network Profile for contact info, including Skype ID, and get in touch.

Wilson Hauck
0

MySQL GIS is always slow. But you're telling me 3 seconds without ORDER BY distance is slow on a table with 10 million rows? Consider:

  1. Using this method to build the points
  2. Building the bounding box with WKT so you have something simplified. I have no idea how this would help, but it's MySQL!
  3. Rather than doing ST_Distance() < upperlimit, consider doing ST_Buffer( polygon, upperlimit ), and using that in your call to ST_Contains
  4. Consider moving to PostgreSQL/PostGIS and using ST_DWithin(geom, geom, upperlimit). PostGIS has superior indexing; it can do this whole thing on an index because it supports KNN.
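For illustration, a hedged sketch of what point 4 could look like in PostGIS (a geography-typed column is assumed, with names carried over from the question; `<->` is the KNN distance operator, supported for geography since PostGIS 2.2):

-- Assumes roughly: geo_pt geography(Point, 4326)
-- and a GiST index:  CREATE INDEX ON member_profile USING gist (geo_pt);
SELECT hash_id,
       ST_Distance(geo_pt, ST_MakePoint(8.53955, 47.37706)::geography) AS distance
  FROM member_profile
 WHERE ST_DWithin(geo_pt, ST_MakePoint(8.53955, 47.37706)::geography, 25000)
 ORDER BY geo_pt <-> ST_MakePoint(8.53955, 47.37706)::geography
 LIMIT 50;

The ORDER BY ... <-> form is what lets the GiST index drive the nearest-first scan, so the LIMIT 50 stops after 50 index hits instead of sorting every candidate.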
Evan Carroll
0

Side issues

Use the ascii character set (not utf8mb4) for hex values such as hashes.
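For example, a sketch against the question's table (keeping the existing VARCHAR(32)):

ALTER TABLE member_profile
  MODIFY hash_id VARCHAR(32) CHARACTER SET ascii COLLATE ascii_general_ci;

This cuts the potential index entry size from 128 bytes (utf8mb4) to 32.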

Pack hex hashes into binary:

hash_id BINARY(16)
HEX(hash_id)   -- when reading
hash_id = UNHEX(...) -- when writing

Get rid of the AUTO_INCREMENT id and just use hash_id.

Better would be to get rid of the hash -- if the data grows too large to fit in RAM, it will become I/O-bound.

Faster algorithm

(Although the primary question is about GIS, the secondary question is about speeding up "find nearest".)

For the equivalent of LIMIT 50, the following algorithm will probably touch fewer than 200 rows (and calculate great-circle distances only for those). (Better than 79,901?)

http://mysql.rjweb.org/doc.php/latlng (code included)
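This is not the linked article's actual code, just a minimal sketch of the expanding-box idea it is built on, reusing the question's coordinate convention; the starting half-width of 0.05 degrees is an arbitrary assumption:

SET @lat = 47.37706, @lng = 8.53955, @delta = 0.05;

SELECT hash_id,
       ST_Distance(ST_SRID(POINT(@lng, @lat), 4326), geo_pt) AS distance
  FROM member_profile
 WHERE MBRContains(
         ST_GeomFromText(CONCAT('POLYGON((',
           @lng - @delta, ' ', @lat - @delta, ',',
           @lng + @delta, ' ', @lat - @delta, ',',
           @lng + @delta, ' ', @lat + @delta, ',',
           @lng - @delta, ' ', @lat + @delta, ',',
           @lng - @delta, ' ', @lat - @delta, '))'), 4326),
         geo_pt)
 ORDER BY distance
 LIMIT 50;

If fewer than 50 rows come back, double @delta and repeat; the ORDER BY then has to sort only the handful of candidates inside the box.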

Analysis of VARIABLES/STATUS

It seems like you have not run much yet, hence there is not much to say here:

Observations:

  • Version: 8.0.12
  • 16 GB of RAM
  • Uptime = 05:49:58; some GLOBAL STATUS values may not be meaningful yet.
  • Are you sure this was a SHOW GLOBAL STATUS ?
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

key_buffer_size = 50M
long_query_time = 2

Turn on the slowlog so you can identify slow queries.
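For example, at runtime (mirror the settings in my.cnf so they survive a restart; the log file path is just an example):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = '/var/log/mysql-slow.log';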

Details and other observations:

( (key_buffer_size - 1.2 * Key_blocks_used * 1024) / _ram ) = (1024M - 1.2 * 16 * 1024) / 16384M = 6.2% -- Percent of RAM wasted in key_buffer. -- Decrease key_buffer_size.

( Key_blocks_used * 1024 / key_buffer_size ) = 16 * 1024 / 1024M = 0.00% -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size to avoid unnecessary memory usage.

( table_open_cache ) = 4,000 -- Number of table descriptors to cache -- Several hundred is usually good.

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 443,594 / 786432 = 56.4% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 87,040 / (20998 / 3600) / 2 / 512M = 1.4e-5 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 20,998 / 60 * 512M / 87040 = 2.16e+6 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( join_buffer_size / _ram ) = 262,144 / 16384M = 0.00% -- 0-N per thread. May speed up JOINs (better to fix queries/indexes) (all engines) Used for index scan, range index scan, full table scan, each full JOIN, etc. -- If large, decrease join_buffer_size to avoid memory pressure. Suggest less than 1% of RAM. If small, increase to 0.01% of RAM to improve some queries.

( query_prealloc_size / _ram ) = 8,192 / 16384M = 0.00% -- For parsing. Pct of RAM

( query_alloc_block_size / _ram ) = 8,192 / 16384M = 0.00% -- For parsing. Pct of RAM

( net_buffer_length / max_allowed_packet ) = 16,384 / 64M = 0.02%

( (Com_show_create_table + Com_show_fields) / Questions ) = (7 + 7) / 698 = 2.0% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (0 + 20 + 0 + 0) / 0 = INF -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan / Com_select ) = 96 / 355 = 27.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days) -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin = OFF)

( slave_pending_jobs_size_max / max_allowed_packet ) = 128M / 64M = 2 -- For parallel slave threads -- slave_pending_jobs_size_max must not be less than max_allowed_packet

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( back_log / max_connections ) = 151 / 151 = 100.0%

( Threads_created / Connections ) = 4 / 214 = 1.9% -- Rapidity of process creation -- Increase thread_cache_size (non-Windows)

Abnormally large:

Com_create_db = 0.17 /HR
Com_drop_db = 0.34 /HR
Com_show_profiles = 1.9 /HR
Innodb_buffer_pool_pages_flushed / max(Questions, Queries) = 0.365
Innodb_buffer_pool_pages_free = 443,594
Select_range / Com_select = 33.2%
Ssl_session_cache_size = 128
innodb_purge_threads = 4
innodb_undo_tablespaces = 2
max_error_count = 1,024
max_length_for_sort_data = 4,096
optimizer_trace_max_mem_size = 1.05e+6
slave_pending_jobs_size_max = 128MB

Abnormal strings:

Ssl_session_cache_mode = SERVER
default_authentication_plugin = caching_sha2_password
event_scheduler = ON
explicit_defaults_for_timestamp = ON
ft_boolean_syntax = + -><()~*:&
have_query_cache = NO
have_ssl = YES
have_symlink = DISABLED
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup = ON
innodb_fast_shutdown = 1
innodb_undo_log_truncate = ON
log_syslog = ON
master_info_repository = TABLE
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
relay_log_info_repository = TABLE
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
ssl_ca = ca.pem
ssl_cert = server-cert.pem
ssl_key = server-key.pem
Rick James
0

I had a similar problem and was able to solve it by adding an additional condition to my WHERE clause, thereby reducing the number of rows it needs to examine. Depending on your application, you may or may not be able to use the same qualifier I did, but for me, any time I'm looking at waypoints inside a polygon, I know what city the polygon is in, so I added WHERE city = XXX.

Here is my BEFORE query, which took 2.55 seconds:

mysql> select listingid FROM property p left join geocodes g on p.listingid=g.mls where st_within(point(g.lat,g.lon),  ST_GeomFromText('Polygon((26.404844904956107 -80.06515801164369, 26.404907366957104 -80.0682586452649, 26.40564729885774 -80.06828010293702, 26.405642494120446 -80.06509900304536, 26.404844904956107 -80.06515801164369))'));
+-----------+
| listingid |
+-----------+
| R10802616 |
| R10798568 |
| R10819043 |
+-----------+
3 rows in set (2.55 sec)

compared to the new query, which now runs in 0.00 seconds:

mysql> select listingid FROM property p FORCE INDEX(city) left join geocodes g on p.listingid=g.mls where city = 'Highland Beach' AND st_within(point(g.lat,g.lon),  ST_GeomFromText('Polygon((26.404844904956107 -80.06515801164369, 26.404907366957104 -80.0682586452649, 26.40564729885774 -80.06828010293702, 26.405642494120446 -80.06509900304536, 26.404844904956107 -80.06515801164369))'));
+-----------+
| listingid |
+-----------+
| R10802616 |
| R10798568 |
| R10819043 |
+-----------+
3 rows in set (0.00 sec)
tinlyx