0

Recently I upgraded MySQL on my server from newest 5.7.x to newest 8.0.x and I imported my entire database with multi-indexes that I created when I was using MySQL 5.7.x. These indexes optimize SELECT queries that run with WHERE clause on few columns (both INT and VARCHAR columns) on a table that has ~ 8 million records (no JOINs) and they used to work very well, but strangly after the upgrade they do NOT work at all... some queries on this table are MUCH slower :-( We don't want to stay with MySQL 5.7 as it will be discontinued this year. Have you had any similar problem as I have? I tweaked my.ini the same way I did in MySQL 5.7 by increasing buffer sizes, etc. Please help. I will provide more details later if it's necessary like the table structure, example queries and my.ini . Thank you very much.

EDIT: TABLE AND INDEXES DEFINITIONS:

CREATE TABLE IF NOT EXISTS event ( id int(11) NOT NULL AUTO_INCREMENT, imei char(50) NOT NULL, cradle_id varchar(45) NOT NULL, company_id int(10) unsigned NOT NULL, event_type int(10) NOT NULL, lat double NOT NULL, lon double NOT NULL,
speed double NOT NULL, dtime datetime NOT NULL, force_level double NOT NULL DEFAULT '0', duration double NOT NULL,
force_hist varchar(300) NOT NULL DEFAULT 'non', lr_force_hist varchar(300) NOT NULL DEFAULT 'non', speed_hist varchar(250) NOT NULL DEFAULT 'non', milage double NOT NULL, filter int(11) DEFAULT '1', filter_reason int(11) DEFAULT NULL, road_type int(11) DEFAULT NULL, calibration_variance double DEFAULT '0',
reason int(11) DEFAULT NULL, reason_other text,
distance_adjustment float DEFAULT NULL, max_speed smallint(6) DEFAULT NULL, location_name varchar(225) DEFAULT NULL,
device_id char(17) DEFAULT NULL, utc_offset int(11) NOT NULL DEFAULT '0', duplicated int(11) DEFAULT '0',

PRIMARY KEY (id), 
KEY event_index (imei,dtime,event_type,filter), 
KEY event_index2 (device_id,dtime,event_type,filter),
KEY event_index3 (filter)
) ENGINE=InnoDB AUTO_INCREMENT=40026132 DEFAULT CHARSET=utf8;

SLOW QUERY LOG:

Query_time: 2.986858  Lock_time: 0.000001 
    Rows_sent: 1  Rows_examined: 42355 SET timestamp=1680260078;

select * from event where ( ( imei = "0e69eb4e-9999-494c-873d-9416d3569ab0" and dtime > "2021-12-16 13:23:50" ) or ( imei = "ae7f20e8-34d8-48e4-99a1-ff6197bc16ff" and dtime > "2021-11-17 14:44:05" and dtime < "2021-12-16 13:23:49" ) or ( imei = "bd415fb6-9a36-493f-ba65-fd54db7e6fce" and dtime > "2021-01-21 08:46:21" and dtime < "2021-11-17 14:44:04" ) or ( imei = "357577093449854" and dtime > "2018-10-07 19:45:26" and dtime < "2021-01-21 08:46:20" ) or ( imei = "359751080340701" and dtime > "2017-07-31 11:58:57" and dtime < "2018-10-07 19:45:25" ) or ( imei = "358979073825062" and dtime > "2016-10-11 09:02:37" and dtime < "2017-07-31 11:58:56" ) ) and filter != 3 order by dtime desc, id desc limit 1;

EXPLAIN of the above query on MySQL 8.0:

1 SIMPLE event range event_index3,event_index event_index 155 82900 50.00 Using index condition; Using MRR; Using filesort

EXPLAIN of the above query on MySQL 5.7:

1 SIMPLE event range event_index3,event_index event_index 155 82900 50.00 Using index condition; Using filesort

The query above takes 1.167099 second on MySQL 8.0, but is super fast on MySQL 5.7, only 0.1 second on MySQL 5.7

This happens with default MySQL 8.0 settings in my.ini and also with tweaked my.ini (increased buffer sizes). Both of the databases have KEYRING plugin enabled.

I tried turning off different indexes (setting them INVISIBLE) on MySQL 8.0 but it didn't speed up the query at all.

The server has NVME disk that reads with speed ~ 12 GB/s.

There are about 8 million records in that table. Unfortunatelly I cannot attach entire dump of data here, because it would violate privacy of clients. I can create some dummy data though if you need it to help me. Thank you.

Rick James
  • 80,479
  • 5
  • 52
  • 119

4 Answers4

1

You need to provide more information, such as table and index definition, the slow query and the execution plans on 5.7 and 8.0, statistics of tables and columns referenced in predicates.

1

Looks MRR causing the performance regression. You actually have a better solution to avoid such a performance regression: try to add a composite index on (imei, dtime desc, id desc). It will avoid using MRR and filesort.

-- explanation

  1. You only take one row(limit 1),so you should avoid sorting, which is very expensive(sort all but only take one).

  2. To take advantage of the order of an index, the index you use can only take equal conditions in predicate, which is why range conditions shouldn't be considered.

For the new query, my suggestion will be rewriting it to following union query with an index on event(imei, dtime desc, id desc).

select PawDT_1680490904511.*
from
   ((select *, event.dtime, event.id
     from event
     where event.imei = '358979073825062' and event.dtime > '2016-10-11 09:02:37'
  and event.dtime < '2017-07-31 11:58:56' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1) 
    union 
    (select *, event.dtime, event.id
     from event
     where event.imei = '0e69eb4e-9999-494c-873d-9416d3569ab0'
  and event.dtime > '2021-12-16 13:23:50' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
    union 
    (select *, event.dtime, event.id
     from event
     where event.imei = 'ae7f20e8-34d8-48e4-99a1-ff6197bc16ff'
  and event.dtime > '2021-11-17 14:44:05' and event.dtime < '2021-12-16 13:23:49'
  and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
    union 
    (select *, event.dtime, event.id
     from event
     where event.imei = 'bd415fb6-9a36-493f-ba65-fd54db7e6fce'
  and event.dtime > '2021-01-21 08:46:21' and event.dtime < '2021-11-17 14:44:04'
  and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
    union 
    (select *, event.dtime, event.id
     from event
     where event.imei = '357577093449854' and event.dtime > '2018-10-07 19:45:26'
  and event.dtime < '2021-01-21 08:46:20' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1) 
    union 
    (select *, event.dtime, event.id
     from event
     where event.imei = '359751080340701' and event.dtime > '2017-07-31 11:58:57'
  and event.dtime < '2018-10-07 19:45:25' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)) as PawDT_1680490904511
order by PawDT_1680490904511.dtime desc, PawDT_1680490904511.id desc limit 1

The logic is the same for each union branch.

0

Tomas almost has it. The issue is that OR is essentially un-optimizable. The workaround is to do a UNION ALL

Change his answer to

 SELECT ...
     FROM (
             ( SELECT ... ORDER BY ... LIMIT 1 )
             UNION ALL
             ( SELECT ... ORDER BY ... LIMIT 1 )
             UNION ALL
             ( SELECT ... ORDER BY ... LIMIT 1 )
             UNION ALL
             ...
          )
          ORDER BY ... LIMIT 1  -- yes, repeated

Without the ORDER BY ... LIMIT 1, it will collect all possible rows before reaching the final LIMIT . This could lead to a bulky temp table and filesort.

Since each test looks for a different imei, I feel safe is using UNION ALL instead of the slower UNION (which implies DISTINCT).

I think INDEX(imei, dtime) (or at least starting with those 2 columns) will be optimal.

This formulation and index should work well for any MySQL version.

... when I add more imei and dtime conditions ...

Consider creating and populating a table that contains those 3 values. Then JOIN to event -- no need for OR or UNION. And it may be even faster.

Rick James
  • 80,479
  • 5
  • 52
  • 119
0

I'm experiencing the same thing when doing "in" queries with thousands of values. It looks like the range optimizer changed how it estimates the memory usage. The range optimizer tries to estimate how much memory the query will use, and compares it to the variable range_optimizer_max_mem_size. If it comes over this value then it reverts to a full table scan. The default value for range_optimizer_max_mem_size is around 8 megabytes, and this allows for values ~35k values, which lines up with this estimation technique.

But specifically in mysql 8, if the table has multiple indexes in addition to the one used then it seems to be counting those somehow. If I have a table with

create table parameter_test
(
    id int not null auto_increment primary key,
    v1 varchar(64),
    v2 varchar(64),
    v3 varchar(64),
    v4 varchar(64),
    v5 varchar(64),
index ix_v1 (v1),
index ix_v2 (v2),
index ix_v3 (v3),
index ix_v4 (v4),
index ix_v5 (v5)

);

Then I do

select
        *
    from
        parameter_test
    where
        id in (
            1000001,1000002,1000003,1000004,1000005,1000006... (thousands of entries)
        );

it only goes to 9000 before it switches to a full table scan.

If I drop the varchar indexes then it gets to 34k before it switches to a full table scan.

I can't find any documentation about this change, and I don't know if the memory usage changed, or just the estimation of it changes.

armitage
  • 1,429
  • 2
  • 14
  • 20