
We had a small malfunction which generated an unexpected amount of data for a device. Our code ends up creating a select query along the lines of SELECT * FROM XXX WHERE yyy IN (xxx,xxx,xxx....,xxxx) that should fetch the last 5 days of data, which is normally not a problem at all since the WHERE field is indexed.

We suddenly started seeing a large number of slow queries bogging the server down. The queries had been running for around 5000 seconds before we aborted them. While investigating, it turned out that if the query has at most 33594 parameters in the IN () clause, everything works fine and the query finishes in 200ms. If we add one more parameter, the select seems to run forever. We haven't let one run indefinitely until it completes, so I'm not sure whether it will complete or not, but they seem to last at least hours...

The data returned by the select is about 361.04 KB. We're running on Google Cloud SQL with MySQL 8.0.31 (MYSQL_8_0_31).

Any ideas on what might be causing this behaviour?

Tuxie

2 Answers


I'm more curious about why this happens to understand what's the limit...

The why is because it's an overly complicated predicate. The IN clause is syntactic sugar for a bunch of OR clauses. Too many OR clauses, or too many values in an IN clause, result in an overly complicated predicate that the database engine is unable to parse and generate an efficient query plan for. Instead, the query plan being generated is now likely scanning the entire table in an inefficient manner.
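To illustrate (with a made-up table name), the parser treats an IN list as an equivalent chain of equality comparisons joined by OR, so a 33,000-item list becomes a 33,000-branch predicate:

```sql
-- These two queries express the same predicate:
SELECT * FROM readings WHERE device_id IN (1, 2, 3);
SELECT * FROM readings WHERE device_id = 1 OR device_id = 2 OR device_id = 3;
```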

There is no fixed hard-coded limit when this happens, and it will vary from query to query.

...to better understand the system and thus better be able to understand what workaround/fix is the best.

~33,000 values in an IN clause is an unreasonable amount. Heck, even 100 values is, in my mind, unreasonable, and in my opinion such a query design is an anti-pattern.

There are a few main ways to rewrite the query that can fix the issue:

  1. Break the query up into multiple copies of the query with much smaller IN clauses and UNION them back together. This is probably also unrealistic to do with so many values in your IN clause though.

  2. Put the values in some sort of table, such as a temporary table, and use a JOIN to it from the table you're trying to filter down. This is typically more relationally efficient.
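Option 2 can be sketched as follows. This uses Python with SQLite purely for illustration (the table and column names `readings`, `device_id`, `wanted` are invented); the same pattern applies in MySQL with a `CREATE TEMPORARY TABLE`:

```python
# Sketch of approach 2: load the filter values into a temporary table
# and JOIN against it, instead of building a giant IN (...) list.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id INTEGER, value TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(10)])

wanted_ids = [1, 3, 5, 7]  # in the real case this could be tens of thousands

# Instead of: SELECT ... WHERE device_id IN (1, 3, 5, 7, ...)
conn.execute("CREATE TEMP TABLE wanted (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)",
                 [(i,) for i in wanted_ids])

rows = conn.execute(
    "SELECT r.device_id, r.value "
    "FROM readings r JOIN wanted w ON r.device_id = w.id"
).fetchall()
print(rows)
```

Because the values live in an indexed table rather than in the query text itself, the optimizer sees a simple join predicate regardless of how many values there are.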

J.D.

At my company we have had this problem since MySQL 5.7: there is a cap on the memory the optimizer can use, controlled by the option range_optimizer_max_mem_size. If you have too many items in the list, the optimizer gives up and decides to do a table scan instead of using the index.

See https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#range-optimization-memory-use

Our solution was to set this in the my.cnf:

range_optimizer_max_mem_size=0

This means there is no memory limit for the optimizer, so it's up to the developer to avoid lists so long that they cause memory faults. But the optimizer is then more likely to use the index.
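If editing my.cnf isn't convenient (e.g. on a managed service), per the MySQL documentation this is a dynamic system variable, so it can also be inspected and changed at runtime, where 0 means unlimited:

```sql
-- Check the current limit:
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';

-- Raise or disable it for the current session or globally:
SET SESSION range_optimizer_max_mem_size = 0;
SET GLOBAL  range_optimizer_max_mem_size = 0;
```

Note that on Google Cloud SQL, server flags are set through the instance's database flags rather than a my.cnf file.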

This has fixed the issue in every case where a long list was causing a table-scan.

Bill Karwin