I have recently upgraded from MySQL 5.7 to 8 on several databases innodb engine with the same schema but different data hosted on RDS, performance was fine on all of them until the last one which was also the biggest but also has double the CPU cores and double the memory than the 2nd biggest which is running fine.
When I first tried to upgrade the performance was so bad that I was forced to swap back to the old 5.7. After some testing I discovered that it was only certain queries that were slower and that they performed much better when changing the internal_temp_storage_engine to MEMORY so I tried again with this change. While it was better there were still some queries performing considerably worse, one was poorly optimised and optimising it put the database into a usable state. The queries all involve 1 table which is one of the most used tables in the database and there doesn't seem to be a lot of scope to optimise them individually.
The strange thing is that when I lift the queries from the slow query log they run quite quickly but if I run it 5-10 times waiting a couple of seconds between each run there will be one that takes considerably longer.
some of the times from SHOW PROFILES
| Query_ID | Duration |
|---|---|
| 2 | 0.840759 |
| 3 | 1.398106 |
| 4 | 0.963524 |
| 5 | 0.747397 |
| 6 | 0.836251 |
| 7 | 0.713267 |
| 8 | 1.438113 |
| 9 | 0.775215 |
| 10 | 8.35518 |
executing results(only results that vary much between the 3) from the detail for queries 8, 9 & 10
| QUERY_ID | SEQ | STATE | DURATION | CPU_USER | CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 13 | executing | 1.437335 | 3.747356 | 0.200181 | 5144 | 162 | 32 | 1800 | 0 | 0 | 0 | 131 | 0 | ExecuteIteratorQuery | sql_union.cc | 1669 |
| 9 | 13 | executing | 0.774446 | 1.729243 | 0.03518 | 1463 | 24 | 32 | 296 | 0 | 0 | 0 | 141 | 0 | ExecuteIteratorQuery | sql_union.cc | 1669 |
| 10 | 13 | executing | 8.354382 | 34.46092 | 1.697898 | 149678 | 20723 | 128 | 13232 | 0 | 0 | 0 | 1955 | 0 | ExecuteIteratorQuery | sql_union.cc | 1669 |
slow query info for query 10
Query_time: 8.355097 Lock_time: 0.000002 Rows_sent: 1 Rows_examined: 290330 Thread_id: 46276 Errno: 0 Killed: 0 Bytes_received: 273 Bytes_sent: 57 Read_first: 0 Read_last: 0 Read_key: 4 Read_next: 290330 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2023-10-11T10:16:54.340615Z End: 2023-10-11T10:17:02.695712Z
I haven't been able to find much information on interpretating this but I believe what this is telling me is that the index required to process the query is no longer present in the buffer and it needs to reload it from disk. The database server is not under any pressure when queries are running slow, the CPU usage sits between 30-40% and the buffer when running this was at 91% of 24GB on a 32GB box so I don't see why the index would no longer be in the buffer.
Is there anyone that understands this information better and knows of a way to fix this issue, I think I need to increase a buffer setting but I think the 24GB buffer that is currently in use is already big enough, some of my research is pointing to increasing join_buffer_size(currently 262144) and or innodb_buffer_pool_instances(currently 8) but trial and error hasn't being going well for me so I am hoping that someone else can provide more information.
Update 2023-10-19
The query(all of the ones from the SHOW PROFILES are this one) that was run above was parameters were likely different
select count(id) from appointment where practice_id = 141 and appointment_time > '2023-12-11 09:00:00' and reschedule = 1 and ifnull(delete_ind,0) = 0 and appointment_status_id not in (5,12,13,14)
I am going to test changing the appointment_time index to descending to see if that makes a difference as several queries sort filter on appointment_time and its usually a date close to the current and often a > and 99.9% of the 10 million rows have appointment dates in the past. create statement https://justpaste.it/cobzf
There are several queries that have the similar behaviour of running quick most of the time but really slow sometimes. The main reason some queries were running much slower on MySQL 8 than 5.7 is because of differences in the query optimiser, I have improved most of them but the behaviour of occasional very poor performance still happens but it now doesn't have the same negative affects on end users.
Answers to other questions Server running slowly is a db.t3.2xlarge (8 cores 32GB) gp2 storage max iops 3000 getting nowhere close to this limit at any time. From observing the server I believe the slowness is being caused by the storage, wait/io/table/sql/handler was high during a slow period but throughput peaked at 16.75 and iops at 731 quite a bit below the maximum, I am thinking I need to increase the io read and write threads currently 4 for each.
A) table count - 582
B) server was restarted last night will fill this out later
C) variables - https://justpaste.it/bmuwk
D) mostly empty except some sleeps will try get one when it is slow
E) server was restarted last night will fill this out later
H) server was restarted last night will fill this out later
G) server was restarted last night will fill this out later
Additionally I have run MySQLTuner and am considering implementing its recommendations after a bit more research. https://justpaste.it/56gex