MariaDB 11.4 threads freezing

Question

I'm running MariaDB 11.4.4 with a busy website (Debian12+nginx+php-fpm).

Once every few days MariaDB threads looks frozen until max connection is reached. Even killing the query doesn't release the thread. MariaDB can't shutdown by systemctl stop mariadb Finally I need to killall -9 mariadbd to resume operations. There is nothing unexpected in mysql/error.log or slow_query.log

Here is SHOW FULL PROCESSLIST, SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS taken just before full MariaDB lockdown:

https://pastejustit.com/mariadb-debug-info

SHOW VARIABLES is here:

https://pastebin.com/xeVjpT4A

There have been similar reports in the past, however they were related to an older version of MariaDB, no clear solution and were likely fixed in MariaDB 10.6.*

Any ideas how to avoid this issue?

ADD: fully_ping table definition

CREATE TABLE `fully_ping` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `devid` varchar(255) NOT NULL,
  `appid` int(11) NOT NULL DEFAULT 1,
  `firstping_time` datetime DEFAULT NULL COMMENT 'UTC',
  `lastping_time` datetime NOT NULL COMMENT 'UTC',
  `message` mediumtext NOT NULL,
  `pass` varchar(255) DEFAULT NULL,
  `ip` varchar(255) DEFAULT NULL,
  `serial` varchar(255) DEFAULT NULL,
  `counter` int(11) NOT NULL DEFAULT 0,
  `version` int(11) DEFAULT NULL,
  `tls_version` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `devid` (`devid`(191)),
  KEY `appid` (`appid`),
  KEY `lastping_time` (`lastping_time`),
  KEY `counter` (`counter`),
  KEY `serial` (`serial`),
  CONSTRAINT `fully_ping_ibfk_1` FOREIGN KEY (`appid`) REFERENCES `fully_apps` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=180169 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci

score 1 · Accepted Answer · answered Jan 29 '25 at 09:35

1

I am having the same issues on Debian. This is probably triggering my issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243

answered Jan 29 '25 at 09:35

zimmi

26
1

Rick James · Answer 2 · 2025-01-26T16:11:56.090

Please provide SHOW CREATE TABLE fully_ping

Possibly this would help:

INDEX(appid, devid, counter)

Can you change the declaration of devid to VARCHAR(191)? or are some of them too long?

And, replace

DELETE FROM fully_history
    WHERE account_id IN (11368)
      AND COALESCE(schedule_time, request_time) <
              UTC_TIMESTAMP() - INTERVAL 1 WEEK

by

DELETE FROM fully_history
    WHERE account_id IN (11368)
      AND schedule_time IS NULL
      AND request_time <
              UTC_TIMESTAMP() - INTERVAL 1 WEEK;
DELETE FROM fully_history
    WHERE account_id IN (11368)
      AND schedule_time IS NOT NULL
      AND schedule_time <
              UTC_TIMESTAMP() - INTERVAL 1 WEEK;

together with

INDEX(account_id, schedule_time, request_time)
INDEX(account_id, request_time, schedule_time)

score 0 · Answer 3 · answered Jan 27 '25 at 03:09

Hmmmm.... Some interesting things in the STATUS / VARIABLES:

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

Version: 11.4.4-MariaDB-deb12-log
16 GB of RAM ??
Uptime = 1d 09:24:37
7.8e+3 QPS

The More Important Issues:

Recommended settings changes:

innodb_buffer_pool_size = 8G  -- (to make room for other things)
max_connections = 300  -- push back on clients (see below)
thread_cache_size = 400
table_open_cache = 4000
innodb_open_files = 4000
table_definition_cache = 1000
host_cache_size = 1000
innodb_io_capacity = 2000  -- if using SSD
innodb_flush_neighbors = 0  -- if using SSD
innodb_log_file_size = 8G  -- caution: may be tricky to change

Change the clients to be less aggressive at connecting to MariaDB. When you have 600 connections (cf Max_used_connections), each is stumbling over all the others. If you are using Apache, lower the number of children there, and let it build up a backlog.

Why so many SET commands? (987/sec) Look into connection pooling (re 987 Connections/sec.)

Let's see the slowlog -- there may be additional clues there:(https://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog) This may help with the high number of tmp_tables being created.

Details and other observations:

( Key_blocks_used * 1024 / key_buffer_size ) = 0 * 1024 / 128M = 0 -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size (now 134217728) to avoid unnecessary memory usage.

( Table_open_cache_misses ) = 115349902 / 120277 = 959 /sec -- May need to increase table_open_cache (now 2000)

( Table_open_cache_misses / (Table_open_cache_hits + Table_open_cache_misses) ) = 115,349,902 / (1508587798 + 115349902) = 7.1% -- Effectiveness of table_open_cache. -- Increase table_open_cache (now 2000) and check table_open_cache_instances (now 8).

( binlog_cache_size * max_connections ) = (16M * 700) / 16384M = 68.4% -- RAM used for caching transactions on the way to the binlog. -- Decrease binlog_cache_size (now 16777216) and/or max_connections (now 700)

( innodb_lru_scan_depth ) = 1,536 -- innodb_lru_scan_depth is a very poorly named variable. A better name would be innodb_free_page_target_per_buffer_pool. It is a number of pages InnoDB tries to keep free in each buffer pool instance to speed up read and page creation operations. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be slugghish or spiky. Use 2000 if using SSD drive.

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 418,553 / 648960 = 64.5% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size (now 10737418240) is bigger than necessary?

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Innodb_buffer_pool_bytes_data / innodb_buffer_pool_size ) = 3,774,693,376 / 10240M = 35.2% -- Percent of buffer pool taken up by data -- A small percent may indicate that the buffer_pool is unnecessarily big.

( Innodb_os_log_written ) = 505660239032 / 120277 = 4204130 /sec -- This is an indicator of how busy InnoDB is. -- Very idle or very busy InnoDB.

( Innodb_log_writes ) = 40547610 / 120277 = 337 /sec

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 120,277 / 60 * 2048M / 505660239032 = 8.51 -- Minutes between InnoDB log rotations Beginning with 5.6.8, innodb_log_file_size can be changed dynamically; I don't know about MariaDB. Be sure to also change my.cnf -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 2147483648). (Cannot change in AWS.)

( Innodb_num_open_files/innodb_open_files ) = 76/2000 = 3.8% -- Pct of InnoDB's file cache currently in use(?) -- innodb_open_files (now 2000) too small = wasting RAM; too large: thrashing.

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( Innodb_row_lock_waits ) = 13945 / 120277 = 0.12 /sec -- How often there is a delay in getting a row lock. -- May be caused by complex queries that could be optimized.

( innodb_flush_neighbors ) = innodb_flush_neighbors = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor. Limits write IO requests per second (IOPS). -- For starters: HDD: 200; SSD: 2000.

( innodb_flush_log_at_trx_commit ) = 1 -- 1 = secure; 2 = faster -- (You decide) Use 1, along with sync_binlog (now 0)=1 for the greatest level of fault tolerance. 0 is best for speed. 2 is a compromise between 0 and 1.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster. 0 is OK for Galera.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( min( tmp_table_size, max_heap_table_size ) ) = (min( 256M, 256M )) / 16384M = 1.6% -- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size (now 268435456) and max_heap_table_size (now 268435456) to, say, 1% of ram.

( innodb_ft_result_cache_limit ) = 2,000,000,000 / 16384M = 11.6% -- Byte limit on FULLTEXT resultset. (It grows as needed.) -- Lower the setting.

( character_set_client ) = character_set_client = utf8mb3 -- -- If you will be using text from places other than Western Europe, consider switching to utf8mb4. (Beyond the scope of this discussion.)

( character_set_connection ) = character_set_connection = utf8mb3 --

( character_set_results ) = character_set_results = utf8mb3 --

( log_warnings ) = log_warnings = 4

( Max_statement_time_exceeded ) = 594 / 120277 = 18 /HR -- Identify long-running statements that have been aborted.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( Com_set_option ) = 118817521 / 120277 = 987 /sec -- Frequency of using SET option -- Why so high?

( Questions ) = 937764234 / 120277 = 7796 /sec -- Queries (outside SP) -- "qps" -- >2000 may be stressing server

( Queries ) = 937764234 / 120277 = 7796 /sec -- Queries (including inside SP) -- >3000 may be stressing server

( Created_tmp_tables ) = 4922791 / 120277 = 41 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 3193850 / 120277 = 27 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 268435456) and max_heap_table_size (now 268435456). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Created_tmp_tables ) = 3,193,850 / 4922791 = 64.9% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size (now 268435456) and max_heap_table_size (now 268435456); improve indexes; avoid blobs, etc.

( tmp_table_size ) = 256M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 268435456) to avoid running out of RAM. Perhaps no more than 64M.

( Com_show_variables ) = 240531 / 120277 = 2 /sec -- SHOW VARIABLES ... -- Why are you requesting the VARIABLES so often?

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (673370 + 23336 + 8020 + 5 + 117459591 + 0) / 120277 = 982 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( expire_logs_days ) = 1.9867897033691E-8M = 0.0208 -- How soon to automatically purge binlog (after this many days). Being replaced by binlog_expire_logs_seconds. -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin (now OFF) = OFF)

( Max_used_connections ) = 604 -- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 604) does not distinguish them; Threads_running (now 603) is instantaneous.

( Connections ) = 118801097 / 120277 = 987 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( Threads_connected / max_connections ) = 604 / 700 = 86.3% -- Threads, as a % of max allowed -- Increase max_connection, decrease Apache's MaxClients, decrease wait_timeout (now 28800)

( Threads_running - 1 ) = 603 - 1 = 602 -- Active threads (concurrency when data collected) -- Optimize queries and/or schema. A value of more than, say, twice the number of CPU cores is likely to indicate overloading.

( Threads_running / thread_cache_size ) = 603 / 256 = 2.36 -- Threads: current / cached (Not relevant when using thread pooling) -- Optimize queries

( thread_pool_max_threads ) = 65,536 -- One of many settings for MariaDB's thread pooling -- Lower the value.

You have the Query Cache half-off. You should set both query_cache_type = OFF and query_cache_size = 0 . There is (according to a rumor) a 'bug' in the QC code that leaves some code on unless you turn off both of those settings.

Abnormally small:

Aria_pagecache_blocks_unused = 14,298
Innodb_data_writes - Innodb_log_writes - Innodb_dblwr_writes = -1.0e+6 /HR
innodb_log_write_ahead_size = 512
innodb_lru_flush_size = 0

Abnormally large:

Access_denied_errors = 114,317
Acl_roles = 1
Aria_pagecache_reads = 27 /sec
Bytes_received = 4561607 /sec
Com_check = 2.9 /HR
Com_delete_multi = 0.067 /sec
Com_release_savepoint = 1 /HR
Com_rollback_to_savepoint = 74 /HR
Com_savepoint = 1 /HR
Com_select = 4834 /sec
Com_show_binlog_status = 0.09 /HR
Com_show_create_db = 1 /HR
Com_show_generic = 0.95 /sec
Com_show_status = 1 /sec
Com_update = 976 /sec
Empty_queries = 1369 /sec
Feature_timezone = 66 /HR
Handler_commit = 5815 /sec
Handler_discover = 2.4 /HR
Handler_read_rnd = 17254 /sec
Handler_savepoint = 1 /HR
Handler_savepoint_rollback = 74 /HR
Innodb_buffer_pool_write_requests = 14062 /sec
Innodb_buffer_pool_write_requests / Innodb_buffer_pool_pages_flushed = 246
Innodb_checkpoint_age / innodb_log_file_size = 0.679
Innodb_data_fsyncs = 339 /sec
Innodb_dblwr_pages_written / Innodb_dblwr_writes = 123
Innodb_log_write_requests = 10654 /sec
Innodb_pages_created = 56 /sec
Innodb_pages_written/Innodb_data_writes = 99.2%
Memory_used = 73.5%
Memory_used_initial = 10,630.8MB
Open_streams = 4
Opened_views = 956 /sec
Select_full_range_join = 0.14 /sec
Sort_priority_queue_sorts = 25 /sec
Sort_range = 955 /sec
Sort_rows = 17089 /sec
Ssl_accepts = 1,889
Ssl_default_timeout = 7,200
Ssl_finished_accepts = 1,889
Ssl_verify_depth = 1.84e+19
Ssl_verify_mode = 5
Table_open_cache_hits = 12542 /sec
Tc_log_page_size = 4,096
Threads_connected = 604
Threads_running = 603
binlog_cache_size = 1.68e+7
innodb_buffer_pool_chunk_size = 160MB
innodb_lru_scan_depth / innodb_io_capacity = 7.68
innodb_undo_tablespaces = 3
log_slow_rate_limit = 10
max_statement_time = 20
pseudo_thread_id = 5.98e+7

MariaDB 11.4 threads freezing

3 Answers3