2

We're running MariaDB server 10.6.16 from the RPM repository https://mirror.mariadb.org/yum/10.6/rhel9-amd64/ on a AlmaLinux 9.3 system with 48 GiB memory (no swap space) and 32 CPU cores. We're exclusively using InnoDB for the database tables. The system's filesystems run on NVME SSDs via Ceph.

Memory usage steadily increases more and more, triggering the OOM killer now and then.

The formula from https://serverfault.com/a/1020847 returns an upper memory consumption limit of 7500 MiB. I'm aware that this formula is not exact, but the MariaDB service is currently using about 11 GiB of memory, which is much more than the predicted maximum of 7500 MiB. We observed a memory usage of over 15 GiB in the past, just before the OOM killer hit.

What am I missing here?

/proc/[mariadb-pid]/limits:

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             191442               191442               processes 
Max open files            32768                32768                files     
Max locked memory         8388608              8388608              bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       191442               191442               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us  

[mysqld] section of the MariaDB configuration:

basedir = /usr
bind_address = 0.0.0.0
binlog_format = mixed
character-set-server = utf8
collation-server = utf8_unicode_ci
datadir = /var/lib/mysql
expire_logs_days = 7
innodb_buffer_pool_size = 4G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_file_per_table = on
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 16M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_stats_on_metadata = off
key_buffer_size = 16M
lc_messages = en_US
lc_messages_dir = /usr/share/mysql
log-bin = /var/log/mariadb/mysql-bin
log-error = /var/log/mysqld.log
long_query_time = 5
max_allowed_packet = 16M
max_connections = 1024
myisam_sort_buffer_size = 8M
net_buffer_length = 8K
pid-file = /run/mariadb/mariadb.pid
port = 3306
query_cache_size = 4194304
query_cache_type = 1
read_buffer_size = 256K
read_rnd_buffer_size = 512K
require_secure_transport = ON
server-id = 1
skip-external-locking
skip-name-resolve
slow_query_log = ON
slow_query_log_file = /var/log/mariadb/slow-queries.log
socket = /var/lib/mysql/mysql.sock
sort_buffer_size = 2M
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
ssl_ca = /etc/ssl/mariadb/chain.pem
ssl_cert = /etc/ssl/mariadb/cert.pem
ssl_key = /etc/ssl/mariadb/privkey.pem
table_definition_cache = 8000
table_open_cache = 16000
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 8M
tmpdir = /tmp
user = mysql

SHOW VARIABLES: https://pastebin.com/raw/NcT5jxiR

SHOW GLOBAL STATUS: https://pastebin.com/raw/B8MFr2WX

SHOW ENGINE INNODB STATUS:

=====================================
2024-02-07 09:51:38 0x7fcf4d171640 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 1 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 508984 srv_idle
srv_master_thread log flush and writes: 508978
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 191793820
Purge done for trx's n:o < 191793816 undo n:o < 0 state: running
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION (0x7fd05f21e880), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21dd80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21bc80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21b180), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21a680), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21d280), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21c780), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f219b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
31442319 OS file reads, 3626734 OS file writes, 3262529 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 5.99 writes/s, 5.99 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 65657865607
Log flushed up to   65657865557
Pages flushed up to 65594938073
Last checkpoint at  65594938073
0 pending log flushes, 0 pending chkp writes
3185884 log i/o's done, 6.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 4328521728
Dictionary memory allocated 61678248
Buffer pool size   259584
Free buffers       1
Database pages     259583
Old database pages 95802
Modified db pages  5013
Percent of dirty pages(LRU & free pages): 1.931
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 7999092, not young 2607506757
0.00 youngs/s, 0.00 non-youngs/s
Pages read 31427077, created 59965, written 435199
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 259583, unzip_LRU len: 0
I/O sum[290]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 564360, updated 1722533, deleted 374105, read 92464595804
0.00 inserts/s, 3.00 updates/s, 0.00 deletes/s, 211216.78 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

SELECT COUNT(*) FROM information_schema.tables:

+----------+
| COUNT(*) |
+----------+
|    14520 |
+----------+
1 row in set (0.125 sec)

SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables LIMIT 100000:

+----------+------------------+-------------------+----------------+
| COUNT(*) | sum(data_length) | sum(index_length) | sum(data_free) |
+----------+------------------+-------------------+----------------+
|      228 |           106496 |            106496 |              0 |
+----------+------------------+-------------------+----------------+
1 row in set (0.618 sec)

3 Answers3

2

3/1/2024 stepping stone AG
This is likely contributing to your problem.

https://stackoverflow.com/questions/78071133/mariadb-not-releasing-ram-after-jobs-finish

Global Variable SELECT @@version_malloc_library; needs to be jemalloc

The best to you, always.

1
Analysis of GLOBAL STATUS and VARIABLES:

Observations:

  • Version: 10.6.16-MariaDB-log
  • 16 GB of RAM
  • Uptime = 5d 22:22:23
  • 904 Queries/sec : 761 Questions/sec

The More Important Issues:

Why is SHOW CREATE TABLE being performed so often?

Lowering innodb_ft_result_cache_limit might help.

There seems to be a lot of thrashing in the Query cache. Suggest turning it off:

query_cache_type = OFF
query_cache_size = 0

Try lowering thread_pool_max_threads (currently 64K)

Why is UNLOCK TABLES being used?

Why so many tables? Are lots of them "identical"? Let's discuss the drawbacks.

A guess: There are a lot of tmp tables (inside SELECTs); these may first try to use 8MB of RAM before giving up to use disk:

  • Lower tmp_table_size to 4M to see if it helps; and
  • Use the slowlog to find "slow" queries; then let's improve them. Use long_query_time = 1; see SlowLog

Details and other observations:

( table_open_cache ) = 15,857 -- Number of table descriptors to cache -- Several hundred is usually good.

( Table_open_cache_misses ) = 1010093 / 512543 = 2 /sec -- May need to increase table_open_cache (now 15857)

( Com_show_create_table ) = 2531490 / 512543 = 4.9 /sec -- How often SHOW CREATE TABLE is executed. Possibly comes from a 3rd party package that is sloppy. -- Check table_open_cache (now 15857) and open_files_limit (now 32768). Or avoid the 3rd party package.

( innodb_buffer_pool_size ) = 4,096 / 16384M = 25.0% -- % of RAM used for InnoDB buffer_pool -- Set to about 70% of available RAM. (To low is less efficient; too high risks swapping.)

( innodb_lru_scan_depth ) = 1,536 -- innodb_lru_scan_depth is a very poorly named variable. A better name would be innodb_free_page_target_per_buffer_pool. It is a number of pages InnoDB tries to keep free in each buffer pool instance to speed up read and page creation operations. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be slugghish or spiky. Use 2000 if using SSD drive.

( Innodb_buffer_pool_pages_old / Innodb_buffer_pool_pages_total ) = 95,803 / 259584 = 36.9% -- Pct of buffer_pool that is "old".

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 512,543 / 60 * 512M / 5039606272 = 910 -- Minutes between InnoDB log rotations Beginning with 5.6.8, innodb_log_file_size can be changed dynamically; I don't know about MariaDB. Be sure to also change my.cnf -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 536870912). (Cannot change in AWS.)

( Innodb_row_lock_waits/Innodb_rows_inserted ) = 19,947/565194 = 3.5% -- Frequency of having to wait for a row.

( innodb_flush_neighbors ) = innodb_flush_neighbors = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor. Limits write IO requests per second (IOPS). -- For starters: HDD: 200; SSD: 2000.

( innodb_flush_log_at_trx_commit ) = 1 -- 1 = secure; 2 = faster -- (You decide) Use 1, along with sync_binlog (now 0)=1 for the greatest level of fault tolerance. 0 is best for speed. 2 is a compromise between 0 and 1.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster. 0 is OK for Galera.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( innodb_ft_result_cache_limit ) = 2,000,000,000 / 16384M = 11.6% -- Byte limit on FULLTEXT resultset. (It grows as needed.) -- Lower the setting.

( character_set_client ) = character_set_client = utf8mb3 -- -- If you will be using text from places other than Western Europe, consider switching to utf8mb4. (Beyond the scope of this discussion.)

( character_set_connection ) = character_set_connection = utf8mb3 --

( character_set_results ) = character_set_results = utf8mb3 --

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( query_cache_size ) = 4M -- Size of QC -- Too small = not of much use. Too large = too much overhead. Recommend either 0 or no more than 50M.

( Qcache_lowmem_prunes ) = 223055278 / 512543 = 435 /sec -- Running out of room in QC -- increase query_cache_size (now 4194304)

( Qcache_lowmem_prunes/Qcache_inserts ) = 223,055,278/224646888 = 99.3% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_hits / Qcache_inserts ) = 119,595,591 / 224646888 = 0.532 -- Hit to insert ratio -- high is good -- Consider turning off the query cache.

( Qcache_inserts - Qcache_queries_in_cache ) = (224646888 - 700) / 512543 = 438 /sec -- Invalidations/sec.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (4M - 582944) / 700 / 16384 = 0.315 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 16384)

( Created_tmp_tables ) = 14256825 / 512543 = 28 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 3584311 / 512543 = 7 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 8388608) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Max_used_connections / max_connections ) = 205 / 1024 = 20.0% -- Peak % of connections -- Since several memory factors can expand based on max_connections (now 1024), it is good not to have that setting too high.

( Com_change_db ) = 4206525 / 512543 = 8.2 /sec -- Probably comes from USE statements. -- Consider connecting with DB, using db.tbl syntax, eliminating spurious USE statements, etc.

( Connections ) = 7860380 / 512543 = 15 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( thread_pool_max_threads ) = 65,536 -- One of many settings for MariaDB's thread pooling -- Lower the value.

Abnormally small:

Handler_read_next / Handler_read_key = 0.58
min(max_heap_table_size, tmp_table_size) = 8MB
net_buffer_length = 8,192
tmp_memory_table_size = 8MB

Abnormally large:

(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 90
Acl_database_grants = 171
Acl_users = 175
Binlog_snapshot_position = 1.06e+9
Com_release_savepoint = 14 /HR
Com_rollback_to_savepoint = 2.3 /sec
Com_savepoint = 14 /HR
Com_show_binlog_status = 0.014 /HR
Com_show_events = 1.1 /HR
Com_show_fields = 2.5 /sec
Com_show_grants = 0.099 /sec
Com_stmt_close = 72 /sec
Com_stmt_execute = 72 /sec
Com_stmt_prepare = 72 /sec
Com_unlock_tables = 18 /HR
Feature_check_constraint = 22 /HR
Feature_fulltext = 0.054 /sec
Feature_json = 4.6 /sec
Feature_subquery = 167 /sec
Handler_discover = 0.77 /HR
Handler_read_key = 90462 /sec
Handler_savepoint = 28 /HR
Handler_savepoint_rollback = 4.6 /sec
Handler_tmp_update = 6181 /sec
Innodb_buffer_pool_pages_lru_flushed = 317,184
Innodb_buffer_pool_pages_lru_freed = 1.72e+6
Innodb_data_pending_reads = 1
Innodb_dblwr_pages_written / Innodb_dblwr_writes = 72.5
Innodb_instant_alter_column = 0.47 /HR
Innodb_log_writes / Innodb_log_write_requests = 675.0%
Innodb_num_open_files = 15,257
Memory_used = 29.6%
Memory_used_initial = 4,280.9MB
Open_streams = 4
Open_table_definitions = 14,162
Open_tables = 15,857
Opened_views = 0.15 /sec
Qcache_hits = 233 /sec
Qcache_inserts = 438 /sec
Ssl_accepts = 536
Ssl_finished_accepts = 536
Subquery_cache_hit = 494 /sec
Subquery_cache_miss = 114 /sec
innodb_lru_scan_depth / innodb_io_capacity = 7.68
innodb_open_files = 15,857
innodb_purge_batch_size = 1,000
performance_schema_max_statement_classes = 222
table_definition_cache = 8,000

Abnormal strings:

Slave_heartbeat_period = 0
Slave_received_heartbeats = 0
aria_recover_options = BACKUP,QUICK
character_set_system = utf8mb3
disconnect_on_expired_password = OFF
innodb_fast_shutdown = 1
old_alter_table = DEFAULT
old_mode = UTF8_IS_UTF8MB3
optimizer_trace = enabled=off
require_secure_transport = ON
sql_slave_skip_counter = 0
Rick James
  • 2,743
0

Out Of Memory Suggestion to consider for relief.

Create 10G SWAP Space.

It would be better to run slow for a few seconds than deal with OOM.

Rate Per Second = RPS

Suggestions to consider for your my.cnf [mysqld] section to reduce RAM used

max_connections= 512  # from 1024 - max_connections_used = 205 in 5 days.
read_rnd_buffer_size=16K  # from 512K to reduce handler_read_rnd_next RPS 108,335
innodb_io_capacity=900  # from 200 to use more of NVME IOPS capacity
thread_cache_size=128  # from 8 to reduce threads_created RPS 2+.
innodb_old_blocks_pct=1  # from 37 to support more rows in Innodb_buffer_pool_size

There are many more opportunities to conserve RAM in your configuration and improve performance every day/night.