I have a Galera cluster with MariaDB 10.6.22 and MaxScale 24.02.2 on Ubuntu 22.04 Server.

This is my configuration:

[maxscale]
threads=auto

[srv1]
type=server
address=127.0.0.1
port=3306

[srv2]
type=server
address=10.0.0.2
port=3306

[Galera-Cluster]
type=monitor
module=galeramon
servers=srv1,srv2
user=maxscale
password=xxxxxxxxxx
monitor_interval=2s
root_node_as_master=true

[RW-Router]
type=service
router=readwritesplit
cluster=Galera-Cluster
user=maxscale
password=xxxxxxxxxxx

[Read-Write-Listener]
type=listener
service=RW-Router
protocol=mariadbprotocol
address=0.0.0.0
port=4008

I'm experiencing this issue: when I stop the master node, MaxScale doesn't promote the slave to master.

┌────────┬────────────┬──────┬─────────────┬────────────────────────┬──────┬────────────────┐
│ Server │ Address    │ Port │ Connections │ State                  │ GTID │ Monitor        │
├────────┼────────────┼──────┼─────────────┼────────────────────────┼──────┼────────────────┤
│ srv1   │ 127.0.0.1  │ 3306 │ 0           │ Slave, Synced, Running │      │ Galera-Cluster │
├────────┼────────────┼──────┼─────────────┼────────────────────────┼──────┼────────────────┤
│ srv2   │ 10.0.0.11  │ 3306 │ 0           │ Down                   │      │ Galera-Cluster │
└────────┴────────────┴──────┴─────────────┴────────────────────────┴──────┴────────────────┘

I cannot understand why. I can't find anything useful in the MaxScale log:

MariaDB MaxScale  /var/log/maxscale/maxscale.log  Thu Jun  5 15:04:18 2025
----------------------------------------------------------------------------
notice : Module 'galeramon' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libgaleramon.so'.
notice : Module 'readwritesplit' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so'.
notice : The logging of info messages has been enabled.
notice : Using up to 1.16GiB of memory for query classifier cache
notice : syslog logging is disabled.
notice : maxlog logging is enabled.
notice : Host: 'srv1.cinebot.it' OS: Linux@5.15.0-141-generic, #151-Ubuntu SMP Sun May 18 21:35:19 UTC 2025, x86_64 with 2 processor cores (2.00 available).
notice : Total main memory: 7.75GiB (7.75GiB usable).
notice : MaxScale is running in process 93218
notice : MariaDB MaxScale 24.02.2 started (Commit: b362d654969c495ec50fdf028f419514a854dd0a)
notice : Configuration file: /etc/maxscale.cnf
notice : Log directory: /var/log/maxscale
notice : Data directory: /var/lib/maxscale
notice : Module directory: /usr/lib/x86_64-linux-gnu/maxscale
notice : Service cache: /var/cache/maxscale
notice : Working directory: /var/log/maxscale
notice : Query classification results are cached and reused. Memory used per thread: 595.34MiB
notice : Password encryption key file '/var/lib/maxscale/.secrets' not found, using configured passwords as plaintext.
notice : The systemd watchdog is Enabled. Internal timeout = 30s
notice : Module 'pp_sqlite' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libpp_sqlite.so'.
info   : pp_sqlite loaded.
notice : [MariaDBProtocol] Parser plugin loaded.
info   : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044797709888.
info   : No 'auto_tune' parameters specified, no auto tuning will be performed.
notice : Using HS256 for JWT signatures
warning: The MaxScale GUI is enabled but encryption for the REST API is not enabled, the GUI will not be enabled. Configure `admin_ssl_key` and `admin_ssl_cert` to enable HTTPS or add `admin_secure_gui=false` to allow use of the GUI without encryption.
notice : Started REST API on [127.0.0.1]:8989
notice : srv1 sent version string '10.6.22-MariaDB-0ubuntu0.22.04.1'. Detected type: MariaDB, version: 10.6.22.
notice : Server 'srv1' charset: utf8mb4_general_ci
info   : Variables have changed on 'srv1': 'character_set_client = utf8mb4', 'character_set_connection = utf8mb4', 'character_set_results = utf8mb4', 'max_allowed_packet = 16777216', 'session_track_system_variables = autocommit,character_set_client,character_set_connection,character_set_results,time_zone', 'system_time_zone = CEST', 'time_zone = SYSTEM', 'tx_isolation = REPEATABLE-READ', 'wait_timeout = 28800'
error  : Monitor was unable to connect to server srv2[10.0.0.11:3306] : 'Can't connect to server on '10.0.0.11' (115)'
notice : [galeramon] Found cluster members
notice : Starting a total of 1 services...
notice : (Read-Write-Listener); Listening for connections at [0.0.0.0]:4008
notice : Service 'RW-Router' started (1/1)
info   : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044754216512.
info   : Epoll instance for listening sockets added to worker epoll instance.
info   : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044745823808.
info   : Epoll instance for listening sockets added to worker epoll instance.
notice : MaxScale started with 2 worker threads.
notice : Read 19 user@host entries from 'srv1' for service 'RW-Router'.
info   : Accept authentication from 'admin', using password. Request: /v1/servers
info   : Accept authentication from 'admin', using password. Request: /v1/servers

The Galera status looks fine on the slave server:

wsrep_local_state       4
wsrep_local_state_comment       Synced
wsrep_cluster_status    Primary
wsrep_local_index       1

Do you have any idea why MaxScale doesn't promote the slave to master when the master is down?

Tobia

2 Answers

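As you found out, turning off root_node_as_master fixed the issue. With this setting enabled, only the node with wsrep_local_index = 0 can be chosen as the master, and in your output the surviving node reports wsrep_local_index = 1, so it stays a slave. The setting is not appropriate when the root node is either not among the servers configured in MaxScale or is an arbitrator node. The general recommendation I'd give is to always configure the whole Galera cluster in MaxScale.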
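As a rough sketch of that recommendation, the server and monitor sections could look something like the following. The srv3 entry and its address are hypothetical and only stand in for any additional data node the cluster may have; an arbitrator would not be listed. Everything else is copied from the configuration in the question, with root_node_as_master turned off.

```
[srv1]
type=server
address=127.0.0.1
port=3306

[srv2]
type=server
address=10.0.0.2
port=3306

# Hypothetical third data node, shown only as an assumption.
# Leave this section out if the cluster has no such node.
[srv3]
type=server
address=10.0.0.3
port=3306

[Galera-Cluster]
type=monitor
module=galeramon
# List every data node of the Galera cluster here.
servers=srv1,srv2,srv3
user=maxscale
password=xxxxxxxxxx
monitor_interval=2s
# With this off, the master no longer has to be the node
# with wsrep_local_index = 0.
root_node_as_master=false
```

After restarting MaxScale, `maxctrl list servers` should show one of the running nodes labeled as Master.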

The reason why one would use root_node_as_master is to make it so that multiple MaxScale instances always send writes to the same node (i.e. the one with wsrep_local_index = 0). This avoids conflicts that would otherwise happen when you write to multiple nodes that touch the same rows of a table.

Applications are rarely written to handle deadlock errors on COMMIT and thus your error rate increases with naive load balancing of writes to all nodes. As you've found out, offloading the routing decisions to MaxScale lets your application behave exactly as it does when you write to a single MariaDB node.

markusjm

Galera is a multi-master cluster where every node is replicating everything synchronously. There is no "slave" in a Galera setup.

You could just remove MaxScale from this setup and have your applications connect directly to any of the Galera nodes at random, failing over to any other node at random.

Otto