Best Practice of High Availability For MariaDB

Question

I'm preparing to run a MariaDB Cluster with High Availability and I'm looking for the best practices.

Years back, I used to run MariaDB in master-master mode, and there was a big issue. Whenever any of the nodes (two nodes in total) was disconnected and reconnected for a second, the whole database was inaccessible until the sync process was completed.

I did search a lot and there are multiple suggestions, which not sure which one is the best and simple solution to go for.

using maxscale as proxy
using master-slave mode
using master-master mode

I haven't worked with MaxScale, and I would be thankful if someone guides me if this is the best solution to go for a database with high I/O, also I know that I need to have another MaxScale + HAProxy setup to cover MaxScale failures.

The second item is not a proper solution, since if the master DB fails I need to go over some manual processes to make it as master.

The third item, could be used with Galera, but as I searched I saw many people complaining about the sync issues which I had before.

I would be thankful if someone can guide me with at least 2-3 years of experience in such a cluster.

score 1 · Answer 1 · answered Aug 07 '24 at 07:46

If you want to avoid using Galera due to the long SST times, the second option is an asynchronous replication MariaDB cluster and MaxScale configured with cooperative monitoring. With three MariaDB nodes you'll get the same availability that you'd get with Galera.

Additionally, the causal reads mode in MaxScale completely hides the complexity of of the critical read problem where an earlier write on the primary node isn't seen by a later read on the replica because it hasn't been replicated yet. MaxScale solves this by automating the task of synchronizing reads done on a replica with the writes on the primary node and you get the same user experience with a cluster as you'd get with a single database. This provides more read throughput at the cost of additional latency for the critical reads but the feature can also be configured to re-route critical reads to the primary if no server has replicated it.

Even with Galera, it makes sense to use MaxScale. It avoids the usual problems of master-master usage in Galera where writes to different nodes may end up conflicting and you get deadlocks on COMMIT. This is because the readwritesplit router in MaxScale routes all writes to a single node and only load balances reads. This is the main difference between a smart database proxy and a normal network proxy which simply round-robins all requests to all Galera nodes. Writing to multiple Galera nodes isn't beneficial as Galera is not a write scaling solution but an HA solution: the data gets copied to all nodes and the same amount of IO is done per transaction regardless of where you do the writes. Even if writing to multiple nodes is possible, and sometimes convenient, applications are not always written to handle a deadlock on COMMIT.

As for HA for MaxScale, this is usually less expensive as MaxScale doesn't store data on disk and is stateless in the sense that you can replace a failed MaxScale node by just provisioning a new one at the same address. Most of the MariaDB connectors support multiple network addresses which allows you to route your requests to multiple MaxScale instances. With this, the need for separate network load balancers in front of MaxScale is eliminated. Of course sometimes it's simpler to use a load balancer in front of the MaxScales (e.g. AWS ELB) and let the client applications connect to just one endpoint.

Best Practice of High Availability For MariaDB

1 Answers1