
I have a MariaDB Galera cluster. If some nodes fail, I cannot blindly restart them; I first have to determine a good wsrep_cluster_address.

If I can keep a keepalived virtual IP on one of the healthy nodes, can I use this IP as wsrep_cluster_address on the other nodes, so that after a node failure the joining node always has a correct wsrep_cluster_address? Or are there other solutions that enable automatic rejoin?

I feel it should somehow be possible to keep the cluster up and rejoin nodes automatically as long as at least one healthy node (or the Primary Component?) is up.

(Note: I am aware of the answer to "Galera cluster without having to specify all hosts on wsrep_cluster_address", but multicast is unfortunately not an option.)

scream314

2 Answers


You can put multiple addresses into wsrep_cluster_address, e.g. gcomm://10.1.1.1,10.1.1.2,10.1.1.3, and Galera Cluster will connect to whichever of the listed nodes are reachable.
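For illustration, here is a minimal Python sketch that renders such a setting into a [galera] section of a MariaDB option file. The helper names gcomm_address and render_galera_section are hypothetical, and the three IPs are just the example addresses above. A starting node only needs one listed peer to be reachable, so listing nodes that happen to be down is harmless; an empty list, on the other hand, would produce gcomm://, which bootstraps a new cluster, so the sketch refuses it.

```python
import configparser
import io


def gcomm_address(hosts):
    """Hypothetical helper: build a Galera address such as gcomm://10.1.1.1,10.1.1.2."""
    if not hosts:
        # A bare gcomm:// bootstraps a brand-new cluster, so never produce one by accident.
        raise ValueError("at least one peer address is required")
    return "gcomm://" + ",".join(hosts)


def render_galera_section(hosts):
    """Hypothetical helper: render a [galera] option-file section listing the peers."""
    config = configparser.ConfigParser()
    config["galera"] = {"wsrep_cluster_address": gcomm_address(hosts)}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(render_galera_section(["10.1.1.1", "10.1.1.2", "10.1.1.3"]))
```

The same section can be used unchanged on every node, which is what makes restarts safe without recomputing the address.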

I have used Percona XtraDB Cluster (which is also based on Galera Cluster) to run several (more than 10) clusters in production:

  • One has been running with two database servers and one garbd server, and we put all three servers into wsrep_cluster_address.
  • Another has seven database servers, and we put only the first three (each in a different rack) into wsrep_cluster_address.
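The second setup works because a joining node only has to reach one existing member, which then tells it about the rest of the cluster; presumably that is why three seeds in separate racks were enough for seven servers. A small Python sketch of picking one seed per rack, assuming a hypothetical inventory dict that maps rack names to host names (both the rack and host names are made up):

```python
# Hypothetical inventory: seven servers spread over three racks.
INVENTORY = {
    "rack-a": ["db1", "db4", "db7"],
    "rack-b": ["db2", "db5"],
    "rack-c": ["db3", "db6"],
}


def pick_seeds(nodes_by_rack, count=3):
    """Hypothetical helper: take the first node of each rack (racks in sorted order).

    Stops after `count` seeds; returns fewer if there are fewer non-empty racks.
    """
    seeds = []
    for rack in sorted(nodes_by_rack):
        if len(seeds) == count:
            break
        if nodes_by_rack[rack]:
            seeds.append(nodes_by_rack[rack][0])
    return seeds


# The same address line goes into every node's config, seeds included.
print("gcomm://" + ",".join(pick_seeds(INVENTORY)))  # prints gcomm://db1,db2,db3
```

Spreading the seeds across racks means a single rack failure cannot remove every address a restarting node knows about.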

Over four years these clusters went through power failures (hitting all three servers), hardware failures and software bugs, and they held up well.

Gea-Suan Lin

I tried this out and yes, it seems to work.

Test setup:

3 nodes.

One dynamic (DHCP) IP on each node's eth0.

The nodes are hostnamed: mari0, mari1, mari2, and these names are resolvable. (May not be needed, but...)

wsrep_cluster_address='gcomm://1.2.3.4' on all 3 nodes. (Note: the first node must be started with wsrep_cluster_address='gcomm://' to bootstrap the cluster; once the cluster is up, its wsrep_cluster_address should be changed to the vIP.)
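That note means one manual edit on the first node after bootstrapping. A minimal Python sketch of the edit, working on the option-file text as a string; point_at_vip is a hypothetical helper, and it assumes the setting sits on a single uncommented wsrep_cluster_address= line:

```python
import re

# Matches an uncommented "wsrep_cluster_address = ..." line (spaces/tabs only, never newlines).
ADDRESS_LINE = re.compile(r"^([ \t]*wsrep_cluster_address[ \t]*=[ \t]*).*$", re.MULTILINE)


def point_at_vip(option_text, vip):
    """Hypothetical helper: replace the cluster address with 'gcomm://<vip>'."""
    new_text, count = ADDRESS_LINE.subn(
        lambda m: m.group(1) + "'gcomm://" + vip + "'", option_text
    )
    if count != 1:
        # Refuse to guess when the setting is missing or duplicated.
        raise ValueError(f"expected one wsrep_cluster_address line, found {count}")
    return new_text


bootstrap_config = "[galera]\nwsrep_on=ON\nwsrep_cluster_address='gcomm://'\n"
print(point_at_vip(bootstrap_config, "1.2.3.4"))
```

Only the file changes; the running server is unaffected, and the new value matters the next time this node is restarted, when it must join rather than bootstrap again.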

A keepalived vIP on the nodes, with this configuration on each node:

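! Configuration File for keepalived
! Health check: the node counts as up while a mysqld process exists
vrrp_script chk_mysqld {
    script "pidof mysqld"
    interval 2
}
vrrp_instance VI_1 {
    ! e.g. MASTER on the node you bootstrap, BACKUP on the others
    state [MASTER|BACKUP]
    interface eth0
    virtual_router_id 52
    ! Per-node value; the healthy node with the highest priority holds the vIP
    priority [PRIO]
    advert_int 1
    authentication {
        auth_type AH
        ! Must be identical on every node
        auth_pass [password]
    }
    ! The vIP that every node uses as wsrep_cluster_address
    virtual_ipaddress {
        1.2.3.4
    }
    ! If chk_mysqld fails, this node enters FAULT state and releases the vIP
    track_script {
        chk_mysqld
    }
}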

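(The pidof check is crude, since it only verifies that a mysqld process exists, not that the node is synced with the cluster, but it works.)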
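keepalived treats a zero exit status from a track_script as healthy. A stricter check than pidof mysqld would keep the vIP only on nodes that are usable as a join target: ready, in the Primary Component and synced. A sketch in Python, assuming the mysql command-line client is installed and can log in without prompting (for example via ~/.my.cnf of the user keepalived runs scripts as); the function names are hypothetical:

```python
import subprocess

# Assumption: credentials come from an option file such as ~/.my.cnf,
# so the mysql client never prompts for a password.
STATUS_QUERY = "SHOW GLOBAL STATUS LIKE 'wsrep_%'"


def parse_status(batch_output):
    """Parse `mysql -N -B` output (one 'name<TAB>value' row per line) into a dict."""
    status = {}
    for line in batch_output.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            status[name] = value
    return status


def node_is_synced(status):
    """True if the node is ready, in the Primary Component and fully synced."""
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


def main():
    """Hypothetical entry point: 0 = healthy, 1 = unhealthy (keepalived's convention)."""
    try:
        result = subprocess.run(
            ["mysql", "-N", "-B", "-e", STATUS_QUERY],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Client missing, server down, login refused or query timed out.
        return 1
    return 0 if node_is_synced(parse_status(result.stdout)) else 1
```

To use it from keepalived, the file would also need `import sys` and a final `sys.exit(main())`, and the script line would point at it, e.g. script "/usr/local/bin/check_galera.py" (a hypothetical path). One consequence of requiring Synced: a node acting as SST donor reports Donor/Desynced, so it gives up the vIP while it serves the transfer.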

First result:

After I started the first node, keepalived placed the vIP on it. Then I started the remaining two nodes.

The nodes joined the cluster and their keepaliveds entered BACKUP state.

service mysql stop removed the node from the cluster, and service mysql start added it back.

I have not yet tested adding a 4th and 5th node to the cluster this way, but it should also work.

scream314