We are trying to set up MySQL multi-master Group Replication (GR) on Kubernetes, following the "Group replication configuring instances" guide.
After all the configuration is applied, GR starts on the first pod. However, when GR is started on the second node, it goes into the RECOVERING state and then into the ERROR state.
The GCS_DEBUG_TRACE log does not show any errors either.
Let me know if anything is missing or if more information is needed to analyze this. Thanks in advance.
Workarounds tried:
- MySQL Group Replication Multi-Primary Setup
- https://stackoverflow.com/questions/50794695/mysql-group-replication-stuck-on-recovering-forever
Cluster Setup:
- Created 3 PVCs, one for each pod, in a namespace
- Launched the pods using the mysql:8.0.23 Docker image (https://hub.docker.com/_/mysql)
- Ran the queries below to configure the pods
$ kubectl get all -n myb5
NAME         READY   STATUS    RESTARTS   AGE
pod/mysql1   1/1     Running   0          15h
pod/mysql2   1/1     Running   0          15h
pod/mysql3   1/1     Running   0          15h

NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/gr-domain   ClusterIP   None         <none>        <none>    15h

$ kubectl get pvc -n myb5
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-pv-claim-1   Bound    pvc-f7957eff-b75e-4dbc-990a-8d79e54b6f06   250Gi      RWO            robin          15h
mysql-pv-claim-2   Bound    pvc-1c5d4dfd-8495-4266-af0d-882ce8e8ccec   250Gi      RWO            robin          15h
mysql-pv-claim-3   Bound    pvc-49c0979b-49cb-413b-b695-479b32124343   250Gi      RWO            robin          15h
Configuration on all pods:
SET PERSIST general_log = ON;
SET PERSIST general_log_file= '/var/lib/mysql/mysql1.log';
SET PERSIST group_replication_communication_debug_options='GCS_DEBUG_ALL';
SET PERSIST enforce_gtid_consistency=ON;
SET PERSIST gtid_mode = OFF_PERMISSIVE;
SET PERSIST gtid_mode = ON_PERMISSIVE;
SET PERSIST gtid_mode = ON;
SET PERSIST binlog_format = ROW;
SET PERSIST master_info_repository='TABLE';
SET PERSIST relay_log_info_repository='TABLE';
SET PERSIST transaction_write_set_extraction=XXHASH64;
SET SQL_LOG_BIN = 0;
CREATE USER rpl_user@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
FLUSH PRIVILEGES;
CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='password' FOR CHANNEL 'group_replication_recovery';
SET SQL_LOG_BIN = 1;
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SET PERSIST group_replication_group_name='85cbd4a0-7338-46f1-b15e-28c1a26f465e';
SET PERSIST group_replication_start_on_boot=OFF;
SET PERSIST group_replication_bootstrap_group=OFF;
SET PERSIST group_replication_single_primary_mode=OFF;
SET PERSIST group_replication_enforce_update_everywhere_checks=ON;
SET PERSIST group_replication_member_expel_timeout=3600;
SET PERSIST group_replication_group_seeds='mysql1.gr-domain.myb5.svc.cluster.local:33061,mysql2.gr-domain.myb5.svc.cluster.local:33061,mysql3.gr-domain.myb5.svc.cluster.local:33061';
SET PERSIST group_replication_ip_allowlist='mysql1.gr-domain.myb5.svc.cluster.local,mysql2.gr-domain.myb5.svc.cluster.local,mysql3.gr-domain.myb5.svc.cluster.local';
Configuration on pod1:
SET PERSIST server_id=1;
SET PERSIST group_replication_local_address= 'mysql1.gr-domain.myb5.svc.cluster.local:33061';
SET PERSIST group_replication_bootstrap_group=ON;
START GROUP_REPLICATION USER='rpl_user', PASSWORD='password';
SET PERSIST group_replication_bootstrap_group=OFF;
SET PERSIST group_replication_recovery_get_public_key=ON;
Configuration on pod2:
SET PERSIST server_id=2;
SET PERSIST group_replication_local_address= 'mysql2.gr-domain.myb5.svc.cluster.local:33061';
START GROUP_REPLICATION USER='rpl_user', PASSWORD='password';
Configuration on pod3:
SET PERSIST server_id=3;
SET PERSIST group_replication_local_address= 'mysql3.gr-domain.myb5.svc.cluster.local:33061';
START GROUP_REPLICATION USER='rpl_user', PASSWORD='password';
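To rule out a setting that did not take effect, the values each pod actually picked up can be double-checked. This is only a sketch of such a check, not something taken from the guides above. It reads the system variables configured earlier, plus hostname and report_host, which are not part of our configuration and are included only because one of the two is what a member reports as MEMBER_HOST.

```sql
-- Run on each pod after the configuration steps above.
-- All names are standard MySQL 8.0 system variables; the
-- group_replication_* ones exist only once the plugin is installed.
-- hostname and report_host are NOT set by our configuration; they are
-- listed only for comparison with MEMBER_HOST.
SELECT @@server_id,
       @@gtid_mode,
       @@enforce_gtid_consistency,
       @@group_replication_group_name,
       @@group_replication_local_address,
       @@group_replication_group_seeds,
       @@hostname,
       @@report_host\G
```

The output should differ between pods only in server_id, group_replication_local_address, and the host-related values.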
Group Replication Status when started:
mysql> SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 57b9f42a-8b4d-11eb-bd3e-0242ac110003
MEMBER_HOST: mysql1
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.23
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 57ec4b94-8b4d-11eb-8fdc-0242ac110004
MEMBER_HOST: mysql2
MEMBER_PORT: 3306
MEMBER_STATE: RECOVERING
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.23
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 57ec4b94-8b4d-11eb-8fdc-0242ac110005
MEMBER_HOST: mysql3
MEMBER_PORT: 3306
MEMBER_STATE: RECOVERING
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.23
Group Replication Error after a few minutes:
mysql> SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 57ec4b94-8b4d-11eb-8fdc-0242ac110004
MEMBER_HOST: mysql2
MEMBER_PORT: 3306
MEMBER_STATE: ERROR
MEMBER_ROLE:
MEMBER_VERSION: 8.0.23
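Since the GCS trace shows nothing, the failure may be recorded on the distributed recovery channel rather than in the group communication layer. Below is a sketch of the queries that could be run on a joining pod (mysql2 or mysql3) while it is RECOVERING or right after it moves to ERROR. The tables and columns are from performance_schema in MySQL 8.0; that the cause will actually show up there is an assumption on our side and has not been verified.

```sql
-- Connection (receiver) side of the distributed recovery channel.
SELECT CHANNEL_NAME, SERVICE_STATE,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'group_replication_recovery'\G

-- Applier side of the same channel.
SELECT CHANNEL_NAME, SERVICE_STATE,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_worker
 WHERE CHANNEL_NAME = 'group_replication_recovery'\G

-- Most recent server error log entries (this table exists since 8.0.22).
SELECT LOGGED, PRIO, ERROR_CODE, SUBSYSTEM, DATA
  FROM performance_schema.error_log
 ORDER BY LOGGED DESC
 LIMIT 20;
```

We can post the output of these queries if that helps with the analysis.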