I'm experimenting with a 3-node Galera Cluster and I'm running into a problem I can't explain. The logs show many "failed to report last committed (number), -110 (connection timed out)" messages. The hosts are connected at 100 Mb/s, and a throughput test between them reached 97 Mb/s, so the network itself looks fine. I did notice that one of the nodes was sending many flow control messages, though I don't know why. When I shut that node down and ran the cluster with the remaining two nodes, the problem disappeared. Below are the hardware specifications (the machines are heterogeneous) and the wsrep_ status variables of each node, in the hope that they hint at the cause and that someone can suggest some tuning:
In all cases the OS is Ubuntu 18.04 LTS.
Computer 1: L502X Dell XPS
Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz 4 cores/8 threads
16GB RAM Kingston
1TB HD ST1000LM024 HN-M101MBB
Running MariaDB 10.3 natively
Computer 2: Toshiba Satellite A200-220
Intel(R) Core(TM)2 Duo T7500 @ 2.20GHz 2 cores/4 threads
4GB RAM
200GB HD Toshiba MK2046GSX
Running MariaDB 10.3 on VirtualBox in a Windows Host:
2 vCPUs
1.5 GB RAM
Computer 3: Dell Inspiron
Intel(R) Core(TM) i3 4005U @ 1.70GHz 2 cores/4 threads
4GB RAM
1TB HD ST1000LM024 HN-M101MBB
Running MariaDB 10.3 on VirtualBox in a Windows Host:
2 vCPUs
1.5 GB RAM
Curiously, when I tested the nodes in pairs to see whether the timeouts would stop, they stopped only with computers 1 and 2 in the cluster. When I took computer 2 out and put computer 3 in, the timeouts returned. I don't understand why the newer hardware is the culprit. I'm running these tests because I plan to migrate my company's database to AWS and use Galera for HA, backup, and better performance, and I'm afraid of moving the tests to AWS only to hit the same timeouts there. I'm new to Galera, so I need to be sure I'm not doing anything wrong.
In all cases I'm using wsrep_slave_threads = 4.
Node 1 wsrep_%
wsrep_apply_oooe 0.000000
wsrep_apply_oool 0.000000
wsrep_apply_window 1.000000
wsrep_causal_reads 0
wsrep_cert_deps_distance 9.436170
wsrep_cert_index_size 94473
wsrep_cert_interval 0.000000
wsrep_cluster_conf_id 11
wsrep_cluster_size 3
wsrep_cluster_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_cluster_status Primary
wsrep_commit_oooe 0.000000
wsrep_commit_oool 0.000000
wsrep_commit_window 1.000000
wsrep_connected ON
wsrep_desync_count 0
wsrep_evs_delayed
wsrep_evs_evict_list
wsrep_evs_repl_latency 0.00379364/0.0224714/0.0381743/0.00500254/200
wsrep_evs_state OPERATIONAL
wsrep_flow_control_paused 0.418272
wsrep_flow_control_paused_ns 580464988618
wsrep_flow_control_recv 123
wsrep_flow_control_sent 0
wsrep_gcomm_uuid 378af41f-6ddc-11e8-82f4-8aec32d4faa4
wsrep_incoming_addresses 10.0.0.16:3306,10.0.0.19:3306,10.0.0.12:3306
wsrep_last_committed 4534
wsrep_local_bf_aborts 0
wsrep_local_cached_downto 4312
wsrep_local_cert_failures 0
wsrep_local_commits 495
wsrep_local_index 1
wsrep_local_recv_queue 0
wsrep_local_recv_queue_avg 0.007519
wsrep_local_recv_queue_max 2
wsrep_local_recv_queue_min 0
wsrep_local_replays 0
wsrep_local_send_queue 1
wsrep_local_send_queue_avg 0.094382
wsrep_local_send_queue_max 2
wsrep_local_send_queue_min 0
wsrep_local_state 4
wsrep_local_state_comment Synced
wsrep_local_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_protocol_version 8
wsrep_provider_name Galera
wsrep_provider_vendor Codership Oy <info@codership.com>
wsrep_provider_version 25.3.23(r3789)
wsrep_ready ON
wsrep_received 2016
wsrep_received_bytes 1453297536
wsrep_repl_data_bytes 952332692
wsrep_repl_keys 3381123
wsrep_repl_keys_bytes 27069920
wsrep_repl_other_bytes 0
wsrep_replicated 716
wsrep_replicated_bytes 979451040
wsrep_thread_count 5
Node 2 wsrep_%
wsrep_apply_oooe 0.444043
wsrep_apply_oool 0.032491
wsrep_apply_window 2.660650
wsrep_causal_reads 0
wsrep_cert_deps_distance 9.296029
wsrep_cert_index_size 105593
wsrep_cert_interval 0.000000
wsrep_cluster_conf_id 11
wsrep_cluster_size 3
wsrep_cluster_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_cluster_status Primary
wsrep_commit_oooe 0.000000
wsrep_commit_oool 0.000000
wsrep_commit_window 1.273723
wsrep_connected ON
wsrep_desync_count 0
wsrep_evs_delayed
wsrep_evs_evict_list
wsrep_evs_repl_latency 0.0209565/0.0265051/0.0300225/0.00397011/3
wsrep_evs_state OPERATIONAL
wsrep_flow_control_paused 0.451273
wsrep_flow_control_paused_ns 155151831643
wsrep_flow_control_recv 122
wsrep_flow_control_sent 0
wsrep_gcomm_uuid 2ef79ae7-6de7-11e8-b8bb-621e468c1867
wsrep_incoming_addresses 10.0.0.16:3306,10.0.0.19:3306,10.0.0.12:3306
wsrep_last_committed 4525
wsrep_local_bf_aborts 0
wsrep_local_cached_downto 4310
wsrep_local_cert_failures 0
wsrep_local_commits 0
wsrep_local_index 0
wsrep_local_recv_queue 8
wsrep_local_recv_queue_avg 2.460976
wsrep_local_recv_queue_max 16
wsrep_local_recv_queue_min 0
wsrep_local_replays 0
wsrep_local_send_queue 0
wsrep_local_send_queue_avg 0.000000
wsrep_local_send_queue_max 1
wsrep_local_send_queue_min 0
wsrep_local_state 4
wsrep_local_state_comment Synced
wsrep_local_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_protocol_version 8
wsrep_provider_name Galera
wsrep_provider_vendor Codership Oy <info@codership.com>
wsrep_provider_version 25.3.23(r3789)
wsrep_ready ON
wsrep_received 406
wsrep_received_bytes 319335762
wsrep_repl_data_bytes 0
wsrep_repl_keys 0
wsrep_repl_keys_bytes 0
wsrep_repl_other_bytes 0
wsrep_replicated 0
wsrep_replicated_bytes 0
wsrep_thread_count 5
Node 3 wsrep_%
wsrep_apply_oooe 0.481061
wsrep_apply_oool 0.018939
wsrep_apply_window 1.742424
wsrep_causal_reads 0
wsrep_cert_deps_distance 8.821970
wsrep_cert_index_size 100278
wsrep_cert_interval 0.000000
wsrep_cluster_conf_id 11
wsrep_cluster_size 3
wsrep_cluster_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_cluster_status Primary
wsrep_commit_oooe 0.000000
wsrep_commit_oool 0.000000
wsrep_commit_window 1.064639
wsrep_connected ON
wsrep_desync_count 0
wsrep_evs_delayed
wsrep_evs_evict_list
wsrep_evs_repl_latency 0.00271209/0.00877237/0.0228643/0.00731529/11
wsrep_evs_state OPERATIONAL
wsrep_flow_control_paused 0.407408
wsrep_flow_control_paused_ns 554009596591
wsrep_flow_control_recv 121
wsrep_flow_control_sent 121
wsrep_gcomm_uuid b0924918-6de1-11e8-80bb-c607140f7861
wsrep_incoming_addresses 10.0.0.16:3306,10.0.0.19:3306,10.0.0.12:3306
wsrep_last_committed 4514
wsrep_local_bf_aborts 0
wsrep_local_cached_downto 4309
wsrep_local_cert_failures 0
wsrep_local_commits 0
wsrep_local_index 2
wsrep_local_recv_queue 29
wsrep_local_recv_queue_avg 23.671569
wsrep_local_recv_queue_max 30
wsrep_local_recv_queue_min 0
wsrep_local_replays 0
wsrep_local_send_queue 0
wsrep_local_send_queue_avg 0.000000
wsrep_local_send_queue_max 1
wsrep_local_send_queue_min 0
wsrep_local_state 4
wsrep_local_state_comment Synced
wsrep_local_state_uuid e3c270c9-6dae-11e8-86a2-0f18d007f9fa
wsrep_protocol_version 8
wsrep_provider_name Galera
wsrep_provider_vendor Codership Oy <info@codership.com>
wsrep_provider_version 25.3.23(r3789)
wsrep_ready ON
wsrep_received 1089
wsrep_received_bytes 940803284
wsrep_repl_data_bytes 378
wsrep_repl_keys 1
wsrep_repl_keys_bytes 32
wsrep_repl_other_bytes 0
wsrep_replicated 1
wsrep_replicated_bytes 480
wsrep_thread_count 3
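
To make the three dumps easier to compare, here is a small Python sketch that pulls out the counters that seem most relevant. The numbers are copied from the output above. The function names are hypothetical helpers I made up for this post, and treating the node with the highest wsrep_local_recv_queue_avg as the "slowest applier" is only my reading of the counters, not something I have confirmed.

```python
# Counters copied from the wsrep_% dumps above (only the ones compared here).
nodes = {
    "node1": {
        "wsrep_flow_control_sent": 0,
        "wsrep_local_recv_queue_avg": 0.007519,
        "wsrep_replicated": 716,
        "wsrep_replicated_bytes": 979451040,
    },
    "node2": {
        "wsrep_flow_control_sent": 0,
        "wsrep_local_recv_queue_avg": 2.460976,
        "wsrep_replicated": 0,
        "wsrep_replicated_bytes": 0,
    },
    "node3": {
        "wsrep_flow_control_sent": 121,
        "wsrep_local_recv_queue_avg": 23.671569,
        "wsrep_replicated": 1,
        "wsrep_replicated_bytes": 480,
    },
}


def flow_control_senders(all_nodes):
    """Names of the nodes that have sent at least one flow control message."""
    return sorted(
        name
        for name, stats in all_nodes.items()
        if stats["wsrep_flow_control_sent"] > 0
    )


def slowest_applier(all_nodes):
    """Name of the node with the largest average receive queue."""
    return max(
        all_nodes, key=lambda name: all_nodes[name]["wsrep_local_recv_queue_avg"]
    )


def avg_writeset_bytes(stats):
    """Average bytes per replicated write set (0.0 if none were replicated)."""
    if stats["wsrep_replicated"] == 0:
        return 0.0
    return stats["wsrep_replicated_bytes"] / stats["wsrep_replicated"]


if __name__ == "__main__":
    print("flow control sent by:", flow_control_senders(nodes))
    print("largest average receive queue:", slowest_applier(nodes))
    size_mb = avg_writeset_bytes(nodes["node1"]) / 1e6
    print(f"node1 average write set: {size_mb:.2f} MB")
```

If I'm reading this correctly, node 3 is the only node sending flow control and it also has by far the largest receive queue, while the write sets coming from node 1 average a bit over 1.3 MB each. I don't know whether write sets of that size are a problem in themselves on a 100 Mb/s link.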