
My organization is planning to implement a high-availability PostgreSQL cluster using Patroni and etcd. However, we only have two data center sites available, which makes deploying a standard 3-node etcd cluster across separate failure domains challenging.

We understand that running only a 2-node etcd cluster increases the risk of split-brain or unavailability if one site becomes unreachable, due to the lack of a quorum.

To address this, we came up with the following topology:


DC (Primary Site):

  • 192.168.30.80: PostgreSQL node running Patroni (initial master)

  • 192.168.30.83: etcd node

DRC (Disaster Recovery Site):

  • 192.168.30.81: PostgreSQL node running Patroni (replica)

  • 192.168.30.82: backup etcd node

Each site runs a single-node etcd cluster, and we have tested that failover still works in this setup. We use etcd's mirror-maker feature (`etcdctl make-mirror`) to continuously relay key creates and updates from the DC cluster to the separate cluster in the DRC. We then use keepalived to manage a floating IP between the two etcd clusters, which Patroni on both nodes uses to reach etcd.
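For reference, a minimal sketch of the relay and floating-IP pieces looks roughly like this; only the node addresses come from the topology above, while the VIP (192.168.30.90), interface name, and priorities are placeholders:

```bash
# One-way relay of key writes from the DC etcd to the DRC etcd
# (a copy, not raft replication); plain HTTP assumed.
ETCDCTL_API=3 etcdctl --endpoints=http://192.168.30.83:2379 \
  make-mirror http://192.168.30.82:2379

# Minimal keepalived sketch on the DC etcd host; the DRC host runs the same
# block with state BACKUP and a lower priority. Patroni on both PostgreSQL
# nodes points at the floating IP instead of a fixed etcd address.
cat >/etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance etcd_vip {
    state MASTER
    interface eth0              # placeholder interface name
    virtual_router_id 51
    priority 150
    virtual_ipaddress {
        192.168.30.90/24        # placeholder floating IP used by Patroni
    }
}
EOF
```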

My questions are:

  • What risks are involved in running this kind of setup?

  • Would it be better to add a lightweight third etcd node in a separate site (e.g., the cloud) to form a proper quorum?

1 Answer


Having only two members in a raft cluster is a Very Bad Idea™: the quorum in raft is floor(n/2) + 1 of the original members, which in your case is 2. So both members have to be up to elect a leader. Hence, you plainly eliminate partition tolerance (if the communication between the two members fails for any reason, no quorum can be built) and worsen availability (if one member fails for any reason, the cluster becomes unavailable). Frankly, 9 times out of 10 you'd be better off using just one instance with a proper snapshotting setup, and of course backups, than using only 2 instances.
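For concreteness, the arithmetic (quorum = floor(n/2) + 1, tolerated failures = n - quorum) works out like this:

```bash
for n in 1 2 3 5; do
  echo "members=$n quorum=$((n / 2 + 1)) tolerated_failures=$((n - (n / 2 + 1)))"
done
# members=1 quorum=1 tolerated_failures=0
# members=2 quorum=2 tolerated_failures=0  <- no better than a single member
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```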

That being said: the setup you devised is prone to the same problems AND potentially a split brain, sacrificing not only partition tolerance but also consistency. In a worst-case scenario, you may have race conditions which may lead to keys being updated or written on BOTH etcd instances. The technical term for this situation is FUBAR.

Given the absurdly low prices for a single Fargate instance (around $0.08/h) and EBS (or comparable offerings by other cloud providers), I'd strongly suggest running a third etcd member in a VPC, especially in this use case, where only metadata is stored in etcd.
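As a rough sketch (not a full config: TLS, tokens and data directories omitted), a proper three-member cluster spanning DC, DRC and a small cloud VM could be started like this; the cloud peer's address (10.0.0.10) is purely illustrative, the other addresses are from your topology:

```bash
# On the DC etcd host (192.168.30.83)
etcd --name etcd-dc \
  --initial-advertise-peer-urls http://192.168.30.83:2380 \
  --listen-peer-urls http://192.168.30.83:2380 \
  --listen-client-urls http://192.168.30.83:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.30.83:2379 \
  --initial-cluster etcd-dc=http://192.168.30.83:2380,etcd-drc=http://192.168.30.82:2380,etcd-cloud=http://10.0.0.10:2380 \
  --initial-cluster-state new

# The DRC member (192.168.30.82) and the cloud member (10.0.0.10) start the
# same way with their own --name and URLs; only the shared --initial-cluster
# value must be identical on all three. With 3 members, quorum is 2, so the
# cluster survives the loss of any single member or site link.
```

Patroni can then be pointed at all three client endpoints directly, which also removes the need for the keepalived VIP in front of etcd.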

Markus W Mahlberg