I assume the main purpose of a cluster is fault tolerance. However, when I start the following Consul cluster in Docker, it does not tolerate a failure, and I don't understand why.

services:

  # docker network create --driver=bridge discovery-network

  # SERVICE DISCOVERY
  consul-server-0:
    image: consul:1.6.0
    container_name: consul-server-0
    command: "agent -server -bootstrap-expect 2 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-0"
    networks:
      - discovery-network

  consul-server-1:
    image: consul:1.6.0
    container_name: consul-server-1
    command: "agent -server -retry-join consul-server-0 -client 0.0.0.0 -datacenter datacenter-1  -node consul-server-1"
    networks:
      - discovery-network
    depends_on:
      - consul-server-0

  consul-client-1:
    image: consul:1.6.0
    container_name: consul-client-1
    command: "agent -retry-join consul-server-0 -ui -client 0.0.0.0  -datacenter datacenter-1  -node consul-client-1"
    ports:
      - "8500:8500" # GUI
    networks:
      - discovery-network
    depends_on:
      - consul-server-0

networks:
  discovery-network:
    external: true

When I stop one of the servers, the cluster stops working: I can no longer register services (through consul-client).

In the remaining server's logs, I can see the message Failed to make RequestVote RPC. In the client's logs, I can see the message No cluster leader.

What is wrong with my configuration?

1 Answer

Based on my understanding of the topology you have described, you essentially have:

3 Nodes - 2 Servers and 1 Client

This layout is immediately a problem because the consensus protocol that Consul uses relies on being able to elect a leader, which requires a majority (quorum) of the servers to vote. This is why an odd number of servers is recommended.

Consider the failure scenario of a 2-server Consul cluster: as soon as a single server goes down, the one remaining server cannot achieve consensus by itself, because a majority of 2 servers is 2 and only 1 is left to vote.

The Consul docs describe the consensus protocol reasonably well (see the Consensus docs).

They even include a summary table which explicitly states how many server members could be lost for varying cluster sizes.

Following their advice, your 2-server cluster cannot withstand any fault, because losing either server makes it impossible to achieve quorum.
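
As a rough illustration of the arithmetic behind that table, here is the usual majority rule written out. The symbols N (number of servers), Q (quorum size) and F (failures tolerated) are labels introduced here for illustration and are not taken from the question:

```latex
Q(N) = \left\lfloor \frac{N}{2} \right\rfloor + 1, \qquad F(N) = N - Q(N)

\begin{array}{c|c|c}
N & Q(N) & F(N) \\ \hline
1 & 1 & 0 \\
2 & 2 & 0 \\
3 & 2 & 1 \\
4 & 3 & 1 \\
5 & 3 & 2
\end{array}
```

With N = 2 the quorum is 2, so F = 0. With N = 3 the quorum is still 2, so one server can be lost.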

Realistically, the minimum number of servers for a fault-tolerant Consul cluster is 3.
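
An untested sketch of how the compose file from the question might be adapted to three servers. The consul-server-2 service is a hypothetical addition modelled on the existing consul-server-1, and raising -bootstrap-expect to 3 is an assumption about how you want the cluster to bootstrap. The consul-server-1 and consul-client-1 services and the networks section would stay as they are:

```yaml
services:
  consul-server-0:
    image: consul:1.6.0
    container_name: consul-server-0
    # Wait for 3 servers (instead of 2) before bootstrapping the cluster
    command: "agent -server -bootstrap-expect 3 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-0"
    networks:
      - discovery-network

  # Hypothetical third server, modelled on consul-server-1 from the question
  consul-server-2:
    image: consul:1.6.0
    container_name: consul-server-2
    command: "agent -server -retry-join consul-server-0 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-2"
    networks:
      - discovery-network
    depends_on:
      - consul-server-0
```

With three servers the quorum is 2, so stopping any single server should still leave a majority able to elect a leader.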

hvindin