
I can have multiple replicas of my database to ensure high availability. Then I can have a quorum such that R+W > N to ensure consistency.
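As a quick sanity check, the R+W > N condition can be verified by brute force. The sketch below (hypothetical node counts, not any particular database) checks that every possible read quorum of size R overlaps every possible write quorum of size W exactly when R + W > N, which is why a reader is guaranteed to see at least one node with the latest write.

```python
from itertools import combinations

# Brute-force check: does every read quorum of size r intersect
# every write quorum of size w in a cluster of n nodes?
def quorums_always_intersect(n, r, w):
    nodes = set(range(n))
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

print(quorums_always_intersect(5, 2, 4))  # R+W=6 > 5 -> True
print(quorums_always_intersect(5, 2, 3))  # R+W=5, not > 5 -> False
```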

So do strategies like quorum-based read/write or consensus algorithms circumvent the fundamental limitations posed by the CAP theorem? I am sure the answer is no, but somehow the claim that "Consensus algorithms enable systems to maintain strong consistency while still offering fault tolerance and availability.." is not clicking for me.

Rather, I believe it would be more accurate to say that these strategies help you choose the right spot on the spectrum between availability and consistency in a distributed system.

Can someone please help me understand this better with an example?

3 Answers


I think not.

A quorum protocol can't guarantee progress. This falls under the Partition part of CAP: when a partition occurs, the quorum protocol activates to try to decide what to do, but it can fall into a loop with no progress being made. The system then either doesn't continue to run normally, or is forced to exit early, sacrificing Consistency or Availability.

If you think about it, though, even a super simple failover system could be said to "..enable systems to maintain strong consistency while still offering fault tolerance and availability..".

Quorum protocols just try to optimise failover so that there are fewer error conditions, perhaps to the point where the theoretical errors are not practically going to happen unless you are planning for the Cold War.

https://en.m.wikipedia.org/wiki/Consensus_(computer_science)#The_FLP_impossibility_result_for_asynchronous_deterministic_consensus

Ewan

...strong consistency while still offering fault tolerance and availability...

That's not the three items of the CAP theorem.

"Fault tolerance" in the context of "consensus algorithms" does not mean what "partition tolerance whilst also preserving full consistency and full availability" would mean in the context of the CAP theorem.

Consensus algorithms are designed to tolerate faults in the processing of inputs that are available in principle to all the processes participating in the quorum. Specifically, they tolerate wrong processing occurring in a minority of processes, all of which are intended to duplicate the same processing of the same inputs.

A key point in this is that the same input must (by some means) already be available to each process at the outset, and the outputs must (by some means) be available to some single deciding process at the conclusion.

A consensus algorithm can tolerate the complete "partition" of a minority of the processes, so that either this minority do not start processing for lack of inputs, or their outputs do not feed into the final deciding process, but only because the work they are doing is already fully duplicated by other processes.

Consensus algorithms are used in cases where the risk of errors can be mitigated by doing some of the same activities more than once.

The CAP theorem concerns something completely different. It concerns the terms on which non-duplicated inputs and outputs must work.

Specifically, CAP says that if you accept inputs and/or outputs in more than one place in a system, and inputs and/or outputs at different places should influence each other, then that system must either be up, down, or its state must be inconsistent (i.e. an influence at one place has not propagated to all the other places it should). This is very closely related to basic physics; it's not unique to computing.

A consensus algorithm presumes the inputs are already in one shared place, and that the majority of outputs can get to one shared place again. A consensus algorithm won't correctly deal with any faults that undermine those two assumptions.
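The "minority of processes" tolerance above has a simple arithmetic form. As a sketch (assuming a majority-quorum scheme, which algorithms like Raft and Paxos use): a group of n nodes can still form a quorum after f failures only while n - f remains a majority.

```python
# Sketch: a majority-quorum consensus group of n nodes can still
# form a quorum after f failures as long as n - f > n // 2,
# which works out to f <= (n - 1) // 2.
def max_tolerated_failures(n):
    return (n - 1) // 2

for n in (3, 5, 7):
    print(f"{n} nodes tolerate {max_tolerated_failures(n)} failed node(s)")
```

So 3 nodes tolerate 1 failure, 5 tolerate 2, 7 tolerate 3; any fault that takes out a majority, or that breaks the shared-input/shared-output assumptions above, is outside what the algorithm can handle.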

Steve

https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/

A critical point from Brewer's follow-up article is:

high availability (A) of that data (for updates);

Hence from a CAP perspective a quorum (R+W > N) is always CP.

If you were to have N=5, R=2, W=4, it's possible that you have a 3-2 network partition; in this mode the database is effectively read-only: all the nodes are able to read the last value, but writes are not possible.

Your system is NOT available from a pure CAP perspective (because writes are not possible) however from a practical sense, it may be enough that your website stays up in read-only mode - which some people might say is "available", "fault tolerant", etc...
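The 3-2 partition scenario above can be sketched in a few lines (hypothetical side sizes; a quorum operation succeeds only if enough nodes are reachable on the client's side of the partition):

```python
# N=5, R=2, W=4 cluster split into a 3-node and a 2-node side.
# An operation succeeds only if its quorum fits on one side.
N, R, W = 5, 2, 4

def can_read(reachable):  return reachable >= R
def can_write(reachable): return reachable >= W

for side in (3, 2):       # the 3-2 network partition
    print(f"side of {side}: read={can_read(side)}, write={can_write(side)}")
```

Both sides can still assemble a read quorum, but neither side has the 4 nodes needed to write, so the whole system degrades to read-only rather than splitting into divergent halves.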

You may be able to add queues to batch process updates later, but when you take a broader view encompassing your database and the queue you now find you have an AP system, as there is no longer a single consistent "latest" value. However this may be the optimal solution for providing the best overall experience to end users - given the various potential failure modes.

In the real world you might choose a different trade-off, for example N=5, R=3, W=3: that would perform really well with respect to a hardware failure of 1 (or 2) servers, but may have more impact on end users if a network partition occurred.
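To see that trade-off concretely, here is the same 3-2 partition sketched with the alternative quorum sizes (assumed: N=5, R=3, W=3):

```python
# With R=3, W=3 a 3-2 partition behaves differently: the majority
# side keeps full read/write service, while the minority side can
# serve neither reads nor writes.
N, R, W = 5, 3, 3

def service_on(reachable_nodes):
    return {"read": reachable_nodes >= R, "write": reachable_nodes >= W}

print(service_on(3))  # majority side: reads and writes both work
print(service_on(2))  # minority side: no reads, no writes
```

Clients that happen to land on the minority side lose service entirely, which is the "more impact on end users" in a partition, even though single-server failures are handled more gracefully than with R=2, W=4.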

CAP is a tool for quickly evaluating what is possible, but at the end of the day if an asteroid takes out all life on earth, your web site doesn't need to stay up - the real world always has corner cases/loopholes you can use.

DavidT