Paxos is used to achieve something called Strong Consistency, where each node sees the same message in the same order. If you think of each node as a deterministic state machine, they are guaranteed to end up in the same state after responding to the same sequence of messages. It's nice and intuitive, but requiring global synchronization on every write is terrible for performance.
Other consistency schemes exist. A popular one is Eventual Consistency, where writes are made immediately at any node (not just the leader) and the system is expected to synchronize in the background and "converge" to the same state. However, this can result in merge conflicts: if you're editing a document in collaboration with other users, what if you edit a word in a paragraph while another user deletes that entire paragraph? Does the system resolve this automatically, or require user assistance? The answer to this question varies according to system requirements. I think most HN users have experienced the joys of resolving merge conflicts.
A newer model is something called Strong Eventual Consistency, which is similar to Eventual Consistency but merge conflicts are impossible by design: every update to the system must be commutative, associative, and idempotent with other updates. It is not always possible to design your system this way. These systems are implemented with Conflict-Free Replicated Data Types (or ad-hoc equivalents) and have excellent liveness/throughput/performance characteristics compared to Strong Consistency.
CRDTs are not as simple as Paxos. You're forced out of the cozy one-system world and your system must deal with two nodes concurrently holding different values. For most applications, magic Paxos dust is all you need. For others, CRDTs are an excellent tool.
In general, I like to think of Paxos as an approach that uses LWW for all value types.
[0] - http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-695...
[0] https://hal.inria.fr/file/index/docid/609399/filename/RR-768...
Quick question though. On this slide ( http://i.imgur.com/m02CMxx.png ) it shows a network split condition and shows how the 2 split networks will eventually negotiate and the 3 node split wins because it had a majority while the 2 node side's uncommitted changes are thrown out.
What happens if the split happens right down the middle (3 active nodes on each side instead of the 2 and 3)? Wouldn't both sides elect leaders that both have majorities with committed data?
Is there a connection?
Also, does it have anything to do with the Byzantine Generals Problem?
I don't think this is true, at least not as stated. You should certainly be thinking about more than crash-stop failures, but full-blown Byzantine fault tolerance is rarely warranted in practice. Empirically, the number of systems that use non-Byzantine vs. Byzantine agreement is probably on the order of 100:1.
But Paxos has gentler assumptions about adversaries than in the Byzantine Generals problem.
Also keep in mind that Raft, Zab, and View-Stamped replication (in reverse chronological order) are alternatives to the Synod protocol in Paxos. These protocols differ from Paxos by employing a different leader-election mechanism and slightly different way of maintaining their invariants.
There have been many Paxos variants. This site [1] shows the various Paxos variants over a timeline and points out their contributions.
Those of you interested in building replicated state machines using Paxos should take a look at OpenReplica [2]. It is a full Multi-Paxos implementation that takes any Python object and makes it distributed and fault-tolerant, like an RPC package on steroids.
> Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems. We hope Raft will make consensus available to a wider audience, and that this wider audience will be able to develop a variety of higher quality consensus-based systems than are available today.
They are moving the core problem into a different domain. Worst explanation of PAXOS ever... nice animations though.
Edit: 'Worst explanation' is just an exaggeration, obv. It is nice, but doesn't explain really important issues.
I'm not sure about Spanner and Paxos. Sebastian Kanthak said during his Google Spanner talk:
"If you've been to the Raft talk this morning, our Paxos implementation is actually closer to the Raft algorithm than to what you'd read in the Paxos paper, which is… if you haven't read it, don't read it, it's horrible." (at 7:43)
http://www.infoq.com/presentations/spanner-distributed-googl...
How do you keep a broken or hostile node from advancing the sequence number to the end of the sequence number space?
There's an algorithm from one of Butler Lamson's grad students at MIT which fixes this, but it seems to require one more message per cycle. (http://pmg.csail.mit.edu/~castro/thesis.pdf) That paper later appears as a Microsoft Research paper on how to make an NFS-like file system with this consensus properly. Did Microsoft ever put that in a product?
lol