(1) You want all your nodes to have the exact same replica of the database i.e consistency across your cluster.
You would need to reach consensus before any node actually adds anything to its local database to make sure that property (1) is fulfilled.
== Linearizability is just a consistency model i.e a variant of property 1 with stronger/weaker constraints.
Well done!
edit: to provide some context
- Raft is a distributed consensus algorithm that is seen by many as a viable alternative to *-Paxos because of its relative simplicity. It was created by D. Ongaro and J. Ousterhout (Tcl/Tk!)
- D. Ongaro's dissertation includes a TLA+ specifications of Raft; TLA+ is a model checker (see ) for a more detailed explanation of what is a model checker/theorem prover's job
- TLA+ is a model checker; Coq is a proof assistant. See https://stackoverflow.com/questions/22418448/can-coq-be-used... for a more detailed explanation.
- Verdi is a Coq framework to make formal proof about distributed systems.
- Doug Woos and James Wilcox, made a proof of Raft using Verdi; Verdi helps you figure whether your implementation of X (in this case X=Raft) meets the specifications (Leader Safety, strong consistency etc..)
Link to the Verdi website: http://verdi.uwplse.org/
edit: as noted by ahelwer, TLA+ is not a model checker but a language to describe a distributed system's specifications. I was referring to TLC which can work with TLA+.
I listed some of the assumptions we made in another comment https://news.ycombinator.com/item?id=10018985 .
The question you ask was believed for a long time to be the death knell of formal verification. It was persuasively argued in "Social Processes and Proofs of Theorems and Programs" https://www.cs.umd.edu/~gasarch/BLOGPAPERS/social.pdf that any proof of a complex system is necessarily at least as complex. Thus there would be no reason to trust the proof any more than the original code.
The breakthrough in verification came when we started using machine-checked proofs. In this approach, you write a short, simple, and trusted proof checker. You can then verify complex systems using complex proofs that are checked by the simple checker. Then the only possible errors in the proof can come from the proof checker being wrong. Because the checker is simple and general (ie, it can check all kinds of proofs, not just proofs about a particular program), it is more trustworthy.
Just to reiterate (including some content from my other comment): we can be wrong only if we have written the wrong definition of linearizability, misstated the correctness theorem, or if there is a bug in the proof checker or in Coq's logic itself.
This Coq proof is an enormous step forward. It's that machine-checked proof I was hoping someone would do when I wrote the above paragraph. 1) Assuming the TCB is correct, we know the Verdi implementation satisfies linearizability. 2) Assuming the TCB is correct and the Verdi implementation and the Raft paper/dissertation/spec are equivalent, then the Raft algorithm satisfies linearizability too.
I'm more willing to believe that the Verdi implementation implements Raft faithfully than I am willing to believe that my hand proof is correct. If you're really worried about (2), you have the option of using the Verdi generated implementation. If you're less worried about (2), you can now have more confidence in the safety of the Raft algorithm in general.
Questions for the researchers, if they are reading:
- Does this proof cover any liveness concerns (weak fairness, deadlock freedom, etc) in addition to the safety property of linearizability?
- If not, what would it take to extend this model to cover liveness? Is this even a good starting point?
As you probably know, Raft and other consensus algorithms are not guaranteed to be live in all situations. But subject to some assumptions about the frequency of failure, they are guaranteed to make progress.
In Verdi, systems are verified with respect to semantics for the network they are running on. Our semantics currently don't include any restrictions about how often failures can happen; a failure "step" can occur at any time. We're not sure what the best way is to introduce this kind of restriction, but we've got a couple ideas. One would be to guarantee that the total number of failures has some finite bound which is unknown to the system itself but which is available during verification. Another would be to model failure probabilistically. We will probably end up doing at least one of these things in the next year or so :).
This is a good question. More broadly, what do you have to trust in order to believe our claims?
In addition to all the usual things (like the soundness of Coq's logic and the correctness of its proof checker), the most important thing you need to trust is the top-level specification. In our case, this is linearizability (have a look at https://github.com/uwplse/verdi/blob/master/raft/Linearizabi... for the key definition; the key theorem statement is at https://github.com/uwplse/verdi/blob/master/raft-proofs/EndT... ). If you can convince yourself that these correspond to your intuitive understanding of linearizability, then you don't need to trust any other theorem statement or definition in the development.
If you actually want to run our code, then you need to make several other assumptions. Our system runs via extraction to OCaml, so you must trust the extractor and the OCaml compiler and runtime. In addition we have a few hundred lines of OCaml to hook up the Coq code to the real world (eg, writing to disk and putting bits on the network).
To respond more directly to your question about Diego's proof, I can tell you we referred to it often to get the high-level idea. But the TLA model differs in several respects from our implementation of Raft in Verdi. Most importantly, our code is an implementation in the sense that you can run it. This means that it resolves all nondeterminism in the specification. Furthermore, there is no need to manually check that what we implemented matches the TLA model, unless that is your preferred means for convincing yourself that we really did implement Raft.