Application-Level Consensus [pdf]

(weareadaptive.com)

53 points · hugothefrog · 9y ago · 8 comments

7 comments · 4 top-level

ergl · 9y ago · 2 in thread

Jane Street uses the same approach to build their exchange [0]. Like the doc says, being able to replay a sequence of messages in dev is great for reproducing issues, and the same log gives the system fault tolerance.

One downside is that, if all your nodes are using the same application code, simply replaying the log might not help as all nodes might hit exactly the same bug with the same sequence of transitions.

[0] There's an overview of their infrastructure here: https://youtube.com/watch?v=b1e4t2k2KJY
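To make the replay point concrete, here is a minimal Python sketch of a deterministic state machine fed from an ordered message log. It is not code from the article or from Jane Street; `OrderBook`, `replay`, and the message tuples are hypothetical names chosen only for illustration. It shows both sides of the comment above: every replica that applies the same log ends up in the same state, and a message that trips a bug trips it on every replica.

```python
from dataclasses import dataclass, field


@dataclass
class OrderBook:
    """Toy deterministic state machine: state depends only on the input log."""
    orders: dict = field(default_factory=dict)  # order_id -> (price, qty)

    def apply(self, msg):
        kind, order_id, *rest = msg
        if kind == "add":
            price, qty = rest
            self.orders[order_id] = (price, qty)
        elif kind == "cancel":
            self.orders.pop(order_id, None)
        else:
            # Stand-in for an application bug: every replica fails the same way.
            raise ValueError(f"unknown message kind: {kind}")


def replay(log):
    """Rebuild state from scratch by applying the log in order."""
    book = OrderBook()
    for msg in log:
        book.apply(msg)
    return book


log = [("add", 1, 100, 5), ("add", 2, 101, 3), ("cancel", 1)]

# Two "nodes" replaying the same log converge on the same state.
node_a = replay(log)
node_b = replay(log)
assert node_a == node_b
```

Appending a message the code cannot handle, for example `("amend", 2, 99)`, makes `replay` raise on every node, which is the downside described above: consensus on the log replicates the inputs, bugs included.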

odeheurles · 9y ago

Thanks for sharing the video, great talk btw. Brian, the speaker, actually asks the audience (around minute 20 in the video) whether anybody uses Paxos for the matching engine. What I'm talking about in the article is exactly that: we're just using another consensus algorithm (Raft), which is significantly simpler to implement than Paxos.

LMAX use synchronous replication in their exchange: https://www.infoq.com/presentations/LMAX

sourcedelica · 9y ago

What kind of latency does the consensus add? We are looking at adding fault tolerance to our matching engine but can only afford 10-15 micros.

gawi · 9y ago · 1 in thread

This is very interesting. I have no doubt that not having to deal with fault tolerance at the application level compensates for the effort of putting this architecture in place. And yes, in my opinion, "application-level consensus" is the perfect term to designate this architecture.

mr_luc · 9y ago

I agree. One place where application-level consensus is fairly common is in Elixir applications, mostly thanks to the CRDT implementation that's nicely wrapped up by Phoenix.Tracker in the phoenix_pubsub library.

This is used by the Phoenix project's Presence module to provide a distributed notion of which users are 'present', but it's also used by others to do service location using hash rings, or to implement a DHT, etc. I've used it for master election and failover on a few projects for little services.
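For readers unfamiliar with the hash-ring idea mentioned above, here is a minimal consistent-hashing sketch in Python. It is not Phoenix.Tracker's API and not Elixir; `HashRing`, `lookup`, the `vnodes` parameter, and the node names are all hypothetical. The assumption is that some membership layer (a tracker or presence service, for instance) supplies the current list of live nodes, and the ring maps a key to one of them.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each node owns several points on a 64-bit circle."""

    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("ring needs at least one node")
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an unsigned integer.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._owners[idx % len(self._owners)]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")  # one of the three node names
```

When a node drops out of the membership list, rebuilding the ring without it only reassigns the keys that node owned; every other key keeps its owner, which is what makes the structure useful for service location and failover.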

silviatorres · 9y ago

Hi there!! Looks like there were some minor mistakes in the text, and the document has been updated: http://weareadaptive.com/wp-content/uploads/2017/04/Applicat...

eternalban · 9y ago

Try 'Edge-Coherence'.