1. You cannot beat the speed of light.
2. Machines break. Even the most reliable ones.
3. Networks are unreliable. Even local area networks.
4. It is an exciting time for distributed systems: CRDTs, Hybrid Logical Clocks, Zookeeper, etc.
These are things I feel like I've been preaching for a while, but I keep getting upset responses like "Well, certainly Globally Consistent systems work, most databases are!" Lies, as @rdtsc mentions:
"Keeping an always consistent state in a large distributed [system] you are fighting against the laws of physics."
Next up: this is why master-master replication is important, because at some point your primary will go down (or at least the network to it). I started an interesting thought experiment that turned into a full-on open source project: what if we were to build a database in the worst possible environment (the browser, aka JavaScript and unreliability)? What algorithms would we need to use for such a system to still survive and work?
This is why I chose and worked on solutions involving Hybrid Logical Clocks and CRDTs, which are at the core of my http://github.com/amark/gun database. An AP system with eventual consistency and no notion of "now", as every replica runs in its own special-relativity "state machine" view of the world.
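For anyone curious what a Hybrid Logical Clock actually looks like, here is a minimal Python sketch of the textbook idea (not gun's actual implementation, just an illustration): wall-clock time paired with a logical counter, so timestamps stay causally ordered even when physical clocks drift.

```python
import time

class HybridLogicalClock:
    """Minimal Hybrid Logical Clock sketch: pairs the largest
    wall-clock time seen so far with a logical counter used as a
    tie-breaker, so events remain causally ordered across replicas."""

    def __init__(self):
        self.wall = 0     # largest wall-clock time (ms) observed so far
        self.logical = 0  # counter for events within the same wall tick

    def _now_ms(self):
        return int(time.time() * 1000)

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        wall = max(self.wall, self._now_ms())
        # same wall tick: bump the counter; new tick: reset it
        self.logical = self.logical + 1 if wall == self.wall else 0
        self.wall = wall
        return (self.wall, self.logical)

    def update(self, remote_wall, remote_logical):
        """Merge a timestamp received from another replica."""
        wall = max(self.wall, remote_wall, self._now_ms())
        if wall == self.wall and wall == remote_wall:
            self.logical = max(self.logical, remote_logical) + 1
        elif wall == self.wall:
            self.logical += 1
        elif wall == remote_wall:
            self.logical = remote_logical + 1
        else:
            self.logical = 0
        self.wall = wall
        return (self.wall, self.logical)
```

The tuples compare lexicographically, so a later event always gets a strictly larger timestamp, and timestamps never run backwards even if the local clock does.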
These are all interesting concepts, and the article was a good one. I recommend it.
Like the article mentions, Google did it with their F1/Spanner SQL database. But that also means GPS receivers with antennas on the roofs of data centers.
Which is yet another thing that can fail and, either by itself or in a cascade with other failures, will lead to unspecified and possibly undesirable behavior.
Recently I've seen a lot of advocates for dropping NoSQL databases and moving back to Postgres or other SQL databases.
The problem is that SQL and schemas are not the only reason NoSQL databases became popular; they also became popular because they started out with a default, better-defined behavior with respect to replication and distribution.
Most solutions don't need that and sticking with a solid single database works very well. But those that need distributed operation have a pretty hard task ahead of them.
One heuristic you can look at is whether and how distribution is implemented. If it is something bolted on top (like a proxy or addon you download) or added as an afterthought, be very careful. Those things should be baked into the core. For example, does it support CRDTs? Does it have master-to-master replication, and so on? If it claims consistency instead of availability, figure out how that is implemented: Paxos, Raft, or something else.
So far I think Riak is probably the database that has thought the hardest about this and done the most reasonable job. The other, simpler database is CouchDB: it has well-specified conflict-resolution behavior and master-to-master replication, but you usually have to write your own cluster topology. There are probably others, but those are the two I know of first hand.
What do you even mean by a consistent state? Even in theory, a person initiating a new record in Auckland, New Zealand at the same time somebody initiates a change in Gibraltar or London (which are antipodal to the former[1]), 66 milliseconds away at light speed, cannot have a confirmation in less than 132 milliseconds, right? So do you just wait for that before declaring 'consistency'? Do you literally add 132 milliseconds to each and every request? (And this is assuming you have a damned good solution to the two generals problem.) I mean, suppose the database tracks something as simple as the number of web page hits. It's a counter. You now distribute it, and have a stochastic process of counter hits, between 0 and 5 per second, in your largest cities throughout the world. How can that database ever be consistent?
If there are ten new records per second in New Zealand and ten records per second in the UK, and they potentially depend on each other in some way, are you going to just make everyone wait until everything has been committed and confirmed to be consistent? Or is "a foolish consistency the hobgoblin of little minds", and you really can accept out-of-date data and deal with merging conflicts later?
I just don't understand why we would expect consistency to rank up there, when we deal with a worldwide real-time system where the difference between being served by a local database in 40 milliseconds and one far away in 250 milliseconds is both staggering and incredibly noticeable. Why be consistent? What is consistency?
Yep, you have to add all the delay until you get confirmation from the members of the cluster that the write was written. And you have to have linearizability, so that anyone reading after that (and one can argue what 'that' is and where 'that' is) will now see the new state. If one of the members has failed, you could potentially be stuck waiting forever. You also have to define how cluster membership and connectivity are determined, and what the possible states and transitions are during membership changes, coupled with network partitions, coupled with hardware failures.
In other words you are fighting against the laws of physics. It is expensive and hard to do.
In the case of the counters, one should ask whether it is worth it, or whether a CRDT-based counter (which will eventually converge) is good enough.
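For reference, a grow-only CRDT counter is tiny; here is a minimal Python sketch of the idea (a G-Counter, with made-up names):

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own
    slot, and merging takes the per-replica maximum, so all replicas
    converge to the same total regardless of message order,
    duplication, or delay."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> that replica's increment total

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # the counter's value is the sum over all replicas
        return sum(self.counts.values())

    def merge(self, other):
        # element-wise max makes merge commutative, associative, idempotent
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)
```

Two replicas can take writes concurrently, exchange state in any order, and still agree on the total once they have merged, with no coordination at write time.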
Even banks are eventually consistent. They choose to be available first. So you can withdraw $100 in New Zealand and then $100 in New York within a short period of time, even if you only have $100 in your account. The inconsistency is handled later, when you get a letter saying your account is overdrawn.
The short version is that you can expose the different read/write modes to the client, and let them decide on what works best for their use case.
For spanner, table 2 in the paper[1] highlights the different interaction modes (read/write transaction, read wait for pending transactions, read skip pending transactions, read at some past timestamp). Each operation gives you a "consistent" view of the data, where "consistent" means that you don't see partial transactions.
It turns out that, for most high-qps web applications, you want a recent consistent view of the data, but you don't care about pending transactions. That means you can read at "now", or at "now - 1 second" without any latency penalty.
Clients doing read/write will care about pending transactions, but you can be opportunistic if the transactions do not overlap. Settling transactions does require quorum synchronization, so your floor is the latency to the slowest member of the fastest majority (the median-plus-one latency). These databases are usually keyed by something that minimizes transaction overlap (e.g. you can shard a counter, that kind of thing).
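To illustrate the sharded-counter trick (this is a generic sketch, not Spanner's actual mechanism): split the counter into N independent slots so concurrent increments rarely touch the same row or quorum.

```python
import random

class ShardedCounter:
    """Counter split across N independent shards so concurrent
    increments land on different rows and rarely contend; reads sum
    all shards, trading slightly costlier reads for cheap writes."""

    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def increment(self, n=1):
        # each increment touches one randomly chosen shard only
        self.shards[random.randrange(len(self.shards))] += n

    def value(self):
        # a read aggregates every shard
        return sum(self.shards)
```

In a real database each shard would be a separate row or key, and `increment` would be a single-row transaction that almost never overlaps with another client's.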
Regional migration of quorums (e.g. a counter for Asia, a counter for Europe) can also be done. So the counter for "Asia" lives in a "segment" (not the actual terminology, but we'll go with it) whose quorum consists mostly of distinct Asia regions. Metadata for the segment (which everything reads at startup, and which is kept up to date) tells you which servers are responsible.
[1] http://static.googleusercontent.com/media/research.google.co...
In general I think consistency is overvalued. There are plenty of cases where it is important, but lots of people are brainwashed in college into thinking that all data must be consistent all the time, and that's just not necessary.
You will only have to wait for the committed/aborted response, which cannot arrive faster than 2*66 = 132ms (and this system can come arbitrarily close to that by increasing the heartbeat frequency).
There is no need to wait before running a subsequent transaction, though. Confirmations will flow with a 132ms delay, but there is no limit on transaction concurrency.
This is just from a vocal minority on HN. You just need to look at the facts.
Companies like Mongo, Datastax, Aerospike, etc. are growing bigger by the day, with increasingly higher valuations. Old school database companies like Teradata are now all about data lakes incorporating Hadoop and Mongo. And technologies like Spark and Impala are now on the front line for a lot of data analytics and processing work.
In the enterprise, at least, SQL databases are increasingly being relegated to a small part of the whole data pipeline, i.e. storing the consolidated, integrated data model.
Right tool for the job and all that stuff.
I made download and print versions of this: http://blog.jgc.org/2012/10/a-downloadable-nanosecond.html
Is there a paper somewhere that new folk should read first? One that includes:
- A tutorial to describe all of the things we think of as 'time', e.g. order of sequence, etc. and their dependence on each other.
- The idea that time as it occurs in the physical world is probabilistic - requiring such descriptors as precision (what is the smallest difference we can discern), and error bounds or probability distributions (how accurately can we describe it).
- And for the concrete thinkers who 'get' that true simultaneity is impossible, an easy to understand example of how we succeed in observing logical coherence, from the scale of a single CPU chip (internally non-coherent, externally consistent) to cross-continent compute clusters?
This is a mission critical safety system. There are some standards which are unhappy with dynamically allocating memory in that situation, let alone dynamically allocating the entire compute resource.
I'm currently working on a multiplayer game design on these principles.
Nope. Most designers pick a particular mechanic and try to make that exact mechanic work over the network. If you abandon the particular mechanic and aim for meta-goals instead (one of which might be: no jitter, no resyncs, and no visible alternate timelines), then you can eliminate everything you just mentioned above.
The "hello world" is a game where players drop shapes into a world and they automatically sync / appear on other devices.
Sorry about the plug -- great article Justin! I sent it to my team because I think it does a great job stepping back from the buzzwords and actually talking about the problems distributed databases have to confront.
great job stepping back from the buzzwords
Those pesky buzzwords. You were just caught out being misled by one. (sync -- which typically means what you thought it did, but also has a generalized meaning)
Layman question: I wonder how important this result really is. It is an impossibility result in a certain model, where processes are deterministic. It's certainly a nice theoretical result, but in practice there are probabilistic algorithms that solve this problem.
I don't know what the probabilistic bounds of probabilistic consensus algorithms are, but if the failure probability can be made arbitrarily low, isn't the impossibility result for deterministic processes irrelevant?
After all, if we can live with a super low probability of a meteorite destroying the planet, we can live with a good probabilistic consensus algorithm.
Simultaneity isn't just for machines -- it is necessary for people being connected together online as well. Time sync is a huge part of creating a shared experience, and this will become more widely appreciated as virtual reality develops socially.
We solved this in a very limited way at rapt.fm for our timed rap battles. We maintained a shared clock tick (with adjustment), allowing UI and in-game events -- e.g. the beat kicking in -- to happen somewhat simultaneously across browsers. This helped make up for the latency of video, and created a feeling that people were together at the same "place".
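The core of a shared clock tick like that is just an NTP-style offset estimate; here is a rough Python sketch of the idea (my own names, not rapt.fm's code): ping the server, assume symmetric latency, and map the server's clock reading to the midpoint of the round trip.

```python
import time

class SharedClock:
    """Client-side clock that tracks a server's timeline by estimating
    the offset from one ping: the server's clock reading is assumed to
    correspond to the midpoint of the round trip."""

    def __init__(self):
        self.offset_ms = 0.0

    def sync(self, send_ms, server_ms, recv_ms):
        """send_ms/recv_ms are local timestamps around the ping;
        server_ms is the server's clock reading inside it."""
        midpoint = (send_ms + recv_ms) / 2
        self.offset_ms = server_ms - midpoint

    def now_ms(self):
        # local wall clock corrected onto the server's timeline
        return time.time() * 1000 + self.offset_ms

    def ticks_since(self, epoch_ms, beat_ms):
        # which beat of the shared track we are on right now
        return int((self.now_ms() - epoch_ms) // beat_ms)
```

With every client computing `ticks_since` against the same agreed epoch, an event like "the beat kicks in on tick 64" fires at roughly the same real moment in every browser, to within the error of the latency-symmetry assumption.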
1. http://research.microsoft.com/en-us/people/mickens/thenightw...
Sometimes the hardest part is getting them to understand exactly how much of what they take for granted is an illusion... Often even before any packets leave their machine.
That's still more than enough time to require algorithmic trickery from games in order to provide the illusion of "real-time" gaming over the internet.
>..., this is really a "many writes, chosen unpredictably, will be lost" policy—but that wouldn't sell as many databases, would it?
The idea that there is no 'now' is, of course, preposterous: we have very strict laws of physics supporting the concept of now (sans relativity), and eventually our engineering will be able to track it very accurately. Spanner is a step on that journey.
These kinds of 'impossible' articles will appear very dated in 10 years' time, as they really over-exaggerate the rules of thumb of the previous 10 years.
Laws of physics "sans relativity" aren't the actual laws of physics in our universe. There very much is no now, except the now that is also here. It's quite accurate to say that simultaneity does not exist in distributed systems, and simultaneity becomes less valid even as an approximation the more widely distributed a system is.
[1] http://chimera.labs.oreilly.com/books/1230000000545/ch01.htm...
[2] http://www.eecs.berkeley.edu/~rcs/research/interactive_laten...
If 3D chip designs let us take a couple racks of hardware into a few centimeters, then perhaps there will be a more Newtonian "now" possible, but it seems that once we can do that we'll just want to build even bigger systems that can analyze more data, and get back to distributed systems again.
Relativity means that observers with different velocities view timings differently.
Mere distances do not matter.
Pick a reference frame for your protocol, and relativity stops being a problem. (Hint: If all endpoints are on Earth you probably won't have enough precision to even need to compensate for relativistic effects.)
Hardware is unreliable. Software is possibly less reliable. We have known that for a long time. The author talks on a conceptual level about logical time, but this concept isn't enough to understand the real challenges & possible solutions of keeping interactions in your system logically ordered in the dimension of time[0].
> You can think of coordination as providing a logical surrogate for "now." When used in that way, however, these protocols have a cost, resulting from something they all fundamentally have in common: constant communication. For example, if you coordinate an ordering for all of the things that happen in your distributed system, then at best you are able to provide a response latency no less than the round-trip time (two sequential message deliveries) inside that system.
Consensus protocols don't provide a logical surrogate for 'now', a log does that. The silver bullet for assuring that your transactions are ordered correctly is immutability[1]--"If two identical, deterministic processes begin in the same state and get the same inputs in the same order, they will produce the same output and end in the same state.[0]" It's important, from the perspective of the implementor, to understand that there are multiple pieces to this puzzle, and that each protocol has very specific details that can make or break the reliability and performance of a distributed system. This is similar to how a small bug in your cryptography code can expose the entire system to threat. Paxos itself can be implemented in a myriad of ways, and each decision the implementor makes must be well researched.
[0]http://engineering.linkedin.com/distributed-systems/log-what...
[1]http://basho.com/clocks-are-bad-or-welcome-to-distributed-sy...
Vector clocks and version vectors are variations on the Lamport clock concept (and that quote is from a paragraph that mentions Paxos, another Lamport invention, cited in the bibliography).
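Since Lamport clocks keep coming up in this thread, the whole idea fits in a few lines; a minimal Python sketch:

```python
class LamportClock:
    """Lamport's logical clock: a counter that only moves forward and
    is bumped past any timestamp seen on an incoming message, so if
    event A causally precedes event B, A's timestamp is smaller."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event, or sending a message (attach the result)."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Receiving a message stamped msg_time by the sender."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Vector clocks extend this by keeping one such counter per replica, which is what lets them also detect that two events were concurrent rather than ordered.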