See this tweet by @aphyr: https://twitter.com/aphyr/status/542755074380791809
(All credit for the idea in this comment is due to @aphyr)
Basically because the transactions modified keys selected from a uniform distribution, the probability of contention was extremely low. AKA this workload is basically a data-parallel problem, somewhat lessening the impressiveness of the high throughput. Would be interesting to see it with a Zipfian distribution (or even better, a Biebermark [0])
[0] - http://smalldatum.blogspot.co.il/2014/04/biebermarks.html
Equally wrong is his assumption that this has any bearing on the performance of FoundationDB. In fact FoundationDB will perform the same whether the conflict rate is high or low. This isn't to say that FoundationDB somehow cheats the laws of transaction conflicts, just that it has to do all the work in either case. There is no trick or cheat on this test--this same performance will hold on a variety of workloads with varying conflict rates as well as those including reads.
Presumably you will want to retry conflicting transactions, so you generally would not count them towards your throughput.
For example if I commit 100 transactions per second, and 90% of them return conflicts, I am only successfully committing 10 transactions per second.
Of course most real world workloads are (hopefully!) no where near 90% conflicts. If you had a 90% transaction conflict rate you could expect FoundationDB performance to drop by about 30-60% due to retries (and, worse news, you would need to rethink your architecture).
because there are no reads, any serialization order is possible and therefore there is no chance of conflict
I don't understand how this is the case. If clients A and B try to write to the same key, it can be serialized as {A,B} or {B,A}, but in either case, there is some kind of conflict... no?
Sorry, I don't like that at all.
http://en.wikipedia.org/wiki/Hertz
If anything you are counting is happening 14.4 million times a second, then that thing is happening at 14 megahertz. It's just a unit. If there are 14.4 million writes per second, then the writes are occuring at 14.4 mega hertz. That's just the definition of hertz.
Haha, Google agrees, check it out: https://www.google.com/?q=4+per+second#q=4+per+second
The reference is to frequency, not average or typical rate; there's a difference between a tone and pink noise.
But I disagree that the Bq is appropriate for this example; the becquerel quantifies radioactivity:
> The becquerel (symbol Bq) (pronounced: 'be-kə-rel) is the SI derived unit of radioactivity. One Bq is defined as the activity of a quantity of radioactive material in which one nucleus decays per second. (https://en.wikipedia.org/wiki/Becquerel)
I remember evaluating a few low latency key-value storage solutions, and one of these was Stanford's RAMCloud, which is supposed to give 4-5 microseconds reads, 15 microseconds writes, scale up to 10,000 boxes and provide data durability. https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud Seems like, that would be "Databases at 2000Mhz".
I've actually studied the code that was handling the network and it had been written pretty nicely, and as far as I know, it should work both over 10Gbe and Infiniband with similar latencies. And I'm not at all surprised, they could get pretty clean looking 4-5us latency distribution, with the code like that.
How does it compare with FoundationDB? Is it completely different technology?
"Many more issues remain, such as whether we can provide higher-level features such as secondary indexes and multiple-object transactions without sacrificing the latency or scalability of the system. We are currently exploring several of these issues."
Sounds it doesn't provide multi-key ACID transactions at the very least.
It should be compared to Memcached, not to a real storage engine.
* Automatic replication, crash recovery, and fail-over
(no loss of availability if a server fails)
* Durability guarantee: data is always replicated
and durable before operations return, without
significant performance hit (subject to the
requirement for persistent buffers on backups).Nothing stops you from replicating memcached.
We continue to use it for more and more data access patterns which require strong consistency guarantees.
We currently store ~2 terabytes of data in a 12 node FDB cluster. It's rock solid and comes out of the box with great tooling.
Excited about this release! My only regret is I didn't find it sooner :)
FoundationDB requires SSD drives, which the best we can get efficiently in our data center is ~670 GB of usable space (3x480GB raided).
12x670 = 8040 GB
We try to keep extra space available for node failures (FDB will immediately start replicating addition data if it notices data with less than the configured number of replicas).
Our dataset currently grows at a decent pace, so we over provision a bit as well.
Is it really the first Distributed DB project to have built a simulator ?
Because frankly, if that's the case, it seems revolutionary to me. Intuitively, it seems like bringing the same kind of quality improvement as unit testing did to regular software development.
PS : i should add that this talk is one of the best i've seen this year. The guy is extremely smart, passionate, and clear. (i just loved the The Hurst exponent part).
I do believe this is unique among publically-available distributed databases.
One of the links leads to an interesting C++ actor preprocessor called 'Flow'. In that table, it lists the performance result of sending a message around a ring for a certain number of processes and a certain number of messages, in which Flow appears to be fastest with 0.075 sec in the case of N=1000 and M=1000, compared with, e.g. erlang @ 1.09 seconds.
My curiosity was piqued, so I threw together a quick microbenchmark in erlang. On a moderately loaded 2013 macbook air (2-core i7) and erlang 17.1, with 1000 iterations of M=1000 and N=1000, it averaged 34 microseconds per run, which compares pretty favorably with Flow's claimed 75000 microseconds. The Flow paper appears to maybe be from 2010, so it would be interesting to know how it's doing in 2014.
I wish I was an investor in them.
[1] http://www.statisticbrain.com/wal-mart-company-statistics/
32 c3.2xlarge instances have 1920GB memory. Given 1 billion 16B+ 8..100B values the whole dataset fits just into memory.
The Cassandra test mentioned [1] sustained loss of 1/3 instances. That's very impressive! Would love to see how F-DB handles this type of real-life situation (hint hint for follow up blog post).
[1] http://googlecloudplatform.blogspot.cz/2014/03/cassandra-hit...
[Edit, more info] They seem to run with multiple clients which also stresses the system. From their explanation * The clients simulate 320,000 concurrent sessions
The best source for DB benchmarking I know of is http://www.tpc.org/. The methodology is more complicated there, but the top results are around 8 million transactions per minute on $5 million systems. This FoundationDB result is more like 900 million transactions per minute on a system that costs $1.5 million a year to rent (so, approx $5 million to buy?).
The USD/transactions-per-minute metric is clear, but without a standard test suite (schema, queries, client count, etc.), comparing claims of database performance makes my head hurt.
In the NoSQL world many people have converged on a workload of 90% reads/10% writes to individual keys. We show 90/10 results on our performance page [1] but in this test we do 100% writes to stress the "transaction engine", which processes writes.
Since we have our SQL Layer [2] as well, we will run some more-comparable SQL tests in the future.
Nice performance page. It pre-answers my follow up question, which was how linear is scaling with more cores? Looks solid.
However I think there's still plenty of room to grow.
320,000 concurrent sessions isn't that much by modern standards. You can get 12 million concurrent connections on one linux machine, and push 1gigabit of data.
Also, 167 megabytes per second (116B * 14.4 million) is not pushing the limits of what one machine can do. I've been able to process 680 megabytes per second of data into a custom video database, plus write it to disk on one 2010 machine. That's doing heavy processing at the same time on the video with plenty of CPU to spare.
PCIe over fibre can do many transactions messages per second. You can fit 2TB memory machines in 1U (and more).
Since this is a memory + eventually dump to disk database, I think there is still a lot of room to grow.