There are strongly consistent scalable "NoSQL" systems e.g., BigTable. Megastore even provides complex distributed cross row transactions.
In a well tuned system, loss of availability (in a failure scenario) could be minimize to seconds. What this means for performance is that you have systems which encounter second-long latency spikes upon failures.
It also isn't a binary switch:
1) Quorums can be used to achieve read-your-write consistency in the case of simple failures (loss of 1 node out of 3).
2) There's multiple kinds of relaxed consistency models. One of them is serializable consistency: you may get a stale read, but the the order of reads is the same as the order of writes. This is used by PNUTs and can be achieved by serializing the writes through a single master. This means that there's (again, for a short time period) loss of write availability in a failure, but there's no loss of read availability.
3) Paxos/multi-Paxos can be used to achieve atomic writes (all available nodes receive the write) while withstanding simple failures (similar to quorum protocols... and I believe multi-Paxos uses quorum protocols under the cover to improve liveness over "raw" Paxos). This is at the cost of higher latency (and complexity). [Edit: In this case, you're dealing with full-blown strong consistency, but with -- at the cost of latency -- the ability to tolerate certain kinds of simple failures/trivial partitions]
Clustrix looks interesting, but it looks like it addresses the scalability and performance issues with RDBMS, but not the availability feature. If an RDBMS were to drop the "A" and "I" in ACID (C in "ACID" means serializable view of the execution, which is not the same as the C in "CAP": the latter means all nodes in a cluster agreeing upon what the data is which isn't required for the former), it would be possible to build a highly available, low latency RDBMS; but it would also not be as useful without atomic and isolated cross row transactions.
[Disclaimer: I work on a Dynamo-style database, but fond of PNUTs as an architecture (more difficult to implement, but IMO a better fit for plurality of web applications) and generally fascinated and interested in distributed systems, databases and systems programming in general]
In an ACID system, that means my transactions either happen or don't, and it's on me to figure out whether they did or not. Sometimes I'll need to do some investigating after a failure to find out what succeeded and what rolled back.
In an eventually consistent system, I need a contingency plan for every operation on the system. Now that contingency plan might be very simple, such as "Whatever happened, happened and last writer wins." The CALM conjecture paper does a decent job of describing when this might be a workable contingency plan. More complicated operations require increasingly complex contingency plans. This is why Cassandra is now offering counters as an API feature, because something as simple as counters is really hard to get right in all failure scenarios.
If you fit the CALM description and require multi-data center availability, then EC is probably a good bet. It may still be a good bet otherwise, but the availability comes along with a much more difficult app development picture, failures or no.
This is specific to Cassandra, where the development team chose not to implement optimistic locking via version vectors. That said the counters patch essentially implements a vector of counters, very similar conceptually to a vector clock.
That said, there are some scenarios where quorums (in absence of an agreed upon view of the cluster e.g., a zookeeper-based failure detectors) could mean concurrent vector clocks, which means it's up to the application to reconcile it.
If your application can not reconcile a split brain scenario on its own and depends on a total order then yes, it's not a good fit for EC.
As I've said, I also think that serializable consistency is easier for applications to deal with (they can assume a total order) than eventual consistency but provides the strong benefit of read availability (something, quite frankly, most Internet applications require).
Most complex applications have points where consistency can be relaxed (either somewhat or fully) and points where consistency is needed.
In "we chose consistency" and "we chose availability" cases, the applications will have to deal with the consequences of those choices: much as developers may handwave the implications of an eventually consistent system (since it returns right results most of the time, not thinking whether or not their application is sensitive to a non-serializable order) they might also implement ad-hoc, poorly thought out HA solutions on top of strong consistent systems -- often times losing both availability and consistency.
The typical usage scenario of masking MySQL failures and latency by using memcache comes to mind. VoltDB is very correct to remove the need to use memcached for read latency (a caching layer in front of a database which already a cache which sits on top of a file system with a cache is a caching layer too many), but in many cases I've seen memcached used to accept reads (and sometimes writes) while the underlying database is unavailable.
Finally, the whole debate about atomicity is quite meaningless if (like many applications) you're fronting your data access layer with an RPC framework (your RPC call may go through to a database and perform a write even it times out to the application i.e., implying non-atomicity).
The CALM paper is a big move into the right direction: being able to determine synchronization points at which ACID transactions (or weaker forms of synchronizations if permissible e.g., quorums and version vectors) can be used. Incidentally, I also think a declarative language (whether like Bloom/Bud or closer to SQL) is a good way to express the constraints and let a system (which may not be all that different from a query planner) determine whether those points occur.
http://perspectives.mvdirona.com/2010/02/24/ILoveEventualCon...
Note that his "right answer" for performance (not availability, so it's a bit of an aside from your comment) is to dynamically repartition, which is what Clustrix does. Note that we can actually do this type of repartitioning on just the "hot" data, using MVCC to avoid any downtime (unlike Mongo, which blocks all writes).
By the way, the Google Megastore project I believe you're referring to is not actually eventually consistent in the same way that, say, Cassandra is. Megastore is fully ACID within an entity group, meaning that lowering eventually consistency down to the per-node level was not the solution that Google went with. BigTable is also not eventually consistent.
*Clustrix employee
> Note that his "right answer" for performance (not availability, so it's a bit of an aside from your comment) is to dynamically repartition, which is what Clustrix does. Note that we can actually do this type of repartitioning on just the "hot" data, using MVCC to avoid any downtime (unlike Mongo, which blocks all writes).
Dynamic repartitioning is also what HBase/BigTable do ("tablet splitting"). The MVCC approach is interesting way around the unavailability window, by the way.
> By the way, the Google Megastore project I believe you're referring to is not actually eventually consistent in the same way that, say, Cassandra is. Megastore is fully ACID within an entity group, meaning that lowering eventually consistency down to the per-node level was not the solution that Google went with. BigTable is also not eventually consistent.
Yes, I did not mean that Megastore is eventually consistent. It is not, it's ACID compliant and uses Paxos. Real question: was this not stated clearly in my comment (I'd like to edit to clarify if it is)?
Megastore, however, is an example of a "NoSQL" system and I meant to use it as proof that not all "NoSQL" systems give up consistency: ACID is orthogonal to the query language.
Quite frankly I find the whole query language debate (and "NoSQL" marketing gimmick) to be quite pointless, the distributed systems and systems programming aspects are a lot more interesting to me.
1. A DBMS is much more than just the interface. I'm going to write more on the subject. Whether you're using SQL, datalog, BSON, etc. -- there's a broader set of desirable features that's more important than the query language.
2. I'm not saying I agree with what the NoSQL folks are saying, or their justifications. Let's be honest: there's a prevalent sentiment that relational SQL based databases do not scale, and even further, that they somehow cannot scale. That's just not true.
I saw a video the other day of some guy at Google giving a talk about the database behind the app engine. At one point someone in the audience asked about scale and SQL. His response was "Well, how well does SQL scale?" Everyone in the room laughed.
Sell your strength, don't attack others weakness. Demonstrate how your product is actually very similar to megastore (strong consistency, use of distributed transactions) and differs only in the query language ("GQL" itself is very similar to SQL to the point where an application developer could care less about which one is using). It looks like this is case the case (except I believe you guys are using 2PC instead of Paxos, in which case you want to clearly state the availability scenarios which Paxos addresses and at what costs).
Of course the problem with systems like megastore is that the overhead they introduce is seldom worth the compromises the application developers have already made. Incidentally, form what I hear, Megastore is not used very frequently outide of the app engine: application developers at Google itself are comfortable with using Spanner/BigTable directly but aren't comfortable with the additional overhead of distributed transactions as used by Megastore. App engine, customers, on the other hand aren't particularly interested in low a in the first place but aren't used to design patterns used on top of systems like BigTable.
By the way, I think Clustrix is a great product. I remember reading the white papers/specs when it came out and I'll make sure to scan it over again. Just why in the (imo, useless) SQL/NoSQL marketing circus rather than focus on the product's strength?
I personally think MongoDB is a poor comparison choice: it has the buzz, but there are some serious flaws in its distribution architecture (namely ill defined consistency and partitioning model) and the outdated disk persistence story (Mendel Rosenblum wrote his dissertation in 1992; building a log structured database is no less risky than cloning circa-1995 MyISAM). Comparison with another strongly consistent system like HBase or HyperTable would be more fit.
http://sergeitsar.blogspot.com/2011/02/mongodb-vs-clustrix-c...
(nb - I also work at Clustrix)
But, companies like Twitter and Netflix certainly have the budget for Exadata (which is what clustrix wants to be when it grows up); they are using Cassandra instead not just because it scales on commodity hardware (the other price factor besides licensing) but also because it works across multiple datacenters which two-phase commit systems can't do no matter how big your budget is, and because its availability (failure tolerance) model is much more robust.
(Many NoSQL systems don't provide these advantages either, which is why lumping all non-relational systems together is usually not helpful.)
http://gigaom.com/cloud/clustrix-lifts-the-curtain-on-early-...
And plenty of folks use MySQL (and PogreSQL to a much lesser extent). You just can't scale those.
Your benchmark is pretty much irrelevant because Clustrix sells hardware appliances, not the software standalone.
Edit: That's not to say MongoDB didn't fail your benchmark. But the issues you highlight (single mutex) are known. Test against a real database and your results don't look that impressive for a 10 node setup.
All that said, Clustrix looks great. Just make it available sans-hardware.
Edit: twitter cassandra link (don't know if this is the latest): http://engineering.twitter.com/2010/07/cassandra-at-twitter-...
I'd be more curious to see how Clustrix performs against Cassandra, Riak or HBase in their respective domains. Those seem to be the more serious contenders when it comes to "Big Data".
I'm excited by some of the underlying technology in Mongo and its ilk (gossiping protocols, etc.), but there's no doubt that they require different programming techniques than traditional RDBMS. Bit like apples and oranges, isn't it?
EDIT: I noticed you changed RDBMS to "big data". I'm still curious if you have any pointers to fair benchmarks though.
I was just trying to point out that MongoDB is too easy a target here. The problems it has under high load are fairly well-known, at least to anyone who tried to benchmark it outside of their MacBooks. Just bulk-load a couple million records and watch it tip over if you don't believe me - I'm not making it up and neither is Sergei.
However if Clustrix wants to impress with benchmarks then they should pick an equal opponent. MongoDB is not exactly relevant for companies that consider the calibre (and cost!) of Clustrix.
//ps. i am a huge nosql fan dont downvote! :)
Now, where are the docs? Again, Mongo has great docs http://wiki.mongodb.org/display/DOCS/Home
How can I verify your claims?
Oh that's right, you call a salesperson first....
Unfortunately, they've got a terrible marketing problem.
Before 1998 or so, a relational database was an expensive product that you got from a vendor like Oracle. Since then, a generation of people have grown up that think about using a commercial RDBMS the same way most of us think about putting our hands in a toilet.
@va_coder hits the nail right on the head, it's not just the cost of the product, it's the cost of the buying process.
If I want to trial a product that is open source or has an OS or free edition (that could be mysql, mongodb, postgres or even Virtoso OpenLink or SQL Server Express) I can download it, read the docs and play around with it and learn a lot in a few hours. I might learn that the product is not for me, or I might get a positive impression and feel ready to commit coding time to it.
If I want to trial Oracle or Clustrix, well, I'm going to have to start a contact with a sales organization and then they need to pre-qualify me, and then they need to qualify me and then I'll spend a few hours on the phone talking to people (which might take a few weeks in wall-clock time.)
Even if they give me a 30 day free trial, I could easily spend $500+ of my time just getting the trial... And once I've gotten to the point where I'm negotiating with one vendor I'm going to feel a lot of pressure (internally or from my superiors) to talk with some competitive vendors too to make sure I'm making the right decision.
It's a shame because, certainly, a company like Clustrix could use the revenue they get from product sales to support an awesome development team and really deliver a better product. On the other hand, they have marketing channels that are aimed at large organizations that can afford an expensive buying experience... and it's an expensive selling process for them when they've got to do "complex sales" that require approvals from a large number of stakeholders. They've got to pass those costs onto you.
The trouble with this model is that tomorrow's large organizations are today's small organizations. Today, Facebook could afford just about any commercial software that's out there. However, they made critical technology decisions (that are difficult to reverse) back when they were a little company that could only afford MySQL.
Here's a quick example:
A few years back, I was writing Smalltalk web systems and evaluating which commercial implementation to use. The two big ones that bubbled to the top of the list then were Gemstone and Cincom.
Cincom offered a relatively familiar set up which would require using sticky sessions for the web server, and another backend database like PostgreSQL.
Gemstone offered pearls on a platter: shared session state across all the backends, and a build in distributed object database. I thought it would be brilliant, and reading their blogs and documentations gave no hint of any pitfalls.
Cimcom was a download away---as were any of the free smalltalks---but Gemstone then required you to contact their sales staff before playing with it. So I made a mistake and chose Gemstone, and the system was built to target their as of yet unseen architecture. When I finally got to use Gemstone, it was a mess!
If I had that experience to start with rather than just their glowing self-produced reports, I'd have never picked Gemstone and instead gone with the familiar "kludgy" system that was "good enough".
If Clustrix offered cloud-based deployment of their database, they could have a small free tier for people to play around with. Add a console applicaiton for accessing the API. Make it really developer friendly.
Imagine if you could just type:
$ sudo gem install clustrix
$ clustrix create my-test-app
Created cloud.clustrix.com/my-test-app
username: my-test-app
password: xd634shx
If they had that, I'd be trying it out in a flash. Then what if they had a per-usage pricing system like AWS? Something where you pay per gigabyte. Something that allows you to test stuff out for a few dollars a mount, and then when you're ready, scale it right up to a few thousand or more.Here's a link: http://www.oracle.com/technetwork/database/enterprise-editio...
I would be very interested to see a comparison where a large and "real" data model (that contains 6-7 "joined" tables for the SQL setup and a single table with the embedded document model for the No-SQL setup) is injected into each of the technologies.
Also, it is just horrible style not to include the benchmark code for peer review. Delivers near-0 credibility.
db.posts.find({tags: {$in: ["foo", "bar"]}})
to: SELECT * from posts JOIN taggings ON taggings.post_id = posts.id JOIN tags ON tags.tagging_id = taggings.id WHERE tags.tag IN ('foo', 'bar');
(Single query tag lookup; naively joins the entire taggings and tags tables before limiting with WHERE)Or a "better" query with two subselects (yikes!)
SELECT * FROM posts where post_id IN (SELECT taggings.post_id FROM taggings WHERE taggings.tag_id IN (SELECT id FROM tags WHERE tags.name IN ('foo', 'bar')))
And that's the "find where any tag matches" case. Try the "when all tags match" case ($all in MongoDB), and you'll go grey a few years earlier.Secondly: even though it becomes clear eventually, you should mention up front your relationship with Clustrix. Just because you are a founder of Clustrix doesn't necessarily invalidate your findings, but full disclosure is always a good idea.
Where have I seen this before? Oh right.. every time I see "Technology A vs Technology B" comparisons.
Naturally his results are in his favor, otherwise he would not have posted them.
That said, there are certainly nosql systems with better scaling and concurrency stories* than mongodb out there that he could have benchmarked against. :)
*I'm a cassandra committer
i really think that sometimes SQL and the relational model is a problem. at least, the relational database design is a course in a university, along with (object oriented) programming. so, to properly use postgre or mysql in your sophomore web startup, you should know two things well, or have a clever db guy...
i have seen brilliant web programmers that design nightmare database schemas.
no-one would ever use 'eventually consistent' databases for financial or trading data, i think. they're for higly scalable consumer web projects
http://www.mongodb.org/display/DOCS/Benchmarks
you can see that most of them compare Mongo against MySQL. Certainly "rows" are being inserted into the latter.
I think it was a mistake to put this post on a blog with no other content. I think it leaves the reader with the impression that this is the only thing you want to contribute to the community... and as other commenters have pointed out that contribution could be perceived negatively.
Something to think about next time.
maybe i missed smth but i still saw only the suggestion for you now send your benchmark to mongodb guys.
edit: please discard, now i see there they say you should include joins to your clustrix benchmark. waiting for the reply
A more interesting attempt is IMHO to check how the difference in the data model of some NoSQL solution can lead to very different performances.
For instance Clustrix VS Redis can be interesting. Examples:
1) A lot of writes against a table where you require then to get things ordered by insertion time. With Redis is is just LPUSH + LRANGE. Try to do a read/write test where many clients are writing and reading at the same time (real world), against a table (or Redis list) with millions of elements.
2) Range queries when there are a lot of writes against this indexes. For instance a table with a score (we are modeling an online game leaders board), a lot of inserts of new scores. Get ranges between random intervals at the same time. Again, many clients writing, many reading.
The fact that this has spurred the others to start building scalable RDBMSs is great. But let's not pretend these new RDBMSs won't have compromises, they'll just have different ones.
The important thing is for developers to make smart decisions about what tradeoffs make the most sense for scaling their application. Different applications will require different tradeoffs.
Disclaimer: I work for VoltDB.