Reason #1: Devs aren't Ops.
Reason #2: Devs need something new on their resume.
Reason #3: A certain type of dev reads blogs, gets excited, skips the scientific mumbo-jumbo, and takes the blogs as _the_ source of truth.
Reason #4: It's easy to bootstrap your weekend project (schemaless, etc.). Dealing with a DB is apparently tedious for devs.
I'm sure others can add more...
Let me feel your love HN-ers ;)
What I hear you saying is, unfortunately: it's worse for ops, so no one should use it.
Those who do not study the history of databases are doomed to repeat it. Soon we'll add back row-level write locks, transaction logging, schemas, and multiple indexes, and one day we'll wake up with MongoSQL.
It's really easy to work with. This is why people keep using it.
How can a DB that loses data so frequently still call itself web-scale? It just breaks when you need scale!
Auto-sharding is also super unreliable: we tried it once and it failed, and now we're using a lib that does application-level sharding. We're also considering moving to another database that at least knows that not losing data is the first and most important job of a DB.
Someone summarized the issues of MongoDB here: http://pastebin.com/raw.php?i=FD3xe6Jt . We experienced most of the problems in that article. So, just a reminder for anyone who wants to build a serious product on MongoDB: read the article. It's not FUD; it's so true that I wish I had read it a year ago, so we wouldn't have had to migrate so much legacy data to a new database solution.
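For what it's worth, application-level sharding of the kind mentioned above can be sketched in a few lines: hash a record's key and use the hash to pick a shard. The shard names and the key format here are made up for illustration; a real setup would map these to separate database connections.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate
# database connection or cluster.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(key: str) -> str:
    """Route a record to a shard by hashing its key.

    Hash-based routing is deterministic, so the same key always
    lands on the same shard as long as the shard count is fixed.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

The obvious weakness is rebalancing: changing the number of shards remaps almost every key, which is why libraries doing this for real tend to use consistent hashing instead of a plain modulo.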
There's no other project I know of that provides schemaless JSON documents, indexing on any part of them, server-side map-reduce, connectors for lots of languages, and atomic updates on parts of a document. If there were one, and it were better than Mongo, I'd switch in a moment.
These absolutely were failures.
The author listed several instances in which the database became unavailable, the vendor-supplied client drivers refused to communicate with it, or both. Some of these scenarios included the primary database daemon crashing, secondaries failing to return from a "repairing" to an "online" state after a failure (and unable to serve operations in the cluster), and configuration servers failing to propagate shard config to the rest of the cluster -- which required taking down the entire database cluster to repair.
Each of the issues described above would result in extended application downtime (or at best highly degraded availability), the full attention of an operations team, and potential lost revenue. The data loss concern is also unnerving. In a rapidly-moving distributed system, it can be difficult to pin down and identify the root cause of data loss. However, many techniques such as implementing counters at the application level and periodically sanity-checking them against the database can at minimum indicate that data is missing or corrupted. The issues described do not appear to be related to a journal or lack thereof.
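The counter technique mentioned above can be as simple as an application-side tally of acknowledged writes that gets compared against a count taken from the database at some interval. This is a generic sketch of the idea, not any particular library's API:

```python
class WriteCounter:
    """Application-level tally of acknowledged writes, used to
    sanity-check the database's own record count later."""

    def __init__(self):
        self.expected = 0

    def record_write(self):
        # Increment only after the database acknowledges the write.
        self.expected += 1

    def check(self, db_count: int) -> int:
        """Return the apparent number of missing records.

        Zero means the counts agree; a positive number is a red flag
        that writes were acknowledged but the data is gone.
        """
        return self.expected - db_count
```

It can't tell you *which* records vanished or why, but a periodic nonzero result is enough to know something is wrong before your users do.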
Further, the fact that the database's throughput is limited to utilizing a single core of a 16-way box due to a global write lock demonstrates that even when ample IO throughput is available, writes will be stuck contending for the global lock, while all reads are blocked. Being forced to run multiple instances of the daemon behind a sharding service on the same box to achieve any reasonable level of concurrency is embarrassing.
On the "1GB / small dataset" point, keep in mind that Mongo does not permit compactions and read/write operations to occur concurrently. As documents are inserted, updated, and deleted, what may be 1GB of data will grow without bound in size, past 10GB, 16GB, 32GB, and so on until it is compacted in a write-heavy scenario. Unfortunately, compaction also requires that nodes be taken out of service. Even with small datasets, the fact that they will continue to grow without bound in write/update/delete-heavy scenarios until the node is taken out of service to be compacted further compromises the availability of the system.
What's unfortunate is that many of these issues aren't simply "bugs" that can be fixed with a JIRA ticket, a patch, and a couple rounds of code review -- instead, they reach to the core of the engine itself. Even with small datasets, there are very good reasons to pause and carefully consider whether or not your application and operations team can tolerate these tradeoffs.
Oh, this mystery is a failure all right, and even the most charitable interpretation would call it a misfeature.
MongoDB is flaky. CouchDB is a maintainability nightmare, so I hear.
Riak? Cassandra? Or does everything else have some other equally huge down-side?
For every "X sucks" article, there's a "Y is awesome" one.
In the NoSQL world the only way to choose is around the problems they solve... they're each specializing and optimizing for certain niches. Mongo is the most MySQL-esque, but doesn't do things that Redis, Couch, or Cassandra do that you may need.
There is no clear winner (fortunately or unfortunately, depending on what you were hoping for).
Yet we've also seen in the past that shedding such a reputation is not strictly required to be popular. And marketing budgets do matter.
Perhaps because both of your premises are wrong? I've used Mongo for over a year now with ~1000 writes/sec and haven't seen any of these problems. I'm not saying they don't exist (some are confirmed bugs that have been fixed), but they're not nearly as prevalent as your 'Do you still beat your wife?'-style question implies.
Then, as you use it, the system optimizes itself (or makes suggestions) based on actual access patterns. A subset of objects could be a formal, indexed table? Have it happen automatically or offer the SQL as a suggestion.
Replication of any kind won't help you with a high write load as secondaries have to apply the same number of writes as primaries.
CouchDB is much better (you're as likely to lose data as with Postgres), but is potentially less efficient (no BSON).
My focus in starting Citrusleaf wasn't features, it was operational dependability. I had worked at companies that had to take their systems offline when they had the greatest exposure - like getting massive load from the Yahoo front page (back in the day). Citrusleaf focuses on monitoring, integration with monitoring software, and operations. We call ourselves a real-time database because we've focused on predictable performance (and very high performance).
We don't have as many features as mongo. You can't do a javascript/json long running batch job. We'll get to features.
The global R/W lock does limit Mongo. Absolutely. Our testing shows a nearly 10x difference in write performance between Mongo and Citrusleaf. Frankly, if you're only doing 1,000 tps, you should probably stick with a decent MySQL implementation.
Here's a performance analysis we did: http://bit.ly/rRlq9V
This theory that "Mongo is designed to run on in-memory data sets" is, frankly, terrible, simply because Mongo doesn't give you the control to stay in memory. You don't know when you're going to spill out of memory. There's no way to "timeout" a page cache IO. There's no asynchronous interface for page IO. For all of these reasons - and our internal testing showing page IO is 5x slower than aio, which is why professional databases use aio and raw devices - we coded Citrusleaf using normal multithreaded IO strategies.
With Citrusleaf, we do it differently, and that difference is huge. We keep our indexes in memory, and they're the most efficient anywhere. You configure Citrusleaf with the amount of memory you want to use, and apply policies for when you start flowing out of memory. Like not taking writes. Like expiring the least-recently-used data.
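An expire-the-LRU policy like the one described can be sketched with an ordered map: a bounded store that evicts the least-recently-used entry once it exceeds its configured capacity. This is a generic illustration of the policy, not Citrusleaf's actual implementation:

```python
from collections import OrderedDict

class LRUStore:
    """Bounded key-value store that evicts the least-recently-used
    entry once the configured capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry
```

The alternative policy mentioned (refusing writes when full) would simply raise an error in `put` instead of calling `popitem`.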
That's an example of our focus on operations. If your application use pattern changes, you can't have your database go down, or go so slowly as to be nearly unusable.
Again, take my comments with a grain of salt, but with Citrusleaf you'll have better uptime, fewer servers, a far less complex installation. Sure, it's not free, but talk to us and we'll find a way to make it work for your project.
I'm a little surprised to see all of the MongoDB hate in this thread.
There seems to be quite a bit of misinformation out there: lots of folks seem focused on the global R/W lock and how it must lead to lousy performance.
In practice, the global R/W isn't optimal -- but it's really not a big deal.
First, MongoDB is designed to be run on a machine with sufficient primary memory to hold the working set. In this case, writes finish extremely quickly and therefore lock contention is quite low. Optimizing for this data pattern is a fundamental design decision.
Second, long-running operations (e.g., those about to trigger a page fault) cause the MongoDB kernel to yield. This prevents slow operations from screwing the pooch, so to speak. Not perfect, but it smooths over many problematic cases.
Third, the MongoDB developer community is EXTREMELY passionate about the project. Fine-grained locking and concurrency are areas of active development. The allegation that features or patches are withheld from the broader community is total bunk; the team at 10gen is dedicated, community-focused, and honest. Take a look at the Google Group, JIRA, or disqus if you don't believe me: "free" tickets and questions get resolved very, very quickly.
Other criticisms of MongoDB concerning in-place updates and durability are worth looking at a bit more closely. MongoDB is designed to scale very well for applications where a single master (and/or sharding) makes sense. Thus, the "idiomatic" way of achieving durability in MongoDB is through replication -- journaling comes at a cost that can, in a properly replicated environment, be safely factored out. This is merely a design decision.
Next, in-place updates allow for extremely fast writes provided a correctly designed schema and an aversion to document-growing updates (i.e., $push). If you meet these requirements-- or select an appropriate padding factor-- you'll enjoy high performance without having to garbage collect old versions of data or store more data than you need. Again, this is a design decision.
Finally, it is worth stressing the convenience and flexibility of a schemaless document-oriented datastore. Migrations are greatly simplified and generic models (i.e., product or profile) no longer require a zillion joins. In many regards, working with a schemaless store is a lot like working with an interpreted language: you don't have to mess with "compilation" and you enjoy a bit more flexibility (though you'll need to be more careful at runtime). It's worth noting that MongoDB provides support for dynamic querying of this schemaless data -- you're free to ask whatever you like, indices be damned. Many other schemaless stores do not provide this functionality.
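"Dynamic querying" here means you can filter on any field of a schemaless document, indexed or not. A toy matcher over plain dicts conveys the idea; the operator names mimic Mongo's query syntax, but this is only an illustrative sketch, not Mongo's implementation:

```python
def matches(doc: dict, query: dict) -> bool:
    """Return True if the document satisfies every clause in the query.

    Supports equality plus two Mongo-style comparison operators
    ($gt, $lt) for illustration.
    """
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator clause, e.g. {"$gt": 10}
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
        elif value != cond:
            # Plain equality clause
            return False
    return True

docs = [
    {"name": "widget", "price": 5},
    {"name": "gadget", "price": 12, "color": "red"},
]
results = [d for d in docs if matches(d, {"price": {"$gt": 10}})]
```

Note that documents need not share fields ("color" appears in only one of them), which is exactly the flexibility a schemaless store trades against schema-enforced safety.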
Regardless of the above, if you're looking to scale writes and can tolerate data conflicts (due to outages or network partitions), you might be better served by Cassandra, CouchDB, or another master-master/NoSQL/fill-in-the-blank datastore. It's really up to the developer to select the right tool for the job and to use that tool the way it's designed to be used.
I've written a bit more than I intended to but I hope that what I've said has added to the discussion. MongoDB is a neat piece of software that's really useful for a particular set of applications. Does it always work perfectly? No. Is it the best for everything? Not at all. Do the developers care? You better believe they do.
I'm not sure it's a competitor at all. RavenDB is a CouchDB clone for .Net that requires a commercial license for proprietary software.
From this article, sounds like their data is pretty seriously relational.
MongoDB has been pushing the ops side of their product, but I agree it has failings there. To me the advantage is the querying and the JSON-style documents.
Mongo, on paper, should be an ideal candidate for this job; but, due to complications with the locking model and with its inability to do online compactions, it's failing.
I had to model data with umpteen crazy relationships, so we went with MongoDB. We didn't have the high-update issue or any locking issues. If you have a few large tables with fixed columns that easily define the data, then relational DBs probably make more sense. But to your point, 10gen won't tell you that, and the hype doesn't tell you that either.