If you tell a database to store something, and it doesn’t complain, you should be able to safely assume that it was stored.
This has nothing to do with the 2GB limitation. Nowhere in the documentation does it mention that it will silently discard your data. What happens with the 64-bit version if you run out of disk space? More silently discarded data?
I know a lot of you may have cut your teeth on MySQL which, in its default configuration, will happily truncate your strings if they are bigger than a column. Guess what? Anyone serious about databases does not consider MySQL to be a proper database with those defaults. And with this, neither is MongoDB, though it may have its uses if you don't need to be absolutely certain that your data is stored.
EDIT: Thanks for pointing out getLastError. My point still stands, since guaranteed persistence is optional rather than the default. In fact, reading more of the docs points out that some drivers can call getLastError by default to ensure persistence. That means that MongoDB + Driver X can be considered a database, but not MongoDB on its own.
I'm just struggling to imagine being willing to lose some amount of data purely for the sake of performance, so philosophically it's not a database unless you force it to be. Much like MySQL.
EDIT2: Not trying to be snarky here, but I would love to hear about datasets people have where missing random data would not be an issue. I'm serious, just want to know what the use case is that MongoDB's default behaviour was designed for.
EDIT3: (Seriously) I'm sure MongoDB works splendidly when you set up your driver to ensure that a certain number of servers will confirm receipt of the data (if your driver supports such an option); nowhere am I disputing that. But that number really should have a lower bound of 1, enforced by MongoDB itself. And to the guy who called me stupid: you are what's wrong with HN.
Demonstrably false. http://www.mongodb.org/display/DOCS/getLastError+Command
"MongoDB does not wait for a response by default when writing to the database. Use the getLastError command to ensure that operations have succeeded."
And I say this as an old-skool C guy who does do this in critical sections of code... But for everything else I'm in a language like OCaml that behaves sanely, using a DB like Oracle that behaves sanely.
http://api.mongodb.org/ruby/1.7.0/
'Success' and 'Failure' are fuzzy concepts when writing to distributed databases, and you need to tell Mongo which particular definition fits your needs. The 'unsafe' default in mongo is controversial, but ranting about what a "proper database" is without even reading the docs is stupid. Instead, let's rant about what a "proper developer" should do when using a new system...
A foursquare check-in database could be an example where performance is actually way more valuable than consistency. (I have no idea what database they use)
Nice ad hominem there. MongoDB isn't DB2, just as MySQL wasn't. Both can still be used to build very good products; in fact, I'd go so far as to say they lead to better products than "proper" databases.
I'm really glad I haven't deployed mongo now in a production 32-bit system.
Response to EDIT2: Where can data loss be acceptable? If you have a relatively speedy messaging system where messages are removed or outdated on receipt. I'm sure there are other specialty needs.
So by default Mongo write operations are asynchronous and you have to explicitly ask for error codes later.
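To make that concrete, here is a toy model of the fire-and-forget pattern. It is not MongoDB's driver API or wire protocol; `ToyStore`, its `capacity` limit, and `get_last_error` are all made-up names standing in for a server that drops writes once it is full and only reports the failure when asked.

```python
class ToyStore:
    """In-memory stand-in for a server with a hard capacity limit (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = []
        self.last_error = None

    def insert(self, doc):
        # Fire-and-forget: never raises and never returns a status.
        if len(self.docs) >= self.capacity:
            self.last_error = "capacity exceeded"
            return
        self.docs.append(doc)
        self.last_error = None

    def get_last_error(self):
        # The caller has to ask explicitly whether the last write failed.
        return self.last_error


store = ToyStore(capacity=2)
for n in range(3):
    store.insert({"n": n})  # the third insert is dropped without any complaint

stored = len(store.docs)
error = store.get_last_error()
print(stored, error)
```

Code that never calls `get_last_error()` sees three successful-looking inserts and two stored documents.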
It's legit to criticize a language or a database. However, it seems to me that when MongoDB gets involved, the tone is far more aggressive and defensive. What's up with that? It's just software, bits and config files. It's not like someone called your mom a harlot.
Here's what I think. New developers, for a long time, have come into the industry and become overwhelmed with everything they need to learn. Let's take typical database servers. Writing a SELECT is easy enough, but to truly be an expert you have to learn about data writing operations, indexing, execution plans, triggers, replication, sharding, GRANTs, etc. As it's a mature technology, you start out barely an apprentice, with all these experienced professionals around you.
In recent years, software development has really been turned on its head. We're not building apps using the same stack we've used for a long time: OO + RDBMS + view layer + physical hardware. The younger the technology, the better, it seems. In theory, a 3 year developer and a 20 year developer are now pretty equal when we're talking about a technology that's been around 2-3 years. That wouldn't be true if we were dealing with say, OO design patterns. (Even when new languages come along, you still get to keep your experience in the core competencies.)
Attacks on these new technologies are perceived as an assault on this new world order, and those who have walked into being one of the "newb elite" respond emotionally to what they see as a battle for the return to the old guard. Am I totally off base here?
Mongodb was very aggressively marketed; its advocates produced benchmarks comparing it directly to traditional relational databases as though the use cases were the same. I think that set the tone for future discussion in a way that's still being felt.
If you're as old as your opinions suggest you'll remember the early days of Java were very similar - Sun marketing pushed it no end, and so tempers ran high and discussions were emotionally charged in a way that never happened when talking about perl or python or TCL.
More relevant is my experience. I didn't come in when Java came out. I started (1997-1998) with some high-level dynamic web languages: ASP classic, ColdFusion (to this day, I still do CF; I'm a CF user group manager and I speak at CF conferences). Building HTML and JavaScript since 1996 (GeoCities, HotDog, and HomeSite). Nerded around with programming 1995-1997 in high school (TI Basic, Pascal, and QBasic). In the days when I started web development, a lot of folks were still monkeying around with Perl and flatfiles. I can't really speak to the early days of Java: until 2000, I didn't really use it. ColdFusion 6 went from C++ to Java, at which point CF devs ran on the JVM and could target it.
From the beginning I was a consumer of RDBMSes. Started with Access and moved on to SQL Server. There wasn't a need to know the full DB, only the pieces you needed for CRUD. Perhaps for newbs that has changed, and they have to learn the full SQL administrative experience. Personally I doubt that. Do some db migrations in Rails: you don't even need to know what SQL engine you're running on. (A good thing, IMO, but still means a lesser body of knowledge)
Good point that a lot of products try so hard to be the "new sexy" that they suggest an inaccurate comparison, or at best, implement a subset of what they're trying to replace.
This is a case where, although the ultimate complaint of the author is the behavior of the product (which is documented, but un-intuitive in nature unless you've read up on the issue), it's the way in which he chose to frame the problem that is getting people upset.
This is a known issue, even if it seems like a completely poor design decision. The issue I think most people here are taking is that because the author did almost no research on the topic, he got himself into a problem, and is trying to blame it on Mongo.
Telling somebody they are wrong is one thing, calling them moronic or stupid is quite another.
I think this is an evolution of the language wars wherein immature[1] developers align themselves with a technology and mix up criticisms of the technology with criticisms of themselves. This seems to be part of the need humans have to be part of a community.
1. Immature in this context has nothing to do with age. Rather, it is an attitude that shows when any developer has not experienced and internalized enough technology to realize every single technology has fundamental problems, sucks in some way, yet is still usually pretty amazing nonetheless, especially within the context of its creation.
Hopefully the 20 year dev can recognize the new thing as new and possibly immature, can identify some areas of weakness when compared to tools with a successful history.
> Attacks on these new technologies are perceived as an assault on this new world order, and those who have walked into being one of the "newb elite" respond emotionally to what they see as a battle for the return to the old guard. Am I totally off base here?
Totally agree.
1) Care.
2) Feel they had wool pulled over their eyes unexpectedly.
Let's talk about the wool. MongoDB was marketed initially with stupid little benchmarks (that were later removed as a policy). Those benchmarks were what people saw and showed their bosses and colleagues before deciding, "this is the one". Yes, they picked a bad tool and should have RTFM, I would normally say, but not for MongoDB.
They marketed themselves as a "database" while at the same time shipping with durability turned off. Yes, you can write very fast if you don't acknowledge that data has hit the disk buffers. I wasn't fooled, I saw the throughput rates and thought, something is fishy. But a lot didn't.
Most of all, I would have no problem with this design decision if there had been a bright red flashing warning on the front page saying what the default settings are and what they could do to your data. There wasn't.
As developers (programmers, whatever you want to call it), we expect that when other developers market things aimed at us, they will be somewhat more honest than, say, someone selling rejuvenating magnetic bracelets on TV at 4am. I think that is where the passionate discussion comes from.
Aside from that, though, the 32 bit limitation is clear in the documentation and present on the download page. It's fine not to read the documentation before you use something but you can't then complain that it did something you did not expect. Mongodb is a little different from other databases. So is Redis. You can't blow everything off that is conceptually different.
There are plenty of valid arguments for not using MongoDB, but this is the weakest I have seen so far.
If you're talking about Ubuntu, I can attest that the default PM there is several versions out of date for a lot of things, and thus to get the version you'd expect, you're forced to install by hand.
Also, even using the PM version, didn't you get a warning when you started the server? I thought Mongo threw up a warning at start time about this exact issue (the 2GB limitation, not the silent failures)
I'm sure this only bit the author because he was using MongoDB for a toy project, and in a real system he'd have done due diligence first.
I'm not a fan of MongoDB myself, but if I were to use it I know that I must read about every option available because by default MongoDB's team chose settings that are suited for speed and not reliability, durability, or (if I'm being less charitable) even sanity.
I've noticed a trend across about 20+ candidates, all of whom are smart people: people are using Mongo without actually understanding what the hell it's trying to solve by getting away from the RDBMS paradigm.
I'm not sure if this is because 10gen markets it as a general purpose tool, but I have yet to talk with a candidate who can actually describe why they were using the DB vs. a SQL database. I'm all for learning new things, but I can't help but wonder if the string of negative MongoDB posts is coming from people who pick it b/c it's new, then realise pretty far in that this is nothing like a normal DB, and "having no schema" isn't really a reason to go with a tool as foundational as a data store.
I think Mongo is great for the really specific problems that it's designed to solve. It's probably pretty bad as a general purpose tool, but I'd be surprised if anyone serious actually considers it one.
My observation has been that a substantial number of people pick NoSQL stores because they don't really understand RDBMSs, and can't be bothered to learn.
I don't mean this as a dig at NoSQL in general - there's perfectly valid reasons to want some NoSQL features - but the hype train does attract a lot of people who just want the new hotness.
I have talked to more than one 10gen marketing bro who insisted that MongoDB is appropriate for any and all use cases, transient to archival. It's pretty disingenuous if you ask me.
There is a discontinuity between the ease-of-use story and the blame-the-user story, regardless of how well documented the async insert behavior is.
And it doesn't have to be this way. There are ways of designing interfaces, APIs, and even naming that go a long way to prevent your users from shooting themselves in the foot.
Take postgres. It also supports at least a couple of kinds of async insert, one of which is a part of libpq (the postgres C client library). It's called "PQsendQuery" and it's documented under the "Asynchronous Command Processing" section. It's hard to imagine a user trying to use that and expecting it to return an error code or exception. Even if the user doesn't read the docs, or reads some fragment from a blog post, they will still see that the name suggests async and that it returns an int rather than a PGresult (which means it obviously doesn't fit into the normal sync pattern).
There is no reason mongo couldn't be clear about this distinction -- say, rename "insert" to "async_insert" and have "insert" be a wrapper around async_insert and getLastError. But instead, it's the user's fault because they didn't read the docs.
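A rough sketch of that naming split, in Python for brevity. Everything here is hypothetical (`ToyConnection`, `WriteError`, the `fail` switch that simulates a rejecting server); it only illustrates the proposed shape of the API, not how any real Mongo driver is built.

```python
class WriteError(Exception):
    """Raised by the synchronous wrapper when the server reported a failure."""


class ToyConnection:
    """Hypothetical connection; `fail` simulates a server that rejects writes."""

    def __init__(self, fail=False):
        self.fail = fail
        self.stored = []
        self._last_error = None

    def async_insert(self, doc):
        # Sends the write and returns immediately; the name says so.
        if self.fail:
            self._last_error = "write rejected"
        else:
            self.stored.append(doc)
            self._last_error = None

    def get_last_error(self):
        return self._last_error

    def insert(self, doc):
        # Safe default: async_insert followed by an explicit error check.
        self.async_insert(doc)
        err = self.get_last_error()
        if err is not None:
            raise WriteError(err)


conn = ToyConnection(fail=True)
conn.async_insert({"a": 1})  # silent, but only because the caller chose it
try:
    conn.insert({"a": 1})    # loud
    raised = False
except WriteError:
    raised = True
```

With this layout the fast path is still there for anyone who wants it, but nobody ends up on it by accident.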
Careful API design is important to reduce the frequency of these kinds of errors. In postgres, it's relatively hard to shoot yourself in the foot this badly in such a simple case. I'm sure there are gotchas, but there is a conscious effort to prevent surprises of this sort.
Because if you don't read enough of the docs to understand that 'insert' is asynchronous insert, you don't understand MongoDB and haven't done your research.
Why should 'insert' default to synchronous? Why shouldn't we have a sync_insert function instead? The only reason is that you're assuming familiarity for people coming from SQL/synchronous-oriented DBMS, but why should they be forced into an awkward design just because it's what people are familiar with from other DBMS?
Expecting the user to be an expert in your product from the start is simply not realistic; a well-designed system facilitates use by people of varying levels of expertise.
It's because it's a reasonable assumption to make. Data loss shouldn't be a surprise. If I need speed and am willing to risk data loss, I should have the option, but I should have to explicitly choose it.
1) it does not behave exactly like SQL
2) the user didn't read any more than a Quickstart Guide
3) the user fundamentally misunderstands the aim of the new technology or the application it is intended for
Ember.js suffers from the same ignorance.
What makes it worse is all the morons who upvote without even reading the detail purely because the title reinforces some misconceived bias they already have.
'NoSQL' is part of the problem. This technology has absolutely no comparison with SQL other than it persists data.
Except that apparently under certain circumstances it doesn't persist data, which was the author's point.
Personally I wouldn't be upset about a limitation like the one described as much as I would be upset about the database not logging an error when it discards the data. Logs are a primary way you figure out what's wrong when your application isn't behaving as expected. If you open the logs and see a bunch of "32-bit capabilities exceeded, please buy a real computer" messages, you learn what the problem is. If the database error logs are empty, that implies that everything is working fine, when in this case it clearly isn't.
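As a sketch of what "at least log it" could look like, here is a toy store that refuses writes past a size limit and says so in the error log. `CappedStore`, `max_bytes`, and the logger name are assumptions for illustration, not anything taken from MongoDB's code.

```python
import logging

logger = logging.getLogger("toystore")


class CappedStore:
    """Hypothetical store with a hard size limit, like a 32-bit address space."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.records = []

    def write(self, payload: bytes) -> bool:
        if self.used + len(payload) > self.max_bytes:
            # Refuse the write and leave a trace an operator can find later.
            logger.error(
                "write of %d bytes rejected: %d of %d bytes already used",
                len(payload), self.used, self.max_bytes,
            )
            return False
        self.records.append(payload)
        self.used += len(payload)
        return True


store = CappedStore(max_bytes=4)
store.write(b"abcd")
accepted = store.write(b"e")  # rejected, and the rejection is logged
```

Even if the caller ignores the return value, the log still records that data was turned away.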
Almost all of the complaints against MongoDB are down to assumptions and lack of understanding about the database.
You call people "morons", yet it appears that you did not read the article yourself.
Whether SQL or not, scalable or not, old or new, or whatever... is completely immaterial here.
When a database silently stops accepting data, and apparently has done so for 3 years, you have to at least admit that there are strange design goals at play.
Now, the entire claim of the article might be incorrect. Did you verify that yourself?
Edit: Spelling
And anyone who has read more than an introduction to mongo knows that you SHOULD use getLastError to be safe. If you do that, no data will be dropped.
With a getLastError model, you can do your work, then go check for errors when you're really ready.
I'm not saying it's a great api, but it does make sense in context. No idea why the tutorial the op followed didn't talk about the differences, or why asynch is hard.
This brings me back to the recent discussion about reading other people's code: it is almost certainly smarter to extend an existing database until it's capable of meeting your needs, rather than write one from scratch.
The fact that many programmers don't see it that way is a testament to their irrational fear of diving into other people's code.
People need to stop acting like PostgreSQL is some holy grail database. It isn't.
And making a solid, featureful, and performant database is vastly harder.
I'm hardly an inexperienced programmer. I've used Cassandra, SimpleDB, Voldemort, etc. I wrote part of the Inktomi Search Engine in the 90s, and plenty of (what today would be called) NoSQL stores over the years.
A default that's so counterintuitive for a database should be featured prominently with a huge neon sign. It wasn't in the Ruby tutorial, or in any of the many documents I read. It's buried deep in the Mongo website, and the first Google match about the 32-bit limitation is a blog post from 2009.
Sometimes you just have to admit you screwed up and didn't read the documentation. Everyone does it, we're hackers, we'd much rather play with technology than read docs.
That 2009 post is the canonical post about the issue, which is why it has such page rank. Its position is a consequence of the fact that it's linked to from all over the web, not because nobody has discussed it since.
I kinda like TCP vs. UDP analogy. Sometimes you care more about speed than precision. A few dropped items in a log. Not a big deal. I'd rather have that, than to be forced to use a more expensive machine for the job.
That said, I absolutely think the default should be the TCP way.
Look, I agree that in most cases you probably want to do everything you can to make your data 100% complete. But failed writes should be really rare, and there are plenty of times I'd trade the rare missing write for cheaper/faster database servers.
However, it is starting to feel like being anti-MongoDB is just considered cool today. When I see someone who has worked with MongoDB for a year, upgraded to 2.2, knows it inside out, and still hates it, I will listen and start to worry. Until then, I'm going to keep using it and saving time.
People who would rather not bother, can stick with their tools, work slower, and be happy.
In all seriousness, I built a 10 machine Mongo cluster, talked with a 10gen consultant a full day, went to Mongo meetup, and ran all sorts of benchmarks before ever using it in production. I still don't feel like I have the expertise to write a snarky blog post about it.
Not really following the snark there. Are you trying to compare MongoDB to MySQL's MyISAM storage engine? Like there aren't numerous other extremely valid RDBMS solutions out there that don't do table locks during a write? (MySQL InnoDB, Percona, Maria, Aria, PostgreSQL, Firebird, etc...)
http://www.mongodb.org/display/DOCS/getLastError+Command
The MongoDB "way" is that clients know the importance of their data and can choose write strategies which make the proper trade-off between insertion throughput/latency and durability/consistency.
So, assuming you are writing an ecommerce application, here's where I think these flags come in.
- Session data: fsync = true. Wait for a response, and ensure it's written to disk
- Internal web analytics: safe = false. Who cares if it's written, I've got an application to serve!
- Orders: fsync = true. I know, RDBMS, transactions, blah blah blah.
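One way to write that trade-off down is a per-collection policy table. This is only a sketch: `WritePolicy`, `POLICIES`, and `policy_for` are hypothetical names, and the two flags loosely mirror the `safe` and `fsync` options mentioned above rather than any specific driver's signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePolicy:
    acknowledged: bool  # wait for the server to confirm the write
    fsync: bool         # also wait until the write is flushed to disk


# Hypothetical mapping for the ecommerce example above.
POLICIES = {
    "sessions": WritePolicy(acknowledged=True, fsync=True),
    "analytics": WritePolicy(acknowledged=False, fsync=False),
    "orders": WritePolicy(acknowledged=True, fsync=True),
}

# Collections nobody thought about get the strictest policy,
# so trading durability for speed is always an explicit choice.
STRICTEST = WritePolicy(acknowledged=True, fsync=True)


def policy_for(collection: str) -> WritePolicy:
    return POLICIES.get(collection, STRICTEST)
```

The design choice worth noting is the fallback: an unlisted collection is treated like orders, not like analytics.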
People tend to look at NoSQL and wonder why it doesn't function like MySQL, then they loudly complain how bad the software is. Nobody is writing articles about how Memcached doesn't function like MySQL.
Yes, I realize that there's a "safe=True" option to my python driver. But I'm writing to a database. As others have said here and elsewhere, the default behavior of a database and its drivers should be to complain loudly when a write fails. It is ridiculous that safe!=True by default. If I want to turn off this feature to improve performance, I will.
Yes. Without question.
Is this his own fault for not reading the documentation and understanding that he should have opted for the 64bit version outright?
Yes. Without question.
Exception throwing database drivers are a relatively new thing not an old thing. The only thing MongoDB does differently is that the writes are fire and forget in that the database hasn't returned a response of any kind when the function returns.
In native code you can forget about using exceptions in a database driver because exception handling can be exceptionally broken on some platforms. SmartOS I am looking in your direction.
No excuse for not reading the docs, though.
There seem to be a number of people commenting, telling you to read the documentation, but I'm with you: that is completely counter-intuitive behaviour and should be viewed as a bug.
This reminds me of the attitude that I had to correct in developers that worked for me:
- There is a huge difference between "it works" and "it does what the user expects in a friendly way."
Steve Jobs said that if you need to read a user manual (particularly to do the most vanilla usage of a product), the problem is the product. Not you.
He's talking about consumer products, not databases that were intended for use by technology experts. There's a big difference there.
The onus is on you to understand the limitations of software before you start using it. You complain that the 32-bit warning doesn't show up in the package manager, but you still should have read the documentation before committing to a new technology. It's that simple.
Is it a flaw that mongo doesn't work well on 32 bit systems? Maybe. Probably.
Is it a flaw that you didn't do the requisite research before committing to a database and subsequently complaining about it? Definitely.
If you were working for me as a developer and had the attitude that you shouldn't have to _thoroughly_ read the manual and notes for something like MongoDB, I'd let you go. Steve Jobs was not a programmer.
Heck, I learned about error handling in Mongo the first hour I started learning it. Same for the 2GB limitation of 32-bit. The mongo manual is very well done and also happens to be fully indexed in Google.
You are using a quote about UX/UI to make a point about an API/dev tool. I do not think they are, or should be, related.
Also, you expect it to work in a certain way. That's where you are doing it wrong.
Beyond that, I'm not sure why anyone would run a production system on a 32-bit system anymore. Sure, the failing silently part sucks, but really this seems much more like a poor deployment than an actual bug in MongoDB being the root cause.
Another problem with Mongo I never heard anyone else raise is that there are no namespaces. If I install Mongo, all the tables/collections live in the same namespace. What if I want to use it for multiple projects? How do other people solve this problem?
These are really not points to be discovered in chapter whatever of the docs.
- Download, Brief 3rd party tutorial, Production, Break, Complain, RTFM / Complain
- RTFM, Smile, Download | Move On, Staging, Production
Seems most of the issues from this article came from a lack of reading and investigating.
In general it feels like Couch actually takes storing data seriously. Append-only and whatnot. It's slower and a little bulkier than Mongo, but it does the important things right (1.0 bugs notwithstanding.)
I'd love a follow-up blog post on your experience with Couch.
Another thing I didn't realize is that, because of the memory-mapped storage (which I guess is fine performance-wise), it's hard to estimate memory usage on a machine. From what I understand, there is no way to limit the memory usage, which means the only way to cap it is to keep the size of the database below your memory. Quite important things to know, IMHO.
here's an interesting post mentioned in the comments: http://www.zopyx.com/blog/goodbye-mongodb
Isn't http://www.mongodb.org/downloads an obvious place?
The problem is that Mongodb didn't complain when he was inserting data above the limit. A data store doesn't complain when it runs out of space? It should be mentioned as the biggest problem with 32bit version.
I understand having another node or two for failover, but I reckon that with the spec of the largest offerings from AWS or Linode, most people will never need to worry about this and can manage everything on one Postgres or MySQL db. Why complicate things before you have to?