* didn't read the manual
* poor schema
* didn't maintain the database (compactions, etc.)
In this case, they hit several:
" Its volume on disk is growing 3-4 times faster than the real volume of data it store;"
They should be doing compactions and are not. Using PostgreSQL does not avoid administration; it simply changes the administration to be done.
"it eats up all the memory without the possibility to limit this"
That's the idea -- that memory isn't actually used though; it's just memory mapping the file. It will swap out for something else that needs the space unless you are actively using all the data, in which case you really are using all your memory. Which is why you should put it on its own server...
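To make the mapped-versus-used distinction concrete, here is a minimal Python sketch. It is not MongoDB's code; the file size, the "needle" marker and the function names are all made up for illustration. It maps an entire file into the address space but only ever touches a few bytes at the end, which is the general mechanism a memory-mapped storage engine relies on.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, offset, length):
    """Map the whole file, but read only the requested byte range."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            mapped_size = len(m)
            data = m[offset:offset + length]
    return mapped_size, data


def demo():
    """Build a hypothetical 8 MiB file with a marker at the end and read it."""
    padding = 8 * 1024 * 1024
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * padding)
        tmp.write(b"needle")
        path = tmp.name
    try:
        return read_slice_via_mmap(path, padding, 6)
    finally:
        os.remove(path)
```

The whole file counts toward the process's mapped size, yet only the pages actually read have to be resident, and the OS is free to evict them again under memory pressure.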
"it begins to slow down the application because of frequent disk access"
"Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."
You should be running Mongo on a server by itself. At the very least, if you're having disk contention issues, don't run it on the same server as your other database.
I'm not sure you always need to read the manual for everything, but for your production database, it's probably worth it.
If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one. Clarity on what is and is not an appropriate use should be prominent.
So, what is this proportion?
Hey, that sounds a lot like the logic of Java haters!
Kidding aside, I'm afraid I'm not sure your logic is convincing, but that's for another debate.
The real world dictates that this happens more often than not. You know why I like Postgres? When I don't read the manual, create a crappy schema, and forget to maintain the database it STILL seems to work okay.
Your comment has made me quite curious to know what people using mature databases of the time were saying about Postgres 19 years ago, when it was roughly the same age Mongo is today.
https://jira.mongodb.org/browse/SERVER-11763
It looks like compaction is an offline process. That really puts the user between a rock and a hard place.
Of course, if you aren't replicating your business's production database, you have a whole world of problems.
If everyone uses Mongo incorrectly, the problem is not Mongo. It is like the person crying out how everyone in the world is crazy.
As far as I can tell, a lot of people assumed it worked like a SQL database. It doesn't, which disappointed them. I'll even admit that some of the original defaults like the write concerns didn't really make sense as defaults. But that was all in the introductory documentation. Major subsystems like databases deserve at least a skim of the documentation if not a full read; if not up front then at least before putting them into production.
You're right. If people read the awesome mongodb docs before using it, they'd figure out that mongodb's ideal, good-for-performance schema has limitations that don't fit with a lot of projects. Of course this may have changed since mongodb evolves pretty quickly.
MongoDB and Redis on the same box? Two data stores that need working set / all of the data to reside in RAM for performance? That is a recipe bound for failure.
Everyone seems to learn about physical separation the hard way.
This also drives the amount of administrative overhead needed.
Mongo is very bad at managing used memory. In fact it doesn't actually manage memory since it just mmaps its database file.
It also touches disk much more often than would be reasonable, especially for how much memory it uses.
It's a terrible database and it is perfectly legitimate to be annoyed at it being this terrible.
1) MongoDB (and various other NoSQL solutions) are schemaless and thus have to store document fields along with the values for each document. This alone usually results in roughly twice as much actual disk space being used compared to an RDBMS.
2) MongoDB preallocates fairly large chunks of disk for its mmap based storage (2GB per database file by default). This means there will be up to 2GB * N, where N is the number of logical databases, in "wasted" (more accurately, unused) space. This can be addressed somewhat through the --smallfiles option.
3) The biggest issue, which I actually consider a design flaw, is the ineffective reuse of disk space previously occupied by deleted or, more commonly, moved documents. MongoDB reserves a bit of padding for each document but since a lot of documents can grow over time these documents will be moved around on disk, leaving holes in the data files. These holes are not adequately re-used and a compaction is required to make that space available again. Compaction is NOT a background process at the moment and blocks writes. The "usePowerOf2Sizes" option will help with this issue at the expense of always using a power of 2 size in bytes per document.
The above are factual reasons why MongoDB uses a lot of disk space. It's certainly a relatively young database and some issues do need to be addressed but this whole polarizing "it's terrible booo!" nonsense has to stop. Inform yourself, choose the tech appropriate for your project and post mortem afterwards.
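Two of the points above (per-document field names and power-of-2 record sizes) can be roughly sketched in Python. This is an illustration under loud assumptions: it measures JSON text rather than BSON, and the rounding helper ignores whatever minimum and maximum sizes the real "usePowerOf2Sizes" allocator applies. Both function names are hypothetical.

```python
import json


def field_name_overhead(docs):
    """Return (total_bytes, bytes_spent_on_keys) for JSON-encoded documents.

    JSON is only a stand-in here; the real on-disk format is BSON, so the
    numbers are indicative rather than exact.
    """
    total = sum(len(json.dumps(d).encode("utf-8")) for d in docs)
    keys = sum(len(json.dumps(k).encode("utf-8")) for d in docs for k in d)
    return total, keys


def power_of_two_allocation(record_size):
    """Round a record size in bytes up to the next power of two."""
    size = 1
    while size < record_size:
        size *= 2
    return size
```

For example, a 1025-byte document would get a 2048-byte slot under this simplified rounding, which is the space-for-reuse trade-off described in point 3.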
Small note on the mmap thing; a lot of people consider the mmap based storage engine a big issue (I tend to agree). Tokutek offers what seems to be a better storage engine but does lag behind a bit on releases. I'm not affiliated with them but if you're interested you can check out http://www.tokutek.com/products/tokumx-for-mongodb/
I'd love to see MongoDB give up and become a PostgreSQL consultancy.
Everybody I talk to in the field has the exact same Mongo story: "We love JSON! We use JSON everywhere! We just wanted a DB with native JSON support. We didn't look at the implementation details. We only looked at their marketing. Now we wake up at 3am to fix it every night and lose data every day. Somebody help us. We love JSON."
Client A: Read JSON.
Client B: Read JSON.
Client A: Append new comment to json document.
Client B: Append new comment to json document.
Client A: Save JSON
Client B: Save JSON
A's comment will get deleted. My understanding is that MongoDB does have a way to append a record within a document, but Postgres does not. I am in no way advocating for MongoDB (I dislike it). I am just saying that I understand that MongoDB has much more sophisticated update capabilities than Postgres.
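The interleaving above can be reproduced with a toy in-memory store. This is a sketch in Python, not a real driver: `DocumentStore` and its methods are hypothetical, and `append` is only loosely analogous to an in-place update such as MongoDB's `$push`.

```python
import copy


class DocumentStore:
    """Toy in-memory store; a hypothetical stand-in for a real database."""

    def __init__(self):
        self._docs = {}

    def save(self, key, doc):
        # Whole-document write: replaces whatever was stored before.
        self._docs[key] = copy.deepcopy(doc)

    def read(self, key):
        # Each client gets its own private copy of the document.
        return copy.deepcopy(self._docs[key])

    def append(self, key, field, value):
        # In-place append applied by the store itself.
        self._docs[key][field].append(value)


def lost_update(store):
    """Both clients read, modify their copy, then save the whole document."""
    a = store.read("post")
    b = store.read("post")
    a["comments"].append("from A")
    b["comments"].append("from B")
    store.save("post", a)
    store.save("post", b)  # overwrites A's save
    return store.read("post")["comments"]


def in_place_append(store):
    """Both clients ask the store to append, so neither overwrites the other."""
    store.append("post", "comments", "from A")
    store.append("post", "comments", "from B")
    return store.read("post")["comments"]
```

Starting from `{"comments": []}`, `lost_update` leaves only B's comment, while `in_place_append` keeps both.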
Yes. Actually that's why I said that I'm nearly sure.
As a side note: We may also need some rumors on being "web scale" (Actually I don't even know much about the events/comments/whatever which led to that famous video but I still find it funny)
The hardest part is re-training all the devs to stop thinking like Mongo devs (i.e. "I must make five queries and join the info in code") and let the DB do the heavy lifting it was designed to do.
Were they ever actually separate layers? I thought that PostgreSQL was a rename of Postgres that happened shortly (one-two versions) after they swapped query languages from the Ingres-derived QUEL to SQL.
With that being said, we are using it to store our JSON geo track data, most everything else is in a mysql database. As a result we haven't run into limitations around the storage/query model that some other people might be experiencing.
Additionally, we have some serious DB servers so haven't felt the pain of performance when exceeding working memory. 192gb of ram with 8 RAID10 512gb SSDs probably masks performance issues that other people are feeling.
Final note: I'll probably be walking away from mongo, due to the natural evolution of our stack. We'll store high fidelity track data as gzipped flat files of JSON, and a reduced track inside of postgis.
tl;dr - using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.
In regards to actual iops, not sure what this thing can peak at off the top of my head, but we'll easily be doing 100 queries a second this year, with a considerable portion of those queries pulling out ~1mb documents.
Playing it conservative, so I am moving towards gzipping those large documents (never need to access anything but the full data, > 90% of accesses are directly served to clients that can handle inflating the data). For now they will stay in mongo, but I am building out an evaluation of using a flat file structure and just letting nginx pass them out.
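As a rough idea of what gzipping whole documents looks like, here is a minimal Python sketch using only the standard library. The track layout (`points`, `lat`, `lon`, `hr`) is invented for illustration and is not our actual schema.

```python
import gzip
import json


def pack_track(track):
    """Serialize a track to JSON and return (raw_bytes, gzipped_bytes)."""
    raw = json.dumps(track).encode("utf-8")
    return raw, gzip.compress(raw)


def unpack_track(blob):
    """Inverse of pack_track: inflate and parse back into Python objects."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Since the full document is always read as a unit, the compressed bytes can be stored and served as-is, and only clients that need the contents have to inflate them.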
Additionally our mysql db sees many more queries than mongo, so the overbuilt hardware is a bit less overbuilt when taking that into consideration :)
postgis isn't a good fit for the data we store in full fidelity, since it's not just geo data but also sensor data (heartrate, cadence, power in watts, temperature etc). However I'll be storing a point reduced version of the full track in postgis, so I can move to using actual intersection queries for matching tracks, instead of the current brute force approach (check every point in every track sharing a bounding box) that works now. All bets are out the window though with 2-4x the traffic and data we currently have, using that brute force approach.
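For anyone wondering what the brute force approach amounts to, here is a minimal Python sketch. The matching rule (any two points within a small tolerance) and the tolerance value are assumptions made for illustration, not the production logic; tracks are plain lists of (lat, lon) tuples.

```python
def bounding_box(track):
    """Return (min_lat, min_lon, max_lat, max_lon) for a list of (lat, lon)."""
    lats = [p[0] for p in track]
    lons = [p[1] for p in track]
    return min(lats), min(lons), max(lats), max(lons)


def boxes_overlap(a, b):
    """True if two bounding boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tracks_match(track_a, track_b, tolerance=0.0005):
    """Cheap box test first, then compare every point pair (O(n * m))."""
    if not boxes_overlap(bounding_box(track_a), bounding_box(track_b)):
        return False
    for lat_a, lon_a in track_a:
        for lat_b, lon_b in track_b:
            if abs(lat_a - lat_b) <= tolerance and abs(lon_a - lon_b) <= tolerance:
                return True
    return False
```

The nested loop is why this stops scaling: the cost grows with the product of the point counts for every pair of tracks whose boxes overlap.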
I already run another beefy postgis setup (192gb ram, though spinning disks not SSDs) for serving OSM maps, and eventually OSM routing hence the ram.
How often did you update your data then? In my current project I am seeing locking issues in my way soon...
Our actual track data isn't updated frequently. Mostly it serves as an archive for a user, and is only seen by 1-2 other people. Most people use our service to store all their activities, which for the most part are really boring. They are interested in aggregate metrics like "I've ridden 200 miles this month".
A smaller portion of our data is from planning a route using google maps, which has much more modest storage requirements, since it's optimized data (one point every mile if it's a straight line) instead of 1hz logging from a GPS unit. This stuff is edited, but I'd say only 10% of planned routes are ever modified, so actual updates on the track data are small.
This is our use case as well and MongoDB has been fine. We had some initial pain as we learned the product but it's great for this use case. Currently sitting around 1TB of data.
I would hope so.
on the flip side, it implements database level locking, uses more disk/RAM than it probably should, and can start to give you headaches if you try to do a lot of writes at once.
edit: to give you a real world example, we use mariadb for storing everything persistently. however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results. to get around that, we use mongo as a document store and use its SQL like querying to generate the paged search results. this lets us sort/filter on the data without having to do everything in SQL.
This use case should be possible to solve with the JSON type in PostgreSQL. The indexing in PostgreSQL is just as advanced in 9.3 and will be better than MongoDB in 9.4 if a couple of patches land.
Does this mean you're using MongoDB as a kind of query cache? Was there a compelling reason to prefer it to other common caches? Or even building an ETL/DW into your existing database infrastructure?
Having used it on what is supposed to be its perfect use case, I think it's a terrible product. Use anything else you like the look of.
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...
As I recall, automatic sharding was on that list, and pg doesn't attempt to tackle that afaik.
You can do that with Postgres trivially, and even automatically with postgres_fdw and writeable views.
...may be possible, but almost always requires domain-specific concurrency-level understanding in your datastore, and is almost always harder to work with than strong consistency.
Saying that transactions are 'rarely' needed boggles my mind. Working inside transactions (where feasible, which is in the large majority of situations) vastly simplifies data storage.
It is still cumbersome to use, hard to shard, even harder to cluster and is incredibly complex to manage compared to databases like Cassandra.
I would not consider this good etiquette. If you fork your project (especially without discussing the intention first), adding a bug to the original project isn't a very nice thing to do.
An official pull request would be nicer or, even better, don't bother the original project, but just announce your fork over other channels.
Even better would be to at least discuss the issue with the original project - maybe they agree and you can work together.
This is a rather bizarre interpretation of nice behavior: Make a very cool modification to a project, but don't even bother to tell the original maintainers/authors?
Github Issues is a perfectly reasonable place for this. Maybe the mailing list would be better, but, shrug. Issues != Bugs, by the way. There's a reason it's called Issues. And it's basically the only way to have a discussion on github about anything whether it's an issue or not.
Also, some maintainers get mad if you send a pull request without doing an issue first, so there's no right way.
> We suggest to put errbit on PG. For those who want to try - the code here: https://github.com/Undev/errbit/tree/pg-upstream
The problem here is that the bad English grammar could have given the wrong impression. Maybe he is just saying:
"hey guys, you should consider migrating to postgresql. here's some code you can check out that has worked for us."
Rather than:
"hey guys, screw Errbit/MongoDB, use our fork!"
@realmyst Will you put up a Pull Request?
It sounds like MongoDB has no future indeed:
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
realmyst commented 19 minutes ago
@21croissants yes, I will.
As soon as your project starts to solidify, the main benefit of MongoDB is gone.
It still lives in some of my personal projects (e.g. <100mb of data, because even flat files can't mess that up).
I've seen people using Redis for their MVPs, which is hardly necessary to serve 100 or 1000 or 10000 users. When you have a hammer at hand, everything looks like a nail.
"Mongodb" already nearly exists as a single column type, 9.4 will complete it.
And if you think MongoDB is only popular because it is a JSON store then it shows just how little you know about the database landscape and about how developers actually use databases.
It's a drop in replacement so it will work with current drivers. (if you have a running mongo cluster however expect quite some work if you want to migrate)
(I have no affiliation with TokuTek whatsoever except that I use their product)
Are they saying that it has a high constant overhead to the data, or are they saying the storage grows in a super-linear fashion?
Even better: The application I'm using Errbit the most for is already running in front of a nicely replicated and immensely powerful postgres install.
Being able to put the Errbit data there is amazing.
This is some of the best news I've read today :-)
Well duh, Mongo was designed to live on its own server as it tries to claim all of the free memory available. Putting it on the same server with Redis makes no sense.
The case that caused you sleepless nights does not apply to 99% of projects out there.