“MongoDB is dead. Long live Postgresql”

(github.com)

342 points · lest12y ago · 151 comments

95 comments · 22 top-level

functional_test12y ago· 20 in thread

Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:

* didn't read the manual

* poor schema

* didn't maintain the database (compactions, etc.)

In this case, they hit several:

" Its volume on disk is growing 3-4 times faster than the real volume of data it store;"

They should be doing compactions and are not. Using PostgreSQL does not avoid administration; it simply changes the administration to be done.

"it eats up all the memory without the possibility to limit this"

That's the idea -- that memory isn't actually used though; it's just memory mapping the file. It will swap out for something else that needs the space unless you are actively using all the data, in which case you really are using all your memory. Which is why you should put it on its own server...
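To make the memory-mapping point concrete, here is a minimal sketch using Python's standard `mmap` module. It is not MongoDB's code; the helper names `read_slice` and `demo` are made up for illustration. Mapping a file does not read it into memory up front: the OS loads pages on access and may evict them again under memory pressure.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Read a byte range through a read-only memory mapping of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def demo():
    """Write a small temporary file and read part of it back via mmap."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.close(fd)
        return read_slice(path, 2, 3)
    finally:
        os.remove(path)
```

Only the pages that are actually touched need to be resident, which is why the reported memory figure for a mapped database file can look much larger than what is really in use.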

"it begins to slow down the application because of frequent disk access"

"Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."

You should be running Mongo on a server by itself. At the very least, if you're having disk contention issues, don't run it on the same server as your other database.

I'm not sure you always need to read the manual for everything, but for your production database, it's probably worth it.

rlpb12y ago

> Seriously, another case of using Mongo incorrectly?

If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one. Clarity on what is and is not an appropriate use should be prominent.

So, what is this proportion?

derefr12y ago

Or, to be even more specific--if there's a Right Way to use a program, that Right Way should be encoded as defaults you have to override (if you know what you're doing), and automated actions you have to disable (if you know what you're doing.)

mattquiros12y ago

> If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem

Hey, that sounds a lot like the logic of Java haters!

Kidding aside, I'm afraid I'm not sure your logic is convincing, but that's for another debate.

jtchang12y ago

> * didn't read the manual
> * poor schema
> * didn't maintain the database (compactions, etc.)

The real world dictates that this happens more often than not. You know why I like Postgres? When I don't read the manual, create a crappy schema, and forget to maintain the database, it STILL seems to work okay.

randomdata12y ago

To be fair, Postgres has automatic vacuuming now, but it is a relatively new feature. Both projects seem to agree that it is not a high-priority item, though there is certainly something to be said about using a mature product, which Mongo is most certainly not.

Your comment has made me quite curious to know what people using mature databases of the time were saying about Postgres 19 years ago, when it was roughly the same age Mongo is today.

jeffdavis12y ago

> They should be doing compactions and are not.

https://jira.mongodb.org/browse/SERVER-11763

It looks like compaction is an offline process. That really puts the user between a rock and a hard place.

functional_test12y ago

In a proper production environment, you just compact each slave one at a time because you have a replica set rather than a single instance.

Of course, if you aren't replicating your business's production database, you have a whole world of problems.

dev36012y ago

In all fairness, compaction is a major pain in Mongo. I get a little worked up about this because I can't think of another database that handles compaction this poorly, but feel free to correct me if I'm wrong.

ehwizard12y ago

Have you tried turning on power of 2 allocation? In general, it makes compaction much less important. Though online compaction is definitely needed.

rdtsc12y ago

> Seriously, another case of using Mongo incorrectly?

If everyone uses Mongo incorrectly, the problem is not Mongo. It is like the person crying out how everyone in the world is crazy.

functional_test12y ago

I seem to have been able to use it correctly. In fact, I ran a cluster for years in production without any issues. I know of several other groups that have used it successfully as well.

As far as I can tell, a lot of people assumed it worked like a SQL database. It doesn't, which disappointed them. I'll even admit that some of the original defaults like the write concerns didn't really make sense as defaults. But that was all in the introductory documentation. Major subsystems like databases deserve at least a skim of the documentation if not a full read; if not up front then at least before putting them into production.

chaostheory12y ago

> Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:
> * poor schema

You're right. If people read the awesome mongodb docs before using it, they'd figure out that mongodb's ideal, good-for-performance schema has limitations that don't fit a lot of projects. Of course this may have changed since mongodb evolves pretty quickly.

bsg7512y ago

> "Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."

MongoDB and Redis on the same box? Two data stores that need working set / all of the data to reside in RAM for performance? That is a recipe bound for failure.

Everyone seems to learn about physical separation the hard way.

blablabla12312y ago

What about locking? I heard that Mongo only has locking at DB-level granularity.

jb00712y ago

MongoDB, therefore, is not a general purpose database. I recommend http://www.amisalabs.com

jlouis12y ago

For what it is worth, I would think people actually try different things in the existing setup before they decide on doing a switch like this. It is not easy to pull off at all. My guess would be that if you have way more Postgres knowledge in the house, then it is more sensible to run Postgres.

This also drives the amount of administrative overhead needed.

nailer12y ago

The current stable node driver silently throws away exceptions. Seriously, mongodb inc acknowledges it. Is this also a case of not using mongo correctly?

lucian190012y ago

Mongo's disk format is extremely wasteful; the database files are gigantic. That is a real problem and there is no way to compact this to anywhere near the size something like Postgres would have for the same data.

Mongo is very bad at managing used memory. In fact it doesn't actually manage memory since it just mmaps its database file.

It also touches disk much more often than would be reasonable, especially for how much memory it uses.

It's a terrible database and it is perfectly legitimate to be annoyed at it being this terrible.

remon12y ago

Although it is true that MongoDB uses a lot of disk compared to your average RDBMS, there are reasons for that.

1) MongoDB (and various other NoSQL solutions) are schemaless and thus have to store the field names along with the values for each document. This alone usually results in roughly twice as much actual disk space being used compared to an RDBMS.

2) MongoDB preallocates fairly large chunks of disk for its mmap-based storage (2GB per database file by default). This means there will be up to 2GB * N, where N is the number of logical databases, in "wasted" (more accurately, unused) space. This can be addressed somewhat through the --smallfiles option.

3) The biggest issue, which I actually consider a design flaw, is the ineffective reuse of disk space previously occupied by deleted or, more commonly, moved documents. MongoDB reserves a bit of padding for each document, but since a lot of documents can grow over time these documents will be moved around on disk, leaving holes in the data files. These holes are not adequately re-used and a compaction is required to make that space available again. Compaction is NOT a background process at the moment and blocks writes. The "usePowerOf2Sizes" option will help with this issue at the expense of always using a power of 2 size in bytes per document.
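The trade-off in point 3 can be sketched with a little arithmetic. The snippet below is only an illustration of "round every record up to a power of 2", not MongoDB's actual allocator (which may apply minimum sizes or other rules); the function names are hypothetical.

```python
def power_of_2_allocation(doc_bytes: int) -> int:
    """Smallest power of two >= doc_bytes (assumed allocation rule)."""
    if doc_bytes < 1:
        raise ValueError("document size must be positive")
    return 1 << (doc_bytes - 1).bit_length()


def wasted_fraction(doc_bytes: int) -> float:
    """Share of the allocated slot that the document leaves unused."""
    slot = power_of_2_allocation(doc_bytes)
    return (slot - doc_bytes) / slot
```

Under this rule a 700-byte document gets a 1024-byte slot, so it can grow to 1024 bytes without being moved, and a freed slot has a standard size that a later document can reuse. The cost is that a document just over a boundary (say 1025 bytes) wastes almost half of its 2048-byte slot.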

The above are factual reasons why MongoDB uses a lot of disk space. It's certainly a relatively young database and some issues do need to be addressed, but this whole polarizing "it's terrible booo!" nonsense has to stop. Inform yourself, choose the tech appropriate for your project and post mortem afterwards.

Small note on the mmap thing; a lot of people consider the mmap based storage engine a big issue (I tend to agree). Tokutek offers what seems to be a better storage engine but does lag behind a bit on releases. I'm not affiliated with them but if you're interested you can check out http://www.tokutek.com/products/tokumx-for-mongodb/

jeltz12y ago

And let's not forget the fact that it has a per database lock, which is a really strange choice for a document database.

egeozcan12y ago· 14 in thread

I'm nearly sure one day someone will write a MongoDB compatibility layer on top of PostgreSQL

seiji12y ago

PostgreSQL has native json support now. What else is missing? Just a protocol implementation?

I'd love to see MongoDB give up and become a PostgreSQL consultancy.

Everybody I talk to in the field has the exact same Mongo story: "We love JSON! We use JSON everywhere! We just wanted a DB with native JSON support. We didn't look at the implementation details. We only looked at their marketing. Now we wake up at 3am to fix it every night and lose data every day. Somebody help us. We love JSON."

jcampbell112y ago

It is probably not as simple as "supports json now". Imagine if HN comments were stored as a JSON document:

    Client A: Read JSON.
    Client B: Read JSON.
    Client A: Append new comment to json document.
    Client B: Append new comment to json document.
    Client A: Save JSON
    Client B: Save JSON

A's comment will be lost, because B's save overwrites the whole document. My understanding is that MongoDB does have a way to append a record within a document, but Postgres does not.

I am in no way advocating for MongoDB (I dislike it). I am just saying that I understand that MongoDB has much more sophisticated update capabilities than Postgres.
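The race described above can be reproduced with a toy in-memory store. This is a sketch under stated assumptions: `DocumentStore` and its methods are invented for illustration and do not correspond to any real driver API. The `append` method stands in for a server-side append (in MongoDB, the role played by an update operator such as `$push`).

```python
import copy


class DocumentStore:
    """Toy in-memory store holding one JSON-like document per key."""

    def __init__(self):
        self._docs = {}

    def read(self, key):
        # Return a private copy, as a client would receive over the wire.
        return copy.deepcopy(self._docs.get(key, {"comments": []}))

    def save(self, key, doc):
        # Whole-document overwrite: the last writer wins.
        self._docs[key] = copy.deepcopy(doc)

    def append(self, key, field, value):
        # Server-side append: modifies the stored document in place,
        # so it never overwrites another client's earlier append.
        doc = self._docs.setdefault(key, {"comments": []})
        doc.setdefault(field, []).append(value)


def lost_update_demo():
    store = DocumentStore()
    a = store.read("thread")          # Client A: read JSON
    b = store.read("thread")          # Client B: read JSON
    a["comments"].append("from A")    # Client A: append locally
    b["comments"].append("from B")    # Client B: append locally
    store.save("thread", a)           # Client A: save
    store.save("thread", b)           # Client B: save, overwriting A
    return store.read("thread")["comments"]


def atomic_append_demo():
    store = DocumentStore()
    store.append("thread", "comments", "from A")
    store.append("thread", "comments", "from B")
    return store.read("thread")["comments"]
```

With read-modify-write, only B's comment survives; with the server-side append, both do. A store without such an operation would need a transaction or a row lock around the read and the save to get the same guarantee.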

busterarm12y ago

I really would love it if somebody could go back in HN-Time and track the data on Mongo posts & comments. I've always been slightly skeptical about it, but it always seemed to me that there was a long love affair with it overall. Then people started voicing their frustrations and the community was divided and now it looks acrimonious for everyone.

egeozcan12y ago

> What else is missing? Just a protocol implementation?

Yes. Actually that's why I said that I'm nearly sure.

As a side note: We may also need some rumors on being "web scale" (Actually I don't even know much about the events/comments/whatever which led to that famous video but I still find it funny)

nasalgoat12y ago

Funny you mention this, we've written one as we transition from MongoDB to PostgreSQL. It was actually much easier than I expected, because MongoDB doesn't really do anything - all the JOIN logic is in code and it mostly consists of "find" and "findOne" calls with minor filtering that is easily translatable to SQL.
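As a rough idea of what such a translation layer might look like, here is a minimal sketch, not the commenter's actual code. The function `find_to_sql` is hypothetical; it handles only flat equality filters and a handful of comparison operators, and it emits `%s` placeholders in the style used by psycopg2.

```python
def find_to_sql(table, query, limit=None):
    """Translate a flat Mongo-style filter into parameterized SQL."""
    # Comparison operators supported by this sketch.
    ops = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<=", "$ne": "<>"}
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    clauses, params = [], []
    for field, cond in query.items():
        if not field.isidentifier():
            raise ValueError(f"unsafe column name: {field!r}")
        if isinstance(cond, dict):
            # {"age": {"$gte": 18}} becomes "age >= %s"
            for op, value in cond.items():
                clauses.append(f"{field} {ops[op]} %s")
                params.append(value)
        else:
            # {"name": "ann"} becomes "name = %s"
            clauses.append(f"{field} = %s")
            params.append(cond)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql, params
```

A "findOne" call would map to the same function with `limit=1`. Values are passed as parameters rather than interpolated, and table and column names are checked, since they cannot be parameterized.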

The hardest part is re-training all the devs to stop thinking like Mongo devs (i.e. "I must make five queries and join the info in code") and let the DB do the heavy lifting it was designed to do.

lucisferre12y ago

I hope so. I've become so used to Mongoid's API and I've never really liked the "invisible" properties of Active Record.

sanderjd12y ago

Yeah, it would be cool if AR came with a `field` method that could be made non-optional through configuration, and enforce that only those fields are accessible.

mrud12y ago

Not exactly what you were looking for, but you can connect from PostgreSQL to MongoDB via Foreign Data Wrappers - https://github.com/citusdata/mongo_fdw

egeozcan12y ago

I know about Foreign Data Wrappers and use them extensively. To clarify, what I want is to have a quick engine swap for some legacy apps without touching anything but config files.

jeffdavis12y ago

https://github.com/umitanuki/mongres

ddorian4312y ago

https://github.com/JerrySievert/mongolike

https://github.com/citusdata/mongo_fdw

gaius12y ago

There is already one on top of DB/2.

http://www.theregister.co.uk/2013/06/05/ibm_db2_mongodb/

csmuk12y ago

That's sensible when you consider that postgresql used to be a SQL layer over postgres. Not sure if that is the case now.

dragonwriter12y ago

> That's sensible when you consider that postgresql used to be a SQL layer over postgres.

Were they ever actually separate layers? I thought that PostgreSQL was a rename of Postgres that happened shortly (one-two versions) after they swapped query languages from the Ingres-derived QUEL to SQL.

cullenking12y ago· 11 in thread

Maybe I am just incredibly lucky, but mongodb has worked fine for ridewithgps.com - we are sitting at 670gb of data in mongo (actual DB size, indexes included) and haven't had a problem. Replica sets have been fantastic, I wish there was another DB out there that did auto-failover as cleanly/easily as mongo does. We've had a few server crashes of our primary, and aside from 1-2 seconds or so of errors as requests come in before the secondary is promoted, it's transparent.

With that being said, we are using it to store our JSON geo track data, most everything else is in a mysql database. As a result we haven't run into limitations around the storage/query model that some other people might be experiencing.

Additionally, we have some serious DB servers so haven't felt the pain of performance when exceeding working memory. 192gb of ram with 8 RAID10 512gb SSDs probably masks performance issues that other people are feeling.

Final note: I'll probably be walking away from mongo, due to the natural evolution of our stack. We'll store high fidelity track data as gzipped flat files of JSON, and a reduced track inside of postgis.

tl;dr - using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.

rbranson12y ago

How many reads do you do on that server? 192GB + 8 SSDs is a pretty serious setup. Just the disks themselves should be able to push 500K+ random IOPS.

cullenking12y ago

It's actually very much over built. I had the budget, so I built for growth. This setup should easily last us through this year, assuming 4x growth over last year. I am actually putting together another similarly spec'd machine with 384gb of ram (why not, it's cheap) because the current secondary is on 15k disks with only 24gb of ram. Still more than enough to handle the load through this last year, but probably not this coming year, at least without a bunch more ram.

In regards to actual iops, not sure what this thing can peak at off the top of my head, but we'll easily be doing 100 queries a second this year, with a considerable portion of those queries pulling out ~1mb documents.

Playing it conservative, so I am moving towards gzipping those large documents (never need to access anything but the full data, > 90% of accesses are directly served to clients that can handle inflating the data). For now they will stay in mongo, but I am building out an evaluation of using a flat file structure and just letting nginx pass them out.

cullenking12y ago

One thing to note that I left out above, I also use the same servers for mysql. Our mysql working set is something like 30gb now, so I have a decent chunk of that ram apportioned to mysql.

Additionally our mysql db sees many more queries than mongo, so the overbuilt hardware is a bit less overbuilt when taking that into consideration :)

jlouis12y ago

670 gigabytes is a puny database size. You should be able to press so much power through a system with a disk system like the one you have. I would seriously consider a Postgres setup on a data set of that size. Additionally, I would probably just store the JSON data directly inside postgresql.

cullenking12y ago

See comment below. Definitely a small database size. You hit the nail on the head. It's going to grow fast this next year, and I'd like to put off sharding as long as possible, hence gzipped storage of the data.

postgis isn't a good fit for the data we store in full fidelity, since it's not just geo data but also sensor data (heartrate, cadence, power in watts, temperature etc). However I'll be storing a point reduced version of the full track in postgis, so I can move to using actual intersection queries for matching tracks, instead of the current brute force approach (check every point in every track sharing a bounding box) that works now. All bets are out the window though with 2-4x the traffic and data we currently have, using that brute force approach.
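For readers unfamiliar with the brute force approach mentioned here, a minimal sketch follows. It is an assumption about what "matching tracks" means, not the site's actual code: tracks are lists of `(lat, lon)` tuples, and two points "match" when they are within a small tolerance in degrees. All names are hypothetical.

```python
def bounding_box(track):
    """Return (min_lat, min_lon, max_lat, max_lon) for a list of points."""
    lats = [p[0] for p in track]
    lons = [p[1] for p in track]
    return min(lats), min(lons), max(lats), max(lons)


def boxes_overlap(a, b):
    """True when two bounding boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def shared_points(track_a, track_b, tolerance=1e-4):
    """Count points of track_a lying within tolerance of some point of track_b."""
    # Cheap prefilter: skip the pair entirely if the boxes do not overlap.
    if not boxes_overlap(bounding_box(track_a), bounding_box(track_b)):
        return 0
    count = 0
    # Brute force: every point of A against every point of B.
    for lat_a, lon_a in track_a:
        if any(abs(lat_a - lat_b) <= tolerance and abs(lon_a - lon_b) <= tolerance
               for lat_b, lon_b in track_b):
            count += 1
    return count
```

The inner comparison is quadratic in the number of points per track pair, which is why it stops scaling as traffic and data grow; a spatial index with real intersection queries avoids comparing every point against every other.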

I already run another beefy postgis setup (192gb ram, though spinning disks not SSDs) for serving OSM maps, and eventually OSM routing hence the ram.

blablabla12312y ago

>data that isn't updated frequently

How often did you update your data then? In my current project I am seeing locking issues in my way soon...

cullenking12y ago

I think I peaked at 5% lock utilization this year, so I haven't seen any real issues.

Our actual track data isn't updated frequently. Mostly it serves as an archive for a user, and is only seen by 1-2 other people. Most people use our service to store all their activities, which for the most part are really boring. They are interested in aggregate metrics like "I've ridden 200 miles this month".

A smaller portion of our data is from planning a route using google maps, which has much more modest storage requirements, since it's optimized data (one point every mile if it's a straight line) instead of 1hz logging from a GPS unit. This stuff is edited, but I'd say only 10% of planned routes are ever modified, so actual updates on the track data are small.

mason5512y ago

> using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.

This is our use case as well and MongoDB has been fine. We had some initial pain as we learned the product but it's great for this use case. Currently sitting around 1TB of data.

keypusher12y ago

> 192gb of ram with 8 RAID10 512gb SSDs probably masks performance issues

I would hope so.

3pt1415912y ago

Which data centre are you hosting in? Even when I was colo'ing back at 500px that much RAM wasn't "cheap". "Fairly priced" is the way I would put it.

cullenking12y ago

We build our own machines and host them out of a carrier neutral facility in Portland, OR. $200/mo unmetered unlimited 100mbit connection and $1100 for a full rack with redundant 20 amp feeds. It would cost us something like $5k/mo to run our 9 servers on amazon.

endijs12y ago· 8 in thread

I'm no MongoDB expert, but I recently started to look into this db. Can anyone tell me (from experience, not from promo materials) for which use cases MongoDB is a good fit and for which ones it's not? It's clear that it can't fit everyone. That's why it would be good to know in advance what it's most likely to fit and what it's most likely not to fit.

mikegioia12y ago

It has the benefits and ease of use of a json document store, it allows you to do SQL style where clauses, it takes about a minute to install and start using, there are a wide range of drivers available for many languages, and it has a simple javascript map/reduce.

on the flip side, it implements database level locking, uses more disk/RAM than it probably should, and can start to give you headaches if you try to do a lot of writes at once.

edit: to give you a real world example, we use mariadb for storing everything persistently. however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results. to get around that, we use mongo as a document store and use its SQL like querying to generate the paged search results. this lets us sort/filter on the data without having to do everything in SQL.

jeltz12y ago

> to give you a real world example, we use mariadb for storing everything persistently. however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results. to get around that, we use mongo as a document store and use its SQL like querying to generate the paged search results. this lets us sort/filter on the data without having to do everything in SQL.

This use case should be possible to solve with the JSON type in PostgreSQL. The indexing in PostgreSQL is just as advanced in 9.3 and will be better than MongoDB in 9.4 if a couple of patches land.

jacques_chester12y ago

> however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results.

Does this mean you're using MongoDB as a kind of query cache? Was there a compelling reason to prefer it to other common caches? Or even building an ETL/DW into your existing database infrastructure?

lucian190012y ago

Don't.

Having used it on what is supposed to be its perfect use case, I think it's a terrible product. Use anything else you like the look of.

jwarkentin12y ago

Some things to consider:

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...

antonmaju12y ago

This is a reply to Sarah Mei's post from Ayende Rahien http://ayende.com/blog/164483/re-why-you-should-never-use-mo...

sigzero12y ago

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

endijs12y ago

This is more like how it is in theory. I was hoping for more real life stories. "We had problem X, tried MongoDB, but failed because of ...", "We had problem Z, MongoDB works better than R, because..."

memracom12y ago· 7 in thread

Let's just say that PostgreSQL answers the criticisms of relational databases that led to NoSQL. The complaints all boiled down to saying that the RDBMS forced you to do things one way and that it was cumbersome. PostgreSQL evolved and fixed the most annoying issues like JSON support and schemaless key-value store support. That's the way open source is supposed to work. Now folks are learning that throwing out the baby with the bathwater leads to more complexity than just learning how to use a relational database. The pendulum has swung back.

yid12y ago

> PostgreSQL evolved and fixed the most annoying issues like JSON support and schemaless key-value store support.

As I recall, automatic sharding was on that list, and pg doesn't attempt to tackle that afaik.

lucian190012y ago

Mongo doesn't really do that in a way you can reliably use in practice either, though. Its sharding offers a subset of operations over an inconsistent view of your dbs.

You can do that with Postgres trivially, and even automatically with postgres_fdw and writeable views.

frezik12y ago

This story has played out before. Last time, it was Object Oriented databases. What happens each time is that the traditional RDBMS's pick up a few of the features, and then we keep using them until the next contender comes along.

remon12y ago

This is not true at all. The actual realization of the past years is that strictly enforced relationality (is that a word?) and transactions are constructs that are not always, and in fact only rarely, actually needed. Eventual consistency, schemaless data modelling and so on picked up steam and for good reasons. Every technology that survives the "Oh, new toy!" stage has a place or it wouldn't still exist. It is up to developers to choose the appropriate technology for them and their projects. That isn't to say that a lot of persistence problems cannot be solved by an RDBMS, a k/v store and a document store. In that case just base your decision on other drivers (comfort level, cost, and so on).

AlisdairO12y ago

> Eventual consistency,

...may be possible, but almost always requires domain-specific concurrency-level understanding in your datastore, and is almost always harder to work with than strong consistency.

Saying that transactions are 'rarely' needed boggles my mind. Working inside transactions (where feasible, which is in the large majority of situations) vastly simplifies data storage.

threeseed12y ago

PostgreSQL doesn't answer the criticisms of relational databases.

It is still cumbersome to use, hard to shard, even harder to cluster and is incredibly complex to manage compared to databases like Cassandra.

remon12y ago

You had me until the "like Cassandra" bit ;) If there's one thing where Cassandra loses the battle with other NoSQL tech then cluster management is probably it. Also, it's a bit unfair to compare RDBMSs with Cassandra. Clustering is inherently going to be more complicated for RDBMSs. Incidentally it's actually where I feel MongoDB deserves a bit of credit.

pilif12y ago· 6 in thread

The title is a bit misleading. This is basically an announcement of a fork of Errbit that has Postgres support. Additionally, the fork was announced as an issue on errbit, with no prior discussion and not as an official pull request.

I would not consider this good etiquette. If you fork your project (especially without discussing the intention first), adding a bug to the original project isn't a very nice thing to do.

An official pull request would be nicer or, even better, don't bother the original project, but just announce your fork over other channels.

Even better would be to at least discuss the issue with the original project - maybe they agree and you can work together.

mapgrep12y ago

>even better, don't bother the original project, but just announce your fork over other channels

This is a rather bizarre interpretation of nice behavior: Make a very cool modification to a project, but don't even bother to tell the original maintainers/authors?

Github Issues is a perfectly reasonable place for this. Maybe the mailing list would be better, but, shrug. Issues != Bugs, by the way. There's a reason it's called Issues. And it's basically the only way to have a discussion on github about anything whether it's an issue or not.

Also, some maintainers get mad if you send a pull request without doing an issue first, so there's no right way.

spellboots12y ago

I disagree, I think that opening an issue on github is a good way to start a discussion about a feature. Many projects accept feature requests this way and if anyone did the same for one of my projects, this would be the way I would prefer them to handle it.

felideon12y ago

The thing is it sounds like he is just promoting his own fork he started 11 months ago, rather than "starting a discussion."

> We suggest to put errbit on PG. For those who want to try - the code here: https://github.com/Undev/errbit/tree/pg-upstream

The problem here is that the bad English grammar could have given the wrong impression. Maybe he is just saying:

"hey guys, you should consider migrating to postgresql. here's some code you can check out that has worked for us."

Rather than:

"hey guys, screw Errbit/MongoDB, use our fork!"

vdaniuk12y ago

Open source should encourage forking and easing the transmission of community from one fork to another.

ricardobeat12y ago

There is nothing wrong with announcing a fork within the original's community. Where else would you announce it?

gizzlon12y ago

If you go back and look at it now, you'll see that this is a non-issue:

    @realmyst Will you put up a Pull Request?

    It sounds like MongoDB has no future indeed:
    http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

    realmyst commented 19 minutes ago

    @21croissants yes, I will.

trekky170012y ago· 3 in thread

This just makes me wonder why they chose Mongo in the first place. It sounds like they didn't really consider their needs when initially choosing databases. Mongo has some benefits that when properly implemented far outweigh the negatives. At the same time, it's still relatively young, and doesn't have the "maturity of process" that makes older SQL engines so easy to manage/implement. Eventually, I'm sure, Mongo will solve these issues and be a great database for those who need to utilize its many virtues.

Xorlev12y ago

MongoDB is easy. I'll be the first one to spit-roast MongoDB with war stories, but the biggest benefit I keep coming back to is ease of use for a developer. It's very easy to change your data model and rapidly iterate.

As soon as your project starts to solidify, the main benefit of MongoDB is gone.

It still lives in some of my personal projects (e.g. <100mb of data, because even flat files can't mess that up).

jmngomes12y ago

Because 1) a lot of startups seem to choose "startup technology", i.e. whatever famous startups are using, just because it seems fashionable and/or they don't consider if it'll actually solve their specific problem; or 2) they're technically curious and end up using it just for fun, even if it's not a good fit for their problem.

I've seen people using Redis for their MVPs, which is hardly necessary to serve 100 or 1000 or 10000 users. When you have a hammer at hand, everything looks like a nail.

aidenn012y ago

As far as Redis goes, is there really much of anything in the space between bdb style KV stores and Redis? If you have design reasons for wanting your KV store in a separate process, why not use Redis?

mml12y ago· 2 in thread

The hstore enhancements coming in psql 9.4 will pretty much put mongo out to pasture.

"Mongodb" already nearly exists as a single column type, 9.4 will complete it.

threeseed12y ago

Right. Just like MySQL/Oracle was put out to pasture.

And if you think MongoDB is only popular because it is a JSON store then it shows just how little you know about the database landscape and about how developers actually use databases.

mml12y ago

For those of us that use all 3 of those, your statement is in large part true. I'd also hazard that Mongodb isn't actually that popular outside the HN buzz bubble.

jvvlimme12y ago· 1 in thread

If you want to use MongoDB in a project and you don't intend to rely heavily on the aggregation framework, then consider TokuMX (http://www.tokutek.com/products/tokumx-for-mongodb/) as it alleviates many of the shortcomings of MongoDB (data compression, document level locking for writes, ...) + it adds transactions.

It's a drop in replacement so it will work with current drivers. (if you have a running mongo cluster however expect quite some work if you want to migrate)

(I have no affiliation with TokuTek whatsoever except that I use their product)

stonewhite12y ago

It is basically an ops-friendly mongo fork with _obviously_ better engineering decisions. I hope mongodb will support pluggable storage engines soon.

jeffdavis12y ago· 1 in thread

"Its volume on disk is growing 3-4 times faster than the real volume of data it store[sic]"

Are they saying that it has a high constant overhead to the data, or are they saying the storage grows in a super-linear fashion?

remon12y ago

It's constant overhead with some spikiness to it. See my reply in this thread for details.

pilif12y ago

All the philosophical issues and /(No)?SQL/ discussions aside, as a heavy user of Postgres and a user of Errbit, this is very good news to me. I have not much experience with running Mongo, but I have a ton of experience with running Postgres.

Even better: The application I'm using Errbit the most for is already running in front of a nicely replicated and immensely powerful postgres install.

Being able to put the Errbit data there is amazing.

This is some of the best news I've read today :-)

kldavenport12y ago

Their use case didn't seem especially Mongo-centric; I wonder why they chose to go down that road. We used MongoDB TokuMX to improve performance: http://www.tokutek.com/resources/benchmark-results/tokumx-be...

weixiyen12y ago

> Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again.

Well duh, Mongo was designed to live on its own server as it tries to claim all of the free memory available. Putting it on the same server with Redis makes no sense.

The case that caused you sleepless nights does not apply to 99% of projects out there.

WoodenChair12y ago

"Nobody ever got fired for buying IBM." Nobody ever got fired for using PostgreSQL.

r0muald12y ago

In case you missed it, this submission is not about PostgreSQL vs MongoDB. It's about the crazy GIF parade in the comments interleaved with thumbs up emojis. You don't see such stuff often on github :)

poseid12y ago

Anyone compared MongoDB with other document stores, e.g. with https://github.com/triAGENS/ArangoDB ?

iand12y ago

Typo in title.

WalterSear12y ago

Does anyone have a recommendation for an authoritative guide to either Postgres or Mongodb? One that does more than show you where the levers are, that is.

coolrhymes12y ago

RDS now supports Postgres. It supports both hot & cold swaps. Hopefully in future it will support read replicas.

filipedeschamps12y ago

This is what I call a click bait title.

194512y ago

This title is why I've been reading Hacker News less and less.

mtolan12y ago

Is PostgreSQL web scale? I will use it if it is web scale.
