I think this is the most significant factor, by far. With Mongo it's turtles (or at least Maps/Hashes) all the way down, without a strange pseudo-English layer near the bottom that forces you to translate back and forth. For some devs that's a big deal.
For the last while I've been experimenting with bringing the same feature to PostgreSQL (http://bedquiltdb.github.io). It turns out it's very doable, but I don't have enough time to make it as featureful as it needs to be.
It's totally possible. There's a ton of libraries that do it in various ways. It's just not even remotely easy, like it should be. If you look inside those libraries you'll find a Cthulhian monstrosity of special cases interacting with special cases until the whole thing just explodes into a brain-consuming mess, because SQL was very, very clearly not written for this use case.
In another 10 or 20 years I look forward to data-based analysis that tries to determine how much of the "NoSQL" movement was because the relational data model doesn't work for all use cases, and how much of it was the entirely accidental (in the Brooks sense) problems with A: SQL, the language itself, not its capabilities and B: schema migrations with no essential reason to be as painful as they are. (And something something column stores, but I'm not sure where they fit into this story exactly.) And to be clear on tone, I really am interested. I'm pretty sure the answer won't be either extreme but I'm pretty uncertain about where in the middle we'll fall.
And really, when you think about it, it is a bit weird (we send sentences of almost-English to the DB, then get structured data back), and there's nothing wrong with that.
But many devs do value the ability to stay in hash-map land all day without needing to think about how to cross an abstraction boundary into the DB, and MongoDB actually has a pretty cool solution there.
No, it was not. It is an abomination, just like SharePoint CAML Query or any other "express-your-query-as-an-AST" approach. Sure, it's a paradise for Buzzword-Compliant Fluent Interface Enterprise Query Builder-type libraries.
> Querying SQL data involves constructing strings...
No. Case in point: LINQ. How it is implemented is irrelevant to the statement above.
> ...and counting arguments very carefully
No. Named parameters to the rescue.
1. http://mongodb.github.io/mongo-csharp-driver/2.1/reference/d... 2. https://www.progress.com/connectors/mongodb
Well, SQL is just an AST abstraction that was made to be human readable. And that's exactly the issue: SQL is excellent for humans to express queries, but really inconvenient to interface with other programs unless you express every single query ad hoc (which seems to be the way most people choose, since ORMs tend to perform poorly).
The vast, vast majority of SQL in the wild is pretty straightforward and sucks only when the schema sucks (can't blame the language for that). Most of the time people hate SQL, they do so because they are working with a crap schema (whether someone else created it or they themselves didn't think it through enough). A sane schema plus a bit of effort put into writing readable SQL can go a long, long way.
Optimizing for the "newbie" case is not a failure.
Years of SQL injection mitigation don't agree with you. Are you sure you don't just prefer it because you are familiar with it?
Because it doesn't have to be shiny and all, it just works. There are no widely used alternatives, so we can say that people are happy enough using it. See the situation with JS: CoffeeScript and all, it's quite popular. Is there an analogue (in terms of functionality and popularity) for SQL? Nope. No one bothered to do it well enough, or not enough people found it worth learning.
We don't need to reinvent something every few years just because new JS frameworks come out and others are forgotten every few months.
Granted, I think a big part of it has to do with joins being the annoying part of SQL...
Here's the repo:
https://github.com/adewes/blitzdb
The implementation is stable but I'm still working on finishing Python3 support and documentation.
Either MongoDB will be, or other databases that have learned the lessons, both good and bad, of MongoDB.
RethinkDB appears to have captured the "MongoDB done right" mindshare, and PostgreSQL has gained JSON and is gaining better replication in order to cover the same niches.
Why is that?
A quick Google search doesn't hint at problems, but rather at pretty slick marketing pages (which doesn't mean much, I know): https://aws.amazon.com/blogs/aws/mongodb-on-the-aws-cloud-ne...
And if we are talking about managed databases then it's equally dead simple to spin up a Compose/MongoLab instance in AWS.
PostgreSQL really needs a MUCH better replication/sharding/failover story... While I would use PostgreSQL in a situation where all I need/want is a single server, where multiple servers are needed for HA/failover, I'd probably just defer to MS-SQL, only because pg is so convoluted in that regard.
As to MySQL/Maria... I haven't touched it in years, and every time I have, some weird behavior has driven me nuts. I find it funny that people can love MySQL and bash on JS.
I'd also like to acknowledge ElasticSearch and Cassandra... ES is wonderful to work with for what it does best, search, and C* is a champ when you need a really good distributed table/kv store, though I think that RethinkDB is a better option today, if you don't need more than 10-20 nodes (which is a LOT).
In the future, though, there will be support for bi-directional replication in Postgres, i.e. true multi-master support.
Mindshare is irrelevant. MongoDB is killing it in the enterprise right now. They have integration with Oracle, Teradata, Hadoop and countless partnerships with other vendors. You can guarantee MongoDB will still be around in 20 years the way it is positioning itself. Can't say the same about RethinkDB (as great as it is).
> PostgreSQL has gained JSON and is gaining better replication in order to cover the same niches
The PostgreSQL replication story is pretty pathetic given how old/mature it is. And I've seen nothing to suggest that anything is really improving in this area. There are a range of add-ons, none of which are supported or built in. Basic replication is confusing, the documentation is nonexistent in parts, and good luck getting any support.
Compare it to MongoDB (or really any of the newer NoSQL databases) and it's like night and day. It takes minutes to set up a replica set, and there is plenty of documentation and official support for any issues.
MongoDB makes the operational side of replication easy, but handwaves a safe, functioning implementation: https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-read...
Every chance I get, I advocate ArangoDB. It also is Mongo Done Right. You get joins, graphs and a thoughtful future plan from ArangoDB team. To help bridge the gap I've written an ArangoDB Hadoop connector [1]. Unlike the MongoDB one, you can read and write.
I've also added better Clojure support to it: from a driver to a Ragtime migrator.
Sadly as it stands Mongo has a better Ops story than ArangoDB. Until that improves, I don't think that ArangoDB will make it into many Fortune 1000's outside of some small prototype style applications. Maybe micro-services in the enterprise will change this, but I don't think a large insurance company wants to support multiple database standards in general, and definitely not within a family.
PostgreSQL 9.4, 9.5 and 9.6 all introduce foundational changes to eventually enable replication, but none of it is really exposed to the end user. They are working on it, but they are being very conservative.
First off, almost all of the complaints would have been valid years ago. Secondly, there is so much more choice out there today if mongodb wasn't the right answer for your project, and so many NoSQL stores have had time to mature and get polished APIs and docs.
We use various data stores for different purposes across microservices, mostly ES, Couchbase, and Datomic, and "use the right tool for the job" and "do one thing and do it well" feels like the right approach to take. For most applications, a SQL DB feels like a really big hammer that is put to a lot of things that don't look like nails.
Nothing is wrong with NoSQL. Used correctly and for the right purpose, it is AMAZING.
Most applications, believe it or not, are business modelling problems, which are overwhelmingly relational. SQL was invented to solve these, so it's no surprise that it is actually the best tool for the job by far.
Absolutely. However, a database is something of an extreme example. A lot in modern software (especially on the Web) relies on the database, and migrating to a completely different one (because requirements change and it might not be the right tool anymore) is often a huge task. So you want to use something flexible enough.
Also, you want to hire people, people leave jobs, people change teams, etc. If you use some exotic, less common DB, it adds a lot of overhead. And if you take the "right tool" idea to an extreme and have a few completely different DBs flying around, your maintenance cost increases a lot.
See, SQL might not be a perfect, most elegant choice, but most often it is just good enough. A lot of people have used it, a lot of people have scaled it. If you run into an issue, often enough other people did too and blogged about it, etc. Hiring / getting help will be much easier than with $insertNoSQLDBName.
And, let's be realistic, relatively few companies have hundreds of gigabytes or terabytes of data that typical relational DBs can't handle.
My rule of thumb is that if you're in doubt, use SQL/relational store (I realize that they are different things but often used as synonyms and mean MySQL/PostgreSQL/etc).
(This article certainly seems to be appealing to this use case, cf. "counting arguments really carefully".)
A lot of recent "innovation" is mislabeled laziness.
Getting storage right is very hard. Either it's too low-level, and it's hard for applications to coordinate complex operations without corrupting data; or you end up putting a lot of features in and end up with a SQL dbms; or everything does its own storage and you have a mess.
But the lack of transactions over multiple documents (in the same shard at least) and the lack of joins over multiple collections are a big showstopper for the kind of applications I develop.
I note that solutions like YouTube's Vitess provide something similar to MongoDB's replica sets.
I also note that PostgreSQL's logical decoding provides the same functionality as MongoDB's oplog tailing.
Oh crowning irony of ironies, SQL literally means "Structured Query Language". :)
I would be happy if I got such simple tasks :)
If you want to write a truly scalable application you structure everything such that you do joins in your application layer.
http://highscalability.com/ebay-architecture
And in the case of MongoDB you avoid joins since it is a document database. You embed data instead.
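To make the two options concrete, here is a minimal sketch in plain Python of an application-layer join versus an embedded document. The sample data, field names, and both helper functions are hypothetical and purely for illustration; no database or driver API is involved.

```python
# Hypothetical sample data; names and fields are made up for illustration.
departments = {1: {"name": "engineering"}, 2: {"name": "sales"}}
employees = [
    {"name": "Ada", "department_id": 1},
    {"name": "Grace", "department_id": 1},
    {"name": "Don", "department_id": 2},
]


def join_in_app(employees, departments):
    """Application-layer join: resolve each foreign key with a dict lookup."""
    return [
        {"name": e["name"], "department": departments[e["department_id"]]["name"]}
        for e in employees
    ]


def embed(employees, departments):
    """Document style: one document per department, employees embedded in it."""
    docs = {
        dept_id: {"name": d["name"], "employees": []}
        for dept_id, d in departments.items()
    }
    for e in employees:
        docs[e["department_id"]]["employees"].append({"name": e["name"]})
    return list(docs.values())


joined = join_in_app(employees, departments)
department_docs = embed(employees, departments)
```

With the embedded shape, reading a department and its employees is a single document fetch, at the cost of having to keep the embedded copies consistent whenever an employee changes.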
cur.execute("INSERT INTO a (b,c) VALUES (%(a)s, %(b)s);",
{ 'a' : a, 'b' : b })
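For anyone who wants to try named parameters without a running server, here is a rough equivalent using Python's standard-library sqlite3 module. This is only a sketch under assumptions: sqlite3 uses the :name placeholder style rather than the %(name)s style above, and the table and values are made up.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (b TEXT, c INTEGER)")

# Parameters are matched by name, so there is nothing to count.
params = {"b": "widget", "c": 3}
conn.execute("INSERT INTO a (b, c) VALUES (:b, :c)", params)

# A hostile-looking value is bound as data and never parsed as SQL.
conn.execute(
    "INSERT INTO a (b, c) VALUES (:b, :c)",
    {"b": "x'); DROP TABLE a; --", "c": 1},
)

rows = conn.execute("SELECT b, c FROM a ORDER BY c").fetchall()
```

Both rows end up in the table and the table still exists afterwards, which is the point of binding values instead of building the string by hand.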
Also, SQL is typed, so even if you did fail to count arguments there is a good chance you'd just detect it the first time you ran it.
The article acts as if treating the DB like native structures is somehow innovative and new, but it's not. https://en.wikipedia.org/wiki/Object_database
We mostly abandoned object databases because they sucked. SQL was a huge improvement over them. SQL is a great way to organize and preserve the integrity of a lot of business data.
It's also a fantastic way to avoid repeated trips to the DB:
SELECT * FROM employees AS e
WHERE e.department_id = (SELECT id FROM departments WHERE name = 'engineering');
In Mongo, I'm pretty sure you need to first look up engineering, then look up the employees in engineering. That could be O(# employees in engineering) queries rather than 1.
Or, you could denormalize, and give yourself all sorts of future headaches maintaining data integrity.
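The single-round-trip version can be checked with a small sketch against an in-memory SQLite database. The table layout and the sample rows are assumptions for illustration; only the shape of the query comes from the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, just enough to exercise the subquery.
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'sales');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1), (3, 'Don', 2);
    """
)

# One statement, one trip: the department lookup happens inside the database.
engineers = conn.execute(
    """
    SELECT e.name FROM employees AS e
    WHERE e.department_id = (SELECT id FROM departments WHERE name = :dept)
    ORDER BY e.name
    """,
    {"dept": "engineering"},
).fetchall()
```

The department name is passed as a named parameter, so the same statement works for any department without rebuilding the string.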
The problem with that summary boils down to bad architecture. The point of document storage is storage with purpose; the intent being to make querying EASIER. This could easily be structured to be a single query. You can structure a document countless ways to represent that query, all of them would likely be different based on the purpose of the app.
Right now I'm building a data store and I don't know the app(s) that are going to be built on it.
It would be really great if computing could stop forgetting its history. Object databases failed for a reason.
Coming from PostgreSQL land, I would never have thought you could have such great replication with automatic failover. I've had literally 100% uptime for the past year.
And that's on commodity servers (one of them being in a room in my apartment, the other two in a proper datacenter) going through the usual upgrades, downtime, reboots, going from mongo 2 to mongo 3 and such.
Speaking of which, the migration from mongo2 to mongo3 was another pleasant surprise: they've made it backwards compatible. So I could do the upgrade on the servers, one by one, checking everything was ok and after that I could focus on updating the drivers and rewriting the deprecated queries, no need to have everything ready at once.
The accessible oplog was another gem that fit my project really well. Gone was the need to poll the database, I could just "watch" the oplog. That, coupled with long polling on the browser side meant I'd have very little chatter between the db/server/web client when idle. Websockets would have been nice, but adoption wasn't high enough that I'd be comfortable going forward with it.
And all this considering MongoDB was my first NoSQL experience.
I agree it doesn't fit every project, but when it does, it's a really nice experience.
But I don't understand that part:
> The accessible oplog was another gem that fit my project really well. Gone was the need to poll the database, I could just "watch" the oplog.
Coming from PostgreSQL, you could do the same using LISTEN/NOTIFY?
I have to admit I was not aware of this feature.
However, from the docs[1]:
> Commonly, the channel name is the same as the name of some table in the database, and the notify event essentially means, "I changed this table, take a look at it to see what's new".
From what I understand, you just know that something has changed; the actual change is not included in the event, so you need at least another query to see what changed.
Did I understand correctly?
In MongoDB you get the operation (insert, update, delete), the document and another few details right in the event.
[1] http://www.postgresql.org/docs/9.4/static/sql-notify.html
I don't know that its story is so great. EASY, sure, but what good is easily replicating bad data?
"insert into foo (col1,col2,col3) values ($a, $b, $c)"
and it creates the safe prepared statement with ? placeholders. At compile time. Type-checked against the database to make sure your program types match your column types.
Author nearly lost me here with this logic. Placing marketing ahead of quality in something that is supposed to store a very valuable asset (data) is near insanity.
I get the mindset of "break fast", "release often", etc. in terms of customer-facing features, but in something that is supposed to be a core part of your foundation, stability is of utmost importance. Otherwise nothing else works, and you lose customers, business, and opportunities because you can't look them up later.
It's not "brilliant marketing", it's just marketing.
I'm no MySQL fan when things like PostgreSQL are an option, but it's probably more sane than some other currently popular choices.
For a solution today, MarkLogic is a transactional distributed document database. Cross-document and cross-partition transactions have been a key tenet of the architecture from the beginning (like, 2002 beginning). Take a look at https://developer.marklogic.com/blog/how-marklogic-supports-... for details.
Full disclosure: I’m a Product Manager at MarkLogic.
There are a few equivalents for common SQL DBs (see LinkedIn's Databus for Oracle and MySQL), but in general, getting access to the write log is really hard. Even though it's sitting there!
It would be wonderful if there were some kind of established API or library that would let you parse the MySQL write log without doing hideous, fragile operations that change from version to version. Sure, change the format, but at least version and document it!
reference: https://www.digitalocean.com/community/tutorials/how-to-set-...