If there were an open-source RDBMS that allowed instant/lazy schema migrations, gave direct map-reduce access to its data, could automatically partition and shard, and somehow magically had performant joins, you would never have seen these document databases become so popular. Document databases have no mathematical grounding analogous to the relational model; the reason is that they are a hackneyed application of the relational model that allows implementation to bleed even further into the model than current RDBMSes do. For example, the fact that a developer now has to choose between "embedding" documents and joining them together application-side makes absolutely zero difference in terms of the true data model. The data model is unchanged in either scenario; it is just being materialized differently on disk for implementation reasons.
Edit: To be clear, the way things should work is you should model your data as a normalized, clean model. Then, you should be able to annotate the data or provide alternative views that "embed" documents and so on to reduce cross-table joins. This is an optimization layer and the actual materialization of these views should be maintained by the database. It should be part of the query optimizer, and your joins should simply speed up once they can access the "embedded versions". The fact that you have to do this all manually and store the data yourself, and basically take on the role of the query optimizer as well, shows that we've traded C for assembly code in the db world with these document databases.
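To make the "same model, different materialization" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `authors`/`posts` schema and the `embedded_authors` helper are made up for illustration; the helper stands in for what the database itself would ideally maintain behind the scenes.

```python
import sqlite3

# Hypothetical normalized schema: one row per author, one row per post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'Hello'), (11, 1, 'Joins'), (12, 2, 'Shards');
""")

def embedded_authors(conn):
    """Derive one 'document' per author with that author's posts embedded.

    This is only a different materialization of the normalized rows;
    no information is added or removed.
    """
    docs = {}
    rows = conn.execute("""
        SELECT a.id, a.name, p.id, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for author_id, name, post_id, title in rows:
        doc = docs.setdefault(
            author_id, {"_id": author_id, "name": name, "posts": []}
        )
        if post_id is not None:  # authors without posts keep an empty list
            doc["posts"].append({"id": post_id, "title": title})
    return list(docs.values())

docs = embedded_authors(conn)
```

The embedded documents are fully derivable from the normalized tables, which is why deciding when to build and refresh them looks like a job for an optimizer rather than for the application developer.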
It's truly a sad state of affairs that people's horrible experiences with whatever particular RDBMS they've used are causing them to react by using one of these document databases. Now, they've got a whole series of new problems. There is no free lunch until someone addresses some of the pain points like schema evolution in a way that doesn't sacrifice everything else.
You allude to this in the last paragraph of your comment. Having a horrible experience horizontally scaling, for example, is reason enough to look into document databases, largely because the current implementations of document databases allow for ease of partitioning in a way that the current implementations of RDBMSes don't.
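A rough sketch of why self-contained documents partition easily: if everything needed to answer a read lives in one document, routing by a hash of the document ID touches exactly one shard. The names here (`N_SHARDS`, `shard_for`, `put`, `get`) are hypothetical, and real systems may use range partitioning or consistent hashing rather than this simple modulo scheme.

```python
import hashlib

N_SHARDS = 4  # hypothetical cluster size

def shard_for(doc_id, n_shards=N_SHARDS):
    """Map a document ID to a shard index with a stable hash."""
    digest = hashlib.sha256(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# In-memory dicts stand in for separate storage nodes.
shards = [dict() for _ in range(N_SHARDS)]

def put(doc):
    """Store the whole document on the single shard its ID hashes to."""
    shards[shard_for(doc["_id"])][doc["_id"]] = doc

def get(doc_id):
    """Read a document by ID; only one shard is consulted."""
    return shards[shard_for(doc_id)].get(doc_id)

put({"_id": "user:1", "name": "alice", "posts": [{"title": "Hello"}]})
put({"_id": "user:2", "name": "bob", "posts": []})
```

A join across normalized rows has no such guarantee, since the rows being joined may hash to different shards.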
The author doesn't imply that document datastores are free lunches. They're simply alternatives and he goes on to say that document datastores are better for some problem domains than others.
Also, you oversimplify the difficulties in implementing a highly-performant document datastore by implying that data should be normalized and that storing the data on disk is an optimization layer. This comment comes off as being written by someone who is ignorant of how document disk storage is implemented and why it is so fast. Doing what you suggest would very likely result in a datastore that was drastically slower than the currently implemented document datastores or even key/value datastores for that matter (MongoDB, CouchDB, Cassandra, etc).
Will we someday perhaps have the end-all-be-all of RDBMSes that merges the feature set of current RDBMSes and the performance of document/key-value datastores? Maybe, but until then, they're alternatives to each other that generally succeed in providing solutions for the problems for which they were designed:
* RDBMS - consistency, transactions, predictable schema, etc.
* Document/Key-Value - horizontal scalability, raw performance, flexible schema, etc.
Different solutions for different problem domains.
What good solution do you have for these problems? Are you suggesting that Oracle does this well as a commercial solution? Is anybody at all working on a next-generation RDBMS that makes schema migration truly easy?
For the sharding/partitioning question, is it even possible to make a relational data scheme that stays available and consistent in the case of a network partition?
The CAP theorem, I think, often gets rolled up into these "NoSQL" discussions erroneously. One can still model data relationally but materialize it in a way that reduces ACID compliance to survive in the face of network partitions. Arguably this is a red herring, though. In practice, the "network partition" problem seems to me to be really worth worrying about only if you are Google or Amazon and are running cross-data-center data storage. For the majority of the world this is not something that needs to be worried about. (See: VoltDB's choice on the matter.)
Edit: By the way, I'm not saying any of this is easy. It's hard, really hard. (I couldn't do it.) But the point here is that it's easy to get caught up and think these 'new' document databases are solving these problems. They're not. They're putting you back down to the assembly code level and forcing you to do these things yourself. The whole point of the RDBMS is to provide an abstraction layer. They're removing this abstraction layer and heralding it as a step forward. It is, in a bizarre sense, similar to how it's a "step forward" for you to be able to access the registers on your CPU, but it's surely a step backwards in many ways as well.
This is essentially just a rephrasing of the relational model (modulo a relatively simple translation which adds surrogate keys in order to split up >2-ary relations which aren't normalisable any further, into binary relations)
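For anyone unsure what that translation looks like, here is a small sketch. The `supplies` relation and the helper names `to_binary` and `from_binary` are invented for illustration: each row of a ternary relation gets a surrogate key, the relation is split into one binary relation per attribute, and joining on the key recovers the original.

```python
from itertools import count

def to_binary(relation, attrs):
    """Split an n-ary relation into one binary relation per attribute,
    each pairing a surrogate key with that attribute's value."""
    ids = count(1)
    binary = {a: set() for a in attrs}
    for row in sorted(relation):  # sorted only to make key assignment stable
        key = next(ids)
        for attr, value in zip(attrs, row):
            binary[attr].add((key, value))
    return binary

def from_binary(binary, attrs):
    """Rejoin the binary relations on the surrogate key."""
    lookup = {a: dict(binary[a]) for a in attrs}
    keys = lookup[attrs[0]].keys()
    return {tuple(lookup[a][k] for a in attrs) for k in keys}

attrs = ("supplier", "part", "project")
supplies = {
    ("acme", "bolt", "bridge"),
    ("acme", "nut", "tower"),
    ("zenith", "bolt", "tower"),
}
binary = to_binary(supplies, attrs)
roundtrip = from_binary(binary, attrs)
```

The round trip loses nothing, which is the sense in which the two representations describe the same model.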
The key point which I think people need to realise is that document databases / key-value stores are essentially lower-level tools than relational databases.
Don't expect a free lunch from choosing lower-level tools. Choose them when you need the extra flexibility / performance / scalability, but be prepared for an increased risk of shooting yourself in the foot, and prepared to implement lots of lower-level pieces of the puzzle, which a high-level tool will handle correctly (if not as speedily) for you.
Rather like choosing C over (say) Python. Although I wouldn't push the analogy too far.
It is possible to do banking this way (CouchDB is probably better for banking than relational systems). Here's an example: