FWIW MS SQL Server has this problem with snapshot isolation. An inserted or updated row lives in tempdb until the DB finds time to push the rows back to the cluster (I can't recall exactly when), so the more insert/update-heavy your workload is, the more tempdb space you'll need. This also makes selects slower, since the engine has to check the row map for rows whose current version is in tempdb.
Could work well for infrequently changing tables though.
From the article:
"SQL means different things to different people. Depending on the context, it could mean transactional workloads, short reads/writes"
Really?
I thought SQL was just the query language on top of a database. Transactional safety has nothing to do with SQL; it is a database technology topic, such as whether the database is ACID compliant.
TL;DR, but from the first few lines of the article, it seems like the author is either confused about what SQL is or expecting it to do something it was never intended to do.
http://www.postgresql.org/docs/9.4/static/sql-cluster.html
Although it's somewhat soft in that tables are only clustered on demand, and not kept in rigorously clustered order after that.
Secondary indexes contain the primary key and not the heap position of the row.
That is how MySQL/InnoDB, MSSQL, and TokuDB have them.
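To make that concrete, here is a toy sketch in Python, not how any of those engines actually lay out pages. The table, column names, and `find_by_email` helper are made up for illustration. The point is only that the secondary index maps to a primary key, so a lookup goes through the clustered index a second time.

```python
# Toy model of a clustered table: the rows live inside the primary-key index.
clustered = {  # primary key -> full row (hypothetical data)
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}

# The secondary index on email stores the primary key, not a physical address.
by_email = {row["email"]: pk for pk, row in clustered.items()}


def find_by_email(email):
    """Look up a row via the secondary index, then via the clustered index."""
    pk = by_email.get(email)
    if pk is None:
        return None
    return clustered[pk]  # second lookup, keyed by primary key
```

The upside is that rows can move around inside the clustered index without every secondary index needing an update; the cost is the extra lookup by primary key.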
EDIT:
Found it at https://raw.githubusercontent.com/fiksu/partitioned/master/P... in case anyone finds it interesting.
I believe the holy-grail of sharding is range-based (bigtable, hbase, hypertable etc).
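For anyone unfamiliar with the idea, a minimal sketch of range-based routing in Python. The split points and shard names are invented, and real systems split and move ranges dynamically, which this ignores.

```python
import bisect

# Hypothetical split points: shard i holds keys >= SPLITS[i-1] and < SPLITS[i].
SPLITS = ["g", "n", "t"]
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key):
    """Return the shard whose key range contains `key`."""
    return SHARDS[bisect.bisect_right(SPLITS, key)]


def shards_for_range(lo, hi):
    """Return the shards a scan over [lo, hi] must touch."""
    first = bisect.bisect_right(SPLITS, lo)
    last = bisect.bisect_right(SPLITS, hi)
    return SHARDS[first:last + 1]
```

The attraction is that a range scan only touches the shards whose ranges overlap it, which hash-based placement cannot offer.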
You mean like Oracle has supported since version 8 or so?
Trying to break down the mumbo jumbo in database product releases is a nightmare - it took me a couple of days with SAP HANA, which I think boiled down to in-memory with row and column selection per-table.
It feels like nobody is brave enough to offer an apples-to-apples comparison in their press releases.
I think you are trying to ask if the sharding example in this post is similar to "the primary key hashmap thing that Teradata has". I think the answer is yes: Teradata does hash-based sharding, and this is how pg_shard implements the same outcome.
I think sharding at the app level is more efficient. It has helped Facebook grow to where it is without worrying about whether MySQL can handle trillions and quadrillions of rows. The main constraint there is the network topology: holding MxN connections between M webservers and N database servers.
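Roughly, hash-based sharding looks like the sketch below. This is an illustration only: the shard count, the `hash_shard` name, and the choice of SHA-1 are my assumptions, not the hash function that Teradata or pg_shard actually uses.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of shards


def hash_shard(key):
    """Map a key to a shard number using a hash that is stable across processes."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT
```

The same key always lands on the same shard and keys spread out evenly, but neighbouring keys end up scattered, so range scans have to hit every shard.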