SQL, Scaling, and What's Unique About Postgres

(citusdata.com)

253 points · spathak · 11y ago · 23 comments

ddorian43 · 11y ago · 7 in thread

What would be good is a storage engine API that could support clustered tables (where the whole row is stored together with the primary key). That would probably help compression, since similar values are sorted together, and probably make range queries faster thanks to sequential disk access (fractal indexes, LSM trees, etc.).

sitharus · 11y ago

That doesn't sit well with Postgres' MVCC system, since between updates and VACUUMs you'd either need a secondary heap or the heap would need row-adjacent space reserved for the updates.

FWIW MS SQL Server has this problem with snapshot isolation. An inserted or updated row lives in TempDB until the DB finds time to push the rows back to the cluster (I can't recall exactly when), so the more insert/update heavy your workload is the more tempdb space you'll need. This also makes selects slower since it has to check the row map for ones that are current in tempdb.

Could work well for infrequently changing tables though.

doug1001 · 11y ago

But what if the snapshot is read-only? Can the master rely on this config to avoid having to check the row map?

emcrazyone · 11y ago

@ddorian43 I agree but I'm also wondering if perhaps I missed something?

From the article:

"SQL means different things to different people. Depending on the context, it could mean transactional workloads, short reads/writes"

Really?

I thought SQL was just the query language on top of a database. Transactional safety has nothing to do with SQL; it's a database technology topic, such as whether the database is ACID compliant.

TL;DR, but from the first few lines of the article, it seems like the author is confusing SQL with something else or expecting it to do something it was never intended to do.

zaphar · 11y ago

The author is using SQL in the colloquial sense where it stands in for ACID compliant Relational Database. He's not confused.

twic · 11y ago

I was under the impression that the primary key was stored as part of the row. Or do you mean the primary key index? If so, then there is already clustering:

http://www.postgresql.org/docs/9.4/static/sql-cluster.html

Although it's somewhat soft in that tables are only clustered on demand, and not kept in rigorously clustered order after that.

ddorian43 · 11y ago

I mean the full row is stored in the primary-index.

Secondary-indexes contain the 'primary-key' and not the 'heap-position' of the row.

That's how MySQL InnoDB, MSSQL, and TokuDB do it.
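
To make the distinction concrete, here is a minimal Python sketch of the two layouts described above. It is only an illustration under loud assumptions: the class names, the `email` secondary key, and the dict and list stand-ins for B-trees are all hypothetical, and none of this is how Postgres or InnoDB is actually implemented.

```python
from bisect import bisect_left, insort


class HeapTable:
    """Heap-organized layout: rows sit in an unordered heap and every
    index, primary or secondary, stores a heap position."""

    def __init__(self):
        self.heap = []         # rows in insertion order
        self.pk_index = {}     # primary key -> heap position
        self.email_index = {}  # secondary key -> heap position

    def insert(self, pk, email, payload):
        self.heap.append((pk, email, payload))
        pos = len(self.heap) - 1
        self.pk_index[pk] = pos
        self.email_index[email] = pos

    def by_email(self, email):
        # One index lookup, then a direct jump into the heap.
        return self.heap[self.email_index[email]]


class ClusteredTable:
    """Clustered layout: the full row lives in the primary index, kept in
    primary-key order, and secondary indexes store the primary key."""

    def __init__(self):
        self.keys = []         # primary keys, kept sorted
        self.rows = {}         # primary key -> full row (stand-in for leaf pages)
        self.email_index = {}  # secondary key -> primary key

    def insert(self, pk, email, payload):
        insort(self.keys, pk)
        self.rows[pk] = (pk, email, payload)
        self.email_index[email] = pk

    def by_email(self, email):
        # Two steps: secondary index gives the primary key, which is then
        # looked up in the primary index.
        return self.rows[self.email_index[email]]

    def range(self, lo, hi):
        # Rows for lo <= pk <= hi come back in key order, which is what
        # makes range scans sequential in a real clustered index.
        start = bisect_left(self.keys, lo)
        result = []
        for key in self.keys[start:]:
            if key > hi:
                break
            result.append(self.rows[key])
        return result
```

The trade-off the sketch tries to show: range scans on the primary key walk rows in order, while a secondary-index lookup pays for a second lookup through the primary key.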

duaneb · 11y ago

This is the approach taken by the distributed SQL engine F1. It uses a hierarchical table approach where root rows are guaranteed to be on the same shard.

hcarvalhoalves · 11y ago · 3 in thread

I've read a recipe that used Postgres table inheritance to achieve sharding in a way that's mostly transparent to the client. The blog post read like a conversation between two people. Unfortunately I'm not finding the link anymore.

EDIT:

Found it at https://raw.githubusercontent.com/fiksu/partitioned/master/P... in case anyone finds it interesting.

ddorian43 · 11y ago

How do you decide what to use as the sharding key? Something like 'user_id'? What if a user becomes too big? Something like id=uuid? Then you lose the semi-free user_id index.

I believe the holy-grail of sharding is range-based (bigtable, hbase, hypertable etc).
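
As a rough illustration of why range-based sharding helps with the "user becomes too big" problem, here is a minimal Python sketch of a range router that can split a hot range in place. The `RangeRouter` class and its method names are hypothetical and not taken from Bigtable, HBase, or Hypertable.

```python
from bisect import bisect_right


class RangeRouter:
    """Routes keys to shards by sorted split points.

    With split points [s0, s1, ...], shard 0 owns keys < s0, shard 1 owns
    s0 <= key < s1, and the last shard owns keys >= the last split point.
    """

    def __init__(self, split_points, shards):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        return self.shards[bisect_right(self.split_points, key)]

    def split(self, at_key, new_shard):
        """Split the range containing at_key; keys >= at_key within that
        range move to new_shard, the rest stay where they were."""
        if at_key in self.split_points:
            raise ValueError("range already starts at this key")
        i = bisect_right(self.split_points, at_key)
        self.split_points.insert(i, at_key)
        self.shards.insert(i + 1, new_shard)


router = RangeRouter([100, 200], ["shard-a", "shard-b", "shard-c"])
router.split(150, "shard-d")  # shard-b got too big, so cut it in half
```

Only the range being split changes owners; every other key keeps its shard, which is not the case when a hash-modulo scheme changes its shard count.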

needusername · 11y ago

> I believe the holy-grail of sharding is range-based

You mean like Oracle has supported since version 8 or so?

Brian-Puccio · 11y ago

That's super neat, thanks for posting.

slagfart · 11y ago · 2 in thread

Is this the same as the primary key hashmap thing that Teradata has been doing for many moons? Seems so. If not, why?

Trying to break down the mumbo jumbo in database product releases is a nightmare - it took me a couple of days with SAP HANA, which I think boiled down to in-memory with row and column selection per-table.

It feels like nobody is brave enough to offer an apples-to-apples comparison in their press releases.

samstave · 11y ago

Can you give a quick brain dump on HANA? Good, bad, and ugly.

nl · 11y ago

AFAIK Teradata doesn't allow you to intercept SQL after parsing like Postgres does.

I think you are trying to ask if the sharding example in this post is similar to "the primary key hashmap thing that Teradata has". I think the answer is yes: Teradata does hash-based sharding, and this is how pg_shard implements the same outcome.
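
For anyone unfamiliar with the idea, hash-based sharding can be sketched in a few lines of Python. This is a generic illustration, not the actual algorithm used by Teradata or pg_shard; the `shard_for` function and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a distribution-key value to a shard number in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash so every client routes the same key the same way.
    # Python's built-in hash() is salted per process for strings, so it is
    # not suitable here.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All rows for one customer land on the same shard.
first = shard_for("customer-42", 16)
second = shard_for("customer-42", 16)
```

Hashing spreads keys evenly without any knowledge of the data, at the cost of turning range queries on the distribution key into scans across every shard.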

EGreg · 11y ago

I don't understand why the indexes aren't just as fast. Is it basically because smaller indexes are loaded into memory and the loader can reason about what it needs better? Because obviously the prefix used to load the index can also be IN the actual searching of the index.

I think sharding at the app level is more efficient. It has helped Facebook grow to where it is without worrying about whether MySQL can handle trillions and quadrillions of rows. The main constraint there is the network topology: holding MxN connections between M webservers and N database servers.
