Using standard Postgres with sharding for OLTP workloads is great, but there are better options for OLAP, especially if you’re using managed services. There is also the Citus cloud offering if you want to stay with Postgres.
This is just partitioning a log on time so you can query the most recent data and delete the old stuff.
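To make that pattern concrete, here is a minimal in-memory sketch in Python. It is not how Postgres or Citus implement partitioning; the class `TimePartitionedLog`, its methods, and the daily bucket size are all hypothetical names and choices made for illustration. It only shows why bucketing by time makes "read the recent rows" and "drop the old rows" cheap.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class TimePartitionedLog:
    """Hypothetical append-only log, bucketed into one partition per day."""

    def __init__(self):
        # Maps a partition key (a calendar date) to the rows created that day.
        self.partitions = defaultdict(list)

    def append(self, created_at, payload):
        self.partitions[created_at.date()].append((created_at, payload))

    def recent(self, now, window):
        """Return rows at or after now - window, oldest first."""
        cutoff = now - window
        rows = []
        for day in sorted(self.partitions):
            # Partition pruning: days entirely before the cutoff are skipped.
            if day >= cutoff.date():
                rows.extend(r for r in self.partitions[day] if r[0] >= cutoff)
        return rows

    def drop_older_than(self, now, retention):
        """Drop whole partitions past the retention window; return their keys."""
        cutoff_day = (now - retention).date()
        expired = [day for day in self.partitions if day < cutoff_day]
        for day in expired:
            # Deleting old data is a partition drop, not a row-by-row delete.
            del self.partitions[day]
        return sorted(expired)


log = TimePartitionedLog()
now = datetime(2018, 1, 10, 12, 0)
for days_ago in range(10):
    log.append(now - timedelta(days=days_ago), f"event-{days_ago}")

recent_rows = log.recent(now, timedelta(days=2))
dropped = log.drop_older_than(now, timedelta(days=7))
```

Here `recent_rows` only touches the last three daily buckets, and `dropped` lists the two days that fell outside the seven-day retention window.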
It doesn't even really seem to me that you necessarily want to partition on time, since your load distribution is going to be terrible.
Edit: to add a little. There is a thing called a temporal database that I feel is a little more general in usage, in that it is more about facts at specific points in time (such as your address last year), which I think is closer to what this is about.
There is even a bitemporal database that has two time dimensions (what do we think your last year's address is right now and what did we think your last year's address was yesterday - and in those you don't ever delete data that is wrong, you just update your belief about that point in time) and they are really interesting to work with. Those would seem much more similar to this.
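A rough Python sketch of the bitemporal idea, for anyone who has not worked with one. This is a deliberately simplified model and not any particular database's API: the names `Belief`, `BitemporalAddressBook`, `record`, and `address_at` are hypothetical, there is no `valid_to`, and a correction is assumed to reuse the same `valid_from` as the belief it supersedes. The point is only that nothing is deleted, and every query names both time dimensions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Belief:
    valid_from: date   # when the fact starts being true in the real world
    recorded_on: date  # when we wrote this belief down
    address: str


class BitemporalAddressBook:
    """Hypothetical append-only store with two time dimensions."""

    def __init__(self):
        self.beliefs = []  # never updated in place, never deleted

    def record(self, valid_from, recorded_on, address):
        self.beliefs.append(Belief(valid_from, recorded_on, address))

    def address_at(self, valid_on, as_of):
        """What did we believe on `as_of` about the address on `valid_on`?"""
        candidates = [
            b for b in self.beliefs
            if b.valid_from <= valid_on and b.recorded_on <= as_of
        ]
        if not candidates:
            return None
        # The latest applicable fact wins; among corrections of the same fact,
        # the most recently recorded belief wins.
        best = max(candidates, key=lambda b: (b.valid_from, b.recorded_on))
        return best.address


book = BitemporalAddressBook()
# Original belief about last year's address.
book.record(date(2017, 1, 1), date(2017, 1, 5), "1 Old Street")
# Later we learn that belief was wrong; we add a correction instead of deleting.
book.record(date(2017, 1, 1), date(2018, 3, 1), "2 New Road")

believed_yesterday = book.address_at(date(2017, 6, 1), as_of=date(2018, 2, 28))
believed_today = book.address_at(date(2017, 6, 1), as_of=date(2018, 3, 2))
```

Both queries ask about the same real-world date; only the `as_of` date differs, so `believed_yesterday` is the old address and `believed_today` is the corrected one.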
I think their definition of time-series database fits the common usage I've seen everywhere: the data has a time dimension and is append-only/immutable (well, ok, you can mutate the data in a postgresql table, but nobody's forcing you to).
Given the choice between selecting a specialized time-series-only database or using a time-series pattern in your existing PostgreSQL database, PostgreSQL is often (usually?) the more pragmatic choice. That's what we do at MixRank, with time-series tables approaching 100 billion rows.
I think that's not accurate. The tables mentioned in the post are first sharded/distributed on `repo_id`. Then each shard is also partitioned on the time dimension (i.e., `created_at`). Thus, the load should be distributed in proportion to the activity for each `repo_id`.
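As a sketch of that two-level layout (in Python, and not Citus's actual hash function or routing code): `SHARD_COUNT`, `shard_for`, `partition_for`, and `route` are hypothetical names, and CRC32 with monthly partitions is just an assumption to keep the example deterministic. The shard depends only on `repo_id`, the partition only on `created_at`.

```python
import zlib
from collections import Counter
from datetime import datetime

SHARD_COUNT = 4  # hypothetical; a real cluster would pick its own shard count


def shard_for(repo_id):
    """First level: a stable hash of repo_id picks the shard."""
    return zlib.crc32(str(repo_id).encode("utf-8")) % SHARD_COUNT


def partition_for(created_at):
    """Second level: within a shard, rows are bucketed by month."""
    return created_at.strftime("%Y_%m")


def route(repo_id, created_at):
    return shard_for(repo_id), partition_for(created_at)


# The same repo always lands on the same shard, whatever the timestamp.
jan = route(42, datetime(2018, 1, 15))
feb = route(42, datetime(2018, 2, 15))

# Rows written in the same month are still spread over shards by repo_id,
# so the current time partition is not a single hot spot.
rows_per_shard = Counter(
    shard_for(repo_id) for repo_id in range(1, 101)
)
```

Here `jan` and `feb` share a shard but differ in partition, and `rows_per_shard` shows how one month of writes for 100 repos is divided among the shards.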
Postgres 10 partitioning does have a few limitations and inefficiencies that will be resolved in Postgres 11. Partitioning is the most actively developed area of Postgres.
Note that Citus shards across multiple nodes, and can then partition on disk using native partitioning, which is automated by pg_partman. TimescaleDB so far only works on a single node.
Citus can also run parallel, distributed SQL queries, perform distributed transactions, and build rollup tables in parallel, and is used in Postgres clusters with up to a petabyte of data.
Timescale seems to me to be more about the analytics side of things, focusing on ingestion speed and aggregation. Somebody correct me if I'm wrong.