Ask HN: What are the biggest databases you've worked with?

35 points · barelyusable · 9y ago · 16 comments

Originally thinking about SQL databases built on PostgreSQL/MySQL, but I'd be interested in anything. Wondering about the number of queries/transactions per second, and how you handled that scale.

16 comments · 9 top-level

kevrone · 9y ago · 6 in thread

At Timehop we currently work with a single instance AWS Aurora (MySQL-ish) database with over 40TB of data (plus a read-only replica on a smaller instance). Some stats: 1.5MB/sec receive throughput, 10MB/sec transmit throughput, commit latency around 3-4ms (with very occasional spikes to 10-20ms), select & commit counts are about 300/s, and select latency hovers around 35ms (we do about a dozen unions per query though).

All in all it's the easiest relational database I've ever worked with in terms of stability, speed, and scalability. I know this sounds like an ad for Aurora, but I just really like it.

dirtyaura · 9y ago

I'm curious, how does the replication work if the replica instance is smaller (I assume smaller in disk space)? Is it automatically removing some of the data from the replica based on a heuristic rule?

kevrone · 9y ago

Smaller instances are smaller in memory/compute power only. Storage is charged separately and the implementation details are unknown to me.

kbyatnal · 9y ago

Bit of an aside, but why haven't you guys listened to your users yet? Based on all the negative feedback about the recent updates, what are you guys doing?

kevrone · 9y ago

Those decisions are above my pay grade mate ;)

WrtCdEvrydy · 9y ago

What's the backing DB? Did you use MariaDB, or did you end up using Postgres?

kevrone · 9y ago

Aurora is its own database type. The interface is MySQL-compatible, but it's not a perfect match.

tupshin · 9y ago · 1 in thread

Having worked with Cassandra for many years, I have worked on:

* 1000+ node clusters

* Petabyte scale data

* 10s of millions of reads and writes per second

Given my preface, the "how" is scale out on top of Cassandra, of course. Not SQL, and hard to do if you have a highly relational model, but many stories of success at those kinds of scale.

dserban · 9y ago

Hey, thanks for sharing.

As a Cassandra devops / data modeler myself, I would be fascinated to read more details about your scaling challenges. Do you guys have an engineering blog I could read?

spthorn60 · 9y ago

MLB's Statcast collects 7TB/game, or 17 petabytes of raw data annually. http://fortune.com/2015/09/04/mlb-statcast-data/
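
For anyone wanting to sanity-check those two numbers against each other, here is a rough back-of-the-envelope sketch. It assumes a regular season of 30 teams playing 162 games each (so 2,430 games) and decimal units (1 PB = 1000 TB); neither assumption comes from the comment itself, and postseason games are ignored.

```python
# Back-of-the-envelope check of the Statcast figures quoted above.
# Assumptions (not from the comment): 30 teams, 162 games per team,
# two teams per game, decimal units (1 PB = 1000 TB), no postseason.
TEAMS = 30
GAMES_PER_TEAM = 162
TB_PER_GAME = 7      # figure quoted in the comment
TB_PER_PB = 1000


def annual_petabytes(teams=TEAMS, games_per_team=GAMES_PER_TEAM,
                     tb_per_game=TB_PER_GAME):
    """Return the estimated raw data volume per regular season, in PB."""
    games = teams * games_per_team // 2  # each game involves two teams
    return games * tb_per_game / TB_PER_PB


if __name__ == "__main__":
    print(f"{annual_petabytes():.2f} PB per regular season")
```

Under those assumptions the per-game and annual figures line up to within rounding.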

thinkMOAR · 9y ago

Would be nice if people answering include:

- hot vs cold data ratio of the total size
- read vs write data ratio
- if read/writes are split
- how partitioning, if used, is done
- total machine(s) resources disk/ram
- average (read)query response time
- how machine/node failure is handled

CyberFonic · 9y ago

My biggest installation was an accounting system for a multi-national corporation. 30 Oracle instances running on a 256 core Sun cluster, 192 GB RAM, 40 TB EMC SAN. Typical enterprise system overkill. One of the big 6 consulting firms designed it and deployed PeopleSoft on the completed system. I was just the lowly engineer who configured the hardware, the SAN and the Oracle instances. As for Rolls Royce cars, the performance was "adequate".

abalashov · 9y ago

I've handled ~1 TB DBs in Postgres with about 2000 read queries/sec. Technically these were stored function invocations, so wrapped a considerably larger number of queries inside.

This didn't seem to be a problem. It was the simultaneous write operations that created real limits, banging on the disks/disk controller like that.

ohstopitu · 9y ago

I had worked with ~2 TB of data in CouchDB (a few thousand endpoint calls / sec) for my capstone project at University and I thought I had experience, but reading these comments, I realized how much less experience I really have.

avitzurel · 9y ago

12TB MongoDB spread across 9 shards (2 replicas per shard).

4TB MySQL with some tables in the 400GB range.

MongoDB handled about 13K ops/sec at peak times with around 5-8K of these being writes.

MySQL was probably around 2-3K ops/sec.

Clownshoesms · 9y ago

2GB, but it was a Caché database, in a hospital. Probably the worst job of my life.
