The biggest issue we have with these giant DBs is that they require massive amounts of RAM. That's currently our main bottleneck.
But I agree. While our design is pretty bad in a few ways, the amount of data we're able to serve from these big DBs is impressive. We have something like 6 dedicated servers for a company with around 300 apps, and only a handful of those apps hit dedicated DBs.
Were I to redesign the system, I'd have more tiny dedicated DBs per app to avoid a lot of the noisy-neighbor/scaling problems we've had. But at the same time, it's impressive how far this design has gotten us, and it appears to have a lot more legs left.
SQL Server data is stored as a B-tree structure, so a 10GB -> 100GB growth works out to roughly a 2x query slowdown at worst (lookup cost grows with log n), assuming good indexes are in place.
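To put rough numbers on the log n point, here's a back-of-envelope sketch. The row size and page fanout here are illustrative assumptions, not measurements from our system:

    import math

    def btree_depth(table_bytes, row_bytes=100, fanout=500):
        """Approximate B-tree depth as ceil(log_fanout(row_count)).
        row_bytes and fanout are assumed values for illustration."""
        rows = table_bytes / row_bytes
        return math.ceil(math.log(rows, fanout))

    for gb in (10, 100, 1000):
        print(f"{gb:>5} GB -> ~{btree_depth(gb * 10**9)} levels")
    # 10 GB -> ~3 levels, 100 GB -> ~4 levels, 1000 GB -> ~4 levels

The point being that each 10x of data adds at most one extra page touch per lookup, so growth hurts far less than linearly.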
Filtered indexes can work pretty well for improving query performance. But ultimately we do have some tables that get archived when we can, or partitioned when we can't. SQL Server's native partitioning is rough if the query patterns are all over the map.
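For anyone who hasn't used them: a filtered index is just an index with a WHERE clause, so it only covers the hot slice of rows. A minimal sketch of what that looks like, driven from Python via pyodbc; the server, table, and column names here are all made up:

    import pyodbc

    # Hypothetical server and database; adjust for your environment.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=db01.example.internal;DATABASE=AppDb;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cur = conn.cursor()
    # Index only the open orders (the slice the app actually reads),
    # keeping the index small enough to stay hot in RAM.
    cur.execute("""
        CREATE NONCLUSTERED INDEX IX_Orders_Open
        ON dbo.Orders (CustomerId, CreatedAt)
        WHERE Status = 'Open';
    """)
    conn.commit()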
The other thing that has helped is a bit of application data shuffling: moving heavy hitters onto new database servers that aren't as highly utilized.
We are currently in the process of getting read-only replicas (Always On) set up and configured in our applications. That will allow for a lot more load distribution.
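Worth noting for anyone heading down the same path: read-only routing only kicks in when the client explicitly asks for it, via ApplicationIntent=ReadOnly in the connection string. A sketch of the client side (the listener name here is made up):

    import pyodbc

    # Connect through the availability group listener (hypothetical name)
    # with read-only intent; if read-only routing is configured on the AG,
    # SQL Server redirects this connection to a readable secondary.
    read_conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=tcp:ag-listener.example.internal,1433;"
        "DATABASE=AppDb;"
        "ApplicationIntent=ReadOnly;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cur = read_conn.cursor()
    cur.execute("SELECT COUNT(*) FROM dbo.Orders;")  # served by a replica
    print(cur.fetchone()[0])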
EVENTUALLY, yes, even read query performance would degrade, but the insert/update load on an index is typically the first limiter.
The trick is to have the right indexes (including the interior structure of the storage data structure) so that queries jump quickly to the relevant data and ignore the rest, like opening a book at the right page because the page number is known. Sometimes a close guess is good enough.
In addition, small indexes and tree interior nodes should stay hot in RAM between queries.
When the indexes are too large to fit in RAM, they get queried from storage as well, and at a low level it's analogous to the system finding the right page in an "index book", using an "index index" to get that page number, as many levels deep as you need. The number of levels is generally small.
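To make that concrete, here's a toy version of the multi-level descent: each level holds just the first key of every node one level down, and a lookup narrows its window level by level. Pure illustration, nothing like a real engine's on-disk layout:

    from bisect import bisect_right

    FANOUT = 4  # tiny on purpose; real nodes hold hundreds of keys

    def build_levels(keys):
        """Stack sparser and sparser 'indexes of the index' on top of the
        sorted keys until the top level fits in a single node."""
        levels = [keys]
        while len(levels[0]) > FANOUT:
            levels.insert(0, levels[0][::FANOUT])  # first key of each child
        return levels

    def lookup(levels, key):
        lo, hi = 0, FANOUT
        for depth, level in enumerate(levels):
            # binary-search only the current node's window
            i = bisect_right(level, key, lo, min(hi, len(level))) - 1
            if i < lo:
                return None                        # below the smallest key
            if depth == len(levels) - 1:
                return i if level[i] == key else None
            lo, hi = i * FANOUT, (i + 1) * FANOUT  # descend into that child

    keys = list(range(0, 2_000_000, 2))  # a million sorted keys
    levels = build_levels(keys)
    print(len(levels))                   # 10 levels even at fanout 4
    print(lookup(levels, 123456))        # 61728, i.e. keys[61728] == 123456

With a realistic fanout in the hundreds, the same million keys need only about 3 levels, which is why the top of the tree stays in RAM so easily.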
For example, the following is something I worked on recently. It's a custom database (written by me), not Postgres, so the performance is higher, but the table scaling principles are similar. It has 200GB tables at the moment, and when it's warmed up, querying a single value takes just one 4k read from disk, a single large sector, because the tree index fits comfortably in RAM.
It runs at approximately 1.1 million random-access queries/second from a single SSD on my machine, which is just a $110/month x86 server. The CPU has to work quite hard to keep up because the data is compressed, albeit with special query-friendly compression.
If there were very little RAM, so nothing could be kept in it, the speed would drop by a factor of about 5, to 0.2 million queries/second. That shows you don't need a lot of RAM; it just helps.
Keeping the RAM but increasing the table size to roughly 10TB, the speed would drop by about half, to 0.5 million queries/second. In principle, with the same storage algorithms, a table size of roughly 1,000TB (1PB) would drop it to 0.3 million queries/second, and roughly 50,000TB (50PB) would drop it to 0.2 million. (Of course, those sizes won't fit on a single SSD; a real system of that size would have more parallel components and could have higher query performance.) You can grow to very large tables without much slowdown.
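If you want to sanity-check those projections, the shape of the estimate fits in a few lines. This is a toy model with an assumed effective fanout for the uncached index levels, not the real system's arithmetic:

    import math

    BASELINE_QPS = 1_100_000  # one disk read per lookup, from the figure above
    CACHED_BYTES = 200e9      # table size at which the index still fits in RAM
    FANOUT       = 20         # assumed effective fanout of the uncached levels

    def est_queries_per_sec(table_bytes):
        # One data read, plus (on average) one extra read per index level
        # that has outgrown RAM; the level count grows with log(table size).
        extra = max(0.0, math.log(table_bytes / CACHED_BYTES, FANOUT))
        return BASELINE_QPS / (1 + extra)

    for tb in (0.2, 10, 1_000, 50_000):
        qps = est_queries_per_sec(tb * 1e12)
        print(f"{tb:>8,.1f} TB -> ~{qps / 1e6:.1f}M queries/s")
    # 0.2 TB -> ~1.1M, 10 TB -> ~0.5M, 1,000 TB -> ~0.3M, 50,000 TB -> ~0.2M

The log in the denominator is the whole story: multiplying the table size by 50 only adds a level or two of disk reads per lookup.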
This is a bit of a simplification, but probably less than you'd think. It's definitely true in spirit; the assumptions I'm making are pretty reasonable. Lots of people don't quite get their heads around all this at first, but it's easier to understand with experience. It doesn't help that most pictures of B-tree indexes are very misleading: it's closer to a bush than to a tree, really.
Proper indexing is key, naturally, along with metrics to find the bottlenecks.
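On the metrics side, SQL Server's DMVs expose most of what you need out of the box. One way to find the heaviest readers (a sketch; assumes pyodbc, a made-up server name, and VIEW SERVER STATE permission):

    import pyodbc

    # Hypothetical connection string; adjust for your environment.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=db01.example.internal;DATABASE=AppDb;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cur = conn.cursor()
    # Top cached statements by total logical reads; a missing or wrong
    # index usually shows up near the top of this list.
    cur.execute("""
        SELECT TOP (10)
               qs.total_logical_reads,
               qs.execution_count,
               SUBSTRING(st.text, 1, 200) AS query_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_logical_reads DESC;
    """)
    for reads, execs, text in cur.fetchall():
        print(reads, execs, text)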
Of course, if your application rarely makes use of older rows, it could still make sense to offload them to some kind of colder, cheaper storage.