> We are absolutely talking about billions of records.
> But when you say "return excess data to the application" I'm not sure what you mean. Not doing lots of complex JOINs doesn't mean not filtering the queries at the DB at all. Nobody is pulling back a billion records at a time.
The question then becomes "Are you actually gaining anything by not using foreign keys in the database?" What are the JOINs really costing you in practice? And how denormalized is your database already, if JOINs are that costly?
If you want raw speed at billions-of-records scale, you want as flat a schema as you can get, with good indexes that actually fit in RAM.
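As a toy illustration of that trade-off (hypothetical schema, SQLite for portability; at real scale you would measure on your own engine and data), the same question can be answered through a foreign-key join or from a flat table where the joined column has been copied onto each row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    -- Normalized: two tables related through a foreign key.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    -- Denormalized: the region is copied onto each order row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, region TEXT, total REAL);
    CREATE INDEX idx_flat_region ON orders_flat(region);
""")

cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, "east" if i % 2 else "west") for i in range(100)])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, i % 100, float(i)) for i in range(1000)])
cur.executemany("INSERT INTO orders_flat VALUES (?, ?, ?)",
                [(i, "east" if (i % 100) % 2 else "west", float(i))
                 for i in range(1000)])

# Same question, two shapes: total sales for one region.
joined = cur.execute("""
    SELECT SUM(o.total) FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'east'
""").fetchone()[0]

flat = cur.execute(
    "SELECT SUM(total) FROM orders_flat WHERE region = 'east'"
).fetchone()[0]

print(joined == flat)  # same answer; the flat query needs no join at all
```

The flat query filters a single indexed column, so the working set is just that one index, which is what makes keeping it in RAM plausible at large row counts.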
By that point, though, you should be able to recognize that your use case is not the 90% (or even 95%) case, and that your specific requirements justify doing something different. That's very different from the vague "high-volume" statement you made at first.
My experience comes from helping clients migrate to data warehouses, specifically during the heyday of Hive/Hadoop/Spark years ago, and watching clients mistakenly believe they were "big data" and go down the rabbit hole of scaling out before it was actually necessary. The problem I have with the vague notion of "high-volume" is that I saw clients with 500m records believe they fit the bill, as well as clients with as few as 20m records who thought the same. In reality, neither of them did, and they wound up wasting a lot of money in pursuit of slower systems.
> Here's an example of what I'm talking about - read the first comment on [...], another venue for this same debate.
That's not an example, though; it's just some vague statements about needing to profile your query and evaluate the costs for yourself, which should be rather obvious.
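For what it's worth, "profile your query" doesn't have to stay vague. Most engines will tell you their plan directly; here's a minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (table and index names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the human-readable part is the detail column.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT total FROM orders WHERE customer_id = 42"

before = plan(q)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(q)   # now an index search on customer_id

print(before)  # e.g. ['SCAN orders']
print(after)   # e.g. ['SEARCH orders USING INDEX idx_orders_customer (customer_id=?)']
```

Postgres (`EXPLAIN ANALYZE`) and MySQL (`EXPLAIN`) give the same kind of answer with real cost and timing numbers, which is what you'd actually want before deciding JOINs are too expensive.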