People tend to have a very bad sense of what constitutes large scale. It usually maps to "larger than the largest thing I've personally seen". So they hear "use X instead of Y when operating at scale", and all of a sudden we have people implementing a distributed datastore for a few MB of data.
Having gone downward in scale over the last few years of my career, it has been eye-opening how many people tell me X won't work due to "our scale", when I can point out that I've already used X in prior jobs at a scale much larger than what we have.
Sometimes they're making $10+ million decisions off that gut feel, with literally zero data on what is actually going on.
It rarely works out well, but hey, you have to leave that opening for the competition somehow, I guess?
And I'm not talking about "why didn't they spend 6 months optimizing that one call to save $50" type stuff. I mean literally zero idea of what is going on, what the actual performance issues are, etc.
The last 4-5 years, though, seem to have made it super common again. Bubble, maybe?
Huge horde of newbs?
Maybe I’m getting crustier.
I remember it was SUPER bad before the dot-com crash, with all the fake-it-'til-you-make-it too. I even had someone claim 10 years of Java experience who couldn't write out a basic class on a whiteboard at all, and tons of folks starting out who literally couldn't write a hello world in the language they claimed experience in. And this was before decent GUI IDEs.
Cloud providers have successfully redefined the baseline performance of a server in the minds of a lot of developers. Many people don't understand just how powerful (and at the same time cheap) a single physical machine can be when all they've used is shitty overpriced AWS instances. So no wonder they have no confidence in putting a standard RDBMS on one when anything above 4GB of RAM will cost you an arm and a leg. Instead they look for "magic" workarounds, which the business often accepts: it's easier to get them to pay lots of $$$$ for running a "web-scale" DB than to pay the same amount for a Postgres instance or, God forbid, actually opt for a bare-metal server outside of the cloud.
In my career I've seen a significant amount of time and effort wasted on workarounds such as deferring very trivial tasks onto queues, or building an insanely distributed system where the proper solution would've been to throw more hardware at it (even expensive AWS instances would've been cost-effective if you count the amount of developer time spent working around the problem).
I feel like sometimes the pendulum has swung too far the other way, where people deny that there ARE people dealing with actual scale problems.
I think it is the same frustration I get when I call my ISP for tech support and they tell me to reboot my computer. I realize that they are giving advice for the average person, but it sucks having to sit through it.
I feel similar frustrations with commenters saying I am doing it wrong by not moving everything to the cloud… I work for a CDN, we would go out of business pretty quickly if we moved everything to the cloud. Oh well.
Now, when dealing with someone convinced that their single TB of data is Google scale, the harder issue is changing that belief. But at least you know where they stand.
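For a sense of proportion, here's a back-of-envelope sketch of what 1 TB means on a single ordinary machine. All throughput figures are assumed, order-of-magnitude numbers (roughly a mid-range NVMe SSD and main-memory bandwidth), not measurements of any particular box:

```python
# Back-of-envelope: how "big" is 1 TB on one machine?
TB = 1e12  # bytes

# Assumed, order-of-magnitude throughput figures:
nvme_seq_read = 3e9    # ~3 GB/s sequential read from a mid-range NVMe SSD
ram_bandwidth = 20e9   # ~20 GB/s sustained read from main memory

# Time to read the ENTIRE dataset once, end to end:
full_scan_nvme = TB / nvme_seq_read  # seconds
full_scan_ram = TB / ram_bandwidth   # seconds

print(f"Full scan from NVMe: ~{full_scan_nvme / 60:.0f} min")
print(f"Full scan from RAM:  ~{full_scan_ram / 60:.1f} min")
```

Under those assumptions, a brute-force scan of the whole terabyte from local disk finishes in minutes, and anything indexed is far faster still, which is why "our single TB" rarely justifies distributed machinery on its own.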
One gotcha here is that not all PBs are equal. My field is also one where multi-PB datastores are common. However, for the most part that data sits at rest in S3 or similar; it'll occasionally be pulled in chunks of a couple TB at most. But when you talk to people, they'll flash their "do you know how big our storage budget is?" badge at the drop of a hat, and it gets used to justify all sorts of compute patterns. Meanwhile, all they need is a large S3 footprint and a machine with a reasonable amount of RAM.