Offers no math.
At retail, 625 16TB drives cost $400,000. That is about 2x the MONTHLY retail S3 pricing. Further, as we all know, AWS bandwidth pricing is absolutely bonkers (1).
I think your conclusion that S3 is "pretty good" needs a lot more math to support.
(1) https://twitter.com/eastdakota/status/1371252709836263425
My response was in the context of someone who didn't do any of that.
Also, my tongue-in-cheek response to you is: the price will be offset by the SREs who were babysitting your AWS setup and whom you'll no longer need. (More seriously, I don't think finding quality sysadmins who enjoy this stuff is particularly harder than finding quality people for any other tech position.)
But it's not about price: It's about control, and it's about the expertise you gain from running all of that. If you have 10PB of data, you should have someone in-house who knows how to work with 10PB of data at a low level, and the best way to get that is to employ people at all levels to make that work. You gain significant advantage from having the direct performance data and the expertise of having techs whose 9-5 is replacing disks.
I did and you ignored all of it -_-
* To store 10+ PB of data.
* You need 15 PB of storage (running at 66% capacity).
* You need 30 PB of raw disks (twice for redundancy).
>>> At retail, 625 16TB drives is $400000.
That's only 10 PB of disks. That's about one third of the actual need.
Please triple your number and we will start talking. That's about 2000 disks and well above a million dollars.
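For anyone who wants to check the numbers, here is a rough sketch of that sizing math. It assumes decimal units (1 PB = 1000 TB), treats "66% capacity" as exactly two thirds so the result matches the round 15 PB figure, and takes the per-drive price implied by the $400000 quote for 625 drives. The function name `drives_needed` is made up for illustration.

```python
from fractions import Fraction
import math

def drives_needed(data_tb, fill_ratio, copies, drive_tb):
    """Return (usable_tb, raw_tb, drive_count) for a storage build-out."""
    # Capacity to provision so the disks sit at fill_ratio when holding data_tb.
    usable_tb = Fraction(data_tb) / fill_ratio
    # Raw disk needed once every byte is stored `copies` times.
    raw_tb = usable_tb * copies
    # Whole drives only, so round up.
    return usable_tb, raw_tb, math.ceil(raw_tb / drive_tb)

# Assumption: "66% capacity" is read as exactly 2/3.
usable_tb, raw_tb, count = drives_needed(10_000, Fraction(2, 3), 2, 16)

# Per-drive price implied by "625 16TB drives is $400000".
price_per_drive = 400_000 // 625
total_cost = count * price_per_drive

print(usable_tb, raw_tb, count, total_cost)
```

Under those assumptions this comes out to 1875 drives and $1.2 million in disks alone, before servers, racks, power or spares.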
You can't just call a supplier and get a thousand 16 TB disks (or even a hundred). They don't have that in stock now. The lead time might be 6 months to get a few hundred. They might not have 16 TB disks for sale at all; the closest might be 12 or 14 TB.
Handling a large amount of hardware is a logistics problem, not a cost problem.
How much is the real estate cost of 625 drives and the associated machinery to run them?
At a guess, AWS has an operating margin of about 30%, so you can approximate their cost of hardware, bandwidth, and other fixed costs as 70% of their sticker price. As a startup, can you actually get this price to be lower? I don't think you can, unless your operation is very small and can be run out of a home or small office.
> At a guess
The comments in this discussion that try to provide actual numbers show a fairly lopsided argument against S3. The comments that are advocating for S3 aren't as detailed.
You can look at this at the macro level, as one comment did, and see that one 1.2PB RackmountPro 4U server is $11K. Yes, of course you still need space and power. But at least this gives us actual numbers to play with as a base (e.g. buying 10 of these is less than what you'll spend on S3 in a month).
At a micro level: you can spend $650 on a 16TB hard drive, or $650 on 16TB for 2 months of S3. Now, S3 is battle-tested, has redundancy, has power, has a CPU, has a network card (but not bandwidth), and is managed - unquestionably HUGE wins. But the hard drive (and other equipment) comes with a 3-5 year warranty. Now, the difference between $650 for the hard drive and $12000 for S3 over 3 years won't let you get the power, rent the racks, hire the staff, and invest in learning Ceph. But the difference between $400K and $5 million will.
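A back-of-the-envelope version of the per-drive comparison, for what it's worth. The S3 rate here is not an AWS list price: it is simply what the "$650 on 16TB for 2 months" figure implies, applied flat for 36 months with no tiering, requests or bandwidth. The constant and function names are just for illustration.

```python
DRIVE_PRICE = 650              # USD for one 16 TB drive (figure from the comment)
S3_PER_16TB_MONTH = 650 / 2    # USD per month, implied by "$650 ... for 2 months of S3"

def s3_cost(months, units=1):
    """Flat-rate S3 storage cost for `units` blocks of 16 TB over `months`."""
    return S3_PER_16TB_MONTH * months * units

three_year_s3 = s3_cost(36)                 # storage only, no bandwidth
budget_gap = three_year_s3 - DRIVE_PRICE    # what is left over for everything else

print(three_year_s3, budget_gap)
```

That gives roughly $11.7K of S3 spend per 16 TB over three years, in line with the ~$12000 figure, and about $11K per drive left over to cover redundancy, power, racks and people.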
An empty 4U server with 96+ bays looks like it will set you back ~$7k minimum. At $500 per drive (I have no idea what volume discounts are like), filling it with drives would run ~$50k. You'd still need RAM. And (as you noted) space and power.
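A quick sketch of that estimate, using only the figures above ($7k chassis, 96 bays, $500 per drive) and ignoring RAM, space, power and any volume discount. `filled_chassis_cost` is a hypothetical helper name.

```python
def filled_chassis_cost(chassis_usd, bays, drive_usd):
    """Cost of one chassis with every bay populated (drives + enclosure only)."""
    return chassis_usd + bays * drive_usd

drives_only = 96 * 500                           # the "~$50k" for drives
total = filled_chassis_cost(7_000, 96, 500)      # enclosure plus drives
ratio_to_quote = total / 11_000                  # versus the $11K figure quoted earlier

print(drives_only, total, ratio_to_quote)
```

With these inputs a populated box lands around $55k, about five times the $11K number.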
I have no idea how the math ends up working out, but a 1PB appliance in working order is nowhere near as cheap as $11k.
protip: $11k is the cost of the empty enclosure. disks are sold separately.