> What you might save by not paying AWS's profit margin, you will probably spend on not being able to be as efficient as they are.
This isn't how I've seen the numbers work out for the huge chunk of workloads that require mostly static instances (i.e., workloads that haven't been modernized into a serverless codebase). You are right that Amazon has an efficiency edge, but wrong that the benefit goes to the customer's bottom line rather than Amazon's.
We are nowhere near truly commoditized pricing for compute at massive scale. Even with the inefficiency of smaller datacenters, you can easily beat AWS prices.
Where did you get the impression that you have to move all of the data into the cloud for every burst request? That's a lazy strawman architecture to attack.
"Paying AWS's profit margin" assumes you can get access to their cost structure. You can't. In order to get AWS's cost structure, you need to (a) be buying servers by the truckload to get volume discounts, (b) have a scaled labor force for physically moving, racking, and installing all of this that's insured, directed, and managed to high rates of utilization (want to pay devs to rack servers?), (c) hire expensive network engineers whose cost is fully amortized across AWS's massive installed base, (d) fully amortize all the software engineering required to control all of this, etc.
So the realistic choice for a typical company is: do I (a) try to do this myself, with the time delay, risk, and cost profile of a non-specialist provider, or (b) pay Amazon, which will cost about the same as (a) but be better in every other way, EVEN THOUGH Amazon's superior cost structure lets it make a decent profit on that decision?
(b) is clearly the right choice. It costs you nothing more, Amazon gets to make a profit, and everyone is better off. Point being, you are forced to go with Amazon because it has structural advantages you don't, which give it a cost structure you can't replicate.
At least for us, AWS is twice as expensive as dedicated.
I've worked on 40-rack build-outs using Supermicro hardware, without any special pricing, that beat the cost of AWS for an equivalent number of reserved instances.
What are you doing with 40 racks of hardware? Serious question. That's A LOT of computers! I assume you mean 40 42U racks? That's something like 800-1000 computers, depending on whether you use 1U vs. 2U machines or blades, how dense the switching is, how full the racks are, etc.
I was in charge of tech ops for a billion-device-scale analytics company, and we ran 100-200 VM instances on EC2. I can't imagine needing hundreds of bare-metal instances. I even lived with the guys running Firebase. Before they got acquired by Google, I think they had maybe a dozen or two bare-metal instances at SoftLayer.
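As a rough check on the rack arithmetic, here's a minimal sketch that counts servers from rack units. The function name is hypothetical, the 2U per rack reserved for top-of-rack switching and PDUs is an assumption, and it ignores power and cooling budgets, which often stop a rack from being filled.

```python
def estimate_servers(racks: int, rack_units: int = 42, server_units: int = 1,
                     reserved_units: int = 2) -> int:
    """Rough upper bound on servers in a build-out of fully packed racks.

    reserved_units is an assumed allowance per rack for switching/PDUs;
    power and cooling limits, which often cap real racks, are ignored.
    """
    usable_units = rack_units - reserved_units       # U left for servers in one rack
    servers_per_rack = usable_units // server_units  # whole servers only
    return racks * servers_per_rack


for size in (1, 2):
    print(f"{size}U servers in 40 x 42U racks:", estimate_servers(40, 42, size))
```

With 2U machines this lands at 800; packing every slot with 1U machines roughly doubles it, which is why how full the racks are matters so much to the estimate.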
What about the salary of the person who maintains all of that? The cost of spare parts? The cost of downtime when hardware breaks?
> This isn't how I've seen the numbers work out for the huge chunk of workloads that require mostly static instances.
Any time someone says this, I have to wonder whether they really looked at the "all in" number. Did you include the salary of the person in purchasing who orders the servers? Did you include the lost engineering time dealing with dead servers (instead of just shutting them off)? Did you include the cost of spare hardware sitting around for emergencies? Did you include the cost of downtime due to broken hardware while waiting for it to be repaired or replaced?
There are so many other costs to running your own datacenter besides the servers and the space, which Amazon gets to amortize over all their customers, but you have to bear 100% on your own.
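One way to keep the "all in" comparison honest is to write every cost component down explicitly. This is a minimal sketch, not a real TCO model: the class and field names are hypothetical, every input is a monthly dollar figure you'd supply yourself, and it only covers servers, space, and the items listed above.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedMonthlyCost:
    """Hypothetical monthly cost components for self-hosting, in dollars."""
    hardware_amortized: float      # purchase price spread over the hardware's service life
    space_power_network: float     # colo space, power, and bandwidth
    purchasing_staff_share: float  # share of purchasing salaries spent ordering servers
    failure_engineering: float     # engineer hours on dead servers x loaded hourly rate
    spare_inventory: float         # carrying cost of spare parts kept for emergencies
    expected_downtime: float       # expected outage hours x business cost per hour

    def all_in(self) -> float:
        """Sum of every component, not just servers and space."""
        return (self.hardware_amortized + self.space_power_network
                + self.purchasing_staff_share + self.failure_engineering
                + self.spare_inventory + self.expected_downtime)


def self_hosting_is_cheaper(costs: SelfHostedMonthlyCost, aws_monthly: float) -> bool:
    """True if the all-in self-hosted figure is below the equivalent monthly AWS bill."""
    return costs.all_in() < aws_monthly
```

The point of the structure is that leaving a field at zero becomes a visible decision rather than an omission; the AWS figure should be all-in too (data transfer, for instance).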
Yes, but those costs may be low (or zero) for you, whereas Amazon has to architect at a much higher level than that. For example, I have researchers with data that has zero backup/DR requirements. This is tens of TB of data, but if they lost it all to a fire or a catastrophic system crash, they would just shrug, order a new storage array with the insurance money, and request new copies of the data from the research labs at other institutions that also have it. Amazon doesn't offer any storage products at that reliability level, and the ones that come even close have significant data access latencies or file transfer costs for running analysis over a significant chunk of the data.
So they buy a basic NAS, stuff it with 12 TB drives, and pay $0.19/GB for it. That's a one-time cost, not monthly, and that's at only 50% utilization. Assuming S3 Reduced Redundancy is $0.02/GB/mo (it's actually a little more, but we're being generous), they start saving money in month 10, not counting the data transfer or compute costs associated with processing that data either locally or in the cloud.
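The month-10 figure is a simple breakeven: keep adding the monthly cloud price until it exceeds the one-time NAS price. A minimal sketch using the prices quoted above; the function name is hypothetical, it ignores power, space, data transfer, compute, and the time value of money, and it uses Decimal so cent values compare exactly.

```python
from decimal import Decimal


def breakeven_month(one_time_per_gb: str, monthly_per_gb: str) -> int:
    """First month by whose end cumulative cloud storage spend exceeds a one-time purchase.

    Prices are passed as strings so Decimal represents them exactly.
    """
    one_time = Decimal(one_time_per_gb)
    monthly = Decimal(monthly_per_gb)
    if monthly <= 0:
        raise ValueError("monthly price must be positive")
    month = 1
    # Keep counting while the cloud bill so far is still at or below the NAS cost.
    while monthly * month <= one_time:
        month += 1
    return month


print(breakeven_month("0.19", "0.02"))  # 10
```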
Yes, we did.
> Did you include the salary of the person in purchasing who orders the servers?
That's a one-time cost, and a very low one.
> Did you include the lost engineering time dealing with dead servers (instead of just shutting them off)?
Of course, but AWS has higher engineering costs to make it work, so self-hosting comes out ahead in the end.
> Did you include the cost of spare hardware sitting around for emergencies?
You don't really need spare hardware. You have redundancy and get same day service from your vendors. And hardware nowadays rarely fails.
> Did you include the cost of downtime due to broken hardware while waiting for it to be repaired or replaced?
That would be zero, because a properly designed system has no downtime.
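The "no downtime" claim leans on redundancy: if any one of several replicas can carry the load, an outage needs all of them down at once. A minimal sketch of that arithmetic, assuming replica failures are independent (shared power, shared switches, or a bad firmware batch can break that assumption) and using a hypothetical 99% per-node availability; the function name is also hypothetical.

```python
def parallel_availability(node_availability: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes independent failures: the service is down only when every replica is down.
    """
    if not 0.0 <= node_availability <= 1.0 or replicas < 1:
        raise ValueError("availability must be in [0, 1] and replicas >= 1")
    return 1.0 - (1.0 - node_availability) ** replicas


for n in (1, 2, 3):
    a = parallel_availability(0.99, n)
    print(f"{n} replica(s): {a:.6f} available, ~{(1 - a) * 8760:.2f} h down per year")
```

Same-day vendor service feeds into the same math: the shorter the repair window, the higher each node's availability.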
> There are so many other costs to running your own datacenter besides the servers and the space, which Amazon gets to amortize over all their customers, but you have to bear 100% on your own.
Sure, but there are so many other costs to running on Amazon that don't exist when you self-host.
Maintaining 40 racks of hardware takes a surprisingly trivial amount of time aside from regular OS management.
The only products that actually eliminate the need for these folks are things like Lambda plus a database as a service.
I think it's only surprising to people who have been listening to the "it's awful" mythology or who believe in a hardware failure rate much higher than reality.
I think the worst I've ever seen was with spinning disks from the time of the Thailand flooding, and even those were only 10-15% AFR and only for certain models.
Absent that kind of black swan event, one can easily engineer around even disk failures, such that failed ones can just be spun down and left in place, and, of course, one can do the same with server-level redundancy.
If the labor burden were actually onerous, it would be easy enough to avoid it, but I think it's telling that such techniques are so rarely spoken of.
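Engineering around disk failures by leaving dead drives in place mostly means over-provisioning up front. A minimal sketch, assuming each disk fails independently with a constant annual probability (the AFR) and failed disks are never replaced; the function name, the 100-disk requirement, the 2% comparison AFR, and the five-year service life are hypothetical, and this covers only the expected loss, not its variance.

```python
import math


def disks_to_provision(required_disks: int, afr: float, years: int) -> int:
    """Disks to install so the *expected* number still working after `years`
    is at least `required_disks`, with failed disks spun down and left in place.
    """
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be in [0, 1)")
    survival = (1.0 - afr) ** years  # chance a single disk survives the whole period
    return math.ceil(required_disks / survival)


# A low hypothetical AFR vs. the 10-15% AFR seen on some flood-era models.
for afr in (0.02, 0.10, 0.15):
    print(f"AFR {afr:.0%}: install {disks_to_provision(100, afr, 5)} disks")
```

Under these assumptions, the cost of failures shows up as extra disks bought once rather than as recurring hands-on work.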