Amazon Prime Day event resulted in an incremental 163 petabytes of EBS storage capacity allocated – generating a peak of 15.35 trillion requests and 764 petabytes of data transfer per day.
The main thing that strikes me is how (seemingly) inefficient everything is. What do they possibly need this amount of data for in selling stuff? Are they taking high-def video of every customer as they browse for something to buy? I get that it's a huge company and this is (I guess) their busiest time, but how can they need so much storage. Ditto for much of the other stuff.

As we hit these levels we asked them: how many trades are we even doing on this system? The answer was something on the order of... 50. Granted it was a bond system and the notionals are huge, but there's just no reason to store 20 GB per trade.
These are the kinds of decisions that get made when one team is responsible for message generation and the other is responsible for the storage, lol.
We then had to work backwards with them to unwind a lot of the INFO level chatty messaging between what you'd now call "microservices" and reduce the volume by 90+%.
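The fix was mostly just raising log levels at the service boundaries. A minimal sketch of the idea in Python (the logger name and messages are illustrative, not from the actual system):

```python
import logging

class CountingHandler(logging.Handler):
    """Counts records that survive the logger's level filter."""
    def __init__(self):
        super().__init__()
        self.count = 0
    def emit(self, record):
        self.count += 1

# Hypothetical inter-service logger; the name is made up for illustration.
svc = logging.getLogger("trade.messaging")
svc.setLevel(logging.WARNING)  # raised from INFO to drop the chatty traffic
handler = CountingHandler()
svc.addHandler(handler)

for _ in range(9):
    svc.info("heartbeat between services")  # suppressed: was the bulk of volume
svc.warning("downstream retry exhausted")   # still recorded

# 10 messages emitted, 1 stored: a 90% reduction in log volume
```

Same principle, regardless of the logging framework: the level filter belongs as close to the producer as possible, so the suppressed messages never hit storage at all.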
After looking at screen after screen of no-name garbage on Prime Day, I gave up. I suspect that there are tons of people like me. In other words, we only contributed to the numerator, not the denominator.
Also, it specifically says "incremental capacity allocated", not necessarily used. Keep in mind that every EC2 instance launched also means new EBS storage is allocated. The article also estimates that 50 million EC2 instances were used for Prime Day. If you assume that half of these were newly created to support the surge of Prime Day, 25 million instances using up 160 PB of storage is only 6 gigabytes per instance, which definitely seems in the realm of possibility.
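The arithmetic checks out; here it is spelled out (the instance counts are the thread's assumptions, not reported figures):

```python
# Back-of-the-envelope check of the numbers above.
incremental_ebs_pb = 163                # "incremental capacity allocated"
instances_total = 50_000_000            # article's Prime Day EC2 estimate
instances_new = instances_total // 2    # assume half were newly launched

# PB -> GB (decimal units, as storage vendors count them)
gb_per_instance = incremental_ebs_pb * 1_000_000 / instances_new
# roughly 6.5 GB of allocated (not necessarily used) EBS per new instance
```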
Although, that does bring up the question of why AWS doesn't have a way to share a single read-only volume across multiple ec2 instances in the same availability zone. In many workloads there isn't any need to write to disk.
Well it's not amazing if your margins are tiny, as they are in many industries (such as retail). Plus this was almost certainly architected by some of the foremost AWS experts in the world. It's verrrry easy to spend vastly more than was strictly necessary in AWS.
I don't mean to be too negative though, it was a really interesting article. Pretty wild to think about spending $100m on infrastructure over two days and still making a bunch of profit.
The listing data is almost static and should mostly fit in RAM; the hot set probably does. Amazon apparently has ~350M listings, so a 24 TB RAM server could give ~68 kB/listing, and probably only a small fraction is hot. Since you'll need multiple servers anyway, you could shard on products and definitely fit things in RAM. 375 million sales, even if condensed into one hour, would only be ~104k/second, so a single DB server should be able to handle the cart/checkout. Assuming ~10M page views/second, a couple of racks of servers should be able to handle it.
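For anyone who wants to check my arithmetic (the listing count and sales figures are my estimates from above):

```python
# Catalog-in-RAM estimate
listings = 350_000_000            # rough estimate of Amazon's catalog size
ram_bytes = 24 * 10**12           # one hypothetical 24 TB RAM server
per_listing = ram_bytes / listings  # ~68.5 kB of RAM available per listing

# Peak checkout rate if every Prime Day sale landed in a single hour
sales = 375_000_000
peak_rate = sales / 3600          # ~104k transactions/second, worst case
```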
The ad/tracking infrastructure surely can't account for the 1000x disparity in resource usage.
Be it Prime Day or Black Friday/Cyber Monday sales, I've seen the prices before the sale starts, and then once the sales start, it is the same price but with a slashed-out higher MSRP-type price next to it. It's not any more of a sale during the sale than it was any of the other days.
Of course Amazon is paying itself that premium so they have little incentive to care.
You can absolutely spend an arm and a leg making a system work using a RDBMS that would be simpler and cheaper using a NoSQL store. The opposite is also true.
When picking a database you should always consider the trade-offs of the different technologies and weigh those against your goals and budget.
Sometimes it's okay to spend more for a system that is just simpler to manage and use. Sometimes it's not.
I imagine there would be a ton of Lambda and the like in there too.
RIs for their RDS instances. Savings Plans for their EC2s.
1- or 3-year commit, no upfront vs. all upfront, etc.
A customer at the size of Amazon using AWS would have private pricing arrangement and an EDP.
When they realized their entire attack was just a fraction of what Amazon handled during the Holiday shopping season, they realized the futility and called it off.
Is Amazon’s general architecture for their retail site publicly described anywhere?
Do they do this? I have asked some friends who are developers at AWS, and both told me that they don't worry about, or even know, what their usage costs. But that's just anecdote; perhaps their boss knows.
I would ABSOLUTELY say that, at a minimum, every director or principal engineer needs to be familiar with costs and _should_ understand their P&L. Senior engineers and line managers probably/should have a passing familiarity or consideration. Individual random SDEs may not, as it's not their primary business function or deliverable, and someone else is ultimately responsible.
Disclosure: Principal at AWS, opinions are my own.
Hmmmm….
With AWS still growing they are constantly having to add hardware. Ahead of Prime Day, I presume they just bring forward new resources that their model otherwise says aren't needed for a few months.
Unavailability for other customers would indicate that either AWS growth has plateaued, they've hit the throughput limit of how quickly they can provision new hardware, or they just did their sums wrong.
That said, why would capacity issues be behind NDA? Anyone can hit their API and attempt to allocate a VM (or 100k of them).
You can't spin up 100k instances on a virgin account, but it's an interesting idea!