There are a number of resource types that, simply by existing, will accrue costs. A lot of them, actually. On AWS that includes things like running EC2 instances, EBS volumes, RDS databases and backups, DynamoDB tables, data in S3 buckets, and more. The question is what should happen to these resources upon hitting a billing ceiling?
Should EC2 instances be terminated (which deletes all data on them), DynamoDB tables deleted, S3 data erased, RDS databases deleted? If that were the behavior, it would be an extremely dangerous feature to enable, and could lead to catastrophically bad customer experiences. This is a nonstarter for any serious user.
Conversely, if you expect those resources to continue to exist and continue operating, then that's basically expecting the cloud provider to pay your bill. The provider will then have to recoup those costs from other customers somehow, and so this option sets poor incentives and isn't fair to others. If you expect your account to remain open the following month, you'd have to settle the bill, and we're back to square one.
AWS gives people tools to tackle this problem, such as billing alerts. These can notify you over SMS, email, or programmatically when you hit an "$X this month" billing threshold, and then you can decide what to do. Since these events can be processed programmatically, it's possible to build a system that will automatically take whatever action you'd like AWS to take, such as shutting things down or deleting resources.
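To make that concrete, here's a minimal sketch of what wiring up such an alarm looks like: CloudWatch exposes an `EstimatedCharges` metric in the `AWS/Billing` namespace that you can alarm on and route to an SNS topic. The topic ARN and the $50 threshold below are placeholder assumptions, and the actual API call is left commented out since it needs real credentials:

```python
# Sketch: a CloudWatch alarm on the built-in AWS/Billing EstimatedCharges
# metric. The SNS topic ARN and the $50 threshold are placeholders.
ALARM_PARAMS = {
    "AlarmName": "monthly-spend-over-50-usd",
    "Namespace": "AWS/Billing",
    "MetricName": "EstimatedCharges",
    "Dimensions": [{"Name": "Currency", "Value": "USD"}],
    "Statistic": "Maximum",
    "Period": 21600,            # the billing metric only updates a few times a day
    "EvaluationPeriods": 1,
    "Threshold": 50.0,          # the "$X this month" ceiling
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
}

# With credentials configured, this would create the alarm:
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**ALARM_PARAMS)
```

Anything subscribed to that SNS topic (email, SMS, or a Lambda function) then gets the notification and can take whatever action you've decided on.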
If you think all of this through, it's really hard to come up with an approach to billing limits that's fair and a good experience, so I think it's reasonable for cloud providers to give billing threshold alerts while leaving the choice of what to do in the hands of the customer.
Let's take a simplistic example and say you're paying per gigabyte. You decide you're willing to pay up to $X, and Amazon tells you ahead of time how much your $X will buy you, and you accept.
One type of customer will be using that storage to store priceless customer photos. Even if the customer ends up deleting the photos, it has to be your customer who makes that decision - not you, and not Amazon. You tell Amazon that you'd like an alarm at $X-$Y, but that if you hit $X, keep going, at least until you hit $X+$Z.
Another type of customer will be using it to store a cache copy (for quicker retrieval) of data backed up in a data warehouse somewhere. You tell Amazon that you'd like a policy which automatically deletes all the oldest data, to guarantee to stay under the limit.
Yet another type of customer would rather keep their old data and just return an error code to the user for stuffing too much new data into too little storage, so basically, guarantee to stay under the limit, and guarantee never to delete data.
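Those three customer types really are three different write-admission policies, and you can sketch them in a few lines. This is purely illustrative (the policy names and numbers are invented, not any AWS API):

```python
def admit_write(policy, stored, new_gb, limit_gb, overage_gb=0):
    """Decide the fate of a new write under each hypothetical policy.

    stored: sizes (in GB) of existing objects, oldest first.
    Returns (new_stored, accepted).
    """
    used = sum(stored)
    if policy == "keep-going":
        # priceless photos: never delete; keep accepting until $X+$Z
        if used + new_gb <= limit_gb + overage_gb:
            return stored + [new_gb], True
        return stored, False          # past $X+$Z: alarm and stop, delete nothing
    if policy == "evict-oldest":
        # cache: guarantee the limit by dropping the oldest entries
        stored = stored + [new_gb]
        while sum(stored) > limit_gb and len(stored) > 1:
            stored = stored[1:]       # drop the oldest cached object
        return stored, True
    if policy == "reject-new":
        # keep old data; return an error for writes that would exceed the limit
        if used + new_gb > limit_gb:
            return stored, False
        return stored + [new_gb], True
    raise ValueError(f"unknown policy: {policy}")
```

Same billing ceiling, three mutually incompatible "right things to do" - which is exactly why the provider can't pick one for you.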
You can't solve billing until you communicate with your customers and ask what they want.
So let's for a moment assume you talked to a large cohort of customers, and found a bunch of "types", including those three you list and many, many more (inevitably, at AWS's scale).
You then need to make some business decisions about which of those "types" are most important to you, and which are way less profitable to spend time addressing.
So of course you solve the big pain points for your customers spending tens or hundreds of thousands of dollars per month before you prioritise the customers worried about going over a budget of tens or hundreds of dollars a month.
What would that solution look like? It'd have ways for customers with hundreds or thousands of services (virtual servers, databases, storage, etc) to make all their own decisions about alarms, alerts, and cost ceilings - and tools to let them decide how to respond to costs, how to manage their data availability, how to manage capacity, when to shut down services or limit scaling, and what can and cannot be deleted from storage. It would also 100% need to allow for practically unbounded capacity/costs for customers who need that (think AliExpress on their "Singles' Day" event, where they processed $1 billion in sales in 5 minutes). All this would need - for the $100k+/month customers - to be machine drivable and automatable, with extensive monitoring and reliable alerting mechanisms - and the ability to build as much reliability and availability into the alerting/reporting/monitoring system and the automated provisioning and deprovisioning systems as each customer needs.
And at least to a first approximation - we've just invented 70% of the AWS ecosystem.
You might think Amazon don't cater to people who want hard $5 or $70 per month upper limits on their spending. You're _mostly_ right. There are many other people playing in that space, and it's _clearly_ not a high priority for Amazon to compete for the pennies a month available in the race-to-the-bottom webhosting that people like GoDaddy sell for $12/year.
The thing to think about is: "who does Amazon consider to be 'their customers'?" I think you'll find that for the accounts spending 7 figures a year with AWS, billing _is_ "solved". The rest of us are on the loss-leader path (quite literally for the "free tier" accounts) - because Amazon only need to turn a few tenths or hundredths of a percent of "little accounts" into "their customers" for it all to work out as spectacularly profitable as it is right now.
Except that that's what this announcement is.
Which makes me think this may be Amazon's fix to runaway billing - if you don't have the resources to pay for mistakes[1], stay in the per-month kiddie pool and don't play with the heavy machinery.
[1] I started to add, "or trust yourself not to make them", but that's silly, because mistakes will happen.
When a customer's ceiling is reached, their mix of services goes into limp mode. Things slow down, degrade, maybe become unavailable, depending on each service's "freeze model". Alarms ring. SMS messages are sent to emergency phone numbers. The customer is given a description of the problem and an opportunity to solve it -- raise the cap or cut services.
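One way to picture that per-service "freeze model" idea: each service declares what limp mode means for it, and hitting the ceiling just dispatches on that declaration. The service names and actions below are invented for illustration, not anything AWS offers:

```python
# Hypothetical per-service "freeze models": what limp mode means for each.
FREEZE_MODELS = {
    "compute":  "stop",        # instances stop; data on their volumes survives
    "storage":  "read-only",   # existing data stays readable, new writes rejected
    "database": "throttle",    # queries slow down but keep answering
    "cdn":      "suspend",     # content unavailable until the cap is raised
}

def enter_limp_mode(services):
    """Return the degrade action applied to each service at the ceiling.

    Unknown services fall back to the safest non-destructive action.
    """
    return {name: FREEZE_MODELS.get(name, "suspend") for name in services}
```

The key property is that nothing in the table is "delete" - every action is reversible once the customer raises the cap or cuts services.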
So wouldn't this cost Amazon money? Sure, but that's a cost of doing business. And as others in the thread have pointed out, the actual costs to Amazon are surely much lower than the "loss" they're incurring by not unquestioningly billing the customer. Especially since Amazon often refunds large surprise bills anyway.
If this were the official policy -- no dickering required -- there's a definite cohort of risk- and uncertainty-averse customers who would be willing to start using Amazon (or switch back).
That's what stopping instances _is_ already. You don't get charged for stopped instances, which is a defining feature of Amazon's cloud. Very few providers actually offer this; most just charge away for the compute even if the instances are powered off (Azure being one exception).
This whole "spin up compute and get charged a minimal amount when not in usage, but keep your working environment" model was pioneered by Amazon.
> So wouldn't this cost Amazon money? Sure, but that's a cost of doing business.
Why would Amazon spend a bunch of money, so that they can charge customers _less_ money, in order to keep customers who are cheapskates, and/or won't take the time to learn the platform properly?
Citation Required
But they don't give us the choice. I need to keep an eye out every moment of every day for an alarm, as hundreds or thousands of dollars rack up. That's the ONE THING I DON'T WANT. I'd take anything else (delete my data, lock everything, whatever) over charging me money I can't afford to pay.
I think it would be reasonable to put everything into a no access / deep freeze mode, until I pay up and choose to unfreeze. Would it cost Amazon that much to just keep my data locked for a couple of weeks while I sort out my storage? I'd even be happy for a reserved $100 or so to pay for keeping the storage going.
You know you can make a machine do that for you - right?
In fact all the tools Amazon would use to do this are available to you right now. Cloudwatch, SNS, and Lambda are 98% likely to be all you need - apart from the time to get it set up to do whatever you think is "the right thing".
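A sketch of the Lambda end of that pipeline: the billing alarm publishes to SNS, and the handler stops every running EC2 instance carrying an opt-in tag. The tag name is an invented convention, and the boto3 calls are left commented so the decision logic stands alone:

```python
import json

STOP_TAG = "auto-stop-on-budget"   # hypothetical opt-in tag

def instances_to_stop(reservations):
    """Pick instance IDs that are running and tagged for budget shutdown.

    `reservations` has the shape of ec2.describe_instances()["Reservations"].
    """
    ids = []
    for res in reservations:
        for inst in res.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if inst["State"]["Name"] == "running" and tags.get(STOP_TAG) == "true":
                ids.append(inst["InstanceId"])
    return ids

def handler(event, context):
    """Entry point invoked via SNS when the billing alarm fires."""
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    if alarm.get("NewStateValue") != "ALARM":
        return []
    # Inside Lambda, with the role's EC2 permissions, the real calls would be:
    # ec2 = boto3.client("ec2")
    # ids = instances_to_stop(ec2.describe_instances()["Reservations"])
    # if ids:
    #     ec2.stop_instances(InstanceIds=ids)
    return []
```

Stopping (rather than terminating) is the safe default here - volumes and data survive, you just stop paying for compute.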
This seems like the kind of thing you really want to get right, and it will be (I imagine) hard to get right. If it was easy, I would expect some company to offer it (along with, of course, a guarantee that if they mess it up, they will pay my bill).
No one running a real business on AWS wants a hard ceiling instead of billing alerts and service by service throttling. Which Amazon has.
So, this is just the nuclear option for people's pet projects. It's not a bad thing to have but I wouldn't expect it to operate any differently than what would happen if you broke the TOS and they suspended your account.
That's absurd. Of course there are businesses that want hard ceilings. Perhaps not on their production website[1], but on clusters handed over to engineers and whatnot for projects, experimentation, etc.? I've seen these things lay around for months before they were noticed.
[1] Maybe you don't consider startups 'real' enough, but I can totally imagine early stage startups wanting limits on their prod website, too. You can't save CPU cycles for later consumption.
Are you sure? I'd imagine many startups would rather take a few hours of downtime than be billed thousands erroneously. The latter could easily mean the end of the company, but the former, when you are just starting out, is not the end of the world by far.
I know startups that I could bankrupt with a few lines of code and a ~$60 server somewhere long before they'd be able to react to a billing alert if it wasn't for AWS being reasonably good about forgiving unexpected costs.
I'm not so sure no one running a "real business" would like a harder ceiling to avoid being at the mercy of how charitable AWS feels in those kinds of situations, or when a developer messes up a loop condition, or similar.
Perhaps not a 100% "stop everything costing money" option that'd involve deleting everything, but yes, some risks are existential enough that you want someone to figuratively pull the power plug out of your server at a second's notice if you have the option.
If you can't afford downtime, you probably can afford to wait for the alert and choose your own mitigation strategy. A system that can't tolerate downtime probably has an on-call rotation, and these triggers ought to be reasonably fast.
If you can't react or can't afford to react, you probably can afford some downtime / data loss.
So the system doesn't need to have granular user defined controls. Just two modes. That was my point.
I think I triggered people with the phrase "real business" and I apologize for that.
You know exactly how much a paused EC2 instance charges you. The ceiling implementation could say, if the total amount charged so far this month, plus the cost of pausing the instance for the rest of the month, exceeds the ceiling, pause it now. So there's no data loss; the worst case is the customer's service is offline for the remainder of the month (or until they approve adding more money). At some point less than this number, start sending angry alerts. But you still have a hard cap that doesn't lose data.
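That check is simple arithmetic. A sketch with illustrative numbers (a stopped instance still accrues storage charges for its volumes, assumed here as a flat daily rate):

```python
def should_pause_now(spent, days_left, stopped_cost_per_day, ceiling):
    """Pause when what we've spent, plus the cost of staying stopped for the
    rest of the month, would already exceed the ceiling."""
    return spent + days_left * stopped_cost_per_day > ceiling

# Example: $40 spent, 10 days left in the month, $0.50/day to keep the
# stopped instance's volumes around, $50 ceiling:
#   40 + 10 * 0.50 = 45 <= 50, so it can keep running a little longer.
```

The "angry alerts" threshold is then just the same formula evaluated with some safety margin subtracted from the ceiling.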
It's not what a serious production user wants, but it's exactly what someone experimenting with AWS wants, either a running service that's looking at a cloud migration, or a new project/startup that hasn't launched yet.
Granted, for a big company, that amount may be so big it's unrealistic to ever hit it.
Most companies will hold onto your data for a time, then delete it afterwards.
This doesn't smell like technical concerns to me. It smells like sneaky Amazon-wants-to-make-more-money concerns.
(<snarky> What's a gallon of milk on the shelf really cost Walmart? And how much of it is opportunity cost? If I usually buy 2 gallons a week - why can't I keep taking home a gallon every few days for a month or so after I stop paying, then cut me off afterwards? Sounds like a sneaky Walmart-wants-to-make-more-money concern.)
If only Walmart would have a process in place to notice that I was ordering a spectacular and unusual amount of milk and save us all the trouble.
Also, how are our analogies alike? Milk is a consumable, data is information. Completely different usage pattern.
Finally, every internet service provider I've ever used that held data for some reason granted me a grace period, even if it was never officially stated. Sometimes you just have to ask nicely.
On the other hand, based on near-universal industry practice, there doesn't seem to be a huge demand for this. I suspect it may be better for everyone concerned to have heavy-duty users control their costs in various ways and for Amazon to refund money when things go haywire without bringing someone's service down.
I've seen engineering teams hand out accounts to support teams for testing, and since the resources are not under the purview of the dev team things go unnoticed until someone gets the bill. Arguably there are better ways to handle these requirements, but it'd be nice if you could force people down the path of setting billing alerts because these individuals don't always realize that they are spending money.
So maybe a couple of EC2 instances go down, but you pay for and keep S3, Dynamo, etc. At least enough to salvage or implement a contingency. You'd still owe Amazon the money.
It's tempting to wonder why Amazon would incur that risk, but it is a risk already inherent to their post-pay model, and it serves as good faith mitigation to the runaway cost risk that is currently borne by the customer.
Not perfect, but maybe a compromise.