Whilst this might sound funny, we were surprised to see it as a common use case, with users putting https://github.com/infracost/infracost in their CI/CD pipelines to act as a safety net. Currently it only works for Terraform users, but we plan to add other infra-as-code tools in the future. We're also discussing how we can do this for people who don't use infra-as-code in https://github.com/infracost/infracost/issues/840 but it's not clear what the workflow could look like for them. Perhaps having separate AWS accounts with a budget alert that emails you to run https://github.com/rebuy-de/aws-nuke is a workaround for now.
(I'm co-founder of Infracost)
You absolutely must, MUST, MUST be using separate AWS accounts for separate purposes. You can have as many as you’d like and roll up the billing into one actual paying account.
This is a win for accountability (roll up dev and easily see the split out for separate environments), but more importantly for security as it limits the blast radius for any one environment. Combined with per-account budget alerts it’s a win across the board.
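A minimal sketch of what a per-account budget alert could look like, assuming the AWS Budgets API as exposed by boto3 (`create_budget` on the `budgets` client). The account ID, budget name, email address, and 80% threshold are placeholders, and the helper `build_budget_request` is hypothetical: it only builds the request so it can be inspected without credentials.

```python
def build_budget_request(account_id: str, monthly_limit_usd: int, email: str,
                         threshold_pct: float = 80.0) -> dict:
    """Build keyword arguments for an AWS Budgets CreateBudget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            # Placeholder naming scheme, one budget per account.
            "BudgetName": f"monthly-cost-{account_id}",
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Email when actual spend passes threshold_pct of the limit.
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }


request = build_budget_request("111111111111", 100, "ops@example.com")
# With credentials configured, this could be sent as:
#   import boto3
#   boto3.client("budgets").create_budget(**request)
```

Note that an alert like this only sends an email; it does not stop any resources.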
A couple of fun billing surprises I've seen.
1. A bug in a system that uploaded quite a lot of data to Amazon S3 caused it to hit the S3 API to the tune of about $10K/day. Because AWS billing is usually 2-3 days lagged, it took us 3 days to notice. We fixed it right away once we found it. Goodbye to that $30K.
2. An engineer did an Athena query that happened to walk many TB of data. And they unknowingly did it in us-west-2, but the data was in us-east-1. So that resulted in a cross region transfer to the tune of $10K for that single query.
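The first surprise above is simple arithmetic: whatever the runaway cost per day is, you pay it for as long as billing lags behind. A tiny Python sketch, where the function name and parameters are illustrative and the figures are the ones from the story:

```python
def undetected_spend(daily_cost_usd: float, billing_lag_days: float) -> float:
    """Cost accrued before a runaway charge even shows up in the bill."""
    return daily_cost_usd * billing_lag_days


# About $10K/day of S3 API calls, noticed after a 3-day billing lag.
print(undetected_spend(10_000, 3))  # 30000
```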
Then last month I got an email saying "Hey, those quotas you were setting using the API documented to set quotas, those were actually not being enforced the whole time because of undocumented issues with our systems." So basically you can't rely on the documented behavior of these systems, there's no good way to test whether your code is correct or whether your limits will work without actually exceeding your budget for real, and the whole thing is a clusterfuck. When you get a surprise bill you just have to throw yourself at the mercy of whichever first line billing support rep is randomly assigned to your case.
Limiting your bill to something less than "potentially infinite" is just a basic fundamental feature that shouldn't require rolling your own bill-monitoring service relying on poorly documented and malfunctioning APIs with no provision for testing. There's no excuse strong enough to explain why the cloud providers can't do something reasonable here.
The "tiny bit of lag" between usage and billing calculation explodes when there's a lot of usage - in my case, a broken job tried resubmitting itself continuously, and the lag increased to 8 hours and $5000 just when I needed the alert the most. My team's response time was 5 minutes... After the 8 hour GCP lag.
Very similar to this guy's story: https://blog.tomilkieway.com/72k-1/
I had to go back and forth with them on email for weeks, and ultimately threaten them with a draft blog post with a lot of graphs and screenshots of their recommendations for them to cancel the bill.
I’d love it if GCP’s official method were to disable billing if your bill went over a limit.
Sadly, I suspect it would just disable systems instead.
I use Vultr or Digitalocean if I need a server somewhere because at least it's just a pre-set cost.
I'm in love with DigitalOcean because you know the price you'd pay each month. If it's just 5 USD or 5000 USD, it's what you expected, nothing more.
I believe the rest of the clan (Linode, Vultr, etc.) give you the same certainty.
To be fair, AWS Lightsail should be an option too. Lightsail machines come with a fairly competitive amount of bandwidth.
That's exactly how it should work. It would even be useful if I could designate my development / testing account as unimportant by default so everything can be nuked to limit spending.
I think it's the kind of thing that will only be solved by regulation. The government needs to institute the concept of capped overages for cloud providers where if I set my budget to $100 / month they aren't allowed to send me a 100x bill for $10k.
Here's the 9 year old request for the same thing on Azure.
https://feedback.azure.com/forums/170030-signup-and-billing/...
I politely asked @AzureSupport (on Twitter) if they could have someone provide an update, but they didn't deliver on their promise to follow up :-(
The possibility that someone could flood the server, even just for static resources, and cause a bandwidth-driven bill spike is scary.
Genuinely curious, is this just a side-effect of the cloud craze or did DDoS attacks become so powerful that old-school approaches of appropriately-sized bare-metal infrastructure with finite but unmetered bandwidth are no longer viable?
The way I see it, you can provision enough unmetered bandwidth to cover your typical load + a safety margin at a flat rate per month, and worst case scenario if the attack is big enough you merely get downtime (allowing you to re-evaluate the situation and decide whether to throw more bandwidth at the problem or purchase attack mitigation services) instead of an infinite bill?
My current ISP gives me 1Gbps unmetered. Worst case scenario the connection is saturated, but at no point will the ISP come to me and ask for extra money.
The limited protections available against this threat from the big cloud providers have to be seen as a warning sign. It's only a matter of time before any small business using these services for hosting can be subject to sudden shakedowns by criminals. "Nice business-critical infrastructure you have there, be a shame if anything were to happen to it." Some of the providers do offer a DoS mitigation service, but the cost for the higher levels can start to look like a shakedown itself.
My guess is that billing lags enough that they can't stick to a price cap, which means that they either have to guarantee the price cap and swallow the difference, which could be exploited by malicious users to get free compute, or they have to say that there's a delay on it which makes the cap fairly useless.
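To make the lag argument concrete, here is a rough Python sketch. It assumes a constant burn rate and a cap that is enforced on billing data that is `lag_hours` old; the function names and the example numbers are hypothetical, not any provider's actual behaviour.

```python
def spend_at_cutoff(cap_usd: float, burn_rate_usd_per_hour: float,
                    lag_hours: float) -> float:
    """Real spend at the moment a cap based on lagged billing data trips.

    The provider only sees spend as it stood lag_hours ago, so when the
    reported figure reaches the cap, the real figure is already higher by
    burn_rate * lag.
    """
    return cap_usd + burn_rate_usd_per_hour * lag_hours


def overshoot(cap_usd: float, burn_rate_usd_per_hour: float,
              lag_hours: float) -> float:
    """Amount someone has to swallow: the provider or the customer."""
    return spend_at_cutoff(cap_usd, burn_rate_usd_per_hour, lag_hours) - cap_usd


# Hypothetical: a $100 cap, a runaway job burning $625/hour, 8 hours of lag.
print(spend_at_cutoff(100, 625, 8))  # 5100
print(overshoot(100, 625, 8))        # 5000
```

With no lag the overshoot is zero, which is why a guaranteed upper bound on billing delay would matter more than the cap itself.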
Some of these services are billed by such small increments I can't even imagine how complex billing for them is in practice. I'd be surprised if bills are eventually consistent within 24 hours.
I wouldn't be surprised if we see an announcement like billing being guaranteed after 1 hour at some point in the not too distant future, but I'd be surprised if we see realtime caps.
I understand the role and the necessity for "the cloud", but it's a re-invention of the role of the mainframe. I hate seeing one of the most notable aspects of the microcomputer era go away: the ability of a motivated individual to gain computer skills using an individual's resources.
A publicly accessible mainframe, where anyone anywhere in the world can script the provision of machines and other resources with little more than a terminal and a text editor.
That would have been utopian science fiction in the heyday of the mainframe.
[1] https://news.ycombinator.com/item?id=21694835
[2] https://wordpress.org/support/topic/amazon-cloudfront-invali...
This inspired us to add billing limits to our SaaS product so that users don't have to be in scary situations with runaway bills: https://mediamachine.io/blog/protect-your-customers-with-bil...
AWS, anecdotally, has removed 5k++ mistakes I’ve made with little question.
(One example they forgave due to my carelessness: ECS and Fargate service with logging to CloudWatch but with verbose logging on. The bill was 8k that month for just CloudWatch usage)
AWS's unknowable policy for the cost of errors represents a huge risk for individuals and small businesses. It puts a lot of people off.
I had a security issue related to a SaaS product which led to a $7k AWS line item when someone started sending a LIST request to S3 buckets billions of times. They would not consider refunding.
Now I’m having a bunch of problems terminating some AWS Orgs accounts and they are being deliberately difficult in getting it tidied up whilst I’m incurring significant costs.
The whole billing stuff is complex and opaque and there aren’t enough controls and limits on spend. I feel like I need to dedicate 1 x FTE at least on AWS cost control which is a high cost for a small business.
As a CTO, I’ve previously influenced $millions in spend on AWS, but would be very nervous putting my reputation on the line to spend big with them in future. I’m frankly losing trust in their commercial approach.
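For a sense of scale on the $7k LIST line item mentioned above, here is a back-of-the-envelope Python sketch. The price of $0.005 per 1,000 LIST requests is an assumption (the S3 Standard list price in us-east-1 as far as I know; check current pricing for your region), and the function names are illustrative.

```python
# Assumption: S3 Standard LIST pricing in us-east-1; verify before relying on it.
LIST_PRICE_USD_PER_1000 = 0.005


def list_request_cost(requests: int,
                      price_per_1000: float = LIST_PRICE_USD_PER_1000) -> float:
    """Cost in USD of a given number of LIST requests."""
    return requests / 1000 * price_per_1000


def requests_for_cost(cost_usd: float,
                      price_per_1000: float = LIST_PRICE_USD_PER_1000) -> float:
    """Number of LIST requests needed to run up a given bill."""
    return cost_usd / price_per_1000 * 1000


# Roughly how many requests does a $7k line item represent?
print(f"{requests_for_cost(7000):,.0f}")  # about 1.4 billion
```

Under that assumed price, "billions of times" and "$7k" line up.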
I’ve never had an unexpected cost they didn’t readily credit back, provided we were taking the recommended and reasonably easy steps to keep on top of costs and limits.
We were a bit shocked to see this happen and it was a very subtle increase that was sort of hidden in Cost Explorer unless you spent hours digging into it and comparing your past invoices.
(I'm a co-founder of CloudForecast)
The way Route53 pricing works, you could get a $1000 bill any day - for DNS... Having just a single domain on there is enough.
Ironically the Oracle cloud seems more price-reasonable (for now).
- Good terms with proprietary lock-in.
- Milk that cow for all it's worth.
It's more nuanced than that -- I gave an oversimplified model -- but I've never seen anyone come out ahead doing business with Oracle long-term.
Will some people/businesses prefer this because it's not 'credit'—does AWS scrobble to your Credit Report in any country?
I am failing to see the appeal here...
Reg. Section 1.461-1(a)(1) provides the following:
If an expenditure results in the creation of an asset having a useful life which extends substantially beyond the close of the taxable year, such an expenditure may not be deductible, or may be deductible only in part, for the taxable year in which made.
https://www.law.cornell.edu/cfr/text/26/1.461-1
If you buy 10+ months of AWS credits in December and have a Jan-Dec fiscal year, I'd argue that you bought "an asset having a useful life which extends substantially beyond the close of your taxable year"
Also on a technical note, this allowed for some nice internal data models/patterns that could be utilized in further use-cases
In a past life, I did some work with government clients who preferred to be charged up-front in a lump sum, because it was much easier for them to get funding for that than a recurring subscription.
This doesn't appear to actually shut down the resources once the preallocated spend is exhausted. It's just a way to pay for bills preemptively instead of when you receive them. It's an accounting thing, not a new feature.
If that could be used as a hard limit that would be more interesting
Linode is very similar pricing/offering and has incredible customer service. I'm very happy with them.
Their emails even use language like "you need to top up your account".
It's frustrating me to the point where I might just leave this site. I'm sick and tired of this new-wave guerilla marketing.
A group I worked with bought about 5 years' worth of a specific consumable they needed to continue working, plus a 2-3 year service contract with a vendor to maintain aspects of things so some work could continue and be leveraged for future grants. Hosting/software licenses were also often purchased for long time horizons in advance, where possible.
With use it or lose it money, you use it. Whether money should be provisioned that way and coming in under budget should be punished is another story...
I’d like this to work like a prepaid phone.
Why is it just 1 or 3 years? What if I only need it for 6 months or 2 years? Can’t I just get a discount proportional to a custom length of time?
Why can’t I choose the amount of money I want to pay under the “partial upfront” option?
Why can I only reserve some AWS services and not others? Why can’t I reserve a certain amount of S3 storage for example?
Used AWS for 3 years at a decent sized agency. It seems we underestimated how important it is to check and scrutinize every line item in the bill: our Lightsail instances had another DB attached that we had no idea about, and it was charging a crazy fee (converting our local currency to dollars = 19x).
There was much finger-pointing.
This is largely desired by customers with complicated acquisitions and budget allocation periods (Government)
/s
I'm a co-founder of https://www.vantage.sh/ which helps organizations track their AWS costs and we'll look at incorporating Advance Pay balances into the platform.
I'm not surprised. I'm convinced AWS has strategically focused on making costs difficult to keep on top of so you just pick a service, assume it's magically cost optimized for you and use it even though that's not reality.
Side note, I love the Vantage EC2 instance comparison chart, I've used it a few times recently and it made my life so much easier. Thank you and your team(s) for providing this freely and publicly: https://instances.vantage.sh/