> This comparison is the #1 flawed sales tactic the cloud companies use to convince you you're saving money
Time is limited, and time spent managing postgres backups (for example) is time not spent doing other (possibly more meaningful/impactful _to the business_) work.
And if you need some sort of cluster-aware lock to coordinate backups among different peers, you'll need to decide which system works for you, implement it, and maintain that as a separate system. And if that needs to be upgraded, figure out a bulletproof process for upgrading it while it's still being used as a coordinator.
Then, you need to ensure there's storage for the backup. You need to decide what kind of storage you're going to use, make sure you've got enough space, and figure out how to encrypt the storage (very important in secure environments) and how to protect it using authn/authz. And lots of environments have retention and storage lifecycle policies - you don't want to keep the old backups on the expensive fast media; you want them on the cheap slow media. And some environments make you dispose of old data, so you have to figure out how to age it out without ever losing the backups you want to keep.
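To make the retention part concrete, here's a rough sketch of the kind of age-out logic you end up owning. Everything in it is made up for illustration: the `Backup` record, the `pinned` flag, and the 7-day/90-day thresholds are assumptions, not anyone's real policy. It only decides what should happen to each backup; actually moving or deleting data is a separate problem.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    name: str
    created: datetime
    pinned: bool = False  # hypothetical flag: a backup that must never be deleted


def plan_retention(backups, now, hot_days=7, max_days=90):
    """Return {name: action}, where action is 'hot', 'cold', or 'delete'.

    The thresholds are illustrative defaults, not a recommendation.
    """
    plan = {}
    if not backups:
        return plan
    # Never delete the most recent backup, even if it is past max_days.
    newest = max(backups, key=lambda b: b.created)
    for b in backups:
        age = now - b.created
        if age <= timedelta(days=hot_days):
            plan[b.name] = "hot"      # recent: keep on fast media
        elif age <= timedelta(days=max_days) or b.pinned or b is newest:
            plan[b.name] = "cold"     # older or protected: cheap slow media
        else:
            plan[b.name] = "delete"   # past retention and not protected
    return plan
```

Even this toy version has to special-case "the only backup left is an old one", which is exactly the sort of edge case that bites you in production.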
Finally, you need to make sure the backups you create are valid and usable. So you'll want to build an automated regression testing procedure to ensure that every time you make a change (regardless of how minor) to the system being backed up or the backup process, you still end up with usable backups.
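As a minimal sketch of what such a check might look like for a plain directory backed up with tar: restore the archive into a scratch directory and compare checksums against the source. The function names are hypothetical, and it assumes the archive was written with paths relative to the source root and that you trust the archive (you created it). A database backup would need a real restore plus queries, which is a lot more work than this.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(archive: Path, source: Path) -> bool:
    """Restore the archive into a scratch directory and compare it to source."""
    if archive.stat().st_size == 0:
        return False  # a zero-byte archive is never a usable backup
    expected = file_digests(source)
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with tarfile.open(archive) as tar:
                tar.extractall(scratch)
        except tarfile.TarError:
            return False  # unreadable or corrupt archive
        restored = file_digests(Path(scratch))
    # An empty source is treated as a failure so the check cannot pass vacuously.
    return bool(expected) and restored == expected
```

Run something like this right after every backup job and after any change to the backup script. It fails on an empty file, a corrupt archive, and an archive that holds only the top-level directory entry because it was built without recursion.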
(Disclaimer: I work for AWS, but opinions expressed here are my own and not necessarily those of my employer.)
Yes, it is work, but this company’s whole reason for being is to save AWS spend, so I assume they have patterns they employ for their clients regularly that achieve their SLO.
Well, our storage server barfed and the data was gone. Went to restore from backups, all the hourly tar files were there... but were zero bytes.
We looked at the backup script the engineer had put together and it was one of those classic “didn’t give the right parameter to have tar recurse” type bugs. Unfortunately we lost all the photos of the foundation and many of the photos of the electric being run. Oops.
System reliability is hard, and the cloud makes that easier.
The number one backup solution nowadays is AWS S3, because it's easy-to-use unlimited storage.
How does a company handle backups without S3? Usually they don't. That would require employees to buy machines/SAN with tens of TB of storage and maintain them (weeks of ordering, and travelling to the datacenter once in a while). It's too much hassle, so never mind.
Unless you take your DR plans seriously, the cloud doesn't eliminate risk, it just changes it.
The place I work at forces a failover on a monthly basis, and does a full-on offsite DR exercise twice a year.
I'm sure it took time to set it all up, but now that it's there it takes almost no effort to continue.
The answer is definitely not clear to me at all.
EDIT: no sarcasm, I legitimately don't know which I would choose as a biz owner
RDS SLAs are here (doesn't mention backups though, so not sure how that's handled): https://aws.amazon.com/rds/sla/
Not a lawyer or anything, but my layman's understanding is that you are essentially voluntarily opting in to waiving liability when you sign up for AWS and accept the terms and conditions, and instead of liability, you agree to accept service credits if SLAs are not met.
Can you think of a better example?
Deciding whether you get enough benefit from doing that yourself is a classic business trade-off that any experienced engineer should consider.