undefined | Better HN

0 pointsmoonchrome3y ago0 comments

The more I work with cloud the more I see this 'saves time on infrastructure' being a half-truth at best - we still have overwhelmed DevOps people who, unlike managing on-prem services, have 0 insight or control over what goes on inside AWS. And the services require a decent amount of hand holding, proprietary know how, etc.

In the end you replaced sysadmin with DevOps and got up charged multiples.

0 comments

4 comments · 2 top-level

acdha3y ago· 2 in thread

It's certainly possible to overspend on cloud services but in most cases when I see comparisons people tend to forget to fully include their true costs for things like staff time, infrastructure, etc. and especially things like opportunity cost for the delays caused by provisioning infrastructure, less capable interfaces (e.g. if you're swapping Terraform for a Jira ticket), or the technical decisions people make because they have fewer services available (this could be a Lambda function but we don't have that so now we need to manage a full VM).

Some examples which come to mind:

* Comparing S3 to the on-premise tape system but ignoring the fact that it involved an expensive tape robot, DR had access times measured in days, etc.

* Comparing S3 to on-premise storage, ignoring the difference in redundancy, forcing users to handle bitrot at the application level, and the procurement process meaning that when they ran out of storage it took months of telling people they couldn't allocate more.

* Saying their devops engineer cost twice as much as their sysadmins (true) but then when you look the devops engineer is using automation and managing literally a hundred times more systems than the “cheaper” ops team.

* Saying their cost to run a VM was cheaper than EC2, which true if you looked only at the instance but not when you calculated how much they were spending on underutilized VM hosts, power / HVAC, facilities people, etc.

It's totally possible to beat a major cloud provider on costs[1] but you usually need to be operating at fairly large scale to even approach the break-even point. This is especially true when you have regulatory or policy requirements for things like security and you include the cost of monitoring all of the things which fall under the cloud provider's responsibility — management networks, firmware version management, robust logging and IAM, etc. are all easy to accidentally exclude when making comparisons.

1. Network egress as the most obvious area to attack

moonchromeOP3y ago

But that's the part that I'm not getting - it's either AWS or building storage shelters and tape robots.

There are hosting providers like Hetzner that will rent you bare metal machines so you don't have to deal with actual hardware, their networking is basically unmetered (last I checked), they will also sell you spot VMs, blob storage service, etc. The pricing difference is in integer multiples and much less "gotchas" or lock-in (they are usually dumbed down).

They won't sell you fancy SAAS - but the ammount of fiddling required to get most of AWS services working in practice and debugging it - you're not very far off from using the OSS solution they are reselling - chances are your devops/sysadmin team can do that as well.

And there's plenty cloud agnostic SAAS as well.

I mean if you're a big corp that can accept the inefficiency for the sake of having one bill to pay at the end of the month, SLA you can use to cover your ass to higher ups and have the "nobody got fired for picking AWS" mentality I get it. But for startups and small companies ? All this "magical simplicity" that AWS/Azure/GCP is supposedly brining to the table - I'm not seeing it. Real redundancy (across AZ) is still really hard and requires extensive planning - no magic there. Dynamic scaling "savings" is often offset by being multiples more expensive to begin with and I have to deal with warmup issues, and the pricing is sneaky/unpredictable, easy to fuck up. Lambda architecture is pushing towards distributed microservices by default - the hardest thing to debug in practice.

So end of the day, I'm forced to work around arbitrary pricing models rather then technical limitations, debugging opaque black box services, not being "big enough" to get actual support, overly generic/complex systems built as "one size fits all" solutions, or get sold in to practically abandoned "service" because it's available etc.

And it's not like being first party service is any indication of quality - as soon as you move off of the most used stuff you get to see some really shit services in my experience.

acdha3y ago

> There are hosting providers like Hetzner that will rent you bare metal machines so you don't have to deal with actual hardware, their networking is basically unmetered (last I checked), they will also sell you spot VMs, blob storage service, etc. The pricing difference is in integer multiples and much less "gotchas" or lock-in (they are usually dumbed down).

If all you need is bare metal, you don't care about network quality, etc. then, yes, you have a lot of options. The reason why you use something like AWS/Azure/GCP/etc. is that you're looking for more than basic infrastructure services and don't want to have to manage relationships and support infrastructure for multiple providers.

This even covers basic things: for example, look at all of the reports of Hetzner customers losing data due to something like a drive failure. It's not like there isn't a way to deal with that yourself but now you're in the business of monitoring the drives, setting up software RAID and testing/benchmarking it to make sure your configuration is robust, scheduling downtime to reflash servers with storage updates following some process which depends on the hardware & software combo you're using, etc. Your competitor using AWS isn't paying their sysadmins to do that because they do it for you, and it's completely transparent.

Similarly, the major cloud providers can live migrate VMs off of failing hosts without your awareness — another thing you have to build and operate yourself on bare metal which isn't trivial to get right. When Spectre et al. came out, the major cloud providers' customers were all patched on the day of the annoucement. Bare metal hosting users had to schedule downtime, roll patches, and make sure nothing broke.

Again, I'm not saying you can't do that yourself — having done so myself at moderate scale since the turn of the century — but just not to underestimate the direct costs to provide equivalent service, as well as the opportunity costs of having your expensive staff time locked into infrastructure O&M. Business needs change and it's really nice to be able to handle curveballs – say you went all-in on Hetzner, but now you need a server outside of Central Europe/Virginia and they want Windows + SQL Server instead of your usual Linux and they need it next week but only for 6 months … how much do you need to invest getting that set up versus someone in AWS/Azure/GCP having it done in 15 minutes with infrastructure provably meeting the same standards and configuration as you use in your primary locations?

> They won't sell you fancy SAAS - but the ammount of fiddling required to get most of AWS services working in practice and debugging it - you're not very far off from using the OSS solution they are reselling - chances are your devops/sysadmin team can do that as well.

This is the opposite of my experience: with a few exceptions, you're looking at significantly more work to build an equivalent service yourself, especially if you need to worry about reliability, security, etc. That last part is important if you need to be able to make strong statements about who has access to data, whether logging can be tampered with, etc. — those are all things you _can_ setup yourself but the cost of doing so is greater than years of your usage until you're at a pretty large scale.

As a simple example, most security benchmarks require you to enable storage encryption. That protects against data leaking when drives are replaced or rotated out of service and if you do it right can also prevent data from being leaked when someone compromises part of your infrastructure or even a sysadmin. This is a turn-key service on AWS – at most check a box for the basic form, or configure KMS policies for whatever you need in the more complex cases, and it's the same set of tools for EC2 servers, databases, S3, EFS, etc. as well as your applications using KMS directly.

If you're using Hetzner, here's what they say: “We recommend that you not use server-side encryption.”

You can configure luks, etc. yourself but now you have a ton of work to configure and validate, and the problem is inherently harder because you now need to deal with things like access control & auditing yourself. If you need to demonstrate this to an auditor, you have to convince them that you have sufficient controls to limit an attacker or rogue sysadmin whereas if you're using AWS/Azure/GCP/etc. you can just point them at the platform docs and show that you configured the intended policy.

Repeat for things like immutable storage: if you need to care about that, the cost of being able to prove that your storage is protected is going to pay for exabytes of Glacier storage.

1 more reply

zo13y ago

Agreed, devops (and whatever the hell cloud wranglers are being sold as today) is the new priesthood. It's meta-level job creation and job security.

j / k navigate · click thread line to collapse