"Don't go to the cloud, just buy your own servers" completely ignores the reason anyone rents cloud servers in the first place. If I could easily say "I need exactly this much capacity, no more no less, with no unexpected scaling needs and no code/infrastructure changes" then I'd be sitting pretty. Now for a show of hands, how many companies does this describe?
And of those companies that raised their hand, how many can say "I'm fine with just having a server in Germany" who will then go on to say "and I don't need a CDN to serve customers in other regions"?
>keep running live/live setup...This is very critical
Oh I didn't know it was that easy. If all you have to do to keep a server running is say "stay running, it's very critical" then of course no one needs any managed services. Outages are solved forever.
Completely worthless advice from a text file written by... I'm sorry who is this person and what authority do they have?
But I do think he's performing a useful service in highlighting just how much you're paying for not having to hire a devops or sysadmin person. 10x performance differentials have been replicated not just by him but in other benchmarks. I remember seeing a chart when I was still working at Google and GCP was being justified that showed a graph of hard disk prices vs. S3 rates - since inception (2007-2012 at this time), S3 rates had gone down by a factor of about 2-3x, but price/GB of physical hard disk space had gone down by ~100x. Amazon is hiding all of this improvement in the physical hardware behind vCPUs and opaque billing, and it becomes pure profit to them.
I'm surprised more mid-size companies - those with AWS bills in the 6-figure-per-month range - don't leave the cloud, get physical hardware, and pay some sysadmins. At that level you could easily afford them, and you probably can get old Linux greybeards cheap now that everyone thinks the job description is obsolete.
I'm not surprised at all: that's thanks to the beauty of lock-in effects. Especially all those cloud-specific managed services make it really hard to switch, and even if you are treating the cloud as a pure VM platform, it is considerable work and risk to switch to a self-managed bare metal setup.
Basically, if you don't start bare-metal (incurring having to assemble the necessary know-how to do so) you will have a hard time getting clean from the cloud drug later on, when you think you're ready to afford having the know-how. I think this aspect has been a main driver behind the linked article.
S3 costs pretty much what that infrastructure costs to run yourself.
But certainly, go ahead and run a Riak CS cluster (plus those other things) on top of some Hetzner servers, and try to outcompete S3 on price. I, and many others, will be waiting. :)
$12 million/year for cloud seems like a high enough threshold to go internal but I'm not even sure that's enough money to sway executives away from cloud.
The main issue is that internal IT departments at corporations have slow service from the IT staff compared to AWS/GCP/Azure and its engineers. Corporate IT departments typically don't treat their coworkers in other departments as internal customers. Instead, they treat them as adversaries and a nuisance.
That's why at non-tech corporations, the first group that experimented with the new AWS cloud service were the development teams. They got fed up with IT service backlogs waiting for new dev/sandbox/test servers. They got tired of waiting for IT to requisition a server and then install the os, db, etc. If the programming manager then asked IT to install an Oracle update, the IT department might say "well, Susan our db admin is on vacation till next Monday so we'll get to it then." Those kinds of inter-department interactions were frustrating to the internal customers of the IT department. With AWS, the programmers got their sandbox servers spun up in minutes instead of weeks or months. Once the dev teams were sold, the more risk-averse and mission-critical production workloads eventually moved to AWS too.
The author of this article doesn't seem to understand the dysfunctional relationship between internal IT and the coworkers they serve. It's more about the responsiveness and iteration speed of the AWS/GCP/Azure employees vs the IT employees than expensive AWS EC2 compute power vs cheaper internal racks of cpu.
E.g. The Guardian pays AWS more for cpu+diskspace+network than what they can build on their own. That's the raw hardware costs. But it doesn't matter because Guardian's internal IT staff with an internal cloud stack just can't match AWS: https://www.computerworld.com/article/3427004/the-guardian-g...
To me, it totally makes sense for Dropbox to migrate off of AWS to save money. They're a tech company and have the engineering culture to do it. A lot of non-tech companies (like The Guardian) don't.
But people are REALLY REALLY expensive (not only the ongoing costs, but hiring, firing and overhead), and Cloud has another huge advantage: accounting tricks (which help said midsize firms a lot).
Literally one of the first things they teach you at the Azure training is the difference between CapEx and OpEx: https://docs.microsoft.com/en-us/learn/paths/azure-fundament...
Ya just gotta know your business needs.
The author's advice is good for basically one scenario: a personal hobby project run by somebody who already knows they would prefer to run their own servers instead of use a cloud host.
I have to strongly disagree with this sentiment. Thoughts which are against mainstream deserve to be visible and thus open for debate.
My company for example decided to go with our own production hardware. I'm not saying it's easy. But it's doable. And very educational. =)
One has to remember that running your systems on a 3rd party cloud provider will not relieve you of responsibility or of the software maintenance cycle, which usually accounts for most of the "human workload". Their nines are not yours.
In my experience, with two reliable machines (redundant PSUs etc.) one can create a very reliable service platform. Using the right kind of management tools is of course essential. A RAID disk system and a reliable backup arrangement are essential too.
We used Ganeti (https://en.wikipedia.org/wiki/Ganeti) in our 1st generation setup. The next generation will use our own manager (Deux).
Also, good luck finding a Dev Ops engineer for $57,600 a year.
So the salary we're really talking about actually is less than 40k a year.
Good luck ^^'
From my experience, doing sysadmin for a single server only takes time initially. You might spend a couple weeks full time to get everything going, but 6 months in it's probably only a couple hours a week.
Add downvoting and the article queue would quickly become the same groupthink-swamp that the comments section has become.
One of the nice things about HN is you can still find articles that are against the grain and have topics that challenge the comments section’s “one correct opinion”.
I agree, they are very expensive for what you get at any even modest scale. (Except for s3 which is absolutely amazing). However, if you want to run your own stuff, you're buying into knowing the details of how it works. eg you better understand what vacuum is for pg, and how it interacts with availability, and be willing to create backups, etc. AWS does do real work; it's just expensive.
All I know is we never had a $30,000 “oopsie” in the data center because a dev left a bunch of stuff running they should have shut down.
We aren't going to suddenly have 20x employees one day; a lot of our infrastructure is highly predictable and stable in load and demand.
Some things need instant scaling, sure. But not everything that's in the cloud does.
I know you hate their opinion, but also attacking the character is kinda unnecessary. Can we just not?
https://github.com/jackdoe/txt.black/#start-of-content
In other words this is the OP "zulgan"
user: zulgan created: July 11, 2018 karma: 530 about: https://scrambled-eggs.xyz https://baxx.dev https://github.com/jackdoe
With some cloudflare-ing you could quite easily get away with a single origin server in Germany, too. The latency from a rails app is probably higher anyway.
My anecdata with hetzner - 1.5 decades of lightly used servers for hobby project. One power supply failure resolved in a day. They have some quirks, but.
If you truly have to scale out, sure, cloud is great (sort of). But if you can engineer your solution to take the benefit of both, you win.
I think there are valid reasons for each viewpoint.
His "homepage" is a raw directory dump with a high-contrast black background (the preferred color scheme of 10x engineers).
Isn't that enough authority for you?
BTW if anyone knows of more nuanced explorations of Cloud v. DIY tradeoffs -- we all know there's a tipping point in favor of the latter, somewhere -- please do share.
100% agree. This article goes so far as to be farcical, but that doesn't mean there isn't something worthwhile to be had in the conversation... as long as that conversation was framed appropriately. Especially around cloud value-adds like managed databases and Lambda and security technologies like CloudWatch/Trail.
I'm the sole (part-time) dev-ops resource for a very small company and we (by which I mean I) DIY just about everything running on a small VPS on DigitalOcean. I run my own database and manage my own dokku and user access etc. I use S3 and have dipped my toes into Lambda but I always wonder if I'm missing out on something not using cloud technologies more heavily.
I would absolutely love an analysis of what cloud technology companies SHOULD be using and at which point they become or are no longer worthwhile in favor of an alternative.
Honestly? Lots.
And that benefit only improves with scale.
That's hard to derive from the same statement - there's a point after which you can afford your own 8 member SRE team across 3 timezones.
The cloud is incredibly attractive when you consider your scale problems as unknown (as a startup, the blitz scale day isn't a good time to go rack nodes yourself) & mostly as a way to scale down the costs faster than physical infrastructure + associated human costs can.
I've started looking at this as a slightly different model - I'm no longer hiring a good operations manager for those 8 people, I'm hiring a decent programmer who can automate and have code handling deployment tasks instead of a human being taking orders (i.e. rack an NVMe + reboot this).
Honestly, even with your physical infrastructure, the trend is towards API based deployment models.
My time at Zynga working on ZCloud, migrating off EC2 to on-prem, that is what I saw. The API based deployment model + reactions from machines (on failure), was worth building everywhere - now that applies to k8s.
Many similar benefits when you have a huge for-profit entity running the infrastructure at their scale. They optimize a lot for you.
AWS almost never forcibly spins down legacy functionality, even if they stop advertising it, unless literally no one is left using it.
That's actually a selling point of AWS to enterprise: while idealistic devs may like “our cloud vendor will force us to use the latest and greatest so we don't have to have that fight”, enterprise buyers don't want “we’re going to be compelled to reengineer our working systems on someone else's timetable”.
Maybe, but you probably have to go pretty far on the cloud maturity spectrum (using cloud native services effectively, not simple VMs like EC2 or even comparatively thin services that abstract just the machine/OS layer like non-Aurora RDS) before you are saving more in local-machine-focused ops teams than you are paying in cloud-focussed ops teams. The latter are more likely to be timeslices of team members that are primarily developers, but it's still work with a cost.
The trick is to make yourself the linchpin, so that once the site grows large enough you can never leave, as you're the only one able to keep that writhing mass of wires up and running.
But yes, this is outsourcing a problem for a cost.
https://www.techempower.com/benchmarks/#section=data-r18&hw=...
Do cloud providers increase your ability to choose scale-up instead of scale-out? Instead of your sharded and replicated MySQL that is almost inoperable and never seems to maintain consistency, would you be better served with one gigantic database? Sure it costs $13k/month to rent one from Google but it costs a quarter million to buy one from Dell. These are all aspects of the decision-making process.
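To put the rent-vs-buy numbers above side by side, here is a rough sketch of the break-even point. It uses only the two figures from the comment ($13k/month rented, roughly $250k purchased) and deliberately ignores power, colocation, staff, financing and resale value, so treat it as a lower bound on how long you need to keep the machine busy before buying pays off. The function name is made up for illustration.

```python
def break_even_months(purchase_price: float, monthly_rent: float) -> float:
    """Months of renting after which cumulative rent equals the purchase price.

    Simplification: hardware-only comparison, no operating costs on either side.
    """
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return purchase_price / monthly_rent


# Figures taken from the comment: $13k/month to rent, about $250k to buy.
months = break_even_months(purchase_price=250_000, monthly_rent=13_000)
print(f"Renting overtakes buying after about {months:.1f} months")
```

If you are unsure the workload will still exist in a year and a half, renting the big box may well be the cheaper option even at that monthly rate.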
Want to scale up? 5 weeks minimum. Your opportunity may have been lost by then.
They don't really do custom specs, iirc part of the low pricing is because of how much useful automation they have. Sure, it's not cloud cloud, but if I have a personal side project that I need a few terabytes of ram for, there is no way I can afford AWS/GCP/etc.
What makes you think it’s a “baremetal physical machine”? This description sounds very cloud-y to me.
I'm just glad I can do my things without even having to think about, let alone maintain, 1/10th of all this, really. But YMMV, I guess.
* It takes forever for Dell or HP to ship your stuff.
* Your co-lo can physically run out of space.
* Your predecessor will inevitably have tons of one-off unversioned changes in prod.
* Repetitive manual work when scaling multi DCs.
* Making unscalable database choices (non-master-master) because you are not in the cloud mindset.
* Don't worry about CDN? AWS S3 comes with CDN automatically.
* Too many more bullet points to mention.
The number of times I've run into this at clients is ridiculous. Either there isn't enough space in the rack, or they're out of network switch ports, even down to being out of SFPs or network cables themselves. I've even seen clients run out of power... as in, they couldn't add one of our servers to the data center because the UPS and power delivery systems could not safely handle the load of one more server.
Running a datacenter is a lot harder and more costly than most people think.
Not a problem for this guy, he uses hetzner's cloud. Did I say cloud, sorry I mean server rental.
I switched from the dedicated hosting that he advocates to AWS long ago because I was always having things get broken by the people at Softlayer. For instance one time I upgraded my network port and somehow that broke my record in the issue tracking system so I could no longer put issues (or any work reports) in.
There was the clunky and expensive backup server (I'm not sure if it would really restore...) and the breaking point was when somebody added a new disk but they did it wrong so that the partition table got overwritten when the machine rebooted. I was able to fix the partition table but then I moved my files to EC2 as fast as I could because I didn't have time to deal with that.
As for queues I wonder what is up with that. When I build systems based on message queues they work OK, but I've got an eye for detail and for architecture that many people don't seem to have because I always see other people get in trouble with them.
But for most people, it's simpler to just use them and get it over with, and I'm glad it worked for you.
I prefer renting virtual private servers from Digital Ocean (or, yes, EC2 instances) and configuring them myself. Doing so is perhaps more involved than buying into the AWS/GCP/Azure ecosystem, but I already know the tools involved, and I value the safeguard against vendor lock-in/buy-in enough to spend more time on them.
Those VPSs are "cloud" solutions in my opinion, and a happy medium between AWS and the author's unnecessarily-profanity-laden proposal that everyone should rent/buy metal. You can be "pro-cloud" and "anti-closed-cloud-ecosystem" at the same time.
RIP sysadmins, and all the IT knowledge that's slowly disappearing from the web because the sysadmins all work on proprietary stacks at FANG now
And the sysadmins are doing pretty well I am sure, they'll pitch in here once they find their way out of Terraform's documentation.
"The Cloud" (presumably AWS/GOOG/AZURE) has become very expensive and very complex.
In the “roll your own, close to the metal” world this is a pretty sane opinion when the alternative is like, spark or hadoop or something that suck up ops time.
This was the giveaway for me:
Don't go to the cloud.
It will force you to... over-complicate your infrastructure to incredible degree.
It is truly a piece of shit and will just force you to design systems in a horrible way."
I've never seen invalidation in decades. You just have unique names.
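For anyone who hasn't seen the "unique names" approach to avoiding cache invalidation: one common way to do it is to put a hash of the file content into the filename, so a changed file gets a new URL and the old cached copy is simply never requested again. A minimal sketch, assuming SHA-256 and a 12-character prefix (both arbitrary choices of mine, and `hashed_name` is a hypothetical helper, not something from the article):

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 12) -> str:
    """Return filename with a content-hash prefix inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    # e.g. static/app.js -> static/app.<digest>.js
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = hashed_name("static/app.js", b"console.log('v1');")
v2 = hashed_name("static/app.js", b"console.log('v2');")
print(v1)
print(v2)  # different content, different name, nothing to invalidate
```

The HTML that references the asset is then the only thing that needs a short cache lifetime.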
As a Director of Technology for a real company, I have never, nor would I ever dream of even considering moving to bare metal for ANYTHING mission critical. The sheer number of possible catastrophes that can occur at the infrastructure level would have me shitting my pants 24/7. Could we hire someone (or a team) to do it? Yes. But EC2 and similar infrastructure beats the pants off those salaries any day.
Yes cloud is expensive in a raw performance per $ sense, but that isn't its value proposition. Most workloads aren't limited by the processing power that you can buy anyway. The draw for cloud is that it abstracts away entire classes of problems. Certainly it provides new problems as well, but those can be worked on in concert with everyone else working on the same platform.
That's not the deal for most startups, though. You're optimizing for the search process. So optimize for the search process. Even after you've found some fit, you're still searching and attempting to grow. Everything that cuts friction from ideation to sale is pure gold. That's why the cloud rules.
I don't figure out how to host the right-sized server with the right infra. I put it on GCS and BigQuery that shit. Or I snowplow into S3 and then Redshift Athena it. Two days to discovering product doesn't work.
If you've ever worked at a big company, you'll know that they have strict IT ops and then a shadow IT operation that manages to be every growth opportunity that gets late merged into the real thing. You want to bypass that. You want to enable people to build things they want to build and sell those things while having automatic best practices. That's the magic of the cloud: instantaneous infrastructure, instantaneous tooling.
$5k is nothing. For the gain I get from that I'd pay out of my own pocket.
On Cloud vs baremetal arguments. I respect that people find cloud works and is valuable for them. This is in spite of my own personal experiences with cloud, direct or indirect, having been adverse. I agree with the author: for me, for dollar and time effort, baremetal is just better on both economics and performance.
Most recent. A web service I use heavily migrated from bare metal to cloud and their performance / reliability dropped significantly with numerous outages which tied me up on the phones to my clients. I gave them 6 months to get their act together, as a loyal client then gave up and moved elsewhere. Being a reasonably big client their sales tried to retain then win me back. Conversations were always the same:
Me: Moved because you guys stopped being reliable, reliability is important to me. Ever since you moved to AWS you have been unreliable.
Sales: we had to move to scale and grow
Me: what's the point in moving if moving weakens your product. there are other ways to scale.
Sales: we had to move to scale and grow
Before, just getting someone to run down a blinking disk light on a server was a chore. Getting them to rack a new server took several days or weeks. Doing anything, any changes, took several days minimum. Now, we can do it all much faster, and from all accounts of my management, it's cheaper.
I don't doubt you can run your own data center cheaper, but not for my company.
At least he's consistent.
It all just depends on the context. Going to the cloud can really make a lot of sense. For example, if you are a startup that is targeting a mass market and need to scale quickly in multiple regions... Or, if you are the CTO of a large company and don't want to spend time and focus on running your own data center and everything involved with it...
There are also many situations where running your own colocated dedicated server is the best choice. Keeping it lean and mean and removing as many dependencies, layers and as much complexity as possible does have benefits.
It is hyperbolic to state that something is just shit for everyone in all circumstances.
This ignores the pay of the extra people you need to build/maintain your infrastructure.
Note to bloggers: put dates in your damned posts.
Sounds like this person's job is being made redundant by clients moving to the cloud and he's not adapting to the market.
- disaster recovery is the first thing you should have in mind, where is the second.
- the right strategy is to preserve the business, not going to the cloud.
I always had fun building my own environments. After the fun came the work and, economically speaking, managing and maintaining those environments has always been an advantage for me. If however, I consider the time it takes to understand and fix problems in VMWare, or SQL optimization, or configure Microsoft AD, Nginx, and others..., I don't know how much I would recommend building everything from scratch by renting some servers.
But yeah, sure, the cloud is a piece of shit. Just a beautiful, magical piece of shit.
Managed services aren't perfect and I've seen them go bad, esp for things like failover. When they hiccup (either perf-wise or downtime) it's hard to debug effectively depending on what logs / stats the vendor makes available.
Most of the small/medium sized companies I've consulted for could have been easily served by a triple-redundancy Hetzner dedicated server setup for 200 EUR/month. Yet they chose AWS and paid thousands per month because "that's what everybody does". I have not regularly encountered actual calculation or other analysis as basis for this decision.
Many people here and in industry seem to think that it's either AWS or "build your own datacenter or colocation and maintain machines". Be reminded: This is incorrect. The post recommends renting dedicated servers from a server provider (in this case Hetzner), which includes all hardware handling personnel, hardware replacement, power, and so on, for 50 EUR/month per big-sized server.
Others complain about the "too many points in the list". In practice, all these points have to be addressed analogously in an AWS deployment as well, and require approximately as much knowledge as the "sysadmin you'd have to hire". It's a myth that you can just use AWS without reasonable sysadmin knowledge. As a "DevOps engineer" building sophisticated projects on AWS, you usually have to have a superset of typical sysadmin knowledge.
Regarding scalability: For me, Hetzner has reliably fulfilled orders for new dedicated servers within 15 minutes (!) to ssh. In contrast, I've had cases where the AWS support request to raise the limit of 0 big machines in a region to 1 has taken more than 2 business days ("a request of this size has to be escalated").
With a reasonable dedicated server setup, many companies don't need to scale for a while, because you can go a long way with ~5000 SQL inserts that a modern dedicated machine can achieve on an SSD.
Again others highlight the cloud for the available internal bandwidth. Note Hetzner gives you 10 Gbit/s as a 39 EUR/month upgrade. Such bandwidth is reliable; I have measured that I can reliably transfer at Gigabit speed even across the oceans.
A 1 Gbit/s dedicated server with unlimited traffic for 50 EUR/month (at Hetzner) can transfer ~260 TB per month egress. Per AWS calculator, this would cost 18700 USD there -- that's a factor of 375x on traffic cost.
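The traffic figures above are easy to sanity-check. The sketch below converts a link speed into monthly transfer volume and then takes the ratio of the two prices quoted in the comment. A 1 Gbit/s link at full line rate moves 324 TB in a 30-day month, so the ~260 TB figure corresponds to roughly 80% sustained utilization; that 80% is my own assumption, not something the comment states. The ratio also treats EUR and USD as interchangeable, as the comment does.

```python
SECONDS_PER_DAY = 86_400


def monthly_transfer_tb(link_gbit_per_s: float, days: float = 30,
                        utilization: float = 1.0) -> float:
    """Terabytes (decimal, 10**12 bytes) moved in `days` at the given utilization."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY * days * utilization / 1e12


line_rate_tb = monthly_transfer_tb(1.0)                   # full line rate
sustained_tb = monthly_transfer_tb(1.0, utilization=0.8)  # ASSUMED 80% sustained
cost_ratio = 18_700 / 50  # quoted AWS egress bill vs quoted server price

print(f"line rate: {line_rate_tb:.0f} TB/month")
print(f"at 80%:    {sustained_tb:.0f} TB/month")
print(f"price ratio: {cost_ratio:.0f}x")
```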
In general, in my experience, the factors you pay for common cloud providers over rented dedicated servers are:
* 10-20x for compute
* 2-4x for storage (this computation includes multi-AZ redundancy)
* ~300x for traffic
The article's point about not being able to use strace/gdb/iostat to debug problems on hosted services is also a good one. For example, we have found AWS EFS to have abysmal performance and could not address this problem. Using CephFS on dedicated infrastructure instead, any bottlenecks could be quickly debugged and performance was great.
Similarly, the point about simplicity is good. Having just simple Linux processes running under systemd on a machine, instead of processes-in-docker-on-kubernetes-in-VMs-on-EC2, removes complexity and makes systems many times simpler and easier to debug, and thus much faster to iterate on.
The only point of the article I don't quite agree with is to not use CDNs. If you want fast loading times over HTTPS (multiple roundtrips), you need to use a CDN, because of physics. (But you can build a CDN with dedicated servers very easily.)
I have successfully run a triple-redundancy dedicated Hetzner setup over the last 2 years (managed as infrastructure-as-code using NixOS) with a High-Availability database (postgres with Stolon), High-Availability file system (CephFS), systemd and round-robin DNS. So basically, everything the post recommends. It's been the simplest to manage deployment I've used for years, compared to many cloud setups I've worked on. It's had higher uptime than AWS in the last 2 years. It has extremely predictable performance. It costs 200 EUR/month. It's a pleasure for me as a programmer.
In summary, I think the article is spot-on (even though it uses odd language).
There remains one significant benefit of big cloud providers like AWS: Being able to spawn whatever you need via /one/ unified API in very many locations across the world. If you need something that must be distributed across as many data centers as possible, this is quite a plus. But very few people actually need this.
Our very early, small startup (4 engineers, low six figures ARR) accumulates $4,000+/month in cloud bills. It's very difficult to manage cloud costs on a small engineering team.
One of our recent examples is with SQS. Our SQS bill is something like $50/month. We process ~100k messages every month. Why is it so expensive, then? It isn't because it's processing tons of messages; it's because we use open-source polling libraries that are pretty aggressive with their poll rate (then multiply that out by N instances of the poller and there's your bill).
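To see how polling alone can produce a bill like that, here is a back-of-the-envelope sketch. It assumes a flat price of $0.40 per million requests (my assumption based on the standard-queue list price, ignoring the free tier; check current pricing) and a 30-day month. The poller counts and rates are invented for illustration and are not from the comment.

```python
PRICE_PER_MILLION_REQUESTS_USD = 0.40  # ASSUMPTION: flat rate, no free tier
SECONDS_PER_MONTH = 30 * 86_400


def monthly_polling_cost(pollers: int, polls_per_second_each: float,
                         price_per_million: float = PRICE_PER_MILLION_REQUESTS_USD) -> float:
    """USD per month spent on receive requests, whether or not they return messages."""
    requests = pollers * polls_per_second_each * SECONDS_PER_MONTH
    return requests / 1e6 * price_per_million


def requests_for_bill(bill_usd: float,
                      price_per_million: float = PRICE_PER_MILLION_REQUESTS_USD) -> float:
    """Number of requests per month implied by a given bill."""
    return bill_usd / price_per_million * 1e6


implied_requests = requests_for_bill(50.0)
print(f"$50/month implies about {implied_requests / 1e6:.0f}M requests")
print(f"that is about {implied_requests / 100_000:.0f} requests per message processed")

# Hypothetical fleet: 10 pollers, each polling 5 times per second.
aggressive = monthly_polling_cost(pollers=10, polls_per_second_each=5)
# Same fleet with 20-second long polling on a mostly idle queue (1 request per 20 s).
long_polling = monthly_polling_cost(pollers=10, polls_per_second_each=1 / 20)
print(f"aggressive polling: ${aggressive:.2f}/month")
print(f"long polling:       ${long_polling:.2f}/month")
```

Under these assumptions nearly the whole bill is empty receives, which is why tuning the poll rate (or using long polling via `WaitTimeSeconds`) moves the number so much more than message volume does.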
I find it fun to think about "what could I get for $50/month" in regards to a queuing solution. As an extreme counterpoint, Redis running on a raspberry pi in the closet would be a rounding error in our cloud bills over the span of a year. The gray area between these two extremes is where the value lies, and I think that's the author's point: redis on a VPS at Hetzner or DigitalOcean. Billions upon billions of messages processed every month at a fraction of the cost.
So, why pay AWS? There's not NO reason. Reliability is the biggest one I think. But herein lies what I believe to be the intrinsic problem with the cloud: No one can put a price on reliability. If I pay an extra $10k a year to be 99.99999% available (~3s/year), versus 99.99% (~1hr/year), how much extra value does my business receive? No one even CARES what that number is. They just know that more nines is better, so we gotta pay it. But we won't pay too much for it. At some point, it crosses some TOTALLY arbitrary threshold in the CTO's head where it's "too expensive" and we need to reduce costs. But, then, we're so bought in that we can't. We can tweak poll rates, we can buy reserved instances, maybe see a 20% reduction, but the pattern is set. We're just a serf to Amazon now.
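For reference, this is how an availability percentage translates into allowed downtime. The sketch assumes a 365-day year and treats availability as a simple fraction of wall-clock time, which is the usual convention but not necessarily how any particular provider's SLA is measured.

```python
SECONDS_PER_YEAR = 365 * 86_400


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds per year a service may be down at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for nines in (99.9, 99.99, 99.999, 99.99999):
    seconds = downtime_seconds_per_year(nines)
    print(f"{nines}% -> {seconds:,.1f} s/year ({seconds / 60:,.1f} min)")
```

Four nines comes out a little under an hour per year and seven nines at a few seconds, so the question in the comment stands: what is that difference worth to the business in dollars?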
The other quoted reason is always "the cloud is easier". "You'll save money on HR." "Your engineers can focus on the product." I think Amazon has shouted this line so much from the rooftops that even very smart engineering managers have begun to believe it. Heroku was really the last platform I've used that felt like it could fulfill this promise to any degree of meaning. It really did everything, but AWS has managed to create a network of services that is so fundamentally complex that even when they release services to "simplify" things they actually create more complexity (beanstalk, amplify). You're not getting around having an engineer spending a decent chunk of her time managing this complexity (building CI pipelines, managing IAM permissions, alerting strategies, centralized log ingestion, autoscaling, cost management, encoding everything into CloudFormation, ugh maybe terraform will be easier, figuring out the madness behind VPCs because some random CF stack you found that you thought could automate something for you just spat out an egress-only internet gateway and what the fuck even is that I'm just one person why did I need that).
I've been working in AWS since... the beginning. There are three good AWS services: S3, SQS, and EC2 (in that order). Everything else ranges somewhere between "Fuck it, I guess I'll use cloudwatch" and "what was the rekognition team smoking when they built this, and are they getting a daily dose of it through the vents because it's been years and it's still not better."
The best part about those "good" services is that they're all solved problems available on a wide variety of cloud providers. Block Storage, Queues, and VMs. I think that's what the author is getting at, in a weird way. The cloud is fine. Managed services on top of open source software is probably fine. Even closed-source managed services can be fine, if they're so stupidly simple that you could explain it to your 8 year old and he'd understand it. But the vast majority of AWS (or Azure, or Google Cloud) is none of these things, and that's the stuff that you should avoid. Or, if you want to save some money, head over to Hetzner or Digital Ocean.
https://github.com/jackdoe "old man yells at cloud" - I loled
I run big batch jobs and setting up a full cluster filesystem and fast network is non-trivial.
Recently, using S3 with VMs that have 25Gbit network, I can easily make 'aws s3 cp' (after tuning) generate 10+Gbit of traffic reliably.
Don't go to the cloud you're such a good sysadmin aha
I would say - run everything in Kubernetes (cloud or not cloud). Change your infra layer to use Kubernetes objects.
Then you minimise the problem to: where is my Kubernetes cluster? On cloud, on prem, I do not care.
This seems to be a massive misunderstanding of what "cloud" even is.
Web searching shows a few but I'd love some recommendations. Ideally an online spreadsheet, but excel would do too.