Nowadays internet speed is great to do self hosting. I have a business line internet at home with ~1gb up&down! Bought couple of 6-7 year old enterprise Dell servers (2x12core xeon, 128gb ram each) and no longer pay any cloud provider ... i'm also hosting 2 backend solutions for mobile apps with decent traffic for friends' startups!
The learning experience has been tremendous! It has actually gotten a lot better and easier with new solutions coming out for homelabs. Get started with Proxmox clusters and go from there...
One thing that came up is development. Modern devops culture is quite a good thing, and what's lovely about "cloud" - as in the ability to quickly buy compute and storage capability - is that ideas you would have tinkered with in on-prem labs (or across private sites) for months can be imagined and prototyped in hours.
I'm a big advocate of rapid prototyping as a _huge_ business lever, because the ability to try out ideas quickly, to easily reconfigure things, is the key for time to market. You can quickly see if something is going to fly or not.
And that's where the advantage ends.
After that, it's all downhill. Asymmetry. Lock-in and portability. Trust and privacy issues. Security perimeters. Unpredictable costs....
So the way forward is to render unto Caesar only the things that are Caesar's.... in other words, take the advantages of "cloud" when it suits you, and then get the hell out of Dodge.
What is ongoing from that conversation is media companies being interested in strategic planning to build, and even share, their own distributed computing resources to pull back to once a technology is off the ground.
Someone even mentioned that it's time for a European Cloud initiative,
IMO it should be a sneaky powerful declaration by major corps that your app should be built to be deployed nearly at will on at least two clouds. I mean terraform is so tantalizingly close to it, until it isn't. This is like Bezos sending out the "thou shalt service everything".
AWS knows this and they are all about lock-in. They want you on the more complicated products, because those are really hard to move off of. Oh yes, don't use cassandra, use dynamo. Man you'll never move off that.
So if you let the devs have "you can develop on AWS" but then they have to deploy on Hetzner ... that will force the devs to be far more cloud-independent. I guess if I was a CIO (never let me become one) I'd try to institute that.
Too many businesses aren’t even properly utilising that key advantage. They’re moving servers to the cloud but still using their outdated development and deployment processes, and things move just as slowly in the cloud as they used to on prem. They know what Infrastructure as Code means, but only as separate words.
For many non-tech corps, the purpose of moving to cloud is to downsize IT admin staff. It works well. I spent most of my career in large enterprises. The leverage you have against AWS or Microsoft is 0 compared to the old days. They are probably landing more infrastructure every month than my global company had in datacenters 15 years ago.
You can just have on-premise k8s and keep most of the velocity gained from developers being able to "just run stuff" instead of anything having to go thru sysadmins.
You can just rent a few servers off OVH to start and not have to worry about actual hardware, while still being a few times cheaper than cloud.
Yeah, you won't have access to the slew of cloud services and will have to deploy your own database, but with the amount of readily available code and software to do it, it doesn't really slow down experimenting all that much.
You can deploy bare k8s, but then you'd figure that you need a lot more, starting with a load balancer (luckily there is MetalLB).
It's all possible, but not simple.
With stuff like proxmox you get a pretty similar level of ease of use to managed VM services too.
Also there were plenty of upstream routing issues where solving that became a headache. The #1 thing we wanted was uptime and the #1 outage was our upstream providers having trouble routing to other upstream providers.
The number one reason and tradeoff for cloud is uptime and availability and the cost of not having it
If I really wanted to optimize for power efficiency, I could do much better. I've seen decent homelab setups (with NAS, router, switch, and some slow compute nodes) that run under 100W, which would cost me only $10 per month in power, and would be far more powerful than a small DigitalOcean droplet.
I do some homelabbing at home and i do work for some "big tech". The difference, essentially, is reliability and high availability.
Most homelab posts i see are one decent (not even large) disaster away from losing everything.
I do something in that space at home, mostly around data backup and replication, but i am well aware that in case of decent disaster I'd probably be at least a couple of days offline (potentially up to one or two weeks).
Most people underestimate this facet of the discussion.
I'm using about 60-100W for my home-prod, and a lot of it is "older". I'm running about 15 small VMs at any given time these days, and probably 20 containers.
I think my biggest single draw is the Mikrotik CCR1036 in the garage, but it saved me from buying new gear. Sure there's a break even point with hydro, but that's years in the future when the device is free. It's also pretty fun to watch VPN connections testing at 700Mbps from home.
I don't really care about uptime, and I've got gigabit fibre to the house, so bandwidth isn't a huge problem. It worked fine on 300/60Mb cable too.
Ryzen 3 2200G, 32GB RAM, 1T NVMe, 10TB HDD. This one runs services.
Orange Pi 5 16GB with another 500GB NVMe. This runs redundant services and monitoring.
Next time I have the energy (hah) for this flavor of home maintenance, the idea is to split the work it does off to a few fanless systems. I'm pretty sure I can knock at least 100W off that. Main challenge there is storage - I have a SAS shelf and need a low-wattage machine that speaks SFF-8088.
Experimented some with a couple Raspberry Pis for some things, but they just don't seem built to run 24/7. One lasted about 4 months, the other died at about 12. (They PXEbooted, no local storage, it wasn't that.)
I think even with the cost of electricity, you can easily beat cloud hosting on a per-month basis. But, factoring in the initial cost of hardware and electricity, it's probably a wash.
But then, if you're running a hypervisor, and would otherwise have a LOT of VMs in the cloud, maybe it swings back the other way?
That’s a UniFi Dream Machine Pro, UniFi 24 port switch (powering two APs), 3x Dell R620s with a few SSDs and NVMe, and 2x Supermicros (one of which is the aforementioned backup server), each with a lot of spinners. Also some additional load from the overhead of the rack UPS.
I pay about $0.08/kWh, although with the base fee of $40 it's more like $0.11/kWh. In any case, it means I pay maybe an extra $30-40/month for my homelab, plus whatever additional heat load costs it places on my A/C.
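For anyone who wants to redo this arithmetic for their own setup, here is a minimal sketch. It assumes a 730-hour month; the 400 W average draw and the 1,333 kWh total household usage are hypothetical values picked only because they are consistent with the figures above, not numbers taken from the comment.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12


def monthly_kwh(avg_watts: float) -> float:
    """Energy used in a month by a load with the given average draw."""
    return avg_watts / 1000 * HOURS_PER_MONTH


def monthly_cost(avg_watts: float, price_per_kwh: float) -> float:
    """Monthly electricity cost of a constant load."""
    return monthly_kwh(avg_watts) * price_per_kwh


def effective_rate(marginal_rate: float, base_fee: float, total_kwh: float) -> float:
    """Per-kWh price once a fixed monthly base fee is spread over total usage."""
    return marginal_rate + base_fee / total_kwh


if __name__ == "__main__":
    # Hypothetical household total of 1,333 kWh/month (assumption, see above).
    rate = effective_rate(0.08, 40.0, 1333)
    print(f"effective rate: ${rate:.2f}/kWh")
    # Hypothetical 400 W average homelab draw (assumption, see above).
    print(f"homelab cost:   ${monthly_cost(400, rate):.2f}/month")
```

Whether you price the homelab at the marginal rate or the effective rate is a judgment call; the base fee is paid either way, so the marginal rate is arguably the fairer number for "extra" cost.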
If I moved to somewhere where electricity was significantly pricier, I would probably either invest in home solar, compress compute to a single node, or both.
That's the main advantage of the cloud early on -- flexibility.
Sit back and relax? Being massively overprovisioned is a benefit of homelabs.
there are many dimensions to provisioning, not all of them are one ebay/amazon/newegg purchase away.
you could hardly get a symmetrical 10 gbps internet connection at home (in most places), and if you do it would be unlikely to be timely (and in that case, your business could probably be suffering).
Frankly, i think that the time when your startup is taking off might be the right time to start thinking about moving to the cloud (or to a proper datacenter).
If anything, if your startup is taking off then you're starting to get a real sense of what kind of compute and storage you actually need, and can maybe negotiate accordingly (eg: a long-term commitment for resources in some clouds gives you very relevant discounts).
EDIT: regarding the internet connection... on a consumer connection, most contracts include a minimal guaranteed bandwidth that's usually way lower than the advertised peak bandwidth. i wouldn't be surprised to discover people getting throttled at those speeds if they start getting serious traffic...
What if your internet goes out? Even with a business line, I've had to wait five days for them to replace a fiber line that a squirrel chewed through.
What if the power goes out? I just had a five hour power outage. Even if you have a battery backup, when the neighborhood power is out for a while, the ISP equipment will die when its batteries go out.
What if your hardware dies and you aren't home to switch it out, assuming you even have spare hardware?
What if your A/C goes out and your server overheats and has to get shut down?
All of these are things you usually don't have to deal with when using the cloud or even a $5 VPS, because they design for all of these failure cases from the start.
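To put rough numbers on those failure cases, here is a small sketch that turns outage durations into yearly availability. It assumes, purely for illustration, that the five-day fiber cut and the five-hour power outage mentioned above landed in the same year; the function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365


def availability(outage_hours: list[float]) -> float:
    """Fraction of the year the service was up, given outage durations in hours."""
    return 1 - sum(outage_hours) / HOURS_PER_YEAR


def allowed_downtime_hours(target: float) -> float:
    """Yearly downtime budget for a target availability such as 0.999."""
    return (1 - target) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Assumption: both outages fall within one year.
    outages = [5 * 24, 5]
    print(f"availability:  {availability(outages):.2%}")
    print(f"99.9% budget:  {allowed_downtime_hours(0.999):.1f} h/year")
```

A single multi-day outage already uses up many times the downtime budget of a "three nines" target, which is the gap this comment is pointing at.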
If you're running a business from your house, it is by definition a lifestyle business, and that's not really what we are talking about here.
BTW, I can get multiple lines if i'd ever need it
At what point does a dynamic-but-unchanging IP become functionally static?
I feel like the biggest difference is the fact that there's no guarantee that the dynamic IP won't change, so all systems need to be prepared for that, or you need to be mentally prepared for that day.
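Being "prepared for that" can be as small as a periodic check that only touches DNS when the address actually changes. A minimal sketch follows; `get_public_ip` and `update_record` are hypothetical callables you would supply yourself (for example a router query and your DNS provider's API), not real library functions.

```python
from typing import Callable, Optional


def sync_dns(
    get_public_ip: Callable[[], str],
    update_record: Callable[[str], None],
    last_known: Optional[str],
) -> str:
    """Update the DNS record only when the public IP changed; return the current IP."""
    current = get_public_ip()
    if current != last_known:
        # First run (last_known is None) or the ISP handed out a new address.
        update_record(current)
    return current


if __name__ == "__main__":
    updates: list[str] = []
    ip = sync_dns(lambda: "203.0.113.7", updates.append, None)  # first run: updates
    ip = sync_dns(lambda: "203.0.113.7", updates.append, ip)    # unchanged: no update
    ip = sync_dns(lambda: "203.0.113.9", updates.append, ip)    # changed: updates
    print(updates)
```

Run from cron or a systemd timer, this keeps a dynamic-but-rarely-changing address usable, with the caveat that clients see stale DNS until the record's TTL expires.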
* Egress pricing margins ensure lock-in which makes it hard to build competitors to “commodity” services (eg you can’t spin up your own price-competitive S3 within AWS). Lack of competition means less innovation and more expensive pricing.
* While compute costs don’t necessarily come down, the CPUs get more powerful. At scale, this should be the same as prices coming down. However, it’s not exact and cloud providers pocket this difference as profit / R&D investment.
* SRE costs are a huge chunk of own infra (managing servers at scale). If you’re small, this is a negligible cost. If you’re a large business this is a huge cost. Cloud providers target large businesses so the savings when you’re smaller are less obvious.
* Elasticity is a huge part of cloud capabilities. Most people use dynamic paygo pricing which is more expensive than baseload demand, which cloud providers typically discount because baseload revenues can be used for purchasing additional capacity.
- an Intel 3770K was released in 2012 for $330 retail, or 20 CPUMark/2012$.
- an AMD 7800X3D was released in 2023 for $450 retail ($330 in 2012 dollars), or 105 CPUMark/2012$ or 77 CPUMark/2023$.
Looks like consumer performance/price has improved by 5.3x.
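The ratio is easy to re-derive from the per-dollar figures quoted above. A minimal sketch, using only the numbers in the comment (the helper names are made up for illustration):

```python
OLD_PER_DOLLAR = 20.0   # Intel 3770K, CPUMark per 2012 dollar (from the comment)
NEW_PER_DOLLAR = 105.0  # AMD 7800X3D, CPUMark per 2012 dollar (from the comment)


def improvement(old_per_dollar: float, new_per_dollar: float) -> float:
    """How many times more benchmark score one dollar buys now."""
    return new_per_dollar / old_per_dollar


def to_nominal(per_real_dollar: float, real_price: float, nominal_price: float) -> float:
    """Convert a per-inflation-adjusted-dollar figure into a per-nominal-dollar one."""
    return per_real_dollar * real_price / nominal_price


if __name__ == "__main__":
    print(f"improvement: {improvement(OLD_PER_DOLLAR, NEW_PER_DOLLAR):.2f}x")
    # $330 in 2012 dollars corresponds to $450 at release, per the comment.
    print(f"per nominal dollar: {to_nominal(NEW_PER_DOLLAR, 330, 450):.0f} CPUMark/$")
```

The exact quotient is 5.25, which the comment rounds to 5.3x.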
I'm not sure how exactly the math works out here for cloud hardware, since I don't really have any performance numbers. I'd still expect it to be quite a bit less improvement, since it "feels" like server CPUs have gotten better at cramming more cores on a die, rather than increase each core's performance. Looking at Xeon Platinum 8480+ (2023, MSRP $10k), there's 56 cores@3154 ST CPU Mark, vs. Xeon E7-8890 v2 (2014, MSRP $6.8k) has 15 cores@2175 ST CPU Mark.
Assuming your cloud prices are correct, and with these rough performance figures, you're getting a 4.5x cloud performance/price improvement, significantly worse than with real hardware.
So you can't factor in things like inflation and hardware performance, which equally made self-hosted solutions cheaper.
There are three companies doing monopolistic things, they have got away with it because of that plurality.
It wasn't in the past.
It wouldn't be if they had to compete with providers that also wanted to compete.
if not, then instead of migrating just negotiate.
- there are a lot of companies that nowadays operate their own computing infrastructure.
- there is a lot of space in the middle, you are not forced to choose just between full self-host and full-cloud, there is colocation, hosting, providers like Hetzner, DigitalOcean and so on.
You can pay for a lot of cloud resources for that much, even when the cloud resources are massively overpriced compared to what you could manage if you had more scale.
The only other alternative is finding someone who's good enough at it to make that full time sysadmin job just a small part of their "actual" job responsibilities, but that person is pretty hard to find.
The cloud promises to make admin tasks easier, however I have never seen it eliminate the role of a sysadmin in practice. In my experience, most organizations that run their infra on a cloud still have dedicated admin roles (often called "DevOps" or infra teams).
Hence, I think that the claim that a sysadmin's salary is an extra expense for self-managed infrastructure is exaggerated. You may need more sysadmins for achieving the same features in a self-managed setup compared to the cloud, but it is not a linear scale.
Then you get a step function cost increase building and operating all of the services your developers aren’t getting out of the box.
IMHO, the biggest wins for cloud are when you can't fill up a whole machine and when your needs vary significantly throughout the day and you can scale up and down quickly and not have to pay for unused capacity.
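A toy model makes the "varying needs" point concrete. This is only a sketch: the hourly demand profiles, the unit prices, and the 3x on-demand premium are all hypothetical numbers chosen for illustration.

```python
import math


def fixed_cost(demand: list[float], price_per_unit_hour: float) -> float:
    """Cost of owning enough capacity for the peak hour, paid around the clock."""
    return math.ceil(max(demand)) * price_per_unit_hour * len(demand)


def elastic_cost(demand: list[float], price_per_unit_hour: float) -> float:
    """Cost of paying only for what each hour actually uses."""
    return sum(demand) * price_per_unit_hour


if __name__ == "__main__":
    spiky = [1] * 20 + [10] * 4   # hypothetical: quiet day, 4 busy hours
    flat = [5] * 24               # hypothetical: steady load all day
    for name, profile in (("spiky", spiky), ("flat", flat)):
        owned = fixed_cost(profile, 1.0)      # hypothetical own-hardware price
        rented = elastic_cost(profile, 3.0)   # hypothetical 3x on-demand premium
        print(f"{name}: owned={owned:.0f} elastic={rented:.0f}")
```

With these made-up numbers the elastic option wins for the spiky profile even at three times the unit price, and loses badly for the flat one, which matches the intuition in the comment.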
As long as you know your predicted growth yes. Cloud providers are operating "efficiently" because they do oversubscription on a lot of services. You don't really think that those 8 vCPUs are dedicated to your VM only, do you?
Also, the thing with cloud are the managed services that make it easy for developers to process/exchange data, like Amazon SQS or cloud functions and things like this.
Like the first comment says, if you know what you are doing, it is quite easy actually to host your infrastructure on-prem.
On one hand I am very thankful that this caused a boom of innovation and tooling to make engineering operations easier and more accessible to our industry as a whole. If I had the tools I have now, managing 50+ racks in my old datacenter would have been so much easier.
But on the other hand, the promises the Cloud providers have shoved down our throats have been used against our Industry with raised prices, reduced freedoms on what providers you can choose, and consolidation of duties on Engineering staff. The average developer has to do more work, across many different domains to get paid well. The war waged to kill off Systems Administration as a core job responsibility and using the DevOps movement to kill FTE count for specialized work on bare metal or datacenter infra keeps us locked into using Cloud APIs to do our jobs. I think this really sucks.
I do think this article hits home on some points I have been seeing over the past few years. The costs are too high for the goods provided in comparison to getting a couple of racks in a few datacenters and DIY, which can for sure work for the right kinds of companies at the right stage of growth. I also think Hybrid deployments of Cloud + Bare Metal in a datacenter is so much more of a viable option for less mature companies. The pendulum is starting to slowly swing back to doing things on Bare Metal.
I am excited for the next 10 years of progress and I hope the Cloud still remains a powerful tool in our toolbox, but it isn't used for everything. The cost savings are too real as this article points out when you don't use the cloud and you'd be surprised how much easier it is these days to get some bare metal deployments going in your eng org.
I love that we can use Cloud APIs to do our jobs. That's piles better than negotiating with (and waiting for) an Ops team to provision you hardware or networking that you don't have much visibility into and then relying on them to troubleshoot.
I don't mind paying more for that (which you absolutely do on a per-unit of compute, network, or storage).
I'd argue you have less visibility on what is going on at a cloud provider than you did in the past with your Ops team. Support tickets without a massive yearly spend at a cloud provider are pain and suffering. Even with over 1 million/year spend at one of my previous companies, support and visibility on underlying issues was a very poor experience.
Even with that, AWS's out of the box offering utterly dominates what we could offer as a medium-sized operation. You might have 5 arbitrary points' worth less visibility behind the scenes, but you need 15 fewer points of visibility behind the scenes. (We're also on enterprise support and have an excellent support & solution architecture team attached to our account, but even for my side projects where I have no support plan whatsoever, I have enough visibility into how to make things work.)
I don't think anyone wants to rely on a human to unload a rack from the car, walk into a data center and install it.
Doesn't mean that Lambda and hosted kubernetes should be our only alternatives.
> I don't mind paying more for that
I mean, it sounds like this is all through work. So regarding your second point about being okay paying more: are you the one cutting the cheques?
This phrase is key:
"You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it"
Perhaps I'd say that "you're crazy if you keep all your IT on it, as you scale".
There's still a case to be made for a small % of your workloads in the cloud. The flexibility is priceless. But for everything else, there's a well managed on-prem.
Side note: interesting how, after Marc Andreessen's disastrous post on techno-optimism a few days ago, I approached the reading of this article with rolled eyes and low expectations. Seeing Martin Casado as the author immediately changed it, however.
Spot on. Gonna have to steal that for my next meeting.
Although, given the potential Cavium-Marvell TPM issue viz ECC and that we're not really there with homomorphic compute, I might make a caveat for developers working on very sensitive new ideas; Keep it on-prem until you have a good enough security segmentation to push out what you're happy with on untrusted cloud infra.
And by scale, I think companies should be at least in the double digit millions spent in cloud infrastructure before they start building out their own. Everybody is used to the cloud APIs, and the old days of managing things in a static datacenter are not coming back for these companies.
This doesn't help much.
It doesn't talk about hybrid.
What about AI? Do I go buy a bunch of H100s at peak price when something better/cheaper might be out in 3 years? What if my load is spiky?
It should be perhaps "You're crazy if you are not strategic with cloud vs. on premise spending"
That’s begging a number of questions, however - the number of places who can approach a cloud service portfolio at all, much less with a cost savings or security parity, has to be at least an order of magnitude smaller than the number of places who incorrectly thought they could.
Off-cloud (CDN, hyperscale compute, *aaS, on-prem): fit on a $5 VPS; have simpler, neat 12-factor PaaSy container workloads (Fly/Vercel); well-defined heavy workloads; heavy egress (use CDN); heavy non-interruptible AI/specialist compute; large storage (CDN for objects) or compute-local with high IO (colo, on-prem); HA with more effort; specialist DB or boutique DBaaS (Supabase); portable security (k8s). Iff you have invested in portability (containers, k8s, FaaS frameworks, interruptible workloads, high-level APIs), HA testing and appropriate security.
I find that quite hard to believe.
I mean, I can see that if you were selling cloud backup services and stored customer data on S3, I can understand S3 costs being a big part of your budget.
But for the vast majority of businesses - I'd expect a supermarket selling a $50 basket of groceries to spend maybe $0.01 on database storage and CPU and whatnot.
I don't see that a company like Slack, which has revenue of about $8.75/user/month would need to spend anything like that much on cloud costs.
If you are selling just software and 80% of that is on the cloud you are undercharging or wasting crazy resources.
Best in class SaaS costs are like 10% of revenue
I started helping maintain https://ec2instances.info in my spare time and the code base is literally full of IF statements to paper over AWS billing quirks. Later, I joined Vantage, one of the companies linked in the article.
It's a little bit undecided whether finops will have the growth that devops did but the problem is seemingly felt very acutely.
absolutely. massive struggle at the orgs I've been / am at.
processes for provisioning new resources haven't changed, and a lot of teams kind of push their own builds, or get reservations for X amount of space and dollars and then they can play with their space.
the result is OpEx shifts wildly from one quarter to the next, and control has been hit or miss. it continues mostly because projects need to deliver -- and do -- but that "do what you gotta do to make it work" approach has turned basically into the wild west and no one plays by the rules.
If you are a changing, growing, non-technology company then cloud is the place you should be. You will get screwed by Oracle, you will get screwed by Accenture. You will not be able to optimise your workload and you may as well get screwed by aws/azure/gcp and not think about infrastructure that much instead.
However - this would take a huge amount of upfront investment, have high risks for black swan events against 5y contracts without crypto-style escrow, and I have been unsuccessful pitching this direction in elevators to folks, which makes me think I don't have a visceral enough idea.
[0] https://lambdalabs.com/blog/cutting-the-cost-of-high-perform...
Consider how much premiums would be on $TSLA put options that expire in 2028.
What about a good way to judge the cost of the cloud? Pricing has always been overly complex and arcane and full of gaps in understanding the overhead involved.
I suspect fewer cloud migrations would occur if companies could map their onprem costs to specific products much more easily rather than going "Oh it might be a lot for this team but who knows?", which opens the room to someone to sell the pricing transparency use case to an exec.
It adds a bit of ops complexity at the start though. But running your own rack starts to make sense when your AWS bill is at least the same order of magnitude as an SRE salary; at that point, you already have needs that are varied enough.
We’re going to be running all of our own dev/test GPU servers with only inference hosted outside our cluster.
I mean, Dropbox itself is essentially a provider of cloud storage. It seems like owning their own data centers was obvious from the beginning: Use AWS initially to scale, but given the size of Dropbox it obviously doesn't make sense to have another middleman in the mix.
I just point this out because Dropbox is so unlike your average enterprise user of the cloud. I'd like to see some better examples than Dropbox of enterprises saving more (or innovating as quickly) by going on their own.
The cost of cloud, a trillion dollar paradox https://news.ycombinator.com/item?id=27401081 (June 4, 2021 — 7 points, 2 comments)
The Cost of Cloud, a Trillion Dollar Paradox - https://news.ycombinator.com/item?id=32832433 - Sept 2022 (15 comments)
The Cost of Cloud, a Trillion Dollar Paradox - https://news.ycombinator.com/item?id=32672553 - Sept 2022 (1 comment)
The cost of cloud, a trillion dollar paradox - https://news.ycombinator.com/item?id=27401081 - June 2021 (2 comments)
The cost of cloud, a trillion dollar paradox - https://news.ycombinator.com/item?id=27306742 - May 2021 (207 comments)
Would have been a lot easier if we had S3 (launched around this time, no one really knew about it or what it was).
There will be big growth in the "make it easier to manage on prem deployments" sector. Obviously you need capable sysadmin if you are going to move workloads away from cloud but improving the tech (I'm thinking of Oxide Computer here) will make it palatable for smaller and smaller orgs to consider repatriation, as they can do so with less effort and expertise.
This makes me hopeful as well that internet infrastructure will become less centralized. It's closer to the spirit of the internet that not every bit of information you see online passes through us-east-1.
May 27, 2021 https://news.ycombinator.com/item?id=27306742 207 comments
Sept 14, 2022 https://news.ycombinator.com/item?id=32832433 16 comments
If you are paying someone else for the service as a package, the costs scale linearly, or some close approximation of that. AWS gives you a small discount if you have larger volumes for some services, but these are essentially a rounding error.
If instead you launch a similar service in house, your costs are often dramatically lower per unit the more units you use. You start with a small team supporting a system with some large capacity you're not using yet because you need room to grow, and you realize large steps along the way where you can handle tens or hundreds of times the volume you see at each staffing level. Buying hardware or developing custom software gets cheaper at volume. This is sub-linear scaling.
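The difference between the two curves is easiest to see as cost per unit. A rough sketch of the linear versus step-plus-volume model described above; every number in it (unit price, team cost, how much volume one staffing level covers) is a hypothetical placeholder, not data from any provider.

```python
import math


def cloud_cost(units: int, price_per_unit: float) -> float:
    """Packaged service: total cost grows roughly linearly with volume."""
    return units * price_per_unit


def in_house_cost(
    units: int,
    team_cost: float,
    units_per_team_step: int,
    hardware_per_unit: float,
) -> float:
    """In-house: staffing grows in steps, each step covering a large block of volume."""
    steps = max(1, math.ceil(units / units_per_team_step))
    return steps * team_cost + units * hardware_per_unit


if __name__ == "__main__":
    for units in (1_000, 10_000, 100_000):
        cloud = cloud_cost(units, 10.0) / units
        own = in_house_cost(units, 500_000, 100_000, 1.0) / units
        print(f"{units:>7} units: cloud {cloud:.2f}/unit, in-house {own:.2f}/unit")
```

With these placeholders the cloud price per unit stays flat while the in-house price per unit starts far higher and drops below it as volume fills up the first staffing step. Where the crossover sits depends entirely on the real inputs.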
repatriation is not the global solution, it applies to mature and large companies
the dimension to explore is co-ownership of infrastructure by multiple independent companies, both small and large
a captive entity - wholly owned by member companies - manages a vast pool of resources achieving both the economies of scale and the flexibility needed by growing companies.
you don't "rent" somebody elses linux server, you invest and own a share.
R&D for evolving the platform is something outsourced to third parties (maybe even current cloud providers)
Not to mention the difficulty in hiring people who are actually comfortable with doing the infra work of setting up all our services, with the required multi-AZ config, on bare metal. It's hard enough finding people who can reliably set up services in aws with terraform and keep them alive.
One of the main reasons for using cloud infrastructure is that many people know how to use it, internal documentation requirements are dramatically lessened, and you don't need to hire infrastructure IT employees to maintain and monitor it.
If cloud only costs double internal IT, it's almost certainly cheaper until you reach a billion dollar valuation - or more depending on what your infra needs are.
Can the same be said of all F500 companies that may want to basically engineer their own cloud? Because the dev team is almost certainly used to all the convenience of the cloud, so you'll need a good infra team.
https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47...