Your comparisons are similar to many others out there that focus on measuring basic cpu and memory. This type of easy comparison where AWS/Azure/GCP is treated as a "dumb" datacenter is easy for alternatives like Hetzner or self-hosting to "win".
>Do you really need the advanced features of AWS and Azure right now? Or would a simple virtual machine at a reasonable price be sufficient? [...] There’s a growing movement among tech companies and startups to opt for more cost-effective hosting solutions like Hetzner. The high costs associated with AWS and Azure
Many (most?) YC startups are not using AWS as a low-level dumb data center with blank EC2 virtual machines and installing infrastructure software like Linux and PostgreSQL on it. Instead, they are using higher-level AWS managed services such as DynamoDB, Kinesis, SQS, etc.
Therefore, the more difficult comparison (that almost no blog post ever does) is the startup's costs for its employees to re-create/re-invent the set of higher-level AWS services that they need.
Sure, there's the "but you don't need to pay expensive AWS costs for DynamoDB when one can just install open-source Cassandra at Hetzner; and instead of AWS Kinesis, install your own Kafka, etc". Well, you add up more and more of those "just install and manage your own X,Y,Zs" and you can end up crossing the threshold where paying AWS cloud fees costs less than your staff maintaining it. The threshold for AWS isn't just massive scale of 100+ million users. The threshold can be the complexity and scope of higher-level services you need the cloud to take care of on your behalf so your small team can concentrate on the aspects of the business that are true differentiators. In other words, instead of employees installing Cassandra, they're adding features to the smartphone app.
If your company doesn't need any of the Big 3 clouds' higher-level platform services, it's easier to save money with alternatives.
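To make the "crossing the threshold" idea concrete, here is a rough sketch of the break-even arithmetic. Every number in it (base compute bills, managed fee per service, ops hours, hourly rate) is a made-up assumption for illustration, not real AWS or Hetzner pricing, and the function name is hypothetical.

```python
def first_n_where_cloud_wins(cloud_base, cloud_fee, self_base,
                             ops_hours, hourly_rate, max_services=50):
    """Smallest number of higher-level services at which the cloud bill
    drops below the self-hosted total (servers plus staff time).

    All inputs are monthly figures and purely illustrative assumptions.
    Returns None if the cloud never wins within max_services.
    """
    for n in range(max_services + 1):
        cloud = cloud_base + n * cloud_fee
        self_hosted = self_base + n * ops_hours * hourly_rate
        if cloud < self_hosted:
            return n
    return None


# Assumed: $2000 vs $400 base compute, $300 managed fee per service,
# 8 staff hours per self-run service per month at $100/hour.
threshold = first_n_where_cloud_wins(2000, 300, 400, 8, 100)
print(threshold)
```

With these assumed inputs, self-hosting wins up to three services and the cloud wins from the fourth onward. If the managed fee per service is higher than the staff cost per service, the function returns None and self-hosting stays cheaper at any count.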
As soon as your startup does get big, it starts to make more sense to try and migrate to 'dumb' machines and save on infrastructure costs, especially if your business is low margin and your infrastructure costs are high.
And adding one dev/engineer is _massively_ more expensive, so you seldom want to scale in that axis when the option is to, say, use a managed database or even a complete data pipeline.
If you have a good understanding of load up front, however, those are probably non-issues.
E.g., if you're running K8s (one thing I typically recommend you buy a managed one of), you can install your own Kafka in it, using an operator that does about 85% of what MSK does.
Sure, you'll need to dedicate person hours to support the operator, but is supporting that any more expensive than supporting AWS products? That you're already paying through the nose for?
If you are bootstrapping a CRUD app business then 1 beefy Hetzner box (or something slightly more reliable) with PostgreSQL is probably fine until you reach scale where you sell the business. You care about burn rate above all.
If you are VC backed, go all in on GCP or AWS, because that's what you're expected to do and what the expensive people you hire are going to know.
Same with RDS, etc.
It's pretty great not to waste time when you win the lottery for the most bizarre of 0.000001% issues.
The operator only solves the happy path. An AWS support ticket usually can solve the unhappy path.
For high-scale operations, you need to think real hard about how you do things and usually simplicity is key, and trying to do as little as possible on the high-throughput parts is useful.
The costs do add up when you have professionals maintaining your Cassandra/Kafka boxes, but the same degree of complexity exists on AWS, when you try to weave together a tapestry of EC2s, lambdas, various storage services, with all the delicious complexity of multiple VPCs and networking fineries while not blowing the budget.
It's a different skillset, but not less work.
Even storage in hyperscalers is inherently redundant—and I keep getting folk who ask about setting up their own RAID array, or using their own containers and job management when there’s a dozen zero-code alternatives in each individual hyperscaler.
Part of me thinks, man, the engineers not afraid of setting up a Postgres or Redis really should be worth a lot more, given how absurd the prices can get. I guess the getting started costs for these services are usually manageable though; by the time the bill is big it's a "nice problem to have" because you have significant load now, and presumably customers & revenue to show for it.
More so, I think orgs are somewhat rightfully afraid of running infra because historically we have been bad at it. It's been every sys-op or devops for themselves in the world. Everyone making their own practices, assembling their own stack of networking setup, init scripts, db procedures, monitoring, alerting, resilience/reliability. This stuff has a lot of dimensions of care to it.
And even when you go the extra mile to document everything, it's still rough to hand off ownership. A new gal joins; how long does it take to get comfortable? And how much will her style & preferences mesh with what's been strung up so far? Or worse, what happens when someone quits? How load bearing were they?
And this is why I'm so humongously excited about Kubernetes. Fleet was pretty sweet & cool & direct in the past, RIP, but like so many of the "way to run containers" options it was just that: a way to run containers. Having an extensible system, where operators keep networking, storage, databases running, where tasks like backups and migrations and high availability are built in to well tested controllers: it cuts out so so so many things that operators had to discover, socialize, and test test test test test test before. There are such incredibly good load-bearing systems-that-maintain-systems (i.e. autonomic) available, that compete very much with the paid-for/managed services that have done likewise for us for so long.
And it's a consistent paradigm, for whatever you are up to. Write a manifest with what you want, send it to api-server, wait for operator to make it so. Instead of having different dimensions or concerns have different operational paradigms & styles, there's a unified extensible Desired State Management that does a damn good job.
It felt like running services was in a dark ages for so long, that each shop was fractured & alone with their infrastructure, and it was obvious why managed services were winning. But today there's a hope that we can run services, well, in a way that will be very clear & explicit if it ever needs to be handed off.
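For anyone who hasn't seen the "write a manifest, wait for the operator to make it so" idea up close, here is a toy sketch of a reconcile loop. It is not the Kubernetes API or any real operator: the function names and the replica-count dictionaries are hypothetical, chosen only to show the shape of desired-state management.

```python
def reconcile(desired, actual):
    """Compute the actions that move `actual` toward `desired`.

    Both arguments map a (hypothetical) resource name to a replica count.
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    # Anything running that is not in the manifest gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("delete", name, actual[name]))
    return actions


def apply_actions(actions, actual):
    """Return the new state after carrying out the actions."""
    new_state = dict(actual)
    for verb, name, count in actions:
        if verb == "scale_up":
            new_state[name] = new_state.get(name, 0) + count
        elif verb == "scale_down":
            new_state[name] -= count
        else:  # "delete"
            del new_state[name]
    return new_state


desired = {"postgres": 3, "kafka": 3}   # the "manifest"
actual = {"postgres": 1, "redis": 2}    # what is running right now
actual = apply_actions(reconcile(desired, actual), actual)
print(actual, reconcile(desired, actual))
```

Once the state matches the manifest, a further reconcile pass returns no actions. A real controller runs this comparison continuously, which is why drift and crashed replicas get repaired without anyone writing a runbook for each case.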
But only if they agree to be on call 24/7 to support what they deployed. Ask engineers to guarantee you won't lose data and see how they tell you to buy RDS.
To add, if you ever want to get ISO/PCI DSS etc. certification done, then good luck implementing the gazillion checklist items which Azure/AWS/GCP have already taken care of.
Both of these scale to zero and offer 180k vCPU-seconds free per month and 360k GB-seconds free per month. You incur billing only against the active execution time. Cloud Run Jobs has a whole separate free monthly grant as well.
You can run A LOT for free within those constraints. Certainly a blog or website. To prevent cold starts, just set up Cloud Scheduler (also free for this purpose) to ping the container every few minutes.
Use Supabase for a DB or one of the serverless options (if it works for your data use case) like Firestore, CosmosDB and you can run workloads for a few cents per month with an architecture that will scale easily if you need it to.
6 min video showing the receipts and how easy this is: https://youtu.be/GlnEm7JyvyY
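A quick back-of-the-envelope check on the free grant quoted above, reading the figures as vCPU-seconds and GB-seconds of active execution time. The container size (1 vCPU, 0.5 GB) and the 200 ms of billed time per request are assumptions for illustration, and the sketch ignores request concurrency and any billing granularity.

```python
FREE_VCPU_SECONDS = 180_000   # monthly grant quoted in the comment
FREE_GB_SECONDS = 360_000     # monthly grant quoted in the comment


def free_hours_per_month(vcpus, memory_gb):
    """Hours of active execution covered by the grant for one container size."""
    cpu_hours = FREE_VCPU_SECONDS / vcpus / 3600
    mem_hours = FREE_GB_SECONDS / memory_gb / 3600
    return min(cpu_hours, mem_hours)  # whichever grant runs out first


def free_requests_per_month(vcpus, ms_per_request):
    """Requests covered by the CPU grant, assuming one request at a time."""
    return FREE_VCPU_SECONDS * 1000 // (vcpus * ms_per_request)


print(free_hours_per_month(1, 0.5))       # assumed 1 vCPU, 0.5 GB container
print(free_requests_per_month(1, 200))    # assumed 200 ms billed per request
```

Under those assumptions the CPU grant is the limiting one: about 50 hours of active time, or roughly 900k short requests a month.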
PaaS services or even VM scaling sets with volatile instances can still be stupefyingly cheaper, but that point is really hard to make to architecture astronauts.
> They’re conceptually simple, but you soon realize that you need at least a couple of 24/7 always on boxes and that you only really should use Cloud Run-like services for burstable workloads.
This is simply not true and Cloud Run-like services offer an easy path for progressive scaling.
1. You can scale it to 0 at the outset as you build your app
2. You can set it to scale to a minimum of n instances (e.g. minimum 1, 2) to have fast response times
3. If you find a need for a 24x7 instance, take the same container image and you can launch a Compute Engine instance with the container directly and scale that way.
4. If you need more control beyond that, move those containers into GKE Autopilot or full GKE or your container orchestrator of choice.
Not only is it easy and free to get started, it provides a straightforward path to adapting the underlying deployment and compute model based on needs as the app scales without the need to pay anything until you actually need 24x7 compute (and even then, it's a matter of setting your Cloud Run service to min=1 instances to get 24x7 compute or configuring a CE instance with the same exact container).
Most people think it is easier to use EC2 than Fargate since the first is the most famous one. But actually, it is the other way around!
Hetzner doesn't have the services AWS provides; that's the reason most companies I know use AWS.
If we could run our crap on any server, we would, but managed services are still cost-effective vs hiring our own 24/7/365 rotation of on-call ops people.
They have the skills, cash flow, and resources to do whatever they want.
Yeah if people had less shaky stacks. But it is always easier to pay someone to run the hack.
They will have object storage soon, but don't hold your breath for one-click Kubernetes etc. So the fancier your infrastructure, the more your startup would need to invest in time and money to use Hetzner and thus make it "not worth it".
There is also a GPT that you can use that will generate the module block for you based on your requirements.
The pricing is more on par with Digital Ocean/Linode.
Maybe, just maybe, I want to use LVM or something entirely unknown to them. Not necessarily in a privacy sense, but control.
Take the recent Lichess downtime, for example. Their main server had a hardware issue that required physical intervention. This meant the site was down for over 10 hours, and there wasn't much they could do except wait for OVH to send a tech.
If Lichess had been on AWS, the provider would have automatically moved their workload to a functioning server, and the outage would have been much shorter or possibly avoided altogether.
For Lichess, a non-profit, this tradeoff still makes sense. Their service, while important to its users, isn't critical. Nobody dies if Lichess is down and the cost savings help them keep running. But if your business can't afford downtime, the extra guarantees from a public cloud provider can definitely be worth paying for.
If you're not an HN person with sysadmin skills, yes. But it is NOT that hard to have an in-house RAID HD setup with a failover server. Or a failover NAT gateway. AWS and cloud providers are just a rip-off.
Lichess admins are highly skilled and I'm sure they already have a well designed infrastructure. You can see what they use at https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...
The issue was with network equipment that they didn't even manage. You can't load balance when your core network is down. There was nothing they could do as I understand it.
More details at: https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes...
OP's comment is valid - physical servers might incur downtime.
But I do agree with your sentiment. "Downtime" is not an argument which should tilt the discussion towards either physical servers or the cloud. AWS data centers famously also have outages, while physical servers often have uptimes of multiple years. So what's better? It's hard to tell, but at the very least, none of these solutions is downtime-free.
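To put the 10-hour outage mentioned above in perspective, here is a small sketch of the availability arithmetic. It assumes that outage is the only downtime in a 365-day year, which is an assumption for illustration and not a claim about any provider's actual record.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period the service was up."""
    return 1 - downtime_hours / period_hours


def allowed_downtime_hours(target, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target such as 0.999."""
    return (1 - target) * period_hours


print(round(availability(10) * 100, 2))        # one 10-hour outage in a year
print(round(allowed_downtime_hours(0.999), 2))  # budget for "three nines"
```

A single 10-hour outage works out to about 99.89% for the year, just under the 8.76 hours a "three nines" target would allow. That is the scale at which the "none of these solutions is downtime-free" point plays out.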
Hetzner starts at 50 Euro, only has servers in Europe and is going to require a ton more work.
AWS has the right idea, they give everyone who asks nicely thousands in free credits to get started. Then 2 years in you're hooked. I don't want to learn a new system.
It will take slightly more effort than Lightsail, yes.
I still don't think I feel like migration though. Captain Rover isn't exactly lightweight.
For example, instead of the ancient F8 series used in the article, a modern D8as_v5 Azure instance under a 3-year Savings Plan is $115/mo.
Also, the article compares CPX41 to EC2 and Azure VMs with dedicated cores, not shared cores. The CCX33 Hetzner model is closer to the normal clouds, and costs $50/mo, so now we're at 2x the price instead of 10x the price. (Conversely, the B8als_v2 size uses shared cores and is also 2x the price of CPX41 at $74/mo)
For that 2x cost you get a lot more features, first-party and third-party support, more locations, faster networking, etc... That's worth it for most large enterprises that care about ticking checkboxes on audit reports more than absolute cost. Or to put it this way: the annual price difference is just $600, which is the same cost to an org as half a day of engineer-time or less. If Hetzner is the slightest bit more difficult than a large public cloud VM for anything, ever, then it's not cheaper. This could be patching, maintenance, migrations, backup, recovery, automation, encryption, or just about anything else.
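The arithmetic behind this kind of comparison is simple enough to sketch. The monthly prices below are the $115 and $50 figures quoted above; the $150/hour engineer rate is my own assumption, picked only so that a few hundred dollars lands near "half a day".

```python
def compare(cloud_monthly, alt_monthly, engineer_hourly_rate):
    """Return (price ratio, annual gap in dollars, engineer hours that gap buys)."""
    ratio = cloud_monthly / alt_monthly
    annual_gap = (cloud_monthly - alt_monthly) * 12
    break_even_hours = annual_gap / engineer_hourly_rate
    return ratio, annual_gap, break_even_hours


# $115/mo and $50/mo are quoted in the comment; $150/hour is an assumption.
ratio, annual_gap, hours = compare(115, 50, 150)
print(round(ratio, 2), annual_gap, round(hours, 1))
```

With those exact inputs the ratio is 2.3x and the gap is $780 a year, so the "2x" and "$600" in the comment are rounded, but the conclusion holds either way: the gap buys only a handful of engineer hours per year.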
There are other differences as well. Hetzner has a separate charge for load balancers and IP addresses, whereas with Azure they're included in the price of the VM.
The biggest cost difference is that the public clouds charge eyewatering amounts for Internet egress traffic. Azure is about 100x as expensive as Hetzner, which is just crazy.
"In the beginning" the clouds promised to use their scale to soak up your unpredictable demand. You as the customer didn't have to think about capacity, or planning ahead, budgets, opex, etc... Just swipe your credit card and go from zero to any number you please and back again at any time of your choosing. Because there are so many other customers using the cloud with you, the unpredictable nature of your individual usage is averaged out and the cloud vendor gets a (slightly) noisy but manageable usage level of their resources. They have to work a little harder to predict future capacity needs, but you pay a premium for this.
"A little later" the MBAs realise that they can squeeze 5% more profit out of their customers with lock-in contracts that make everything "nice and predictable" instead of the stochastic noise they had to "deal with" before. Getting rid of that makes things a lot harder for you as the customer, but they don't care. They care about that 5%.
Ta-da... we're back at having to "procure", we're back at budgets that have to be planned for 3 years in advance, we're back at having to have time machines.
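The "averaged out" argument from the first paragraph can be checked with a tiny simulation. This is a sketch under a strong assumption, namely that customers' demands are independent and exponentially distributed. Real workloads are correlated (everyone is busy on Black Friday), so treat it as the best case. The function name is hypothetical.

```python
import random
import statistics


def relative_noise(n_customers, n_samples=1000, seed=0):
    """Coefficient of variation (stdev / mean) of total demand.

    Each customer's demand per time step is an independent exponential
    draw with mean 1, an assumption made purely for illustration.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.expovariate(1.0) for _ in range(n_customers))
        for _ in range(n_samples)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


for n in (1, 10, 100):
    print(n, round(relative_noise(n), 3))
```

The relative noise shrinks roughly with the square root of the number of customers, which is why the vendor sees a "(slightly) noisy but manageable" aggregate even when each customer is wildly unpredictable.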
YMMV but all costs aren't instance costs.
And they're not just salespeople, they've actually told us multiple times when a feature wouldn't work for us unless we held it wrong in a dangerous (and expensive) way.
Can you give examples of this? I'd love to hear more about the kinds of guidance they can give.
This is one of the more important points and why the point "The learning curve of a single server isn't so big, especially when compared to AWS" is sitting a bit wrong with me.
Sure, if you talk about 1 VM, I agree. And I wouldn't second guess doing this, at all. It would be my initial plan as well as long as I don't have to make any strong availability guarantees. And for this use case, I'd call AWS a bad choice. It's not a simple VM provider.
But once you start running e.g. a redundant postgres cluster for updates without downtime, the amount of stuff to know also grows, a lot. Suddenly you also need backups, tests of backups. And this is where AWS/the cloud allows you to save time, and treadmill time.
Would probably give them way more budget in actually building applications than running the infrastructure.
Maybe I'll extend the article to include the point of using a managed postgres at AWS / Azure / fly.io, whatever, in combination with Hetzner VMs.
Even with automation tools like Ansible or immutable server images, packaging as Docker images and running on a container orchestrator has always been much easier.
It seems lost on the authors that yes that might work for some folks just fine, but others really do want the Land Rover and all its additional baked in features beyond getting you from A to B.
If you're looking for a cheap one-off server, the server auction has some very good deals.
[0] Full details at https://blog.searchmysite.net/posts/migrating-off-aws-has-re...
They're leaving other things on AWS, i.e. partial migration is quite doable.
I have only stumbled on one service that does it. It's a Datadog alternative, so the bar is not that high for pricing.