With that cost reduction you also removed monitoring of the platform, people oncall to fix issues that appear, upgrades, continuous improvements, etc. Who/What is going to be doing that on this new platform and how much does that cost?
Now you need to maintain k8s, postgresql, elasticsearch, redis, secret managements, OSs, storage... These are complex systems that require people understanding how they internally work, how they scale and common pitfalls.
Who is going to upgrade kubernetes when they release a new version that has breaking changes? What happens when Elasticsearch decides to splitbrain and your search stops working? When the DB goes down or you need to set up replication? What is monitoring replication lag? Or even simply things like disks being close to full? What is acting on that?
I don't mean to say Heroku is fairly priced (I honestly have no idea) but this comparison is not apples to apples. You could have your team focused on your product before. Now you need people dedicated to work on this stuff.
Whenever I see people doing something like this I remember I did the same when I was in 10 people startups and it required A LOT of work to keep all these things running (mostly because back then we didn't have all these cloud managed systems) and that time would have been better invested in the product instead of wasting time figuring out how these tools work.
I see value in this kind of work if you're at the scale of something like Dropbox and moving from S3 will greatly improve your bottom line and you have a team that knows exactly what they're doing and will be assigned the maintenance of this work. If this is being done merely from a cost cutting perspective and you don't have the people that understand these systems, it's a recipe for disaster and once shit is on fire the people that would be assigned to "fix" the problem will quickly disappear because the "on call schedule is insane".
It really depends on what you're doing. Back then a lot of non-VC startups worked better and the savings possibly helped. It also helps grow the team and have less reliance on the vendor. It's long term value.
Is it really time wasted? People often go into resume building mode and do all kinds of wacky things regardless. Perhaps this just helps scratch that itch.
All by design, really, because at that point you're not part of an engineering team you're a code monkey operating in service of growth metrics.
Honest question: how long ago was that? I stepped away from that ecosystem four or so years ago. Perhaps ease of use has substantially improved?
You don't think they have any monitoring within Kubernetes?
I imagine they have more monitoring capabilities now than they did with Heroku.
The reason I think parent comment is FUD isn't because I don't acknowledge tradeoffs (they are very real).
It's because parent comment implies that people behind "reclaim the stack" didn't account for the monitoring, people's cost etc.
Obviously any reasonable person making that decision includes it into calculation. Obviously nobody sane throws entire monitoring out of the window for savings.
Accounting for all of these it can be still viable and significantly cheaper to run own infra. Especially if you operate outside of the US and you're able to eat an initial investment.
That said, at least in 2016 Heroku was way overpriced for high volume sites. My startup of 10 engineers w/ 1M monthly active users saved 300k+/yr switching off Heroku. But we had Jerry. Jerry was a beast and did most of the migration work in a month, with some dead-simple AWS scaling. His solution lacked many of the features of Heroku, but it massively reduced costs for developers running full test stacks which, in turn, increased internal productivity. And did I mention it was dead simple? It's hard to overstate how valuable this was for the rest of us, who could easily grok the inner workings and know the consequences of our decisions.
Perhaps this stack will open that opportunity to less equipped startups, but I've found few open source "drop-in replacements" to be truly drop-in. And I've never found k3 to be dead simple.
Also AWS is also, complex, also requires configuration and also generates alerts in the middle of the night.
It's still a lot cheaper than managed service.
You just mentioned one dimension of what I described, and "when you know what you are doing" is doing a lot of the heavy lifting in your argument.
> Also AWS is also, complex, also requires configuration and also generates alerts in the middle of the night.
I'm confused. So we are in agreement there?
I feel you might be confusing my point with an on-prem vs AWS discussion, and that's not it.
This is encouraging teams to run databases / search / cache / secrets and everything on top of k8s and assuming a magic k8s operator is doing the same job as a team of humans and automation managing all those services for you.
You can also trade operational complexity for cash via support contracts and/or enterprise solutions (like just throwing money at Hitachi for storage rather than trying to keep Ceph alive).
And that number might be high, in larger more established companies there might be more engineers who want to stick to their comfort bubble. So many developers reject the idea of writing SQL themselves instead of having the ORM do it, let alone know how to configure replication and failover.
I'd maybe hire for the people who could and would, but the people advocating for just having the cloud take care of these things have a point. You might miss out on an excellent application engineer, if you reject them for not having any Linux skills.
> you also removed monitoring of the platform
No we did not:
Monitoring: https://reclaim-the-stack.com/docs/platform-components/monit...
Log aggregation: https://reclaim-the-stack.com/docs/platform-components/log-a...
Observability is on the whole better than what we had at Heroku since we now have direct access to realtime resource consumption of all infrastructure parts. We also have infinite log retention which would have been prohibitively expensive using Heroku logging addons (though we cap retention at 12 months for GDPR reasons).
> Who/What is going to be doing that on this new platform and how much does that cost?
Me and my colleague who created the tool together manage infrastructure / OS upgrades and look into issues etc. So far we've been in production 1.5 years on this platform. On average we spent perhaps 3 days per month doing platform related work (mostly software upgrades). The rest we spend on full stack application development.
The hypothesis for migrating to Kubernetes was that the available database operators would be robust enough to automate all common high availability / backup / disaster recovery issues. This has proven to be true, apart from the Redis operator which has been our only pain point from a software point of view so far. We are currently rolling out a replacement approach using our own Kubernetes templates instead of relying on an operator at all for Redis.
> Now you need to maintain k8s, postgresql, elasticsearch, redis, secret managements, OSs, storage... These are complex systems that require people understanding how they internally work
Thanks to Talos Linux (https://www.talos.dev/), maintaining K8s has been a non issue.
Running databases via operators has been a non issue, apart from Redis.
Secret management via sealed secrets + CLI tooling has been a non issue (https://reclaim-the-stack.com/docs/platform-components/secre...)
OS management with Talos Linux has been a learning curve but not too bad. We built talos-manager to make bootstrapping new nodes into our cluster straightforward (https://reclaim-the-stack.com/docs/talos-manager/introductio...). The only remaining OS related maintenance is OS upgrades, which require rebooting servers, but that's about it.
For storage we chose to go with simple local storage instead of complicated network based storage (https://reclaim-the-stack.com/docs/platform-components/persi...). Our servers come with datacenter grade NVMe drives. All our databases are replicated across multiple servers so we can gracefully deal with failures, should they occur.
> Who is going to upgrade kubernetes when they release a new version that has breaking changes?
Upgrading Kubernetes in general can be done with 0 downtime and is handled by a single talosctl CLI command. Breaking changes in K8s imply changes to existing resource manifest schemas and are detected by tooling before upgrades occur. Given how stable Kubernetes resource schemas are and how averse the community is to pushing breaking changes, I don't expect this to cause major issues going forward. But of course software upgrades will always require due diligence and can sometimes be time consuming; K8s is no exception.
> What happens when ElasticSearch decides to splitbrain and your search stops working?
ElasticSearch, since major version 7, should not enter split brain if correctly deployed across 3 or more nodes. That said, in case of a complete disaster we could either rebuild our index from source of truth (Postgres) or do disaster recovery from off site backups.
It's not like using ElasticCloud protects against these things in any meaningfully different way. However, the feedback loop of contacting support would be slower.
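As a rough illustration of why three or more master-eligible nodes matter: with majority voting, at most one side of a network partition can ever hold a quorum. The sketch below is plain arithmetic, not Elasticsearch code, and the function names are made up for this example.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(partition_size: int, master_eligible_nodes: int) -> bool:
    """A partition may elect a master only if it holds a majority of all votes."""
    return partition_size >= quorum(master_eligible_nodes)


if __name__ == "__main__":
    total = 3
    # Try every way a 3-node cluster can be split in two by a network partition.
    for side_a in range(total + 1):
        side_b = total - side_a
        print(
            f"split {side_a}/{side_b}:",
            can_elect_master(side_a, total),
            can_elect_master(side_b, total),
        )
```

Under plain majority voting two nodes also need a quorum of two, so losing either one blocks elections, which is why three is the practical minimum.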
> When the DB goes down or you need to set up replication?
Operators handle failovers. If we would lose all replicas in a major disaster event we would have to recover from off site backups. Same rules would apply for managed databases.
> What is monitoring replication lag?
For Postgres, which is our only critical data source. Replication lag monitoring + alerting is built into the operator.
It should be straightforward to add this for Redis and ElasticSearch as well.
> Or even simply things like disks being close to full?
Disk space monitoring and alerting is built into our monitoring stack.
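For readers wondering what such a check boils down to, here is a minimal sketch of a disk fullness check. It is not the Reclaim the Stack monitoring stack; the 85% threshold and the function names are assumptions for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def should_alert(used_percent: float, threshold_percent: float = 85.0) -> bool:
    """Alert once usage reaches the (assumed) threshold."""
    return used_percent >= threshold_percent


if __name__ == "__main__":
    percent = disk_usage_percent(".")
    state = "ALERT" if should_alert(percent) else "ok"
    print(f"{percent:.1f}% used: {state}")
```

A real setup would run something like this on every node and route the alert to whoever is on call, which is the part the question is really about.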
At the end of the day I can only describe to you the facts of our experience. We have reduced costs to cover hiring about 4 full time DevOps people so far. But we have hired 0 new engineers and are managing fine with just a few days of additional platform maintenance per month.
That said, we're not trying to make the point that EVERYONE should Reclaim the Stack. We documented our thoughts about it here: https://reclaim-the-stack.com/docs/kubernetes-platform/intro...
- Front page links to docs and Discord.
- First page of docs only has a link to Discord.
- Installation references a "get started" repo that is... somehow also the main repo, not just "get started"?
This also assumes your infra doesn't grow and require more maintenance, or that you don't have to deal with other issues.
Focusing on building features and generating revenue is much more valuable than wasting precious engineering time maintaining stacks.
This is hardly a "win" in my book.
I had two questions just to learn more.
* What has been your experience with using local NVMes with K8s? It feels like K8s has some assumptions around volume persistence, so I'm curious if these impacted you at all in production.
* How does 'Reclaim the Stack' compare to Kamal? Was migrating off of Heroku your primary motivation for building 'Reclaim the Stack'?
Again, asking just to understand. For context, I'm one of the founders at Ubicloud. We're looking to build a managed K8s service next and evaluating trade-offs related to storage, networking, and IAM. We're also looking at Kamal as a way to deploy web apps. This post is super interesting, so wanted to learn more.
If you're already a web platform with hired talent (and someone using Heroku for a SaaS probably already is), I'd be surprised if the marginal cost was 10x. That paid support is of course coming at a premium, and isn't too flexible on what level of support you need.
And yeah, it isn't apples to apples. Maybe you are in a low CoL area and can find a decent DevOps for 80-100k. Maybe you're in SF and any extra dev will be 250k. It'll vary immensely on cost.
My preferred solution to this problem is different, though. For most businesses, apps, a monolith (maybe with a few extra services) + 1 relational DB is all you need. In such a simple setup, many of the problems faced either disappear or get much smaller.
When some component fails you absolutely do not want to spend time trying to figure out the underlying cause. Almost all the cases you hear in media of exchange outages are due to unnecessary complexity added to what is already a remarkably complex distributed (in most well designed cases) state machine.
You generally want things to be as simple and streamlined as possible so when something does pop (and it will) your mean time to resolution is inside of a minute.
Not for Heroku, they're absolute garbage these days, but definitely for a better run PaaS.
Plenty of situations where running it yourself makes sense of course. If you have the people and the skills available (and the cost tradeoffs make sense) or if downtime really doesn't matter much at all to you then go ahead and consider things like this (or possibly simpler self hosting options, it depends). But no, "you gotta run Kubernetes yourself unless you're a stock exchange" is not a sensible position.
You need to understand your business and your requirements. Us engineers love to think that we can solve everything with the right tools or right engineering solutions. That's not true. There is no "perfect framework." No one-size-fits-all solution that will magically solve everything. What "stack" you choose, what programming language, which frameworks, which hosting providers ... these are all as much business decisions as they are engineering decisions.
Good engineering isn't just about finding the simplest or cheapest solution. It is about understanding the business requirements and finding the right solution for the business.
You're asking the right questions that only a few people know they need answers to.
In my opinion, the closest thing to "reclaiming the stack" while still being a PaaS is to use a "deploy to your cloud account" PaaS provider. These services offer the convenience of a PaaS provider, yet allow you to "eject" to using the cloud provider on your own should your use case evolve.
Example services include https://stacktape.com, https://flightcontrol.dev, and https://www.withcoherence.com.
I'm also working on a PaaS comparison site at https://paascout.io.
Disclosure: I am a founder of Stacktape.
Big mistake. Overnight, the cluster config files I used were no longer supported by the k8s version DigitalOcean auto upgraded my cluster to and _boom_. Every single business was offline.
Made the switch to some simple bash scripts for bootstrapping/monitoring/scaling and systemd for starting/restarting apps (nodejs). I'll never look back.
My occasional moral dilemma is idle power usage of overprovisioned resources, but we've found some interesting things to throw at idle hardware to ease our conscience about it.
1. Shovel salesman insisting all "real" gold miners use their shovels
2. Those that have already acquired shovels not wanting their purchase to be mocked/have been made in vain.
Neither are grounded in reality. Why people believe their tiny applications require the same tech that Google invented to help manage their (massive) scale is beyond me.
Which was it?
I had a heck of a time finding accurate docs on the correct apiVersion to use for things like my ingress and service files (they had a nasty habit of doing beta versions and changing config patterns w/ little backwards compatibility). This was a few years back when your options were a lot of Googling, SO, etc, so the info I found was mixed/spotty.
As a solo founder, I found what worked at the time and assumed (foolishly, in retrospect) that it would just continue to work as my needs were modest.
Now, I just build my app to an encrypted tarball, upload it to a secure bucket, and then create a short-lived signed URL for instances to curl the code from. From there, I just install deps on the machine and start up the app with systemd.
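The short-lived signed URL step can be sketched with nothing but the standard library. This is a generic HMAC-plus-expiry scheme, not the signing algorithm of any particular bucket provider, and the URL, secret and function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(base_url: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the URL and its expiry timestamp."""
    payload = f"{base_url}?expires={expires}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, secret: bytes, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Return base_url with an expiry and signature attached as query params."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires,
                       "signature": _signature(base_url, expires, secret)})
    return f"{base_url}?{query}"


def verify_url(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, IndexError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires, secret)
    current = time.time() if now is None else now
    # Compare as bytes in constant time to avoid leaking the signature.
    matches = hmac.compare_digest(expected.encode(), signature.encode())
    return matches and current <= expires


if __name__ == "__main__":
    url = sign_url("https://example.invalid/releases/app.tar.gz.enc",
                   b"hypothetical-secret", ttl_seconds=300)
    print(url, verify_url(url, b"hypothetical-secret"))
```

The point of the expiry is that an instance only needs the link for the few seconds it takes to curl the tarball, so a leaked URL stops being useful quickly.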
IMO, Docker is overkill for 99% of projects, perhaps all. One of those great ideas, poorly executed (and considering the complexity, I understand why).
Damn, that's the dream right there
It's only good for very large scale stuff. And then a lot of the time that is usually well over provisioned and could be done considerably cheaper using almost any other methodology.
The only good part of Kubernetes I have found in the last 4 years of running it in production is that you can deploy any old limping crap to it and it does its best to keep it alive which means you can spend more time writing YAML and upgrading it every 2 minutes.
A pair of load-balanced web servers and a managed database, with Cloudflare out front, will get you really, really far.
Sounds like user error.
There are so many tools that make it easy to build and deploy apps to your servers (with or without containers) and all of them showcase how easy it is to go from a cloud account to a fully deployed app.
While their claims are true, what they don’t talk about is how to maintain the stack, after “reclaiming” it. Version changes, breaking changes, dependency changes and missing dependencies, disaster recovery plans, backups and restores, major shifts in requirements all add up to a large portion of your time.
If you have that kind of team, budget or problem that deserves those, then more power to you.
This is the operative issue, and it drives me crazy. Companies that can afford to deploy thousands of services in the cloud definitely have the resources to develop in-house talent for hosting all of that on-prem, and saving millions per year. However, middle management in the Fortune 500 has been indoctrinated by the religion that you take your advice from consultants and push everything to third parties so that 1) you build your "kingdom" with terribly wasteful budget, and 2) you can never be blamed if something goes wrong.
As a perfect example, in my Fortune 250, we have created a whole new department to figure out what we can do with AI. Rather than spend any effort to develop in-house expertise with a new technology that MANY of us recognize could revolutionize our engineering workflow... we're buying Palantir's GenAI product, and using it to... optimize plant safety. Whatever you know about AI, it's fundamentally based on statistics, and I simply can't imagine a worse application than trying to find patterns in data that BY DEFINITION is all outliers. I literally can't even.
You smack your forehead, and wonder why the people at the top, making millions in TC, can't understand such basic things, but after years of seeing these kinds of short-sighted, wasteful, foolish decisions, you begin to understand that improving the company's abilities, and making it competitive for the future is not the point. What is the point "is an exercise left to the reader."
Wow, this is literally the solution in search of a problem.
So this is no walk in the park for two developers willing to learn k8s.
The underlying apps (Redis, ES) will have version upgrades.
Their respective operators themselves would have version upgrades.
Essential networking fabric (calico, flannel and such) would have upgrades.
The underlying kubernetes itself would have version upgrades.
The Talos Linux itself might need upgrades.
Of all the above, any single upgrade might lead to infamous controller crash loop where pod starts and dies with little to no indication as to why? And that, too, not for an ordinary pod but for a crucial pod that is part of some operator supposed to do the housekeeping for you.
k8s was invented at Google and is more suitable for a ZIRP world where money is cheap and, to change the logo, you have seven designers on payroll discussing for eight months how nine different tones of brand coloring might convey ten different subliminal messages.
You would have to deal with those with or without k8s. I would argue that without it is much more painful.
> Their respective operators themselves would have version upgrades.
> Essential networking fabric (calico, flannel and such) would have upgrades.
> The underlying kubernetes itself would have version upgrades.
> The Talos Linux itself might need upgrades.
How is this different from regular system upgrades you would have to do without k8s?
K8s does add layers on top that you also have to manage, but it solves a bunch of problems in return that you would have to solve by yourself one way or another.
That essential networking fabric gives you a service mesh for free, that allows you to easily deploy, scale, load balance and manage traffic across your entire infrastructure. Building that yourself would take many person-hours and large teams to maintain, whereas k8s allows you to run this with a fraction of the effort and much smaller teams in comparison.
Oh, you don't need any of that? Great. But I would wager you'll find that the hodge podge solution you build and have to maintain years from now will take much more of your time and effort than if you had chosen an industry standard. By that point just switching would be a monumental effort.
> Of all the above, any single upgrade might lead to infamous controller crash loop where pod starts and dies with little to no indication as to why?
Failures and bugs are inevitable. Have you ever had to deal with a Linux kernel bug?
The modern stack is complex enough as it is, and while I'm not vouching for increasing it, if those additional components solve major problems for me, and they become an industry standard, then it would be foolish to go against the grain and reinvent each component once I have a need for it.
I’ve always been a big cloud/managed service guy, but the costs are getting astronomical and I agree the buy vs build of the stack needs a re-evaluation.
If I were putting together a minimum-viable staffing for a 24x7 available cluster with SLAs on RPO and RTO, I’d be recommending much more than two engineers. I’d probably be recommending closer to five: one senior engineer and one junior for the 8-4 shift, an engineer for the 4-12 shift, another engineer for the 12-8 shift, and another junior who straddles the evening and night shifts. For major outages, this still requires on-call time from all of the engineers, and additional staffing may be necessary to offset overtime hours. Given your metric of roughly $8k an engineer, we’d be looking at a cool $40K/month in labour just to approach four or five 9s of availability.
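Spelled out as arithmetic, using the roster and the roughly $8k per engineer per month quoted above (the shift labels are just dictionary keys for this sketch, and overtime or extra on-call staffing is left out):

```python
# Monthly cost per engineer in USD, from the "roughly $8k an engineer" figure.
COST_PER_ENGINEER_USD = 8_000

# Roster as described in the comment: shift label -> number of engineers.
ROSTER = {
    "8-4 senior": 1,
    "8-4 junior": 1,
    "4-12": 1,
    "12-8": 1,
    "evening/night junior": 1,
}


def headcount(roster: dict) -> int:
    """Total number of engineers across all shifts."""
    return sum(roster.values())


def monthly_labour_cost(roster: dict, cost_per_engineer: int = COST_PER_ENGINEER_USD) -> int:
    """Headcount times the flat monthly cost per engineer."""
    return headcount(roster) * cost_per_engineer


if __name__ == "__main__":
    print(headcount(ROSTER), monthly_labour_cost(ROSTER))  # prints: 5 40000
```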
Would someone be able to recommend an approach that's not a hack, for implementing a custom release command on k8s? Downtime is fine, but this one off job needs to run before the user facing pods are available.
K8s et al are not a silver bullet, but at this point they're highly stable and understood pieces of infrastructure. It's much more painful to deviate from this and build things from scratch, deluding yourself that your approach can be simpler. For trivial and experimental workloads that may be the case, but for anything that requires a bit more sophistication these tools end up saving you resources in the long run.
Tangentially, I think this applies to LLMs too.
This is not my area of expertise. Does it add a significant amount of complexity to configure this kind of system in a way that doesn’t require trusting the network? Where are the pain points?
As an infosec guy, I hate to say it but this is IMO very misguided. Insider attacks and external attacks are often indistinguishable because attackers are happy to steal developer credentials or infect their laptops with malware.
Same with trusting the private network. That’s fine and dandy until attackers are in your network, and now they have free rein because you assumed you could keep the bad people outside the walls protecting your soft, squishy insides.
The secondary effects are entirely dependent on how your microservices talk to their dependencies. Are they already talking to some local proxy that handles load balancing and service discovery? If so, then you can bolt on ssl termination at that layer. If not, and your microservice is using dns and making http requests directly to other services, it’s a game of whack-a-mole modifying all of your software to talk to a local “sidecar”; or you have to configure every service to start doing the SSL validation which can explode in complexity when you end up dealing with a bunch of different languages and libraries.
None of it is impossible by any means, and many companies/stacks do all of this successfully, but it’s all work that doesn’t add features, can lead to performance degradation, and is a hard sell to get funding/time for because your boss’s boss almost certainly trusts the cloud provider to handle such things at their network layer unless they have very specific security requirements and knowledge.
In my experience, that access control is necessary for several reasons (mistakes due to inexperience, cowboys, compliance requirements, client security questions, etc.) around 50-100 developers.
This isn't just "not zero trust", it's access to everything inside the cluster (and maybe the cluster components themselves) or access to nothing -- there is no way to grant partial access to what's running in the cluster.
I’ve used kubernetes as well in the past and it certainly can do the job, but ECS is my go-to currently for a new project. Kubernetes may be better for more complex scenarios, but for a new project or startup I think having a need for kubernetes vs. something simpler like ECS would tend to indicate questionable architecture choices.
- Your current cloud / PaaS costs are north of $5,000/month
- You have at least two developers who are into the idea of running Kubernetes and their own infrastructure and are willing to spend some time learning how to do so
So you will spend 150k+/year (2 senior full stack eng salaries in EU - can be much higher, esp for people up to the task) to save 60k+/y in infra costs?
Does not compute for me - is the lock-in that bad?
I understand it for very small/simple use cases - but then do you need k8s at all?
It feels like the ones who will benefit the most are orgs who spend much more on cloud costs - but they need SLAs, compliance and a dozen other enterprisey things.
So I struggle to understand who would benefit from this stack reclaim.
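The numbers in this comment can be put side by side. The sketch assumes both figures are in the same currency and that the whole $5,000/month bill is saved, which is the most favourable case; the function names are made up here.

```python
def annual_infra_savings(monthly_savings: float) -> float:
    """Yearly savings if the given monthly bill disappears entirely."""
    return monthly_savings * 12


def break_even_time_fraction(annual_savings: float, annual_salary_cost: float) -> float:
    """Share of the engineers' paid time that the savings would cover."""
    return annual_savings / annual_salary_cost


if __name__ == "__main__":
    savings = annual_infra_savings(5_000)   # the 60k/y from the comment
    salaries = 150_000                      # two senior salaries, per the comment
    fraction = break_even_time_fraction(savings, salaries)
    print(f"savings={savings:.0f} salaries={salaries} break-even={fraction:.0%}")
```

On these figures the savings only cover the salaries if platform work takes under 40% of the two engineers' combined time, so the answer hinges on how much of their time the platform actually consumes.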
For something called "Reclaim the Stack" to lock discussion into someone else's proprietary walled garden is quite ironic.
Discord is NOT a benefit. It's not publicly searchable and the chat format is just not suitable to a knowledge base or support based format.
Forums are much better in that regard.
I don't think people who choose Discord necessarily care about that. Discord is where the people are, so that's where they go. It also costs close to nothing to set up a server and since it has a lower barrier of entry than hosting your own forum, it's deemed good enough.
That said, modern forum software like Discourse https://www.discourse.org/ or Flarum https://flarum.org/ can be pretty good, though I still miss phpBB.
Wouldn't a single machine and a backup machine do the job?
We have a main monolithic application at the core. But there are plenty of ancillary applications used to run the various parts of our application (eg. analytics, media monitoring, social media monitoring, journalist databases, media delivery, LLM based content suggestion etc).
Then we have at least one staging deployment for each app (the monolith has multiple). All permutations of apps and environments reach about 50 applications deployed on the platform, all with their own highly available databases (Postgres, Redis, ElasticSearch and soon ClickHouse).
I've seen companies run a MiniKube installation on a single server and run their applications that way.
Using ARR as the measurement for how far you can scale devops practices is weird to me. Double-digit million ARR might be a few hundred accounts if you're doing B2B, and double-digit million MAUs if you're doing an ad-funded social platform. Depending on how much software is involved your product could be built by a team of anywhere from 1-50 developers.
If you're a one-developer B2B company handling 1-3 requests per second you wouldn't even need more than one VM except maybe as redundancy. But if you're the fifty-developer company that's building something beyond simple CRUD, there are a lot of perks that come with a full-fledged control plane that would almost certainly be worth the added cost and complexity.
Such as?
Logging is more complicated with multi container microservice deployments. Deploying is more complicated. Debugging and error tracing is more difficult. What are the perks?
I was about to make a similar point, but you did the math, and it holds up for the GP's side.
You can push vms and direct to ssh synchronization up to double-digit million MAU (unless you are using stuff like persistent web-sockets). It won't be pretty, but you can get that far.
The question is what you're doing with your infrastructure, not how much revenue you're making. Some things have higher return to "devops" and others have less.
It’s nice seeing some OSS-based tooling around k8s. I know it’s a favorite refrain that “k8s is unnecessary/too complex, you don’t need it” for many folks getting started with their deployments, but I already know and use it in my day job, so it feels like a pretty natural choice.
(But it still needs more accessible tooling! Kompose is a good start though: https://kompose.io/)
Tractors are also unnecessary. Plenty of people grow tomatoes off their balcony without tractors.
If somebody insists on growing 40 acres of tomatoes without a tractor because tractors aren't necessary, why argue with them? If they try to force you to not use a tractor, that's different.
How Kubernetes works is pretty simple, but administering it is living a life of constant analysis paralysis and churn and hype cycles. It is a world built by companies that have something to sell you.
If I can serve 3 million users / month on a $40/month VPS with just Coolify, Postgres, Nginx, Django and Gunicorn, without Redis or RabbitMQ, why should I use Kubernetes?
But I don't believe it supports HA deployments of Postgres with automated failover / 0 downtime upgrades etc?
Do they even have built in backup support? (a doc exists but appears empty: https://coolify.io/docs/knowledge-base/database-backups)
What makes you feel that Coolify is significantly less complex than Kubernetes?
You shouldn't, but people have started to view Kubernetes as a deployment tool. Kubernetes makes sense when you start having bare metal workers, or a high number of services (micro-services). You need to have a pretty dynamic workload for Kubernetes to result in any cost saving on the operations side. There might be a cost saving if it's easier to deploy your services, but I don't see that being greater than the cost of maintaining and debugging a broken Kubernetes cluster in most cases.
The majority of use cases do not require Kubernetes. The majority of users who think they NEED Kubernetes are wrong. That's not to say that you shouldn't use it if you believe you get some benefit; it's just not your cheapest option.
And 30% less latency
to ppl who disagree,
what business justifies 18x'ing your operating costs?
9.5k USD can get you 3 senior engineers in Canada. 9 in India.
What are the advantages over the (free) managed k8s provided by DigitalOcean?
---
Gosh, I'm so happy I was able to jump off the k8s hype train. This is not something SMBs should be using. Now I happily manage my fleet of services without large infra overhead via my own PaaS over Docker Swarm. :)
Anyone looking for a PaaS alternative matching or exceeding the UX of Heroku.
The "is it for you" section of our Introduction may give a better idea: https://reclaim-the-stack.com/docs/kubernetes-platform/intro...
> What are the advantages over the (free) managed k8s provided by DigitalOcean?
You can run the platform on top of any Kubernetes deployment. So you can run it on top of DigitalOcean kubernetes if you wish. But you'll get more bang for the buck using Hetzner dedicated servers.
It probably makes sense to put a few words on the "components" as well, as it seems to be the main selling point and not the privacy/GDPR concerns.
It is a fair source (future Apache 2.0 License) PaaS. I provide a cloud option if you want to manage less and get extra features (soon - included backup space, uptime monitoring from multiple locations, etc) and, of course, you are free to self-host it for free and without any limitations by using a single installation script. ;)
https://github.com/ptah-sh/ptah-server
But anyway, I'm really curious to know the answers to the questions I have posted above. Thanks!
I mean, I also use Docker Swarm and it's pretty good, especially with Portainer.
To me, the logical order of tools as scale grows goes a bit like this: Docker Compose --> Docker Swarm --> HashiCorp Nomad / Kubernetes
(with maybe Podman variety of tools where needed)
I've yet to see a company that really needs the latter group of options, but maybe that's because I work in a country that's on the smaller side of things.
All that being said, however, both Nomad and some K8s distributions like K3s https://k3s.io/ can be a fairly okay experience nowadays. It's just that it's also easy to end up with more complexity than you need. I wonder if it's going to be the meme about going full circle and me eventually just using shared hosting with PHP or something again, though so far containers feel like the "right" choice for shipping things reasonably quickly, while being in control of how resources are distributed.
Nowadays I prefer simple tooling over "flexible" tooling for my needs.
Enterprises, however, should stick to k8s-like solutions, as there are just too many variables everywhere: from security all the way to the software architecture itself.
[0] https://kamal-deploy.org [1] https://kamalmanual.com/handbook/
We don't do autoscaling.
The main reason for Kubernetes for us was automation of monitoring / logs / alerting and highly available database deployments.
37signals has a dedicated operations team with more than 10 people. We have 0 dedicated operations people. We would not have been able to run our product with Kamal given our four nines uptime target.
(that said, I do like Kamal, especially v2 seems to smooth out some edges, and I'm all for simple single server deployments)
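For anyone wondering what a "four nines" target leaves you to work with, here is a rough back-of-the-envelope sketch. The function name is made up for illustration, and it assumes availability is measured as simple wall-clock uptime over the period, which may not match how any particular team defines its SLO.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of downtime allowed at a given availability over a period.

    Hypothetical helper for illustration: treats availability as the plain
    fraction of wall-clock time the service is up.
    """
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


if __name__ == "__main__":
    # Four nines = 99.99% availability.
    for label, days in (("year", 365), ("30-day month", 30), ("week", 7)):
        budget = downtime_budget_minutes(0.9999, days)
        print(f"99.99% over a {label}: {budget:.2f} minutes of downtime")
```

That works out to a bit under an hour per year, or a few minutes per month, which is not much room for a manual failover or a botched deploy.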
I am at a company with a dedicated infra team, and my CEO is an infra enthusiast. He uses Terraform and k8s to build the company's infra. But the results are:
- Every deployment takes days. In my experience, I have needed to work a 24-hour streak to make it work.
- The infra is complicated to a level that makes it quite hard to adjust.
And benefit-wise, I can't even think of any. We don't have many users, so the claimed scalability is not even there.
I would strongly argue that a startup should not touch k8s until it has a fair user base and retention.
It's a nightmare to work with.
You can also use Kubernetes with compose files (e.g. with Kompose [1]; I plan to add support to Lunni, too).
[1]: https://kompose.io/
Of course, I don't have millions of users, but until then this is enough for me.
Reclaim the Stack provides a fully highly available multi node platform to host large scale SaaS applications aiming for four nines of uptime.
(From https://reclaim-the-stack.com/docs/platform-components/ingre...)
Am I reading this right that they built a k8s-based platform where by default they can't horizontally scale applications?
This seems like a lot of complexity to develop and maintain if they're running applications that don't even need that.
That said, there is some kind of balancing across multiple cloudflared replicas. But when we measured the traffic Cloudflare sent ~80% of traffic to just one of the available replicas.
We haven't looked into what the actual algorithm is. It may well be that load starts getting better distributed if we were to start hitting the upper limits of a single replica.
Or it may be by design that the load balancing is crappy to provide incentive for Cloudflare customers to buy their dedicated Load Balancing product (https://developers.cloudflare.com/load-balancing/).
What's the scale of this service? How many machines are we talking here?
At Ubicloud, we are attacking the same problem, though from a different angle. We are building an open-source alternative to AWS. You can host it yourself or use our managed services (which are 3x-10x more affordable than comparable services). We have already built some primitives such as VMs, PostgreSQL, private networking, and load balancers, and are also working on K8s.
I have a question for the HN crowd: which primitives are required to run your workloads? It seems the OP's list consists of Postgres, Redis, Elasticsearch, Secret Manager, Logging/Monitoring, Ingress and Service Mesh. I wonder if this is representative of the typical requirements for running the HN crowd's workloads.
PS: I like what you guys are doing, I'd subscribe to your mailing list if you had one! :)
Are there plans to address that too long term?
That said, switching out cloudflared for a more traditional ingress like nginx etc would be straightforward. No parts of the RtS tooling are actually dependent on using Cloudflare for ingress in particular.
If you’re looking for something simpler, try https://dokku.com/ (the OG self-hosted Heroku) or https://lunni.dev/ (which I’ve been working on for a while, with a docker-compose based workflow instead). (I've also heard good things about coolify.io!)
I'm starting to suspect the wide range of experiences has to do with engineering decisions. Nowadays, it's almost trivial to over-engineer a Kubernetes setup. In fact, with platform engineering becoming all the rage these days, I can't help but notice how over-engineered most reference architectures are for your average mid-sized company. Of course, that's probably by design (Humanitec sure enjoys the money), but it's all completely optional. I intentionally started with a dead-simple EKS setup: flat VPC with no crazy networking, simple EBS volumes for persistence, an ALB on the edge to cover ingress, and External Secrets to sync from AWS Secrets Manager. No service mesh, no fancy BPF shenanigans, just a cluster so simple that replicating to multiple environments was trivial.
The great part is that because we've had such excellent stability, I've been able to slowly build out a custom platform that abstracts what little complexity there was (mostly around writing manifests). I'm not suggesting Kubernetes is for everyone, but the hate it tends to get on HN still continues to make me scratch my head to this day.
I don't mean to sound dismissive, but maybe the problem is just that Heroku is/was slow and expensive? Meaning this isn't necessarily the right or quote-unquote "best" approach to reclaiming the stack
There are a lot of disagreements pitting one solution against another. Even if one hosting solution were better than another, the problem is that there are SO MANY solutions that exist on so many axes of tradeoffs that it's hard to determine an appropriate solution (heroku, reclaim, etc) without considering its application and context of use.
Heroku has all sorts of issues: super expensive, limited functionality, but if it happens to be what a developer team knows and works for their needs, heroku could save them lots of money even considering the high cost.
The same is true for reclaim. _If_ you're familiar with all of the tooling, you could host an application with more functionality for less money than heroku.
Remember 2022? https://www.bleepingcomputer.com/news/security/heroku-admits...
Works grand until it blows up in your face for non-obvious reasons.
That’s definitely mostly a skill issue on my end but still would make me very wary betting a startup on it
I thought this was either a joke I was missing, or a rant about Kubernetes. It turned out it was neither, and now I am confused.
And heroku is based on LXC containers. I'd say it's almost the same thing.
Curious what accounts are being attributed to said costs.
Many new maintenance-related lines will be added, with only one (subscription) removed.