Same with RDS, etc.
It’s pretty great not to waste time when your number comes up in the lottery for the most bizarre of 0.000001% issues.
The operator only solves the happy path. An AWS support ticket usually can solve the unhappy path.
For free.
Yep, if your Kafka is mission critical and crashes hard, that is bad.
But things like Kafka are _never_ a black box you just spin up and never worry about. If anyone thinks so, the CAP theorem will give them an awful surprise one day.
You're always going to need someone in your team who understands the tech and how to make best use of it.
MSK won't tell you how many partitions your topic needs, or whether your retention strategy should be delete, or compact, or both.
You still need that knowledge of the "managed" service to make effective use of it.
And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.
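To make that concrete, here's a rough sketch of the kind of decision the managed service leaves to you. It uses the common rule of thumb of sizing partitions from target throughput divided by per-partition throughput on the produce and consume sides. The function names and the throughput numbers are made up for illustration, and you'd have to benchmark your own per-partition figures. `cleanup.policy` and `retention.ms` are the actual Kafka topic-level config keys.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rule-of-thumb partition count (hypothetical helper, not a Kafka API).

    You need enough partitions to reach the target throughput on BOTH the
    produce side and the consume side, so take the larger of the two.
    """
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("throughput values must be positive")
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


def cleanup_policy(delete, compact):
    """Build the value for the `cleanup.policy` topic config."""
    if delete and compact:
        return "compact,delete"
    if compact:
        return "compact"
    if delete:
        return "delete"
    raise ValueError("a topic needs at least one cleanup policy")


def topic_config(delete=True, compact=False, retention_ms=None):
    """Assemble topic-level configs as the string map Kafka tooling expects."""
    config = {"cleanup.policy": cleanup_policy(delete, compact)}
    if retention_ms is not None:
        # retention.ms only drives deletion, so it is pointless on compact-only topics
        if not delete:
            raise ValueError("retention.ms has no effect without the delete policy")
        config["retention.ms"] = str(retention_ms)
    return config


# Assumed numbers: 100 MB/s target, 10 MB/s per partition producing, 20 MB/s consuming
partitions = estimate_partitions(100, 10, 20)
config = topic_config(delete=True, compact=True, retention_ms=7 * 24 * 60 * 60 * 1000)
```

None of this is hard, but somebody still has to know that the question exists, and MSK won't answer it for you.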
Oh, and the operators also solve a lot of the unhappy paths too, FYI.
I tend to describe the operator approach as "half-managed" because things like multiple-AZ stretch clusters need some configuration.
But then, maybe you didn't want a 3-AZ cluster? Maybe a 2.5? MSK says no.
…
> And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.
This has been my argument forever, and it’s always met with disagreement, because entirely too many people have no desire to learn their tooling. They just want an API that they can push data into, and get it back out. What happens inside is irrelevant.
It’s extremely sad to me.
At some point, we have to accept that there are a lot of knowledge expectations depending on your stack, especially as parts of your application grow.
Say you're a Python-based webapp running with Postgres, Kafka, and Elasticsearch. Your stack requires pretty decent knowledge of:
1. Postgres
2. Kafka
3. Elasticsearch
4. Linux (and a lot more than what many developers I've encountered seem to have)
5. Kubernetes, because it is 2024
6. Whatever frameworks you're doing with your webapp + ensuring you're keeping up with security best practices
7. + the soup involved with exposing your webapp to customers
Being able to handle any of these 7 at scale requires a different skillset. It's unreasonable to expect anyone to be an expert at all of this -- in a real, tried-and-true environment -- especially with deadlines and SLAs involved.
Relying on volunteer support of varying degrees of quality for your business sounds insane.
Also, at that point the business should really be donating or contributing to the development of the software. Otherwise it's what we call a dick move.
> Relying on volunteer support of varying degrees of quality for your business sounds insane.
Given my experiences of Confluent paid support, and my experiences of the volunteer support around Kafka, I disagree.
Not sure we agree on the meaning of this phrase in this context.
For 0 money. That kinda free.
I’d rather focus my expertise and mental energy on other tools that are much more significant to the stack I support.
For big flagship services (EC2, S3, SQS, Lambda) you can usually get pretty good support.
For smaller/more niche services where AWS stood up a managed version of some OSS it's more hit and miss (like managed RabbitMQ).
In both cases, it definitely helps to have an open line to your TAM and send them case numbers and they'll usually do some internal nudging to keep things moving. In addition, for projects, you can usually reach out ahead of time and get some dedicated SMEs to help set things up/train you.
In either case, hopefully you've never had the displeasure of working with Azure support.
They usually tend to be genuinely helpful but are a far cry from solving your issues themselves.
Of course there’s a minuscule possibility of you having a new use case. But is that a good enough reason to build your own infrastructure? That is a business call you need to make.