story
Don't get me wrong, sometimes it's worth it (I particularly like Spark's facilities for distributed statistical modelling), but I really don't get (and have never gotten) why you would want to inflict that pain upon yourself if you don't have to.
I’ve been developing for more years than dime if you have lived, and the best thing I’ve heard in years was that Google interviews were requiring developers to understand the overhead of requests.
In addition, they should require understanding of design complexity of asynchronous queues, needing and suffering from management overhead of dead letter, scaling by sharding queues if it makes more sense vs decentralizing and having to have non-transactionality unless it’s absolutely needed.
But not just Google- everyone. Thanks, Mr. Fowler for bringing this into the open.
Adding network is not a limitation. And frankly, I don't understand why you say things like understanding network. Like reliability is taken care of, routing is taken care of. The remaining problems of unboundedness and causal ordering are taken care of (by various frameworks and protocols).
For dlq management, you can simply use a persistent dead letter queue. I mean it's a good thing to have dlq because failures will always happen. About which order to procese queue etc. These are trivial questions.
You say things as if you have been doing software development for ages, but you're missing out on some very simple things.
Each of them introduces a new, rare failure mode; because there are now many rare failure modes, you have frequent-yet-inexplicable errors.
Dead letter queues back up and run out of space; network splits still happen;
And secondly, if you do end up with q distributed systems, remember how many independently failing components there are because thag directly translates to complexity.
On both these counts I agree. Microservices is no silver bullet. Network partitions and failure happen almost every day where I work. But most people are not dealing with that level of problems, partly because of cloud providers.
Same kind of problems will be found on a single machine also. Like you'd need some sort of write ahead log, checkpointing, maybe optimize your kernel for faster boot up, heap size and gc rate.
All of these problems do happen, but most people don't need to think about it.
Requests can fail in a host of ways that a call simply cannot, the complexity is massively greater than a method call.
I realized this one day when I was drawing some nice sequence diagrams and presenting it to a senior and he said "But who's ensuring the sequence?". You'll never ask this question in a single threaded system.
Having said that, these things are unavoidable. The expectations from a system are too great to not have distributed systems in picture.
Monoliths are so hard to deploy. It's even more problematic when you have code optimized for both sync cpu intensive stuff and async io in the same service. Figuring out the optimal fleet size is also harder.
I'd love to hear some ways to address this issue and also not to have microservice bloat.