It seems these things usually get implemented when Google, Microsoft, Twitter, or Facebook (some big web company) hits the problem. Each develops its own optimized solution, slowly open-sources it, and the products converge and mature to the point where it's practical for other companies to adopt. (I think container orchestration is a good example: Google, Facebook, and Apple(?) each had container solutions long before K8s started exploding.)
Where IS all this coming from? Why is it acceptable to pull a few random numbers out of our ass and conclude that it's always best to buy more hardware?
I know Google literally invented SPDY and QUIC, the bases of HTTP/2 and HTTP/3, to shave a few percent off their hardware costs. And they had to not only implement them in Chrome and Google.com but make them worldwide standards. So surely they must be the idiots putting top-tier engineers on such optimization efforts for six months (or more) instead of spending more on hardware? I don't know.
The whole section on "bin-packing Paxos/Raft is more efficient" is strange, because people don't generally bin-pack Paxos/Raft; the bin-packing orchestrators are built on top of Paxos/Raft!
Your opinions are very specific to using consensus in orchestrators and control planes, but the overwhelming majority of writes and reads to Paxos or Raft clusters happen in much higher-throughput, latency-sensitive systems such as databases.
Indeed, we don't need better consensus algorithms. We need to closely examine our problems and, wherever possible, step down to weaker consistency models and protocols that aren't worse to operate than Raft/Paxos or harder for developers to program against.
If your Raft state machines are doing IO via some write-through cache (which they often are), then having specific machines do specific jobs can improve cache quality. I.e., your leader node can have a better cache for your write workload, whilst your follower nodes can have better caches for your read workload.
This may lead to higher throughput (yay) but also leaves you vulnerable to significant slowdowns after leader elections (boo).
What makes sense will depend on your use case, but I personally agree with the author that multiple simple Raft/Paxos groups, scheduled across nodes by some workload-aware component, might be the best of both worlds.
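To make the role-dependent caching idea concrete, here's a toy sketch (all names hypothetical, not from any real Raft library): a write-through LRU cache whose admission policy depends on whether the node is currently the leader. The leader caches keys on write (writes funnel through it), while followers cache keys on read (they serve the read workload). Promoting a follower illustrates the post-election slowdown: its cache is tuned for reads, so the write path starts cold.

```python
from collections import OrderedDict

class RoleAwareCache:
    """Toy write-through LRU cache with a role-dependent admission
    policy. A simplification for illustration, not a real Raft node."""

    def __init__(self, capacity, role):
        self.capacity = capacity
        self.role = role              # "leader" or "follower"
        self.cache = OrderedDict()    # LRU order: oldest entry first

    def _admit(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def write(self, key, value, backing_store):
        backing_store[key] = value           # write-through to durable store
        if self.role == "leader":
            # Leader keeps freshly written keys hot: all writes pass
            # through it, so re-reads after writes hit the cache.
            self._admit(key, value)

    def read(self, key, backing_store):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key], True     # cache hit
        value = backing_store[key]
        if self.role == "follower":
            # Followers serve the read workload, so they cache on read.
            self._admit(key, value)
        return value, False                  # cache miss

    def promote_to_leader(self):
        # A newly elected leader inherits a read-optimized cache and
        # must warm up its write path: the post-election slowdown
        # mentioned above.
        self.role = "leader"
```

A promoted follower keeps its read-hot entries but starts admitting on writes instead, so write-heavy re-read patterns miss until the cache turns over.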
Does leader = master here? My first reaction is that this is a multi-master system, but I can't quite unpack "opportunistic coordinator".
So this is different from multi-master in that one node is preferred until it isn't, as opposed to any node being able to accept writes at any time.
Actual title: "Why fast replication protocols are slow"