story
Also, while I don't disagree that it is indeed a hard problem, I do have very good experience with an async java stack, where I didn't have to worry about things like this. As long as a sane queue limit is defined on let's say the jetty http client, if something bad happens at the other end, the back pressure would kick in by failing immediately the requests that couldn't make it into the queue. Other parts of the app would then continue to be functional.
So, I would contend that it has a lot to do with ruby high memory usage, made much worse when single-threaded, and it looks like ruby 3.0 still won't have a complete async story yet?
EDIT: I checked the link again, and it looks Jeff Dean was talking about latency at p999 or above? By "hiccup", I actually mean something that would increase avg latency by perhaps 5~10x times, e.g. avg latency of 100ms under steady state + timeout of 1 second + the remote being down. Sorry for the confusion. Here, I am lucky if people start caring about p95.