> I disagree. I think the trade-off is very reasonable. At some point you need to retry (even if the trigger is user manually pressing F5 in the browser/clicking a button again/running a program again). Because they actually have some goal to accomplish.
I don't think your belief holds water if you think about your example. The goal of a retry, from the client's standpoint, is to introduce an acceptable delay in order to pretend the original request was successful. This strategy is only viable if the retries complete quickly enough not to penalize perceived performance or the normal operational state of the service. Consequently, all retry strategies end up sending multiple requests per second. The link to Retry Budgets posted in this discussion explicitly mentions "a minimum of 10 retries per second."
A user pressing F5 will never generate this volume of requests.
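For concreteness, that budget idea can be sketched as a token bucket. This is a hypothetical illustration, loosely modeled on the budget described in the linked article; the class name and the `min_retries_per_sec` / `retry_ratio` parameters are my inventions, not any real library's API:

```python
import time

class RetryBudget:
    """Token-bucket retry budget: allow a small baseline of retries per
    second, plus a fraction of recent request volume. Hypothetical sketch,
    not a real library's implementation."""

    def __init__(self, min_retries_per_sec=10, retry_ratio=0.2):
        self.min_retries_per_sec = min_retries_per_sec
        self.retry_ratio = retry_ratio
        self.tokens = float(min_retries_per_sec)
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        # refill at the baseline rate, capped so the bucket can't grow unbounded
        self.tokens = min(
            self.tokens + (now - self.last) * self.min_retries_per_sec,
            2.0 * self.min_retries_per_sec,
        )
        self.last = now

    def on_request(self):
        # every ordinary (non-retry) request deposits a fractional retry credit
        self.tokens += self.retry_ratio

    def can_retry(self):
        # spend one token per retry; if the bucket is empty, fail fast instead
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The point of the sketch is the floor, not the cap: even an idle budget like this one still permits ~10 retries every second, which is orders of magnitude more traffic than a human mashing refresh will ever produce.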
> Some failures really are random, let's say 0.1% of requests fail.
That's why failing fast and not retrying is the best strategy for most, if not all, applications. Retry strategies introduce a high degree of complexity to handle a failure that only rarely happens, and when it does happen it can usually be fixed trivially by the user triggering a refresh.
If it's an application that already sends a high volume of requests, then once the first request fails it will simply send another request as part of its happy path.
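Failing fast is also far simpler to implement: one attempt, a tight deadline, and the error propagates to whoever can actually decide what to do about it. A minimal sketch, where `send` is a stand-in for whatever transport function the application actually uses:

```python
def request_fail_fast(send, payload, timeout=1.0):
    """Make exactly one attempt with a short deadline. No retry loop:
    on failure, surface the error so the caller (or the user, via a
    refresh) decides what happens next. `send` is a hypothetical
    transport callable, not a real library function."""
    try:
        return send(payload, timeout=timeout)
    except Exception as exc:
        raise RuntimeError(f"request failed, not retrying: {exc}") from exc
```

There is no backoff schedule, no jitter, no budget accounting to tune, and no hidden extra load on a service that is already struggling.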
Some developers like retries because they use them to patch broken code paths, pretending they don't have to deal with scenarios where the network is not 100% reliable. They onboard a retry library, update their requests to transparently appear as a single request, and proceed as if their application has no failure mode. Except it does, and now they have also traded that wishful thinking for a higher risk of a cascading, self-inflicted DDoS on their own infrastructure.
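To put a number on that risk: if every layer in a call chain independently retries on failure, the worst-case fan-out multiplies per layer. A back-of-the-envelope helper (hypothetical, just to show the arithmetic):

```python
def worst_case_amplification(retries_per_layer, layers):
    """Worst-case request fan-out when every layer of a call chain
    independently retries a failing downstream call. Each layer makes
    the original attempt plus `retries_per_layer` retries, so the
    multiplier compounds as (retries + 1) ** layers."""
    return (retries_per_layer + 1) ** layers

# e.g. a modest 3 retries at each of 4 layers during an outage:
worst_case_amplification(3, 4)  # → 256 requests per original request
```

That 256x surge lands precisely when the service at the bottom is least able to absorb it, which is how a transient blip turns into a self-inflicted outage.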