I think the requirement is a bit vague. I'm assuming the call is an HTTP request and that the rate limit is enforced on the server. Given that, there are two distinct failure cases: 1) the request reaches the server and is counted against the server-side rate limit, but then fails for some reason; 2) the request never reaches the server (for example, the network is down), so the failure is not counted against the rate limit.
I opted to assume all failures are of the first type, which means my code should still behave as expected, but it will be slower when it encounters failures of the second type.
I'm actually curious: I think it would be easier to change my code to handle both types of errors, but I could be wrong, since I don't know the Scala stuff you are using that well.
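
To make the distinction concrete, here is a rough Python sketch of the two accounting policies. It is not my actual code and not your Scala code. The names `ServerSawRequest`, `NeverReachedServer`, and `Budget` are made up for illustration, and it assumes the client can actually tell the two failure types apart, which may not always be possible in practice.

```python
class ServerSawRequest(Exception):
    """Hypothetical: the server received (and counted) the request, but it failed."""


class NeverReachedServer(Exception):
    """Hypothetical: e.g. the network is down, so the server never saw the request."""


class Budget:
    """Client-side count of requests charged against one rate-limit window."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def run(self, call, conservative=True):
        """Run `call` and charge the budget according to the chosen policy.

        conservative=True  : every failure is treated as type 1 (always charged).
        conservative=False : type 2 failures are not charged.
        """
        if self.used >= self.limit:
            raise RuntimeError("rate-limit budget exhausted for this window")
        try:
            result = call()
        except NeverReachedServer:
            # The server never counted this attempt. Charging it anyway is safe
            # but wastes budget, which is why the conservative policy is slower.
            if conservative:
                self.used += 1
            raise
        except ServerSawRequest:
            # The server counted this attempt, so it is always charged.
            self.used += 1
            raise
        self.used += 1  # successful requests are always charged
        return result
```

With `conservative=True`, a run of network failures uses up the window even though the server saw nothing. With `conservative=False`, the same failures leave the budget untouched, so later requests can go out sooner.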