OP's "extra reads" is dumb. He could have tracked normal metrics for memcached load, planned to run at no more than 70% of capacity or somesuch, and added capacity as soon as load hit that number. Instead he's running with a permanent handicap. It's just useless.
Secondly, you should know what your capacity is. Stress testing exists for a reason.
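To make the 70% idea concrete, here is a rough sketch of the check I mean. The function name, the threshold default and the request rates are all made up for illustration; the only assumption is that a stress test has given you some measured maximum for the cache tier.

```python
def needs_more_capacity(current_load, max_capacity, headroom_threshold=0.70):
    """Return True once utilization reaches the planned threshold.

    max_capacity is assumed to come from a stress test; the 0.70 default
    mirrors the "plan for 70%" rule of thumb and is not a measured value.
    """
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    return current_load / max_capacity >= headroom_threshold


if __name__ == "__main__":
    # Hypothetical numbers: the tier topped out at 100,000 requests/s in testing.
    measured_max_rps = 100_000
    for load in (50_000, 69_999, 70_000, 85_000):
        print(load, needs_more_capacity(load, measured_max_rps))
```

The point is that the trigger fires while there is still spare capacity, so you scale up before anything falls over, instead of carrying extra load all the time just to have something to shed.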
Stress-testing, shared-nothing and dollar-scalable are platonic ideals, and they're not always achievable. If Dropbox had three infrastructure engineers, they probably weren't able to build proper capacity planning models, and probably couldn't afford to build a full production work-alike for stress testing anyway. (And at some scales, that's literally impossible. Our vendors couldn't physically manufacture enough servers to build a full test environment, cost aside.) I'm sure they did some simulated tests as well, but those won't tell you the whole story.
You're focused on IOPS, but you have no idea if that's what Dropbox's bottlenecks were. (Not to mention: What does IOPS mean on an EBS and S3 infrastructure?) Complex systems fall over in complex ways. You can predict the next bottleneck, but not the one after that; by the time you get there, your fix for the first bottleneck will have changed the dynamics.
It sounds like they did do stress testing, using real-world loads, on a system that was 100% similar to their production system. They ran continuous just-in-time stress tests in the Big Lab.
Incidentally, fuel dump systems were initially added because of an FAA rule about a plane's takeoff weight exceeding its structural landing weight. Many commercial planes never had this problem, so they were built without dump systems. As a result, most planes just circle until they've burned off enough fuel, or land overweight anyway. You could dump fuel to lessen the chance of explosion, but only if your plane is equipped with a fuel dump system, and such incidents are so rare it's not even a safety consideration.
"Why not just plan ahead? Because most of the time, it was a very abrupt failure that we couldn’t detect with monitoring."
They actually do this kind of stuff (except for the "let's dump the lead" part) in stress tests, especially on cargo and military planes. And they do similar tests not only in aviation, but in most kinds of engineering.
So maybe misplaced sarcasm?