> Checking for overflow is basically adding a conditional branch (usually to out-of-line code) after arithmetic.
Yes, but now you also have to return a value to communicate the overflow back to the call site. The call site then has to check that value, which is yet another branch — and another, and another, depending on how deeply you want to propagate the error before deciding how to handle it. This grows code size, which can contribute to a higher frequency of I-cache misses and page faults, and it can also inhibit compiler optimizations.
At the CPU level, I think there can also be an attached cost if some of those branches end up as entries in the branch-target buffer (BTB), the branch-order buffer (BOB), or both. These buffers are quite small, so an unfavorable ratio of check-for-overflow entries to entries occupied by the other kinds of branches in the code puts more pressure on the branch-prediction unit. More "important" branches will start missing their entry in the branch history more often, simply because we started sprinkling check-for-overflow branches everywhere. And a branch misprediction is among the costliest events (15-20 cycles) we can encounter in the CPU pipeline.
Also, I think the bigger picture has to be considered here. E.g., what percentage of the operations in a big, real-world binary are arithmetic? I'd guess it's a sizeable amount on average, and even more so in math-heavy code. And then I wonder what we would observe if we applied the check-for-overflow transformation to all such signed-arithmetic operations.
I'm aware there are artificial benchmarks showing that branches which are essentially never taken cost nothing, but it makes me wonder whether that cost would really be zero if we made this change to actual code instead — for at least the reasons above.