> Checking for overflow is basically adding a conditional branch (usually to out-of-line code) after arithmetic.
Yes, but now you also have to return a value to communicate the overflow back to the call site. The call site then has to check that value, which is yet another branch — and another, and another, depending on how deeply you want to propagate the error before deciding how to handle it. This grows code size, which can contribute to a higher frequency of I-cache misses and page faults, and it can also inhibit compiler optimizations.
At the CPU level, I think there can also be an attached cost if some of those branches end up as entries in the branch-target buffer (BTB), the branch-order buffer (BOB), or both. These buffers are quite small, so an unfavorable ratio of check-for-overflow entries to entries occupied by the other kinds of branches in the code puts more pressure on the branch-prediction unit. More "important" branches will start missing their entry in the branch history more often, simply because we started sprinkling check-for-overflow branches everywhere. And a branch misprediction is among the costliest events (15-20 cycles) we can encounter in the CPU pipeline.
Also, I think the bigger picture has to be considered here. E.g., what percentage of the operations in a big, real-world binary are arithmetic? I'd guess it's a sizeable amount on average, and even more so in math-heavy code. And then I wonder what we would observe if we applied the check-for-overflow transformation to all such signed-arithmetic operations.
I'm aware there are artificial benchmarks showing that branches which are essentially never taken cost nothing, but it makes me wonder whether that cost would really be zero if we made this change to actual code instead — for at least the reasons above.