For example, consider a very simple statement.
let array[10];
let i = some_function();
print(array[i]);
The function might not even be known to the compiler at compilation time if it came from a DLL or something. But the compiler is like, "Hey! You used the result of this function as an index for this array! i must be in the range [0, 10)! I can use that information!"
As a developer who has seen lots of developers (myself included) make really dumb mistakes, I find this a very strange conclusion to draw.
Imagine if you hired a security guard to stand outside your house. One day, he sees you leave the house and forget to lock the door. So he reasons, "Oh, nothing important inside the house today -- guess I can take the day off", and walks off. That's what a lot of these "I can infer X must be true" arguments sound like to me: they assume that developers don't make mistakes, and that all unwanted behavior is exactly the same.
So suppose we have code that does this:
int array[10];
int i = some_function();
/* Lots of stuff */
if ( i >= 10 ) {
    return -EINVAL;
}
array[i] = newval;
And then someone decides to add some optional debug logging, and forgets that `i` hasn't been sanitized yet:
int array[10];
int i = some_function();
logf("old value: %d\n", array[i]);
/* Lots of stuff */
if ( i >= 10 ) {
    return -EINVAL;
}
array[i] = newval;
Now reading `array[i]` if `i` >= 10 is certainly UB; but in a lot of cases, it will be harmless; and in the worst case it will crash with a segfault. But suppose a clever compiler says, "We've accessed array[i], so I can infer that i < 10, and get rid of the check entirely!" Now we've changed an out-of-bounds read into an out-of-bounds write, which has turned a worst-case DoS into a privilege escalation!
I don't know whether anything like this has ever happened, but 1) it's certainly the kind of thing allowed by the spec, 2) it makes C a much more dangerous language to deal with.
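One way to keep the check from being inferred away is to make sure no access to the array can run before the index has been validated. Here is a minimal sketch of that ordering; `store_checked`, `ARRAY_LEN`, and the `oldval` out-parameter are hypothetical names for illustration, not anything from the code above.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_LEN 10

static int array[ARRAY_LEN];

/* Hypothetical helper: validate the index before ANY read or write,
 * so the compiler never sees an access that precedes the check. */
int store_checked(int i, int newval, int *oldval)
{
    if (i < 0 || i >= ARRAY_LEN) {
        return -EINVAL;
    }
    if (oldval != NULL) {
        *oldval = array[i];   /* debug read happens after the check */
    }
    array[i] = newval;
    return 0;
}
```

Any debug logging would then use the value returned through `oldval` instead of indexing the array itself.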
This has been accepted as what to expect from C. All compiler authors think it is OK. People who are aware of the problem are overwhelmed by its size, and there is no chance of fixing it any time soon.
The fact that this has come to be seen as normal and OK is an example of Normalization of Deviance. See http://lmcontheline.blogspot.com/2013/01/the-normalization-o... for a description of what I mean. And deviance will continue to be normalized right until someone writes an automated program that walks through projects, finds the surprising undefined behavior, and tries to come up with exploits. After project after project gets security holes, perhaps the C language committee will realize that this really ISN'T okay.
And the people who already migrated to Rust will be laughing their asses off in the corner.
I am not sure if a segfault is always the worst case. It could be by some coincidence that array[i] contains some confidential information [maybe part of a private key? 32 bits of the user's password?] and you've now written it to a log file.
I know it's hard to imagine a mis-read of ~32 bits would have bad consequences of that sort, but it's not out of the question.
int a[2][2];
int f (int i, int j)
{
    int t = a[1][j];
    a[0][i] = 0; // cannot change a[1]
    return a[1][j] - t; // can be folded to zero
}
The language says that elements of the matrix a must only be accessed by indices that are valid for each bound, so compilers can and some do optimize code based on that requirement (see https://godbolt.org/z/spSF8e). But when a program breaks that requirement (say, by calling f(2, 0)), the function will likely return an unexpected value.
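For contrast, here is a minimal sketch of the same function over a flat array. `flat` and `f_flat` are hypothetical names, not part of the godbolt example. Every index from 0 to 3 is valid here, so the store may alias the element that was loaded, and the compiler has to reload it instead of folding the result to zero.

```c
/* Hypothetical flat counterpart of a[2][2]: indices 0..3 are all valid. */
int flat[4];

int f_flat(int i, int j)
{
    int t = flat[2 + j];      /* same storage position as a[1][j] */
    flat[i] = 0;              /* may legitimately alias flat[2 + j] */
    return flat[2 + j] - t;   /* must be reloaded; not always zero */
}
```

With flat[2] set to 5, f_flat(2, 0) is well defined and returns -5, while f_flat(0, 0) returns 0.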
Yes, that is a perfect example of buggy compiler handling of undefined behaviour. A non-buggy compiler would either behave in a manner characteristic of the environment (i.e. read address array+i), ignore the situation entirely (which also results in reading array+i), or (preferably) issue an error to the effect of "possible array access out of bounds, suggest 'assert(i<10);' here".
Can we just make that binding again? After all, it used to be.
It should be obvious to compiler writers what the intention of the standard is, because it says so in the dang text, but since this was downgraded to a note and you are technically not in violation if you do something different, everyone now acts as if doing the exact opposite of what is written there is somehow OK.
The downgrade to note-status seemed to be predicated on the idea that implementors can be trusted to do The Right Thing™ in these cases. It is now evident that they cannot, so we have to force them.
But we must write code that relies on the same rules and requirements that programs are held to (and vice versa). When either party breaks those rules, either accidentally or deliberately, bad things happen.
What sometimes happens is that code written years or decades ago that relies on the absence of an explicit guarantee in the language suddenly stops working because a compiler change depends on the assumption that code doesn't rely on the absence of the guarantee. That can happen as a result of improving optimizations, which is often but not necessarily always motivated by improving the efficiency of programs. Better analysis can also help find bugs in code or avoid issuing warnings for safe code.
Right now, it seems like compiler vendors are playing a game of chicken with their users.
Compilers do deliberately look to optimize loops with signed counters by exploiting UB to assume that they will never wrap.
Compiler implementers are happy when they don't have to care about some edge case, because then the code is simpler. Thus, the extra logic to handle wraparound correctly exists only for unsigned counters.
That is my interpretation of "The opposite is the case". Writing a compiler is easier with lots of undefined behavior.
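A small sketch of the signed/unsigned difference being discussed; the function names are hypothetical. For `int`, overflow in `x + 1` is undefined, so a compiler is allowed to treat the comparison as always true. For `unsigned`, wraparound is defined, so the comparison has to be evaluated for real.

```c
#include <limits.h>

/* Signed: x + 1 overflowing is UB, so a compiler may fold this to 1.
 * Calling it with INT_MAX is undefined behavior. */
int signed_next_is_greater(int x)
{
    return x + 1 > x;
}

/* Unsigned: x + 1u wraps to 0 at UINT_MAX, so this is 0 for UINT_MAX. */
int unsigned_next_is_greater(unsigned x)
{
    return x + 1u > x;
}
```

Both return 1 for an ordinary value like 5; only the unsigned version has a defined answer (0) at the top of its range, UINT_MAX.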
I think you meant "provenance" (mentioning it for the sake of anyone who wants to search for it).
I can't ever tell if I'm looking in the right place. :)