0 points3JPLW6y ago0 comments

Does it concern you how aggressively compiler teams are exploiting UB?

22 comments · 3 top-level

Spivak · 6y ago · 8 in thread

You do have to understand that compiler teams aren't saying something like "this triggers UB, quick, just replace it with a no-op." It's just something that naturally happens when you need to reason about code.

For example, consider a very simple statement.

    let array[10];
    let i = some_function();
    print(array[i]);

The function might not even be known to the compiler at compilation time if it was from a DLL or something.

But the compiler is like "hey! you used the result of this function as an index for this array! i must be in the range [0, 10)! I can use that information!"

gwd · 6y ago

> But the compiler is like "hey! you used the result of this function as an index for this array! i must be in the range [0, 10)! I can use that information!"

As a developer who has seen lots of developers (including himself) make really dumb mistakes, this seems like a very strange statement.

Imagine if you hired a security guard to stand outside your house. One day, he sees you leave the house and forget to lock the door. So he reasons, "Oh, nothing important inside the house today -- guess I can take the day off", and walks off. That's what a lot of these "I can infer X must be true" reasonings sound like to me: they assume that developers don't make mistakes, and that all unwanted behavior is exactly the same.

So suppose we have code that does this:

  int array[10];
  int i = some_function();

  /* Lots of stuff */
  if ( i > 10 ) {
    return -EINVAL;
  }

  array[i] = newval;

And then someone decides to add some optional debug logging, and forgets that `i` hasn't been sanitized yet:

  int array[10];
  int i = some_function();

  logf("old value: %d\n", array[i]);

  /* Lots of stuff */

  if ( i > 10 ) {
    return -EINVAL;
  }

  array[i] = newval;

Now reading `array[i]` if `i` > 10 is certainly UB; but in a lot of cases, it will be harmless; and in the worst case it will crash with a segfault.

But suppose a clever compiler says, "We've accessed array[i], so I can infer that i < 10, and get rid of the check entirely!" Now we've changed an out-of-bounds read into an out-of-bounds write, which turns a worst-case DoS into a possible privilege escalation!

I don't know whether anything like this has ever happened, but 1) it's certainly the kind of thing allowed by the spec, 2) it makes C a much more dangerous language to deal with.
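
A minimal sketch of the ordering this scenario argues for, not gwd's actual code: the range check runs before any read or write of `array[i]`, so there is no earlier access from which a compiler could infer the range and drop the check. The helper name `store_checked`, the `ARRAY_LEN` macro, and the `old` out-parameter are hypothetical, and the check uses `i >= ARRAY_LEN` and `i < 0` rather than the `i > 10` in the snippet above.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_LEN 10            /* hypothetical name for the array bound */

static int array[ARRAY_LEN];    /* zero-initialized, as any static object */

/* Hypothetical helper: validates i first, then optionally reports the old
 * value through *old, then stores newval. Returns 0 on success or -EINVAL
 * if i is outside [0, ARRAY_LEN). */
int store_checked(int i, int newval, int *old)
{
    /* The check precedes every access to array[i]. */
    if (i < 0 || i >= ARRAY_LEN) {
        return -EINVAL;
    }

    if (old != NULL) {
        *old = array[i];        /* the "debug logging" read, now in range */
    }

    array[i] = newval;
    return 0;
}
```

With this ordering, adding a logging read later cannot come before the validation unless someone moves it above the `if`.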

btilly · 6y ago

Per https://lwn.net/Articles/575563/, Debian at one point found that 40% of the C/C++ programs that they have are vulnerable to known categories of undefined behavior like this which can open up a variety of security holes.

This has been accepted as what to expect from C. All compiler authors think it is OK. People who are aware of the problem are overwhelmed at the size of it and there is no chance of fixing it any time soon.

The fact that this has come to be seen as normal and OK is an example of Normalization of Deviance. See http://lmcontheline.blogspot.com/2013/01/the-normalization-o... for a description of what I mean. And deviance will continue to be normalized right until someone writes an automated program that walks through projects, finds the surprising undefined behavior, and tries to come up with exploits. After project after project gets security holes, perhaps the C language committee will realize that this really ISN'T okay.

And the people who already migrated to Rust will be laughing their asses off in the corner.

asveikau · 6y ago

> in a lot of cases, it will be harmless; and in the worst case it will crash with a segfault.

I am not sure if a segfault is always the worst case. It could be by some coincidence that array[i] contains some confidential information [maybe part of a private key? 32 bits of the user's password?] and you've now written it to a log file.

I know it's hard to imagine a mis-read of ~32 bits would have bad consequences of that sort, but it's not out of the question.

timq · 6y ago

I would be more concerned by the fact that if i is 10, then you already are in trouble ;)

msebor · 6y ago

This is a good example. Let me flesh it out a bit more to illustrate a specific instance of this problem:

  int a[2][2];
  int f (int i, int j)
  {
      int t = a[1][j];
      a[0][i] = 0;          // cannot change a[1]
      return a[1][j] - t;   // can be folded to zero
  }

The language says that elements of the matrix a must only be accessed by indices that are valid for each bound, so compilers can, and some do, optimize code based on that requirement (see https://godbolt.org/z/spSF8e).

But when a program breaks that requirement (say, by calling f(2, 0)) the function will likely return an unexpected value.
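
As a hedged illustration of that requirement, here is a sketch that states the index precondition explicitly with `assert`. The name `f_checked` is hypothetical and this is not msebor's code. When `NDEBUG` is not defined, a call such as `f_checked(2, 0)` aborts at the assertion instead of executing the out-of-bounds store.

```c
#include <assert.h>

static int a[2][2];

/* Same shape as f above, with the precondition on both indices checked.
 * For valid indices the store to a[0][i] cannot touch a[1][j], so the
 * result is 0 whether or not the compiler folds the subtraction. */
int f_checked(int i, int j)
{
    assert(i >= 0 && i < 2);    /* index into a[0] must be in range */
    assert(j >= 0 && j < 2);    /* index into a[1] must be in range */

    int t = a[1][j];
    a[0][i] = 0;                /* in range: modifies row 0 only */
    return a[1][j] - t;         /* 0 for every valid (i, j) */
}
```

Note that `assert` compiles to nothing when `NDEBUG` is defined, so this only documents and checks the precondition in builds that keep assertions enabled.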

Spivak · 6y ago

But I don't know what you want to happen in this case? If you actually call f(2,0) then the program makes no sense. How can you have an expected value for a function call that violates its preconditions?

a1369209993 · 6y ago

> ["]I can use that information!"

Yes, that is a perfect example of buggy compiler handling of undefined behaviour. A non-buggy compiler would either behave in a manner characteristic of the environment (i.e. read address array+i), ignore the situation entirely (which also results in reading array+i), or (preferably) issue an error to the effect of "possible array access out of bounds, suggest 'assert(i<10);' here".

mpweiher · 6y ago

Very well put (deliberately using the exact terminology used in the standard)!

Can we just make that binding again? After all, it used to be.

It should be obvious to compiler writers what the intention of the standard is, because it says so in the dang text, but since this was downgraded to a note and you are technically not in violation if you do something different, everyone now acts as if doing the exact opposite of what is written there is somehow OK.

The downgrade to note status seems to have been predicated on the idea that implementors can be trusted to do The Right Thing™ in these cases. It is now evident that they cannot, so we have to force them.

msebor · 6y ago · 7 in thread

This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

But we must write code that relies on the same rules and requirements that programs are held to (and vice versa). When either party breaks those rules, either accidentally or deliberately, bad things happen.

What sometimes happens is that code written years or decades ago, which relies on the absence of an explicit guarantee in the language, suddenly stops working because a compiler change depends on the assumption that code doesn't rely on the absence of the guarantee. That can happen as a result of improving optimizations, which is often but not necessarily always motivated by improving the efficiency of programs. Better analysis can also help find bugs in code or avoid issuing warnings for safe code.

flatfinger · 6y ago

The fact that the Standard does not impose requirements upon how a piece of code behaves implies that the code is not strictly conforming, but the notion that it is "invalid" runs directly contrary to the intentions of the C89 and C99 Standards Committees, as documented in the published C99 Rationale. That document recognizes Undefined Behavior as, among other things, "identifying avenues of conforming language extension". Code that relies upon such extensions may be non-portable, but the authors of the Standard have expressly said that they did not wish to demean useful programs that happen to be non-portable.

ori_b · 6y ago

There are rules and requirements documented in the spec, and there are de-facto rules and requirements that programs expect. Not only that, but when they do exploit these rules, often the code generated is obviously incorrect, and could have been flagged at compile time.

Right now, it seems like compiler vendors are playing a game of chicken with their users.

saagarjha · 6y ago

I think the issue is that many of these "obviously incorrect" things are not obvious at the level that the optimizations are taking place. Perhaps it would be worth considering adding higher-level passes in the compiler that can detect these kinds of surprising changes and warn about them.

cwzwarich · 6y ago

> This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

Compilers do deliberately look to optimize loops with signed counters by exploiting UB to assume that they will never wrap.
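
For readers who want a loop that never depends on that assumption, here is a minimal sketch under stated assumptions: the helper names `add_overflows` and `count_steps` are hypothetical, and the approach shown (checking against `INT_MAX` and `INT_MIN` before adding) is one common way to avoid signed overflow, not something taken from this thread.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. The comparisons themselves
 * stay within range, so this function has no undefined behavior. */
bool add_overflows(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}

/* Counts the iterations of a loop that walks a signed counter from start
 * toward limit in increments of step. It stops early rather than letting
 * the counter overflow, and returns 0 for a non-positive step. */
long count_steps(int start, int limit, int step)
{
    long n = 0;

    if (step <= 0) {
        return 0;
    }
    for (int i = start; i < limit; ) {
        n++;
        if (add_overflows(i, step)) {
            break;              /* the next increment would overflow */
        }
        i += step;
    }
    return n;
}
```

The plain form `for (int i = start; i < limit; i += step)` is the kind of loop the comment above refers to: when `i + step` can exceed `INT_MAX`, its behavior is undefined, and that is what lets a compiler treat the counter as never wrapping.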

qznc · 6y ago

I'd say both statements are correct.

Compiler implementers are happy when they don't have to care about some edge case because then the code is simpler. Thus, only for unsigned counters there is the extra logic to compile them correctly.

That is my interpretation of "The opposite is the case". Writing a compiler is easier with lots of undefined behavior.

Leherenn · 6y ago

Well yes, they assume they never wrap because that is not allowed by the language, by definition. UB is the result of broken preconditions at the language level.

GoblinSlayer · 6y ago

Terminology can go either way, but is what gcc actually does such a good idea?

rseacord · 6y ago · 4 in thread

I would say that there is a lot of concern in the committee about how compilers are optimizing based on pointer providence. There has been a study group looking at this. It now appears that they are likely to publish their proposal as a Technical Report.

_kst_ · 6y ago

"based on pointer providence"

I think you meant "provenance" (mentioning it for the sake of anyone who wants to search for it).

rseacord · 6y ago

Yes, my mistake--I was thinking of Rhode Island. I wrote a short bit about this at https://www.nccgroup.trust/us/about-us/newsroom-and-events/b... if anyone is interested.

revertts · 6y ago

What's the best way to keep an eye out for that TR? Periodically checking http://www.open-std.org/jtc1/sc22/wg14/ ?

I can't ever tell if I'm looking in the right place. :)

AaronBallman · 6y ago

If you're interested in the final TR, I would imagine we'd list it on that page you linked. If you're interested in following the drafts before it is published, you'd find them on http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log... (A draft has yet to be posted, though, so you won't find one there yet.)
