void foo() { while (1) {} }
will loop forever, but
void foo(int i) { while (i) {} }
is permitted to terminate...even if i is 1:
> An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate
To make things a bit worse, LLVM can incorrectly make both of the above terminate - https://bugs.llvm.org//show_bug.cgi?id=965.
> This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.
It means while(i) {} can be eliminated as if i were 0, because there are no side effects in the loop expression or the loop body, and what would be the point of the loop if it never terminated on a non-constant expression?
As an optimization, the optimizer is allowed to eliminate it as a useless loop with no side effects. If you really want an infinite loop, you can use while (1) {}.
There are cases where automatically generated C code might have empty loops which are useless.
If you really want to go to sleep, use pause() or similar. An infinite loop eats up CPU cycles.
Yes but an infinite loop + sleep is okay, right?
I'd also add that there are ways to 'get around' some of these issues if necessary - for example, gcc has a flag for disabling strict aliasing (`-fno-strict-aliasing`) and a flag for two's-complement signed-integer wrapping (`-fwrapv`).
Assuming a platform where you don't segfault (say that 'page 0' addresses are valid) and thus execution does proceed, I still can't think of any /valid/ reason to eliminate the if that follows (focus line 2 in the comments).
Under what set of logic does being able to dereference a pointer confer that its value is not 0 (which is what the test equates to)?
In my opinion that is an (often working, but) incorrect optimization.
As others have stated, compilers generally don't actually have special case code to create unintuitive behavior if it looks like the programmer goofed.
It's possible and desirable for a compiler to remove branches of "if" statements that it knows at compile time won't ever be true. And, of course, one special case of statically known "if" statements are checks for NULL or not-NULL pointers in cases where the compiler knows that a pointer will never be NULL (e.g., it points to the stack) or will always be NULL (e.g., it was initialized to NULL and passed to a function or macro).
So the standard allows the compiler to say "this pointer cannot be NULL at this point because it was already dereferenced." Either the compiler is right because the pointer couldn't be NULL, or dereferencing the pointer already triggered undefined behavior, in which case unexpected behavior is perfectly acceptable. Some programmers will complain because the compiler won't act sensibly in this case, but C doesn't have any sensible option for what the compiler should do when you dereference a NULL pointer (yes, your operating system may give you a SEGFAULT, but the rules are written by a committee that can't guarantee that there will be an operating system).
You're conflating null and zero (which C encourages you to do for various terrible reasons). The test does not test that x is not zero; it tests that x is not null (null, like zero, is falsy, but again, null is not to be mistaken for zero). In C source a null pointer is sometimes written as the literal 0, but the null pointer can legally be totally distinct from the all-zeros bit pattern and should be thought of as totally distinct. Zero can be a valid address in memory. Null is never a valid address in memory. An integer constant expression with value zero, when cast to a pointer, is guaranteed to yield the null pointer (which may have a different bit pattern than zero!). Casting a non-constant integer that happens to have the value zero to a pointer is not guaranteed to produce a null pointer. Confused yet?
The compiler isn't. It knows that you're testing that a pointer is not null.
Since x has already been dereferenced, and since dereferencing x has no translatable meaning if x is null, it follows that we can only produce a meaningful translation of this program iff x is not null.
It therefore follows that x must not be null in the test, since x has not changed.
Simple: undefined behavior makes all physically possible behaviors permissible.
In reality though, such an elimination would only be correct if the compiler was able to prove that the function is never called with NULL, and if the compiler is smart enough to do that, hopefully the compiler writers are not A-holes and will warn instead of playing silly-buggers.
> Under what set of logic does being able to dereference a pointer confer that its value is not 0 (which is what the test equates to)?
Normal deductive logic:
* No NULL pointer can be dereferenced.
* x is dereferenced.
* Therefore, x is not a NULL pointer.
Of course, the compiler is presuming that your code is correct. That's a reasonable presumption when dealing with computer programming languages. Programming languages would be rather hard to interpret and translate--not to mention impossible to optimize--if you couldn't apply basic deductive logic to their statements. Imagine the routine had this code, instead:
void foo(int *x) {
    if (*x != *x) {
        return;
    }
    bar();
    return;
}
wouldn't you expect the compiler to apply the same optimizations? Or would you be upset that eliding the check broke some code that depended on a race condition somewhere else in your program? Also, pointing out that the "value is not 0 (which is what the test equates to)" is a non sequitur. During compilation the literal 0 can behave as a NULL pointer constant. But the machine representation of a NULL pointer does not need to be all-bits 0, and such machines still exist today. And usually, as in this case, the distinction is irrelevant. It doesn't matter that the 0th page is mappable on your hardware. What matters is that the C specification says that a NULL pointer cannot be dereferenced; that dereferencing a NULL pointer is nonsense code.
There's an argument that compilers should be careful about the optimizations they make. Not all programs are correct, and taking that presumption too far can be detrimental. But it's not always trivial to implement an optimizing compiler to "do what I say, not what I mean". Optimizations depend on the soundness of being able to apply deductive logic to a program--that is, being able to string together a series of simple predicates to reach a conclusion about program behavior. You often have to add _more_ complexity to a compiler to _not_ optimize certain syntactic constructs. Recognizing the larger construct, especially only the subset that are pathological, without optimizing the ones everybody expects to actually be optimized, can be more difficult than simply applying a series of very basic deductive rules. So it's no wonder that most compiler implementations, especially high-performance compilers, tend to push back on this front.
What would be nice is for compilers to attempt to generate diagnostics when they elide code like that. An optimizer needs to be 100% correct all the time, every time. A diagnostic can be wrong some amount of time, which means it's easier to implement and the implementation of a particular check doesn't ripple through the entire code base.
GCC and clang implement many good diagnostics. But with -Wall -Wextra they also tend to generate a lot of noise. Nothing is more annoying than GCC or clang complaining about perfectly compliant code for which there's no chance of it hiding a bug. For example, I often used to write initializer macros like:
#define OPTIONS_INIT(...) { .foo = 1, .bar = 3, __VA_ARGS__ }
struct options {
int foo;
int bar;
};
allowing applications to have code like: struct options opts = OPTIONS_INIT(.bar = 0);
But with -Wall GCC and clang will complain about the second .bar initializer overriding the first. (Because the macro expands to { .foo = 1, .bar = 3, .bar = 0 }.) The C specification guarantees in no uncertain terms that the last initializer for .bar wins. And presumably it guarantees that precisely to make writing such macros feasible. I've never once had a problem with unintentionally redefining a struct field in an initializer list. Yet GCC and clang are adamant about complaining. It's especially annoying because 1) there's absolutely nothing wrong with the code and 2) disabling the warning requires a different flag for clang than for GCC. (I realize that for such option types you usually want to define the semantics so that the default, most common value is 0. But it's not always desirable, and certainly not always practical, to stick to that scheme. And that's just one example of that construct.)
I've never actually run into #2 in practice, though: even at -O3 the dereference in line 1 has always crashed for me, though I guess probably because I've never written code for an OS where an address of 0 is valid and doesn't cause a SIGSEGV or similar.
What's the best way to "fix" strict aliasing without disabling the undefined behavior around it? Using a union?
> What's the best way to "fix" strict aliasing without disabling the undefined behavior around it? Using a union?
I was just talking about `-fno-strict-aliasing`, which is a flag for `gcc` (And `clang` I assume), but it does remove all UB like you're saying by simply allowing all pointers to alias.
The other option is a union like you're thinking (in C++ writing one union member and reading another is technically UB, though C99 and later permit it, and most compilers allow it without incident either way), or extensions like `gcc`'s `may_alias` attribute. `may_alias` is really the cleanest way to do it, but for something like OS code the aliasing happens in such strange places that just disabling strict aliasing completely tends to be the way to go.
I had this discussion with another C++ programmer and we came to the conclusion that, if you care to avoid that particular UB, any time you cast pointers between unrelated or basic types and you're going to write to one pointer and read from the other, you need to go through a union, as annoying as it is.
What irks me is the intro >> The purpose of this article is to make everyone (especially C programmers) say: “I do not know C”. <<
I think the purpose of the article was mainly for the author to write down some things he learned. Apparently it was his expectation that readers wouldn't be able to answer the quiz.
However, if you can't answer at least most of these questions correctly you're _not_ an expert C programmer.
So I think the correct intro here should be "The purpose of this blog post is to show that if you want to learn C, you actually have to learn it and should not attempt to 'wing it'".
...and maybe also that you should not write patronizing blog posts about a topic which you haven't fully grasped yet yourself.
Now if you're working more directly with hardware (which is of course possible/likely with C) then it might just be easier to disable strict aliasing altogether if you can, since identifying all the spots where it might be a problem tends to be an issue.
It killed the one thing C was good at - simplicity (you know exactly what happens where, note I'm not saying speed, as C++ can be quite a bit faster than C).
Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!). And if you slip somewhere, your security checks blow up (http://blog.regehr.org/archives/970 https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...) .
This mythical time never existed. You always had to know your compiler -- C simply isn't well specified enough that you can accurately predict the meaning of many constructs without reference to the implementation you're using.
It used to, if anything, be much much worse, with different compilers on different platforms behaving drastically different.
I can't tell you how many projects I have been on where disabling optimization made no measurable difference in performance.
This being said, I cannot speak for game devs nor video device driver developers.
Conversely, if you maintain software that compiles on a bunch of compilers, operating systems and architectures (particularly little endian + big endian, 32 bit + 64 bit), then it's probably written in something rather like C. A lot of people do this.
I mean I got all answers right without thinking about them too much, but would I too if I had to review hundreds of lines of someone else's code? What about if I'm tired?
It's easy to spot mistakes in isolated code pieces, especially if the question already tells you more or less what's wrong with it. But that doesn't mean you'll spot those mistakes in a real codebase (or even when you write such code yourself).
Agreed that these little examples aren't too difficult, especially if you have experience, but I certainly do not envy Linus Torvalds' job.
Related reading:
http://blog.metaobject.com/2014/04/cc-osmartass.html
http://blog.regehr.org/archives/1180 and https://news.ycombinator.com/item?id=8233484
But that's exactly what undefined behavior means.
The actual problem is that programmers are surprised-- that is, programmers' expectations are not aligned with the actual behavior of the system. More precisely, the misalignment is not between the actual behavior and the specified behavior (any actual behavior is valid when the specified behavior is undefined, by definition), but between the specified behavior and the programmers' expectations.
In other words, the compiler is not at fault for doing surprising things in cases where the behavior is undefined; that's the entire point of undefined behavior. It's the language that's at fault for specifying the behavior as undefined.
In other other words, if programmers need to be able to rely on certain behaviors, then those behaviors should be part of the specification.
It in fact doesn't matter at all what the standard says if GCC and LLVM say something different, because you can't use the standard to generate assembly code.
The standard imposes no requirements on UB, so it's the compiler's responsibility to do the most reasonable, non-shocking thing with it possible: if I'm a GCC developer and you ran GCC on one of these fairly mundane examples and it compiled without error then ran rm -rf / or stole your private RSA keys and posted them on 4chan and I said "well, you can't be mad because it's undefined, it's the standard's fault" you'd probably punch me in the face after some quick damage control.
If it deletes an if statement or terminates a spinlock early that's potentially even worse than those two examples.
if(undefined_behavior) {
ruin_developers_day();
}
It tends to be the effect of valid-by-the-spec optimizations making assumptions that would only fail to hold during undefined behavior. Of course, many programmers complain about how the committee defines "cheaply." Trying to access an invalid array index is undefined because the way to prevent that kind of bug would be to add range checking to every array access. So, each extra check isn't expensive, but the committee decided that requiring a check on every array access would be too expensive overall. The same applies to automatically detecting NULL pointers.
And the fact that the standard doesn't require a lot -- a C program might not have an operating system underneath it, or might be compiled for a CPU that doesn't offer memory protection -- means that the committee's idea of "expensive" isn't necessarily based on whatever platforms you're familiar with.
But it is certainly true that a compiler can add the checks, or can declare that it will generate code that acts reliably even though the standard doesn't require it. And it's even true that compilers often have command line switches specifically for that purpose. But in general I believe those switches make things worse: your program isn't actually portable to other compilers, and when somebody tries to run your code through a different compiler, there's a very good chance they won't get any warnings that the binary won't act as expected.
Clang and gcc provide flags that enable nonstandard behavior, and you can use static and dynamic tools (asan, ubsan) to detect errors in your code; it does not have to be hard to write correct code.
There does appear to finally be a strict aliasing checker, but I have no experience with it.
But some of those are prevalent in Go. For example, 1.0 / 1e-309 is +Inf in Go, just as it is in C—it's IEEE 754 rules. int might not always be able to hold the size of an object in Go, just like C. In Go #6 wraps around and is an infinite loop, just like C.
The questions that don't, in some way, translate to Go are #2, #7, #8, and #10.
But, to your credit, I do like how Go has very limited UB (basically race conditions + some uses of the unsafe package) and works pretty much how you'd expect it to work.
int i;
at file scope is a tentative definition - if, by the end of the compilation unit, no definition has been seen, one of them will become a definition (initialized to zero); otherwise it is just a declaration. On the other hand, this:
int i = 0;
is a definition, and you can't have two of those.

http://andrewkelley.me/post/zig-already-more-knowable-than-c...
(the quiz questions themselves lead you on, plus I read the MIT paper on undefined behavior that was posted on here back in 2013)
*zp++ = *xp + *yp;
> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
The last sentence is key.
int i;
int i = 10;
int main(int argc, char* argv[]){
return 0;
}
Try to compile it. It doesn't work (gcc.exe (GCC) 5.3.0), the error is: a.cc:2:5: error: redefinition of 'int i'
int i = 10;
^
a.cc:1:5: note: 'int i' previously declared here
int i;
^
Either I misunderstood the author and this example, or I do know C. Change: C++ does not have “tentative definitions” as in C. E.g., at
file scope,
int i;
int i;
is valid in C, invalid in C++. This makes it impossible to define
mutually referential file-local static objects, if initializers are
restricted to the syntactic forms of C. For example,
struct X { int i; struct X *next; };
static struct X a;
static struct X b = { 0, &a };
static struct X a = { 1, &b };
Rationale: This avoids having different initialization rules for
fundamental types and user-defined types.
Effect on original feature: Deletion of semantically well-defined
feature.
Difficulty of converting: Semantic transformation.
Rationale: In C++, the initializer for one of a set of
mutually-referential file-local static objects must invoke a
function call to achieve the initialization.
How widely used: Seldom.