Undefined Behavior in C and C++ | Better HN

82 comments

39 comments · 9 top-level

tyfighter2y ago· 12 in thread

I keep finding myself angry about the recent (some number of years) focus on C and C++'s undefined behavior. I have been writing C and C++ for 27 years, 16 years professionally, and despite all the scary implications, I do not understand why ANYONE cares. I do not get it. This is yet another article that goes on and on about nonsensical situations that are just shitty code. Integer overflow? Who cares? Unless you're targeting a specific compiler and architecture, it doesn't matter. C and C++ have footguns. Everyone knows that. Who cares?

I am anger commenting, because I'm just sick of this, but this article still says nothing to convince me that any of this matters.

rwallaceOP2y ago

Right. To be clear, the purpose of the article is not 'zounds, C and C++ have footguns!' but 'C and C++ have footguns – yeah, this is not exactly breaking news – but here is a hopefully helpful summary of where they are, why they exist, and what you can do to avoid them'.

If you are already satisfied you know how to avoid them, and you don't need any more help with that, then you are not the target audience, and should by all means ignore the article.

AnimalMuppet2y ago

But tyfighter has a point. People take such articles and use them to beat up anyone writing C++, arguing that they are stupid to use such an undependable tool.

So, yes, people who know what they're doing can ignore such articles, as a first-order effect. But there are second-order effects from such articles, and while they don't change anything, they are rather unpleasant. Hence tyfighter's anger - he gets tired of being on the receiving end of the fallout from such articles.

lifthrasiir2y ago

You haven't changed, but compilers have, unfortunately. Unless you stick to -O0 or -fno-strict-aliasing all the time, chances are that your UB-ridden code will break in the future as more powerful compilers exploit more UB. That's why you have to care now, even if you didn't before. (Or you can argue that optimizations should be turned off, which is another valid, though uncommon, answer, preferred by djb for example.)

tyfighter2y ago

Actually, I have changed, and I've changed corporate C/C++ MANY times to satisfy compiler upgrades. It is always just shitty code. It's never something insidious. Bugs happen. It probably wasn't your intention, but you've made UB sound pretty awesome.

eminence322y ago

> This is yet another article that goes on and on about nonsensical situations that are just shitty code.

> Who cares?

The short flippant answer is: because everyone writes shitty code at some point. It generally doesn't get committed or released, but during development, shitty buggy code with Undefined Behavior happens.

Here's a concrete example of some code that I actually wrote (simplified greatly so it could be a small illustrative example): https://godbolt.org/z/xzehrWE57

Knowing about UB is a useful way to describe what's going on in this code example, and why the compiler is doing what it's doing. If you see your code behaving in "impossible" ways, knowing about UB can give you some hints about where to look.

AlotOfReading2y ago

As an engineer, my job isn't to write code, it's to deliver systems that do specific things. That means that I need to understand the defined behavior of the code I put into the system. Undefined behavior anywhere means you lack defined behavior everywhere in C/C++.

You can't work around this by writing more code or eliminate undefined behavior with tools like linters and tests. Your one and only option is to write perfect code that only has defined behavior. The number of people that can accomplish this in practice rounds to zero.

So yeah, how can you not care about UB? It's the semantic elephant in the room. Every conversation has to include it, implicitly or not.

tyfighter2y ago

What universe are you working in where you think ANY of that is actually true? In the land of reality where I live and work (I work in hardware), I'm not constructing philosophical prose about well-defined systems. This is another bad-faith argument where undefined behavior is made out to be some house of cards. I hate to break it to you, but every computer and all its software you've ever used is a monument to the glory of undefined behavior, because people just didn't worry about it.

dzaima2y ago

UB is far from the only source of systems not doing the desired thing - writing code that ends up at UB is as wrong as writing code that was written with an incorrect understanding of the invoked behavior.

Sure, the neat trick of a+1<a not working is perhaps undesirable, but, even if signed addition was defined to wrap, in most contexts an "a+1" subtracting four billion is not gonna be the specific thing you want it to do in your system.

Alternatively, signed overflow could be defined to return exactly 31415, which would be very concrete defined behavior, but barely if at all more useful compared to it being UB.

charcircuit2y ago

Because behavior does eventually get defined somewhere. Just because it's not defined in the C standard it does not mean you can't reason about it.

AnimalMuppet2y ago

Yeah, no. Yes, in theory undefined behavior can destroy your entire program. In practice? Not so much.

I do not care about bogeymen that exist in theory. I don't even care about bogeymen that affect your code. I only care about bogeymen that actually affect my code.

FpUser2y ago

>"Undefined behavior anywhere means you lack defined behavior everywhere in C/C++."

Well, stop programming then. Undefined behavior is everywhere. Your hardware, CPU microcode, any software written in any language etc. etc.

>"As an engineer"

Your statements suggest otherwise.

AnimalMuppet2y ago

Overall I agree with you. But the people writing the C++ standard library have to care.

mjevans2y ago· 7 in thread

Once again, I want to plead: at least provide a warning option that flags every place where the compiler encounters undefined behavior. The goal should be to carry the optimizations back into the written code and improve code quality, not just to improve the output of one particular compiler.

Hi! I'm a former compiler engineer who specialized in undefined behaviour.

Would you like warnings on:

   * int f(int x, int y) { return x + y; }
   * int get_x_coord(Point *p) { return p->x; }
   * void compute_and_cache(const char *key) { *get_cache_bucket_for(key) = compute_value_for(key); }

I'm curious, what would you do with a warning on every load or store through a pointer?

On the flip side, I can offer -fsanitize=undefined, which will catch many things that have UB when they happen at runtime. It does not change the ABI, which means there are some bugs it can't catch, but it is easier to deploy, since you do not need to recompile all your libraries with it (in particular, your C++ standard library and C library). You can use it to help build unit tests that send intentionally overflowing values into your functions and show that they do not overflow. It turns an untestable problem (since you cannot check for UB after it happens) into a problem you can write deterministic tests for.
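The unit-test idea above can be sketched roughly as follows. This is an illustration, not the commenter's code: `add_bonus` and `add_bonus_handles_extremes` are hypothetical names, and saturating on overflow is just one possible design. The test feeds boundary values that would overflow a naive `score + bonus`; if the test binary is built with `-fsanitize=undefined`, any signed overflow that actually executes would be reported at runtime.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical function under test: adds a bonus to a score, saturating
   instead of overflowing. The range checks run before the addition, so
   no input can make the signed addition itself overflow. */
int add_bonus(int score, int bonus)
{
    if (bonus > 0 && score > INT_MAX - bonus)
        return INT_MAX;            /* score + bonus would exceed INT_MAX */
    if (bonus < 0 && score < INT_MIN - bonus)
        return INT_MIN;            /* score + bonus would go below INT_MIN */
    return score + bonus;
}

/* Boundary inputs chosen to overflow a naive `score + bonus`.
   Build the test with -fsanitize=undefined so that, if add_bonus ever
   did overflow, UBSan would print a runtime report. */
bool add_bonus_handles_extremes(void)
{
    return add_bonus(INT_MAX, 1) == INT_MAX
        && add_bonus(INT_MIN, -1) == INT_MIN
        && add_bonus(INT_MAX, INT_MAX) == INT_MAX
        && add_bonus(INT_MIN, INT_MIN) == INT_MIN
        && add_bonus(-5, 10) == 5;
}
```

Adding `-fno-sanitize-recover=undefined` to the build makes a sanitizer report abort the process, which usually suits a test runner better than a printed warning.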

wrs2y ago

The problem isn’t overflow, the problem is the backwards logic of the compiler assuming there will never be any execution that leads to overflow, so the code that overflows just vanishes completely.

In other words,

    bool overflowed = (x+1)<x;

should be meaningful. It may or may not do what you want on any given architecture, but it shouldn’t just be assumed false.
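As a sketch of how the same question can be asked without relying on the overflow itself: the functions below (hypothetical names) either compare against `INT_MAX` before any addition happens, or move the arithmetic into unsigned types, where wraparound is defined by the language. GCC and Clang also accept `-fwrapv`, which defines signed overflow as wrapping, at some cost to optimization.

```c
#include <limits.h>
#include <stdbool.h>

/* Asks "would x + 1 overflow?" without ever performing the overflowing
   signed addition, so there is no UB for the compiler to assume away. */
bool increment_would_overflow(int x)
{
    return x == INT_MAX;
}

/* The (x + 1) < x idiom is well defined on unsigned types: unsigned
   arithmetic is reduced modulo 2^N, so this comparison is meaningful. */
bool unsigned_increment_wrapped(unsigned x)
{
    unsigned y = x + 1u;   /* wraps to 0 when x == UINT_MAX */
    return y < x;
}
```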

Different levels of warnings might be useful.

-Wub # Warn _anytime_ there is detected potential undefined behavior, irrespective of if there is an associated optimization.

-Wubelim # Warn any time code is eliminated as a result of undefined behavior / assumptions.

-Wub... # Any other classes of UB optimizations that change the program as (incorrectly) written.

Again, the goal is to provide feedback that improves the program and possibly educates / reminds the programmer about how their meanings might be misunderstood.

I wouldn't like warnings for these things. I'd like them all to trap in a well-defined (but non-recoverable) way if UB actually gets triggered. And I'd like this to be the default behavior, even in release builds. Safety should never be opt-in.

pornel2y ago

UB is not an event that happens. It’s an assumption baked into the design of the compiler.

For example, on a 64-bit architecture, if you index arrays by an int, the compiler can use the CPU’s 64-bit addressing modes, even though those don’t wrap around at the point where a 32-bit int would overflow. The compiler takes advantage of this all the time. It wouldn’t make sense to warn about every array access, but on the other hand, the compiler can’t know at compile time whether your pointer arithmetic will ever overflow an int.
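A small illustration of the indexing point, assuming a typical 64-bit target. Whether a given compiler actually widens the loop counter depends on the target and optimization level, so the comment in the code describes what the compiler is allowed to do, not what it is guaranteed to do. `sum_ints` is a hypothetical name.

```c
/* Sums the first n elements using a 32-bit int index. Because signed
   overflow is UB, the compiler may assume `i` never wraps, so it can keep
   the index in a 64-bit register and fold it directly into the address
   computation for a[i], instead of re-sign-extending it every iteration. */
long long sum_ints(const int *a, int n)
{
    long long total = 0;
    for (int i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

Using a `size_t` index instead sidesteps the question: `size_t` already matches the pointer width on such targets, so no widening is needed in the first place.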

layer82y ago

Undefined behaviors are, in the general case, runtime conditions, not compile-time conditions.

wrs2y ago

Eliminating a statement by assuming UB will never happen is a compile-time condition.

I think the problem is more that it’s not as if there’s a single place in the compiler saying “aha! UB! let’s surprise the developer!”. It’s the effect of propagation through multiple optimization steps.

olliej2y ago· 4 in thread

A core part of the problem of UB in C and C++ is that it is gratuitously over-applied.

Mercifully, the article calls out the BS argument of "old hardware" justifying UB. It is simply a false argument. The overwhelming majority of UB in C and C++ should be either implementation-defined or unspecified behaviour. Security vulnerabilities due to overflow or null dereferences being UB should never have been possible, because there are no platforms on which those operations are not defined (some trap, some wrap, some go to infinity), but all of that falls under the banner of implementation-defined behavior. Labelling these things as UB is _solely_ to allow performance optimizations in narrow cases, at the cost of safety in all cases.

In committee meetings I've been in recently, the new refrain I'm hearing/reading, which has replaced "we need to support various hardware", is an even more stupid argument: if we make it so that these aren't UB, then people will rely on the common behavior and write code that is incorrect on platforms that behave differently. So instead of software that is always wrong on one platform, you get software that is semi-randomly wrong on all platforms (because whether or not a compiler removes code based on UB in a given case depends on compiler version, flags, inlining, etc., and if any of those change, the same code you had yesterday suddenly has a security bug when shipped).

UB is a bug. We could define what happens, but your code is still wrong if it gets there. Leaving it undefined means the optimizer can make useful optimizations with no harm, since correct code never gets there anyway.

ack_complete2y ago

Not always. [fs.race.behavior] makes it undefined behavior to use the C++ filesystem library in a way that introduces a race on the filesystem, including with _other processes_:

https://eel.is/c++draft/fs.race.behavior

I'm not sure how it is possible for a program to avoid this.

lifthrasiir2y ago

Many UBs are just a mismatch between the specification and programmer intent. Strict aliasing is a good example: why should aliasing only be allowed through `union` and be UB otherwise? Only because that is easier to analyze and optimize. The specification could instead have said that any pair of explicitly aliased types should be considered aliased, but then the possible optimizations would be severely limited (for example, aliasing in one translation unit could inhibit an optimization in other units).
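To make the aliasing point concrete, here is a minimal C sketch (function names are illustrative). The first function shows what strict aliasing lets the optimizer assume; the other two show defined ways to read a `float`'s bit pattern, the second being the `union` route the comment mentions. Reading through a different union member is permitted in C but not in C++. The bit-pattern functions assume a 32-bit IEEE 754 `float`, which holds on mainstream platforms but is not required by C.

```c
#include <stdint.h>
#include <string.h>

/* Under strict aliasing the compiler may assume an int* and a float*
   never point at the same object, so it is allowed to return 1 here
   without reloading *ip after the store through fp. */
int store_then_load(int *ip, float *fp)
{
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}

/* memcpy into an object of the target type is well defined, and
   optimizers typically turn it into a single register move. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "expects a 32-bit float");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Type punning through a union: allowed in C, undefined in C++. */
uint32_t float_bits_union(float f)
{
    union { float as_float; uint32_t as_bits; } pun = { .as_float = f };
    return pun.as_bits;
}
```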

olliej2y ago

In what sense? C and C++ aren’t memory safe, so the specification has to say something about what happens if you dereference an invalid pointer (random value, out of bounds, freed pointer, etc).

That’s what UB exists for: there’s no behaviour we can actually define for some operations.

nullhole2y ago· 2 in thread

My favourite description of undefined behaviour. The poster is corrected later on in the thread about whether the specific operation discussed would invoke undefined behaviour, but the description of what happens when undefined behaviour occurs is gold:

https://groups.google.com/g/comp.lang.c/c/ZE2B2UorTtM/m/1ROv...

Joona I Palaste, 2001-01-19, comp.lang.c

    This isn't about the post-increment operator, this is about the order
    of evaluation of the operands.
    Since you're modifying the value of i twice without a sequence point
    in between, either of the two results are exactly as much "expected".
    Also, equally "expected" behaviour includes incrementing every
    variable in the array, flipping all the bits in every variable in the
    array, converting all instances of the text string "/usr" in memory
    to "fsck", changing the colours of your screen to purple, calling the
    police on your modem line and telling them you're being attacked by
    a one-eyed Martian wielding a herring while singing "Hi ho, it's off
    to work we go", and even weirder stuff.
    So... what it all boils to... when writing your compiler, just flip
    a coin and use the one of the two behaviours you listed that
    corresponds with the coin's face.
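The quoted post concerns a variable modified twice with no sequence point in between. As a sketch (the `push` helper is hypothetical), the usual fix is to split the side effects so that each statement modifies a given variable at most once:

```c
#include <stddef.h>

/* Appends value to buf and returns the new length.
   Two classic unsequenced patterns that are undefined in C:
       len = len++;        len modified twice, no sequence point between
       buf[len] = len++;   len read for the index and modified, unsequenced
   Here each statement has at most one side effect on len, and the end of
   each full expression is a sequence point, so the order is defined. */
size_t push(int *buf, size_t len, int value)
{
    buf[len] = value;
    len = len + 1;
    return len;
}
```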

tom_2y ago

And yet the standard explicitly states that undefined behaviour can behave in some documented manner characteristic of the environment. As a simple question of quality of implementation, we should surely be able to demand that nothing confusing happens.

I don't disagree, and I think the quote above follows that idea. Undefined behaviour means that anything _could_ happen, but compiler writers should ensure something sensible happens in those cases. At least, that's what I took from it.

rwallaceOP2y ago· 2 in thread

Not the first discussion of this topic, by any means. In this case, I've tried to boil it down to the essential points a practical programmer needs to know, but the article still ended up longer than I initially aimed for.

wrs2y ago

One hopefully constructive comment… I didn’t find this a motivating example as intended:

    int foo_or_bar(int which) {
        // Assumes you don't mind both functions being called
        int x = foo();
        int y = bar();
        return *(&x + which);
    }

The argument (if I understood it right) is that if x has to have an address, it can’t be put in a register, so this must be UB or we couldn’t use registers. Well, how about the rule is that if I take the address of x, it can’t be put in a register? That seems like an obvious rule, and I seem to remember that was a safe assumption before the “great UBification” of compilers.

I’m sure there’s a better example of why UB helps optimization, but this one didn’t work for me.

josephcsible2y ago

> Well, how about the rule is that if I take the address of x, it can’t be put in a register?

The issue is that y might end up in a register, and you didn't take the address of y.
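One way to express `foo_or_bar` without UB is sketched below; `foo` and `bar` are hypothetical stubs standing in for the functions in the original snippet. Because both values live in one array, `values[which]` is an ordinary in-bounds access for `which` of 0 or 1, and a compiler remains free to keep both values in registers and select between them.

```c
/* Hypothetical stand-ins for the foo() and bar() in the example. */
static int foo(void) { return 10; }
static int bar(void) { return 20; }

/* `which` must be 0 or 1; any other value is still out of bounds. */
int foo_or_bar(int which)
{
    /* Both functions are called, as the original comment allows; the
       order of the two calls in the initializer is not specified. */
    int values[2] = { foo(), bar() };
    return values[which];
}
```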

Jun82y ago· 1 in thread

Here's another interesting post if you want to delve further into an example of undefined behavior created by gcc optimization: https://thephd.dev/c-undefined-behavior-and-the-sledgehammer....

Also, this quote comes to mind: "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off": https://www.stroustrup.com/quotes.html

photochemsyn2y ago

Here's some more on GCC optimization from 2010. It discusses how GCC optimization started eliminating null pointer checks in Linux kernel code, making it necessary to compile at a lower optimization level (towards the bottom). On the 'good' side, it also mentions that tight loops can speed up by 30-50% if the compiler can ignore signed integer overflow. It also has a 'best practices' list:

https://blog.regehr.org/archives/213
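The null-check elimination follows a dereference-then-check pattern. A minimal sketch (`struct widget` and `widget_flags` are hypothetical names): because dereferencing NULL is UB, once the pointer has been dereferenced, the optimizer may treat a later NULL check as always false and delete it. Checking before the first dereference keeps the test meaningful. GCC also provides `-fno-delete-null-pointer-checks` to disable this particular deduction.

```c
#include <stddef.h>

struct widget { int flags; };

/* Problematic order:
       int flags = w->flags;     dereference first...
       if (w == NULL) return -1; ...so this check may be removed
   Safe order: test the pointer before it is ever dereferenced. */
int widget_flags(const struct widget *w)
{
    if (w == NULL)
        return -1;
    return w->flags;
}
```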

andy992y ago· 1 in thread

In the bit where he shows

  void error(const char* msg);

  int successor(int a) {
      if (a + 1 < a) error("Integer overflow!");
      return a + 1;
  }

and says the `if` is compiled away at -O3: does anyone know whether it survives at any lower optimization level? I know some of the more aggressive optimizations intentionally ignore some checks; I don't know if that applies here. I found -O3 an odd choice for making his point, unless it doesn't work at -O2.

dzaima2y ago

It's optimized out on both gcc and clang on -O1 and above. -O3 is presumably just what the author defaults to for enabling optimizations (I also write -O3 everywhere by default).
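Since the check disappears at -O1 and above, here is a sketch of two versions of `successor` whose checks cannot be removed on those grounds. The names and the bool-plus-out-parameter interface are illustrative choices, not the article's. The second version uses `__builtin_add_overflow`, a GCC/Clang extension rather than standard C.

```c
#include <limits.h>
#include <stdbool.h>

/* Portable: test against INT_MAX before adding, so the overflowing
   addition is never executed and the check has no UB to lean on. */
bool successor_checked(int a, int *out)
{
    if (a == INT_MAX)
        return false;          /* a + 1 would overflow */
    *out = a + 1;
    return true;
}

/* GCC/Clang builtin: computes the sum as if with infinite precision,
   stores the (wrapped) result in *out, and returns true on overflow. */
bool successor_builtin(int a, int *out)
{
    return !__builtin_add_overflow(a, 1, out);
}
```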

iwsk2y ago· 1 in thread

I don't get it.

How can UB on double-free, use-after-free, dangling pointers, etc lead to optimizations?

lifthrasiir2y ago

Making double-free UB makes `free` more efficient, because there are fewer checks to make. Combined with use-after-free being UB, deallocated memory can be reused immediately for the next allocation without any repercussions. And making the use of dangling pointers UB makes most pointer analysis much more tractable.
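A common defensive habit related to the double-free point, sketched as a hypothetical macro: since `free(NULL)` is defined to do nothing, clearing the pointer after freeing turns an accidental second free through the same variable into a no-op. It does nothing for other copies of the pointer, which still dangle.

```c
#include <stdlib.h>

/* Hypothetical helper: frees p and then sets it to NULL.
   The argument is evaluated twice, so pass a plain variable,
   not an expression with side effects. */
#define FREE_AND_CLEAR(p) do { free(p); (p) = NULL; } while (0)
```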

layer82y ago

I recommend reading the resources under https://en.cppreference.com/w/c/language/behavior#External_l... (–> External links).
