I am anger commenting, because I'm just sick of this, but this article still says nothing to convince me that any of this matters.
If you are already satisfied you know how to avoid them, and you don't need any more help with that, then you are not the target audience, and should by all means ignore the article.
So, yes, people who know what they're doing can ignore such articles, as a first-order effect. But there are second-order effects from such articles, and while they don't change anything, they are rather unpleasant. Hence tyfighter's anger - he gets tired of being on the receiving end of the fallout from such articles.
> Who cares?
The short flippant answer is: because everyone writes shitty code at some point. It generally doesn't get committed or released, but during development, shitty buggy code with Undefined Behavior happens.
Here's a concrete example of some code that I actually wrote (simplified greatly so it could be a small illustrative example): https://godbolt.org/z/xzehrWE57
Knowing about UB is a useful way to describe what's going on in this code example, and why the compiler is doing what it's doing. If you see your code behaving in "impossible" ways, knowing about UB can give you some hints about where to look.
You can't work around this by writing more code or eliminate undefined behavior with tools like linters and tests. Your one and only option is to write perfect code that only has defined behavior. The number of people that can accomplish this in practice rounds to zero.
So yeah, how can you not care about UB? It's the semantic elephant in the room. Every conversation has to include it, implicitly or not.
Sure, the neat trick of a+1<a not working is perhaps undesirable, but, even if signed addition was defined to wrap, in most contexts an "a+1" subtracting four billion is not gonna be the specific thing you want it to do in your system.
Alternatively, signed overflow could be defined to return exactly 31415, which would be very concrete defined behavior, but barely if at all more useful compared to it being UB.
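Either way, the check that actually helps is the one made before the addition, not after it. A minimal sketch, assuming a hypothetical helper named checked_successor (not from the article) that refuses to compute a + 1 when the result would not fit:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes a + 1 only when the result is representable.
   The comparison against INT_MAX happens before any arithmetic, so no
   signed overflow can occur. */
bool checked_successor(int a, int *out) {
    if (a == INT_MAX) {
        return false;   /* a + 1 would overflow; report instead of computing */
    }
    *out = a + 1;
    return true;
}

/* Small self-check showing the intended use. */
void checked_successor_demo(void) {
    int r = 0;
    assert(checked_successor(41, &r) && r == 42);
    assert(!checked_successor(INT_MAX, &r));
    assert(r == 42);    /* out is left untouched on failure */
}
```

This has the same meaning whether overflow is UB, wraps, or traps, because the overflowing addition is never executed.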
I do not care about bogeymen that exist in theory. I don't even care about bogeymen that affect your code. I only care about bogeymen that actually affect my code.
Well, stop programming then. Undefined behavior is everywhere. Your hardware, CPU microcode, any software written in any language etc. etc.
>"As an engineer"
Your statements suggest otherwise.
Would you like warnings on:
* int f(int x, int y) { return x + y; }
* int get_x_coord(Point *p) { return p->x; }
* void compute_and_cache(const char *key) { *get_cache_bucket_for(key) = compute_value_for(key); }
I'm curious, what would you do with a warning on every load or store through a pointer?
On the flip side, I can offer -fsanitize=undefined, which will catch many things that have UB when you do them at runtime. It does not change the ABI, which means that there are some bugs it can't catch, but deploying it is easier since you do not need to recompile all your libraries with it (like your C++ standard library and C library, in particular). You can use this to help you build unit tests that send intentionally overflowing values into your functions and show that they do not overflow. It turns an untestable problem (since you cannot check for UB after it happens) into a problem you can write deterministic tests for.
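A minimal sketch of such a test, assuming a hypothetical add_clamped() as the function under test. Built with something like `cc -fsanitize=undefined -fno-sanitize-recover=undefined`, a signed overflow inside add_clamped() should abort the run instead of passing silently; without the sanitizer the asserts still check the results.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical function under test: adds two ints, clamping to the int
   range instead of overflowing. Both range checks are done with
   subtractions that cannot themselves overflow. */
int add_clamped(int x, int y) {
    if (y > 0 && x > INT_MAX - y) return INT_MAX;
    if (y < 0 && x < INT_MIN - y) return INT_MIN;
    return x + y;
}

/* Deliberately feeds values that would overflow a plain x + y. */
void test_add_clamped(void) {
    assert(add_clamped(1, 2) == 3);
    assert(add_clamped(INT_MAX, 1) == INT_MAX);
    assert(add_clamped(INT_MIN, -1) == INT_MIN);
    assert(add_clamped(INT_MAX, INT_MIN) == -1);
}
```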
In other words,
bool overflowed = (x+1)<x;
should be meaningful. It may or may not do what you want on any given architecture, but it shouldn’t just be assumed false.
-Wub      # Warn _anytime_ there is detected potential undefined behavior, irrespective of if there is an associated optimization.
-Wubelim  # Warn any time code is eliminated as a result of undefined behavior / assumptions.
-Wub...   # Any other classes of UB optimizations that change the program as (incorrectly) written.
Again, the goal is to provide feedback that improves the program and possibly educates / reminds the programmer about how their meanings might be misunderstood.
For example, on a 64-bit architecture, if you index arrays with an int, the compiler can use the CPU’s 64-bit addressing modes, even though those don’t wrap around at the point where a 32-bit int would. The compiler takes advantage of this all the time. It wouldn’t make sense to warn about every array access, but OTOH the compiler can’t know at compile time whether your pointer arithmetic will ever overflow an int.
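A rough sketch of the kind of loop this is about, assuming hypothetical functions sum_strided and sum_strided_unsigned. What a given compiler actually emits depends on version and flags, so treat the comments as the usual explanation rather than a guarantee:

```c
#include <assert.h>

/* Signed index: overflow of i would be UB, so the compiler may assume it
   never happens and keep i in a 64-bit register for addressing.
   Assumes stride > 0 and that i += stride never exceeds INT_MAX. */
long sum_strided(const int *data, int n, int stride) {
    long total = 0;
    for (int i = 0; i < n; i += stride) {
        total += data[i];
    }
    return total;
}

/* Unsigned index: wraparound is defined, so the compiler has to preserve
   32-bit wrapping semantics for i. Assumes stride > 0. */
long sum_strided_unsigned(const int *data, unsigned n, unsigned stride) {
    long total = 0;
    for (unsigned i = 0; i < n; i += stride) {
        total += data[i];
    }
    return total;
}

/* Small self-check: both versions agree when nothing overflows. */
void sum_strided_demo(void) {
    int data[5] = {1, 2, 3, 4, 5};
    assert(sum_strided(data, 5, 2) == 9);
    assert(sum_strided_unsigned(data, 5u, 2u) == 9);
}
```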
I think the problem is more that it’s not as if there’s a single place in the compiler saying “aha! UB! let’s surprise the developer!”. It’s the effect of propagation through multiple optimization steps.
Mercifully the article calls out the BS argument of "old hardware" justifying UB. It is simply a false argument. The overwhelming majority of UB in C and C++ should be either implementation defined or unspecified behaviour. Security vulnerabilities due to overflow or null dereferences being UB should never have been possible because there are no platforms in which those operations are not defined (some trap, some wrap, some go to infinity), but that is all under the banner of implementation defined behavior. Labelling these things as UB is _solely_ to allow performance optimizations in narrow cases, at the cost of safety in all cases.
In committee meetings I've been in recently the new refrain I'm hearing/reading that has replaced "we need to support various hardware" is an even more stupid argument: if we make it so that these aren't UB then people will rely on the common behavior and write code that is incorrect on platforms that behave differently. In other words, instead of software that is always wrong on one platform, you make software that is semi-randomly wrong on all platforms (because whether or not a compiler exploits the UB in a given case depends on compiler version, flags, inlining, etc., and if any of those change then suddenly the same code you had yesterday has a security bug when shipped).
https://eel.is/c++draft/fs.race.behavior
I'm not sure how it is possible for a program to avoid this.
That’s what UB exists for: there’s no behaviour we can actually define for some operations.
https://groups.google.com/g/comp.lang.c/c/ZE2B2UorTtM/m/1ROv...
Joona I Palaste, 2001-01-19, comp.lang.c
This isn't about the post-increment operator, this is about the order
of evaluation of the operands.
Since you're modifying the value of i twice without a sequence point
in between, either of the two results are exactly as much "expected".
Also, equally "expected" behaviour includes incrementing every
variable in the array, flipping all the bits in every variable in the
array, converting all instances of the text string "/usr" in memory
to "fsck", changing the colours of your screen to purple, calling the
police on your modem line and telling them you're being attacked by
a one-eyed Martian wielding a herring while singing "Hi ho, it's off
to work we go", and even weirder stuff.
So... what it all boils to... when writing your compiler, just flip
a coin and use the one of the two behaviours you listed that
corresponds with the coin's face.
int foo_or_bar(int which) {
    // Assumes you don't mind both functions being called
    int x = foo();
    int y = bar();
    return *(&x + which);
}
The argument being (if I understood right) if x has to have an address, it can’t be put in a register, so that must be UB or we can’t use registers. Well, how about the rule is that if I take the address of x, it can’t be put in a register? That seems like an obvious rule, and I seem to remember that was a safe assumption before the “great UBification” of compilers.
I’m sure there’s a better example of why UB helps optimization, but this one didn’t work for me.
The issue is that y might end up in a register, and you didn't take the address of y.
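For what it's worth, the same intent can be written without relying on where x and y live. A minimal sketch, with hypothetical stand-in bodies for foo() and bar() so it compiles on its own:

```c
#include <assert.h>

/* Hypothetical stand-ins for the foo() and bar() in the snippet above. */
static int foo(void) { return 10; }
static int bar(void) { return 20; }

int foo_or_bar(int which) {
    /* Both functions are still called, foo() first, as in the original. */
    int results[2];
    results[0] = foo();
    results[1] = bar();

    /* Indexing one array object is defined for 0 and 1; anything else
       is rejected here instead of reading past the array. */
    assert(which == 0 || which == 1);
    return results[which];
}
```

Here the two values are elements of one array object, so the pointer arithmetic stays inside that object and the compiler is still free to keep everything in registers.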
Also, this quote comes to mind: "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off": https://www.stroustrup.com/quotes.html
void error(const char* msg);
int successor(int a) {
    if (a + 1 < a) error("Integer overflow!");
    return a + 1;
}
and says the if is compiled away at -O3, does anyone know if it remains at any lower optimization level? I know some of the more aggressive optimizations intentionally ignore some checks, I don't know if that applies here. I found the -O3 odd for trying to help make his point, unless it doesn't work at -O2.
How can UB on double-free, use-after-free, dangling pointers, etc. lead to optimizations?