Here is why I don’t blame the developers: writing fast, efficient systems code that satisfies the requirements of strict aliasing as defined by C/C++ is surprisingly difficult. It has taken me years to figure out the technically correct incantations for every weird edge case such that they always satisfy the requirements of strict aliasing. The code gymnastics in some cases are entirely unreasonable. In fairness, recent versions of C++ have been adding ways to express each of these cases directly, eliminating the need to use obtuse incantations. But we still have huge old code bases that assume compiler behavior, as was the practice for decades.
I am not here to attribute blame; I think the causes are pretty diffuse, honestly. This is just a part of the systems world we failed to do well, and it impacts the code we write every day. I see strict aliasing violations in almost every code base I look at.
In particular, C++20 gave us std::bit_cast (https://en.cppreference.com/w/cpp/numeric/bit_cast) for type punning and C++23 added std::start_lifetime_as (https://en.cppreference.com/w/cpp/memory/start_lifetime_as) for interpreting raw bytes as an object.
That paper also highlights that checking is crucial. Their initial Euclid compiler just required that there's no aliasing, but never checked. So of course programmers will make mistakes, and without the checks those mistakes leak into running code. The finished compiler checked, which means the mistake won't even compile.
Shifting left in this way is huge. WUFFS shifts bounds misses left: when you write code which can have a bounds miss in C, of course it just does have a bounds miss at runtime; there's a stray read or overwrite and chaos results, maybe even Remote Code Execution. In Rust the miss panics at runtime, which means maybe a Denial of Service or at least a major inconvenience. But in WUFFS it won't compile, so you find out about your bug likely before it gets sent out for code review.
Most software can't be written in WUFFS, but "most" is doing a lot of work there, plenty of code which should be in WUFFS or an analogous language is not, meaning mistakes are not shifted left.
I blame this on how people like to teach C and present C.
It's very important that the second anyone conceives of the idea of learning C, they are first off informed that trying things and seeing what happens is a highly unreliable method of learning how C programs behave, and that C is not a high level assembly language.
If you teach C in relation to the abstract machine instead of any real world machine you will understandably scare off most people. Which is good, since most people shouldn't be learning or writing C. It's a language which can barely be written correctly even by people with the necessary self discipline to only write code they're 100% certain is well defined.
> It is difficult to determine if I’ve been successful in this endeavor.
Why is your program so full of casts between pointer types that you have difficulty determining if you've avoided strict aliasing?
Yes, if you treat C as a high level assembly language (like the linux kernel likes to do) then it becomes difficult to reason about the behaviour of your programs where 50% of them are in the grey area of uncertainty of whether they're well defined or not.
If you are forced to write C in a non-learning context, don't write any line of code unless you're certain you could tell someone which parts of the standard describe its behaviour.
> Here is why I don’t blame the developers: writing fast, efficient systems code that satisfies the requirements of strict aliasing as defined by C/C++ is surprisingly difficult.
C/C++ isn't a language. So I will stick to C because I don't know nor care about C++.
That being said, it's not hard to write efficient C which satisfies the requirements of strict aliasing except when you're dealing with idiotic APIs like bind or connect. Most code by default, assuming you use appropriate algorithms and data structures, is performant. The only time it becomes difficult with regards to strict aliasing is if you're micro optimizing.
While non-trivial, the case of converting between unsigned long and float shown in the article is entirely possible to do with completely safe C constructs. Likewise serialization/deserialization of binary data never requires coming close to aliasing unless you're dealing with a "native" endian protocol. In the case of general serialisation and deserialisation, compilers will reliably optimise such operations into one or two instructions (depending on whether you're decoding same-endianness or not).
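A minimal sketch of the two points above, under stated assumptions: float is a 32-bit IEEE 754 type, and uint32_t stands in for the unsigned long used in the article. The function names are made up for illustration. The conversion goes through memcpy, and the decoding reads individual bytes, so nothing is ever accessed through a pointer of an incompatible type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 type on the target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the object representation of a float into an integer.
   memcpy copies bytes, so no strict aliasing rule is involved. */
uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The reverse direction, again through memcpy. */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Decode a little-endian 32-bit value one byte at a time.
   The result does not depend on the host's endianness. */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode a big-endian 32-bit value one byte at a time. */
uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Optimizing compilers typically turn the fixed-size memcpy into a register move and the shift-and-or sequences into a single load, with a byte swap when the endianness differs, though that is worth checking in the generated assembly for the compiler in use.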
I write database storage engines. Most of the runtime address space is being dynamically paged to storage directly by user space. You can't use mmap() for this. Consequently, objects don't have a fixed address over their lifetime and what a pointer actually points to is not always knowable at compile-time. These are all things that have to be dynamically resolved at runtime with zero copies in every context the memory might be touched. Fairly standard high-performance database stuff. The intrinsic ambiguity about the contents of a memory address create many opportunities to inadvertently create strict aliasing violations.
I've been doing it a long time, so I know the correct incantation for virtually every difficult strict aliasing edge case. Most developers are ignorant of at least some of these incantations because they are surprisingly difficult to look up; it took me years to figure out some of them. When developers don't know, they tend to YOLO it and hope the compiler does the desired thing. Which mostly works in practice, until it doesn't.
Recent versions of C++ have added explicit helper functions, which is a big improvement. Most developers don't know the code incantation required to reliably achieve the same effect as std::start_lifetime_as and they shouldn't have to.
First of all compilers disagree on many interpretations and consequences of abstract machine rules. Also compilers have bugs.
So a proficient C/C++ programmer does have to learn what compilers actually do in practice and what they guarantee beyond the standard (or how they differ from it).
> C/C++ isn't a language.
It isn't, but it is a family of languages that share a lot of syntax and semantics.
    int foo(int *x) {
        *x = 0;
        // wait until another thread writes to *x
        return *x;
    }
Can the C compiler really optimize foo to always return 0? That seems extremely unintuitive to me.
Yes
> That seems extremely unintuitive to me.
C compilers are extremely unintuitive. This is a relatively sane case, they do things that are much more surprising than this.
It's very common for beginner embedded programmers to forget to do this and spend hours debugging why the register doesn't change when it should.
[1] https://en.m.wikipedia.org/wiki/Volatile_(computer_programmi...
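For the embedded case, a rough sketch of what polling a hardware status register through a volatile pointer might look like. The helper name, the ready mask and the poll limit are all hypothetical, not taken from any real device. Because the pointee is volatile, the compiler has to perform every read instead of hoisting a single read out of the loop. This only covers memory that changes outside the program's control, such as a device register; it is not a substitute for synchronization between threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: poll a status register until any bit in
   ready_mask is set, giving up after max_polls reads.
   The volatile qualifier forces a real read on every iteration. */
bool wait_until_ready(volatile const uint32_t *status_reg,
                      uint32_t ready_mask, unsigned max_polls) {
    assert(status_reg != NULL);
    for (unsigned i = 0; i < max_polls; ++i) {
        if (*status_reg & ready_mask) {
            return true;   /* the device reported ready */
        }
    }
    return false;          /* gave up after max_polls reads */
}
```

On real hardware the pointer would come from a fixed address defined by the device's documentation; in a test it can simply point at an ordinary volatile variable.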
At lower levels, you might have something like an IPC primitive there, which would be protected by a spinlock or similar abstraction, the inline assembly for which will include a memory barrier.
And even farther down still, the memory pointed to by "x" might be shared with another async context entirely and the "wait for" operation might be a delay loop waiting on external hardware to complete. In that case this code would be buggy and you should have declared the data volatile.
This is wrong; a memory barrier would not salvage this code from UB. The read from `x` must at the very least be synchronized, and there might be other UB lurking as well.
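One way to make the wait well defined, as a sketch only: change the pointee to a C11 atomic and pair a release store with an acquire load. The function names are hypothetical, and treating zero as "not yet written" is an assumption made to mirror the foo example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Writer side (hypothetical name): publish a nonzero value.
   Zero is reserved to mean "nothing written yet". */
void publish(atomic_int *x, int value) {
    assert(value != 0);
    atomic_store_explicit(x, value, memory_order_release);
}

/* Reader side: a synchronized version of the wait in foo.
   Every iteration performs an atomic load, so the compiler cannot
   assume the value is still the 0 that was stored earlier. */
int wait_for_nonzero(atomic_int *x) {
    int v;
    while ((v = atomic_load_explicit(x, memory_order_acquire)) == 0) {
        /* spin; a real program would likely yield or block instead */
    }
    return v;
}
```

The acquire load that observes the published value synchronizes with the release store, so whatever the writer wrote before calling publish is visible to the reader afterwards.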
int volatile *x
as the parameter to get the changes from a different thread.
That being said, my intuition matches what little anecdotal data I’ve seen from real perf-sensitive systems, and I’d ballpark 10-15% where it matters.
But no-one cares about real-world performance, people pick C and pick a C compiler because they want the thing that's fastest on artificial microbenchmarks.
Even small changes often require years and many revisions to be accepted, and burnout is common. You would need to build a consensus that this change is desirable, which is highly unlikely at best. Strict aliasing has been widely implemented since the 1990s and many compilers benefit from the rules; many compiler vendors are on the committee. You'd have to convince them that they should make their customers' code slower.
What might be achievable, however, is some kind of technical report on undefined or implementation defined behavior. Many compilers have options that allow programs with some undefined behavior to behave as the user would expect. Microsoft's C and C++ compilers, for example, don't enforce strict aliasing and allow some forms of integer overflow in loop conditionals. There would be substantial value in defining a common profile for these options. It would still be an uphill battle to get it through the committee, though.
If we can't even get that, I doubt strict aliasing will ever be voted out.
And when it became an issue around the late '90s, it was actually "NO strict aliasing" that was the point of contention. Optimizers were suddenly able to do all sorts of magic, and compiler authors realized they were getting tripped up by the inability (cf. the halting problem) to know for sure that this arbitrary pointer wasn't scribbling over the memory contents they were trying to optimize. You'd get better (often much better) code with -fno-strict-aliasing, which was tempting enough to turn it on and hope for better analysis tools to come along and save us from the resulting bugs.
We're still waiting, alas.
Basically, C code compiled to assembly in the Amiga era looked much more straightforward than the output produced by modern C compilers (with optimizations enabled at least), you could put both side by side and see a near 1:1 relationship between the C code and the assembly code (maybe also because the Motorola 68000 seems to have taken a lot of inspiration from the PDP instruction set).
The Strict Aliasing Situation Is Pretty Bad - https://news.ycombinator.com/item?id=11288665 - March 2016 (67 comments)