As the paper notes, there are plenty of alternative C compilers available to choose from. GCC and LLVM ended up with overwhelming market share simply because they produce the fastest code, because, at the end of the day, that is what users want.
If you want to blame someone, blame the designers of the C language for doing things like making int the natural idiom to iterate over arrays even when size_t would be better. The fact that C programmers continue to write "for (int i = 0; i < n; i++)" to iterate over an array is why signed overflow is undefined, and it is absolutely a critical optimization in practice.
No, I think it's more because they are free.
In my experience, ICC can be much better at instruction selection while also not being so crazy with exploiting UB.
Besides, it's irrelevant: there are lots of free C compilers that don't exploit UB, and also rarely get used.
That might mean the differences have mostly disappeared, but that may depend on what the front end (icx vs clang) does.
Well, size_t is unsigned and has defined overflow, so you'd lose the optimization if you switched to it. (Specifically, there's cases where defining overflow means a loop is possibly infinite, which blocks all kinds of optimizations.)
Many languages try to fix this by defaulting to wrap on overflow, but that was a mistake because you rarely actually want that. A better solution is to have a loop iteration statement that doesn't have an explicit "int i" or "i++" written out.
size_t obviates the need for this optimization.
As for the optimization, it is based on a misunderstanding of C semantics. The only place where the sign extend makes a difference is where pointers are longer than "ints" AND where the iterator can overflow, and in that case, sign extend only makes a difference if the loop is incorrectly coded so that the end condition is never true. The code should just provoke a warning and then omit the sign extend (and it almost certainly doesn't make much of a difference, since sign extend is highly optimized and has essentially zero cost in a pipelined processor).
That said, I believe the set of undefined behaviours in our current standards is much, much too large - most of these should rightly be filed in the implementation-defined category instead. It is no longer the 70s and the very same modern compilers that perform more and more extreme optimisations year on year really do not need to account for a giant zoo of quirky experimental architectures; over the decades we've basically settled on a consensus on how pointers, integers, floating-point numbers etc ought to work.
There is a large population of C "real programmers" who, when they write a C program that unsurprisingly doesn't work, conclude this must be somebody else's fault. After all, as a real programmer they certainly meant for their program to work, and so the fact it doesn't can't very well be their fault.
Such programmers tend to favour very terse styles, because if you don't write much then it can't be said to be your fault when, invariably, it doesn't do what you intended. It must instead be somebody else's fault for misunderstanding. The compiler is wrong, the thing you wanted was the obviously and indeed only correct interpretation and the compiler is willfully misinterpreting your program as written.
Such programmers of course don't want an error diagnostic. Their program doesn't have an error, it's correct. The compilers are wrong. Likewise newer better languages are unsuitable because the terse, meaningless programs won't compile in such languages. New languages often demand specificity, the real programmer is obliged to spell out what they meant, which introduces the possibility that they're capable of mistakes because what they meant was plain wrong.
At least the warnings are getting a bit better for some of these.
In Fortran the aliasing rules are even stricter: given two arrays passed in as arguments the compiler can assume that they do not overlap, for example. I remember messing that up as a student long ago and getting strange results. The Fortran rule was to enable vectorization, which has been done for many decades.
In any case, for people who want to write OS or cryptography or embedded systems or arithmetic libraries or ... in C, this is not a relevant point.
How much slower would removing the UB footgun make scientific code? Do you have benchmarks? Because we have endless firsthand accounts that it makes standard C useless.
Due to this, it's impossible to write C without using a ton of non-standard attributes and compiler options just to make it do the correct thing.
That is what it does. "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out. You can make a "defined C" with `-fsanitize-trap=undefined`, but C was never a high level assembler, and performance is critical for C users too.
This gets more complicated when an inlined function calls another inlined function. The whole inner function may be in the middle of dead code, and thus everything it calls, too, and so all be elided. This sort of thing happens all the time. These elisions don't typically make code obviously much faster, cycle-wise, but they do make it smaller, with a smaller cache footprint. Cache footprint has a huge effect on overall performance.
In principle, the compiler could arrange to put the dead code in some other page, with a conditional jump there that never happens, consuming just another branch prediction slot. But branch prediction slots are cache, and the branch, though never taken, nonetheless consumes a slot. A processor ISA could, in principle, have a conditional branch instruction that is assumed by branch prediction always to go one way, and so not need to consume any branch prediction cache. But I don't know of any. RISC-V does not seem to be entertaining plans to specify such a branch.
Several extant chips do accept "hint" prefixes for branches, but I think current chips ignore them, experience having shown the hints to be wrong more often than not. This is unfortunate, as sometimes the most frequently taken direction is not the one you want to be fastest. E.g., when spinning while watching an atomic flag, you want the looping branch to be assumed not taken, to minimize latency once the flag is clear, even though it most frequently is taken in recent history. (High-frequency trading code often has to resort to trickery to get the desired behavior.)
(There is a famous story about engineers at Digital Equipment Corporation, whence we got PDP-11s and Vaxen, and thus, indirectly, Unix. A comprehensive cycle count showed that a huge proportion of the instructions executed in their OS kernel were just one instruction. They moved heaven and earth to optimize this one instruction, but the kernel with the new instruction was exactly 0% faster. It turned out the instruction was used only in the idle loop that ran while waiting until something useful could be done. This really happened, and the same mistake is made again and again, to this day: e.g., people will sincerely swear that "rotl" and "popcnt" instructions are not important, on about the same basis.)
Intel recently implemented an atomic test-and-loop instruction that just stalls execution until an identified cache line is touched, thus avoiding the branch prediction stall getting out of the loop. I have not heard of anybody using it.
The reason they claim program optimizations aren't important is that you can do them by hand for a specific architecture fairly easily, but you'll still want them when porting to a new one, e.g. if it wants loop counters to go in the opposite direction.
Does it? Honest question; my impression was that template-heavy code can tend to produce deep call trees, but not necessarily outright unreachable code unless you count instantiations ruled out by SFINAE/std::enable_if/tag dispatching, for which UB-based analyses are not necessary.
In addition, I thought template (meta)programming relied very heavily on compile-time knowledge, which seems to obviate the need for UB-based analyses in many cases.
I'm not particularly experienced, though, so maybe there's a gaping hole I'm missing.
But C code is often heavily inlined, too, and similarly pruned.
The submitter may have left out "development" to fit the title in the character limit.
Leave these optimizations for the C++ folks. C doesn't need all of that; just leave it as the "high level assembler" that it was in the old days, where if I write an instruction I can picture the assembler output in my mind.
In my view, optimizers should not change the code's semantics. Unfortunately, with gcc it's impossible to rely on optimizations, so the only safe option is to turn them off entirely (-O0).
Or even define the behavior and get your compiler writer to implement it.
ps: If I index past the end of the array... what behaviour are you going to define?
Yes, it does. In fact C needs it more, because of the "for (int i = 0; i < n; i++)" idiom. At least idiomatic C++ code uses iterators.
I know people recommend Rust for this kind of thing, but Rust really isn't appropriate in a lot of cases, especially when dealing with microcontrollers not supported by llvm (e.g. PIC and 8051, off the top of my head).
This may be changing, but I was also under the impression that Rust can't easily produce as small binaries as C can.
We'll wait...
The problem is that C is sufficiently primitive that optimizing it is effectively trying to infer structure backwards, because the language doesn't specify it.
For-loops are a great example. Why are we even discussing a "loop index"? Because we need to iterate over an array, string, etc., and "length" isn't something stored with the data structure, so the compiler can't do the loop for you.
The real question is probably more along the lines of "Should the C standard start pinning things down more than it does?"
The answer is probably yes, but it's not straightforward to do. Look at just how much verbiage it took to create a decent memory model. And, still, using atomics in C (as opposed to C++) is brutal and the compiler can't help you very much.
My favourite is how "x + 1 < x" is optimized away for signed ...