Catch-23: The New C Standard Sets the World on Fire (opens in new tab)

(queue.acm.org)

245 pointsdonmcc3y ago264 comments

264 comments

120 comments · 26 top-level

Dylan168073y ago· 15 in thread

> C23 furthermore gives the compiler license to use an unreachable annotation on one code path to justify removing, without notice or warning, an entirely different code path that is not marked unreachable: see the discussion of puts() in Example 1 on page 316 of N3054.9

I don't agree with that description at all. Here's the code:

  1 if (argc <= 2)
  2   unreachable();
  3 else
  4   return printf("%s: we see %s", argv[0], argv[1]);
  5 return puts("this should never be reached");

The only code path that's "entirely different" is lines 1,4,5 and in that case of course you remove a return that's after a return.

And the other valid code path is 1,2,5, which has `puts` after `unreachable`.

To need `puts` you have to imagine a code path that gets past the "if" without taking either branch?

Maybe the author means something by "code path" that's very different from how I interpret it?

I would be pretty surprised if the above code means something different from:

  if (argc <= 2) {
    unreachable();
    return puts("this should never be reached");
  } else {
    return printf("%s: we see %s", argv[0], argv[1]);
    return puts("this should never be reached");
  }

ternaryoperator3y ago

This reminds me of a point made by the late Stan Kelly-Bootle, who for years wrote the Devil's Advocate column in UNIX Review magazine. In the early 1990s, he was discussing Microsoft's new C compiler and noted that in the promo material for the new compiler, it showed a benchmark for a loop that counted from 1 to 10,000 then printed "Hello". MS claimed that without optimization it took a few milliseconds, after optimization: 0 ms. A small asterisk explained the optimizer simply removed the loop. Kelly-Bootle pointed out, that the only reason a developer would write such a loop was to introduce a needed delay. Therefore, deleting the loop was not optimizing, but in fact pessimizing. And so, it was in fact Microsoft's Pessimizing C compiler.

codeflo3y ago

Of course, that's technically incorrect. The way the standards are written, the compiler is free to replace the program with any other program that has the same (in a precisely defined sense) observable behavior (these are the famous "as if" formulations in language specs). Heating up the CPU is not considered observable behavior.

If someone really just wants a delay, it's easy to either (for programs running on normal OSs) call a sleep function, or (on tiny embedded systems) add an empty inline assembler statement that the compiler can't see through.

1 more reply

hyperhopper3y ago

This is not true at all:

I've been many loops that turn into no-ops because all the functionality has been refactored out but this fact is hidden in function calls.

Sure, this should ideally be surfaced as a lint error, not a compiler optimization, but you cannot say that intentional delays are the "only" reason.

Also since processing time is variable, using that as a method should be extremely heavily discouraged/warned/require-opt-in

viraptor3y ago

Those delay loops are common on microcontrollers and the usual solution is to either make the counter volatile or insert something opaque to the compiler in the loop body.

It would be of course nice if a warning was produced for that specific case: This whole loop was removed - is it really what you wanted, or is it a broken delay loop?

kzrdude3y ago

I think it's a practical example of how the C language has made a journey to being more high abstraction than it used to be, in practice. And how that unsettles those used to the old behaviour.

wahern3y ago

I think the point is that if the `argc <= 2` path is unreachable, then that means argc is always greater than 2, permitting the compiler to optimize the entire block to just:

  return printf("%s: we see %s", argv[0], argv[1]);

IOW, the conditional has been elided. But you're right in that the wording of the complaint doesn't match the example. The author presumably had in mind some of the more infamous NULL pointer-related optimizations, without spending the time to put together a properly analogous example.

dtolnay3y ago

I interpreted the author's characterization to be about something like:

  1  if (argc <= 2)
  2    puts("A");
  3  puts("B");
  4  if (argc <= 2)
  5    unreachable();
  6  else
  7    return puts("C");
  8  return puts("D");

in which not just lines 4-6,8 go away (as you said) but also lines 1-2.

It makes sense to me but I can see why the author would characterize this situation as "license to use an unreachable annotation on one code path to justify removing an entirely different code path that is not marked unreachable". In a different world one might expect A to be printed "before the UB happens".

2 more replies

alwaysbeconsing3y ago

One way to look at it (and I am not sure if this is correct, but it may be what the essay author meant) is to not treat the `unreachable` as affecting the presence of the decision, but only the result of the decision. If `unreachable` was replaced by a normal statement, we'd have:

    if (argc <= 2)
        do_something();
    else
        return printf("%s: we see %s", argv[0], argv[1]);

So the `return printf` is executed when `argc` is greater than 2. If we remove just the body of the first branch:

    if (argc <= 2)
        ;
    else
        return printf("%s: we see %s", argv[0], argv[1]);

the same thing holds. And additionally when `argc <= 2`, control will move past the `if`.

Under this view, if the `unreachable` won't cause the entire removal of the `if`, the compiler will produce the equivalent of:

    if (argc > 2)
        return printf("%s: we see %s", argv[0], argv[1]);

    return puts("this should never be reached")

Again, I don't say this is the correct interpretation, but it is one possibility, that would have to be ruled out by other parts of the standard.

Dylan168073y ago

I understand that interpretation, but that's what the end of my comment is about. If we treat unreachable as affecting the block it's in, but pretend it's not there for control flow, then the two versions of the code do different things. That's confusing and hard to preserve.

Asooka3y ago

This just shows that "unreachable" is almost impossible to use safely. The only safe use of unreachable is if it is immediately after an instruction that makes the program stop running. It is not for "this cannot happen", because things that "cannot happen" happen all the time. If you use "unreachable", you're just asking for trouble and it seems the compiler authors are happy to oblige.

josephcsible3y ago

This couldn't be more wrong. What you say to never use unreachable for is one of the most important use cases of unreachable. The whole point is to give the optimizer an assumption that it can't figure out on its own.

lldb3y ago

One example of it being useful is unchecked std::variant access in c++ - there isn’t any api to access it like a union (if you already know the type) but you can mark the wrong type path unreachable to the same effect.

cryptonector3y ago

There's no problem with this feature. I don't understand TFA's problem with it. As a programmer I get to not use `unreachable()` if I don't want to, and if I do I'm happy that the compiler takes my word for it and does the right thing. This is not at all like code elision in UB cases.

The `realloc()` change though...

_8j503y ago

Shouldn't the compiler warn or error on unreachable code?

codeflo3y ago

This is not about code that's found to be unreachable through static analysis (where compilers might warn), but about a manual programmer annotation that claims the code is dynamically unreachable even though statically it might look otherwise.

1 more reply

i-use-nixos-btw3y ago· 12 in thread

This is written with quite a lot of hyperbole.

The predominant focus is realloc(pre,0) becoming UB instead of what the author misleadingly describes as useful, consistent behaviour. It is far from that, and that’s the entire reason that it was declared UB in the first place: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf. Note that this wasn’t a proposal to change something, it’s a defect report: the original wording was never suitable.

The second part is the misconception about the impact of UB. Making something UB does not dictate that its usage will initiate the rise of zombie velociraptors. It grants the implementation the power to decide the best course of action. That is, after all, what they’ve been doing all this time anyway.

Note that this deviates from implementation-defined behaviour, because an implementation-defined behaviour has to be consistent. Where implementations choose to let realloc(ptr,0) summon the zombie raptors, they are free to do so. Don’t like it? Don’t target their implementation. Again, this isn’t a change from the POV of implementers - it’s a defect in the existing wording.

In this case, the course of action that any implementation will choose is to stick with the status quo. It is clearly not a deciding factor in whether or not you embrace the new standard, and to suggest otherwise is dishonest, sensationalist nonsense. The feature was broken, and it’s just being named as such.

Arch-TK3y ago

I agree that realloc was poorly defined for the 0 size case, I think UB or IDB both would have worked in this case to really drive that point home, the WG chose UB.

That being said, you're completely wrong about what UB means. Making use of UB may as well initiate the rise of zombie velociraptors. Except for the situation where your implementation explicitly specifies that it provides a predictable behaviour for a specific case of UB, there's literally no guarantee of what will happen. Assuming that the implementation will stick with some status quo and your code won't exhibit absolutely unusual behaviour is just naiive.

Please don't mislead people into thinking that it's ever a good idea to assume that undefined behaviour will be handled sensibly, this kind of mislead assumption is one of the major sources of bugs in C code.

coliveira3y ago

> this kind of mislead assumption is one of the major sources of bugs in C code.

This is not even close to be true. Most bugs in C code are from programmer mistakes, not from UB behavior. The exaggeration that is spread by some people regarding UB is close to absurd. If something is UB, it may generate different results in different situations, even with the same compiler. The standard is just clarifying this problem. A good compiler will do something sensible, or at least issue a warning when this situation is detected. If you have a bad compiler that does strange things with your code, it's not a defect of UB but the compiler instead.

3 more replies

cryptonector3y ago

Right, this should have been left to the implementor if they didn't want to standardize one behavior. Making it UB is the worst possible outcome. Yes, people who write portable code will still want to not rely on `realloc()`'s freeing behavior, but if you do and your realloc() implementation doesn't, then you suffer a leak, while if you do and realloc() decides to wipe your drive and make your power supply explode...

astrange3y ago

> Except for the situation where your implementation explicitly specifies that it provides a predictable behaviour for a specific case of UB, there's literally no guarantee of what will happen.

That situation is "when you have UBSan turned on".

1 more reply

c4mpute3y ago

> The second part is the misconception about the impact of UB. [...] It grants the implementation the power to decide the best course of action. That is, after all, what they’ve been doing all this time anyway.

Wrong, Wrong, Wrong.

UB allows the implementation to take any arbitrary course of action, without informing anyone, without documentation, without any conscious decision, without weighing anything to be better/worse. Nondeterministically catching fire and launching nuclear rockets is a completely compliant reaction to UB.

What you are describing is "implementation defined" behavior. That has to be deterministic, documented, and conforming to some definition of sanity. Examples are the binary representation of NULL, sizes of integer types or stuff like the maximum filename length. Sadly, too many things in C have "undefined behavior", too few have "implementation defined" behavior.

And UB has always been an excuse for compilers to screw over programmers in hideous ways. Programmers are rightfully afraid of any kind of new UB being introduced, because it will mean that whole new classes of bugs will arise because the compiler optimized out that realloc(..., a) where a might be 0, because thats UB, so screw you and your code... And this change is especially dangerous because it makes a lot of existing code UB.

chongli3y ago

And UB has always been an excuse for compilers to screw over programmers in hideous ways

Your reply was great up until this. Compiler writers aren’t looking to screw over programmers, they’re looking to make code faster. UB gives them the ability to make assumptions about what is and is not true, at a particular moment in time, in order to skip doing unnecessary work at runtime.

By assuming that code is always on the happy path, you can cut a lot of corners and skip checks that would otherwise greatly slow down the code. Furthermore, these benefits can cascade into more and more optimizations. Sometimes you can have these large, complicated functions and call graphs get optimized down to a handful of inlined instructions. Sometimes the speedup can be so dramatic that the entire application is unusable without it!

Many of these optimizations would be impossible if compilers were forced to assume the opposite: that UB will occur whenever possible.

The tool programmers have available to them is compiler flags. You can use flags to turn off these assumptions, at the cost of losing out on optimizations, if your code needs it and you’re unable to fix it. But it’s better to turn on all possible warnings and treat warnings as errors, rather than ignoring them, to push yourself to fix the code.

1 more reply

AlotOfReading3y ago

I wish UB were only as nasty as "nondeterministic behavior". In fact, if there's UB in anything the compiler sees, nothing at all can be assumed, including whether you even get an output. What you've given the compiler isn't C, so it doesn't have any obligations to do anything with it. The codepath with UB doesn't have to run for the nuclear rockets to launch and the nasal demons to appear.

Since approximately every nontrivial program ever written has UB, in actual practice we're only saved by the fact that compilers aren't entirely maliciously compliant.

2 more replies

jcranmer3y ago

The case of realloc being declared UB (as opposed to impl-defined) was not driven by the compiler writers but by the people who write the C libraries.

This isn't a case of compilers screwing over the programmers, because the people who are responsible for those optimizations are the people who are scratching their heads as to why it's UB and not impl-defined behavior.

GuB-423y ago

UB can initiate the rise of zombie velociraptors.

  int n;
  printf("type 0 to stop the rise of zombie velociraptors");
  scanf("%d", &n);
  realloc(pre, n);
  if (n != 0) rise_zombie_velociraptors()

May result in velociraptors raising even if the user enters "0".

The reason is that because realloc(pre, 0) is UB, for the compiler, it cannot happen, so n can't be 0, so the n != 0 test can be optimized out, so, velociraptors.

Asooka3y ago

> The second part is the misconception about the impact of UB. Making something UB does not dictate that its usage will initiate the rise of zombie velociraptors. It grants the implementation the power to decide the best course of action. That is, after all, what they’ve been doing all this time anyway.

Wrong. UB never happens. That is the promise the program writer makes to the compiler. UB never happens. A correct C program never executes UB. This allows the compiler to assume that anything that is UB never happens. Does some branch of your program unconditionally execute realloc(..., 0) after constant propagation? That branch never happens and can just be deleted.

Reading the defect report, they state "Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please." which is wrong. UB cannot be defined, if you define it, you are no longer writing standard C. It should instead have been classified as "implementation-defined behaviour".

In any case it's not that hard to just write a sane wrapper. This one is placed in the Public Domain:

    void *sane_realloc(void *ptr, size_t sz)
    {
        if (sz == 0) {
            free(ptr); /*free(NULL) is no-op*/
            return NULL;
        }
        if (ptr == NULL) {
            return malloc(sz);
        }
        return realloc(ptr, sz);
    }

I am calling it sane and not safe, because it is not safe. You still have the confusion of what happens when the function returns NULL (was it allocation failure or did we free the object?) - check errno. However, it has the same fully defined semantics on most all implementations and acts like people would expect.

You may be tempted to make the function return the value of errno, mark it [[nodiscard]] and take a pointer-to-pointer-to-void, so that the value of the pointer will only be changed if the reallocation was successful. I am not sure if that is safer. You are trading one possible bug - null pointer on allocation failure, which then will cause a segmentation fault for another - stale pointer on allocation failure, but with updated size. The latter is more likely to be used in buffer overflow attacks than the former.

omoikane3y ago

> This is written with quite a lot of hyperbole

The first sight of "catch fire" might not have caught my attention, but by the time it got to "instrument of arson" and "Molotov cocktails", the style was sufficiently distracting that I was convinced I wasn't the intended audience.

benj1113y ago

My understanding was that they're changing realloc() because they previously allowed zero length arrays and because you can't tell if this is a zero length array you need to either get rid of zero length arrays or change realloc().

So the feature wasn't broken to begin with, it was broken by another feature.

ChancyChance3y ago· 9 in thread

Is the world finally realizing that "a + b" actually returns two values: pass/fail and the value if pass?

"a + b = c;" is a fundamentally flawed operation from a computer architecture perspective.

c4mpute3y ago

First, you might have meant c = a+b;

The other way isn't really definable as an assignment mathematically.

And there is a lot more to it than just pass/fail. First, an addition doesn't fail, from a computer architecture perspective, the addition will always succeed, the only thing that could fail (in all the usual architectures) are possible memory fetch and store operations when not strictly dealing in register or immediate operands. Second, there is no fail flag. There is a overflow flag, an underflow flag, a zero flag, a sign and a few more that are irrelevant here. Any of overflow, underflow, zero or sign might mean that the operation "failed" depending on the types of your operand. Where the processor doesn't know anything about the type, so there won't be a straightforward 'fail' flag in any case. Only the library or compiler can use type information such as (un)signedness, bignum-ness, nonzeroness, desired wraparound (for modular types) and other possible types together with aforementioned flags to decide if that addition might have failed.

So nothing is fundamentally flawed, what you are describing is just insufficiently complex (because there is no fail flag, just a ton of other flags) or overly complex (because uint32_t c = a + b is modular 2^32 arithmetics and cannot fail).

khazhoux3y ago

> First, you might have meant c = a+b;

> The other way isn't really definable as an assignment mathematically.

This correction is condescending and unnecessary. Unless the person had never written a single line of code in their life, then they would obviously know "a+b" is not a modifiable lvalue.

And the point about pass/fail was also obviously not mean to capture the full complexity of the flags set by a CPU operation. It was very clearly a statement about how basic addition does not behave in computers the way it does on paper -- as simple as that.

From HN guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

1 more reply

JonChesterfield3y ago

> The other way isn't really definable as an assignment mathematically.

It's an equality sign. See also, := and unification.

Arch-TK3y ago

There is actually another option.

A more sophisticated type system.

Let's say you had some pseudocode like this:

    let a = 5
    let b = 12
    let c = a + b

The type of a would be Integer[5..5], the type of b would be Integer[12..12], the type of c would therefore be Integer[17..17]. In a more complex example:

    def foo(a: Integer[0..10], b: Integer[0..10]):
        return a + b

The return type of this function would be Integer[0..20].

This kind of type system can solve a number of issues, all but division by zero (which would probably still have to be solved with some kind of optional type).

If type inference dictates that the upper range of an integer would be too large to physically store in a machine data type, then you either resort to bignums or you make it a compilation error. By adding modular and saturating integer types you can handle situations where you want special integer behaviours. By explicitly casting (with the operation returning an optional) you can handle situations where you want to bound the range. This drastically simplifies a lot of code by removing explicit bounds checks in all places except where they are absolutely necessary. If for some reason you care about the space or computational efficiency of the underlying machine type, you can have additional annotations (like C's u?int_(least|fast)[0-9]+_t). If you absolutely must map to a machine type (this is usually misguided, unless you are dealing with existing C interfaces, for which such a language can provide special types) you can have more annotations.

Ada has something resembling this. I believe there are some other languages that implement similar features. I believe this sort of thing has a name, but I am not great with remembering the names of things.

Hopefully this is some food for thought.

im3w1l3y ago

I think the issue with this is that the worst-case bounds normally grow much faster than the actual values. And it can be easy to see for the programmer that the values can't actually grow that much because a is only big when b is small or some property like that, but then you have to convince the compiler of the same. I might be misremembering though.

2 more replies

still_grokking3y ago

> I believe this sort of thing has a name, […]

https://en.wikipedia.org/wiki/Refinement_type

But the concept is just a little bit over 30 years old. So don't expect it shows up in most mainstream languages before the end of the next 20 years, and don't expect it to come to the C languages ever.

Meanwhile in mainstream ML-land:

https://github.com/Iltotore/iron

(Or for the older version of the language: https://github.com/fthomas/refined)

(Please also note that for this feature both versions don't need language support at all but are "just" libraries, as the language is powerful enough to express all kinds of type level / compile time computations in general.)

2 more replies

JonChesterfield3y ago

Compilers do this sort of range tracking anyway. At least within a function. It's useful for loop optimisations.

notfed3y ago

It's a flaw that has a pretty good tradeoff: unparalleled readability.

ChancyChance3y ago

It depends. If you want to study maths, yes. If you want to be a programmer:

[status, value] = add(a, b);

Is much more unparalleled-ly (?) readable from the perspective of how a computer actually operates. In reality, this:

uint c = (uint)a + (uint)b; // (to make that other guy happy)

is really:

c = (a + b) % (sizeof(uint));

in "C", which is less readable but far more accurate.

1 more reply

GuB-423y ago· 6 in thread

I actually like unreachable() a lot. What it does is that it invokes undefined behavior, that's all.

It does nothing trickier than any other kind of UB. In fact, I could implement unreachable() like this: void unreachable() { (char *)0 = 1; }.

Standardizing it however gives interesting options for compilers and tool writers. The best use I can find is to bound the values of the argument of a function. For example, if we have "void foo(int a) { if (a <= 0) unreachable(); }, it tells the compiler that a will always be >0 and it will optimize accordingly, but it can also be used in debug builds to trigger a crash, and static analyzers can use that to issue warnings if, for example, we call foo(0). The advantage of using unreachable() instead of any other UB is that the intention is clear.

lionkor3y ago

Respectfully, you would already be doing this in any C codebase, with `assert()`, right? We are all checking our preconditions with assert... right?

GuB-423y ago

AFAIK, assert() is not undefined behavior, so it can't be used for optimization. It is either implementation-defined in debug mode, or does nothing in release mode.

For example:

  assert(a >= 0);
  if (a < 0) printf("a is negative");

In release mode, assert() will be gone, so the if/printf() will stay. If we used "if (a < 0) unreachable();" instead of assert(), it would optimize away both lines.

pornel3y ago

NDEBUG makes these checks disappear, so that's not an option for checks that are supposed to stay in the program.

lprib3y ago

Using `unreachable()` instead of `assert()` for your preconditions without profiling first is just pre-loading the gun to shoot yourself in the foot in the future. When those preconditions are inevitably violated at some point, you will get random UB corruption rather than simply aborting as is the case for assert.

GuB-423y ago

Yep, undefined behavior is unsafe, C in general is unsafe. There are plenty of languages that are safe, though a little bit rusty like ADA, use one of these if you want safe code.

If you still want to use C, for example for compatibility reasons and want to make it safer, assert isn't going away (unless you set NDEBUG). Preconditions are not "inevitably violated", there are ways of making sure they aren't, and I think an explicit "unreachable()" can help tools that are designed for that purpose.

Should you profile first before using unreachable() for optimization purposes? Maybe, but the important part is that now, you have a way of clearly and effectively tell the compiler what you know will never happen so that it can optimize accordingly, whether it is before or after profiling.

Compilers usually do a great job at optimization, but there are often some edge cases the compiler have to take into account in order to generate code that complies with the C standard, and it can have an impact on performance. unreachable() is one way to tell the compiler "please forget about the edge case, I know it won't happen anyways", the best part is that it is explicit, no obscure tricks here.

Side note about profilers: no matter what your strategy is with regards to optimization, I think profilers are essential tools that don't get enough attention. People talk a lot about linting, coverage and unit tests, but profilers are not to be left out. They are not just tools that tell you where not to optimize your code, they can also find bugs, mostly performance bugs, but not only.

ptx3y ago

> What it does is that it invokes undefined behavior, that's all. [...] it can also be used in debug builds to trigger a crash

How can it be used to trigger a crash (a specific behavior) if the behavior it invokes is undefined? Are you saying it would be defined differently for debug builds so that it doesn't invoke undefined behavior?

1 more reply

firstlink3y ago· 6 in thread

> and that such changes may impose themselves on old code without recompilation when dynamically linked libraries are upgraded.

All I can do is laugh. This is what the dynamic linker fanatics wanted. This is what they explicitly advocate for to this day. Share and enjoy!!

AshamedCaptain3y ago

I really don't think anyone could possibly want the _specified behavior_ of a function changing below their feet.

However, the author is unlikely to be correct here. E.g., to this day, glibc contains _multiple implementations of memcpy_ just to satisfy those executables that depend on the older, memmove-like behavior that was once part of the unspecified behavior of glibc. The only way to get the dynamic linker to choose one of the newer versions is to, well, rebuild the executable. It is inconceivable that glibc would not use symbol versioning with an actual specification change.

The behavior is practically the same as with static linking, and you still get the benefits of dynamic linking.

throwaway8922383y ago

People who don't understand dynamic linking are doomed to re-implement it, poorly.

coliveira3y ago

Exactly! Shared libraries mean that new code with modified behavior can and will be called when made available, independent of how the original code was compiled. It is interesting that people come out to complain about this obvious behavior.

hermitdev3y ago

The problem isn't changing implementation. This is expected with shared libs. The problem is changing the contract of the function and then expecting it to be drop in compatible. It's not. It _should_ be treated as a breaking ABI change, because the old behavior and new behavior are not compatible, yet it's being masqueraded as such. It's quite literally the same behavior/attitude behind the "w" vs "wt" change that led to aCropolyse.

tedunangst3y ago

It's a really weird complaint. The standard specifies that it's now undefined behavior. That imposes zero requirements to change the library. Whatever it is the library was doing, it's one possible undefined behavior.

bayindirh3y ago

I’d rather have small binaries and memory efficient systems instead of huge blobs having their own complete disconnected environments with non-coherent behavior on the same situation. Also, wasting tons of memory while at it.

If I have something that critical, I can always statically compile.

RustyRussell3y ago· 6 in thread

Frankly, the C standards ctte went off the deep end when they effectively banned NULL to memset etc (obv with zero length).

Not because these functions couldn't handle it, but because this assertion simplifies optimizations elsewhere.

This has required adding extra checks in my code, found mainly by trial and error, and has made it less readable and less optimal.

Finally, the checked arithmetic operations returning false on success is a horror show. Fortunately it will be found on the first time the code is run, but that's a damnably low bar :(

ericpauley3y ago

> Finally, the checked arithmetic operations returning false on success is a horror show.

This seems in line with C conventions? Generally a 0 return code means success.

wruza3y ago

With int statuses, not with bools. It’s just a twisted logic in return value you have to deal with in your head.

“If checked operation has a status, then it failed.” - ok

“If checked operation [is true], then it failed.” - wat

2 more replies

Kamq3y ago

> Finally, the checked arithmetic operations returning false on success

That's what got you? C functions returning error flags (with zero meaning no error) isn't exactly new.

Dwedit3y ago

Replace memset with a macro, that's the C way.

notfed3y ago

Isn't the return value just a carry bit?

spc4763y ago

Not every CPU C runs on has a carry bit. MIPS, SPARC, RISC-V, all don't have the concept of a "carry bit."

tzs3y ago· 6 in thread

> Pointers to free'd memory are akin to uninitialized pointers, so free(p) followed by if (p==q) is an instrument of arson

What's the reason for this?

jcranmer3y ago

Given the following code:

    void *p = malloc(N);
    do_random_stuff(p);
    void *q = malloc(N);

With this rule, the compiler can conclude that p and q cannot alias, even if it doesn't have body of do_random_stuff. Without it, it would first have to prove that p is never freed before calling q, which is basically impossible (moving the body of intervening code into a different file, for example, would do the trick).

tedunangst3y ago

900 years ago there was a CPU which stored pointers in special registers and trapped if you loaded a pointer with an invalid segment. And so loading the pointer into a register to compare it would crash.

coliveira3y ago

Using a freed pointer is incorrect behavior, a bug in shorter terms. If you do anything with a freed pointer (other than assigning new memory), you're inviting all kinds of bugs (independent of what the compiler might be doing with your code).

xigoi3y ago

Obviously dereferencing a freed pointer is incorrect behavior, but what harm is there in using its numerical value?

Dylan168073y ago

I can't tell you exactly why but it's consistent with just about everything else involving p being undefined, and the result of the comparison would be useless anyway.

tzs3y ago

Why would the comparison be useless?

I can imagine situations where a pointer q might sometimes be a copy of pointer p and sometimes might point to something else, and the code wants to free q if and only if it is not a copy of p (because p has been free'd earlier).

1 more reply

MatmaRex3y ago· 6 in thread

> As C89 was taking shape, the neurodivergent notion of a "zero-length object" was making the rounds

I'm surprised that the authors decided to, and were able to, slip in this little euphemism.

bee_rider3y ago

I wonder if somewhere along the chain there was an automated tool to convert frequently abused mental-health related terms like “insane” into something leas hurtful, or something along those lines?

I haven’t seen widespread use of the word “neurodivergent” as a kind of… whatever this is, weirdly euphemistic slur, almost?

chongli3y ago

I haven’t seen widespread use of the word “neurodivergent” as a kind of… whatever this is, weirdly euphemistic slur, almost?

It’s a continuation of the euphemism treadmill [1]. It won’t be long before “neurodivergent” is considered politically incorrect and a new term is invented to replace it.

[1] https://www.urbandictionary.com/define.php?term=Euphemism%20...

wizzwizz43y ago

> I wonder if somewhere along the chain there was an automated tool to convert frequently abused mental-health related terms like “insane” into something leas hurtful, or something along those lines?

And yet again, these Lincoln systems mess up. While giving advice to the author can avoid Great Grimsby mistakes, making the replacements automatically is an utterly Scunthorpe decision, with failures as Slough as they are foreseeable.

If the intent is to be unhurtful, it should need to choose a different word to if the intent is to be hurtful! Even our most sophisticated automated systems are Milton Keynes at determining that kind of thing.

1 more reply

peterashford3y ago

Yeah, that's pretty gross, tbh

blahedo3y ago

Thanks for pointing this out---when I read the article I tripped on that word, thought it odd and not sure what the author was trying to say, and moved on, but now that you call it out it seems very obviously to be used in just the same way that a lot of people used to use the r-slur (and some still do).

nimish3y ago

It's still apt, even as someone ostensibly in that category.

It does require some abstract thinking to comprehend sets of zero measure, negative measure or complex measure in mathematics. A "zero length object" is also encountered pretty often in practice:http://docs.autodesk.com/CIV3D/2013/ENU/index.html?url=files... and zero-length files come to mind.

The euphemism ends up working out fine, though likely not the author's intent.

__s3y ago· 5 in thread

tl;dr `realloc(p, 0)` is slated to be undefined behavior in C23, whereas it's been somewhat implementation defined until now, with recommendation being realloc(p, 0) is equivalent to free(p)

Seems a bit tone deaf to create new undefined behavior in memory handling, especially when a sane default behavior seems to be de facto

I've used that free-on-0 behavior myself. Unfortunately the code that uses this will often have 0 be a length variable, so hard to grep for this. Ideally musl/glibc will both stick to that undefined behavior being free & gcc/clang won't go about making this something to point their optimizations at

Lest we have to stop using realloc outside of a safe_realloc wrapper

  static void *safe_realloc(void *p, size_t newlen)
  {
    if (newlen == 0) { free(p); return NULL; }
    return realloc(p, newlen);
  }

What got this whole thing weird is that C doesn't like zero sized objects, but implementations were allowed to return a unique pointer for a zero sized allocation. Which then raises the matter that being portable there require freeing that reserved chunk for non-free implementations. In theory this reservation code could be more efficient when code frequently reallocates between 0 & some small value. & there was uncertainty because NULL is a way to say allocation failure, but then if one did a NULL check on realloc's return value they also had to check that the size was non-zero

wahern3y ago

> Seems a bit tone deaf to create new undefined behavior in memory handling,

It's only tone deaf to people who understand "undefined behavior" as an epithet or as synonymous with giving a license to compilers to screw you over. The term doesn't have either of those meaning to those on the C committee. In fact, one of the explicit rationales for the proposal is that, "Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please." https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf

> especially when a sane default behavior seems to be de facto

The above proposal, N2464, gives the behavior for AIX, zOS, BSD (unspecified), MSVC (crt unspecified), and glibc. They each have different behaviors.

Why they chose to finally make it undefined (it was marked as obsolescent for a long time) rather than keep it as implementation-defined, I don't know. Perhaps because it 1) simplifies the standard, and 2) by making it undefined it suggests compilers should start warning about it--despite all this time neither has there arisen a consensus among implementations about the best behavior, nor are programmers aware that the behavior actually varies widely.

EDIT: The draft SUSv5/POSIX-202x standard has indeed directly addressed this issue. See, e.g., https://www.austingroupbugs.net/view.php?id=374 The most recent draft included the following addition to RETURN VALUE:

  OB     If size is 0,
  OB CX  or either nelem or elsize is 0,
  OB     either:

  OB     * A null pointer shall be returned
  OB CX    and, if ptr is not a null pointer, errno shall be set to [EINVAL].

  OB     * A pointer to the allocated space shall be returned, and the memory object pointed to by ptr
           shall be freed. The application shall ensure that the pointer is not used to access an object.

CX marks points of divergence with C17. The first CX is because of the addition of reallocarray, absent from C17. The second is because POSIX will mandate the setting of EINVAL if NULL is returned.

peppermint_gum3y ago

>It's only tone deaf to people who understand "undefined behavior" as an epithet or as synonymous with giving a license to compilers to screw you over. The term doesn't have either of those meaning to those on the C committee.

It's unfortunate but not surprising that the C committee isn't aware of the problems with the undefined behavior.

In fact, after I started reading WG14 meetings minutes, I completely lost faith that any of the serious problems with the standard will ever get fixed.

1 more reply

adgjlsfhk13y ago

> It's only tone deaf to people who understand "undefined behavior" as an epithet or as synonymous with giving a license to compilers to screw you over.

Unfortunately, this is the correct understanding of UB.

JoshTriplett3y ago

realloc to 0 size being free is useful in particular because it means a function pointer to realloc is a complete memory allocator: call realloc with pointer NULL to get malloc, and call realloc with size 0 to get free.

moremetadata3y ago

> What got this whole thing weird is that C doesn't like zero sized objects, but implementations were allowed to return a unique pointer for a zero sized allocation.

Some of the windows API's work like this, so how much is pressure from MS?

Same discussion from 7 months ago.

https://news.ycombinator.com/item?id=32352965

https://thephd.dev/c23-is-coming-here-is-what-is-on-the-menu...

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2897.htm

Pattern matching ram for variables/objects whilst they exist even if zero'ed or prefilled with a value doesnt give perfect security. Random values would make it harder to work out the variable/object.

Dwedit3y ago· 4 in thread

Did we ever legalize type punning?

JonChesterfield3y ago

We have "pointer provenance" which allows license to track type punning across more of your program than ever before in order to delete more parts of it with no diagnostic required.

For bonus marks, int and atomic_int are unrelated types, and simd vector types aren't a thing, so enjoy the unfixable performance cost of choosing C.

cryptonector3y ago

cry

sob

But this will speed the transition to Rust.

cryptonector3y ago

Asking the real questions. Without looking I'm willing to bet the answer is "no, and stop asking".

kzrdude3y ago

Through union yes, I think

solidsnack90003y ago· 3 in thread

"Looking forward, marijuana legalization will surely beget notions such as fractional-, imaginary-, and negative-length objects, each with as much potential for mayhem as zero-length objects."

It's a funny thing to say.

firstlink3y ago

Rust seems to do fine with ZSTs somehow.

kibwen3y ago

ZSTs work splendidly in Safe Rust, but you do need to consider them if you're writing unsafe generic code. Here's the relevant section of the Rustonomicon: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-siz... .

garbagecoder3y ago

>negative-length

nervous Minkowski laughter

eternalban3y ago· 3 in thread

C is a very large language masquerading as a small language.

pjmlp3y ago

Only because many keep worshiping K&R C, ignoring what is the actual C that modern compilers support.

MichaelZuo3y ago

What does that make C++?

eternalban3y ago

https://upload.wikimedia.org/wikipedia/commons/a/a7/Frankens...

(don't get me wrong. love C. but in an innocent sort of way, like a teenager quite unaware of betrayals, heartbreak, love triangles, or UB, UsB, and IDB..)

juunpp3y ago· 2 in thread

> The ckd_* macros steer a refreshingly sane path around arithmetic pitfalls including C's "usual arithmetic conversions."

A 7 letter function to add two numbers and that returns a boolean... not entirely sure I'd call that 'sane'.

ludocode3y ago

I'd prefer if it were more letters. It bothers me when API designers omit random letters just to save a few keystrokes. These are particularly egregious because I keep forgetting which letters they kept. Is it "chk"? or "ckd"? or "chd"? or something else?

I wrote a portability library that wraps these with compiler intrinsic and standard C fallbacks. I chose to spell out the full word in addition to making the type explicit. It's a lot more verbose of course but a lot clearer to read:

https://github.com/ludocode/ghost/blob/develop/include/ghost...

goatlover3y ago

A saner language would handle the conversion for you so it would work with just the normal math operators.

1 more reply

GianFabien3y ago· 2 in thread

Maybe I'm being dense. To me it appears that the standards are telling compiler writers what should be done. In doing so the compilers will become ever more complex and thus bug-prone.

I learnt C back when K&R (first edition) was the reference. Ok, it was hardly much more than a universal assembler to make every computer look like a PDP-11. In my experience C is the language to use when you want to be close to the metal. For the rest I use which ever high-level language/environment is best suited. Admittedly some FFI are a pain to use, but once you get the boilerplate bedded down your much higher level language gets the coordination done.

RobotToaster3y ago

>To me it appears that the standards are telling compiler writers what should be done.

Isn't that what standards are supposed to do?

JonChesterfield3y ago

Traditionally they recorded existing practice and gently encouraged diverging implementations to converge.

The alternative approach is to invent things by committee, hopefully with some implementers watching, and hope for the best.

a-bit-of-code3y ago· 2 in thread

Is it just me that thinks that the article is a [skilfully drafted] joke (or parody or whatever the correct word is)? The fact that it has been published close to April 1st raises more suspicions.

brxaf3y ago

I thought the same initially, but the realloc() parts are definitely true.

still_grokking3y ago

My interpretation would be rather that the C language is a carefully drafted joke or parody.

pjmlp3y ago· 2 in thread

And zero focus on improving the root causes of memory corruption due to strings and array indexing errors.

The security world will keep burning it seems.

heywhatupboys3y ago

> The security world will keep burning it seems.

There is no alternative to network protocols and IPC that the stringtypes C has. You get a length and a byte array. If you trust the user, you can assume length is correct. Otherwise no.

pjmlp3y ago

Sure there are, as proven by distributed networking stacks not written in C.

In fact Ethernet early days goes back to Mesa not C.

UNIX did not invent networking, networking predates UNIX for at least a decade.

1 more reply

GuB-423y ago· 1 in thread

> C178 purports to be a bug-fix revision of C11. Does the word "toto" on page 1 indicate (a) the editor's musical tastes; (b) that nobody bothered to spell-check the document; (c) that we're not in Kansas anymore; or (d) none of the above?

As a french guy I'd go with (d).

I've often seen "toto" used as a placeholder name, sometimes followed by "titi", "tata", "tutu", I have even used it myself. It is similar to "foo", "bar", "baz". I don't know if it is specific to France, of French speaking countries, but it is definitely a thing here.

rahen3y ago

Most likely toto as the French for foobar.

Jens Gustedt is part of the C comity and participated to C23. He also works for INRIA in France: https://en.wikipedia.org/wiki/French_Institute_for_Research_...

antiquark3y ago· 1 in thread

C reached its zenith in C90, and saw a few good ideas in C99. Everything since has been wankery from people who either are bored, or have a severe case of C++-envy.

pjmlp3y ago

Even then it was already outdated when compared against languages like Modula-2 and Object Pascal, it got lucky to ride into the waves of UNIX adoption.

layer83y ago· 1 in thread

While the situation with realloc() is unfortunate, it is also not difficult to write a wrapper that does what the author wants. I’ve done that before, because it has long been known that not all realloc() implementations conform to the (prior) C standard. One can furthermore assume that existing implementations won’t change their behavior just because C23 made it UB.

p0nce3y ago

Honestly I'm happy the C standard now address how realloc behaves in detail. It was already hard before, and now it's documented.

blippage3y ago· 1 in thread

#embed is what I really want. And separators.

> Standard C advances slowly

They're not joking, either. C is conservative to a fault, I think.

AlbertoGP3y ago

> #embed is what I really want. And separators.

If you want to try out those features now, I made a pre-processor that translates that into standard C99:

https://sentido-labs.com/en/library/cedro/202106171400/use-e...

https://sentido-labs.com/en/library/cedro/202106171400/#numb...

It includes a cc wrapper called cedrocc that you can use as a drop-in replacement:

https://sentido-labs.com/en/library/cedro/202106171400/#cedr...

kgbcia3y ago· 1 in thread

I just need built-in string handling

cryptonector3y ago

> No -the C committee

JonChesterfield3y ago

Author is angry but not wrong. Lifting the most damning quote from the article as I haven't seen it for a while.

C inventor Dennis Ritchie pointed to several flaws in [ANSI C] ... which he said is a licence for the compiler to undertake agressive opimisations that are completely legal by the committee's rules, but make hash of apparently safe programs; the confused attempt to improve optimisation ... spoils the language.

—Dennis Ritchie on the first C standard

andrepd3y ago

> All C standards from C89 onward have permitted compilers to delete code paths containing undefined operations—which compilers merrily do, much to the surprise and outrage of coders.16 C23 introduces a new mechanism for astonishing elision: By marking a code path with the new unreachable annotation,12 the programmer assures the compiler that control will never reach it and thereby explicitly invites the compiler to elide the marked path.

I don't agree with this in the slightest. I'm not "outraged" by undefined behaviour, it's a fundamental tool for writing performant code. Ensuring that dereferencing a null pointer or accessing outside the bounds of an array is undefined behaviour is what lets the compiler not emit a branch on every array access and pointer dereference.

Furthermore, I really don't understand the outrage that there is another explicit tool to achieve behaviour the author may or may not consider harmful. If it's an explicit macro, it's not a tarpit!

quintussss3y ago

I always wonder how much these new C standards use, as C is now mostly used in areas where one is severely limited when it comes to compiler choice. Where I work, we use GCC 6.2 and iso9899:1990 (C90). If we were able to use a modern compiler, we would probably just use C++.

otabdeveloper43y ago

Hopefully not literally. (But C23 is exactly the kind of programming language you expect to do that.)

cryptonector3y ago

The `realloc()` change calls for pitchforks.

j / k navigate · click thread line to collapse

264 comments

120 comments · 26 top-level

Dylan168073y ago· 15 in thread

I don't agree with that description at all. Here's the code:

  1 if (argc <= 2)
  2   unreachable();
  3 else
  4   return printf("%s: we see %s", argv[0], argv[1]);
  5 return puts("this should never be reached");

The only code path that's "entirely different" is lines 1,4,5 and in that case of course you remove a return that's after a return.

And the other valid code path is 1,2,5, which has `puts` after `unreachable`.

To need `puts` you have to imagine a code path that gets past the "if" without taking either branch?

Maybe the author means something by "code path" that's very different from how I interpret it?

I would be pretty surprised if the above code means something different from:

  if (argc <= 2) {
    unreachable();
    return puts("this should never be reached");
  } else {
    return printf("%s: we see %s", argv[0], argv[1]);
    return puts("this should never be reached");
  }

ternaryoperator3y ago

codeflo3y ago

1 more reply

hyperhopper3y ago

This is not true at all:

I've been many loops that turn into no-ops because all the functionality has been refactored out but this fact is hidden in function calls.

Sure, this should ideally be surfaced as a lint error, not a compiler optimization, but you cannot say that intentional delays are the "only" reason.

Also since processing time is variable, using that as a method should be extremely heavily discouraged/warned/require-opt-in

viraptor3y ago

Those delay loops are common on microcontrollers and the usual solution is to either make the counter volatile or insert something opaque to the compiler in the loop body.

It would be of course nice if a warning was produced for that specific case: This whole loop was removed - is it really what you wanted, or is it a broken delay loop?

kzrdude3y ago

I think it's a practical example of how the C language has made a journey to being more high abstraction than it used to be, in practice. And how that unsettles those used to the old behaviour.

wahern3y ago

I think the point is that if the `argc <= 2` path is unreachable, then that means argc is always greater than 2, permitting the compiler to optimize the entire block to just:

  return printf("%s: we see %s", argv[0], argv[1]);

dtolnay3y ago

I interpreted the author's characterization to be about something like:

  1  if (argc <= 2)
  2    puts("A");
  3  puts("B");
  4  if (argc <= 2)
  5    unreachable();
  6  else
  7    return puts("C");
  8  return puts("D");

in which not just lines 4-6,8 go away (as you said) but also lines 1-2.

2 more replies

alwaysbeconsing3y ago

    if (argc <= 2)
        do_something();
    else
        return printf("%s: we see %s", argv[0], argv[1]);

So the `return printf` is executed when `argc` is greater than 2. If we remove just the body of the first branch:

    if (argc <= 2)
        ;
    else
        return printf("%s: we see %s", argv[0], argv[1]);

the same thing holds. And additionally when `argc <= 2`, control will move past the `if`.

Under this view, if the `unreachable` won't cause the entire removal of the `if`, the compiler will produce the equivalent of:

    if (argc > 2)
        return printf("%s: we see %s", argv[0], argv[1]);

    return puts("this should never be reached")

Again, I don't say this is the correct interpretation, but it is one possibility, that would have to be ruled out by other parts of the standard.

Dylan168073y ago

Asooka3y ago

josephcsible3y ago

lldb3y ago

cryptonector3y ago

The `realloc()` change though...

_8j503y ago

Shouldn't the compiler warn or error on unreachable code?

codeflo3y ago

1 more reply

i-use-nixos-btw3y ago· 12 in thread

This is written with quite a lot of hyperbole.

Arch-TK3y ago

I agree that realloc was poorly defined for the 0 size case, I think UB or IDB both would have worked in this case to really drive that point home, the WG chose UB.

coliveira3y ago

> this kind of mislead assumption is one of the major sources of bugs in C code.

3 more replies

cryptonector3y ago

astrange3y ago

> Except for the situation where your implementation explicitly specifies that it provides a predictable behaviour for a specific case of UB, there's literally no guarantee of what will happen.

That situation is "when you have UBSan turned on".

1 more reply

c4mpute3y ago

Wrong, Wrong, Wrong.

chongli3y ago

And UB has always been an excuse for compilers to screw over programmers in hideous ways

Many of these optimizations would be impossible if compilers were forced to assume the opposite: that UB will occur whenever possible.

1 more reply

AlotOfReading3y ago

Since approximately every nontrivial program ever written has UB, in actual practice we're only saved by the fact that compilers aren't entirely maliciously compliant.

2 more replies

jcranmer3y ago

The case of realloc being declared UB (as opposed to impl-defined) was not driven by the compiler writers but by the people who write the C libraries.

GuB-423y ago

UB can initiate the rise of zombie velociraptors.

  int n;
  printf("type 0 to stop the rise of zombie velociraptors");
  scanf("%d", &n);
  realloc(pre, n);
  if (n != 0) rise_zombie_velociraptors()

May result in velociraptors raising even if the user enters "0".

The reason is that because realloc(pre, 0) is UB, for the compiler, it cannot happen, so n can't be 0, so the n != 0 test can be optimized out, so, velociraptors.

Asooka3y ago

In any case it's not that hard to just write a sane wrapper. This one is placed in the Public Domain:

    void *sane_realloc(void *ptr, size_t sz)
    {
        if (sz == 0) {
            free(ptr); /*free(NULL) is no-op*/
            return NULL;
        }
        if (ptr == NULL) {
            return malloc(sz);
        }
        return realloc(ptr, sz);
    }

omoikane3y ago

> This is written with quite a lot of hyperbole

benj1113y ago

So the feature wasn't broken to begin with, it was broken by another feature.

ChancyChance3y ago· 9 in thread

Is the world finally realizing that "a + b" actually returns two values: pass/fail and the value if pass?

"a + b = c;" is a fundamentally flawed operation from a computer architecture perspective.

c4mpute3y ago

First, you might have meant c = a+b;

The other way isn't really definable as an assignment mathematically.

khazhoux3y ago

> First, you might have meant c = a+b;

> The other way isn't really definable as an assignment mathematically.

This correction is condescending and unnecessary. Unless the person had never written a single line of code in their life, then they would obviously know "a+b" is not a modifiable lvalue.

From HN guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

1 more reply

JonChesterfield3y ago

> The other way isn't really definable as an assignment mathematically.

It's an equality sign. See also, := and unification.

Arch-TK3y ago

There is actually another option.

A more sophisticated type system.

Let's say you had some pseudocode like this:

    let a = 5
    let b = 12
    let c = a + b

The type of a would be Integer[5..5], the type of b would be Integer[12..12], the type of c would therefore be Integer[17..17]. In a more complex example:

    def foo(a: Integer[0..10], b: Integer[0..10]):
        return a + b

The return type of this function would be Integer[0..20].

This kind of type system can solve a number of issues, all but division by zero (which would probably still have to be solved with some kind of optional type).

Hopefully this is some food for thought.

im3w1l3y ago

2 more replies

still_grokking3y ago

> I believe this sort of thing has a name, […]

https://en.wikipedia.org/wiki/Refinement_type

Meanwhile in mainstream ML-land:

https://github.com/Iltotore/iron

(Or for the older version of the language: https://github.com/fthomas/refined)

2 more replies

JonChesterfield3y ago

Compilers do this sort of range tracking anyway. At least within a function. It's useful for loop optimisations.

notfed3y ago

It's a flaw that has a pretty good tradeoff: unparalleled readability.

ChancyChance3y ago

It depends. If you want to study maths, yes. If you want to be a programmer:

[status, value] = add(a, b);

Is much more unparalleled-ly (?) readable from the perspective of how a computer actually operates. In reality, this:

uint c = (uint)a + (uint)b; // (to make that other guy happy)

is really:

c = (a + b) % (sizeof(uint));

in "C", which is less readable but far more accurate.

1 more reply

GuB-423y ago· 6 in thread

I actually like unreachable() a lot. What it does is that it invokes undefined behavior, that's all.

It does nothing trickier than any other kind of UB. In fact, I could implement unreachable() like this: void unreachable() { (char *)0 = 1; }.

lionkor3y ago

Respectfully, you would already be doing this in any C codebase, with `assert()`, right? We are all checking our preconditions with assert... right?

GuB-423y ago

AFAIK, assert() is not undefined behavior, so it can't be used for optimization. It is either implementation-defined in debug mode, or does nothing in release mode.

For example:

  assert(a >= 0);
  if (a < 0) printf("a is negative");

In release mode, assert() will be gone, so the if/printf() will stay. If we used "if (a < 0) unreachable();" instead of assert(), it would optimize away both lines.

pornel3y ago

NDEBUG makes these checks disappear, so that's not an option for checks that are supposed to stay in the program.

lprib3y ago

GuB-423y ago

Yep, undefined behavior is unsafe, C in general is unsafe. There are plenty of languages that are safe, though a little bit rusty like ADA, use one of these if you want safe code.

ptx3y ago

> What it does is that it invokes undefined behavior, that's all. [...] it can also be used in debug builds to trigger a crash

1 more reply

firstlink3y ago· 6 in thread

> and that such changes may impose themselves on old code without recompilation when dynamically linked libraries are upgraded.

All I can do is laugh. This is what the dynamic linker fanatics wanted. This is what they explicitly advocate for to this day. Share and enjoy!!

AshamedCaptain3y ago

I really don't think anyone could possibly want the _specified behavior_ of a function changing below their feet.

The behavior is practically the same as with static linking, and you still get the benefits of dynamic linking.

throwaway8922383y ago

People who don't understand dynamic linking are doomed to re-implement it, poorly.

coliveira3y ago

hermitdev3y ago

tedunangst3y ago

bayindirh3y ago

If I have something that critical, I can always statically compile.

RustyRussell3y ago· 6 in thread

Frankly, the C standards ctte went off the deep end when they effectively banned NULL to memset etc (obv with zero length).

Not because these functions couldn't handle it, but because this assertion simplifies optimizations elsewhere.

This has required adding extra checks in my code, found mainly by trial and error, and has made it less readable and less optimal.

Finally, the checked arithmetic operations returning false on success is a horror show. Fortunately it will be found on the first time the code is run, but that's a damnably low bar :(

ericpauley3y ago

> Finally, the checked arithmetic operations returning false on success is a horror show.

This seems in line with C conventions? Generally a 0 return code means success.

wruza3y ago

With int statuses, not with bools. It’s just a twisted logic in return value you have to deal with in your head.

“If checked operation has a status, then it failed.” - ok

“If checked operation [is true], then it failed.” - wat

2 more replies

Kamq3y ago

> Finally, the checked arithmetic operations returning false on success

That's what got you? C functions returning error flags (with zero meaning no error) isn't exactly new.

Dwedit3y ago

Replace memset with a macro, that's the C way.

notfed3y ago

Isn't the return value just a carry bit?

spc4763y ago

Not every CPU C runs on has a carry bit. MIPS, SPARC, RISC-V, all don't have the concept of a "carry bit."

tzs3y ago· 6 in thread

> Pointers to free'd memory are akin to uninitialized pointers, so free(p) followed by if (p==q) is an instrument of arson

What's the reason for this?

jcranmer3y ago

Given the following code:

    void *p = malloc(N);
    do_random_stuff(p);
    void *q = malloc(N);

tedunangst3y ago

coliveira3y ago

xigoi3y ago

Obviously dereferencing a freed pointer is incorrect behavior, but what harm is there in using its numerical value?

Dylan168073y ago

I can't tell you exactly why but it's consistent with just about everything else involving p being undefined, and the result of the comparison would be useless anyway.

tzs3y ago

Why would the comparison be useless?

1 more reply

MatmaRex3y ago· 6 in thread

> As C89 was taking shape, the neurodivergent notion of a "zero-length object" was making the rounds

I'm surprised that the authors decided to, and were able to, slip in this little euphemism.

bee_rider3y ago

I haven’t seen widespread use of the word “neurodivergent” as a kind of… whatever this is, weirdly euphemistic slur, almost?

chongli3y ago

I haven’t seen widespread use of the word “neurodivergent” as a kind of… whatever this is, weirdly euphemistic slur, almost?

It’s a continuation of the euphemism treadmill [1]. It won’t be long before “neurodivergent” is considered politically incorrect and a new term is invented to replace it.

[1] https://www.urbandictionary.com/define.php?term=Euphemism%20...

wizzwizz43y ago

1 more reply

peterashford3y ago

Yeah, that's pretty gross, tbh

blahedo3y ago

nimish3y ago

It's still apt, even as someone ostensibly in that category.

The euphemism ends up working out fine, though likely not the author's intent.

__s3y ago· 5 in thread

tl;dr `realloc(p, 0)` is slated to be undefined behavior in C23, whereas it's been somewhat implementation defined until now, with recommendation being realloc(p, 0) is equivalent to free(p)

Seems a bit tone deaf to create new undefined behavior in memory handling, especially when a sane default behavior seems to be de facto

Lest we have to stop using realloc outside of a safe_realloc wrapper

  static void *safe_realloc(void *p, size_t newlen)
  {
    if (newlen == 0) { free(p); return NULL; }
    return realloc(p, newlen);
  }

wahern3y ago

> Seems a bit tone deaf to create new undefined behavior in memory handling,

> especially when a sane default behavior seems to be de facto

The above proposal, N2464, gives the behavior for AIX, zOS, BSD (unspecified), MSVC (crt unspecified), and glibc. They each have different behaviors.

  OB     If size is 0,
  OB CX  or either nelem or elsize is 0,
  OB     either:

  OB     * A null pointer shall be returned
  OB CX    and, if ptr is not a null pointer, errno shall be set to [EINVAL].

  OB     * A pointer to the allocated space shall be returned, and the memory object pointed to by ptr
           shall be freed. The application shall ensure that the pointer is not used to access an object.

CX marks points of divergence with C17. The first CX is because of the addition of reallocarray, absent from C17. The second is because POSIX will mandate the setting of EINVAL if NULL is returned.

peppermint_gum3y ago

It's unfortunate but not surprising that the C committee isn't aware of the problems with the undefined behavior.

In fact, after I started reading WG14 meetings minutes, I completely lost faith that any of the serious problems with the standard will ever get fixed.

1 more reply

adgjlsfhk13y ago

> It's only tone deaf to people who understand "undefined behavior" as an epithet or as synonymous with giving a license to compilers to screw you over.

Unfortunately, this is the correct understanding of UB.

JoshTriplett3y ago

moremetadata3y ago

> What got this whole thing weird is that C doesn't like zero sized objects, but implementations were allowed to return a unique pointer for a zero sized allocation.

Some of the windows API's work like this, so how much is pressure from MS?

Same discussion from 7 months ago.

https://news.ycombinator.com/item?id=32352965

https://thephd.dev/c23-is-coming-here-is-what-is-on-the-menu...

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2897.htm

Dwedit3y ago· 4 in thread

Did we ever legalize type punning?

JonChesterfield3y ago

We have "pointer provenance" which allows license to track type punning across more of your program than ever before in order to delete more parts of it with no diagnostic required.

For bonus marks, int and atomic_int are unrelated types, and simd vector types aren't a thing, so enjoy the unfixable performance cost of choosing C.

cryptonector3y ago

cry

sob

But this will speed the transition to Rust.

cryptonector3y ago

Asking the real questions. Without looking I'm willing to bet the answer is "no, and stop asking".

kzrdude3y ago

Through union yes, I think

solidsnack90003y ago· 3 in thread

"Looking forward, marijuana legalization will surely beget notions such as fractional-, imaginary-, and negative-length objects, each with as much potential for mayhem as zero-length objects."

It's a funny thing to say.

firstlink3y ago

Rust seems to do fine with ZSTs somehow.

kibwen3y ago

garbagecoder3y ago

>negative-length

nervous Minkowski laughter

eternalban3y ago· 3 in thread

C is a very large language masquerading as a small language.

pjmlp3y ago

Only because many keep worshiping K&R C, ignoring what is the actual C that modern compilers support.

MichaelZuo3y ago

What does that make C++?

eternalban3y ago

https://upload.wikimedia.org/wikipedia/commons/a/a7/Frankens...

(don't get me wrong. love C. but in an innocent sort of way, like a teenager quite unaware of betrayals, heartbreak, love triangles, or UB, UsB, and IDB..)

juunpp3y ago· 2 in thread

> The ckd_* macros steer a refreshingly sane path around arithmetic pitfalls including C's "usual arithmetic conversions."

A 7 letter function to add two numbers and that returns a boolean... not entirely sure I'd call that 'sane'.

ludocode3y ago

https://github.com/ludocode/ghost/blob/develop/include/ghost...

goatlover3y ago

A saner language would handle the conversion for you so it would work with just the normal math operators.

1 more reply

GianFabien3y ago· 2 in thread

Maybe I'm being dense. To me it appears that the standards are telling compiler writers what should be done. In doing so the compilers will become ever more complex and thus bug-prone.

RobotToaster3y ago

>To me it appears that the standards are telling compiler writers what should be done.

Isn't that what standards are supposed to do?

JonChesterfield3y ago

Traditionally they recorded existing practice and gently encouraged diverging implementations to converge.

The alternative approach is to invent things by committee, hopefully with some implementers watching, and hope for the best.

a-bit-of-code3y ago· 2 in thread

Is it just me that thinks that the article is a [skilfully drafted] joke (or parody or whatever the correct word is)? The fact that it has been published close to April 1st raises more suspicions.

brxaf3y ago

I thought the same initially, but the realloc() parts are definitely true.

still_grokking3y ago

My interpretation would be rather that the C language is a carefully drafted joke or parody.

pjmlp3y ago· 2 in thread

And zero focus on improving the root causes of memory corruption due to strings and array indexing errors.

The security world will keep burning it seems.

heywhatupboys3y ago

> The security world will keep burning it seems.

There is no alternative to network protocols and IPC that the stringtypes C has. You get a length and a byte array. If you trust the user, you can assume length is correct. Otherwise no.

pjmlp3y ago

Sure there are, as proven by distributed networking stacks not written in C.

In fact Ethernet early days goes back to Mesa not C.

UNIX did not invent networking, networking predates UNIX for at least a decade.

1 more reply

GuB-423y ago· 1 in thread

As a french guy I'd go with (d).

rahen3y ago

Most likely toto as the French for foobar.

Jens Gustedt is part of the C comity and participated to C23. He also works for INRIA in France: https://en.wikipedia.org/wiki/French_Institute_for_Research_...

antiquark3y ago· 1 in thread

C reached its zenith in C90, and saw a few good ideas in C99. Everything since has been wankery from people who either are bored, or have a severe case of C++-envy.

pjmlp3y ago

Even then it was already outdated when compared against languages like Modula-2 and Object Pascal, it got lucky to ride into the waves of UNIX adoption.

layer83y ago· 1 in thread

p0nce3y ago

Honestly I'm happy the C standard now address how realloc behaves in detail. It was already hard before, and now it's documented.

blippage3y ago· 1 in thread

#embed is what I really want. And separators.

> Standard C advances slowly

They're not joking, either. C is conservative to a fault, I think.

AlbertoGP3y ago

> #embed is what I really want. And separators.

If you want to try out those features now, I made a pre-processor that translates that into standard C99:

https://sentido-labs.com/en/library/cedro/202106171400/use-e...

https://sentido-labs.com/en/library/cedro/202106171400/#numb...

It includes a cc wrapper called cedrocc that you can use as a drop-in replacement:

https://sentido-labs.com/en/library/cedro/202106171400/#cedr...

kgbcia3y ago· 1 in thread

I just need built-in string handling

cryptonector3y ago

> No -the C committee

JonChesterfield3y ago

Author is angry but not wrong. Lifting the most damning quote from the article as I haven't seen it for a while.

—Dennis Ritchie on the first C standard

andrepd3y ago

Furthermore, I really don't understand the outrage that there is another explicit tool to achieve behaviour the author may or may not consider harmful. If it's an explicit macro, it's not a tarpit!

quintussss3y ago

otabdeveloper43y ago

Hopefully not literally. (But C23 is exactly the kind of programming language you expect to do that.)

cryptonector3y ago

The `realloc()` change calls for pitchforks.

j / k navigate · click thread line to collapse