A Special Kind of Hell: intmax_t in C and C++ (2020)

(thephd.dev)

32 points · toastedwedge · 3y ago · 31 comments

21 comments · 5 top-level

RcouF1uZ4gsC · 3y ago · 9 in thread

> the vast majority of the shared ecosystem depends on shared libraries/dynamically linked libraries for the standard.

The more I use C and C++, the more I am convinced that shared libraries are the biggest technical debt for the whole ecosystem. It is these shared libraries that are the driving impetus behind “ABI stability”. It is because of shared libraries that we can’t have nice performance- or safety-enhancing features.

Right now the C and C++ ecosystem is groaning under the weight of shared library technical debt.

Gibbon1 · 3y ago

My dog in this hunt: I don't care about ABI stability because everything is statically linked in my world. And when I see this stuff, my sour thought is: why aren't they versioning their ABIs?

I remember a friend who, back in the dark ages, worked on a mixed Pascal and C codebase, and they had a tool that generated an adapter layer between the two. All it usually did was flip stuff around on the stack before calling the routine, and then clean things up before returning.

pjmlp · 3y ago

Those shared libraries are a way to create plugins for commercial software.

Surely one can use OS IPC instead, as we did in the days when static linking ruled the compiler world, but then don't complain about higher resource usage when every single plugin is its own process.

UncleMeat · 3y ago

It used to be the case that resource limitations meant that shared libraries were meaningfully more efficient. I know that some software communities like to complain about modern software bloat but statically linking everything is so incredibly valuable across so many different dimensions that it is absolutely worth the cost. Heck, modern link-time optimization probably means that applications run faster despite the code-size bloat.

Shared libraries not only require this absolutely crippling adherence to the ABI, they are also a security mess, since you need updates from both your software provider and your operating system vendor to ensure that anything is safe.

RcouF1uZ4gsC · 3y ago

> Those shared libraries are a way to create plugins for commercial software.

That is an entirely reasonable use, and one that can have specific types and ABI guarantees for the commercial software and the plug-in. In addition, this would only be required if you were the developer of the commercial software or of a plug-in. COM is one example of this: it had specific types and specific rules for memory allocation/deallocation.

However, dynamically linking the C standard library forces everyone to pay the price for ABI stability.

krastanov · 3y ago

Wasn't dynamic linking crucial 20 years ago, when drive space and bandwidth were expensive?

dralley · 3y ago

It's still pretty important today for things like security-critical libraries, libc, plugin systems, etc.

However it is probably overused relative to the situations where it is genuinely useful.

gpderetta · 3y ago

How would you link against, say, a hardware-specific libopengl.so without shared libraries?

intelVISA · 3y ago

I haven't used a shared lib in many years, thankfully. Outside of OpenSSL, most libs aren't too painful to link statically.

mburee · 3y ago

And glibc, but most people interested in static linking are using musl/dietlibc/uClibc anyway.

bsder · 3y ago · 4 in thread

The problem isn't "intmax_t". The problem is "int".

If you have an ABI, you need to put an explicit size and signedness on every parameter and return value. Period. No excuses.

No "int". No "unsigned int". If I'm being really pedantic, don't even use "char".

It should be "int32_t", "uint32_t", and "uint8_t".

Every time I see objections, it's always someone who wants to use some weird 16-bit architecture. The problem is that those libraries probably won't work anyhow since nobody tests their libraries on anything other than x86 and maybe Arm. If your "int" is 16 bits, you're likely to have a broken library anyway.

CodesInChaos · 3y ago

There are a few variable size types that make sense in an ABI. Like `size_t` and `(u)intptr_t`.

> If I'm being really pedantic, don't even use "char".

Careful about that one. I believe `uint8_t` isn't required to be a character type, which has implications for type based aliasing.

bsder · 3y ago

> Careful about that one. I believe `uint8_t` isn't required to be a character type, which has implications for type based aliasing.

The fact that every single compiler typedefs "uint8_t" as "unsigned char", and yet that isn't guaranteed by the standard, is the kind of thing that just makes you want to cry.

I'm actually genuinely curious about this: people talk about the possibility of existence, but I haven't seen anybody point to an actual compiler that implements uint8_t as an extended numeric type rather than being an equivalent typedef to "unsigned char". Is there a compiler that really does this?

Gibbon1 · 3y ago

Personal opinion: 'int' ranks way above NULL as a bad idea.

bsder · 3y ago

"int" wasn't a bad idea "back in the day".

Different architectures had all manner of different "native" sizes. 8-bit micros were still relatively expensive even up through the mid-1980s. The 80286, for example, was 16-bit and went away only in the early 1990s. IBM AS/400s were still 48-bit through the mid-1990s. Apple was still dealing with non-32-bit-clean applications in the mid-1990s. Linux running on Alpha was still cleaning up x86-isms in the mid-1990s.

C99 finally introduced "stdint.h". By the mid-2000s, everything had converged to 32-bit (or 64-bit).

C11 should have deprecated "int" so that compilers could throw warnings on it. But, then, we still haven't removed K&R signatures from the C standards, so here we are.

gpderetta · 3y ago · 2 in thread

As a developer, I complain that passing unique_ptr is a couple of cycles slower than it needs to be because of the ABI, and I wish the committee/GCC were more aggressive about breaking the ABI.

As a user, I complain that I can't run Linux games from 20 years ago because GCC broke the libstdc++ ABI 15 years ago, and that Win32 is the only stable Unix ABI.

Luckily I'm not a compiler developer.

neeeeees · 3y ago

Out of curiosity, what field do you work in that a couple of cycles matters/is measurable?

gpderetta · 3y ago

I do work in a field where cycles matter (HFT), but in this specific case I was definitely hyperbolic :).

Const-me · 3y ago · 1 in thread

I think C# did it right in 2000. Regardless of CPU architecture, integer types like short, int, or ulong have fixed sizes of 16, 32, and 64 bits respectively.

There are a couple of special types with machine-dependent size, like IntPtr, but these are only used for opaque handles and C interop.

Macha · 3y ago

This is something it inherited from Java, and it is standard in newer languages like Rust and Go too.

kazinator · 3y ago

intmax_t should be kept out of stable ABI definitions, and out of APIs.

There has to be an ABI for it because we have to pin down what it means to pass an intmax_t as an argument to a function, how it is aligned on the stack if passed that way, and how it's placed into a structure and so on.

However, there could be a provision that the ABI treatment of intmax_t is not guaranteed; it is subject to change if intmax_t is redefined.

And, for that reason, it should be kept out of APIs.

That leaves APIs that deal specifically with intmax_t itself rather than using it to represent something. Those can use aliasing and versioning.

Say we had a function like this:

  struct intmax_quot_rem intmax_div(intmax_t, intmax_t);

Today intmax_t might be 64 bits, so the application should be compiled in such a way that the call to intmax_div goes to some __intmax_div_64, which looks like this:

  struct intmax_quot_rem __intmax_div_64(int64_t, int64_t);

Even when intmax_t changes to 128 bits, that compiled program continues to reference __intmax_div_64, which uses int64_t parameters and structure members. A newly compiled program calls __intmax_div_128.

A particular problem would be functions in the printf family. Say we have a conversion specifier which prints intmax_t which is 64 bits today. Here, the solution is even simpler. The "PRI" macros introduced in C99 provide it. Given an intmax_t value x, we print it like this:

  printf("x = %" PRIdMAX "\n", x);

So today that might expand to a conversion specifier identical to the one for PRId64. That compiled program will have it baked into its conversion string, so everything will continue to work the same even if the platform moves to a 128-bit intmax_t.

A newly compiled program on the 128-bit intmax_t platform will get a different PRIdMAX string from the header file, which expands to a conversion specifier matching int128_t.

Basically all the issues are solvable except the issue of some application code carelessly using intmax_t in its APIs without any plan for versioning.
