The new Clang _ExtInt feature provides exact bitwidth integer types

You can already do this in C though, even if it is EVER SO SLIGHTLY wasteful in non-practical terms.

If you need a 23bit object you just structure it to be that. It’s a couple of AND or SHIFT ops when accessing, but so what? Even for 100Gbit networking you aren’t going to max out even a slightly appropriate CPU.
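A minimal sketch of the mask-and-shift approach, assuming the 23-bit value is packed into a 32-bit word at a caller-chosen bit offset. The names `MASK23`, `get23` and `set23` are hypothetical and only for illustration.

```c
#include <stdint.h>

/* Mask covering the low 23 bits. */
#define MASK23 ((UINT32_C(1) << 23) - 1)

/* Read the 23-bit field stored at bit offset `shift` (0..9) in `word`. */
static inline uint32_t get23(uint32_t word, unsigned shift)
{
    return (word >> shift) & MASK23;
}

/* Return `word` with the 23-bit field at `shift` replaced by `value`.
   Bits of `value` above bit 22 are discarded. */
static inline uint32_t set23(uint32_t word, unsigned shift, uint32_t value)
{
    return (word & ~(MASK23 << shift)) | ((value & MASK23) << shift);
}
```

Each access costs one shift and one AND (plus an OR on write), which is the overhead the comment refers to.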


pjc506y ago

I was wondering that, and share your skepticism of autotranslation (it basically never works, and the only reason people like it is that the HDLs are stuck in the 80s).

But I think the "no automatic promotion or conversion" combined with "will error if combined with different width" could actually make extint(8) and extint(16) useful - it's a massive hint to autovectorisers and lets you generate the SIMD instructions for those widths.

Doubly so if they make sure never to write the words "undefined" where they mean "implementation-defined" for extint. At the moment normal arithmetic in C (x = x+1) is potentially undefined behaviour.

qppo6y ago

High level synthesis works perfectly fine, just not from C. HDLs have chugged along too, it's just the toolchains are ridiculously expensive and risky to change. That's why hardware tech stacks lag behind the state-of-the-art.

I share the skepticism of high level synthesis from C as being a bad motivation. The workflow is more like metaprogramming, and C is terrible at that.

jschwartzi6y ago

It provides another way to represent individual register fields without using bitfields. And probably gives you stronger guarantees about what happens when the bitfield overflows.

It also provides a way to pass those values around without passing the whole register struct around.

bubblethink6y ago

There's also the obvious compression use case. Assuming the rest of your code is sufficiently robust, you can shave off all the excess bits from your data storage. There may be a performance penalty, but you won't have to deal with low level ops or alignment issues. Most real world big data will exceed 32 bits (i.e., the identifiers will exceed 32 bits), but is nowhere close to 64 bits. The benefit is more meaningful if your data now fits in a cache/fast memory whereas it didn't before.

huit6y ago

It's not C to gates if you had to rewrite all your C to manually specify the bit widths of every single signal.

Ideally, for FPGA design you only have to use the special bitwidths for the interface of a module. The implementation can be in normal wider C types. The compiler can optimize these operations to smaller bitwidths by realizing the higher input bits are zero/signextend and higher output bits are not used. You can help the optimizer by making some variables smaller bitwidths, but no need to rewrite everything.

I implemented this once for a c-to-hardware compiler and it worked quite well. The compiler had a lot of builtin-types, all signed and unsigned integers from 1 to 64 bits wide, named __int1..int64. See 'extended integer types' in the manual: http://valhalla.altium.com/Learning-Guides/GU0122%20C-to-Har...

strenholme6y ago

Well, encryption research. RadioGatún (SHA-3’s direct predecessor), for example, allows the bit width to be any number between 1 and 64, so this will allow us to see how, say, 29-bit integers work with this algorithm.

Most cryptographic algorithms (notably RC5 and RC6, but also Rijndael/AES) can be extended into 128-bit word size variants, and having guaranteed support for 128-bit integers in C would be useful to see how these variants act, and to run programs that evaluate their security margin.

moonchild6y ago

They fixed autopromotion rules:

> if a Binary expression involves operands which are both _ExtInt, rather than promoting both operands to int the narrower operand will be promoted to match the size of the wider operand, and the result of the binary operation is the wider type.

steerablesafe6y ago

Although it's static size, it could be a building block for bignum arithmetic. The last time I tried compilers were pretty bad at optimizing generic code for bignum addition, although it's pretty easy to hand-optimize in assembly.
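For reference, a sketch of the kind of generic limb-by-limb addition meant here, assuming little-endian arrays of 64-bit limbs. The function name `bignum_add` is hypothetical, and this is the portable formulation rather than anything tuned to a particular compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-limb little-endian numbers: out = a + b.
   Returns the carry out of the most significant limb (0 or 1). */
static inline uint64_t bignum_add(uint64_t *out, const uint64_t *a,
                                  const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + carry;   /* may wrap to 0 when carry == 1 */
        uint64_t c1 = sum < carry;     /* carry produced by adding the carry-in */
        sum += b[i];
        uint64_t c2 = sum < b[i];      /* carry produced by adding b[i] */
        out[i] = sum;
        carry = c1 | c2;               /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

In assembly the loop body collapses to a single add-with-carry instruction per limb, which is what makes the hand-written version easy.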

steerablesafe6y ago

Generic code for getting the high 64 bits from unsigned integer multiplication of 64-bit values. Can be useful for fixed-point math, for example.
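Today this is usually written with the non-standard `unsigned __int128` extension that GCC and Clang provide on 64-bit targets. A minimal sketch under that assumption (the name `mulhi_u64` is hypothetical):

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit product a * b.
   Relies on the non-standard unsigned __int128 extension. */
static inline uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```

For fixed-point use, treating `b` as a fraction scaled by 2^64 makes `mulhi_u64(x, b)` the integer part of the product; for instance, with `b` set to 2^63 (one half) the result is `x / 2`.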

elgfare6y ago

I can see this being used in serialization protocols.

segfaultbuserr6y ago· 14 in thread

I think it's funny. C was originally invented in an era when machines didn't have a standard integer size and 36-bit architectures were in their heyday, so, to achieve portability, C integers (char, short, int, and long) only have a guaranteed minimum size that can be taken for granted, and nothing else. But after the computers of the world converged on multiple-of-8-bit integers, the inability to specify a particular integer size became an issue. As a result, in modern C programming the standard practice is to use uint8_t, uint16_t, uint32_t, etc., defined in <stdint.h>. C's inherent support for different integer sizes has basically been abandoned: no one needs it anymore, and it only creates confusion in practice, especially in the bitmasking and bitshifting world of low-level programming. Now, if N-bit integers are introduced to C, it's kind of a negation of the negation, and we complete a full cycle: the ability to work on non-multiple-of-8-bit integers will come back (although the original integer-size independence and portability will not).

phoe-krk6y ago

Common Lisp programmer here.

While contemporary implementations are most commonly tailored to use (UNSIGNED-BYTE 8), (UNSIGNED-BYTE 16), (UNSIGNED-BYTE 32), and (UNSIGNED-BYTE 64) along with their signed counterparts, our language allows one to freely specify and use integer types such as (UNSIGNED-BYTE 53) that could - in theory - be optimized for on architectures that use unique, by today's standards, word sizes.

This also comes from the fact that Common Lisp was specified during times that had no real standardized word sizes, and so the standard had to accommodate different machine types on which a byte could mean different and mutually exclusive things.

AaronFriel6y ago

There's something fundamentally different and mistaken about C's original implementation of variable integer sizes though.

People often describe C as "portable assembly", but despite this, integer sizes varying on different platforms results in non-portability of anything those programs _produce_. That is, a "file", or bit stream (not byte stream!) produced by one machine may be incompatible with another. The original integer-size independence is decidedly not portable.

That was probably less of a problem when it was rare to send data from one physical machine to another machine, let alone one of another type. But now the world is inter-net-worked and we have all sorts of machines talking to each other all the time.

Making the interfaces explicit reduces errors. These days we now even have virtual machines and programs running at different bit widths on the same machine, and emulated machines on the same machine running different ISAs!

I'm also part of what I'm sure is a small number of users who believe that using "usize" should be a lint error that must be manually overridden in Rust, and who also think endianness should be explicit. Heck, it should be a compiler error to write a struct to a socket if it contains any non-explicit values!

TwoBit6y ago

Original C was portable to -platforms- and not to -users-.

[1] https://www.cplusplus.com/reference/ctime/tm/

m4636y ago

Why not just go to values?

Some languages like Ada allow a type that say goes from -273 to 600.

weinzierl6y ago

Pascal has it too. I always found it quite natural to specify the range I want with the data type and not as precondition in the function. Another big advantage is that you can avoid a good deal of potential off-by-one errors, if you define your data types appropriately. For example the following definition in Pascal would be much less error prone than the corresponding definition in C[1]:

    var
    weekday:  0 .. 6;
    monthday: 1 .. 31;

For a quarter of a century I have wondered why no one seems to miss that feature. I really hope we will get it in C one day. Even more so, I hope that the proposals for refinement types in Rust[2] will one day be resolved and implemented.

[2] https://github.com/rust-lang/rfcs/issues/671

guerby6y ago

And GNAT (GCC Ada front-end) will use a biased representation for range type when packing tight:

  with Ada.Text_IO; use Ada.Text_IO;
  procedure T is
   type T1 is range 16..19;
   type T2 is range -7..0;
   type R is record
      A : T1;
      B,C : T2;
   end record;
   for  R use
    record
       A at 0 range  0 .. 1;
       B at 0 range  2 .. 4;
       C at 0 range  5 .. 7;
    end record;
   X : R := (17,-2,-3);
  begin
   Put_Line(X'Size'Image); -- 8 bits
  end T;

Gibbon16y ago

I would very much prefer this and the ability to spec what happens on overflow.

There are a lot of arguments back and forth because there actually is no 'right way' to handle overflow.

BenoitEssiambre6y ago

Yes, this is kind of funny. My understanding is that Thompson and Ritchie deliberately left out support for non-power-of-two word sizes, which other languages had at the time because CPU manufacturers were adding a couple of bits at a time to new models (get this year's 14-bit CPU, two more than last year's CPU!).

To make a more simple, more elegant, more portable language they decided to settle on power-of-two word lengths. This is similar to how Unix came about, leaving out the cruft and complexity from the over engineered Multics.

anticensor6y ago

The requirement is not to be power of two, it is to be multiple of sizeof(char).

begriffs6y ago

As you mention, the fundamental integer types have guaranteed minimum sizes (or, pedantically, a range that matches the following sizes if two's complement is used):

* >=8 bits: char (CHAR_BIT is exactly 8 in POSIX)

* >=16 bits: short and int

* >=32 bits: long

* >=64 bits: long long

The C99 typedefs like uint16_t have to be chosen internally to be one of the underlying types. For those sizes that have no matching underlying type, the implementation will omit typedefs.

However don't forget the more flexible C99 typedefs int_leastN_t and int_fastN_t. They both will give you a type of at least N bits, where the "fast" one chooses whichever type is most convenient for the processor, and the "least" version picks whichever is smallest. (For instance int_least16_t is probably short, and int_fast16_t is probably int.)
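A small sketch of how the two families can be inspected on a given implementation. The helper names are hypothetical; the only things the standard guarantees are the "at least 16 bits" properties checked by the static assertions, while the actual widths vary by platform.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Both types must provide at least 16 bits on every conforming implementation. */
_Static_assert(sizeof(int_least16_t) * CHAR_BIT >= 16, "least type too small");
_Static_assert(sizeof(int_fast16_t) * CHAR_BIT >= 16, "fast type too small");

/* Storage size in bits of the smallest type with at least 16 bits. */
static inline size_t least16_bits(void)
{
    return sizeof(int_least16_t) * CHAR_BIT;
}

/* Storage size in bits of the type the implementation considers fastest
   among those with at least 16 bits; may be wider than the "least" one. */
static inline size_t fast16_bits(void)
{
    return sizeof(int_fast16_t) * CHAR_BIT;
}
```

Printing both values on a few targets is an easy way to see where the implementation chose a wider "fast" type.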

Yes, that's right, thanks for your elaboration.

GTP6y ago

Why will portability not come back? Also, when was the ability to work on non-multiple-of-8-bit integers lost?

Well, it's not lost in C itself. But in the practice of modern C programming, it's often sacrificed in favor of using integers of exact sizes (uintN_t), and many programs perform bitwise operations by assuming an exact integer size. As of C99, these types are guaranteed to have the same number N of bits across all implementations, and they are included only if the implementation supports them. So programs using them are standard C, but not 100% portable: there is no requirement in C to implement exact-width integers.

Modifying most programs shouldn't be difficult, though (there is uint_leastN_t), and C compilers can be modified to treat extra bits as if they don't exist, allowing existing programs to work again.

gumby6y ago

AFAIK the last machine in widespread production that handled multi-length integers was the PDP-10/20 and its clones which essentially died around 1984. I say "around" because though DEC canceled the 20 line, some clones remained (that was Cisco's original business plan, for example)

ralusek6y ago· 5 in thread

A lot of people don't know this, but `BigInt`s are supported in modern JavaScript; integers of arbitrarily large precision.

Try in your browser console:

    2n ** 4096n

    // output (might have to scroll right)
    1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190336n

To use it, just add `n` after the number as literal notation, or cast any Number x with BigInt(x). BigInts may only be combined in operations with other BigInts, so make sure to cast any Numbers where applicable.

I know this is about C, I thought I'd just mention it, since many people seem to be unaware of this.

justicz6y ago

Hm, does this work in Safari? https://caniuse.com/#feat=bigint

ralusek6y ago

Not yet, but I believe babel and others just transpile/polyfill it by having it fall back on a string arithmetic library for working with integers of arbitrary precision.

jakear6y ago

“Syntax Error: No identifiers allowed directly after numeric literal”

recursive6y ago

Safari is the new IE.

waltpad6y ago

So clearly, if llvm is used as a backend for js, this feature will come in handy.

On a side note, apparently it will also be useful for the Rust folks, who have user-implemented libraries to emulate C-like bitfields and to implement bigints.

So this work has promising outcomes.

xyzzy20206y ago· 5 in thread

How does this not break sizeof ?

tom_mellior6y ago

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf: "_ExtInt types are bit-aligned to the next greatest power-of-2 up to 64 bits: the bit alignment A is min(64, next power-of-2(>=N)). The size of these types is the smallest multiple of the alignment greater than or equal to N. Formally, let M be the smallest integer such that A * M >= N. The size of these types for the purposes of layout and sizeof is the number of bits aligned to this calculated alignment, A * M. This permits the use of these types in allocated arrays using the common sizeof(Array)/sizeof(ElementType) pattern."

saagarjha6y ago

It’d probably round up to the nearest byte, as it already does with the boolean types.

wahern6y ago

The object size has to be at least the alignment size so that arrays work properly--&somearray[1] needs to be properly aligned, and that only works if the object size is a multiple of the alignment: sizeof myint >= _Alignof(myint) && (sizeof myint % _Alignof(myint)) == 0.

As the proposal says, the bit alignment of these types is min(64, next power-of-2(>=N)). (Of course, the alignment can't be smaller than 8 bits, which the proposal fails to account for.) Assuming CHAR_BIT==8, it follows that:

  sizeof _ExtInt(3) == 1   // 5 bits padding
  sizeof _ExtInt(17) == 4  // 15 bits padding
  sizeof _ExtInt(67) == 16 // 61 bits padding

So the amount of padding can be considerable. But that doesn't matter much. What they're trying to conserve is the number of value bits that need to be processed, and in particular minimize the number of logic gates required to process the value. Inside the FPGA presumably the value can be represented with exactly N bits, regardless of how many padding bits there are in external memory.
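The arithmetic behind those three figures can be sketched as follows, assuming CHAR_BIT == 8 and adding the 8-bit lower bound on alignment noted above (that lower bound is this comment's correction, not part of the quoted N2472 text). All function names are hypothetical.

```c
#include <stddef.h>

/* Smallest power of two >= n, for n >= 1. */
static inline unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Bit alignment A = min(64, next power of 2 >= n), clamped to at least 8. */
static inline unsigned extint_align_bits(unsigned n)
{
    unsigned a = next_pow2(n);
    if (a > 64)
        a = 64;
    if (a < 8)
        a = 8;
    return a;
}

/* Size in bytes: the smallest multiple A * M of the alignment with A * M >= n. */
static inline size_t extint_sizeof(unsigned n)
{
    unsigned a = extint_align_bits(n);
    unsigned m = (n + a - 1) / a;   /* M = ceil(n / A) */
    return (size_t)a * m / 8;
}

/* Padding bits: storage bits minus the n value bits. */
static inline unsigned extint_padding_bits(unsigned n)
{
    return (unsigned)(extint_sizeof(n) * 8) - n;
}
```

Under these assumptions `extint_sizeof` gives 1, 4 and 16 bytes for widths 3, 17 and 67, with 5, 15 and 61 padding bits respectively.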

barbegal6y ago

Where does the spec say that it does that? As far as I can tell C only allows objects to have sizes in whole number of bytes, and that includes booleans.

Although a _Bool type can be used for a bit field (having size of 1 bit) but you can't use sizeof with a bit field.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

Just returns true now... problem solved!

beefhash6y ago· 4 in thread

Note that the spec[1] requires that this tops out at an implementation-defined integer size, so you're likely not getting out of writing bignum code yourself (and even if implemented, the bignum operations may well be variable-time and thus unsuitable for any kind of cryptography). Making the size completely implementation-defined also sounds like it'll be unreliable in practice. I feel like making it at least match the bit size of int would be a worthwhile trade-off between predictability for the programmer aiming for portability and simplicity of implementation.

wongarsu6y ago

Both Clang and gcc already support a 128bit integer type, so it's certainly possible that "implementation-defined" will end up being 128-bit or 256-bit for x64 targets on common compilers (provided MSVC plays along).

TazeTSchnitzel6y ago

LLVM already supports completely arbitrary integer sizes up to 2^23-1 in its IR (https://llvm.org/docs/LangRef.html#integer-type), and I think it can “lower” any integer size to fit what the hardware actually supports. So if Clang doesn't add an artificial constraint on top, in theory you could use a one million-bit integer size if you wanted to?

sgeisler6y ago

Since I had to implement something like that in rust for a base32 codec [1] a few years ago I really like the idea. Although my main concern was ensuring that invariants are checked by the type system, which might not be as much a concern in c with its implicit conversions?

[1] https://github.com/rust-bitcoin/rust-bech32/blob/master/src/...

captainbland6y ago

Judging by the motivation section, the motivation is primarily for FPGAs which I guess is why they want to allow these sub-int sized bit values. You might come up with some custom C-programmable operator that is only 3-bits wide where before you're presumably forced to use the smallest available power of 2 word size which would waste resources. So I think actually the idea is that this is for code which is not supposed to be portable at all, but rather hyper-optimised for custom devices.

derefr6y ago· 4 in thread

> While the spelling is undecided, we intend something like: 1234X would result in an integer literal with the value 1234 represented in an _ExtInt(11), which is the smallest type capable of storing this value.

That “smallest type capable of storing this value” is a disappointing approach, IMHO. It’d be a lot more powerful to just be able to pass in bit patterns (base-2 literals) and have the resulting type match the lexical width of the literal. 0b0010X should have a bit-width of 4, not 2.

Taniwha6y ago

"smallest type" doesn't go far enough for HDL languages - consider a 4-bit counter in verilog

reg [3:0]r;

r = 4'hf; r = r+1; if (r == 0) ....

if r is really 8 bits r+1 will have a non-0 value ....

However all the LLVM people may be saying here is that they're providing the minimal support for arbitrary size math and expect language implementers to generate the masking where required (ie that r+1 above is really (r+1)&4'hf )
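The same wrap-around can be illustrated in plain C, as a sketch rather than anything the LLVM work itself does: an unsigned bit-field of width 4 truncates on store, while a full-width variable needs the explicit mask. The struct and function names are hypothetical.

```c
/* A 4-bit register modelled as an unsigned bit-field: stores wrap modulo 16. */
struct reg4 {
    unsigned value : 4;
};

/* Increment through the bit-field; the store truncates the sum to 4 bits. */
static inline unsigned inc_bitfield(unsigned r)
{
    struct reg4 reg = { r & 0xFu };
    reg.value = reg.value + 1u;
    return reg.value;
}

/* Increment in a full-width unsigned with the mask written out by hand,
   i.e. the (r+1)&4'hf form. */
static inline unsigned inc_masked(unsigned r)
{
    return (r + 1u) & 0xFu;
}

/* Increment with no mask: the value keeps growing past 4 bits. */
static inline unsigned inc_unmasked(unsigned r)
{
    return r + 1u;
}
```

Starting from 0xF, the first two functions return 0 while the unmasked one returns 16, which is the non-zero result the `r == 0` test would trip over.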

I'll note that for Verilog in particular the standard Verilog C-level APIs for accessing data imply that integers are not stored contiguously, instead they're stored in 32-bit chunks with a min size of 32-bits for 1 to 32-bit values - a 33-64 bit value will be stored in 2 non-contiguous 32-bit words ('packed' values are different from this). To be useful any back end support needs to be able to understand stuff stored this way.

floatingatoll6y ago

I think your proposal for 0b0010X makes an excellent addition to the 1234X proposal. Has it already been discussed by the working group? If not, you should email someone to ask them to consider it!

saagarjha6y ago

I wonder if the suffix could be Xn where n is an integer specifying the width.

nybble416y ago

Would 0X12 then be a 12-bit integer with the value zero or a hexadecimal `int` literal with a base-10 value of 18? Does this work for other bases (0X12X12)?

I'm not sure why they picked a letter which can already occur in integer literals rather than one of the many unused letters. Given the focus on FPGAs and HDL it's also worth noting that X is commonly used in binary or hexadecimal constants in HDLs to denote undefined or "don't care" values, which could lead to confusion. Rust integer literal syntax would be perfect here (1234u11 or 1234i11) since it already includes the bit width and is compatible with any base prefix.

waynecochran6y ago· 4 in thread

At some point they need to branch off and not call it C anymore. C should stay relatively small -- small enough that a competent programmer could write a compiler and RTS for it.

weinzierl6y ago

Yes, keep C clean and add all the cruft to another language derived from C. We could call the new language C++.

Gibbon16y ago

My opinion is C and C++ need a divorce. So that C can be modernized with features that make sense in the context of the language. And not constrained as a broken subset of C++.

wongarsu6y ago

A lot of the world still runs on C99, and a lot of (toy or academic) compilers are written for C0 (a simple, small, safe C subset). Even when a new C version gains more features you can still develop against C99, C0, or whatever version you prefer.

waynecochran6y ago

The choice is often an illusion. You only get to control the version for code you write, and only if you are programming in isolation. As soon as you work on a larger team project that may also include third-party code, you no longer dictate what version of C is being used.

SloopJon6y ago· 4 in thread

C++ has sped up the pace of its releases, but I don't have a sense of where C is. I didn't realize until I looked it up just now that there's a C18, although I gather that this is even smaller a change than C95 was.

Safe to say that a feature like this would be standardized by 2022 at the earliest?

GTP6y ago

I just found out about C18 thanks to your comment, I was still at C11. Thanks. Anyway I think you're right, except for the fact that I don't like when language designers release versions too quickly. I don't know the situation in the C++ land, but as an example I think that Java took the wrong way.

SloopJon6y ago

There's something to be said for the new Java approach of releasing often, with a stable LTS release every now and then, even if Oracle is muddying the waters with their licensing. The only release after 8 that interests me right now is 11. Meanwhile, the features of Java 12, 13, and 14 are available for people who do want to experiment with them.

I think we'll see this implicitly with C++. C++11 and the mostly non-controversial updates in C++14 comprise "modern" C++, whereas adoption of C++17 seems to be a bit slower.

hermitdev6y ago

Since C++11, the ISO committee has been aiming for a new standard release every 3 years. So far, they've kept this cadence up. I don't recall if C++20 is actually out yet, but I know the feature set was finalized last year, if not out yet, it's probably just due to editorial issues (I've not been using C++ for work the last few years, so my knowledge might be a bit dated).

jcelerier6y ago

> but as an example I think that Java took the wrong way.

I wonder what is the right way then ? Java is apparently too fast for you, and yet it gets improvements so slowly that it is getting its marketshare eaten by other JVM languages moving much faster.

If it was even slower it could as well be put directly next to the dusty COBOL and RPG boxes in the IBM attic.

[0] https://www.boost.org/doc/libs/1_72_0/libs/multiprecision/do...

SlowRobotAhead6y ago· 3 in thread

> These tools take C or C++ code and produce a transistor layout to be used by the FPGA.

Hmm, I haven’t been following that but it seems that...

> The result is massively larger FPGA/HLS programs than the programmer needed

And there it is.

Really seems odd to me to try and force procedural C into non-linear execution of FPGA. Like it seems super odd, and when talking about changes to C to help that... I really don’t get it.

This isn’t what C is for. What is the performance advantage over Verilog? How many people want n-bit ints in C when automatically handled structures work well for most people?

Maybe I’m just not seeing the bigger picture here and that example was just poor?

Cyph0n6y ago

Not to mention that the first statement is simply false...

The final result is a bitstream that determines which LUTs (lookup tables) and BRAM (memory/block RAM) to use on the chip, and how they should be connected/routed.

The FPGA fabric itself is made of transistors, but your C/C++ (HLS) or HDL code is not directly controlling these transistors. This is what makes FPGAs so flexible relative to ASICs.

marcan_426y ago

There is no performance advantage over Verilog. The reason why (some) people want C is because Verilog and VHDL are unquestionably terrible languages (they weren't even intended to be HDLs, that use case came later) which are damn near impossible to write complex systems in without spending most of your time writing bug-prone boilerplate. Every big name IC designer shop ends up wrapping them in layers of metaprogramming to make them palatable.

So, since those languages suck, people familiar with the procedural side of things end up asking for C. Which is an even bigger impedance mismatch than Verilog, but since you need a smarter backend to even begin to implement it, it can make life easier by that alone.

Personally, I prefer stuff like nMigen, which is basically Python metaprogramming a synthesis-oriented subset of Verilog constructs. Compiles down to Verilog behind the scenes.

krupan6y ago

"Verilog and VHDL are unquestionably terrible languages (they weren't even intended to be HDLs, that use case came later)"

They were intended to be HDLs (for simulation of hardware), but they were never intended to be automatically translated into gates/schematics (i.e., synthesized)

nabla96y ago· 3 in thread

I think most commonly used languages with and without standards (C, C++, JavaScript/Wasm, Python, Java, etc.) should standardize new primitive type representations together (with hardware people included).

If you have different representations in different languages it just creates unnecessary impedance mismatch. It would be better for everyone if you could just pass these types from language to language.

einpoklum6y ago

C++ can have arbitrary-width integers as library types; it would not be that big of a deal IMHO. If `optional`, `variant` and `any` (and maybe soon, `bit`) are not in the language itself, no reason why n-bit-integer should be.

(Of course, this is written from the "we can jerry-rig the existing language to do what you want" perspective with which so much is achievable efficiently in C++.)

hermitdev6y ago

Boost Multiprecision [0] is an example of such a library type. It offers compile-time arbitrarily wide integers (with predefined types up to 1024 bits) and a C++ wrapper around the GMP or MPIR libraries, which support arbitrary sizes at runtime (not sure how it's implemented, but probably on top of an array of ints or BCD (binary-coded decimals)).

C++ has had `optional` and `variant` since (I think) C++11, maybe 14. I don't think `any` made the cut. All of these types originated (for C++ standardization) in Boost, as well. I'd caution against using `any`, though. From personal experience, the runtime overhead is quite high, and holding any non-none type is a dynamic allocation. Performance is far better with `variant` at the development cost of needing to know all the types you're going to support at compile-time.

nabla96y ago

> no reason why n-bit-integer should be.

If standard is agreed, it could be pragma similar to calling conventions.

detaro6y ago· 3 in thread

What was wrong with the actual title?

> The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types

dang6y ago

We've changed back to that above. Submitted title was "C possibly gaining support for N-bit integers".

moonchild6y ago

It implements a proposed feature to the c language, which is the more interesting part of it.

pjmlp6y ago

Not all proposed features get accepted, especially given how conservative WG14 tends to be.

jhj6y ago· 2 in thread

Much of my time is spent writing Mentor Catapult HLS for ASIC designs these days.

Every HLS vendor or language has their own, incompatible arbitrary bitwidth integer type at present. SystemC sc_int is different from Xilinx Vivado ap_int is different from Mentor Catapult ac_int is different from whatever Intel had for their Altera FPGAs. It's a real mess.

I'm hoping this is another small step to slowly move the industry into a more unified representation, or at least if LLVM support for this at the type level could enable faster simulation of designs on CPU by improving the CPU code that is emitted. What probably matters most for HLS though are the operations which are performed on the types (static or dynamic bit slicing, etc).

aDfbrtVt6y ago

I'm in the same boat. After having played with all the other vendor libraries, I think I like ac_datatypes the most. It's been really fast and the Catapult is a pretty good engine. Can I ask what industry you're in? I'm in telecom.

jhj6y ago

I work for Facebook.

rurban6y ago· 2 in thread

That's what I wrote on their Reddit post:

The feature is of course fantastic. But the syntax still looks a bit overblown.

Type system-wise this seems to be more correct:

  _ExtInt(a) + _ExtInt(b) => _ExtInt(MAX(a, b) +1)

And int + _ExtInt(15) might need a pragma or warning flag to warn about that promotion. One little int, or automatic int pollutes all.

Traster6y ago

Problem is:

_ExtInt(16) + _ExtInt(15) => _ExtInt(17)

_ExtInt(17) + _ExtInt(15) => _ExtInt(18)

So let's say we have a, b and c. a is 16 bits, b and c are 14 bits.

a + (b + c) => _ExtInt(17)

(a + b) + c => _ExtInt(18)

Now obviously this is a trivial example, but it highlights the fact that unless you're actually willing to carry the true ranges around in your type system, your calculation of bit widths is going to vary due to the details of which operations are done in which order with which intermediary variables.
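The order dependence is easy to check mechanically. A sketch, assuming the MAX(a, b) + 1 result-width rule proposed in the parent comment (the function name `add_width` is hypothetical):

```c
/* Result width of adding an a-bit and a b-bit value under the
   proposed rule: one bit more than the wider operand. */
static inline unsigned add_width(unsigned a, unsigned b)
{
    return (a > b ? a : b) + 1;
}
```

With a = 16 and b = c = 14, `add_width(16, add_width(14, 14))` is 17 while `add_width(add_width(16, 14), 14)` is 18, even though both groupings compute the same sum.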

rurban6y ago

I see, it's now the same problem as with floats: addition is not associative anymore. But isn't that the same problem in HW also? With the right order you are saving transistors.

mshockwave6y ago· 2 in thread

The title is a little misleading, since _ExtInt is just an extension of Clang, not a standard. GCC and Clang both have some hidden features that are not in the standard.

mappu6y ago

It is on the standards track, though, even if N2472 was not completely accepted it seems like there is a process for this (or something very much like it) to become a standard.

dang6y ago

We've changed it now (https://news.ycombinator.com/item?id=22948380).

drfuchs6y ago· 1 in thread

So, if I have an array of extint(3), does it pack them nicely into 10-per-32-bit-word? Or 21-per-64-bit-word? Will a struct with six extint(5) fields fit into 4 bytes? What about just a few global variables of extint(1)? Will they get packed into a single byte? Did I miss where this is covered?

tom_mellior6y ago

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf does not mention structs at all, which is disappointing.

I quoted this language below: "_ExtInt types are bit-aligned to the next greatest power-of-2 up to 64 bits: the bit alignment A is min(64, next power-of-2(>=N)). The size of these types is the smallest multiple of the alignment greater than or equal to N. Formally, let M be the smallest integer such that A * M >= N. The size of these types for the purposes of layout and sizeof is the number of bits aligned to this calculated alignment, A * M. This permits the use of these types in allocated arrays using the common sizeof(Array)/sizeof(ElementType) pattern."

But to be honest I don't understand what it's trying to say. If bit width N = 3, the next power of 2 is 4, so would that mean that "bit alignment(?)" A = 4? Then M = 1 is the smallest integer such that A * M >= 3. Then the size of the type would be 4 bits? That wouldn't fly with sizeof.
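For what it's worth, here is a literal transcription of the quoted N2472 wording into C, as a sketch only. The function names are invented for illustration, and this follows the quoted text rather than whatever Clang actually implements.

```c
#include <assert.h>

/* Bit alignment A = min(64, next power of two >= n), per the quoted text. */
unsigned extint_bit_alignment(unsigned n)
{
    assert(n >= 1);
    unsigned a = 1;
    while (a < n && a < 64)
        a *= 2;
    return a;
}

/* Size in bits = A * M, where M is the smallest integer with A * M >= n. */
unsigned extint_size_bits(unsigned n)
{
    unsigned a = extint_bit_alignment(n);
    unsigned m = (n + a - 1) / a;   /* ceiling division */
    return a * m;
}
```

Read this way, N = 3 gives A = 4 and a size of 4 bits, which is exactly the sub-byte result that looks incompatible with sizeof. N = 17 gives 32 bits and N = 67 gives 128 bits.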

0xTJ6y ago

I'm very much in support of this. One thing I like about Zig[1] is that integers are explicitly given sizes. I've been playing with it recently, but I'm waiting for a specific "TODO" in the compiler to be fixed.

[1] https://ziglang.org/

waltpad6y ago

First of all, I suppose that it will be possible to make them unsigned (just like for standard types). Is this correct?

Also, what's the relationship between standard types and the new _ExtInts? Are _ExtInt(16) equivalent to shorts, or are they considered distinct and require explicit cast?

> In order to be consistent with the C Language, expressions that include a standard type will still follow integral promotion and conversion rules. All types smaller than int will be promoted, and the operation will then happen at the largest type. This can be surprising in the case where you add a short and an _ExtInt(15), where the result will be int. However, this ends up being the most consistent with the C language specification.

For instance, what if I choose to replace short by _ExtInt(16) in the above? What would be the promotion rule then?

Note that it was already possible to implement arbitrary sized ints for a size <= 64, by using bitfields (although it's possible that you could fall into UB territory in some situations, I've never used that to do modular arithmetic).

Edit: Ah, there's this notion of underlying type: one may use the nearest larger type to implement a given size, but nothing prevents using an even larger type, for instance:

struct short3_s { short value:3; };

struct longlong3_s { long long value:3; };

I don't know what the C standard says about that, but clearly these two types are not identical (sizeof will probably give different results). What will it be for _ExtInt? How will these types be converted?

Another idea:

what about

struct extint13_3_s {
  _ExtInt(13) value:3;
};

Will the above be possible? In other words, will it be possible to combine bitfields with this new feature?

I guess it's a much more complicated problem than it appears to be at first.
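A small sketch of the bit-field approach mentioned above, in plain C without _ExtInt. The two struct definitions are the ones from the comment; `u3_s` and the helper functions are hypothetical names added for illustration. Note that bit-fields of type short or long long are implementation-defined (GCC and Clang accept them), and the struct sizes depend on the ABI.

```c
#include <stddef.h>

struct short3_s    { short value : 3; };
struct longlong3_s { long long value : 3; };

/* An unsigned 3-bit field: stores are reduced modulo 8, which is well defined. */
struct u3_s { unsigned int value : 3; };

size_t size_of_short3(void)    { return sizeof(struct short3_s); }
size_t size_of_longlong3(void) { return sizeof(struct longlong3_s); }

/* Increment a 3-bit unsigned value with wraparound. */
unsigned u3_increment(unsigned start)
{
    struct u3_s x;
    x.value = start;        /* stored modulo 8 */
    x.value = x.value + 1;  /* computed as int, reduced modulo 8 on store */
    return x.value;
}
```

On common ABIs the two struct sizes differ because the declared type selects the storage unit. The unsigned field gives modular arithmetic; a signed bit-field that overflows on assignment is implementation-defined instead.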

dang6y ago

Speaking of C, if you missed last week's thread with C Committee members, it was rather amazing: https://news.ycombinator.com/item?id=22865357.

Click 'More' at the bottom to page through it; it just keeps going.

fortran776y ago

I love Erlang for the ability to deal with _bits_. To see this in a compiled language would be wonderful. Of course, you can get down to the bit level with bitwise logical operations, but to be able to express it more naturally would be a great boon to people writing low-level network stuff, and will probably reduce programming errors.

ndesaulniers6y ago

Congrats Erich! One thing I'd be curious about is the ergonomics (or lack of) of explicit integer promotions and conversions for these types, as I find the current rules for implicit integer promotions a little confusing and hard to remember.

For a fun compiler bug in LLVM due to representation of arbitrary width integers, see: https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two...

Someone6y ago

“Likewise, if a Binary expression involves operands which are both _ExtInt, rather than promoting both operands to int the narrower operand will be promoted to match the size of the wider operand, and the result of the binary operation is the wider type.”

I don’t understand that choice. The result should be of the wider type, yes, but, for example, multiplying an _ExtInt(1) by an _ExtInt(1000) should take less hardware than multiplying two _ExtInt(1000)s. So, why promote the narrower one to the wider type?

saagarjha6y ago

I wonder if this could help standardize some vectorized code as well.

senozhatsky6y ago

Knuth's MIX computer with its 6-bit bytes and 5-byte words (IIRC) came to my mind [0]

[0] https://en.wikipedia.org/wiki/MIX

pjmlp6y ago

Currently Clang is getting it; whether ISO C gets it is another matter.

rightbyte6y ago

I feel this had better stay a compiler extension. Writing FPGA code has so much specialness anyway.

Traster6y ago· 16 in thread

flohofwoe6y ago

Admittedly also hardware-related, but:

(Zig also has arbitrary bit-width integers up to 128 bits, but other than that I haven't seen this outside of hardware description languages).

mratsim6y ago

Cryptography.

Want to do finite field computation on a 254-bit integer? Now you can (BN254, very popular for zero-knowledge proofs). 381-bit? You're covered.

ori_b6y ago

This is an efficiency hack for FPGAs.

zamadatix6y ago

Someone6y ago

For packed structures, C already has bit fields. Example (from https://en.cppreference.com/w/cpp/language/bit_field):

  struct S {
    // will usually occupy 2 bytes:
    // 3 bits: value of b1
    // 2 bits: unused
    // 6 bits: value of b2
    // 2 bits: value of b3
    // 3 bits: unused
    unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
  };

This is more aimed at large integers.

huit6y ago

It's not C to gates if you had to rewrite all your C to manually specify the bit widths of every single signal.

strenholme6y ago

moonchild6y ago

They fixed autopromotion rules:

steerablesafe6y ago

Generic code for getting the high 64-bit part of an unsigned integer multiplication of two 64-bit values. Can be useful for fixed-point math, for example.
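For comparison, this is roughly what the portable version looks like today without a 128-bit type: a sketch that splits each operand into 32-bit halves. The name `mulhi_u64` is just illustrative.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, using only 64-bit arithmetic. */
uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    /* Four 32x32 -> 64 partial products. */
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the middle column plus the carry out of the low product.
       The maximum value is exactly 2^64 - 1, so this cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

With a 128-bit unsigned type available, the whole function would reduce to one widening multiplication followed by a shift right by 64.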

elgfare6y ago

I can see this being used in serialization protocols.

segfaultbuserr6y ago· 14 in thread

phoe-krk6y ago

Common Lisp programmer here.

AaronFriel6y ago

There's something fundamentally different and mistaken about C's original implementation of variable integer sizes though.

TwoBit6y ago

Original C was portable to -platforms- and not to -users-.

[1] https://www.cplusplus.com/reference/ctime/tm/

m4636y ago

Why not just go to values?

Some languages like Ada allow a type that, say, goes from -273 to 600.

weinzierl6y ago

    var
    weekday:  0 ... 6;
    monthday: 1 ... 31;

[2] https://github.com/rust-lang/rfcs/issues/671

guerby6y ago

And GNAT (the GCC Ada front-end) will use a biased representation for range types when packing tightly:

  with Ada.Text_IO; use Ada.Text_IO;
  procedure T is
   type T1 is range 16..19;
   type T2 is range -7..0;
   type R is record
      A : T1;
      B,C : T2;
   end record;
   for  R use
    record
       A at 0 range  0 .. 1;
       B at 0 range  2 .. 4;
       C at 0 range  5 .. 7;
    end record;
   X : R := (17,-2,-3);
  begin
   Put_Line(X'Size'Image); -- 8 bits
  end T;
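For readers who don't know Ada, a hand-written C sketch of the same idea: each field is stored as (value - lower bound), so 16..19 fits in 2 bits and -7..0 fits in 3 bits. The function names and the bit order (A in the low bits) are assumptions made for illustration; this does not claim to match the exact byte GNAT produces.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bounds of the ranges in the Ada example: T1 is 16..19, T2 is -7..0. */
enum { T1_LOW = 16, T1_HIGH = 19, T2_LOW = -7, T2_HIGH = 0 };

/* Pack A into bits 0..1, B into bits 2..4, C into bits 5..7,
   each stored biased by its range's lower bound. */
uint8_t pack_r(int a, int b, int c)
{
    assert(a >= T1_LOW && a <= T1_HIGH);
    assert(b >= T2_LOW && b <= T2_HIGH);
    assert(c >= T2_LOW && c <= T2_HIGH);

    unsigned ua = (unsigned)(a - T1_LOW);   /* 0..3 */
    unsigned ub = (unsigned)(b - T2_LOW);   /* 0..7 */
    unsigned uc = (unsigned)(c - T2_LOW);   /* 0..7 */
    return (uint8_t)(ua | (ub << 2) | (uc << 5));
}

/* Undo the bias when reading the fields back. */
int unpack_a(uint8_t r) { return (int)(r & 0x3u) + T1_LOW; }
int unpack_b(uint8_t r) { return (int)((r >> 2) & 0x7u) + T2_LOW; }
int unpack_c(uint8_t r) { return (int)((r >> 5) & 0x7u) + T2_LOW; }
```

The record (17, -2, -3) from the Ada code round-trips through a single byte this way, which is the point of the biased representation: the range, not the magnitude of the values, determines the width.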

Gibbon16y ago

I would very much prefer this and the ability to spec what happens on overflow.

There are a lot of arguments back and forth because there actually is no 'right way' to handle overflow.

BenoitEssiambre6y ago

anticensor6y ago

The requirement is not to be a power of two; it is to be a multiple of sizeof(char).

begriffs6y ago

As you mention, the fundamental integer types have guaranteed minimum sizes (or, pedantically, a range that matches the following sizes if two's complement is used):

* >=8 bits: char (CHAR_BIT is exactly 8 in POSIX)

* >=16 bits: short and int

* >=32 bits: long

* >=64 bits: long long

The C99 typedefs like uint16_t have to be chosen internally to be one of the underlying types. For those sizes that have no matching underlying type, the implementation will omit typedefs.
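The minimums in the list above can be written down as compile-time checks. This is a sketch using the C11 `_Static_assert` keyword and the unsigned limits from `<limits.h>`; on a conforming implementation none of these can fail.

```c
#include <limits.h>

/* Minimum ranges guaranteed by the C standard, expressed via the
   maximum values of the corresponding unsigned types. */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(USHRT_MAX >= 65535u, "short has at least 16 bits");
_Static_assert(UINT_MAX >= 65535u, "int has at least 16 bits");
_Static_assert(ULONG_MAX >= 4294967295ul, "long has at least 32 bits");
_Static_assert(ULLONG_MAX >= 18446744073709551615ull,
               "long long has at least 64 bits");
```

These are lower bounds only; an implementation is free to make any of the types wider.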

Yes, that's right, thanks for your elaboration.

GTP6y ago

Why won't portability come back? Also, when was the ability to work on non-multiple-of-8-bit integers lost?

gumby6y ago

ralusek6y ago· 5 in thread

A lot of people don't know this, but `BigInt`s are supported in modern JavaScript; integers of arbitrarily large precision.

Try in your browser console:

    2n ** 4096n

    // output (might have to scroll right)
    1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190336n

I know this is about C, I thought I'd just mention it, since many people seem to be unaware of this.

justicz6y ago

Hm, does this work in Safari? https://caniuse.com/#feat=bigint

ralusek6y ago

Not yet, but I believe babel and others just transpile/polyfill it by having it fall back on a string arithmetic library for working with integers of arbitrary precision.

jakear6y ago

“Syntax Error: No identifiers allowed directly after numeric literal”

recursive6y ago

Safari is the new IE.

waltpad6y ago

So clearly, if LLVM is used as a backend for JS, this feature will come in handy.

On a side note, apparently, it will also be useful for the Rust folks, who have user-implemented libraries to emulate C-like bitfields and to implement bigints.

So this work has promising outcomes.

xyzzy20206y ago· 5 in thread

How does this not break sizeof ?

tom_mellior6y ago

saagarjha6y ago

It’d probably round up to the nearest byte, as it already does with the boolean types.

wahern6y ago

  sizeof _ExtInt(3) == 1   // 5 bits padding
  sizeof _ExtInt(17) == 4  // 15 bits padding
  sizeof _ExtInt(67) == 16 // 61 bits padding

barbegal6y ago

Where does the spec say that it does that? As far as I can tell C only allows objects to have sizes in whole number of bytes, and that includes booleans.

A _Bool can be used as a bit-field (with a width of 1 bit), but you can't apply sizeof to a bit-field.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

Just returns true now... problem solved!

beefhash6y ago· 4 in thread

wongarsu6y ago

TazeTSchnitzel6y ago

sgeisler6y ago

[1] https://github.com/rust-bitcoin/rust-bech32/blob/master/src/...

captainbland6y ago

derefr6y ago· 4 in thread

Taniwha6y ago

"smallest type" doesn't go far enough for HDL languages - consider a 4-bit counter in verilog

reg [3:0]r;

r = 4'hf; r = r+1; if (r == 0) ....

if r is really 8 bits r+1 will have a non-0 value ....
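The same point in C terms, as a minimal sketch: if the counter lives in a `uint8_t`, the wraparound to 0 only happens when the result is masked to the intended width. The helper name `counter_inc` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Increment a counter that is logically `bits` wide (1..8) but stored
   in a uint8_t, wrapping at 2^bits. */
uint8_t counter_inc(uint8_t r, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    uint8_t mask = (uint8_t)((1u << bits) - 1u);
    return (uint8_t)((r + 1u) & mask);
}
```

Treated as a 4-bit counter, 0xF increments to 0 and the `r == 0` test fires; treated as an 8-bit value, 0xF increments to 0x10 and it does not.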

floatingatoll6y ago

I think your proposal for 0b0010X makes an excellent addition to the 1234X proposal. Has it already been discussed by the working group? If not, you should email someone to ask them to consider it!

saagarjha6y ago

I wonder if the suffix could be Xn where n is an integer specifying the width.

nybble416y ago

Would 0X12 then be a 12-bit integer with the value zero or a hexadecimal `int` literal with a base-10 value of 18? Does this work for other bases (0X12X12)?

waynecochran6y ago· 4 in thread

At some point they need to branch off and not call it C anymore. C should stay relatively small -- small enough that a competent programmer could write a compiler and RTS for it.

weinzierl6y ago

Yes, keep C clean and add all the cruft to another language derived from C. We could call the new language C++.

Gibbon16y ago

My opinion is C and C++ need a divorce. So that C can be modernized with features that make sense in the context of the language. And not constrained as a broken subset of C++.

wongarsu6y ago

waynecochran6y ago

SloopJon6y ago· 4 in thread

Safe to say that a feature like this would be standardized by 2022 at the earliest?

GTP6y ago

SloopJon6y ago

I think we'll see this implicitly with C++. C++11 and the mostly non-controversial updates in C++14 comprise "modern" C++, whereas adoption of C++17 seems to be a bit slower.

hermitdev6y ago

jcelerier6y ago

> but as an example I think that Java took the wrong way.

I wonder what the right way is, then? Java is apparently too fast for you, and yet it gets improvements so slowly that its market share is being eaten by other JVM languages moving much faster.

If it was even slower it could as well be put directly next to the dusty COBOL and RPG boxes in the IBM attic.

[0] https://www.boost.org/doc/libs/1_72_0/libs/multiprecision/do...

SlowRobotAhead6y ago· 3 in thread

> These tools take C or C++ code and produce a transistor layout to be used by the FPGA.

Hmm, I haven’t been following that but it seems that...

> The result is massively larger FPGA/HLS programs than the programmer needed

And there it is.

Really seems odd to me to try and force procedural C into non-linear execution of FPGA. Like it seems super odd, and when talking about changes to C to help that... I really don’t get it.

This isn’t what C is for. What is the performance advantage over Verilog? How many people want n-bit ints in C when automatically handled structures work well for most people?

Maybe I’m just not seeing the bigger picture here and that example was just poor?

Cyph0n6y ago

Not to mention that the first statement is simply false...

The final result is a bitstream that determines which LUTs (lookup tables) and BRAM (memory/block RAM) to use on the chip, and how they should be connected/routed.

The FPGA fabric itself is made of transistors, but your C/C++ (HLS) or HDL code is not directly controlling these transistors. This is what makes FPGAs so flexible relative to ASICs.

marcan_426y ago

Personally, I prefer stuff like nMigen, which is basically Python metaprogramming a synthesis-oriented subset of Verilog constructs. Compiles down to Verilog behind the scenes.

krupan6y ago

"Verilog and VHDL are unquestionably terrible languages (they weren't even intended to be HDLs, that use case came later)"

They were intended to be HDLs (for simulation of hardware), but they were never intended to be automatically translated into gates/schematics (i.e., synthesized)

nabla96y ago· 3 in thread

einpoklum6y ago

(Of course, this is written from the "we can jerry-rig the existing language to do what you want" perspective with which so much is achievable efficiently in C++.)

hermitdev6y ago

nabla96y ago

> no reason why n-bit-integer should be.

If a standard is agreed, it could be a pragma, similar to calling conventions.

detaro6y ago· 3 in thread

What was wrong with the actual title?

> The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types

dang6y ago

We've changed back to that above. Submitted title was "C possibly gaining support for N-bit integers".

moonchild6y ago

It implements a proposed feature for the C language, which is the more interesting part of it.

pjmlp6y ago

Not all proposed features get accepted, especially given how conservative WG14 tends to be.

jhj6y ago· 2 in thread

Much of my time is spent writing Mentor Catapult HLS for ASIC designs these days.

aDfbrtVt6y ago

jhj6y ago

I work for Facebook.

rurban6y ago· 2 in thread

That's what I wrote on their Reddit post:

The feature is of course fantastic. But the syntax still looks a bit overblown.

Type system-wise this seems to be more correct:

  _ExtInt(a) + _ExtInt(b) => _ExtInt(MAX(a, b) +1)

And int + _ExtInt(15) might need a pragma or warning flag to warn about that promotion. One little int, or automatic int pollutes all.
