Arbitrary bit-width integers are great for writing computer emulator code. There's a ton of odd-width counters and registers in microchips, and being able to map those directly to integer variables instead of having to do a "bit-mask-dance" after each operation at least would increase readability (and probably also add a bit of type-safety).
(Zig also has arbitrary bit-width integers up to 128 bits, but other than that I haven't seen this outside of hardware-description-languages).
Want to do finite field computation on a 254-bit integer? Now you can (BN254, very popular for zero-knowledge proofs). 381-bit? You're covered.
It's very perf critical and the field modulus bitsize and values are known at compile-time. (example in my library where I basically had to implement the same machinery: https://github.com/mratsim/constantine/blob/ff9dec48/constan...)
This is an efficiency hack for fpga.
struct S {
// will usually occupy 2 bytes:
// 3 bits: value of b1
// 2 bits: unused
// 6 bits: value of b2
// 2 bits: value of b3
// 3 bits: unused
unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
};
This is more aimed at large integers.
If you need a 23bit object you just structure it to be that. It’s a couple of AND or SHIFT ops when accessing, but so what? Even for 100Gbit networking you aren’t going to max out even a slightly appropriate CPU.
But I think the "no automatic promotion or conversion" combined with "will error if combined with different width" could actually make extint(8) and extint(16) useful - it's a massive hint to autovectorisers and lets you generate the SIMD instructions for those widths.
Doubly so if they make sure never to write the words "undefined" where they mean "implementation-defined" for extint. At the moment normal arithmetic in C (x = x+1) is potentially undefined behaviour.
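To make the undefined-behaviour point concrete, here is a minimal sketch in standard C. The helper names `checked_increment` and `wrapping_increment` are made up for illustration; the only facts relied on are that signed overflow is undefined and unsigned arithmetic wraps modulo 2^N.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: increments *x only when the result is representable.
 * Evaluating INT_MAX + 1 on a signed int is undefined behaviour in C,
 * so the check has to happen before the addition. */
bool checked_increment(int *x)
{
    if (*x == INT_MAX) {
        return false;   /* would overflow: leave *x untouched */
    }
    *x = *x + 1;
    return true;
}

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is always valid. */
unsigned wrapping_increment(unsigned x)
{
    return x + 1u;
}
```

Calling `checked_increment` on a variable holding `INT_MAX` returns false and leaves the value alone, while `wrapping_increment(UINT_MAX)` yields 0.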
I share the skepticism of high level synthesis from C as being a bad motivation. The workflow is more like metaprogramming, and C is terrible at that.
It also provides a way to pass those values around without passing the whole register struct around.
Ideally, for FPGA design you only have to use the special bitwidths for the interface of a module. The implementation can be in normal wider C types. The compiler can optimize these operations to smaller bitwidths by realizing the higher input bits are zero/signextend and higher output bits are not used. You can help the optimizer by making some variables smaller bitwidths, but no need to rewrite everything.
I implemented this once for a c-to-hardware compiler and it worked quite well. The compiler had a lot of builtin-types, all signed and unsigned integers from 1 to 64 bits wide, named __int1..int64. See 'extended integer types' in the manual: http://valhalla.altium.com/Learning-Guides/GU0122%20C-to-Har...
Most cryptographic algorithms (notably RC5 and RC6, but also Rijndael/AES) can be extended in to 128-bit word size variants, and having guaranteed support for 128-bit integers in C would be useful to see how these variants act, and run programs to evaluate their security margin.
> if a Binary expression involves operands which are both _ExtInt, rather than promoting both operands to int the narrower operand will be promoted to match the size of the wider operand, and the result of the binary operation is the wider type.
While contemporary implementations are most commonly tailored to use (UNSIGNED-BYTE 8), (UNSIGNED-BYTE 16), (UNSIGNED-BYTE 32), and (UNSIGNED-BYTE 64) along with their signed counterparts, our language allows one to freely specify and use integer types such as (UNSIGNED-BYTE 53) that could - in theory - be optimized for on architectures that use unique, by today's standards, word sizes.
This also comes from the fact that Common Lisp was specified during times that had no real standardized word sizes, and so the standard had to accommodate different machine types on which a byte could mean different and mutually exclusive things.
People often describe C as "portable assembly", but despite this, integer sizes varying on different platforms results in non-portability of anything those programs _produce_. That is, a "file", or bit stream (not byte stream!) produced by one machine may be incompatible with another. The original integer-size independence is decidedly not portable.
That was probably less of a problem when it was rare to send data from one physical machine to another machine, let alone one of another type. But now the world is inter-net-worked and we have all sorts of machines talking to each other all the time.
Making the interfaces explicit reduces errors. These days we now even have virtual machines and programs running at different bit widths on the same machine, and emulated machines on the same machine running different ISAs!
I'm also part of what I'm sure is a small number of users who believe using "usize" should be a lint error in Rust that has to be manually overridden, and who also think endianness should be explicit. Heck, it should be a compiler error to write a struct to a socket if it contains any non-explicit values!
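As a rough sketch of what "explicit" could look like in C: the two helpers below (names are hypothetical) fix both the width and the byte order of a value before it goes on the wire, so the result does not depend on the host's `int` size or endianness.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value in big-endian byte order,
 * independent of the host's endianness and of sizeof(int). */
void store_u32_be(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: read the same big-endian layout back. */
uint32_t load_u32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Writing fields one by one through helpers like these, instead of writing a whole struct, also sidesteps padding differences between compilers.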
Some languages like Ada allow a type that, say, goes from -273 to 600.
var
weekday: 0 .. 6;
monthday: 1 .. 31;
For a quarter of a century I have wondered why no one seems to miss that feature. I really hope we will get them in C one day. Even more so, I hope that the proposals for refinement types in Rust[2] will one day be resolved and implemented.
with Ada.Text_IO; use Ada.Text_IO;
procedure T is
type T1 is range 16..19;
type T2 is range -7..0;
type R is record
A : T1;
B,C : T2;
end record;
for R use
record
A at 0 range 0 .. 1;
B at 0 range 2 .. 4;
C at 0 range 5 .. 7;
end record;
X : R := (17,-2,-3);
begin
Put_Line(X'Size'Image); -- 8 bits
end T;
There are a lot of arguments back and forth because there actually is no 'right way' to handle overflow.
To make a more simple, more elegant, more portable language they decided to settle on power-of-two word lengths. This is similar to how Unix came about, leaving out the cruft and complexity from the over engineered Multics.
* >=8 bits: char (CHAR_BIT is exactly 8 in POSIX)
* >=16 bits: short and int
* >=32 bits: long
* >=64 bits: long long
The C99 typedefs like uint16_t have to be chosen internally to be one of the underlying types. For those sizes that have no matching underlying type, the implementation will omit typedefs.
However don't forget the more flexible C99 typedefs int_leastN_t and int_fastN_t. They both will give you a type of at least N bits, where the "fast" one chooses whichever type is most convenient for the processor, and the "least" version picks whichever is smallest. (For instance int_least16_t is probably short, and int_fast16_t is probably int.)
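A small sketch for checking what an implementation actually picked. `BITS_OF` and the two function names are introduced here for illustration only; the result counts storage bits, so it includes any padding bits.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Storage width in bits of a type; a helper macro for this sketch only. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Width the implementation chose for the "smallest" 16-bit-capable type. */
size_t least16_bits(void) { return BITS_OF(int_least16_t); }

/* Width the implementation chose for the "fastest" 16-bit-capable type. */
size_t fast16_bits(void) { return BITS_OF(int_fast16_t); }
```

Both are guaranteed to be at least 16, but the "fast" choice varies: on x86-64 Linux with glibc, for example, `int_fast16_t` is a 64-bit type rather than `int`.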
Modifying most programs shouldn't be difficult (there is uint_leastN_t). Alternatively, C compilers can be modified to treat the extra bits as if they don't exist, so that existing programs work again.
Try in your browser console:
2n ** 4096n
// output (might have to scroll right)
1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190336n
To use, just add `n` after the number as literal notation, or convert any Number x with BigInt(x). BigInts may only do operations with other BigInts, so make sure to convert any Numbers where applicable.
I know this is about C, I thought I'd just mention it, since many people seem to be unaware of this.
On a side note, apparently, it will also be useful for the Rust folks, who have user-implemented libraries to emulate C-like bitfields and to implement bigints.
So this work has promising outcomes.
As the proposal says, the bit alignment of these types is min(64, next power-of-2(>=N)). (Of course, the alignment can't be smaller than 8 bits, which the proposal fails to account for.) Assuming CHAR_BIT==8, it follows that:
sizeof(_ExtInt(3)) == 1 // 5 bits padding
sizeof(_ExtInt(17)) == 4 // 15 bits padding
sizeof(_ExtInt(67)) == 16 // 61 bits padding
So the amount of padding can be considerable. But that doesn't matter much. What they're trying to conserve is the number of value bits that need to be processed, and in particular minimize the number of logic gates required to process the value. Inside the FPGA presumably the value can be represented with exactly N bits, regardless of how many padding bits there are in external memory.
A _Bool type can be used for a bit field (with a size of 1 bit), but you can't use sizeof with a bit field.
[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf
[1] https://github.com/rust-bitcoin/rust-bech32/blob/master/src/...
That “smallest type capable of storing this value” is a disappointing approach, IMHO. It’d be a lot more powerful to just be able to pass in bit patterns (base-2 literals) and have the resulting type match the lexical width of the literal. 0b0010X should have a bit-width of 4, not 2.
reg [3:0]r;
r = 4'hf; r = r+1; if (r == 0) ....
if r is really 8 bits r+1 will have a non-0 value ....
However all the LLVM people may be saying here is that they're providing the minimal support for arbitrary size math and expect language implementers to generate the masking where required (ie that r+1 above is really (r+1)&4'hf )
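The masking can be sketched in plain C. This assumes the 4-bit register is stored in an 8-bit byte; `REG4_MASK` and both function names are invented for the example.

```c
#include <stdint.h>

#define REG4_MASK 0xFu   /* low 4 bits, like Verilog's reg [3:0] */

/* Hypothetical helper: add one to a 4-bit register stored in a wider
 * byte, masking the result the way a front end would have to. */
uint8_t reg4_increment(uint8_t r)
{
    return (uint8_t)((r + 1u) & REG4_MASK);
}

/* Same operation without the mask: the carry leaks into bit 4. */
uint8_t reg8_increment(uint8_t r)
{
    return (uint8_t)(r + 1u);
}
```

Starting from 0xF, the masked version wraps to 0 so the `r == 0` test fires, while the unmasked version yields 0x10 and the test silently fails.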
I'll note that for Verilog in particular the standard Verilog C-level APIs for accessing data imply that integers are not stored contiguously, instead they're stored in 32-bit chunks with a min size of 32-bits for 1 to 32-bit values - a 33-64 bit value will be stored in 2 non-contiguous 32-bit words ('packed' values are different from this). To be useful any back end support needs to be able to understand stuff stored this way.
I'm not sure why they picked a letter which can already occur in integer literals rather than one of the many unused letters. Given the focus on FPGAs and HDL it's also worth noting that X is commonly used in binary or hexadecimal constants in HDLs to denote undefined or "don't care" values, which could lead to confusion. Rust integer literal syntax would be perfect here (1234u11 or 1234i11) since it already includes the bit width and is compatible with any base prefix.
Safe to say that a feature like this would be standardized by 2022 at the earliest?
I think we'll see this implicitly with C++. C++11 and the mostly non-controversial updates in C++14 comprise "modern" C++, whereas adoption of C++17 seems to be a bit slower.
I wonder what the right way is, then? Java is apparently too fast for you, and yet it gets improvements so slowly that it is getting its marketshare eaten by other JVM languages moving much faster.
If it was even slower it could as well be put directly next to the dusty COBOL and RPG boxes in the IBM attic.
Hmm, I haven’t been following that but it seems that...
> The result is massively larger FPGA/HLS programs than the programmer needed
And there it is.
Really seems odd to me to try and force procedural C into non-linear execution of FPGA. Like it seems super odd, and when talking about changes to C to help that... I really don’t get it.
This isn’t what C is for. What is the performance advantage over Verilog? How many people want n-bit ints in C when automatically handled structures work well for most people?
Maybe I’m just not seeing the bigger picture here and that example was just poor?
The final result is a bitstream that determines which LUTs (lookup tables) and BRAM (memory/block RAM) to use on the chip, and how they should be connected/routed.
The FPGA fabric itself is made of transistors, but your C/C++ (HLS) or HDL code is not directly controlling these transistors. This is what makes FPGAs so flexible relative to ASICs.
So, since those languages suck, people familiar with the procedural side of things end up asking for C. Which is an even bigger impedance mismatch than Verilog, but since you need a smarter backend to even begin to implement it, it can make life easier by that alone.
Personally, I prefer stuff like nMigen, which is basically Python metaprogramming a synthesis-oriented subset of Verilog constructs. Compiles down to Verilog behind the scenes.
They were intended to be HDLs (for simulation of hardware), but they were never intended to be automatically translated into gates/schematics (i.e., synthesized)
If you have different representations in different languages it just creates unnecessary impedance mismatch. It would be better for everyone if you could just pass these types from language to language.
(Of course, this is written from the "we can jerry-rig the existing language to do what you want" perspective with which so much is achievable efficiently in C++.)
C++ has had `optional`, `variant`, and `any` in the standard library since C++17. All of these types originated (for C++ standardization) in Boost. I'd caution against using `any`, though. From personal experience, the runtime overhead is quite high, and holding any non-none type is a dynamic allocation. Performance is far better with `variant` at the development cost of needing to know all the types you're going to support at compile-time.
[0] https://www.boost.org/doc/libs/1_72_0/libs/multiprecision/do...
If standard is agreed, it could be pragma similar to calling conventions.
> The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types
Every HLS vendor or language has their own, incompatible arbitrary bitwidth integer type at present. SystemC sc_int is different from Xilinx Vivado ap_int is different from Mentor Catapult ac_int is different from whatever Intel had for their Altera FPGAs. It's a real mess.
I'm hoping this is another small step to slowly move the industry into a more unified representation, or at least that LLVM support for this at the type level could enable faster simulation of designs on CPU by improving the CPU code that is emitted. What probably matters most for HLS though are the operations which are performed on the types (static or dynamic bit slicing, etc).
The feature is of course fantastic. But the syntax still looks a bit overblown.
Type system-wise this seems to be more correct:
_ExtInt(a) + _ExtInt(b) => _ExtInt(MAX(a, b) +1)
And int + _ExtInt(15) might need a pragma or warning flag to warn about that promotion. One little int, or automatic int pollutes all.
_ExtInt(16) + _ExtInt(15) => _ExtInt(17)
_ExtInt(17) + _ExtInt(15) => _ExtInt(18)
So let's say we have a, b and c. a is 16 bits, b and c are 14 bits.
a + (b + c) => _ExtInt(17)
(a + b) + c => _ExtInt(18)
Now obviously this is a trivial example, but it highlights the fact that unless you're actually willing to carry the true ranges around in your type system, your calculations of bit widths are going to vary due to the details of which operations are done in which order with which intermediary variables.
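The order dependence is easy to reproduce with a few lines of C. This sketch assumes the MAX(a, b) + 1 rule proposed in this thread (not what Clang implements) and unsigned values; `sum_width` and `bits_needed` are hypothetical names.

```c
/* Width of a sum under the hypothetical rule
 * _ExtInt(a) + _ExtInt(b) => _ExtInt(MAX(a, b) + 1). */
unsigned sum_width(unsigned a, unsigned b)
{
    return (a > b ? a : b) + 1u;
}

/* Number of bits needed to hold an unsigned value up to max_value:
 * what a type system tracking true ranges would compute. */
unsigned bits_needed(unsigned long long max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

With a = 16 and b = c = 14, `sum_width(16, sum_width(14, 14))` gives 17 and `sum_width(sum_width(16, 14), 14)` gives 18, while the largest possible unsigned sum, 65535 + 16383 + 16383, fits in `bits_needed(...)` = 17 bits regardless of grouping.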
I quoted this language below: "_ExtInt types are bit-aligned to the next greatest power-of-2 up to 64 bits: the bit alignment A is min(64, next power-of-2(>=N)). The size of these types is the smallest multiple of the alignment greater than or equal to N. Formally, let M be the smallest integer such that A * M >= N. The size of these types for the purposes of layout and sizeof is the number of bits aligned to this calculated alignment, A * M. This permits the use of these types in allocated arrays using the common sizeof(Array)/sizeof(ElementType) pattern."
But to be honest I don't understand what it's trying to say. If bit width N = 3, the next power of 2 is 4, so would that mean that "bit alignment(?)" A = 4? Then M = 1 is the smallest integer such that A * M >= 3. Then the size of the type would be 4 bits? That wouldn't fly with sizeof.
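Working the quoted formula through in code may help. The sketch below follows the quoted wording, with one addition that is an assumption rather than part of the quote: the alignment is rounded up to at least CHAR_BIT bits, since sizeof cannot report a fraction of a byte. Function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Alignment in bits: A = min(64, next power of 2 >= n).
 * ASSUMPTION (not in the quoted text): A is never below CHAR_BIT. */
unsigned extint_align_bits(unsigned n)
{
    unsigned a = 1;
    while (a < n && a < 64) {
        a <<= 1;
    }
    if (a < CHAR_BIT) {
        a = CHAR_BIT;
    }
    return a;
}

/* Size in bytes: A * M bits, where M is the smallest integer
 * with A * M >= n. Assumes CHAR_BIT is a power of two. */
size_t extint_size_bytes(unsigned n)
{
    unsigned a = extint_align_bits(n);
    unsigned m = (n + a - 1) / a;   /* ceiling division */
    return (size_t)a * m / CHAR_BIT;
}
```

With CHAR_BIT == 8 and that clamp, N = 3 comes out as 1 byte, N = 17 as 4 bytes, and N = 67 as 16 bytes (A = 64, M = 2).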
Also, what's the relationship between standard types and the new _ExtInts? Are _ExtInt(16) equivalent to shorts, or are they considered distinct and require explicit cast?
> In order to be consistent with the C Language, expressions that include a standard type will still follow integral promotion and conversion rules. All types smaller than int will be promoted, and the operation will then happen at the largest type. This can be surprising in the case where you add a short and an _ExtInt(15), where the result will be int. However, this ends up being the most consistent with the C language specification.
For instance, what if I choose to replace short by _ExtInt(16) in the above? What would be the promotion rule then?
Note that it was already possible to implement arbitrary sized ints for a size <= 64, by using bitfields (although it's possible that you could fall into UB territory in some situations, I've never used that to do modular arithmetic).
Edit: Ah, there's this notion of underlying type: one may use the nearest larger type to implement a given size, but nothing prevents using an even larger type, for instance:
struct short3_s { short value:3; };
struct longlong3_s { long long value:3; };
I don't know what the C standard says about that, but clearly these two types are not identical (sizeof will probably give different results). What will it be for _ExtInt? How will these types be converted?
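A small sketch to experiment with this. The two structs are the ones above; `u3_s`, `u3_add` and the size helpers are added for illustration. Bit-fields of types other than int, unsigned int and _Bool are implementation-defined, so the sizes are whatever your compiler decides, and the unsigned field is used for the arithmetic because its wrap-around is well defined.

```c
#include <stddef.h>

struct short3_s    { short value : 3; };
struct longlong3_s { long long value : 3; };   /* implementation-defined bit-field type */

/* An unsigned 3-bit field: storing an out-of-range value wraps
 * modulo 8, which is well defined for unsigned bit-fields. */
struct u3_s { unsigned value : 3; };

/* Hypothetical helper: 3-bit modular addition via the bit-field. */
unsigned u3_add(unsigned a, unsigned b)
{
    struct u3_s r;
    r.value = a + b;   /* conversion to the field keeps the low 3 bits */
    return r.value;
}

size_t short3_size(void)    { return sizeof(struct short3_s); }
size_t longlong3_size(void) { return sizeof(struct longlong3_s); }
```

Comparing `short3_size()` and `longlong3_size()` on a given compiler shows whether the declared type influences the struct size there.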
Another idea:
what about
struct extint13_3_s {
_ExtInt(13) value:3;
};
Will the above be possible? In other words, will it be possible to combine bitfields with this new feature?
I guess it's a much more complicated problem than it appears to be at first.
Click 'More' at the bottom to page through it; it just keeps going.
For a fun compiler bug in LLVM due to representation of arbitrary width integers, see: https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two...
I don’t understand that choice. The result should be of the wider type, yes, but, for example, multiplying an _ExtInt(1) by an _ExtInt(1000) should take less hardware than multiplying two _ExtInt(1000)s. So, why promote the narrower one to the wider type?