undefined | Better HN

0 pointsIshKebab1y ago0 comments

In general, you need a type system that supports sets of integer values, e.g. range(2, 7) or set(3, 5, 7). Rust doesn't support that unfortunately so it has a special annotation instead to make NonZero work.

0 comments

jiggawatts1y ago

In terms of category theory, what we would need is subtraction and division types on top of product and sum types.

So a 32-bit integer is the product of 32 two-state bit types. Something akin to NonZero could be defined as that type minus one state, such that there are now 4294967296 - 1 representable values.

Similarly, pointer types on some machines always have some bits set to 0 due to hardware constraints. These can be represented in the type system as 2^64 / 2^n where 'n' is the number of bits that are not usable, resulting in something like 2^46 for typical CPUs. This would allow extra bits of state to be "packed in there".

This concept is more generally useful, not just for bit-packing.

For example, a database table row might be represented by a struct that contains a field for every column. But then, how to represent the result of a SELECT query!? Either you code-gen this into a new type, create that type manually, or use compiler-generated "unspeakable" type names. (Linq in C# does this.)

Or... the language could natively support type division, so you could use one struct type for "all columns" and then use division on the type to delete columns you don't need for a particular query or REST response.

There's a whole ecosystem of tools that work around the problem of not having this as a built-in feature in typical languages: AutoMapper, automap, Mapster, etc...

SkiFire131y ago

> So a 32-bit integer is the product of 32 two-state bit types. Something akin to NonZero could be defined as that type minus one state, such that there are now 4294967296 - 1 representable values.

I can see a couple of problem with this approach:

- you need to be able to talk about bit-level types, which contradicts the common assumption that all types are addressable, i.e. whose size and alignment is some positive integer (or zero) number of bytes;

- what if you want to substract more complex set of values? For example describing a 32 bit number without all multiples of 7? And how do you encode this in the compiler in such a way that type checking remains decidable?

- how do you safely and concisely express conversions and operations on these kind of types?

> Similarly, pointer types on some machines always have some bits set to 0 due to hardware constraints. These can be represented in the type system as 2^64 / 2^n where 'n' is the number of bits that are not usable, resulting in something like 2^46 for typical CPUs. This would allow extra bits of state to be "packed in there".

Note that this is not forward compatible, since newer CPUs can start using more bits. In fact recent CPUs started using 57 bits instead of 48 bits, so a program that packed more state in those top bits would now be broken. You should generally instead try to pack data in the least significant bits that will always be 0 due to alignment constraints.

Moreover the top bits are not always 0, they are equal to the most relevant bit of the used part of the address. On Linux this just happens to always be 0 for userspace addresses (and 1 for kernel addresses) but this won't be the case on all architectures and OSes.

I also wonder how you would define these types using subtraction/division types such that they are different?

- the address type being 64 bits but using only the least significant 48 bits and having the top 16 bits always 0 - the address type being 64 bits but using only the least significant 48 bits and having the top 16 bits always equal to the 48th bit - the address type being 48 bits

Clearly these types are all isomorphic, but they are definitionally different and in a way that really matters to the hardware.

jiggawatts1y ago

My thinking was that this would be up to the compiler, where you’d have to specify the target CPU model for it to take advantage of the feature. Similarly, byte code VM could use this with runtime detection of the CPU properties.

This would work best in a high level language where the specific bit layout of struct types is not defined (by default). Rust is one such language, but this would also work with .NET and JVM languages.

One approach is that integers are represented as A < ( x < S + O ) < B. This would allow ranges, powers of two, “NonZero”, offsets, and the like the be represented in the lowest levels of the type system. Additionally, the high level type system could also keep an additional list of specific numbers that are excluded.

Pointer types could be internally represented as ordinary integers, or aligned non-null pointers on some architectures would be “0 < x << 3”.

This could have no effect on the emitted code, but the compiler would be free to utilise the spare low bits or the non-zero value if it chose to do so. Rust does this in a few hard-coded scenarios, but a more complete type model would allow more flexibility. I.e.: it could pack the enum discriminator and the value in there if the enum has only pointer types as values.

Conversely, the high level language can use this for type checks to give better error messages.

1 more reply

IshKebabOP1y ago

> so a program that packed more state in those top bits would now be broken

You can use pointer masking to avoid this issue - basically you tell the hardware to ignore the top N bits, even if they are part of the virtual address. RISC-V supports this for 7 and 16 top bits. I assume ARM has a similar feature.

1 more reply

justincormack1y ago

Ada has constrained range types. Idris has lots more fun stuff with types. But not very mainstream yet.

IshKebabOP1y ago

As far as I understand it there's no need for that. You don't need to say [0, 256) except 0; you can just say [1, 256).

Lvl999Noob1y ago

With subtraction, you can say [0, 256) except [4, 9, 15, 26]. You can do that just by narrowing a single range.

But then, its kinda hard to optimise such a split range in the general case. The narrowing option gives you 80% (maybe more) of the performance at 20% (probably even less) of the cost.

j / k navigate · click thread line to collapse

0 comments

jiggawatts1y ago

In terms of category theory, what we would need is subtraction and division types on top of product and sum types.

So a 32-bit integer is the product of 32 two-state bit types. Something akin to NonZero could be defined as that type minus one state, such that there are now 4294967296 - 1 representable values.

This concept is more generally useful, not just for bit-packing.

There's a whole ecosystem of tools that work around the problem of not having this as a built-in feature in typical languages: AutoMapper, automap, Mapster, etc...

SkiFire131y ago

> So a 32-bit integer is the product of 32 two-state bit types. Something akin to NonZero could be defined as that type minus one state, such that there are now 4294967296 - 1 representable values.

I can see a couple of problem with this approach:

- how do you safely and concisely express conversions and operations on these kind of types?

I also wonder how you would define these types using subtraction/division types such that they are different?

Clearly these types are all isomorphic, but they are definitionally different and in a way that really matters to the hardware.

jiggawatts1y ago

Pointer types could be internally represented as ordinary integers, or aligned non-null pointers on some architectures would be “0 < x << 3”.

Conversely, the high level language can use this for type checks to give better error messages.

1 more reply

IshKebabOP1y ago

> so a program that packed more state in those top bits would now be broken

1 more reply

justincormack1y ago

Ada has constrained range types. Idris has lots more fun stuff with types. But not very mainstream yet.

IshKebabOP1y ago

As far as I understand it there's no need for that. You don't need to say [0, 256) except 0; you can just say [1, 256).

Lvl999Noob1y ago

With subtraction, you can say [0, 256) except [4, 9, 15, 26]. You can do that just by narrowing a single range.

But then, its kinda hard to optimise such a split range in the general case. The narrowing option gives you 80% (maybe more) of the performance at 20% (probably even less) of the cost.

j / k navigate · click thread line to collapse