Emacs internals: Tagged pointers vs. C++ std::variant and LLVM (Part 3)

(thecloudlet.github.io)

80 points · thecloudlet · 3mo ago · 37 comments

14 comments · 5 top-level

tialaramex · 3mo ago · 6 in thread

It's not clear to me what the correct way to spell this kind of trick is in C++ (and since it's an unsafe language, your compiler won't call it out if you do something illegal).

I had thought you need the pointer-sized integer types and mustn't do this directly to an actual pointer, but maybe I was wrong. (In theory, that is; obviously practice doesn't follow, but that's a dangerous game.)

thecloudlet (OP) · 3mo ago

Doing bitwise operations directly on raw pointers is a fast track to Undefined Behavior in standard C/C++. Emacs gets away with it largely due to its age, its heavy reliance on specific GCC behaviors/extensions, and how its build system configures compiler optimizations.

In modern C++, the technically "correct" and safe way to spell this trick is exactly as you suggested: using uintptr_t (or intptr_t).
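
For readers who want that spelled out, here is a minimal sketch of the uintptr_t round trip. The helper names (`tag_ptr`, `get_tag`, `untag_ptr`) and the choice of 3 tag bits are assumptions made up for illustration, not Emacs code, and the sketch assumes every tagged object is at least 8-byte aligned so the low 3 bits start out as zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not taken from Emacs.
// Assumption: tagged objects are 8-byte aligned, so the low 3 bits are free.
constexpr std::uintptr_t kTagMask = 0x7;

// Convert the pointer to an integer first, then do the bit twiddling there.
inline std::uintptr_t tag_ptr(void* p, std::uintptr_t tag) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0 && "pointer is not 8-byte aligned");
    assert(tag <= kTagMask && "tag does not fit in 3 bits");
    return bits | tag;
}

inline std::uintptr_t get_tag(std::uintptr_t word) {
    return word & kTagMask;
}

// Clear the tag so the integer equals the original value, then go back
// through void* to the typed pointer.
template <typename T>
T* untag_ptr(std::uintptr_t word) {
    return static_cast<T*>(reinterpret_cast<void*>(word & ~kTagMask));
}
```

The pointer itself is never masked or OR-ed; all arithmetic happens on the integer copy, and the pointer is only rebuilt once the integer holds its original value again.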

trws · 3mo ago

There’s a paper in flight to add a stdlib type to handle pointer tagging as well, while preserving pointer provenance and so forth. It’s currently best to use the intptr types, but the goal is to make it so that an implementation can provide specializations based on which bits of a pointer are insignificant, or even ignored, on a given target, without user code having to be specialized. Not sure where it has landed since the discussion in SG1, but it seemed like a good idea.

VorpalWay · 3mo ago

Do (u)intptr_t preserve provenance? Or does this count as exposed provenance when you convert back and forth?

Maybe that is not the correct C++ terminology; I'm more familiar with how provenance works in Rust, where large parts of it got stabilised a little over a year ago. (What was stabilised was "strict provenance", which is a set of rules that, if you abide by them, will definitely be correct, but it is possible the rules might be loosened in the future to be more lenient.)

https://doc.rust-lang.org/std/ptr/index.html#provenance

shadowgovt · 3mo ago

Is there a similar solution to doing this in Rust? I suppose inside `unsafe` you can do basically anything.

jandrewrogers · 3mo ago

The idiomatic way to safely do pointer tagging in C++ works through uintptr_t.

If you don't care about portability or using every theoretically available bit then it is trivial. A maximalist implementation must be architecture aware and isn't entirely knowable at compile-time. This makes standardization more complicated since the lowest common denominator is unnecessarily limited.

In C++ this really should be implemented through a tagged pointer wrapper class that abstracts the architectural assumptions and limitations.
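
A rough sketch of what such a wrapper could look like, taking the lowest-common-denominator route: it only uses the low bits that `alignof(T)` guarantees to be zero and leaves architecture-specific high bits alone. `TaggedPtr`, `Node`, and the member names are hypothetical, not from any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: hides the uintptr_t casts and checks at compile
// time that T's alignment really leaves `Bits` low bits free.
template <typename T, unsigned Bits>
class TaggedPtr {
    static_assert((std::size_t{1} << Bits) <= alignof(T),
                  "alignof(T) does not leave enough free low bits");
    static constexpr std::uintptr_t kMask = (std::uintptr_t{1} << Bits) - 1;

    std::uintptr_t word_ = 0;  // pointer bits with the tag in the low bits

public:
    TaggedPtr() = default;
    TaggedPtr(T* p, unsigned tag) { set(p, tag); }

    void set(T* p, unsigned tag) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & kMask) == 0 && "pointer is under-aligned");
        assert(tag <= kMask && "tag does not fit");
        word_ = bits | tag;
    }

    // Masking restores the original integer value before converting back.
    T* pointer() const { return reinterpret_cast<T*>(word_ & ~kMask); }
    unsigned tag() const { return static_cast<unsigned>(word_ & kMask); }
};

// Example pointee; alignas(8) guarantees 3 free low bits.
struct alignas(8) Node {
    int value;
};
```

With this shape, `TaggedPtr<Node, 3>` compiles, while `TaggedPtr<char, 1>` is rejected by the `static_assert`, so the architectural assumption lives in one place instead of in every call site.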

db48x · 3mo ago

Do it the way LLVM does it.

ndesaulniers · 3mo ago · 3 in thread

Happy to see discussion of LLVM's interesting implementation of Static Polymorphism using CRTP. Some recommended reads:

1. https://en.wikipedia.org/wiki/Curiously_recurring_template_p...

2. https://david.alvarezrosa.com/posts/devirtualization-and-sta...

3. https://llvm.org/docs/ProgrammersManual.html#the-isa-cast-an...
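
For anyone who hasn't met CRTP before, a tiny sketch of static polymorphism with it. The class names (`Shape`, `Circle`, `Square`) and the `describe` helper are invented for illustration and are not LLVM classes.

```cpp
#include <string>

// Hypothetical example of CRTP: the base class is parameterized on the
// derived class, so the "virtual" call is resolved at compile time.
template <typename Derived>
struct Shape {
    std::string name() const {
        // No vtable lookup: the static type of the derived class is known.
        return static_cast<const Derived*>(this)->name_impl();
    }
};

struct Circle : Shape<Circle> {
    std::string name_impl() const { return "circle"; }
};

struct Square : Shape<Square> {
    std::string name_impl() const { return "square"; }
};

// Generic code takes Shape<D> and is instantiated once per concrete type.
template <typename D>
std::string describe(const Shape<D>& s) {
    return "a " + s.name();
}
```

The trade-off is that `Shape<Circle>` and `Shape<Square>` are unrelated types, so they cannot share one container the way classes with virtual functions can.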

dalvrosa · 3mo ago

Thanks a lot for the reference!

thecloudlet (OP) · 3mo ago

Thanks for the links, Nick! It's fascinating how LLVM relies so heavily on CRTP.

ndesaulniers · 3mo ago

Consider adding those references to your post!

internet_points · 3mo ago

Drawn in by the Emacs, learnt something new about C and C++, thank you for this! Very readable article for someone who doesn't feel too confident with low-level bits.

Btw, is this representation the reason why OCaml's ints are not as big as C ints?

Also interesting that the Haskell pointer tagging you link to[0] was done the way it was to avoid CPU branch misprediction, and that the old way which it replaced was "the source of half of the branch misprediction events". I wonder how "branch prediction friendly" current Haskell is.

[0] https://simonmar.github.io/bib/papers/ptr-tagging.pdf
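
On the OCaml question, a small sketch of why a tag bit costs integer range. It uses an OCaml-style convention (low bit 1 marks an immediate integer, low bit 0 an aligned pointer), which is why native OCaml ints are 63 bits wide on 64-bit targets. `Word`, `box_int`, `unbox_int`, and `is_int` are hypothetical names, and the code assumes two's-complement conversions and an arithmetic right shift, which C++20 guarantees.

```cpp
#include <cstdint>

// Hypothetical tagged word: one machine word holds either an aligned
// pointer (low bit 0) or an immediate integer (low bit 1).
using Word = std::uintptr_t;

constexpr bool is_int(Word w) { return (w & 1) != 0; }

// Shift the payload left by one and set the tag bit. The top bit of n is
// lost, so only (word size - 1) bits of integer survive.
constexpr Word box_int(std::intptr_t n) {
    return (static_cast<Word>(n) << 1) | 1;
}

// Assumption: signed right shift is arithmetic (guaranteed since C++20),
// which drops the tag bit and restores the sign.
constexpr std::intptr_t unbox_int(Word w) {
    return static_cast<std::intptr_t>(w) >> 1;
}
```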

thecloudlet (OP) · 3mo ago

Emacs internals Part 2 HN link:

https://news.ycombinator.com/item?id=47259961

mshockwave · 3mo ago

LLVM now has another way to implement RTTI using the `CastInfo` trait instead of `classof`: https://llvm.org/doxygen/structllvm_1_1CastInfo.html

But it's really just an implementation difference; the idea is still to have a lightweight RTTI.
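
To make "lightweight RTTI" concrete, here is a hand-rolled miniature of the `classof` flavor. It does not include any LLVM headers, and `Expr`, `Literal`, `Add`, `isa_sketch`, and `dyn_cast_sketch` are made-up stand-ins rather than LLVM's real templates.

```cpp
// Hypothetical mini hierarchy: each node stores a kind tag, and each
// subclass answers "is this object one of mine?" via a static classof.
struct Expr {
    enum Kind { K_Literal, K_Add };
    explicit Expr(Kind k) : kind_(k) {}
    Kind kind() const { return kind_; }

private:
    const Kind kind_;
};

struct Literal : Expr {
    int value;
    explicit Literal(int v) : Expr(K_Literal), value(v) {}
    static bool classof(const Expr* e) { return e->kind() == K_Literal; }
};

struct Add : Expr {
    const Expr* lhs;
    const Expr* rhs;
    Add(const Expr* l, const Expr* r) : Expr(K_Add), lhs(l), rhs(r) {}
    static bool classof(const Expr* e) { return e->kind() == K_Add; }
};

// Type test: one integer comparison, no vtable or typeinfo involved.
template <typename To>
bool isa_sketch(const Expr* e) {
    return To::classof(e);
}

// Checked downcast: returns nullptr when the kind tag does not match.
template <typename To>
const To* dyn_cast_sketch(const Expr* e) {
    return To::classof(e) ? static_cast<const To*>(e) : nullptr;
}

// Example consumer that dispatches on the kind tag.
inline int eval(const Expr* e) {
    if (const auto* lit = dyn_cast_sketch<Literal>(e)) return lit->value;
    const auto* add = static_cast<const Add*>(e);  // only other kind here
    return eval(add->lhs) + eval(add->rhs);
}
```

The kind field plays the same role as the tag bits in a tagged pointer: a cheap discriminator checked by hand instead of by the language's built-in RTTI.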
