Putting the actual data layouts in 44 bits out of 64 is a neat trick which relies on the allocator being aware of the mappings between physical and virtual addresses.
Not too long after someone asked what sort of interesting changes 64 bit will bring. And I’ve been keeping that question in the back of my mind ever since.
Aliasing memory multiple times in order to do read or write barriers and make GC much cheaper is a pretty good one. But another one I know of is that one of the secrets of the L4 microkernel is that its IPC speed comes substantially from reducing the amount of TLB work that needs to be done to switch to another process running in a different address space. They use the same address space and only swap out the access rights which cuts the call overhead in half. It’s pretty easy to put a bunch of processes into a 64 bit address space and just throw each one a randomly located 4GB slice of RAM.
Something like: “From now on, code on these pages can only access data on these pages, and only return to/call into other code through these gates…”