Every single call rax and jmp rax that you see on the right will be replaced with the following:
push rax
jmp .indirect_thunk
And .indirect_thunk does an lfence and a call to another function that gets rax from the stack and jumps there in a very roundabout way that interacts terribly with CPU branch prediction. I can see why they somehow haven't gotten around to benchmarking this yet: they are not going to like the results. And it doesn't end with the kernel. Every JIT is going to need the same treatment.

https://support.google.com/faqs/answer/7625886
http://webcache.googleusercontent.com/search?q=cache%3Ahttps...
(some websites with info on this are DoS'd and down right now, including the Arm website and the LLVM code review site, so I switched to a cache link)
Pretty painful for C++ but fixable with build changes. The JVM and other JIT-compiled language runtimes can probably be fixed without any serious impact, as HotSpot and other advanced runtimes devirtualise calls (convert indirect to direct) as much as they can anyway, meaning most calls won't be indirect once the code has warmed up.
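
To make the devirtualisation point concrete, here is a rough sketch in C of what "convert indirect to direct" means. All names (op_fn, twice, negate, call_indirect, call_guarded) are made up for illustration, and the "profile" is just assumed; this is not how HotSpot is actually implemented, only the general shape of a guarded direct call.

```c
/* Hypothetical names throughout; this illustrates the idea only. */
typedef int (*op_fn)(int);

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* Plain indirect call: the target is only known at run time, so a compiler
   will typically emit something like "call rax" here. */
static int call_indirect(op_fn f, int x) {
    return f(x);
}

/* Guarded devirtualisation: assume profiling showed that f is almost always
   twice. Compare against that one target and call it directly; keep the
   indirect call only as the rare fallback. */
static int call_guarded(op_fn f, int x) {
    if (f == twice) {
        return twice(x);   /* direct call, can also be inlined */
    }
    return f(x);           /* slow path, still indirect */
}
```

call_guarded(twice, 21) and call_indirect(twice, 21) both return 42, but in the guarded version only the fallback path would still be an indirect call that needs the thunk treatment.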