The 200-entry reorder buffer says otherwise.
Modern Intel Skylake CPUs (2015 through 2020) can keep 200+ instructions (micro-ops), loads and stores included, in flight in the reorder buffer and execute them out of order. Ice Lake is about to bump that to a 300+ entry reorder buffer.
Modern CPUs are designed to "think ahead" across almost the entire DDR4 RAM latency, reordering instructions to keep the pipelines as full as possible (at least when the underlying assembly has enough instruction-level parallelism, or ILP, to fill the pipelines while waiting for RAM).
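To make the ILP caveat concrete, here is a minimal C sketch (all function names are hypothetical, chosen for illustration). `chase` follows a random cycle of indices, so every load's address depends on the previous load and the out-of-order core has almost nothing to overlap. `gather_sum` does random loads over the same array, but each address comes from a sequential read driven by the loop counter, so many cache misses can be in flight at once inside the reorder buffer's window. Building the cycle with Sattolo's algorithm and a xorshift PRNG is an assumption of this sketch, not something from the original comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marsaglia xorshift64 PRNG (hypothetical helper); state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill next[0..n-1] with a single random cycle (Sattolo's algorithm):
   starting from index 0, following next[] visits all n slots once. */
void make_cycle(size_t *next, size_t n, uint64_t seed) {
    assert(next != NULL || n == 0);
    uint64_t s = seed ? seed : 1;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n; i-- > 1;) {               /* i runs from n-1 down to 1 */
        size_t j = (size_t)(xorshift64(&s) % i); /* j < i, never j == i */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Dependent loads: each index is the result of the previous load, so the
   next cache miss essentially cannot start until the current one returns. */
size_t chase(const size_t *next, size_t steps) {
    size_t i = 0;
    while (steps--)
        i = next[i];
    return i;
}

/* Independent loads: i comes from the loop counter, not from memory, so the
   random loads next[next[i]] from different iterations can overlap. */
uint64_t gather_sum(const size_t *next, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += next[next[i]];
    return sum;
}
```

Timing `chase(next, n)` against `gather_sum(next, n)` (for example with `clock()`) on an array much larger than the last-level cache should typically show the gather finishing far sooner, even though it issues more loads; the exact ratio depends on the CPU and memory.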
> Something like Link Time Optimization can be done trivially with a compiler, but it would take an army of engineers decades of work to be able to implement in hardware.
You might be surprised at what modern branch predictors are doing.
If your "call rax" indirect call keeps jumping to the same target, these days the branch predictor will remember that target.
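A minimal C sketch of that indirect-call case, assuming an x86-64 compiler that turns a call through a function-pointer table into an indirect `call` (something like `call rax`); the compiler is free to emit something else, and every name here is hypothetical. Filling `sel` with a single value gives the predictor one constant target to remember, while `fill_random` makes the target unpredictable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*op_fn)(uint64_t);

static uint64_t op_inc(uint64_t x) { return x + 1; }
static uint64_t op_dbl(uint64_t x) { return x * 2; }
static uint64_t op_xor(uint64_t x) { return x ^ 0x5a; }

/* Calls through this table are normally compiled to an indirect call,
   because the target is only known at run time. */
static op_fn const ops[3] = { op_inc, op_dbl, op_xor };

/* Marsaglia xorshift64 PRNG (hypothetical helper); state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill sel[] with randomly chosen targets: hard for the predictor to learn. */
void fill_random(unsigned char *sel, size_t n, uint64_t seed) {
    uint64_t s = seed ? seed : 1;
    for (size_t i = 0; i < n; i++)
        sel[i] = (unsigned char)(xorshift64(&s) % 3);
}

/* Make n indirect calls; sel[i] picks the target of call i. */
uint64_t run_ops(const unsigned char *sel, size_t n, uint64_t x) {
    assert(sel != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x = ops[sel[i] % 3](x);
    return x;
}
```

Timing `run_ops` on a large all-zero `sel` versus one filled by `fill_random` should typically expose the misprediction cost of the unpredictable targets; how large that gap is depends on the CPU's indirect branch predictor.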