My off-the-cuff guess is that they just force a memory/instruction fence at the points where x86 makes ordering guarantees, which is a bit better but still incurs some overhead.
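Visual Studio used to (and probably still does) do something similar for volatile on x86: by default (/volatile:ms) volatile reads and writes get acquire/release semantics. That leads to fun bugs when you port to ARM etc., where the default is /volatile:iso and volatile gives you no ordering at all.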
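For illustration, here is a minimal C++ sketch of the portable way to write the classic pattern that bites people here: a flag guarding some plain data. The names `payload`, `ready` and `run_handoff` are made up for the example. With `volatile bool ready`, MSVC's x86 default happens to order the write of `payload` before the flag. Under ISO semantics nothing does, so the sketch uses `std::atomic` with explicit release/acquire instead.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example names. A plain `volatile bool ready` would only work
// here by accident of MSVC's /volatile:ms semantics; std::atomic with
// release/acquire states the required ordering on every target.
int payload = 0;                 // plain data published by the producer
std::atomic<bool> ready{false};  // flag that publishes payload

int run_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before spawning

    std::thread producer([] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    // The acquire load pairs with the release store, so once the flag is
    // observed as true, the write to payload is guaranteed to be visible.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;
    producer.join();
    return seen;
}
```

On x86 the release store and acquire load compile to ordinary moves (only the compiler is restrained), while on ARM they become the appropriate barrier or load-acquire/store-release instructions. That is roughly the "fence only where needed" trade-off, made explicit in the source instead of implied by volatile.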