> If someone has a program which is depending on load/store order in userspace then they likely have bugs on x86 as well
Incorrect. On x86, if one core performs n ordinary stores, other cores are guaranteed to observe them in program order: the second store never becomes visible before the first (non-temporal stores are the exception). Relying on the x86 memory model in x86 software is correct.
On ARM, without explicit barriers or release/acquire instructions, those stores can become visible to other cores in any order.
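To make the difference concrete, here is a minimal sketch of the classic "message passing" pattern, written portably in C++. The function name `count_ordering_violations` and the payload value are made up for illustration; nothing here comes from the parent comment. The `release`/`acquire` pair is what makes this correct on ARM. As far as I know, on x86 both operations compile to plain `mov` instructions because the hardware already gives store-store and load-load ordering, while on AArch64 they typically become `stlr`/`ldar`. Note that `std::memory_order` also constrains the compiler, so this shows the portable fix rather than isolating the hardware effect.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): runs the message-passing
// pattern `iterations` times and returns how often the reader saw the
// flag set without also seeing the payload.
int count_ordering_violations(int iterations) {
    assert(iterations > 0);
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        int payload = 0;                 // plain, non-atomic data
        std::atomic<bool> ready{false};  // flag published after the payload
        int seen = -1;

        std::thread writer([&] {
            payload = 42;                                  // first store
            ready.store(true, std::memory_order_release);  // second store
        });
        std::thread reader([&] {
            while (!ready.load(std::memory_order_acquire)) {
                // spin until the flag becomes visible
            }
            seen = payload;  // release/acquire guarantees this reads 42
        });

        writer.join();
        reader.join();
        if (seen != 42) ++violations;
    }
    return violations;
}
```

With release/acquire this should return 0 on any conforming implementation. Hand-written assembly using two plain stores and two plain loads would be equally safe on x86, but on ARM the reader could see the flag set and still read a stale payload.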
> since threads can be migrated between cores
Irrelevant. This is about code executing concurrently on multiple cores; the operating system and thread migration don't enter into it. It's about hardware behavior: how each CPU core's load/store system orders and reorders memory accesses, not software.
> ... and the compilers are fully allowed to reorder load/stores...
This has nothing to do with compilers. This has everything to do with how CPU cores reorder reads and writes.
> Particularly as GCC and friends get more aggressive about determining side effects and tossing code
If GCC has bugs, please report them. Undefined behavior can give that impression, but again, this topic has nothing to do with compilers.
> An emulator is also going to maintain this contract as well. That is why things like qemu work just fine to run x86 binaries on random ARMs today without having to modify the hardware memory model.
This is not true. See: http://wiki.qemu.org/Features/tcg-multithread#Memory_consist...
Quoting that page: "Remaining Case: strong on weak, ex. emulating x86 memory model on ARM systems"
I recommend you read this: https://en.wikipedia.org/wiki/Memory_ordering.