The differences between arm and x86 are known for 15y+, there is nothing new about it. Also concurrency support is one of the major benefits of languages with proper memory model - java started it with JMM[0]
[0]: https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedL...
The JVM absolutely did some great work walking this path, both in defining a memory model in the first place, and supporting that model on weak and strong hardware memory models, but the JMM was specifically designed to be able to run cleanly on WMO platforms to begin with (early Sparc), so they don't face a lot of the same problems discussed here.
Memory models and JVM are not really relevant when discussing running binaries for a different architecture.
If there is a JIT I'd expect to be able to add a read barriers, on memory location allocated by another thread - incl. allocating bits in the pointers and masking them off on each dereference. If any block appears to be shared - the code that allocated it would need to be recompiled with memory store-store barriers; the reading part would need load-load and so on. There are quite a few ways to deal with the case, aside the obvious - make the hardware compatible.
If in end it's not an easy feat to make up for the stronger memory model correctness, yet correctness should be a prime goal of an 'emulator'