The general idea is that every memory address has several semantic bits that annotate the contents on load and store. Each core has multiple independent hardware threads (128 in the case of MTA) which adapt their scheduling to those annotations on a clock cycle by clock cycle basis. You can design massively multithreaded code for these platforms with almost perfect scalability that would have catastrophically high contention and overhead anywhere else, which was the point.
There are quirks to designing software for these systems, but they don’t involve safety.