> weak memory ordering is not worth it, because it is better to let the hardware do the hard work and not to burden the programmer with it
My question is very simple: why do we have architectures that reorder aggressively if the performance gain is not worth it? CPU manufacturers do understand that weaker guarantees lead to complexity that must be handled at the software level, so are they mad? The fact of the matter is that we have been moving away from Lisp machines, CISC designs and "human-friendly" instruction sets toward RISC, because the kernel/compiler is in a much better position to make optimization decisions than the hardware (it can see the bigger picture).
These architectures exist, and the infrastructure to handle all this complexity has already been written (yes, in all major compilers too [1]). Yes, a lot of people had to study a _lot_ to get us where we are today, and the success of our little iPhone apps stands on the shoulders of those giants.
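To make "the compilers already handle it" concrete, here is a minimal sketch of the classic message-passing pattern using C11 `<stdatomic.h>` and POSIX threads. The names (`payload`, `ready`, `run_message_passing`) are purely illustrative. The programmer states only the ordering requirement (release on the publishing store, acquire on the observing load), and the compiler picks the instructions for the target: on AArch64 GCC and Clang typically emit `stlr`/`ldar`, while on x86 both are usually plain `mov`s, since x86's stronger model already provides these guarantees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary, non-atomic data being published */
static atomic_int ready;   /* flag used to publish the data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;  /* plain store */
    /* Release: every write before this store is visible to any thread
       whose acquire load reads ready == 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    /* Acquire: once ready == 1 is observed, later reads see payload == 42,
       even on a weakly ordered CPU that would otherwise reorder them. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* spin until published */
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer read. */
int run_message_passing(void) {
    pthread_t p, c;
    int observed = 0;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&c, NULL, consumer, &observed);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    /* Guaranteed by the release/acquire pairing, regardless of hardware. */
    assert(observed == 42);
    return observed;
}
```

Compile with `-std=c11 -pthread`. If both orderings are downgraded to `memory_order_relaxed`, the C11 model no longer guarantees that the consumer sees 42, and a weakly ordered machine such as ARM is allowed to show the stale value.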
I seriously don't get what software complexity you're whining about. Complexity is not the exception in Linux; it's the bloody rule! Have you seen the fs/ tree implementing various filesystems? Perhaps the kernel/ tree with the state-of-the-art scheduler? Or even the net/ tree implementing TCP/IP?
I'm not interested in discussing some hypothetical idealized textbook world where everything is simple and elegant: I'm interested in contributing whatever little I can to tomorrow's concurrency infrastructure.
The elephant in the room has still not been addressed: how does ARM manage significantly lower power consumption than x86? Aggressive reordering (=> weaker guarantees) seems to be part of the answer.
[1]: http://en.wikipedia.org/wiki/Memory_ordering#Compiler_suppor...