What are false register dependencies? And how does three-address code help with register renaming?
Some reading references would help. I'm currently reading [1].
[1]: http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2...
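To illustrate the terms in the question, here is a minimal Python sketch, not taken from [1], that classifies register hazards in a straight-line program. The `Instr` record, the `register_hazards` helper and the register names r1 to r3 are hypothetical. Only RAW hazards reflect real data flow. WAR and WAW are "false" (name) dependencies: they exist only because instruction 2 reuses the name r1 for unrelated data.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: dst <- op(srcs)."""
    text: str
    dst: str
    srcs: tuple[str, ...]


def register_hazards(prog):
    """Return (earlier, later, kind, reg) for each direct register hazard.

    "Direct" means no instruction in between writes the same register.
    RAW is a true dependency; WAR and WAW are false (name) dependencies.
    """
    def written_between(i, j, reg):
        return any(prog[k].dst == reg for k in range(i + 1, j))

    hazards = []
    for j, later in enumerate(prog):
        for i in range(j):
            earlier = prog[i]
            for reg in dict.fromkeys(later.srcs):  # dedupe sources, keep order
                if reg == earlier.dst and not written_between(i, j, reg):
                    hazards.append((i, j, "RAW", reg))
            if later.dst in earlier.srcs and not written_between(i, j, later.dst):
                hazards.append((i, j, "WAR", later.dst))
            if later.dst == earlier.dst and not written_between(i, j, later.dst):
                hazards.append((i, j, "WAW", later.dst))
    return hazards


prog = [
    Instr("r1 = load [a]", "r1", ()),
    Instr("r2 = r1 + 1", "r2", ("r1",)),
    Instr("r1 = load [b]", "r1", ()),  # reuses r1 for unrelated data
    Instr("r3 = r1 + 2", "r3", ("r1",)),
]

for i, j, kind, reg in register_hazards(prog):
    print(f"{i} -> {j}: {kind} on {reg}")
```

This prints a RAW edge from 0 to 1, a WAW edge from 0 to 2, a WAR edge from 1 to 2, and a RAW edge from 2 to 3. Giving instruction 2 (and its reader, instruction 3) an unused register instead of r1 would remove both false hazards while leaving the two RAW edges intact. Register renaming does this in hardware.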
The real cost of weak memory models is in software. Programmers who write multithreaded code need to understand the memory model, and few actually do, because these things are not trivial. So you gain a few percent of performance with a weak memory model and at the same time dramatically restrict the number of programmers who can write correct code. A very bad deal.
Hennessy and Patterson's "Computer Architecture: A Quantitative Approach" is the standard introduction to computer architecture.
Linux supports Alpha, which provides the weakest ordering guarantees anyway, so no: any architecture stronger than Alpha puts absolutely no additional burden on anybody. In practice, weaker ordering constraints mean more freedom for Linux.
And this particular solution only works inside the Linux kernel and only with GCC. It doesn't matter what we think a sane compiler writer or a sane computer architect would do. Many times we just have to live with the choices other people made.
I think you got something wrong here. Architectures with strong memory ordering put more burden on the hardware and less on the programmer. That was the point I was trying to make: weak memory ordering is not worth it, because it is better to let the hardware do the hard work than to burden the programmer with it.
Linux may work well with relaxed memory ordering, but that does not come for free: a lot of people first had to understand what a relaxed memory model is, and so on.
Registers are entirely internal to a processor. They don't have to be replicated or consistent between different cores.
Memory is shared system state. To write correct parallel programs, CPUs need to define how they manage consistency between different processors. x86 does have a more conservative model with stronger consistency than ARM, and this does have (theoretical) scaling consequences, but Intel has a huge amount of experience making it fast.
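To make "stronger consistency than ARM" concrete, the Python sketch below exhaustively enumerates the outcomes of the classic message-passing litmus test under three toy models. "TSO" gives each thread a FIFO store buffer, the usual simplified picture of x86. "PSO" is a hypothetical stand-in for a weaker model in which stores to different addresses may become visible out of order; it is loosely based on SPARC's partial store order and omits other reorderings ARM permits, such as load-load reordering. The `outcomes` function and the op encoding are illustrative assumptions, not a real tool.

```python
def freeze(d):
    """Turn a dict into a hashable, order-independent tuple."""
    return tuple(sorted(d.items()))


def outcomes(threads, model):
    """Explore every interleaving; return the set of final register states.

    Ops are ("st", addr, val) or ("ld", reg, addr); memory starts at 0.
    model: "SC"  - stores write memory immediately
           "TSO" - per-thread FIFO store buffer, oldest store drains first
           "PSO" - hypothetical weaker model: buffered stores to different
                   addresses may drain in any order
    """
    results, seen = set(), set()
    stack = [((0,) * len(threads), ((),) * len(threads), (), ())]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        mem_d, regs_d = dict(mem), dict(regs)
        done = True
        for t, ops in enumerate(threads):
            buf = bufs[t]
            # Choice 1: execute this thread's next instruction in program order.
            if pcs[t] < len(ops):
                done = False
                op = ops[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "st":
                    _, addr, val = op
                    if model == "SC":
                        stack.append((new_pcs, bufs, freeze({**mem_d, addr: val}), regs))
                    else:
                        new_bufs = bufs[:t] + (buf + ((addr, val),),) + bufs[t + 1:]
                        stack.append((new_pcs, new_bufs, mem, regs))
                else:
                    _, reg, addr = op
                    # Store forwarding: a thread sees its own newest buffered store.
                    pending = [v for a, v in buf if a == addr]
                    val = pending[-1] if pending else mem_d.get(addr, 0)
                    stack.append((new_pcs, bufs, mem, freeze({**regs_d, reg: val})))
            # Choice 2: drain one buffered store to memory, if the model allows it.
            for k, (addr, val) in enumerate(buf):
                older_same_addr = any(a == addr for a, _ in buf[:k])
                if (model == "TSO" and k == 0) or (model == "PSO" and not older_same_addr):
                    done = False
                    new_bufs = bufs[:t] + (buf[:k] + buf[k + 1:],) + bufs[t + 1:]
                    stack.append((pcs, new_bufs, freeze({**mem_d, addr: val}), regs))
        if done:  # all threads finished and all store buffers empty
            results.add(regs)
    return results


mp = [
    [("st", "x", 1), ("st", "y", 1)],        # thread 0: write data, then flag
    [("ld", "r1", "y"), ("ld", "r2", "x")],  # thread 1: read flag, then data
]
stale = (("r1", 1), ("r2", 0))  # saw the flag but read stale data

for model in ("SC", "TSO", "PSO"):
    print(model, "allows stale read:", stale in outcomes(mp, model))
```

Under SC and TSO, thread 1 can never see the flag set but stale data; under the weaker toy model it can, which is exactly the case where the programmer has to insert a barrier. Swapping in the store-buffering test (each thread stores one variable and then loads the other) shows where x86-TSO itself departs from SC: both loads may return 0.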
> Memory is shared system state. To write correct parallel programs, CPUs need to define how they manage consistency between different processors. x86 does have a more conservative model with stronger consistency than ARM, and this does have (theoretical) scaling consequences, but Intel has a huge amount of experience making it fast.
Yeah, but what worries me is not raw throughput scaling; it's power consumption. The mainstream netbooks today (MacBook Air Mid-2012, Chromebook Pixel, and Lenovo X1 Carbon) all use the 17W i5-3427U. The Mid-2013 MBA jumped to Haswell early and uses the 15W Core i5-4250U. But we still haven't been able to make a decent quad-core netbook: there's the 15W Core i7-4650U, and I believe you can get a 2013 MBA with it as well, but at a paltry 1.7 GHz?
How many years has it been since quad-cores first came out? It looks to me like x86 has stagnated: it doesn't matter how competent Intel is; the architecture seems to have fundamental problems.
Now, I don't fully understand what the ARM Cortex-A57 is, but the specs look really impressive [1]. Moreover, AMD has confirmed that it will manufacture Opterons in Q1 2014 [2]. I'm not completely sold on ARM or anything, but something interesting is certainly going on, and we should investigate.
[1]: http://www.arm.com/products/processors/cortex-a50/cortex-a57...
[2]: http://www.amd.com/us/press-releases/Pages/amd-unveils-2013j...
As for three-address code making register renaming easier, I'm not sure what the author had in mind: isn't `add $5, %rax` essentially a condensed form of `add $5, %rax, %rax` (which is 3AC)? In fact, 3AC is _more_ general and, if anything, should make register renaming harder.
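A toy rename stage makes the point about `add $5, %rax` concrete. This Python sketch assumes an unbounded supply of physical registers named p0, p1, ... (real hardware uses a finite free list and reclaims registers at retirement); `rename` and the (dst, srcs) encoding are hypothetical. The renamer only sees one destination and a list of sources, so the two-address `add $5, %rax` and the 3AC `add $5, %rax, %rax` decode to the same entry.

```python
from itertools import count


def rename(prog, arch_regs):
    """Map (dst, srcs) instructions onto fresh physical registers (toy model).

    Sources are looked up in the current mapping before the destination is
    updated; every destination gets a brand-new physical register, so WAR
    and WAW hazards on architectural names disappear.
    """
    fresh = (f"p{n}" for n in count())            # hypothetical unbounded pool
    table = {r: next(fresh) for r in arch_regs}   # initial architectural state
    renamed = []
    for dst, srcs in prog:
        phys_srcs = tuple(table[s] for s in srcs)  # read old mappings first
        table[dst] = next(fresh)                   # then allocate for the write
        renamed.append((table[dst], phys_srcs))
    return renamed


# Immediates are not registers, so they do not appear in srcs.
prog = [
    ("rax", ("rdi",)),  # mov (%rdi), %rax
    ("rax", ("rax",)),  # add $5, %rax  (same entry as 3AC add $5, %rax, %rax)
    ("rax", ("rsi",)),  # mov (%rsi), %rax ; unrelated value, same name
    ("rax", ("rax",)),  # add $7, %rax
]

print(rename(prog, ["rax", "rdi", "rsi"]))
# [('p3', ('p1',)), ('p4', ('p3',)), ('p5', ('p2',)), ('p6', ('p5',))]
```

After renaming, the last two instructions share no physical register with the first two, so an out-of-order core can run both pairs in parallel even though all four write %rax.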
It's not a general or optimal solution, though.