0 points · xscott · 2y ago · 0 comments

I understand your point: modern hardware tries REALLY hard to pretend it is a simple set of instructions executing one after another. For all the clever on-the-fly caching, micro-op translation, branch prediction, speculative execution, register renaming, and whatever else, it consistently presents a sane model to single-threaded programs. It's difficult to see the magic under the hood even if you try; it mostly shows up as unexpected performance discrepancies, or as race conditions in multithreaded programs. It's all a huge charade...
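
A minimal C sketch of that charade (my illustration, not something from the original comment): the two sum functions below have identical meaning in the sequential model and always return the same value. They differ only in traversal order, which on typical cached hardware tends to show up purely as a timing difference. The names `fill_grid`, `sum_row_major`, and `sum_col_major`, and the flat row-major layout, are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a rows x cols matrix, stored row-major in a flat array,
   with a fixed, deterministic pattern. */
void fill_grid(uint32_t *g, size_t rows, size_t cols) {
    assert(g != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            g[r * cols + c] = (uint32_t)(r * 31u + c * 17u);
}

/* Walks memory in address order: consecutive accesses are adjacent. */
uint64_t sum_row_major(const uint32_t *g, size_t rows, size_t cols) {
    assert(g != NULL);
    uint64_t total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += g[r * cols + c];
    return total;
}

/* Same arithmetic, but consecutive accesses are `cols` elements apart. */
uint64_t sum_col_major(const uint32_t *g, size_t rows, size_t cols) {
    assert(g != NULL);
    uint64_t total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += g[r * cols + c];
    return total;
}
```

Nothing in the C source says which version is "the fast one"; any difference comes from caches and prefetchers that the language model does not mention, and you would have to time both on a large matrix on your own machine to see it.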

However, before dismissing this all as a bad mapping to an outdated 1970s model of computation, I'd like to see a good alternative. CUDA has clearly shown that there's an acceptable model for massively parallel data sets, but it doesn't handle branch-heavy code very well at all. And FPGAs take a different approach to a completely different kind of problem, but I don't know how you would expose what Apple, AMD, or Intel chips are doing under the hood and have it be at all manageable for the programmer. How is someone supposed to indicate what's next when a pipeline stalls waiting on the previous operation or a cache miss? Is the programmer going to toss micro-ops into separate execution units and wait for the results to come out the other side in arbitrary order? Is this an async/await model for every addition or memory fetch? I think it would be complete spaghetti to even try, but I'd love to be shown I'm wrong.

People get all excited trash-talking Itanium, but I think the lesson is that if you try to expose any alternative to the 1970s model, people will just bitch about how there are no sufficiently smart compilers. And of course it got scooped by AMD64 pretending to execute one instruction after another.

And if there isn't a good alternative, I think C (or Rust, or WASM) is a pretty good fit for what you've actually got to work with at the low level.

3 comments · 2 top-level

fanf2 · 2y ago · 1 in thread

Itanium was the wrong design, not for the reasons you suggest, but because it assumed that good performance is something that can be statically baked into the object code, and therefore that there is such a thing as a sufficiently smart compiler for an explicitly parallel processor running general-purpose code. Evidently the designers were wrong.

Which is not to say that explicit parallelism is bad: it’s clearly useful for GPUs and vector code (and compiling to SVE is not too different from itanic). But it doesn’t work as well as dynamically discovered parallelism for non-vector code.
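
A hedged C sketch of the static side of that distinction (the function names are hypothetical and not taken from any Itanium toolchain): `sum_four_lanes` does by hand what a static scheduler has to work out ahead of time, splitting one dependency chain into four independent ones that could issue in parallel. For a regular array loop like this one the split is easy to find at compile time, which is roughly why vector code suits explicit parallelism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the previous one. */
uint64_t sum_serial(const uint32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Four accumulators: the four additions in each iteration are
   independent of each other, so the parallelism is spelled out
   in the code instead of being discovered at run time. */
uint64_t sum_four_lanes(const uint32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        a0 += data[i];
    return a0 + a1 + a2 + a3;
}
```

The same trick has no obvious equivalent for pointer-chasing or branchy code, where which operations are independent, and how long each load takes, is only known while the program runs.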

xscott (OP) · 2y ago

It seems to me there's some uncharted territory between "massively parallel" (GPU) and "unpredictable branching" (CPU), and the corpse of Itanium is lying there as a warning to anyone who would go exploring in that area. Maybe it's just a desert, but I doubt it.

jerf · 2y ago

"I'd like to see a good alternative"

Me too. See my other reply below.

That said, "This is a good match" does not logically follow from "This is a bad match but it's the best match we have." It's still a bad match.
