Before this, the idea was that RISC was simpler to implement and could be optimized more easily, and so would ultimately be more cost-effective. What wasn't factored in was how good Intel is at optimizing, and how hard they'd push their process, beating the RISC side despite all the disadvantages CISC had.
Now it's the GPU that's eating Intel's lunch: high-performance floating-point code on a CPU is several orders of magnitude slower than on a high-end GPU, so Intel is trying to fight back with its "pile of CPUs" strategy (http://en.wikipedia.org/wiki/Larrabee_(microarchitecture)). It's not working out very well so far.
As to RISC vs CISC, well, it's true that x86 instructions are decoded into micro-ops (uOps) inside a modern processor, but the complexity of the original instruction still has a cost. The act of just decoding four instructions in a clock cycle and transforming them into uOps is quite a bit of work, on the same order as actually executing them if they're simple additions or such. And the uOps that make up an instruction have to retire together, or else when the processor is interrupted by a page fault or such it will resume in an inconsistent state. And the first time you run through a segment of code you can only decode one instruction at a time, since figuring out where instruction boundaries are is hard; you can, however, store the location of those boundaries with just one extra bit per byte once the line is in the L1 instruction cache. A toy sketch of that trick follows.
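Here's a minimal sketch of that pre-decode idea. Everything in it is made up for illustration: instead of a real x86 length decoder (which is the genuinely hard part), it uses a toy ISA where the top two bits of an instruction's first byte encode its length.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_BYTES 64

    /* Toy stand-in for an x86 length decoder: instructions are 1-4
       bytes, length encoded in the first byte's top two bits. */
    static size_t insn_length(uint8_t first_byte) {
        return (size_t)(first_byte >> 6) + 1;
    }

    /* The first pass is inherently serial: instruction N+1's start
       isn't known until instruction N has been length-decoded. */
    static void predecode(const uint8_t line[LINE_BYTES],
                          uint8_t starts[LINE_BYTES]) {
        memset(starts, 0, LINE_BYTES);
        size_t pos = 0;
        while (pos < LINE_BYTES) {
            starts[pos] = 1;              /* boundary bit: insn starts here */
            pos += insn_length(line[pos]);
        }
    }

    int main(void) {
        uint8_t line[LINE_BYTES], starts[LINE_BYTES];
        for (size_t i = 0; i < LINE_BYTES; i++)   /* pseudo-random "code" */
            line[i] = (uint8_t)(i * 37 + 11);
        predecode(line, starts);
        /* With boundary bits stored alongside the cache line, a wide
           front end can grab several instruction starts per cycle
           without re-decoding. */
        for (size_t i = 0; i < LINE_BYTES; i++)
            if (starts[i]) printf("instruction starts at byte %zu\n", i);
        return 0;
    }

The serial first pass is exactly why the first trip through cold code is slow; the stored boundary bits are what make four-wide decode feasible on later passes.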
On the other hand, complex variable-length instructions mean you don't need as many bytes to express a given piece of code, both because you use fewer bytes per instruction on average and because complex instructions sometimes let you use fewer instructions altogether. A rough illustration follows.
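To put rough numbers on that (the byte counts are mine and approximate, assuming typical x86-64 and classic fixed-width 32-bit RISC encodings; register names are arbitrary):

    /* One line of C that becomes one x86 instruction but four RISC ones. */
    int sum_table(const int *table, long i, int sum) {
        return sum + table[i];     /* sum += table[i] */
    }

    /* x86-64 (table in rbx, i in rcx, sum in eax):
           add eax, [rbx + rcx*4]   ; 1 instruction, ~3 bytes

       classic load-store RISC (fixed 4-byte instructions):
           sll  t0, i, 2            ; scale the index
           add  t0, table, t0       ; compute the address
           lw   t1, 0(t0)           ; load the element
           add  sum, sum, t1        ; accumulate
                                    ; 4 instructions, 16 bytes

       The scaled-index addressing mode folds the shift, add, and load
       into one variable-length instruction; the price is a decoder
       that has to handle anything from 1 to 15 bytes. */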
Of course, Intel is the biggest CPU vendor out there and has correspondingly large and brilliant design teams working hand in hand with the most advanced fabs in the industry.
Now, there are many RISC instruction sets that have taken on x86 before, but they all attacked it from the high end, from upmarket, doing just the opposite of what ARM is doing now. Will it succeed in dethroning x86 from the low end the way x86 did to its rivals? Who knows. But I think that previous fights don't tell us much about this one.
Of course, "Intel is doomed" (and "Microsoft is doomed") have been staples of clueless fanboy hype for 40 years. I'm still waiting for one of them to be right....
Historically, it has been my experience that on pretty much all non-x86 platforms, compiler and hardware-specific optimizations tend to have a pretty dramatic impact. Intel just has so much existing code and so many existing code streams to factor into their designs for new hardware. Maybe this has changed. It's a hard road if mismatched or non-hardware-optimized binaries are slow and pokey while hardware-optimized binaries are competitive. Come out with a great 64-bit ARM core that can run nearly all ARM binaries with decent performance (clearly excluding stuff that needs custom hardware), and ARM could be pretty disruptive.