It may sound strange, but one of the advantages of GPUs is actually the programming model, which is why many applications suddenly run 20-50x faster instead of only the theoretical 5-7x.
Let me explain: one of the most common issues with x86 HPC applications coming from the scientific crowd is a lack of low-level loop optimization such as vectorization and loop unrolling. Even getting the compiler flags right for this kind of thing is rather difficult. Another is a lack of understanding of how to program for memory bandwidth. GPU programming, on the other hand, especially with CUDA, is hard to get into at first, but once you have the right formula you can apply it pretty easily to most common tasks. Getting to, say, 70% of model performance on a GPU is much easier than on x86. One reason is that the idiomatic way of writing CUDA/OpenCL programs is implicitly bandwidth optimized: you write a set of scalar kernels that are applied over a whole data region. This lets the programmer think of block dimensions in an abstract way, with no need to fiddle around manually with loops to get there. There is also no need for intrinsics; plain C in idiomatic CUDA is enough.
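To make the "scalar kernel over a data region" point concrete, here is a minimal sketch. SAXPY is just an illustrative choice, `saxpy` and `run_saxpy` are hypothetical names, and the block size of 256 is a common default rather than a tuned value. Note that the kernel body is plain scalar C with no loop and no intrinsics; because neighbouring threads touch neighbouring elements, the memory accesses coalesce without any extra effort.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Scalar kernel: each thread computes exactly one element of y = a*x + y.
// There is no explicit loop; the launch grid covers the data region.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard the last, partially filled block
        y[i] = a * x[i] + y[i];
}

// Hypothetical host wrapper (not from any library): copies the inputs to the
// device, launches the kernel, copies the result back. Returns 0 on success.
int run_saxpy(int n, float a, const float *hx, float *hy)
{
    assert(n > 0 && hx != NULL && hy != NULL);

    float *dx = NULL, *dy = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&dx, bytes) != cudaSuccess)
        return 1;
    if (cudaMalloc((void **)&dy, bytes) != cudaSuccess) {
        cudaFree(dx);
        return 1;
    }

    int failed = 0;
    failed |= cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice) != cudaSuccess;
    failed |= cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice) != cudaSuccess;

    // Block dimensions are chosen abstractly: 256 threads per block
    // (an assumed default), and enough blocks to cover all n elements.
    int block = 256;
    int grid = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, a, dx, dy);
    failed |= cudaGetLastError() != cudaSuccess;

    // This blocking copy also waits for the kernel to finish.
    failed |= cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost) != cudaSuccess;

    cudaFree(dx);
    cudaFree(dy);
    return failed;
}
```

Calling `run_saxpy(n, 2.0f, x, y)` from a host program overwrites `y` with `2*x + y`. Getting the same loop to vectorize well on x86 usually means checking alignment, aliasing and compiler flags, or dropping down to intrinsics.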
So, to wrap it up: there is more to GPU programming than just the hardware. For HPC, the software model actually makes a lot more sense than traditional OOP/procedural programming, which often results in higher than expected speedups when going from idiomatic x86 to idiomatic GPGPU (since there is no such thing as easy-to-learn idioms for HPC programming on x86).
And btw, Xeon Phi is the result of Intel not understanding exactly this interrelation, since it doesn't even have OpenCL support as of now.