That's why I'm not skeptical at all. A GPU program can operate on thousands of times more data items in parallel than a single-threaded scalar CPU program can. Yet the speedup is not in the thousands, not even in the hundreds, but a mere 8x.
That fits perfectly with:
> main one being that this particular problem does not fit the paradigm of problems that work well on the GPU