0 points · beagle3 · 14y ago · 0 comments

Oh, K's in-memory processing model is surely compatible with modern GPUs - at least for most primitives.

The thing I'm not sure about is whether the transfer overhead (between GPU, CPU, and main memory) would still be worth it in the general case. There's a question of what you're actually comparing to: e.g. you can have a (relatively) naive implementation, like the first commit of Kona was. Or you can have Arthur's cache-optimized-but-still-pure-C-and-CPU implementation, which is faster (I think the main kona branch is comparable these days, but I'm not sure).

And then, you can compare it to an SSE2/SSE4 version, still on the CPU; such a version does not exist afaik, but is easier to code than you would expect if you use ispc (https://github.com/ispc/ispc/tree/master/examples) or cilk-plus -- simple things with no data dependencies, as many K primitives are, often get a 4x speedup with ISPC.

Oh, and by the way, thanks for your work on kona and this message -- I've only been following kevinlawler's branch, but I'll start following you too.

silentbicycle · 14y ago

The transfer overhead would only need to happen once if there's a sequence of GPU-able primitives, though. The dataflow analysis for that isn't too hard.
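To make the "only once" point concrete, here is a rough sketch in C. It assumes a purely linear pipeline of primitives, each tagged as GPU-able or not; the struct prim type and count_round_trips function are hypothetical names for illustration, not anything from Kona or K. It counts how many upload/download round trips are needed if each maximal run of consecutive GPU-able primitives shares a single transfer. A real dataflow analysis would work on a graph of values rather than a flat list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one primitive in a linear pipeline. */
struct prim {
    const char *name;   /* label only, not used by the analysis */
    bool gpu_able;      /* assumed to be known per primitive */
};

/* Count host<->device round trips, assuming every maximal run of
 * consecutive GPU-able primitives shares one upload and one download. */
size_t count_round_trips(const struct prim *p, size_t n)
{
    assert(n == 0 || p != NULL);

    size_t trips = 0;
    bool on_gpu = false;    /* is the data currently resident on the GPU? */

    for (size_t i = 0; i < n; i++) {
        if (p[i].gpu_able && !on_gpu)
            trips++;        /* a new GPU run starts here */
        on_gpu = p[i].gpu_able;
    }
    return trips;
}
```

With four primitives where only the third has to run on the CPU, this reports two round trips instead of three, and a fully GPU-able chain needs just one regardless of its length.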

I'll check ispc out too, thanks. My fork* is using some SSE, via GCC-specific extensions (http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html). It could be special-cased into Kona as-is, but I've found a better way to implement scalars than the elaborate preprocessor hack I did there. (Besides, tuning the reference counting would probably have a bigger impact, something like Deutsch-Bobrow at least.)

* It's not on github, yet; I don't usually post things there until they're reasonably usable end-to-end. My kona fork is mostly for pull requests; I'm talking about a completely new implementation.
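For readers who haven't seen the GCC vector extensions linked above, here is a minimal sketch of what an element-wise primitive might look like with them. This is not code from either implementation discussed here; vec_add and v4sf are illustrative names, and the loads go through memcpy so the sketch makes no assumption about input alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats in one 16-byte vector (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical primitive: element-wise z = x + y over n floats. */
void vec_add(const float *x, const float *y, float *z, size_t n)
{
    assert(n == 0 || (x && y && z));

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf a, b, c;
        memcpy(&a, x + i, sizeof a);    /* load without assuming alignment */
        memcpy(&b, y + i, sizeof b);
        c = a + b;                      /* one vector add covers four lanes */
        memcpy(z + i, &c, sizeof c);
    }
    for (; i < n; i++)                  /* scalar tail for leftover elements */
        z[i] = x[i] + y[i];
}
```

The compiler lowers the vector add to SSE on x86 when it is available, and falls back to whatever the target supports otherwise, which is the main appeal over hand-written intrinsics.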

Would you like to continue this discussion off HN? My contact info is in my profile.
