0 points · Hydraulix989 · 11y ago · 0 comments

In my experience, the fully connected layers are the bottleneck. The other issue was the alternation between compute-heavy convolution and IO-heavy pooling. I'm curious how this FFT implementation stacks up against cuDNN: what's the speedup for the convolutional layers alone, and what's the overall speedup?

1 comment · 1 top-level

ajtulloch · 11y ago

http://arxiv.org/pdf/1412.7580v2.pdf compares the convolutional implementation with the cuDNN layers. For the FC layers, it's just cuBLAS `sgemm`.
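
For anyone unfamiliar with what "FFT convolution" means here, a toy 1-D sketch of the underlying idea (the convolution theorem) may help. This is not the linked paper's code and says nothing about GPU performance. The function names `dft`, `direct_conv`, and `fft_conv` are made up for illustration, and a naive O(n^2) DFT stands in for a real FFT so the snippet needs only the standard library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        # The inverse transform carries the 1/n normalization.
        out.append(s / n if inverse else s)
    return out


def direct_conv(signal, kernel):
    """Full linear convolution computed directly in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_conv(signal, kernel):
    """Same convolution via the frequency domain: transform, multiply, invert."""
    n = len(signal) + len(kernel) - 1
    # Zero-pad both inputs to the full output length so the circular
    # convolution implied by the DFT equals the linear convolution.
    a = dft(list(signal) + [0.0] * (n - len(signal)))
    b = dft(list(kernel) + [0.0] * (n - len(kernel)))
    prod = [x * y for x, y in zip(a, b)]
    return [v.real for v in dft(prod, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
direct = direct_conv(signal, kernel)
via_fft = fft_conv(signal, kernel)
print(direct)
print([round(v, 6) for v in via_fft])
```

Both paths give the same result up to floating-point error. The frequency-domain route replaces the sliding-window sum with an elementwise product, which is where a real FFT-based layer can save work as kernels get larger.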
