Yes, it uses ONNX Runtime / MLAS (its native BLAS library) under the hood. And yes, there is copy overhead, but you can eliminate it within a single function/graph by compiling everything down to one ONNX model. The end result is within ~15% of the run time of PyTorch with MKL when training a reasonably sized MLP. ORT also supports CUDA and a number of other "execution providers".