billconan · 1y ago

Does this mean you have a customized/dummy kernel GPU driver?

Will that cause system instability if, say, the network suddenly dropped?

3 comments · 1 top-level

bmodel · 1y ago

We are not writing any kernel drivers; this runs entirely in userspace (so it won't result in a CrowdStrike-level crash, haha).

Given that, if the network suddenly dropped, only the process using the GPU would fail.

ZeroCool2u · 1y ago

How do you do that exactly? Are you using eBPF or something else?

Also, for my ML workloads the most common bottleneck is GPU VRAM <-> RAM copies. Doesn't this dramatically increase latency? Or is it more that it increases latency on the first data transfer, but as long as you dump everything into VRAM all at once at the beginning you're fine? I'd expect this wouldn't play super well with stuff like PyTorch data loaders, but I'd be curious to hear how you've fared when testing.
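To put rough shape on the question above, here is a back-of-the-envelope sketch. It assumes each transfer pays one network round trip plus its bytes divided by link bandwidth. The function name, the 1 ms round trip, the 10 Gbit/s link, and the 1 GiB dataset are all hypothetical values picked for illustration, not measurements of the system discussed in this thread.

```python
def transfer_time_s(total_bytes, n_transfers, rtt_s, bandwidth_bytes_per_s):
    """Simplified model: every transfer costs one round trip, plus time on the wire."""
    return n_transfers * rtt_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical values, for illustration only (not measured).
DATASET_BYTES = 1 << 30   # 1 GiB moved in total
RTT_S = 1e-3              # assumed 1 ms network round trip
BANDWIDTH = 10e9 / 8      # assumed 10 Gbit/s link, in bytes per second

# Same bytes either way; only the number of round trips differs.
upfront = transfer_time_s(DATASET_BYTES, 1, RTT_S, BANDWIDTH)
per_batch = transfer_time_s(DATASET_BYTES, 10_000, RTT_S, BANDWIDTH)

print(f"one upfront copy:    {upfront:.2f} s")
print(f"10,000 small copies: {per_batch:.2f} s")
```

In this model the time on the wire is identical in both cases, so the gap comes entirely from the extra round trips. A real system could overlap or pipeline transfers, so treat the per-batch figure as a pessimistic estimate rather than a prediction.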

bmodel · 1y ago

We intercept API calls and use our own implementation to forward them to a remote machine. No eBPF (which I believe needs to run in the kernel).
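As a rough illustration of the general idea (intercept a call in userspace, serialize it, run it elsewhere), here is a minimal sketch. It is not the actual implementation: `FakeGpu`, `RemoteProxy`, `handle_request`, and the JSON wire format are all hypothetical, and the "network" is just an in-process function call standing in for a real connection.

```python
import json


class FakeGpu:
    """Hypothetical stand-in for a device API, living on the 'remote' side."""

    def __init__(self):
        self._buffers = {}
        self._next_handle = 1

    def alloc(self, values):
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = list(values)
        return handle

    def scale(self, handle, factor):
        self._buffers[handle] = [v * factor for v in self._buffers[handle]]

    def read(self, handle):
        return self._buffers[handle]


def handle_request(device, payload: bytes) -> bytes:
    """Remote side: decode one call, run it on the device, encode the result."""
    request = json.loads(payload)
    result = getattr(device, request["method"])(*request["args"])
    return json.dumps({"result": result}).encode()


class RemoteProxy:
    """Local side: looks like the device API, but forwards every call."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes and returning bytes

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. the API calls.
        def forward(*args):
            payload = json.dumps({"method": name, "args": list(args)}).encode()
            return json.loads(self._transport(payload))["result"]
        return forward


remote_device = FakeGpu()
gpu = RemoteProxy(lambda payload: handle_request(remote_device, payload))

buf = gpu.alloc([1, 2, 3])
gpu.scale(buf, 10)
print(gpu.read(buf))  # prints [10, 20, 30]


def dropped_link(payload):
    # Simulates the network going away mid-call.
    raise ConnectionError("network dropped")


try:
    RemoteProxy(dropped_link).read(buf)
except ConnectionError as exc:
    # The failure surfaces as an ordinary exception in the calling process.
    print("call failed:", exc)
```

Because everything here is ordinary userspace code, a dropped link shows up as an exception in the calling process, which matches the failure mode described earlier in the thread.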

As for latency, we've done a lot of work to minimize it. You can see the performance we get running inference on BERT from Hugging Face here: https://youtu.be/qsOBFQZtsFM?t=64. It's still slower than local (mainly for training workloads), but not by as much as you'd expect. We're aiming to reach near parity in the next few months!
