So I wrote https://github.com/steelbrain/ffmpeg-over-ip and had the server running on the Windows machine and the client on the media server (could be Plex, Emby, Jellyfin, etc.), and it worked flawlessly.
https://gist.github.com/tzmartin/88abb7ef63e41e27c2ec9a5ce5d...
As an aside, are there any uses for GPU-over-network other than video encoding? The increased latency seems like it would prohibit anything machine learning related or graphics intensive.
I found Juice to work decently for graphical applications too (e.g., games, CAD software). Latency was about what you'd expect for video encode + decode + network: 5-20ms on a LAN if I recall correctly.
IPMI and such could use it. Proxmox, for example, could use it. Machine learning tasks (like Frigate) and hashcat could also use it. All in theory, of course. Many tasks use VNC or SPICE right now. The ability to extract your GPU in the Unix way over TCP/IP is powerful. Though Node.js would not be the way I'd want this to go.
The overheads are larger for training compared to inference, and we are implementing more optimizations to approach native performance.
The same way one "approaches the sun" when they take the stairs?
1. If you're actively developing and need a GPU then you typically would be paying the entire time the instance is running. Using Thunder means you only pay for the GPU while actively using it. Essentially, if you are running CPU-only code you would not be paying for any GPU time. The alternative is to manually turn the instance on and off, which can be annoying.
2. This allows you to easily scale the type and number of GPUs you're using. For example, say you want to do development on a cheap T4 instance and run a full DL training job on a set of 8 A100s. Instead of needing to swap instances and set everything up again, you can just run a command and then start running on the more powerful GPUs.
> 1. If you're actively developing and need a GPU [for fractional amounts of time]...
Why would I need a GPU for a short amount of time during development? For testing?
I don't get it - what would testing an H100 over a TCP connection tell me? It's like, yeah, I can do that, but it doesn't represent an environment I am going to use for real. Nobody runs applications against GPUs on buses virtualized over TCP connections, so what exactly would I be validating?
I think essentially this is solving the same problem Ray (https://www.ray.io/) is solving, but in a more generic way.
It can potentially offer finer-grained GPU sharing, like half a GPU.
I'm very excited about this.
This is awesome. Can it do 3D rendering (Vulkan/OpenGL)?
> is this a remote nvapi
Essentially yes! Just to be clear, this covers the entire GPU, not just NVAPI (i.e. all of CUDA). It functions as if you had the physical card plugged directly into the machine.
Right now we don't support Vulkan or OpenGL since we're mostly focusing on AI workloads; however, we plan to support these in the future (especially if there is interest!)
I bet you saw this https://github.com/mikex86/LibreCuda
They implemented the CUDA driver by calling into rmapi.
My understanding is that if there were a remote rmapi, other user-mode drivers should work out of the box?
The LD_PRELOAD trick allows you to intercept and virtualize calls to the CUDA runtime.
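For anyone unfamiliar with the trick: LD_PRELOAD is a dynamic-linker feature, so a shared library listed there is loaded first and its symbols shadow the real ones, and the shim can still reach the original function via dlsym(RTLD_NEXT, ...). The real thing is done in C against the actual library. Below is only an analogy of the same interposition pattern in Python, under loud assumptions: `runtime`, `_malloc` and `interpose` are hypothetical names made up for illustration, not CUDA's API and not how any particular product does it.

```python
import functools
import types


# Hypothetical stand-in for a runtime library; NOT a real CUDA binding.
def _malloc(nbytes):
    return bytearray(nbytes)


runtime = types.SimpleNamespace(malloc=_malloc)

call_log = []  # every intercepted call is recorded here


def interpose(namespace, name, log):
    """Swap namespace.name for a shim that logs the call, then forwards it."""
    original = getattr(namespace, name)

    @functools.wraps(original)
    def shim(*args, **kwargs):
        log.append((name, args))
        # A remoting layer could ship the call elsewhere here instead of
        # invoking the local original.
        return original(*args, **kwargs)

    setattr(namespace, name, shim)
    return original


interpose(runtime, "malloc", call_log)

# The caller is unchanged: it still calls runtime.malloc, but goes through the shim.
buf = runtime.malloc(16)
print(len(buf), call_log)
```

The point is that the calling code does not change at all; only the symbol it resolves to does. That is what makes the approach attractive for virtualizing a runtime underneath existing applications.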
One of our main goals for the near future is to allow GPU sharing. This would be better than MIG or vGPU since we'd allow users to use the entire GPU memory instead of restricting them to a fraction.
What are you doing to reset the GPU to clean state after a run? It's surprisingly complicated to do this securely (we're writing up a back-to-back sequence of audits we did with Atredis and Tetrel; should be publishing in a month or two).
Down the line, we could see this being used for batched render jobs (i.e. to replace a render farm).
Hmm... well I just watched you run nvidia-smi in a Mac terminal, which is a platform it's explicitly not supported on. My instant assumption is that your tool copies my code into a private server instance and communicates back and forth to run the commands.
Does this platform expose eGPU capabilities if my host machine supports it? Can I run raster workloads or network it with my own CUDA hardware? The actual way your tool and service connect isn't very clear to me and I assume other developers will be confused too.
Going into more detail on how this works: we intercept communication between the CPU and the GPU, so only GPU code and commands are sent across the network to a GPU that we are hosting. This way we are able to virtualize a remote GPU and make your computer think it's directly attached to that GPU.
We are not copying your CPU code and running it on our machines. The CPU code runs entirely on your instance (meaning no files need to be copied over or packages installed on the GPU machine). One of the benefits of this approach is that you can easily scale to a more / less powerful GPU without needing to set up a new server.
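To make the "only commands cross the network" idea concrete, here is a minimal sketch of call forwarding. It is not the actual implementation, which isn't described in this thread. Everything in it is an assumption for illustration: the JSON-lines wire format, the `RemoteDevice` stub, the `serve` loop and the two toy operations are hypothetical, and a local `socket.socketpair()` stands in for the real network link.

```python
import json
import socket
import threading

# Hypothetical "GPU-side" operations; a real system would launch kernels here.
REMOTE_OPS = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, k: [x * k for x in a],
}


def serve(conn):
    """Remote side: read one JSON command per line, run it, send the result back."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        for line in stream:
            cmd = json.loads(line)
            result = REMOTE_OPS[cmd["op"]](*cmd["args"])
            stream.write(json.dumps({"result": result}) + "\n")
            stream.flush()


class RemoteDevice:
    """Local stub: the caller's code stays here, only commands and data travel."""

    def __init__(self, conn):
        self._conn = conn
        self._stream = conn.makefile("rw", encoding="utf-8")

    def call(self, op, *args):
        self._stream.write(json.dumps({"op": op, "args": list(args)}) + "\n")
        self._stream.flush()
        return json.loads(self._stream.readline())["result"]

    def close(self):
        self._stream.close()
        self._conn.close()


# A connected socket pair stands in for the network between client and GPU host.
local, remote = socket.socketpair()
server = threading.Thread(target=serve, args=(remote,), daemon=True)
server.start()

gpu = RemoteDevice(local)
total = gpu.call("vector_add", [1, 2, 3], [10, 20, 30])
scaled = gpu.call("scale", total, 2)
gpu.close()
server.join(timeout=5)
print(total, scaled)
```

Note that every call in this sketch is a blocking round trip, which is exactly where the latency concerns raised elsewhere in the thread come from; a real system would presumably batch or pipeline commands to hide that.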
Will that cause system instability if, say, the network suddenly drops?
Down the line we want to move to a pay-as-you-go model.
Does anyone know if this is possible with USB?
I have a DaVinci Resolve license USB dongle that I'd like to not have to plug into my laptop.
Curious where you see this in the CLI, may be an oversight on our part. If you can join the Discord and point us to this bug we would really appreciate it!
Another solution is qCUDA [3] which is more specialized towards CUDA.
In addition to these solutions, various virtualization solutions today provide some sort of serialization mechanism for GPU commands, so they can be transferred to another host (or process). [4]
One example is the QEMU-based Android Emulator. It is using special translator libraries and a "QEMU Pipe" to efficiently communicate GPU commands from the virtualized Android OS to the host OS [5].
The new Cuttlefish Android emulator [6] uses Gallium3D for transport and the virglrenderer library [7].
I'd expect that the current virtio-gpu implementation in QEMU [8] might make this job even easier, because it includes Android's gfxstream [9] (formerly called "Vulkan Cereal"), which should already support communication over network sockets out of the box.
[1] https://github.com/pocl/pocl
[2] https://portablecl.org/docs/html/remote.html
[3] https://github.com/coldfunction/qCUDA
[4] https://www.linaro.org/blog/a-closer-look-at-virtio-and-gpu-...
[5] https://android.googlesource.com/platform/external/qemu/+/em...
[6] https://source.android.com/docs/devices/cuttlefish/gpu
[7] https://cs.android.com/android/platform/superproject/main/+/...
[8] https://www.qemu.org/docs/master/system/devices/virtio-gpu.h...
[9] https://android.googlesource.com/platform/hardware/google/gf...
I am not sure how useful it was in reality (usually if you had a nice graphics card you also had a nice CPU) but I had fun playing around with it. There was something fascinating about getting accelerated graphics on a program running in the machine room. I was able to get glquake running like this once.
But hey, I'm happy to be proven wrong ;)