0 points · Filligree · 2y ago

You get a fast link between the GPUs, which should help when you’ve got a model split between them.

However, that split isn’t automatic. You can’t expect to run a 40GB model on that setup unless the software has been designed to split it, the way llama.cpp can split a model between the GPU and CPU, for instance.

What you can do without trouble is keep more models loaded, do more things at the same time, and occasionally run the same model at double speed if it batches well.
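
To make the "split isn't automatic" point concrete, here is a minimal sketch of the planning step that some piece of software has to do before a model too large for one card can run on two. The function name `plan_split`, the 1 GB per-layer size, and the 22 GB usable budget per 24 GB card are all hypothetical numbers chosen for illustration, not measurements.

```python
def plan_split(layer_sizes_gb, device_budgets_gb):
    """Greedily assign consecutive layers to devices, in order.

    Returns one list of layer indices per device. Raises MemoryError
    when the layers do not fit in the given budgets.
    """
    plan = [[] for _ in device_budgets_gb]
    dev = 0      # index of the device currently being filled
    used = 0.0   # GB already assigned to that device
    for idx, size in enumerate(layer_sizes_gb):
        # Move to the next device whenever this layer would overflow the current one.
        while dev < len(device_budgets_gb) and used + size > device_budgets_gb[dev]:
            dev += 1
            used = 0.0
        if dev == len(device_budgets_gb):
            raise MemoryError(f"layer {idx} does not fit on any remaining device")
        plan[dev].append(idx)
        used += size
    return plan


# Hypothetical 40 GB model: 40 layers of 1 GB each.
layers = [1.0] * 40

# Two cards with an assumed 22 GB usable each: the model fits once it is split.
two_cards = plan_split(layers, [22.0, 22.0])
print([len(part) for part in two_cards])

# One card alone cannot hold it.
try:
    plan_split(layers, [22.0])
except MemoryError as err:
    print("single card:", err)
```

The plan only says where each layer lives. The framework still has to move activations across the link at the boundary between the two parts, which is where the fast GPU-to-GPU connection helps.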

pseg134 · 2y ago

This is incorrect if you are talking about a 3090 or 3090 Ti using NVLink.

PeterStuer · 2y ago

You mean those would work like a single virtual GPU with 48GB of VRAM?

Tepix · 2y ago

No. But PyTorch will automatically make use of both GPUs and an NVLink bridge if you use its model parallel and distributed data parallel approaches.
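
For reference, a minimal sketch of the model parallel approach in PyTorch, assuming a toy two-stage network. The class name `TwoStageNet`, the helper `pick_devices`, and the layer sizes are made up for illustration. Layer placement is explicit here; only the tensor copy between devices is left to PyTorch and the driver, and whether that copy actually travels over NVLink depends on the hardware and is not verified by this code. The sketch falls back to the CPU when two GPUs are not present so it can still be run.

```python
import torch
import torch.nn as nn


def pick_devices():
    """Use two GPUs when available, otherwise fall back to the CPU for both stages."""
    if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")


class TwoStageNet(nn.Module):
    """Toy network with its first half on one device and its second half on another."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU()).to(dev0)
        self.stage1 = nn.Linear(1024, 10).to(dev1)

    def forward(self, x):
        h = self.stage0(x.to(self.dev0))
        # The activation is copied to the second device here.
        return self.stage1(h.to(self.dev1))


dev0, dev1 = pick_devices()
model = TwoStageNet(dev0, dev1)
out = model(torch.randn(8, 512))
print(out.shape, out.device)
```

Distributed data parallel is the other approach mentioned: there each GPU holds a full copy of the model and gradients are synchronized between them, so it does not help a model that is too large for one card.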

deaddodo · 2y ago

CUDA multi-GPU over NVLink with a shared memory space is pretty well tested. You still want to use NCCL to optimize the communication between the GPUs, but many CUDA-aware libraries (and the ML tools built on them) can take advantage of it.
