There's a ton of work in this area, and the reality is... it doesn't work for LLMs.
Moving from ~900GB/sec of GPU memory bandwidth (with InfiniBand interconnects between nodes) to 0.01-0.1GB/sec over the internet is brutal: taking the numbers at face value, that's roughly 9,000x to 90,000x slower, i.e. four to five orders of magnitude. This works for simple image classifiers, but I've never seen anything like a large language model trained in a meaningful amount of time this way.
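To make the gap concrete, here's a rough back-of-envelope sketch of how long one full gradient exchange would take at each bandwidth. The model size (7B parameters) and the fp16 gradient format are illustrative assumptions, not numbers from the comment above:

```python
# Back-of-envelope: time to move one full set of fp16 gradients
# for a hypothetical 7B-parameter model at different bandwidths.
params = 7e9             # 7B parameters (illustrative assumption)
bytes_per_param = 2      # fp16 gradients
payload_gb = params * bytes_per_param / 1e9  # 14 GB per exchange

bandwidths = [
    ("GPU memory, ~900 GB/s", 900.0),
    ("fast internet, 0.1 GB/s", 0.1),
    ("slow internet, 0.01 GB/s", 0.01),
]

for label, bw_gb_s in bandwidths:
    seconds = payload_gb / bw_gb_s
    print(f"{label}: {seconds:,.2f} s per gradient exchange")
```

At 0.01 GB/s that single exchange takes about 23 minutes, and a training run needs thousands to millions of such steps, which is why the approach stalls for LLMs even though a tiny image classifier (megabytes of gradients, not gigabytes) can get away with it.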