Pool spare GPU capacity to run LLMs at larger scale

(github.com)

11 points · i386 · 3mo ago · 3 comments

3 comments · 3 top-level

> MoE models via expert sharding with zero cross-node inference traffic

This makes the whole project questionable.

This is very promising; it definitely looks more user-friendly than exo. Can't wait to try it out.

iwinux · 3mo ago

You lost me on "spare GPU". I don't have any capable GPUs, let alone spare ones :)