Unfortunately, the dominant LLM architecture makes this largely infeasible right now.
- Gaming hardware has far too little VRAM to train anything close to a state-of-the-art model, and Nvidia is being annoyingly smart about keeping it that way so it can sell enterprise GPUs at exorbitant markups.
- Right now, communication between machines seems to be the bottleneck, and limited VRAM makes it worse: the model has to be spread across more machines, so even more traffic goes over the slow links (rough numbers in the sketch below). Even with data-centre-grade interconnect (mostly InfiniBand, which is also Nvidia, smart-asses), failed links tend to cause big delays in training.
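To make the two points above concrete, here is a rough back-of-envelope sketch, not a benchmark: the 7B model size, the 64 data-parallel workers, and the bandwidth figures are assumptions picked purely for illustration. It estimates the memory needed just to hold weights, gradients and Adam optimizer states, and how long a naive ring all-reduce gradient sync would take per step over different links.

```python
# Back-of-envelope sketch; all model sizes, worker counts and bandwidths
# below are illustrative assumptions, not measurements.

def training_memory_gb(param_count, bytes_per_param=2, optimizer_bytes_per_param=12):
    """Rough memory to train with Adam: fp16 weights + fp16 gradients +
    optimizer states (fp32 master weights + two moments), ignoring activations."""
    return param_count * (2 * bytes_per_param + optimizer_bytes_per_param) / 1e9

def allreduce_seconds(param_count, bytes_per_param, bandwidth_gbps, num_workers):
    """Ring all-reduce: each worker moves roughly 2*(N-1)/N of the gradient bytes."""
    grad_bytes = param_count * bytes_per_param
    traffic = 2 * (num_workers - 1) / num_workers * grad_bytes
    return traffic / (bandwidth_gbps * 1e9 / 8)  # Gbit/s -> bytes/s

params = 7e9  # hypothetical 7B-parameter model

print(f"Weights/grads/Adam states alone: ~{training_memory_gb(params):.0f} GB "
      f"(vs. 24 GB on a typical gaming card)")

for label, gbps in [("home Ethernet, 1 Gbit/s", 1),
                    ("fast LAN, 10 Gbit/s", 10),
                    ("InfiniBand NDR, 400 Gbit/s", 400)]:
    t = allreduce_seconds(params, bytes_per_param=2, bandwidth_gbps=gbps, num_workers=64)
    print(f"Gradient sync over {label}: ~{t:.0f} s per optimizer step")
```

Even if the arithmetic is crude, the gap between the last two lines is the point: over home-grade links each synchronization step costs minutes, not fractions of a second, before any actual compute happens.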
Nevertheless, it is a good direction to push towards, and the government could indeed help, but it will take time. We need both a healthier competitive landscape in hardware and research into model architectures that are easy to train in a distributed manner (easy parallelization was also key to the success of Transformers, but we need to go further).