Are there known ways the training for these models could be distributed/decomposed? E.g. SETI-style distribution of a homogenous centrally-defined task, or — much more exciting — recombination of several different models / sets of weights? (I'm just throwing words around here without really understanding them.)
I'm imaging a world in which one group of enthusiasts could work together to train a model on all images on Wikipedia, another group could work on training a model that understands hands really well, and then later yet another group could combine the work of the other two without doing all that training from scratch.
Is that even remotely plausible?