I am not familiar with that work.
> where small enough subsets can be chosen and disparate workers broadcast the small changes at small enough intervals that the net gain in learnings is still larger than the loss in fit due to de-cohesion
I think this really depends on the terrain of your loss landscape. My intuition is that many are too spiky: if each worker takes a step or two on its own subset and you then average the resulting parameters, you can end up on a steep hill rather than in the valley between the two points. A toy sketch of that failure mode is below.
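Here is a minimal illustration of that intuition, assuming a made-up 1-D double-well loss `(w^2 - 1)^2` standing in for a spiky landscape; the two "workers" are just two parameter copies pulled toward different valleys, which is a stand-in for gradient steps on disjoint data subsets, not any particular distributed-training scheme:

```python
import numpy as np

# Toy non-convex loss with valleys at w = -1 and w = +1
# and a peak at w = 0.
def loss(w):
    return (w**2 - 1) ** 2

def grad(w):
    return 4 * w * (w**2 - 1)

lr = 0.1
# Two workers whose subsets happen to pull them toward
# different valleys (simulated here via different starts).
w_a, w_b = -0.5, 0.5

for _ in range(2):  # "a step or two" on each subset
    w_a -= lr * grad(w_a)
    w_b -= lr * grad(w_b)

w_avg = (w_a + w_b) / 2  # naive parameter averaging

print(f"worker A:  w={w_a:+.3f}, loss={loss(w_a):.3f}")
print(f"worker B:  w={w_b:+.3f}, loss={loss(w_b):.3f}")
print(f"averaged:  w={w_avg:+.3f}, loss={loss(w_avg):.3f}")
```

Both workers end up near a valley (loss ≈ 0.13 each), but their average sits exactly on the peak at w = 0 with loss 1.0, worse than either worker alone. Averaging more often, before the workers have diverged toward different basins, is one way schemes like this try to avoid that.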
But this is an active area of research for sure.