Cool, thanks for the response. Yes, I do find that the PyTorch tutorials on distributed training are a work in progress.
I was thinking of starting with a basic implementation of synchronized data parallelism from the original paper by Jeff Dean et al., then implementing basic model parallelism, explaining why asynchronous parallelism works, doing a simple implementation of HOGWILD!, and finally doing "hello world" training with existing distributed training systems like Horovod, PyTorch Distributed, RayLib, Microsoft DeepSpeed, etc.