I hope to do a technical writeup of my process soon. In short, the training images were created from videos of the tube being rotated by hand at various angles and under various lighting conditions. These images were then annotated with the locations of several keypoints around the paper. The model generalizes pretty well; see here:
https://twitter.com/2020cv_inc/status/945502116335906816
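
Until the writeup is out, here is a rough sketch of what the data-preparation step described above could look like. It is not my actual pipeline code, only an illustration under assumptions: the function `sample_frame_indices`, the class `KeypointAnnotation`, the keypoint names, and the half-second sampling interval are all hypothetical. It picks evenly spaced frames from a video and stores keypoint labels in a resolution-independent form.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Pick frame indices spaced roughly `every_seconds` apart.

    Neighbouring video frames are nearly identical, so sampling sparsely
    keeps the annotation workload down (assumed interval, tune as needed).
    """
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


@dataclass
class KeypointAnnotation:
    """One annotated frame: named keypoints in pixel coordinates (hypothetical schema)."""
    frame_index: int
    width: int
    height: int
    keypoints: Dict[str, Tuple[float, float]]

    def normalized(self) -> Dict[str, Tuple[float, float]]:
        """Scale pixel coordinates to the [0, 1] range so labels do not depend on resolution."""
        return {
            name: (x / self.width, y / self.height)
            for name, (x, y) in self.keypoints.items()
        }


# Example: a 10-second clip at 30 fps, sampled every half second.
indices = sample_frame_indices(total_frames=300, fps=30.0, every_seconds=0.5)

# Example: one hand-labelled frame with a single (hypothetical) keypoint.
annotation = KeypointAnnotation(
    frame_index=indices[1],
    width=640,
    height=480,
    keypoints={"paper_top_left": (160.0, 120.0)},
)
```

The frame-decoding step itself is left out on purpose, since any video library can read the frames at the chosen indices.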