I address that in the post: apparently CloudSight uses a mix of machine learning and crowdsourcing. My experiments and their pricing suggest that the captions you see were mostly crowdsourced. Still, the quality is great, and if their pricing works for your use case, they offer a solid solution.