I wouldn't call what Stable Diffusion et al. are trained on "high quality". You need only look through the likes of LAION to see the kind of captions and images they get trained on.
It's not random, but it's not particularly curated either. Most of the time, any curation is done afterwards.