> Claude has been trained on a significant amount of content from 4chan, for example.
That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.
Look into Common Crawl and see what kind of "quality" content we are feeding these models. 4chan is just the tip of the iceberg (but the model will happily answer all your questions, because it's seen everything).
I don't know of anyone who uses Common Crawl as pre-training data without filtering it. We have an annotation system that lets people pick and choose which subsets they'd like to use.
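
To make "pick and choose which subsets" concrete, here is a minimal sketch of annotation-based filtering. It is not the actual system: `CrawlRecord`, `select_subset`, and the tag names (`lang:en`, `source:imageboard`, and so on) are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRecord:
    """Hypothetical crawled document with a set of annotation tags."""
    url: str
    text: str
    annotations: set[str] = field(default_factory=set)


def select_subset(records, required=(), excluded=()):
    """Keep records that carry every required tag and none of the excluded tags."""
    required, excluded = set(required), set(excluded)
    return [
        r for r in records
        if required <= r.annotations and not (excluded & r.annotations)
    ]


# Toy data; the URLs and tags are invented for this example.
records = [
    CrawlRecord("https://example.com/a", "news article", {"lang:en", "source:news"}),
    CrawlRecord("https://example.com/b", "board thread", {"lang:en", "source:imageboard"}),
    CrawlRecord("https://example.com/c", "Nachrichten", {"lang:de", "source:news"}),
]

kept = select_subset(records, required={"lang:en"}, excluded={"source:imageboard"})
print([r.url for r in kept])
```

The point is only that the raw crawl and the training subset are different things: whoever builds the dataset decides which tags are in and which are out.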