robots.txt is actually a really useful way to tell an attacker where to look for juicy content that its owner doesn't want indexed, because following it is entirely voluntary. It's easy to imagine a dark web search engine that indexes only that content.
If you want your content to exist in the same way with respect to OpenAI training, just block GPTBot in your robots.txt:
https://platform.openai.com/docs/gptbot
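Per that page, the block is the standard robots.txt pattern, matching on OpenAI's crawler user agent:

```
User-agent: GPTBot
Disallow: /
```

You can also narrow the `Disallow` to specific paths instead of `/` if you only want to keep parts of the site out of training.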