It is difficult to get a man to understand something, when his salary depends on his not understanding it.
</sarcasm>
OpenAI documents how to opt out of scraping here: https://developers.openai.com/api/docs/bots
Anthropic documents how to opt out of scraping here: https://privacy.claude.com/en/articles/8896518-does-anthropi...
I'm not sure if Gemini lets you opt out without also delisting you from Google search rankings.
I can imagine their models have been trained on a lot of websites before opt outs became a thing, and the models will probably incorporate that for forever.
But at least for websites there's an opt-out, even if only for the big AI companies. Open source code never even got that option ;).
It was a dataset of the entirety of the public internet from the very beginning that bypassed paywalls etc, there’s virtually nothing they haven’t scraped.
PRESS RELEASE: UNITED BURGLARS SOCIETY
The United Burglars Society understands that being burgled may be inconvenient for some. In response, UBS has introduced the Opt-Out system for those who wish not to be burgled.
Please understand that each burglar is an independent contractor, so those wishing not to burgled should go to the website for each burglar in their area and opt-out there. UBS is not responsible for unwanted burglaries due to failing to opt-out.
The fact is my data exists in corpuses used by OpenAI before I was even aware anyone was scraping it. I'm wondering what can be done about that, if anything.
Bit concerning that some professional engineers don't understand this given the sensitive systems they interact with.