So no: if it's on the internet and it's publicly viewable, I don't see why a bot like ChatGPT should somehow be blind to a site that a human can see, lol. Hell, Microsoft has made their new AI system see your screen. Do you also want the AIs to somehow black out the screen area that has the website open and ... know there's a TOS somewhere on the page?
But wait, we already had a working mechanism to signal exactly this type of opting out[1], so let me rephrase the OP's question: why does OpenAI get to be exempt from existing opt-out mechanisms and implement their own?
It certainly does seem as if they're trying to position themselves as a new standard that content owners have to actively opt out of, thus disregarding the active opt-out signals that already exist. But that would mean that they don't actually care about privacy, and their opt-out signal is disingenuous! That can't be right, can it?? Surely everything they do is in good faith, just like every other corporation ever!
Anyway, the fact that they disregarded existing privacy standards and rolled out their own definitely gives me a lot of confidence that they will forever follow the privacy standards they themselves created!
Now excuse me, but I have to go get treatment for terminally metastasized sarcasm.
1) because the existing opt-out signals are not backed by law
2) because you too can ignore robots.txt
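To make point 2 concrete, here is a minimal sketch of what "honoring" robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration; `GPTBot` is used as the user-agent token on the assumption that it is the one the site owner wants to opt out of.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner opts out of one crawler only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return what robots.txt asks for. Nothing enforces the answer."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
URL = "https://example.com/some/page"  # placeholder URL

gptbot_allowed = polite_fetch_allowed(parser, "GPTBot", URL)
other_allowed = polite_fetch_allowed(parser, "SomeOtherBot", URL)
```

The check only happens if the crawler's author chooses to call it before fetching. A scraper that skips this step gets the same HTTP response as everyone else, which is why the signal is only as good as the goodwill of whoever reads it.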
I thought it was obvious that Microsoft is about to establish the next "standard" with near-Windows levels of ubiquity. It will end up being our primary starting point for using Microsoft stuff - we won't open apps, Copilot will.
Actions speak louder than words tho - look at how obvious they are being
Copilot is included with Windows. They added a button for it to all keyboards made from here on out, and built it into Edge, Office, and their search engine; it's also a standalone app. And now their Xbox games' NPCs will be AI-powered, prolly open to all their Game Pass studios.
If it goes the way I expect, Microsoft will be essentially done positioning themselves for a world where we talk to our computers and expect them to listen to us - and to organize, track and recall anything we talk to them about. Perfect for the smart glasses we're all about to buy.
Tbh, I think this will be the end of computing as we conceive it now - just not for the reason I expected originally.
Folders, for example - I think Copilot will end folders and all the file organization stuff for normal users. After some future date I shouldn't ever need to know where that stuff is on my PC, or manage it in any way.
Instead we'll have "real-time" folders, created from our own saved content, assembled in response to our query and according to our preferences, all named, topic-labeled, and dated - but not by us.
Stored and retrieved by AI - a lot like human memory, actually.
Bc we'd then NEED Copilot just to access our stuff - I think that is most definitely coming sooner rather than later.
A website can put up a TOS prohibiting such use, but my understanding is that is essentially unenforceable if the site is publicly accessible.
The recent Meta v Bright Data case highlights how extreme it can get without being technically illegal. https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against...
If you’re trying to prevent scraping of your data, your best option is to not make it public.
I've blocked their IPs at random, tried some stuff with robots.txt, and even completely banned the crawler in the past, as I thought this must be something else. It would just show up with new IPs and carry on.
The few times I checked, they looked like official IPs. If I knew how, I would sue Microsoft. They have no business scraping my website 3-5 times a day when they send me basically no traffic.
Edit: it's also not my only website where Bing goes crazy. And it's not new; this has been going on for several years now (so not AI scraping, I guess).
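For anyone who wants to check whether an IP "looks official" more rigorously: the usual approach is a reverse DNS lookup followed by a forward lookup that must return the same IP. Below is a rough sketch using only Python's `socket` module. The `.search.msn.com` suffix is my understanding of what Bing documents for its crawler, so treat it as an assumption and confirm against Bing's own guidance; the function name and the injectable lookup parameters are made up for illustration, and the forward lookup used here covers IPv4 only.

```python
import socket

# Assumption: genuine Bingbot hosts reverse-resolve to names under this suffix.
BING_SUFFIX = ".search.msn.com"


def is_verified_bingbot(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve the IP, check the hostname suffix, then forward-confirm."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No PTR record or a resolver error: cannot be verified.
        return False

    if not hostname.lower().rstrip(".").endswith(BING_SUFFIX):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must map back to the IP we started with,
    # otherwise anyone could set a fake PTR record on their own IP.
    return ip in forward_ips
```

Calling `is_verified_bingbot("some.ip.from.your.logs")` with the default lookups does real DNS queries. If it returns False for traffic claiming to be Bing, the requests are probably an impersonator; if it returns True, blocking by IP will keep failing as new addresses rotate in, which matches what is described above.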
Your legal wherewithal relative to those who abuse them is what gives your terms of service teeth. Or leaves you toothless.