An agent making a request on the explicit behalf of someone else is probably something most of us agree is reasonable. "What are the current stories on Hacker News?" -- the agent is just doing the same request to the same website that I would have done anyways.
But the sort of non-explicit just-in-case crawling that Perplexity might do for a general question where it crawls 4-6 sources isn't as easy to defend. "Are polar bears always white?" -- Now it's making requests I wouldn't have necessarily made, and it could even be seen as a sort of amplification attack.
That said, TFA's example is where they register secretexample.com and then ask Perplexity "what is secretexample.com about?" and Perplexity sends a request to answer the question, so that's an example of the first case, not the second.
What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feeding it to their next training cycle?
Pinky promises? Ethics? Laws? Technical limitations? Leeroy Jenkins?
When you swap in an AI and ask what the current stories are, the AI fetches the front page and every thread and feeds it back to you. You are less likely to participate in discussion because you've already had the info summarized.
Corporate America. Where clean code goes to die.
Mind you I'm not saying electric scooters are a bad idea, I have one and I quite enjoy it. I'm saying we didn't need five fucking startups all competing to provide them at the lowest cost possible just for 2/3s of them to end up in fucking landfills when the VC funding ran out.
At this moment I am using Perplexity's Comet browser to take a Spotify playlist and add all the tracks to my YouTube Music playlist. I love it.
If sites want to avoid people using agents, they should offer the functionality that people are using the agents to accomplish.
Everyone having a personal shopper obviously changes the relationship to the products and services you use or purchase via that personal shopper. Good, bad, whatever.
The point is the web is changing, and people use a different type of browser now. And that browser happens to be LLMs.
Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before: Kodak, Blockbuster, whatever.
Grow up, Cloudflare, some of your business models don’t make sense any more.
You say this as though all LLM or otherwise automated traffic is for the purpose of fulfilling a request made by a user 100% of the time, which is just flatly, on its face, untrue.
Companies make vast amounts of requests for indexing purposes. That could be to facilitate user requests someday, perhaps, but it is not today and not why it's happening. And worse still, LLMs introduce a new third option: that it's not for indexing or for later linking but is instead either for training the language model itself, or for the model to ingest and regurgitate later on with no attribution, with the added fun that it might just make some shit up about whatever you said and be wrong. And as the person buying the web hosting, all of that is subsidized by me.
"The web is changing" does not mean every website must follow suit. Since I built my blog about 2 internet eternities ago, I have seen fad tech come and fad tech go. My blog remains more or less exactly what it was 2 decades ago, with more content and a better stylesheet. I have requested in my robots.txt that my content not be used for LLM training, and I fully expect that to be ignored because tech bros don't respect anyone, even fellow tech bros, when it means they have to change their behavior.
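For anyone wondering what that robots.txt request looks like from the crawler's side, here is a rough sketch using Python's standard-library parser. The file contents are my assumption, not the commenter's actual robots.txt: `GPTBot` is used only as an example crawler token and `blog.example` is a placeholder host. The point is that the check is something the crawler runs voluntarily; nothing here enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler token is refused,
# everyone else is allowed. Not taken from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://blog.example/posts/hello"  # placeholder URL
    for agent in ("GPTBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

A well-behaved crawler would call something like `is_allowed` before every fetch. One that ignores the file simply never makes the call, which is why the commenter expects the request to be ignored.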
They will be quite the wiser if they track/limit how often your shopper enters the store. You probably aren't entering the same store fifteen times every day and neither would be your shopper if they were only doing it on your behalf.
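To make the "track/limit how often your shopper enters the store" idea concrete, here is a minimal sliding-window sketch. The class name, the per-client key, and the limits are all made up for illustration; a real site would more likely do this at the proxy or CDN layer, and choosing what counts as one "client" is the hard part.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of each client's recent accepted requests, oldest first.
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and say whether to serve it."""
        hits = self._hits[client_id]
        # Forget visits that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many visits in the window; refuse this one
        hits.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical policy: three visits per minute per client.
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for t in (0, 1, 2, 3, 61):
        print(t, limiter.allow("shopper-bot", now=t))
```

Passing `now` in explicitly keeps the sketch easy to test; in practice it would come from something like `time.monotonic()`.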
Might does not make right.
It's like saying a web browser that is customized in any way is wrong. If one configures their browser to eagerly load links so that their next click is instant, is that now wrong?
That's called breaking and entering, and it's generally frowned upon -- bypassing the "closed" sign.