And isn't the obvious solution just to make some sort of browser add-on for the LLM summary, so the request comes from your browser and the page is then sent to the LLM?
I think the main concern here is the huge amount of traffic generated by crawling purely to collect content for pre-training.