I agree that it's tenuous. I would give it 20% odds of hitting the 500-year mark at best. And I don't think all of the data will survive.
But if archive.org ever becomes unsustainable to run, the existing data will likely be preserved. Lots of companies will be incentivized to continue hosting the data, as it's excellent PR if nothing else. They don't need to continue gathering the data, just host it.
Hosting is only going to become cheaper as t -> infinity, and given the massive amount of compute I've seen Google wield, it's hard to imagine that an operation like archive.org can't find some way to be preserved.
All that said, the biggest threat is sudden data loss: none of this works if the data itself gets lost. Has archive.org posted their operations policies anywhere? It would be interesting reading.
Imagine a future GDPR-like policy that gives people's descendants ownership and copyright over everything they've said. Suddenly every word written into archive.org has an owner, who might come and sue archive.org or its managers. Soon every person alive has some grandparent who wrote something in the archive, and some of them want compensation for all the decades archive.org has been distributing grandpa's words for free.
Also, as someone who has trained a few large GPT models, I think ML has a chance of preserving a lot of this data. Training datasets are only growing larger and larger, and although those aren't updated (yet), there's no reason to think they won't last for a long time.
I imagine that in 500 years, imagenet2012 might still be around as a historical curiosity, at least somewhere.