Better HN
0 points
opless
10y ago
0 comments
Just in case a robots.txt kills that:
http://pastebin.com/rcPSyRnR
5 comments · 2 top-level
hughw
10y ago
· 3 in thread
Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.
mikeash
10y ago
Apparently yes, it would:
https://archive.org/about/exclude.php
syncsynchalt
10y ago
My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.
X-Istence
10y ago
Yes, it simply hides the content. It is still kept in their database, so if the robots.txt disappears, it pops back from their archive.
New pages won't be archived though.
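To make the exclusion discussed above concrete, here is a minimal sketch of how a crawler might evaluate a robots.txt rule, using Python's standard `urllib.robotparser`. The rule text, the `ia_archiver` user-agent name, the `example.com` URL, and the `is_allowed` helper are all assumptions for illustration; this is not archive.org's actual code, and it only shows the rule check, not the retroactive hiding of stored pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ia_archiver" is assumed here to be the
# user agent that the archive's crawler matches against.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ia_archiver", url))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
    # An empty file models the robots.txt having been removed.
    print(is_allowed("", "ia_archiver", url))
```

With these assumed rules, only the named agent is excluded; once the rule is gone, the same check permits access again, which matches the "pops back" behaviour described above.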
snsr
10y ago
It's also on seclists.org:
http://seclists.org/isn/2015/Aug/4