A Curious Case of Disregarded Robots.txt

(mike.pub)

10 points · mikelabatt · 9y ago · 10 comments

ryandvm · 9y ago

Meh. I'm pretty ambivalent about voluntary restrictions like robots.txt. As far as I'm concerned it's mostly useful as a way for site operators to document endless dynamic content or requests that are prohibitively expensive (but not so much that they restrict access).

I figure if it's on the web and a human can read it, my computer ought to be able to read it too.

mikelabatt (OP) · 9y ago

Yes, but should your computer also be allowed to disseminate that content without the original author's permission?

upofadown · 9y ago

A robots.txt file is not any sort of license (anti-license?). Its existence has no bearing on whether the IA should be allowed to do what it does. It is only intended to provide helpful information to web crawlers.

* http://www.robotstxt.org/norobots-rfc.txt
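
For illustration, here is a minimal sketch of how a polite crawler might consult that information before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `examplebot` agent, and the URLs are made up, and `ia_archiver` is only assumed here as the name the Internet Archive's crawler answers to. Nothing forces a crawler to run this check, which is the point of the comment above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("examplebot", "https://example.com/index.html"),
        ("examplebot", "https://example.com/private/notes.html"),
        ("ia_archiver", "https://example.com/index.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is purely advisory: the site states a preference, and the crawler decides whether to honor it.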

true_religion · 9y ago

Well yes, the whole point of robots.txt is that it's impolite to refuse to follow it, and doing so and getting caught might get you banned from the site.

shouldbworking · 9y ago

How is this any different from a human doing the same? The internet is meant to be open; it's free information, after all. If you don't like it, put your stuff behind a login.

sitkack · 9y ago

Robots.txt doesn't confer copyright.

What about domains that have been sharked? Does controlling robots.txt now give me the right to suppress all content ever originating from that domain, for as long as I control robots.txt?

The Internet Archive is right to spider the site but defer showing it. Collection != dissemination.

The IA isn't synthesizing, selling, cross-referencing or afaict doing anything nefarious with the data.

You are literally picking on the last org on the internet that needs to get picked on.

cJ0th · 9y ago

> You are literally picking on the last org on the internet that needs to get picked on.

True, but at the same time I do understand those who generally want others to abide by their robots.txt. IANAL, but ideally I would love to see the IA have the right to ignore robots.txt (with anyone who wants to opt out of the IA given the option to do so), while others shouldn't be allowed to ignore it.
