robots.txt is widely ignored. User-agent fields are faked to make robots look like Firefox on Windows.
Anyone can write a crawler and have it report itself as Googlebot. That doesn't even violate robots.txt; the file only says: if your name is Googlebot, you're allowed.
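For example, a robots.txt along these lines grants access to the *name*, not the identity (the rules here are illustrative):

```
# Hypothetical policy: admit anything calling itself Googlebot, bar the rest.
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```

Any crawler that sends `User-agent: Googlebot` matches the first record, whatever it actually is.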
Blocking crap requires cunning: code that looks for suspicious access patterns and responds.
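As a minimal sketch of that kind of cunning, assume a per-IP sliding-window rate counter; the window size, threshold, and function names below are invented for illustration, not a recommendation:

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 10   # assumed observation window
MAX_REQUESTS = 50     # assumed ceiling for a polite client

hits = defaultdict(deque)  # client IP -> timestamps of recent requests

def looks_suspicious(ip: str, now: Optional[float] = None) -> bool:
    """Record one request from `ip`; report whether it exceeds the rate limit."""
    now = time.monotonic() if now is None else now
    q = hits[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_REQUESTS

if __name__ == "__main__":
    # Simulate a burst: 60 requests in under a second trips the detector.
    for i in range(60):
        flagged = looks_suspicious("203.0.113.9", now=i * 0.016)
    print("suspicious:", flagged)
```

What counts as "suspicious" and what "responds" means (throttle, tarpit, outright block) is site policy; the point is that the signal comes from behavior, not from what the client claims to be.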
A genuine Googlebot should be operating from a Google domain: if we do a reverse DNS lookup on the client IP of a real Googlebot request, we get a name in the .googlebot.com domain.
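Here is a sketch of that check in Python, using only the standard library. One caveat: the PTR record is set by whoever controls the IP block, so a reverse lookup alone can be spoofed; the usual remedy is to resolve the returned name forward again and require the original IP to appear among its addresses (forward-confirmed reverse DNS).

```python
import socket

def is_genuine_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot via forward-confirmed reverse DNS."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse: IP -> name
    except socket.herror:
        return False
    if not host.endswith(".googlebot.com"):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward: name -> IPs
    except socket.gaierror:
        return False
    return ip in addrs                               # forward-confirm

# 66.249.66.1 sits in a published Googlebot range; a spoofed client fails
# either the suffix check or the forward confirmation.
print(is_genuine_googlebot("66.249.66.1"))
```

DNS lookups are slow, so in practice you would cache the verdict per IP rather than resolving on every request.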