robots.txt is widely ignored. User-agent fields are faked to make robots look like Firefox on Windows.
Anyone can write a crawler and have it report itself as Googlebot. That doesn't even violate robots.txt; the file only says: if your name is Googlebot, you're allowed.
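For example, a robots.txt along these lines grants access to the *name*, not the identity (the rules here are illustrative):

```
# Hypothetical policy: admit anything calling itself Googlebot, bar the rest.
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```

Any crawler that sends `User-agent: Googlebot` matches the first record, whatever it actually is.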
Blocking crap requires cunning: code that looks for suspicious access patterns and responds.
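As a minimal sketch of that kind of cunning, assume a per-IP sliding-window rate counter; the window size, threshold, and function names below are invented for illustration, not a recommendation:

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 10   # assumed observation window
MAX_REQUESTS = 50     # assumed ceiling for a polite client

hits = defaultdict(deque)  # client IP -> timestamps of recent requests

def looks_suspicious(ip: str, now: Optional[float] = None) -> bool:
    """Record one request from `ip`; report whether it exceeds the rate limit."""
    now = time.monotonic() if now is None else now
    q = hits[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_REQUESTS

if __name__ == "__main__":
    # Simulate a burst: 60 requests in under a second trips the detector.
    for i in range(60):
        flagged = looks_suspicious("203.0.113.9", now=i * 0.016)
    print("suspicious:", flagged)
```

What counts as "suspicious" and what "responds" means (throttle, tarpit, outright block) is site policy; the point is that the signal comes from behavior, not from what the client claims to be.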
A genuine Googlebot should be operating from a Google domain: if we do a reverse DNS lookup on the client IP of a real Googlebot request, we get a name in the .googlebot.com domain.
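Here is a sketch of that check in Python, using only the standard library. One caveat: the PTR record is set by whoever controls the IP block, so a reverse lookup alone can be spoofed; the usual remedy is to resolve the returned name forward again and require the original IP to appear among its addresses (forward-confirmed reverse DNS).

```python
import socket

def is_genuine_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot via forward-confirmed reverse DNS."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse: IP -> name
    except socket.herror:
        return False
    if not host.endswith(".googlebot.com"):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward: name -> IPs
    except socket.gaierror:
        return False
    return ip in addrs                               # forward-confirm

# 66.249.66.1 sits in a published Googlebot range; a spoofed client fails
# either the suffix check or the forward confirmation.
print(is_genuine_googlebot("66.249.66.1"))
```

DNS lookups are slow, so in practice you would cache the verdict per IP rather than resolving on every request.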