I disagree. Whether or not content should be available to be crawled is dependent on the content's license, and what the site owner specifies in robots.txt (or, in the case of user submitted content, whatever the site's ToS allows)
It should be wholly possible to publish a site intended for human consumption only.
> How would that be enforceable?
Making robots.txt or something else a legal standard instead of a voluntary one. Make it easy for site owners to report violations along with logs, legal action taken against the violators.