They could scrape your website and then prevent you from scraping your own data back.
The whole process is silly; it reflects the duct-tape-and-chicken-wire nature of the WWW.
No one should have to "scrape" or "crawl".
Data should be put into an open, universal format (no tags) and submitted as needed (e.g., rsynced) to a public-access archive, mirrored around the world.
This would bridge the gap until we reach a more content-addressable system (as opposed to a location-based one).
Clients (text readers, media players, whatever) can download the universally formatted data and transform it into markup, binary, or whatever else they wish. All the design creativity and complexity of "web pages" or "web apps" can be handled at the network edge, client-side.
"Crawling" should not be necessary.
No one should have to store HTML tags and other window dressing alongside their data.
Dream on.