Obviously, walled gardens like Facebook and Twitter would be off-limits at the beginning, but if the service gains traction, companies might want to be crawled.
I was trying to analyze the Common Crawl recently, and the process of getting set up is nontrivial. This service would make it easier for more people to analyze the web.
Would welcome any feedback.