I've used it for some larger scrapes (nothing at the scale you're talking about, but still sizeable), and Scrapy has very tight integration with scrapinghub.com, which handles all of the deployment issues (worker uptime, result storage, rate-limiting, etc.). I'm not affiliated with them in any way; I've just had a good experience using them in the past.
Every `hosted/cloud/saas/paas` service runs into bazillions of $$$ for anything large-scale, starting with AWS bandwidth and including nearly every service on this earth.