When I've done scraping, I've always taken this approach too: I split the process into two paired stages, one that fetches pages into a local cache folder and one that processes the cached files.
I find this useful for several reasons, but particularly if you want to recrawl the same site for new/updated content, or if you decide to grab extra data from the pages (or, indeed, if your original parsing goes wrong or meets pages it wasn't designed for).
Related: As well as any pages I cache, I generally also have each stage output a CSV (requested url, local file name, status, any other relevant data or metadata), which can be used to drive later stages, or may contain the final output data.
Requesting all of the pages is the biggest time sink when scraping, so it's good to avoid repeating any portion of that if possible.
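
As a rough illustration of the fetch/process split and the per-stage CSV described above, here is a minimal Python sketch. The function names (`fetch_stage`, `process_stage`, `http_get`), the SHA-256 file naming, the three manifest columns, and the page-title extraction are all assumptions made for the example; a real scraper would likely add rate limiting, retries, and a proper HTML parser.

```python
import csv
import hashlib
import re
import urllib.request
from pathlib import Path


def http_get(url):
    """Default fetcher (hypothetical helper): return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_stage(urls, cache_dir, manifest_path, fetch=http_get):
    """Stage 1: save each page into cache_dir and log it in a CSV manifest."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "local_file", "status"])
        for url in urls:
            # Assumption: hash the URL to get a stable, filesystem-safe name.
            name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
            path = cache_dir / name
            if path.exists():
                status = "cached"  # already on disk, so no new request
            else:
                try:
                    path.write_bytes(fetch(url))
                    status = "fetched"
                except Exception as exc:  # record the failure and move on
                    status = f"error: {exc}"
            writer.writerow([url, name, status])


def process_stage(cache_dir, manifest_path, output_path):
    """Stage 2: parse cached files only; this stage never hits the network."""
    cache_dir = Path(cache_dir)
    with open(manifest_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["url", "title"])
        for row in csv.DictReader(src):
            if row["status"].startswith("error"):
                continue  # nothing was cached for this URL
            html = (cache_dir / row["local_file"]).read_text(
                encoding="utf-8", errors="replace")
            # Assumption: the page title stands in for "the data you want".
            m = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
            writer.writerow([row["url"], m.group(1).strip() if m else ""])
```

Running `fetch_stage` a second time with the same cache folder skips every page already on disk, and changing what `process_stage` extracts only requires re-running the second stage against the cache.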