This sounds like a great use case for a forward proxy such as Squid or Apache Traffic Server. However, I couldn't find a way in their docs to do both of the following:
* Keep a permanent history of the cached pages
* Access old versions of the cached pages (think Wayback Machine)
Does anyone know if this is possible? I could potentially mirror the pages using wget or httrack, but a forward cache seems like a better fit, since the caching would be driven by the scraper's own requests.
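To make the two requirements concrete, here is a rough sketch of the behavior I'm after, written as a tiny in-memory store. This is only an illustration: `VersionedCache`, `store`, and `get` are hypothetical names of my own, not part of Squid or Apache Traffic Server, and a real setup would persist snapshots to disk and sit behind the proxy.

```python
import bisect
from collections import defaultdict


class VersionedCache:
    """Append-only store: every fetched copy is kept, nothing is evicted."""

    def __init__(self):
        # url -> parallel lists of fetch times and bodies, kept sorted by time
        self._times = defaultdict(list)
        self._bodies = defaultdict(list)

    def store(self, url, body, fetched_at):
        """Record a snapshot of `url` taken at `fetched_at` (Unix seconds)."""
        i = bisect.bisect_right(self._times[url], fetched_at)
        self._times[url].insert(i, fetched_at)
        self._bodies[url].insert(i, body)

    def get(self, url, as_of=None):
        """Return the newest snapshot at or before `as_of` (latest if None)."""
        times = self._times.get(url, [])
        if not times:
            return None  # never fetched
        if as_of is None:
            return self._bodies[url][-1]
        i = bisect.bisect_right(times, as_of)
        # i == 0 means no snapshot existed yet at that point in time
        return self._bodies[url][i - 1] if i else None


# Hypothetical usage: two fetches of the same page at different times
cache = VersionedCache()
cache.store("http://example.com/", b"first version", fetched_at=100)
cache.store("http://example.com/", b"second version", fetched_at=200)
latest = cache.get("http://example.com/")
older = cache.get("http://example.com/", as_of=150)
```

Here `latest` is the copy from the second fetch and `older` is the copy from the first, which is the Wayback Machine style lookup described in the second bullet.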
Thanks!