In an earlier age, I ran everything through squid to consolidate browser caches. About five minutes after setting it up, I realised that pulling all the URLs out of the access log and then indexing the lot with htdig would be tremendously useful when I was on the road without internet access.
I spent way too much time pruning stupid crap such as Slashdot out of the results, and started to learn this 'Bayesian classifier' thing.
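For anyone curious, the log-scraping and pruning step can be sketched roughly like this. It is not the original script, just a minimal Python illustration. It assumes squid's native access.log format, where the request method and URL are the sixth and seventh whitespace-separated fields. The function name `urls_from_squid_log` and the `BLOCKED_HOSTS` set are made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist: hosts to keep out of the offline index.
BLOCKED_HOSTS = {"slashdot.org"}


def urls_from_squid_log(lines, blocked_hosts=BLOCKED_HOSTS):
    """Return unique GET URLs from squid native-format log lines, in first-seen order.

    Assumes the native access.log layout:
    time elapsed client action/code size method URL ident hierarchy/peer type
    """
    seen = set()
    urls = []
    for line in lines:
        fields = line.split()
        # Skip short or malformed lines, and anything that is not a plain GET
        # (CONNECT entries, for instance, carry host:port rather than a URL).
        if len(fields) < 7 or fields[5] != "GET":
            continue
        url = fields[6]
        host = urlsplit(url).hostname or ""
        # Drop blocked hosts and any of their subdomains.
        if any(host == b or host.endswith("." + b) for b in blocked_hosts):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The resulting list could then be written to a file and handed to the indexer as its set of start URLs. A fixed blocklist like this is exactly the hand-pruning that gets tedious fast, which is where a classifier starts to look attractive.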
Your idea is much better.