Here's a nice project idea. Visit a page several times, each time with a different set of cookies, then run OCR on each rendering to extract its text. Then compare the texts, keeping only the parts that are identical across visits. This should get rid of targeted ads and ads that vary on each visit. For the remaining ads, use a spam filter or a crowd-sourced database of ads.
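A rough sketch of the comparison step, in Python. It assumes the OCR text from each visit is already available as a plain string, and it compares line by line, which is only one possible granularity. The function name `common_lines` and the sample texts are made up for illustration. Real OCR output would probably need fuzzy matching rather than exact equality.

```python
def common_lines(visits):
    """Return the lines of the first visit that appear in every visit.

    `visits` is a list of strings, one per page visit (hypothetical input:
    the OCR text extracted from each rendering).
    """
    if not visits:
        return []
    # Collapse runs of whitespace so trivial spacing differences do not matter.
    normalised = [
        [" ".join(line.split()) for line in text.splitlines()]
        for text in visits
    ]
    # Start with the lines of the first visit, then intersect with the rest.
    shared = set(normalised[0])
    for lines in normalised[1:]:
        shared &= set(lines)
    # Keep the order of the first visit and drop blank lines.
    return [line for line in normalised[0] if line and line in shared]


# Made-up example: three visits where only the ad line differs.
visits = [
    "Breaking news\nBuy shoes now!\nThe story text.",
    "Breaking news\nCheap flights today\nThe story text.",
    "Breaking news\nThe story text.\nNew phone deals",
]
print(common_lines(visits))  # ['Breaking news', 'The story text.']
```

Ads that stay the same on every visit survive this step, which is where the spam filter or the crowd-sourced database would come in.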