Check out the codebases of some popular parsers:
Firefox (already mentioned): https://github.com/mozilla/readability/blob/master/Readabili...
Google Chrome: https://github.com/chromium/dom-distiller
Mercury parser: https://github.com/postlight/mercury-parser