There is no general solution, unfortunately. Every PDF-generating system lays out its output differently, so a parser has to be written for each type. Generated PDFs are usually very systematic, though, and the page-drawing commands are plain text, although they are normally stored in compressed streams. You can uncompress them with pdftk:
pdftk input.pdf output output.pdf uncompress
Then grep output.pdf or use whatever tools you like. Parts of the file will still be binary, such as embedded fonts and bitmaps, so with grep you will probably want the -a option to make it treat the file as text.
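If grep gets unwieldy, a few lines of Python can pull out the strings that the page commands draw. This is only a rough sketch, assuming the generator writes text as literal strings followed by the Tj (show text) operator. Many generators instead use TJ arrays, hex strings, or custom font encodings, in which case the pattern has to be adapted. The names find_shown_text and TJ_PATTERN are made up for illustration.

```python
import re

# Matches a literal string operand followed by the Tj operator,
# e.g. "(Invoice No: 1234) Tj". Backslash-escaped characters inside
# the string (such as "\(" and "\)") are allowed.
# Assumption: the PDF has already been uncompressed, e.g. with pdftk.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def find_shown_text(pdf_bytes):
    """Return the raw byte strings passed to Tj operators, in file order."""
    return [m.group(1) for m in TJ_PATTERN.finditer(pdf_bytes)]


# A hand-written fragment in the style of an uncompressed content stream.
sample = (
    b"BT /F1 12 Tf 72 720 Td (Invoice No: 1234) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Total \\(USD\\): 99.00) Tj ET\n"
)

for text in find_shown_text(sample):
    print(text)
```

For a real file, read it in binary mode (open("output.pdf", "rb").read()) and pass the bytes to find_shown_text. The strings come back still escaped and in whatever encoding the font uses, so they may not be readable as-is.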