It's always safer to OCR on every file. Sometimes you'll have a "clean" pdf that has a screenshot of an Excel table. Or a scanned image that has already been OCR'd by a lower quality tool (like the built in Adobe OCR). And if you rely on this you're going to get pretty unpredictable results.
It's way easier (and more standardized) to run OCR on every file, rather than trying to guess at the contents based on the metadata.