Thank you. Even with box layout, one cannot know whether there is a coherent word or phrase to extract without visually inspecting the PDF beforehand. I've been there, fighting with it right in the CLI and finding no way to even progress to a script.
The advantage of the OCR method is that it effectively performs that visual inspection. That's why it is preferable for PDFs of disparate origin.