Using the vision feature for OCR is like using an LLM for math: it might work, but we already have a lot of tools that are hyper-optimized for the task.
There is practically no chance the new feature uses vision, because that would be _insanely_ slow and expensive for any reasonably sized document. They're more likely using Azure's LayoutLM-derived tech to extract the text, then using embeddings to answer questions over it.
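That second stage is basically retrieval: embed each extracted chunk of text, embed the question, and return the closest chunk by cosine similarity. A toy sketch of the idea (the bag-of-words "embedding" here is a stand-in of my own; a real system would use a learned embedding model):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_chunk(question, chunks):
    # Return the OCR'd chunk whose vector is closest to the question's.
    q = embed(question)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

chunks = [
    "Invoice number: 4471, issued March 2023",
    "Total amount due: $1,250.00",
    "Payment terms: net 30 days",
]
print(best_chunk("what is the total amount due?", chunks))
# → Total amount due: $1,250.00
```

The point is that the expensive vision model never has to see the document at question time; everything after OCR is cheap vector math.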