So, to answer the question above: the document containing the text is sent to an off-site server (https://ocr.space/), which performs the OCR and returns the results.
I guess it defaults to English, then? In my experience, running Tesseract on Scandinavian texts gives AAO instead of ÅÄÖ if you don't supply the correct language training set. That's quite the chicken-and-egg problem: you can't identify the language without the text, and you can't get the text without identifying the right language.
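One possible way around the chicken-and-egg problem, at least with a local Tesseract install rather than the OCR.space service, is to load several language models in a single pass by joining their codes with `+`. The sketch below is only an illustration: the helper names `build_tesseract_cmd` and `ocr` are made up for this example, and it assumes the `tesseract` binary and the `swe` and `eng` language data files are installed.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, langs=("swe", "eng")):
    """Build a Tesseract CLI call that loads several language models at once.

    Hypothetical helper for illustration; not part of any library.
    """
    # Tesseract accepts multiple languages joined with '+', e.g. "-l swe+eng".
    # "stdout" as the output base makes Tesseract print the text instead of
    # writing it to a file.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def ocr(image_path, langs=("swe", "eng")):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, langs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `ocr("scan.png")` would then try Swedish and English together, so Å, Ä and Ö are at least among the characters Tesseract can output. It doesn't really solve language identification, it just narrows the guess to a short list of candidates.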