The problem that you have raised here seems to be one of what alphabet or language is being used, but that issue cannot even arise without taking the interpretation into account. If you want alphabet-aware, language-aware, spelling-aware or grammar-aware operators, these will all have to be layered on top of merely byte-aware operations, and this cannot be done without taking into account the intended interpretation of the bytes sequence.
Note that it is not unusual to embed strings of one language within strings written in another. I do not suppose it would be surprising to see some French in a Russian-language War and Peace.