Though substring(m, n) still makes sense in at least interactive text manipulation: how do you do copy/paste?
Unicode is complex because the sum of all human language is complex. Short of a ground-up rewrite of the world's languages, you cannot boil away most of that complexity... it has to go somewhere.
And even if you did manage to “rewrite” the world's languages to be simple and remove accidental complexity, I assert that over centuries it would devolve right back into a complex mess again. Why? Languages represent (and literally shape and constrain) how humans think, and humans are a messy bunch of meat sacks living in a huge world rich in weird, crazy things to feel and talk about.
- Several writing systems are widely scattered across multiple ‘Supplement’/‘Extended’/‘Extensions’ blocks.
- Operators (e.g. combining forms, joiners) are a mishmash of postfix, infix, and halffix. They should have been (a) in an easily tested reserved block (e.g. 0xF0nn for binary operators, 0xFmnn for unary), so that you could parse over a sequence even if it contains specific operators from a later version — i.e. separate syntax from semantics, and (b) uniformly prefix, so that read-ahead isn't required to find the end of a sequence (and dead keys become just like normal characters).
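To make the read-ahead point concrete, here is a rough Python sketch. The helper name `naive_clusters` is made up for illustration, and the grouping rule (attach any code point with a non-zero canonical combining class to the preceding base) is a deliberate simplification, not the real grapheme cluster algorithm from UAX #29. Because combining marks are postfix, the loop cannot know a cluster is finished until it has looked at the next code point.

```python
import unicodedata


def naive_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified on purpose: joiners such as U+200D have combining class 0,
    so they are NOT handled here and end up as clusters of their own.
    """
    clusters = []
    for ch in text:
        # A non-zero combining class means "attach me to what came before",
        # so a cluster only ends once we see a code point that is not a mark.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 COMBINING ACUTE ACCENT, then a plain "a".
print(naive_clusters("e\u0301a"))
```

With uniformly prefix operators, the parser would see the operator first and know how many operands to consume, which is the design the comment above is arguing for.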
It is just that there isn't a simple 1:1 correspondence between bytes and characters and glyphs as in Unicode, so you can't just extract an arbitrary byte sequence from a string and expect it to render correctly.
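A small Python illustration of that, assuming UTF-8 as the encoding (the sample string and the `try_decode` helper are just made up for the example): cutting the bytes in the middle of a multi-byte character gives something that is not even decodable, and cutting between code points can still separate a base letter from its accent.

```python
import unicodedata

text = "naïve café"
data = text.encode("utf-8")  # "ï" and "é" each take two bytes in UTF-8


def try_decode(chunk):
    """Decode a byte slice, reporting failure instead of raising."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return "<invalid UTF-8>"


print(len(text), len(data))   # 10 code points, 12 bytes
print(try_decode(data[:3]))   # cut inside "ï": <invalid UTF-8>
print(try_decode(data[:4]))   # cut on a boundary: naï

# Slicing by code point is still not slicing by "character":
decomposed = "e\u0301"        # "e" + COMBINING ACUTE ACCENT, renders as one glyph
print(len(decomposed))        # 2
print(decomposed[:1])         # just "e", the accent is lost
print(len(unicodedata.normalize("NFC", decomposed)))  # 1 after composition
```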
There isn't a simple 1:1 correspondence between anything at all. The only definitive thing about Unicode strings is the beginning where you should start your parsing.
Then the way things are supposed to be displayed to be Unicode-compliant looks more like some virtual machine analyzing the code. How is this different from any other declarative language?
It hasn't been for 30 years.