Why? If you can count characters (code points) then it's natural that you can split or substring by characters.
Try this in JavaScript:
'안녕하세요'.substr(2,2)
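For comparison, here is a rough Python 3 sketch of the same substring taken by code point. It assumes the string is in precomposed form (one code point per syllable); the variable names are only illustrative.

```python
# Assumption: the literal is precomposed, one code point per Hangul syllable.
text = "안녕하세요"

# Python 3 str is indexed by code point, so this mirrors substr(2, 2):
# start at index 2 and take 2 code points.
start, length = 2, 2
piece = text[start:start + length]

print(len(text))  # 5
print(piece)      # 하세
```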
Internally, a fixed-length encoding is much faster for indexing than a variable-length encoding.
> Unicode does not work that way.
It DOES.
> Splitting on characters is garbage.
You messed up Unicode in Python on so many levels. Those characters you see in the Python console are actually not Unicode. They are just bytes in sys.stdout that happen to be correctly decoded and properly displayed. You should always use u'' for any kind of characters. '안녕하세요' is WRONG and may lead to unspecified behavior: it depends on your source file encoding, the interpreter encoding, and the sys default encoding. If you display the string in a console, it depends on the console encoding; if it's a GUI or HTML widget, it depends on the widget or content-type encoding.
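To make the bytes-versus-text distinction concrete, here is a minimal sketch. It assumes Python 3, where a bare literal is already Unicode text and the split is between str and bytes; in Python 2 the bare '...' literal is the byte string. UTF-8 is picked here only as an example encoding.

```python
text = "안녕하세요"           # Python 3 str: a sequence of code points
raw = text.encode("utf-8")    # bytes: what a file or terminal actually receives

print(len(text))  # 5 code points
print(len(raw))   # 15 bytes, 3 per syllable in UTF-8

# Slicing the bytes cuts through a syllable, so the fragment is not valid UTF-8.
fragment = raw[2:4]
try:
    fragment.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False
```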
> I'm not even leaving the BMP and it's broken!
Your unicode-fu is broken. It looks like your example used identical Korean strings, possibly because the ICU module in Chrome auto-normalized them for you.
> You can't split decomposed Korean on character boundaries.
In a broken Unicode implementation, like the Chrome browser's V8 JS engine.
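A small sketch of how normalization changes what a code point slice returns, using Python 3's standard unicodedata module. Treat it as an illustration under the assumption that the input may arrive in decomposed (NFD) form, not as a statement about what any particular browser does.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "안녕하세요")   # one code point per syllable
decomposed = unicodedata.normalize("NFD", composed)     # syllables split into jamo

print(len(composed), len(decomposed))  # 5 12
print(composed == decomposed)          # False: different code point sequences

# Slicing the decomposed form by code point can land inside a syllable.
print(decomposed[2:4] == composed[2:4])  # False

# Normalizing to NFC first restores the one-code-point-per-syllable layout.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
print(recomposed[2:4])         # 하세
```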
> I happen to be using Python 3. It is internally using UCS-4.
For the love of BDFL read this