Depending on the task at hand, iterating by UTF-8 byte or by code point can make sense, too. And the definition of these is frozen regardless of Unicode version, which makes these safer candidates for "fundamental" operations. There is no right unit of iteration for all tasks.
If I want to know how much memory to allocate, bytes are it. If I want to know how much screen space to allocate, font rendering metrics are it. If I want to do word-breaking, grapheme clusters are it.
None of these are fundamental.
There are languages whose orthographies don't fit the Unicode grapheme cluster specification, but they're complex enough that I doubt there's any way to deal with them properly other than having someone proficient in them look over your text processing, or pawning it off to a library. At least with grapheme clusters your code won't choke on something as simple as unnormalized Latin text.
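To make the ambiguity concrete, here's a quick Python sketch showing how the three units diverge on exactly that kind of unnormalized Latin text. (Grapheme counting here assumes the third-party `regex` module, since the stdlib can't segment grapheme clusters.)

    import unicodedata

    import regex  # third-party; assumed installed, supports the \X grapheme pattern

    # "café" with the accent stored as a combining mark, i.e. unnormalized (NFD-style)
    text = "cafe\u0301"

    print(len(text.encode("utf-8")))        # 6 UTF-8 bytes
    print(len(text))                        # 5 code points
    print(len(regex.findall(r"\X", text)))  # 4 grapheme clusters

    # Normalizing to NFC changes the code point count but not the grapheme count.
    nfc = unicodedata.normalize("NFC", text)
    print(len(nfc))                         # 4 code points after normalization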
And the way you take it all into account is by refusing to accept any defaults. So, for example, a string type should not have a "length" operation at all. It should have "length in code points", "length in graphemes" etc operations. And maybe, if you do decide to expose UTF-8 (which I think is a bad idea) - "length in bytes". But every time someone talks about the length, they should be forced to specify what they want (and hence think about why they actually need it).
Similarly, strings shouldn't support simple indexing - at all. They should support operations like "nth codepoint", "nth grapheme" etc. Again, forcing the programmer to decide every time, and to think about the implications of those decisions.
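As a rough sketch (hypothetical names, not any real library's API), such a type could look like this in Python, with graphemes again delegated to the third-party `regex` module:

    import regex  # third-party; assumed installed


    class ExplicitStr:
        """A string wrapper with no bare 'length' or bare indexing."""

        def __init__(self, text: str) -> None:
            self._text = text

        def length_in_code_points(self) -> int:
            return len(self._text)

        def length_in_graphemes(self) -> int:
            return len(regex.findall(r"\X", self._text))

        def length_in_utf8_bytes(self) -> int:
            return len(self._text.encode("utf-8"))

        def nth_code_point(self, n: int) -> str:
            return self._text[n]

        def nth_grapheme(self, n: int) -> str:
            return regex.findall(r"\X", self._text)[n]


    s = ExplicitStr("cafe\u0301")
    print(s.length_in_code_points())  # 5
    print(s.length_in_graphemes())    # 4
    print(s.nth_grapheme(3))          # 'é' (e + combining acute accent)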
It wouldn't solve all problems, of course. But I bet it would reduce them significantly, because wrong assumptions about strings are the most common source of problems.
As for indexing, strings shouldn't require indexing, period. That's the ASCII way of thinking, especially fixed-width columns and such. You should be thinking relatively. For example: find the first space, then check that the character right after that point is a letter. When you build your code that way you don't fall into the trap of byte indexing, or the performance hit of code point indexing (UTF-8) or grapheme indexing (all encodings).
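A small Python illustration of that style (the helper name is made up for the example): walk the string with an iterator and never compute a numeric index at all.

    import unicodedata


    def letter_follows_first_space(text: str) -> bool:
        """Find the first space, then check that what follows it is a letter."""
        chars = iter(text)
        for ch in chars:
            if ch == " ":
                nxt = next(chars, None)
                # Unicode letter categories all start with "L" (Lu, Ll, Lt, Lm, Lo).
                return nxt is not None and unicodedata.category(nxt).startswith("L")
        return False  # no space found


    print(letter_follows_first_space("hello world"))  # True
    print(letter_follows_first_space("hello 123"))    # False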
Size in memory/bytes you can get trivially for any string (and this doesn't change whether you choose bytes, graphemes, code points or whatever to iterate by).
Screen space is irrelevant/orthogonal to encoding -- it's determined at the font level, and the font rendering engine that gives you the metrics will accept whatever encoding you hand it.
Exactly. That's why measurements of string length shouldn't ever assume I'm looking for a unit-of-offset for a monospaced font.
The problem is that most naive programmers think that's what a string length is and should be.
I also wish Python would expose more of the Unicode database than it does; I've had to turn to third-party modules that basically build their own separate database for some Unicode stuff Python doesn't provide (like access to the Script property).
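For example, the stdlib `unicodedata` module gives you names, categories, numeric values and normalization, but nothing for Script; the sketch below uses the third-party `regex` module (which carries its own copy of the Unicode property tables) as one possible workaround, and the exact property syntax is as I remember it rather than guaranteed.

    import unicodedata

    import regex  # third-party; assumed installed

    ch = "я"  # CYRILLIC SMALL LETTER YA

    print(unicodedata.name(ch))      # 'CYRILLIC SMALL LETTER YA'
    print(unicodedata.category(ch))  # 'Ll'
    # There is no unicodedata.script(ch); the Script property simply isn't exposed.

    # Workaround via the regex module's Unicode property support:
    print(bool(regex.match(r"\p{Script=Cyrillic}", ch)))  # True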