What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be? If different, how would it be detected or converted? What if it retuned a utf8 string OR bytes? What if that py3 function then passed it to a py2 function - would it become a string again? Would you have two string types - py2string that accepts anything and py3string that only works with utf8? How would this all work with C modules?
It would be bytes. Because py2 string === py3 bytes.
> What if that py3 function then passed it to a py2 function - would it become a string again?
Yes
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. You already have those two types in python3. bytes and string. You'd just alias those as string and utf8 or whatever you want to call it in python2.
How would this all work with C modules?
They'd have specify which mode they were working with too.
So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
It would become an absolute complete clusterfuck of corner cases that would have killed the language outright.
Yes, that's the whole point. Because compatible modes allow for a gradual transition. Which in practice allows for a much faster transition, because you don't have to transition everything at once (which puts some people off transitioning entirely - making things infinitely harder for everyone else).
Languages like Rust (editions) and JavaScript (strict mode) have done this successfully and relatively painlessly.
> So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Well yes, you'd still have to upgrade your code. That goes with a major version bump. But it would allow you to do it on a library-by-library basis rather than forcing you to wait until every dependency has a v3 version. Have that one dependency that keeping you stuck on v2: no problem, upgrade everything else and wrap that one lib in conversion code.
> Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
I'm not sure I understand the problem here. The types themselves are the same between python 2 and 3 (or could have been). It's just the labels that refer to them that are different. A subclass of string in python 2 code would just be a subclass of bytes in python 3 code.
py2 unicode == py3 str
The problem with this approach is that they wanted to reuse the `str` name, which requires a big "flag day", where it switches meaning and compatibility is effectively impossible across that boundary (without ugly hacks).
What they could have done instead would have been to just rename `str` to `bytes`, but retain a deprecated `str` alias that pointed to `bytes`.
That would keep old scripts running indefinitely, while hopefully spewing enough warnings that any maintained libraries and scripts would make the transition.
Eventually they could remove `str` entirely (though I'd personally be against it), but that would still give an actual transition period where everything would be seamlessly compatible.
Same thing with literals: deprecate bare strings, and transition to having to pick explicitly between `b"foo"` and `u"foo"`. Eventually consider removing bare strings entirely. DO NOT just change the meaning of bare strings while removing the ability to pick the default explicitly (in contrast, 3.0 removed `u"asdf"`, and it was only reintroduced several versions later).
What made me personally lose faith in the Python Core team wasn't that Guido made an old mistake a long time ago. It wasn't that they wanted to fix it. It was the absolutely bone-headed way that they prioritized aesthetics over the migration story.
> How would this all work with C modules? In non-strict mode, you'd be able to use either py2 strings or py3 bytes with these, and gradually move all modules to strict mode which requires bytes.
And then, gradually after a decade or so attempt to get rid of all py2 types.