Mercurial’s journey to and reflections on Python 3

(gregoryszorc.com)

411 points · ngoldbaum · 6y ago · 367 comments

fireattack · 6y ago · 19 in thread

I understand author's reasoning in the context of a transition, but as a "non-Latin" language user, defaulting str to unicode literals is the best change in Python 3. Coming from C#, I never got used to Python 2's approach. It's a pain in the ass working with non-Latin characters in Py2, starting with simply printing output to the console, especially on Windows.

>assuming the world is Unicode is flat out wrong

True, but Py2's approach makes lots of developers assume the world is Latin-1. I see way too many examples of things broken on a Chinese locale environment, including Python's official IDLE ([1]).

[1] https://bugs.python.org/issue15809 (Summary of this bug: in 2.x IDLE, an explicit unicode literal used to still be encoded using system's ANSI encoding instead of, well, unicode.)

int_19h · 6y ago

The most amusing quote in the entire article is this (emphasis mine):

> This ground rule meant that a mass insertion of b'' prefixes everywhere was not desirable, as that would require developers to think about whether a type was a bytes or str, a distinction they didn't have to worry about on Python 2 because we practically never used the Unicode-based string type in Mercurial.

Requiring developers to think which one it should be is, of course, the whole point of the changes in Python 3 - and it's what produces better apps that are more aware of i18n issues in general and Unicode in particular.

And the complaint doesn't even make sense if taken at face value - if all strings in Mercurial are byte strings, then what is there to think about? just use b'' throughout, no need to worry about anything else. Of course, the devil is in the details, which is reflected by the word "practically" in that sentence - this kinda implies that there are places where Unicode strings are used. At which point you do want the developers to think about bytes vs Unicode.

So the real complaint is that Python switched the defaults in a way that made bytes-centric code more complicated - because it has to be explicit now, instead of the Python 2 world, where bytes was the default, and Unicode had to be requested explicitly. Which, of course, is the right change for the vast majority of code out there, which operates at a higher level of abstraction, where "all strings are Unicode by default" is a perfectly reasonable assumption to force.

sfink · 6y ago

> And the complaint doesn't even make sense if taken at face value - if all strings in Mercurial are byte strings, then what is there to think about? just use b'' throughout, no need to worry about anything else.

The article directly answers that question. Many, many things in the standard library now only accept unicode strings, not byte strings. So a wholesale change to b'' everywhere breaks lots of stuff.

> So the real complaint is that Python switched the defaults in a way that made bytes-centric code more complicated - because it has to be explicit now, instead of the Python 2 world, where bytes was the default, and Unicode had to be requested explicitly.

Once again, the article directly states that the default is not the problem. The lack of escape hatches is. Paths are not unicode strings, and pretending they are does not work. Using bytes when you need bytes works only until you need to call a library function that only accepts strings.
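For what it's worth, Python 3's partial escape hatch for non-UTF-8 paths is the `surrogateescape` error handler, which `os.fsdecode()`/`os.fsencode()` rely on for undecodable file names on POSIX systems. A minimal sketch, assuming a UTF-8 filesystem encoding; the file name bytes are made up for illustration. It only helps as long as every API along the way accepts the resulting str, which is exactly where the comment says things break.

```python
# File name bytes that are valid Latin-1 but not valid UTF-8 (made up for illustration).
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding rejects the undecodable 0xE9 byte.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9), so the name can pass through str-only APIs...
as_text = raw_name.decode("utf-8", "surrogateescape")
print(ascii(as_text))  # 'caf\udce9.txt'

# ...and be converted back to the exact original bytes afterwards.
round_trip = as_text.encode("utf-8", "surrogateescape")
print(round_trip == raw_name)  # True
```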

markbnj · 6y ago

> if all strings in Mercurial are byte strings, then what is there to think about? just use b'' throughout, no need to worry about anything else.

The author explains later in the article that many system level python 3 apis that are important to a vcs require unicode and won't accept bytes. So apparently it wasn't as easy as just sticking 'b' in front of every literal.

phkahler · 6y ago

>> So the real complaint is that Python switched the defaults in a way that made bytes-centric code more complicated

The author made it clear. The issue wasn't just that the default changed. It was that 3.0 took away the ability to always make your choice explicit.

Changing the default would have no effect on code that was always explicit. Going over the code and making all implicit strings explicit would allow them to know when they had full coverage, and also make the code work with both 2 and 3.

With 3, any implicit string had to get b added, while any string with u had to be made implicit (drop the u). You couldn't tell by looking at code if it was converted or not. At least that's how I read it.

epage · 6y ago

Another reason the complaint doesn't make sense is that the author then praises Rust which is more similar to Python 3 than 2.

the_mitsuhiko · 6y ago

> but as a "non-Latin" language user, defaulting str to unicode literals is the best change in Python 3

I'm also a "non-latin" user and I will keep repeating this point ad nauseam: there would have been many strictly superior solutions to solving this problem and most of them would have been closer to what we had in Python 2 than 3.

Both rust and go decided to go with Unicode support that is largely based around utf-8 with free transmutations to bytes and almost free conversions from bytes. This could have been Python 3 but that path was dismissed without a lot of evidence.

A Unicode model that was a bad idea in 2005 was picked, and we now have it in 2020, where it's a lot worse because, thanks to emojis, we are now well outside the Basic Multilingual Plane.

harikb · 6y ago

> Both rust and go decided to go with Unicode support that is largely based around utf-8 with free transmutations to bytes and almost free conversions from bytes. This could have been Python 3 but that path was dismissed without a lot of evidence.

Both of those are newer languages that happened to take a stance from day one. So not quite comparable.

That said, UTF-8 is one of the best pragmatic solutions to this Unicode problem. Most engineers I meet who throw their hands up in the air complaining about Unicode haven't read the simple Wikipedia page for utf-8.

Python 2 was already halfway there; they just had to tweak a few places where bytes are converted to strings. Of course this is easier for newer languages to solve. We can't blame Python for having to provide backward compatibility.

PS: I also blame all the "encoding detection" libraries which exist to try to solve an unsolvable problem. Nobody can detect an encoding, at least not reliably. If these half-assed libraries did not exist, people would have finally settled on UTF-8 and given up on others by now.

lmm · 6y ago

What do you mean by "free"? Rust requires you to explicitly convert a string to bytes or vice versa, no? Which is pretty much what you do in Python - the only difference I can see is that you have shortcut methods to encode/decode using UTF-8, but semantically they're no different from encode/decode in Python.

I'm pretty dubious about specifying that the internal representation must be UTF-8. That's a failure of abstraction (because the program shouldn't know or care what the internal representation is), leads to inherent performance/interop problems on several compile targets (Windows, the JVM, Javascript), and seems to imply that Han unification is forced at the language level.

Sohcahtoa82 · 6y ago

Do you mean that if you have bytes, but you want to send them to a function that expects a string, then it would automatically interpret the bytes as UTF-8?

If so, that violates the "Explicit is better than implicit" part of the Zen of Python. Encoding/Decoding bytes to/from strings shouldn't happen automatically because doing so means you have to make an assumption about the encoding.

takeda · 6y ago

IMO Python is doing exactly the same thing that Go does (I know too little about Rust to comment) the only difference is that Python respects the LANG variable while Go is just fixed on using UTF-8.

earthboundkid · 6y ago

Python 2's approach was bad, no argument, but the transition plan for 2-to-3 just didn't work. They thought everyone would run 2to3 in a big bang, and then we'd all switch over to 3 in a few years. Instead it dragged out over a decade because in reality we needed to write code that was compatible with both 2 and 3 (the "6" approach) until enough things were on 3 to drop 2 support.

Hindsight is 20/20 naturally, but in retrospect, they should have just made `bytes` into the name for old `str` and used `from __future__ import` to create a gradual system for moving from 2 to 3 instead of a big bang "we'll break everything once and then never again".

cbsmith · 6y ago

I'm not sure they really thought 2to3 would be used for a big bang. I seem to recall the general initial messaging was that Python 3 was a new language and you would need to do a language port to get to it.

kibwen · 6y ago

> I understand author's reasoning in the context of a transition, but as a "non-Latin" language user, defaulting str to unicode literals is the best change in Python 3

I think this is misreading the author's criticism. The fact that string literals are now Unicode is not the fundamental problem; the fact that standard library APIs that formerly took bytes now incorrectly take Unicode strings is the problem.

IMO it's great that the world is moving towards opaque blobs of Unicode for strings, but that requires understanding when something shouldn't simply be a string in the first place (for reasons of legacy or otherwise).

fireattack · 6y ago

My comment is about this sentence:

>Perhaps my least favorite feature of Python 3 is its insistence that the world is Unicode

>standard library APIs that formerly took bytes now incorrectly take Unicode strings

What do you mean by "incorrectly"?

branko_d · 6y ago

Just beware that C# is not exactly "Unicode" either.

C# char is a UTF-16 code unit, not a Unicode code point.

Most code points "fit" into just one UTF-16 code unit, but not all.

For example: 𝐀 ("Mathematical Bold Capital A", code point U+1D400) is encoded in UTF-16 as a surrogate pair of code units: U+D835 and U+DC00. So reversing "x𝐀y" should produce "y𝐀x" ("y\ud835\udc00x") - note how U+D835 and U+DC00 were not reversed in the result.
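The difference can be reproduced from Python 3, whose str is indexed by code point, by looking at the UTF-16 code units a C# string would hold. A minimal sketch; reversing a list of 2-byte code units stands in for reversing a C# `char[]`.

```python
s = "x\U0001D400y"  # "x𝐀y"; U+1D400 lies outside the Basic Multilingual Plane

# Python 3 str is indexed by code point: 3 elements.
print(len(s))  # 3

# UTF-16, the representation behind a C# string, needs a surrogate pair for U+1D400.
units = s.encode("utf-16-le")
code_units = [units[i:i + 2] for i in range(0, len(units), 2)]
print(len(code_units))  # 4

# Reversing code points keeps U+1D400 intact.
print(s[::-1] == "y\U0001D400x")  # True

# Reversing UTF-16 code units (like reversing a C# char[]) swaps the
# high and low surrogates, which yields ill-formed UTF-16.
naive = b"".join(reversed(code_units))
try:
    naive.decode("utf-16-le")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False
print(naive_is_valid)  # False
```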

ygra · 6y ago

C# isn't exactly quiet about this property, and yes, it can be annoying from an API perspective, but in C# this was likely a pragmatic choice to remain compatible (and familiar) with C++, COM, etc. where most developers would be coming from.

API members that operate on code points universally take a string and an index.

That being said, treating strings as arrays of characters is fraught with peril in most cases anyway. You can't trivially reverse strings in any encoding, as you need to reverse the sequence of grapheme clusters (to account for diacritics, etc.). You can't trivially truncate strings either, for pretty much the same reason. You can't trivially grab a single character from the middle of a string, again, for the same reason. So basically, indexing, reversing, truncating, copying a subsequence, etc. are all not trivially possible regardless of the encoding. UTF-16 is not the main problem here, as even in UTF-32 it'd be broken.
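The point that code points are not the right unit either can be sketched in Python 3, where str is effectively a sequence of code points (much like UTF-32). A minimal example using a decomposed "é"; the standard library does not segment grapheme clusters, so NFC normalization is shown only as a partial workaround.

```python
import unicodedata

decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
print(len(decomposed))  # 5 code points, but 4 user-perceived characters

# Reversing code points moves the accent to the front, detached from its 'e'.
reversed_cp = decomposed[::-1]
print(unicodedata.name(reversed_cp[0]))  # COMBINING ACUTE ACCENT

# Truncating by code point silently drops the accent.
truncated = decomposed[:4]
print(truncated)  # cafe

# NFC composes 'e' + U+0301 into U+00E9, which fixes this particular case,
# but many grapheme clusters have no precomposed form to fall back on.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))  # 4
```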

ak217 · 6y ago

I think the actual pain in Python 2 came from the misguided decision not to adopt UTF-8 as the default character encoding, combined with silent coercion between unicode/bytes whenever needed. Those two features in combination made Python brittle and dangerous when handling non-ascii characters, not the "strings are bytes" default.

Making strings Unicode by default is wonderful compared to the alternatives (and OP's assertion that this amounts to "assuming the world is Unicode" is disingenuous: there's nothing stopping programs from handling bytes correctly - Python 3 merely resolved the ambiguity).

kibwen · 6y ago

> I think the actual pain in Python 2 came from the misguided decision not to adopt UTF-8 as the default character encoding

The decision of a default encoding surely dates back to Python 1.0 or earlier, which predates not just UTF-8 but even Unicode itself. Python is an old language!

And if the assertion is that Python 2.0 should have made the tumultuous Unicode jump when it released in 2000, I could get behind that (especially in retrospect!), but enthusiasm for both Unicode and UTF-8 was not nearly as high then as it is today, so I don't begrudge them for not jumping at the opportunity.

im3w1l · 6y ago

To be fair, IDLE is pretty garbage in most ways.

ploxiln · 6y ago · 17 in thread

I've been involved in multiple non-trivial libraries and frameworks that supported both python2 and python3 for many years with the same codebase ... and it really wasn't anything like this. The python3 "adaptation" effort for mercurial was just bungled by multiple terrible decisions.

First was the idea that normal feature contributors should not see any b"" or any sign of python3 support for the first couple years of the effort. Huge mistake. You need some b"".

But you don't need all b"" everywhere. That was the second huge mistake. Don't just convert every natural string in the whole codebase to b"". The natural string type is the right type in many places, both for python2 (bytes-like) and python3 (unicode-like). The helpers for converting kwargs keys to/from bytes are a sign that you are way off track. This guy got really hung up on the fact that the python2 natural string type is bytes-like, tried to force explicit bytes everywhere (dict keys, http headers, etc.), and was really tilting at windmills for most of these past 5 years.
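The kwargs helpers mentioned here exist because of a concrete Python 3 rule: keyword argument names must be str, so a dict with bytes keys cannot be splatted with `**`. A minimal sketch; `greet` and `str_kwargs` are hypothetical names for illustration, not Mercurial's actual helpers, and decoding keys as Latin-1 is an assumption chosen because it never fails.

```python
def greet(name, greeting="hello"):
    return f"{greeting}, {name}"

# Keyword arguments with bytes keys, as an all-bytes codebase might build them.
options = {b"greeting": "hi"}

try:
    greet("hg", **options)
    bytes_keys_ok = True
except TypeError:
    bytes_keys_ok = False  # Python 3: keyword names must be str

def str_kwargs(kwargs):
    """Hypothetical helper: decode bytes keys to str before a ** call."""
    return {k.decode("latin-1"): v for k, v in kwargs.items()}

print(bytes_keys_ok)                       # False
print(greet("hg", **str_kwargs(options)))  # hi, hg
```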

Yes, you pretty much had to wait for python-3.4 to be released and for python-2.6 to be mostly retired in favor of python-2.7. Then, starting in early 2014, it was pretty straightforward to make a clean codebase compatible with python-2.7 and python-3.4+, and I saw it done for Tornado, paramiko, and a few other smaller projects.

pdonis · 6y ago

> The natural string type is the right type in many places

For many programs, yes. Not for a revision control system that needs to be sure it's working with the exact binary data that's stored in the repository. Repository data is bytes, not Unicode.

I think this article is an excellent illustration of the Python developers' failure to properly recognize this use case in the 2 to 3 transition.

jharsman · 6y ago

I was an early adopter of Mercurial and the team's insistence that file names were byte strings was the cause of lots of bugs when it came to Unicode support.

For example, when I converted our existing Subversion repository to Mercurial I had to rename a couple of files that had non-ASCII characters in their names because Mercurial couldn't handle it. At least on Windows, file names would be broken either in Explorer or in the command line.

In fact I just checked and it is STILL broken in Mercurial 4.8.2, which I happened to have installed on my work laptop with Windows. Any file with non-ASCII characters in the name is shown as garbled in the command line interface on Windows.

I remember some mailing list post way back when where mpm said that it was very important that hg was 8-bit clean since a Makefile might contain some random string of bytes that indicated a file and for that Makefile to work the file in question had to have the exact same string of bytes for a name. Of course, if file names are just strings of bytes instead of text, you can't display them, or send them over the internet to a machine with another file name encoding or do hardly anything useful with them. So basic functionality still seems to be broken to support unix systems with non-ascii filenames that aren't in UTF-8.

masklinn · 6y ago

> For many programs, yes.

For all programs, for the simple reason that:

> Various standard library functionality now wanted unicode str and didn't accept bytes, even though the Python 2 implementation used the equivalent of bytes.

Much of the stdlib works with native strings and will either blow up or misbehave if fed anything else[0], which means much of your codebase will necessarily be native strings, with a subset being explicitly bytes or unicode.

> Repository data is bytes, not Unicode.

It's also mostly absent from the source code, and where it is present (e.g. placeholders or separators) it's easy to flag as explicitly bytes.

[0] though some e.g. the encoding layers or io module want either bytes or unicode depending what you're doing specifically, and not always the most sensible, like baseXY being bytes -> bytes conversions where 95% of the use case is to smuggle binary data through text… oh well
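The baseXY footnote is easy to see in practice: `base64.b64encode` takes bytes and returns bytes, so embedding the result in text needs one more explicit decode. A minimal sketch with an arbitrary two-byte payload.

```python
import base64

payload = b"\x00\xff"  # arbitrary binary data

encoded = base64.b64encode(payload)
print(encoded)  # b'AP8='  (the result is bytes, not str)

# Putting it into text requires an explicit decode step.
line = "data: " + encoded.decode("ascii")
print(line)  # data: AP8=

# b64decode accepts either bytes or an ASCII-only str.
print(base64.b64decode(line[len("data: "):]) == payload)  # True
```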

ploxiln · 6y ago

Repository data bytes do not show up as string literals in your code, or as keyword argument names, or as http header names. The vast majority of code involved in this struggle is misc business logic, not the repository-tracked file contents themselves.

takeda · 6y ago

The rule of thumb (not just for Python, but anything that deals with encoding) is to convert between bytes and text only at the boundaries of your program (reading/writing files, sending/receiving data over the network, etc.). It applies to everything, including tools like this. If you follow it, your life will be simpler.

You just need to be aware that in some cases the work is already done for you by the language; for example, in Python, if you open a file without the "b" flag, Python does the translation on the fly and you don't need to worry about it.
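This pattern is often called the "Unicode sandwich": decode once on the way in, work with str inside, encode once on the way out. A minimal sketch; the sample text and the temporary file are illustrative, and passing `encoding="utf-8"` to `open()` is a choice made here so the result does not depend on the locale's default encoding.

```python
import os
import tempfile

# Boundary in: bytes (e.g. from a socket) are decoded once.
raw = "naïve café".encode("utf-8")
text = raw.decode("utf-8")

# Inside the program everything is str.
shouted = text.upper()

# Boundary out: encode once when leaving the program.
out = shouted.encode("utf-8")

# Text-mode open() performs the same encoding step for you.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(shouted)
    with open(path, "rb") as f:  # binary mode hands back raw bytes
        on_disk = f.read()

print(on_disk == out)  # True
```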

cbsmith · 6y ago

To be fair... the problem was more in Python 2, where this stuff was often conflated. Python 3 really just brought the problem into stark relief.

TBH I do think the problem is easier to address in a statically typed world.

speedplane · 6y ago

> I think this article is an excellent illustration of the Python developers' failure to properly recognize this use case in the 2 to 3 transition.

The entire 2 to 3 transition is an excellent illustration of Python developers failing to properly recognize the challenges in transition. What other popular language intentionally broke backwards compatibility? It's hard to think of any.

Python set the entire community back 10 years or more by making this drastic mistake.

simias · 6y ago

It might be my own pro-typed-language bias showing but this migration from byte strings to unicode strings is really where dynamically typed languages really don't shine.

If we imagine an alternative reality where Rust started only with byte-strings and added unicode as an afterthought like Python did, you'd definitely face a massive amount of churn, but at least the compiler would yell at you every time you pass a byte string where unicode is expected and vice versa. Once you have fixed all of the errors, there's a good chance that, in the vast majority of cases, your program would work again. It would be very annoying but at least you know clearly where the problems occur.

In Python on the other hand this type of code refactoring is very painful in my experience. You may end up with the same function being called sometimes with unicode and sometimes with bytes. And then you have to look at the call stack to figure out where it comes from. And then you realize that you end up with, say, a list of records which sometimes contain unicode and sometimes byte arrays depending on whether the code that updated them used the old or the new version etc...

And if it turns out that you can't easily reproduce the problem and you just get a bug report sent from somewhere in production then Good Luck; Have Fun.

monoideism · 6y ago

> added unicode as an afterthought like Python did

I agree with you on the benefits of static typing, but let's be clear: Python didn't add unicode as an "afterthought". The initial release of Python predates the initial release of the Unicode standard by almost a year.

Furthermore, even if this were not the case, it took a while before Unicode got any significant adoption among programming languages, well after the release of Python 1.0. I think Java in 1996 was the first language to adopt Unicode.

dkarl · 6y ago

> First was the idea that normal feature contributors should not see any b"" or any sign of python3 support for the first couple years of the effort. Huge mistake. You need some b"".

When I read that, I was angry on behalf of the people doing the porting work who had their hands tied by it, and I was angry on behalf of the Mercurial developers who, I think, must have been underestimated. It's normal that platforms don't stand still and coding standards on a project evolve over time. Obviously it's not going to fly for open source contributors to be "voluntold" to do porting work, but to be aware of it and accommodate it and know enough about the new platform to mostly avoid creating new work for the porters seems like a small and reasonable ask, especially when you compare it to the effort required to make high-quality contributions in the first place.

I get that there are people who are bitter to this day about Python having a version 3, but surely by 2017 the vast, vast majority of developers who were going to rage quit the Python community over it were already gone.

mixmastamyk · 6y ago

Yes, I was really surprised that they avoided upgrading to Python 2.7-level best practices and future statements for as long as they did and tried to hide it from most developers thru custom compatibility layers. Huh? That's step 0, getting except, stdlib imports, and print statements up to date. Folks can deal with that, that's the easy part.

Keeping blame details (and line-lengths, ha!) was given as the excuse and that is a nice feature and all. However they could have copied the repo over before porting to keep that information and saved time. Wouldn't be surprised if it was eventually lost anyway.

indygreg2 · 6y ago

The late start was mostly due to having to retain Python 2.4/2.5 compatibility until May 2015 and it was literally impossible to use some future statements or some Python 3 syntax until 2.6 was required. I have updated the post to reflect this.

CJefferson · 6y ago

Interesting you mention http headers. I had a program converted from Python 2 to Python 3 which was crashing occasionally, and it turned out it was because I was being sent an http request which wasn't valid unicode, so decoding failed.

I had to switch back to treating headers as bytes for as long as possible.

It is a stupid client which doesn't send valid ascii for http headers of course.

takeda · 6y ago

I believe the headers are encoded using ISO-8859-1 not Unicode. That encoding has 1:1 mapping with bytes so wouldn't break this way. Treating them as UTF-8 was the bug.
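The 1:1 property is easy to check: ISO-8859-1 (Latin-1) maps each byte 0x00-0xFF to code point U+0000-U+00FF, so decoding never fails and always round-trips, while strict UTF-8 decoding can fail on the same input. A minimal sketch with a made-up header value; the trade-off is that UTF-8 text decoded this way shows up as mojibake until it is decoded properly.

```python
# Latin-1 maps each byte 0x00-0xFF to code point U+0000-U+00FF,
# so decoding never fails and always round-trips.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin-1").encode("latin-1") == all_bytes)  # True

# A header value that is not valid UTF-8 (the trailing 0xFF byte).
header = b"X-Name: caf\xc3\xa9\xff"

try:
    header.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False

as_text = header.decode("latin-1")  # cannot raise
print(as_text.encode("latin-1") == header)  # True
```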

dnautics · 6y ago

> It is a stupid client which doesn't send valid ascii for http headers of course.

...or a smart malicious actor.

utxaa · 6y ago

> But you don't need all b"" everywhere.

as a mercurial user i never understood this decision. for instance look at this recent commit: https://www.mercurial-scm.org/repo/hg/rev/b4c82b704180

would anyone disagree with the fact that an error message should be a string?

a source transformer to add b'' all over the place? really?

and i still don't understand why the hg transition had to be more complex than: https://docs.djangoproject.com/en/1.11/topics/python3/

... and of course now this: https://www.mercurial-scm.org/wiki/OxidationPlan

i wonder what matt mackall thinks of all these developments?

skywhopper · 6y ago

Why are you so certain about your assertions here about when they did and did not need to use explicit byte strings?

nemothekid · 6y ago · 11 in thread

> Perhaps my least favorite feature of Python 3 is its insistence that the world is Unicode. [..] However, the approach of assuming the world is Unicode is flat out wrong and has significant implications for systems level applications (like version control tools).

Isn't this more a problem with Python not easily differentiating between String and Byte types? Both Go and Rust ("""systems""" level languages) have decided that "utf-8 ought to be enough for anybody" and that seems to be a good decision.

Jasper_ · 6y ago

Yes, but that insistence that Bytes and Unicode are two different things that Shall Not Be Mixed was mostly a Python 3-ism. Python 2 had different types but you could be sloppy and it would kinda work out.

There was this assumption that Unicode code points were the correct single unit to talk about Unicode. You iterate over code points, you talk about string lengths in terms of code points, you slice in terms of code points. Much like the infamy of 16-bit Unicode, this is an assumption that has kinda gotten worse over time. Now we can and do want to talk about bytes, code points, and newer sets like extended grapheme clusters. I think this is probably the big failing of Python 3's Unicode model. Making a string type operate on extended grapheme clusters might fix it, but we'd be in for the same sort of pain, and the flexibility of "everything is bytes, we can iterate over it differently" of Go and Rust is much nicer in comparison.
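The mismatch between units is visible with a single emoji plus a skin-tone modifier: one user-perceived character, two code points, eight UTF-8 bytes, four UTF-16 code units. A minimal Python 3 sketch.

```python
thumbs = "\U0001F44D\U0001F3FD"  # THUMBS UP SIGN + EMOJI MODIFIER FITZPATRICK TYPE-4

print(len(thumbs))                           # 2 code points
print(len(thumbs.encode("utf-8")))           # 8 UTF-8 bytes
print(len(thumbs.encode("utf-16-le")) // 2)  # 4 UTF-16 code units

# A user sees one character (one extended grapheme cluster), yet slicing
# by code point can split it and drop the skin-tone modifier.
print(thumbs[:1] == "\U0001F44D")  # True
```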

The second thing was this assumption that everything remotely looking like text was Unicode, despite this maybe not being true. HTTP has parts that look like plain text, like "GET" and "POST" and the headers like "Content-Type: text/html". But the correct way to view this is as ASCII bytes, and no other encoding makes sense; binary data intermixed with "plain text" definitely happens, and the need to pick and choose between either Unicode or Bytes caused major damage in the standard library which still persists to this day -- some parts definitely chose the wrong side. Take a look at the craziness in the "zipfile" module for one other example. It's probably fixed now, but back then, I basically had to rewrite it from scratch in one of my other projects.

They eventually relented and added back a lot of the conveniences to blur the line between bytes and unicode again, like adding the % formatting operator for bytes, which I think shows that their insistence on separating the two didn't really pan out in practice. And yet, migration is still a pain.
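The `%` operator for bytes referred to here arrived with PEP 461 in Python 3.5, and it makes protocol-style output writable without a detour through str, while still refusing a str argument. A minimal sketch with made-up HTTP lines.

```python
# %-formatting on bytes (PEP 461, Python 3.5+).
status = b"HTTP/1.1 %d %s\r\n" % (200, b"OK")
header = b"%s: %d\r\n" % (b"Content-Length", 42)
print(status + header)  # b'HTTP/1.1 200 OK\r\nContent-Length: 42\r\n'

# The line is still enforced: %s in a bytes format refuses a str argument.
try:
    b"%s" % ("text",)
    str_arg_ok = True
except TypeError:
    str_arg_ok = False
print(str_arg_ok)  # False
```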

int_19h · 6y ago

> Python 2 had different types but you could be sloppy and it would kinda work out.

It would "kinda work out", if your Unicode strings were ASCII in practice, and only then. Because whenever a Unicode and a non-Unicode string had to be combined, it used ASCII as the default encoding to converge them.

Which is to say, it only worked out for English input, and even then only until the point where you hit a foreign name, or something like "naïve". Then you'd suddenly get an exception - and it happened not at the point where the offending input was generated, but at the point where two strings happened to be combined.

This was a horrible state of affairs for basically everybody except the English speakers, because there was a lot of Python code out there that was written against and tested solely on inputs that wouldn't break it like that.
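The failure mode described above can be contrasted with Python 3's behavior in a few lines. A minimal sketch; the Python 2 behavior is only described in the comments, since the code runs on Python 3.

```python
name = "naïve"
raw = name.encode("utf-8")  # b'na\xc3\xafve'

# The Python 2 equivalent, u"Hello, " + raw, implicitly decoded raw as ASCII
# and failed only because of the non-ASCII byte. Python 3 refuses to mix
# str and bytes at all, regardless of content.
try:
    combined = "Hello, " + raw
except TypeError:
    combined = None
print(combined)  # None

# The implicit ASCII decode Python 2 relied on is exactly what breaks here:
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False

# Decoding explicitly with the right codec works.
print(raw.decode("utf-8") == name)  # True
```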

Intermixing binary data with text can be represented just fine in a type system where the two are different. For your HTTP example, the obvious answer is that the values that are fundamentally binary, like the method name or the headers, should be bytes, while the parts that have a known encoding should be str - there's nothing there that requires actually mixing them in a single value. In those very rare cases where you genuinely do have something like Unicode followed by binary followed by Unicode in a single value, that is trivially represented by a (str, bytes, str) tuple.

The problem with the Python stdlib isn't that bytes and Unicode are distinct. It's that it's overly strict about only accepting Unicode in some places where bytes should be legal, too. This is orthogonal to them being separate types.

hsivonen · 6y ago

> There was this assumption that Unicode code points were the correct single unit to talk about Unicode.

The most messed-up thing about Python 3 is that it's supposed to be justified by doing Unicode right and they still got it wrong.

Having strings be sequences of Unicode code points is a super-bizarre design. That is, Python 3 strings indeed are semantically sequences of Unicode code points rather than sequences of Unicode scalar values. You can not only materialize lone surrogates (defensible for compatibility with UTF-16) but you can also materialize surrogate pairs in addition to actual astral characters. You still can't materialize units that are above the Unicode range, though, so it's not like C++'s std::u32string.

Looking at the old PEPs, it appears to have arisen by accident rather than as an actual design.
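Both behaviors described above (lone surrogates and surrogate pairs living alongside real astral characters) can be reproduced directly. A minimal Python 3 sketch; `surrogatepass` is a standard codec error handler, used here only to show what bytes such a string would produce.

```python
pair_as_two = "\ud835\udc00"  # a surrogate pair written as two code points
astral = "\U0001D400"         # the real character U+1D400

print(len(pair_as_two), len(astral))  # 2 1
print(pair_as_two == astral)          # False: str does not merge the pair

lone = "\ud800"  # a lone surrogate is also a legal str value...
try:
    lone.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False  # ...but strict UTF-8 refuses to encode it
print(utf8_ok)  # False

# The surrogatepass handler writes surrogates out anyway, producing
# ill-formed UTF-8 (two 3-byte sequences instead of one 4-byte one).
print(pair_as_two.encode("utf-8", "surrogatepass"))  # b'\xed\xa0\xb5\xed\xb0\x80'
print(astral.encode("utf-8"))                        # b'\xf0\x9d\x90\x80'
```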

joshuamorton · 6y ago

I'm confused; there isn't an insistence that everything is unicode. Http headers are treated as bytes before you decode them, but you can totally decode an http request or response as ASCII. At least until you're interacting with a website that has unicode codepoints in its url.

takeda · 6y ago

> Yes, but that insistence that Bytes and Unicode are two different things that Shall Not Be Mixed was mostly a Python 3-ism

Go has string and []byte, and you can't mix them; you have to cast. Java has String, char[] and byte[] and similarly you need to cast. Rust has Bytes and String (I don't know Rust enough, but I'm pretty sure it doesn't do implicit conversion between them).

Also, Python 3 doesn't distinguish between Bytes and Unicode; Python 3 distinguishes between bytes and text (str - BTW: Guido actually expressed regret that he used "str" instead of "text", because "text" would be much clearer).

In Python 3 you don't have Unicode (as far as you should be concerned), you have text and bytes. How the text is stored internally is an implementation detail; if you need to write to a file or to the network, you encode the text using some encoding (the most popular is UTF-8), and you decode it back when reading.

toyg · 6y ago

Zipfile has always been a mess. I have no idea why, but its interfaces have been consistently poor from a usability perspective. This was well before py3 was a factor.

steveklabnik · 6y ago

The blog post talks about this a bit in Rust, but we don't actually say that. We do make that the default, but we also give you the ability to get at the underlying things as well. There's a lot of interesting work here, actually, like WTF-8...

kevingadd · 6y ago

In the wild WTF-8 and its 16-bit equivalent show up more often than you'd expect. I ended up discovering a case recently where part of the .NET executable file format is actually encoding strings as WTF-16 (not UTF-16) and any internal lowering needs to map them to WTF-8 instead of UTF-8. Until that point I had expected to only ever encounter WTF-8 in web browsers!

vmchale · 6y ago

> Both Go and Rust ("""systems""" level languages) have decided that "utf-8 ought to be enough for anybody" and that seems to be a good decision.

When working with e.g. filepaths, Rust has an OsStr type.

dilap · 6y ago

A go string is just a sequence of bytes, which is usually/by convention utf8. But you can store anything you want in there, if necessary.

bjoli · 6y ago

I would say that it is just shitty design to not differentiate between bytestrings and regular strings in a way that causes problems. The biggest design flaw here was not forcing people to understand the difference in python2.

peatmoss · 6y ago · 9 in thread

I think my takeaway lesson is that it’s very hard to introduce large, breaking changes to a language and not alienate a large proportion of existing users. I don’t know that there’s a right way.

I look at Perl, which was a juggernaut when I first used Python, and announcements of Perl 6 certainly didn’t help Perl’s slide. Often cited is the fact that Perl 6 is a totally different language unrelated by anything but creator and name. The Perl brand was not enough to carry the bulk of Perl users from Perl 5 to Perl 6. Perl 6 is now called Raku, which probably better reflects the magnitude of the change.

On the other hand Python 3 is a small but still significant departure from Python 2. If they’d called Python 3 something else, we’d probably be griping about how superficially different from Python 2 it was without bringing substantially new ideas.

Oddly my feeling is that Racket, in its departure from mainline Scheme, largely did retain its core audience, but that may have been a feature of its usage in academia.

Fast forward to last year when a prominent Racket architect announced “Racket 2” which would completely change the syntax of the language. Prominent community members reacted negatively, due to fears of Perl 6’s fate. But now they’ve decided to simply call the new research language Rhombus and have reiterated plans to continue supporting Racket. I went from feeling very negative to the change to being okay with the direction.

I’m not sure there are lessons to draw, other than noting that version bumping versus making a new language with a new name can be bad for entirely different reasons.

intrepidhero6y ago

I'm not well informed about the breaking changes made under the hood in Python 3. But I wonder if breaking backwards compatibility in this case wasn't simply an externalization of costs. The community certainly went through a TON of work to port libraries, etc. Could all those man-hours have gone towards making incremental, backwards-compatible changes instead?

I think the takeaway is that if you want to make breaking changes, make a new thing and turn the old thing over to a maintenance team. If, after a while, you learn things that could improve the old thing, see whether they can be incorporated without breaking compatibility, or whether the new thing is really so much better that people will switch.

b2gills6y ago

Everything that I've heard that was changed from Python 2 to 3 are reminiscent of things that Perl5 handled while maintaining backwards compatibility.

I mean Perl5 is still mostly backwards compatible back to the original version released in 1987. (There were a few rarely used bad features that should have never been there which have been removed.)

The way it does this is by having you specifically ask for the new features, if they would otherwise break code.

edflsafoiewq6y ago

I think that's the wrong takeaway, especially since it kind of absolves Python of doing anything wrong. There were many examples of ways that this was unduly hard just because of how poorly the transition was designed for.

> While hindsight is 20/20, many of the issues with Python 3 were obvious at the time and could have been mitigated had the language maintainers been more accommodating - and dare I say empathetic - to its users.

mikepurvis6y ago

The author's suggestion of permitting a certain set of "from __past__" imports seems especially astute. This would have made it much more possible much earlier to have a single large codebase running natively on the Python 3 interpreter, but with modules (especially leaf modules) at varying degrees of ported-ness.

In contrast, the original porting guidance for module authors was actually to maintain the Python 2 source as the master copy, and use 2to3 to transform it for running tests or cutting a Python 3 release. How is a transition ever supposed to happen if the new hotness is perpetually a second class citizen?

peatmoss6y ago

Hmm, I didn’t mean to convey that the Python 2/3 transition was done well. I think lots of projects have tried to port their community success to substantially different projects, and that most have failed for a variety of different reasons.

Python 3 is a “success” in that a lot of people have moved. But it was, as you rightly point out, a hard won victory that left a lot of people unhappy.

tus886y ago

> introduce large, breaking changes without major benefits

FTFY

lizmat6y ago

Re: Often cited is the fact that Perl 6 is a totally different language unrelated by anything but creator and name

Indeed. That is why Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

I'm not sure what lessons can be drawn from this, other than being indecisive has its price.

mark_l_watson6y ago

I totally agree with you, I am uncomfortable with Racket 2 having a non-Lisp syntax. As someone who has used Lisp languages to get stuff done for over 30 years, I would say NO to the syntax change.

That said, Racket is open source, and the maintainers have good reasons for a change based on getting a larger user base. I wish them great success.

peatmoss6y ago

I disliked the “Racket2” name because it strongly implied that Racket 1 (the one with parentheses on the outside and few commas) was the past, and that something with a vaguely Algol-ish syntax was the future.

But Racket as a project was always about language experiments. I certainly didn’t bristle at Typed Racket or any of the other languages. And so, in changing “Racket 2” to “Rhombus” and committing to mainline support of Racket, I feel pretty comfortable with the direction. I find it fascinating that I feel this way given that nothing has really changed but the name.

war10256y ago· 8 in thread

We are on the brink of completing the transition to python3 at my work.

The end result of this is that I just spent a good chunk of last week reviewing a pull request with 70,000 lines of changes, which was one of the final in a series of ~10k line pull requests that came in through the fall.

All of this was the heroic effort of one of my coworkers who had the unenviable task of combing through our entire codebase to determine "This is unicode. This is bytes. Here is an api boundary where we need to encode / decode." etc.

It was a nightmare of effort that I'm glad to have behind us.
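The "this is unicode, this is bytes, here is the boundary" audit tends to converge on a layout often called the "Unicode sandwich": decode bytes once where they enter, work with `str` inside, and encode once where they leave. A minimal sketch, assuming UTF-8 input; both functions and the author-counting task are hypothetical.

```python
def count_authors(raw: bytes) -> dict:
    # Boundary in: decode once, at the point where the bytes arrive.
    text = raw.decode("utf-8")
    counts = {}
    for line in text.splitlines():
        author = line.strip()
        if author:
            counts[author] = counts.get(author, 0) + 1
    return counts


def render(counts: dict) -> bytes:
    # Boundary out: encode once, right before the bytes leave.
    lines = ["%s\t%d" % (name, n) for name, n in sorted(counts.items())]
    return ("\n".join(lines) + "\n").encode("utf-8")


report = render(count_authors("Zo\u00eb\nMatt\nZo\u00eb\n".encode("utf-8")))
print(report)  # b'Matt\t1\nZo\xc3\xab\t2\n'
```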

vmchale6y ago

> All of this was the heroic effort of one of my coworkers who had the unenviable task of combing through our entire codebase to determine "This is unicode. This is bytes.

Dynamic typing!

war10256y ago

Not dynamic typing's fault.

The issue is they changed the types out from underneath you.

And then left it to each library to decide which type it was actually going to accept.

magicalhippo6y ago

Delphi also went through a similar transition from "strings are in whatever the local code page says" with one byte chars to Unicode strings (Windows-style).

However the makers of Delphi spent many years preparing for this, so when the time came for us to switch we only had to spend half a day or so to migrate our half a million lines of code.

d0mine6y ago

Something is wrong if there is no third type: the "natural" string (bytes on Python 2, unicode on Python 3).

war10256y ago

I assume many of the strings were left untouched. But you still have to audit all of it to know which needs to be used where.

eesmith6y ago

I believe that's included in the "etc."

quietbritishjim6y ago

Surely any "natural" string would be better represented as unicode in Python 2? What is an example that wouldn't be?

jnwatson6y ago

There is. It looks like this:

  u"Hello World"
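For reference, here is how the three literal forms behave on Python 3.3 and later, where PEP 414 restored the `u` prefix as a no-op. On Python 2, `''` was a byte string and `u''` was `unicode`; `b''` has been accepted since 2.6 and means a byte string on both.

```python
plain = "Hello World"      # str on Python 3 (a byte string on Python 2)
explicit = u"Hello World"  # str on Python 3.3+ (unicode on Python 2)
raw = b"Hello World"       # bytes on Python 3 (str on Python 2.6+)

print(type(plain).__name__, type(explicit).__name__, type(raw).__name__)  # str str bytes
print(plain == explicit)  # True
print(plain == raw)       # False: str and bytes never compare equal on Python 3
```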

weberc26y ago· 7 in thread

> One of the biggest early hurdles in our porting effort was how to overcome the string literals type mismatch between Python 2 and 3. In Python 2, a '' string literal is a sequence of bytes. In Python 3, a '' string literal is a sequence of Unicode code points. These are fundamentally different types. And in Mercurial's code base, most of our string types are binary by design: use of a Unicode based str for representing data is flat out wrong for our use case. We knew that Mercurial would need to eventually switch many string literals from '' to b'' to preserve type compatibility. But doing so would be problematic.

> In the early days of Mercurial's Python 3 port in 2015, Mercurial's project maintainer (Matt Mackall) set a ground rule that the Python 3 port shouldn't overly disrupt others: he wanted the Python 3 port to more or less happen in the background and not require every developer to be aware of Python 3's low-level behavior in order to get work done on the existing Python 2 code base. This may seem like a questionable decision (and I probably disagreed with him to some extent at the time because I was doing Python 3 porting work and the decision constrained this work). But it was the correct decision. Matt knew that it would be years before the Python 3 port was either necessary or resulted in a meaningful return on investment (the value proposition of Python 3 has always been weak to Mercurial because Python 3 doesn't demonstrate a compelling advantage over Python 2 for our use case).

As a general rule, this seems like good practice, but surely b-strings, print_function, etc., are a trivial upfront cost, and one that would have to be paid sooner or later anyway?
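For a sense of scale, the upfront part looks roughly like this: a module written to run unchanged on Python 2.6+ and Python 3. A sketch only; `REV_PREFIX` and `format_rev` are hypothetical names.

```python
# Runs unchanged on Python 2.6+ and Python 3.
from __future__ import print_function  # no-op on Python 3, needed on 2

REV_PREFIX = b"rev:"  # a byte string on both versions


def format_rev(num):
    # str(num) is ASCII digits on both versions; .encode() yields bytes on both.
    return REV_PREFIX + str(num).encode("ascii")


print(format_rev(42))  # Python 3 shows b'rev:42'; Python 2 shows rev:42
```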

lacker6y ago

It seems like a lot of the cost was when system libraries made different choices than Mercurial would have made. For example, the Python 3 filesystem libraries often used unicode as a wrapper around an underlying bytes interface, and Mercurial really wanted to be able to pass bytes directly to the underlying interface. So it isn't just that they had to update their data types, they also had to adjust code to work with system libraries of slightly different semantics.
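The bytes-in, bytes-out convention in `os` is easy to see with `os.listdir()`: a `str` path yields `str` names decoded with the filesystem encoding, while a `bytes` path yields the raw names with no decoding at all. A sketch using a throwaway directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listings below have something in them.
    with open(os.path.join(tmp, "README"), "w"):
        pass
    text_names = os.listdir(tmp)               # str in, str out
    byte_names = os.listdir(os.fsencode(tmp))  # bytes in, bytes out

print(text_names)  # ['README']
print(byte_names)  # [b'README']
```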

wnoise6y ago

The Python interface usually would take bytes, and if it did, it would also return bytes. But there were a lot of functions that didn't take arguments, so they always returned Unicode strings. Instead of e.g. getcwd(), you would have to use getcwdb(), which naturally didn't have an equivalent in Python 2, though Python 2 did add the complementary getcwdu() (which one should basically never use).
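On Python 3 the two spellings sit side by side, and `os.fsencode()` converts between them using the filesystem encoding and its error handler. A sketch; the actual path depends on where it is run.

```python
import os

cwd_text = os.getcwd()    # str, decoded with the filesystem encoding
cwd_bytes = os.getcwdb()  # bytes, the raw path

print(type(cwd_text).__name__, type(cwd_bytes).__name__)  # str bytes
# os.fsencode() undoes the decoding, so the two views agree.
print(os.fsencode(cwd_text) == cwd_bytes)  # True
```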

novok6y ago

Having done a py2 to py3 migration and run into the same issues, I think these wouldn't have been that big of a deal IF Python had had static typing from the get-go.

The static compiler would notice all the breaking type changes at compile time and you can systematically fix all of them at once. You wouldn't miss one or two and have to run your unit testing suite to exercise the type system underneath.

I really believe that migration stagnation after big breaking changes like this is a property of dynamically typed languages. With statically typed languages like Swift or Rust, such changes happened quite frequently but weren't as big of a deal in practice.
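Python now has optional annotations (PEP 484) that an external checker such as mypy can verify, which recovers part of this for code that adopts them. A minimal sketch; `node_id` is a hypothetical helper. mypy would reject the second call with an incompatible-argument error before anything runs, whereas the interpreter alone only fails when that line actually executes.

```python
import hashlib


def node_id(data: bytes) -> str:
    """Return the SHA-1 hex digest of some revision data."""
    return hashlib.sha1(data).hexdigest()


print(node_id(b""))  # da39a3ee5e6b4b0d3255bfef95601890afd80709

# A static checker flags this call (str is not bytes). Without one, the
# mistake only surfaces as a TypeError when this exact line executes.
try:
    node_id("")
except TypeError as exc:
    print("TypeError at runtime:", exc)
```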

TylerE6y ago

Python with static typing wouldn't be python. It wouldn't even be python-adjacent.

kingemer6y ago

It sounds like the cost was non trivial for them, partially because they weren’t allowed to break things for python2, or even disrupt the efforts of those using it.

The language wasn’t ready for the transition, but it feels like it may have been even harder on them because of the requirements imposed on their project.

kingemer6y ago

Considering how much opposition there is in moving to python3, has there been any significant effort in the community keeping python2 alive?

weberc26y ago

Right, they could have paid a nontrivial cost (asking devs to use b-strings, print_function, etc) but didn't by fiat, choosing instead to pay a greater cost during the migration in addition to the nontrivial cost. My comment expresses skepticism about that decision.

rossdavidh6y ago· 6 in thread

Having worked in python for about a decade, first in python2 and lately in python3, and having seen projects convert, I find this article baffling. I found Six to work pretty well, and where it didn't it wasn't hard to change.

I think the core error here was in NOT doing what he calls a "flag day" conversion. Sometimes it is easier to do something quickly, than to live with it happening slowly. I've done "flag day" conversions, and they were pretty painless, if stressful at the time.

the_mitsuhiko6y ago

> Having worked in python for about a decade, first in python2 and lately in python3, and having seen projects convert, I find this article baffling. I found Six to work pretty well, and where it didn't it wasn't hard to change.

It matters a lot where you work. If you are in high-level land, Python 3 is not much of a change. If you work at the boundary (wire protocols, OS interop, text transformation), then Python 3 is a significant step back, especially before 3.6. A lot of the mud that Mercurial waded through is mud I also went through with my libraries. The day I managed to get the PEP through that reintroduced the u prefix on strings was also the last time I voluntarily participated in a language summit. The atmosphere was awful and not evidence-based.

Jasper_6y ago

Yeah, I was around for the sidelines of PEP-461, reintroducing % onto the bytes type, trying to get things done behind the scenes, and it was just a miserable experience all around. I don't think anybody cared to understand our concerns about why we should bother making the bytes type useful. At times it seemed like they believed leaving in the concatenation on the bytes type was a mistake.
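What PEP 461 restored, in concrete terms, is `%`-interpolation on `bytes`, available from Python 3.5. A short sketch with an HTTP-style header line as the example.

```python
# Requires Python 3.5+ (PEP 461). On 3.0-3.4 the % operator raised TypeError
# for bytes, which pushed wire-protocol code towards concatenation or .encode().
header = b"%s: %d\r\n" % (b"Content-Length", 1024)
print(header)  # b'Content-Length: 1024\r\n'

# %s on bytes only takes bytes-like objects (or objects with __bytes__), never str.
try:
    b"%s" % "text"
except TypeError as exc:
    print("TypeError:", exc)
```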

Read through the receipts here: https://bugs.python.org/issue3982

klodolph6y ago

I think Six works pretty well for a certain percentage of projects, and then completely blows up for others.

Same thing applies to using Mypy. Some modules are easy to add annotations for, other modules have insanely complicated types.

CJefferson6y ago

Mercurial still can't have a "flag day" as Macs are still distributed with Python 2, and not 3. Therefore it would make Mercurial significantly worse for mac users if it didn't support Python 2.

krupan6y ago

I love Mercurial to death, but let's be honest: how many Mac/Mercurial users are there? Very few, I would guess. Now, how many of those users rely only on the Python that ships with the OS, rather than installing their own? I'd guess we are getting pretty darn close to zero there.

McP6y ago

Last time I used Mercurial (it's been a while), it shipped with its own Python executable.

j88439h846y ago· 5 in thread

A lot of Mercurial's issues would have been resolved much easier if they'd used the common tools for maintaining polyglot 2/3 code instead of trying to invent everything themselves.

Futurize and Pasteurize in particular provide essentially all of the features that this post laments missing.

https://python-future.org/

lacker6y ago

The author does touch on that.

> When Mercurial accepts a 3rd party package, downstream packagers like Debian get all hot and bothered and end up making questionable patches to our source code.

Some environments just can't use dependencies like this. IMO Python 3 was too much of a breaking change, and in particular, the ability to transition from 2 to 3 should have been better in Python itself.

j88439h846y ago

Six, for example, is designed to be a single file -- specifically to ease copying it directly into the code base. But the idea that Mercurial couldn't use dependencies because of fear for what Debian might do...I find it so hard to believe that's the best choice. Vendor if you must, but do not reinvent the wheel.
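For a sense of how little a vendored shim needs to be, here is a minimal single-file compatibility module in the spirit of six. It is a hypothetical sketch, not six itself; every name in it is illustrative, and six provides comparable, more complete helpers.

```python
"""compat.py: a hypothetical single-file 2/3 shim in the spirit of six."""
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
else:  # only taken on Python 2
    text_type = unicode  # noqa: F821
    binary_type = str


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text if needed."""
    if isinstance(value, binary_type):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes if needed."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


def to_native(value, encoding="utf-8"):
    """Return the interpreter's 'native' str: bytes on 2, text on 3."""
    return to_text(value, encoding) if PY3 else to_bytes(value, encoding)
```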

morelisp6y ago

The first versions of Flask that supported Python 3 absolutely tanked performance. Some memory k/v store esque APIs we had that spent a good chunk of their time processing headers / query params to know what keys to look up got something like 8x slower. I remember ripping all the logic out of one endpoint and finding out it was >1ms minimum round trip, something like 85% of the time spent switching pure ASCII back and forth in six. It's a shame, because I absolutely love Flask, but we couldn't tolerate that so new APIs ended up in other languages.

I can't imagine what it would do to Mercurial's performance to have picked the wrong migration library early on.

untitaker_6y ago

Those tools were not popular at the time, for sure. I remember that Python-Future only got traction sometime after Flask had been ported to Python 3.

j88439h846y ago

FWIW, Flask supported Python 3 in 0.10.1, released 2013-06-14 and Mercurial started porting in 2015.

intrepidhero6y ago· 4 in thread

> Matt knew that it would be years before the Python 3 port was either necessary or resulted in a meaningful return on investment (the value proposition of Python 3 has always been weak to Mercurial because Python 3 doesn't demonstrate a compelling advantage over Python 2 for our use case). What Matt was trying to do was minimize the externalized costs that a Python 3 port would inflict on the project. He correctly recognized that maintaining the existing product and supporting existing users was more important than a long-term bet in its infancy.

Having just done transitions on a number of much smaller projects I had the same thought. Changes to string handling tripped me up and the changes to relative imports took some thinking. But the biggest frustration was the nagging question: Why am I doing this?

edit: missing word
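The relative-import change in one picture: inside a package, Python 2 let `import util` silently pick up a sibling module, while Python 3 treats every plain `import` as absolute (PEP 328). The explicit `from . import util` form works on Python 2.5+ and 3. A sketch that builds a throwaway package; `demo_pkg`, `util`, and `core` are made-up names.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_pkg/__init__.py, util.py and core.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
sources = {
    "__init__.py": "",
    "util.py": "GREETING = 'hello'\n",
    # Explicit relative import; a bare 'import util' here would make
    # Python 3 look for a top-level module named 'util' instead.
    "core.py": "from . import util\nMESSAGE = util.GREETING + ', world'\n",
}
for filename, text in sources.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()
core = importlib.import_module("demo_pkg.core")
print(core.MESSAGE)  # hello, world
```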

libria6y ago

> Why am [I] doing this?

Lack of security updates past 2019 forced our hand. Did you find a way around that?

falcolas6y ago

> Lack of security updates past 2019 forced our hand. Did you find a way around that?

Amazon is maintaining Python 2 for at least 4 years, as part of their Amazon Linux long term support release. Google app engine will support Python 2 for an unknown amount of time; they haven't announced an end date. PyPy is Python 2, with (to the best of my limited knowledge) no plans to deprecate support. There are also other LTS releases out there which include Python 2 support.

IOW, the forcing function of the PSF no longer supporting Python 2 is not as big a factor as was hoped.

kick6y ago

PyPy is keeping Python 2 support indefinitely, I believe.

hsivonen6y ago

There's a project for keeping Python 2 alive: https://github.com/naftaliharris/tauthon

It's particularly uncool that Guido brought up the prospect of lawyers (https://github.com/naftaliharris/tauthon/issues/47#issuecomm...) to force it not to be called Python, and that he opposed letting people who care about keeping Python 2 alive evolve it as "Python 2". (I know he has the legal right to insist on the name change. Still uncool.)

michaelhoffman6y ago· 4 in thread

The biggest problem with the Python 2 to Python 3 transition was not that breaking changes were made. It’s that breaking changes were made in a way such that you could not easily have code that worked both on Python 2 and Python 3.

It took years before the advent of six, Python 3 u’’ literals, and modernize. The author discusses this at length.

choppaface6y ago

Another big problem is there was no significant incentive to adopt Python 3. That’s why it took so long for large projects to transition. In comparison, during the last decade, C++ went from dodgy C++11 toy projects to all new code being written in modern C++. The modern feature set is that good.

Jasper_6y ago

C++ doesn't mandate you switch from std::cout to fmt in order to use lambdas. If they did that, I think we'd see a lot less modern C++.

j88439h846y ago

Six was available for years (2011) before Mercurial even started porting (2015).

https://github.com/benjaminp/six/graphs/contributors

eesmith6y ago

That was part of the "discusses this at length". Part of the relevant discussion is:

> So I'm not sure six would have saved enough effort to justify the baggage of integrating a 3rd party package into Mercurial. (When Mercurial accepts a 3rd party package, downstream packagers like Debian get all hot and bothered and end up making questionable patches to our source code. So we prefer to minimize the surface area for problems by minimizing dependencies on 3rd party packages.)

loxs6y ago· 4 in thread

It would probably be less painful and much better (for other reasons) to migrate to some other language. Some projects did that successfully, or are in the process of doing so. Most notably reposurgeon: https://gitlab.com/esr/reposurgeon

sfink6y ago

Greg says as much in the article: in hindsight, porting to Rust would have worked out better. Which is a pretty bold statement, but very interesting to hear from someone with intimate experience to back the opinion up.

cookiecaper6y ago

Mercurial's dependence on Python has always held it back, IMO. Self-contained Rust or Go-style static binaries work much better for "install everywhere" system utilities. I'd love to see Hg port to a more concise ecosystem and potentially claw some of the market away from Git.

ufmace6y ago

In a sense, Rust may have been a better choice for Mercurial overall, but it's hard to imagine how much of a pain the migration process would be. I don't think you could make much of any headway going Python 2 -> Rust with automated tools. That means the transition would look like this: stop all Mercurial dev in its tracks, have all current contributors (who can and care to) learn Rust, bring on a couple of devs with experience in architecting large Rust projects, spend however long redesigning and rewriting in Rust, and release a roughly feature-equal version a year or 2 later. Good way to move Mercurial from second place behind Git to barely known.

raymondh6y ago

FWIW, Rust didn't become feature stable until mid-2015. So even there, the world was changing.

adontz6y ago· 3 in thread

I have never seen such rejection in the Django community, despite real problems, like with WSGI design, handling I/O, and thus working with bytes a lot.
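For context on the WSGI point: PEP 3333 settled the Python 3 rules by making headers "native" `str`, requiring the body to be `bytes`, and passing request data in `environ` as bytes decoded with latin-1. A minimal sketch of an app following those rules, driven by hand rather than by a server; treating the path as UTF-8 is an assumption about the client, not something WSGI guarantees.

```python
def app(environ, start_response):
    # Undo the latin-1 "bytes as str" decoding, then decode as UTF-8
    # (an assumption about what the client sent).
    path = environ.get("PATH_INFO", "/").encode("latin-1").decode("utf-8")
    body = ("You asked for " + path + "\n").encode("utf-8")  # body must be bytes
    start_response("200 OK", [  # status and headers are native str
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Drive the app by hand instead of through a server.
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


raw_path = "/caf\u00e9".encode("utf-8").decode("latin-1")  # as a server would pass it
chunks = app({"PATH_INFO": raw_path}, start_response)
print(captured["status"])  # 200 OK
print(b"".join(chunks))    # b'You asked for /caf\xc3\xa9\n'
```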

Every huge task, like porting from Python 2 to Python 3, is either everybody's task or just a small group's. The latter seems more reasonable, since it doesn't interfere with ongoing development, but the former is the only way I have seen such tasks succeed.

Artificial rules that create comfort for one group at the expense of another group, like the following:

>> This ground rule meant that a mass insertion of b'' prefixes everywhere was not desirable, as that would require developers to think about whether a type was a bytes or str, a distinction they didn't have to worry about on Python 2 because we practically never used the Unicode-based string type in Mercurial.

sound pretty much wrong to me.

If there is a pain, it should become everybody's pain, or otherwise people will simply burn out and hate their own work, like the author did. There is no way porting to Python 3 can be harder than porting to Rust. Rust is statically typed and not garbage collected. Everyone would have to think about whether they need a string or an array of bytes anyway, and also about who owns them.

Overall, the described situation looks like a management issue and not a technical one to me.

Edit: typos.

reubenmorais6y ago

> There is no way porting to Python 3 can be harder than porting to Rust. Rust is statically typed and not garbage collected. Everyone would have to think if they need string or array of bytes anyway, but also, who owns them.

The author addresses this. The difference is that when porting to Rust you'd likely get a faster and more correct program in the end. (Huge caveat of big rewrites, of course). Whereas with Python 3 they feel like they did all the porting work and got nothing valuable in return.

roca6y ago

The Rust compiler statically checks those decisions, while in Python issues with string types will only be caught at run-time, so everywhere your test suite has missing coverage, porting is likely to introduce regressions. That is one way in which a Rust port would be easier.

hinkley6y ago

I used to switch unit tests from jasmine 1.3 to mocha because jasmine is kind of a mess, and jasmine 1.3 tests look too much like they should still work in jasmine 2.0, except some of the corner cases on equivalence of objects are wrong. So some of your tests would go red with no code change, but others would be green and stay green even when the code no longer functions properly. Like cutting the wires to your smoke detector.

It would take quite a bit of change in a language for a port to be safer than an upgrade, but it's not completely impossible.

reggieband6y ago· 3 in thread

In the past I have been a vocal advocate for the way the transition from Python 2 to Python 3 was handled. However, it should be said I use Python primarily as scripting glue, e.g. for build scripts and automation tasks. I have never worked on a "large" Python code base nor did I have to migrate anything. Almost everything I had written in Python 2 was just naturally replaced by newer scripts in the due course of time.

I also remember my first forays into Python 3 and the annoyance I had at some of the decisions. I recall when they relented on the % operator for string interpolation and I agree it was a poor initial choice to leave it out. I totally agree with the author that Python 3 could have made some subtle changes earlier on to help those with massive codebases.

And I still feel it was the right move. Somehow Python is even more relevant today than it was when this painful process began. While some may say that popularity comes despite the missteps, I actually believe the general slow and cautious push forward is one of the primary reasons Python continues to succeed. There is a balance between completely abandoning old users (e.g. Perl 5 to Perl 6) and keeping every historical wart (e.g. C++). IMO, the Python community found a middle ground and made it work.

b2gills6y ago

I have yet to find out about a Python 3 change that couldn't have been handled in a backward compatible way.

I know this because every change I've heard about is reminiscent of a Perl5 change where backwards compatibility was not broken.

The transition to Python 3 was not handled anywhere near as well as it could have been.

There is a reason Go2 isn't copying Python3. (Strangely they seem to be copying the Perl5 update model even though they don't realize it.)

The thing is that because Python has an unhealthy fixation on “There should only be one way to do things” they rejected things that would have made the transition easier. (Or even less necessary.)

I think it is kind-of telling that someone thought it necessary to create Tauthon. Tauthon is sort-of applying the Perl5 update model to Python 2.7.

lizmat6y ago

Re: There is a balance between completely abandoning old users (e.g. Perl 5 to Perl 6)

Please note that Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

In the original design of Perl 6, a source-level compatibility layer ("use v5") was envisioned that should allow Perl 5 source code to run inside of Perl 6. So the plan was to actually not abandon old users.

In my opinion, this failed for two reasons:

1. Most of Perl 5 actually depends on XS code, the hastily devised and not very well thought out interface to C code of Perl 5. Being able to run Perl 5 source code in Perl 6 doesn't bring you much, unless you have a complete stack free of XS. Although some people tried to achieve that (with many PurePerl initiatives), this really never materialized.

2. Then when the Inline::Perl5 module came along, allowing seamless integration of a Perl 5 interpreter inside Perl 6 and the use of Perl 5 modules inside of Perl 6 as if they were Perl 6 modules, it basically nailed shut the coffin that the "use v5" initiative already found itself in.

And now they're considered different languages after the rename to Raku, dividing already limited resources. I guess that's the way of life.

reggieband6y ago

I think my wording "abandoning" was more inflammatory than I would have liked. And I didn't want to call out or target Perl 6 / Raku. What I meant to convey was that the language team behind Perl 6 (as it was known before the rebranding) made a decision that it would be a new language and not an evolution of the existing language. It was the first example in my mind, one most people would recognize, that anchored one side of the continuum I was describing. I assume there are better examples (or worse offenders) but I don't know of any off hand.

makecheck6y ago· 2 in thread

It’s funny, on the Mac one becomes used to constant changes, rewriting damn near everything just to stand still. Yet I designed my Mac app long ago to depend on the system “Python 2” (bound to C++), because it seemed that both the installation itself and the Python language and libraries were very stable. Looking back, this turned out to be sustainable for a remarkably long time, as “Python 2” really did evolve only additively and there was almost no reason to even touch 15-year-old code that was relying on Python 2. For the Mac platform especially, this reliability is unheard of.

More amazing to me is that in Catalina, the release famous for breaking just about everything else, “Python 2” is still there and works as it always has! Of course, Apple did announce that it will be ripped out in the next release. :)

pfranz6y ago

I think this weird thing happened with Python 2. I believe Python 2.6 (Oct-2008) was the last "feature release" and 2.7 (Jul-2010) was intended as a bridge. So since 2008, 2.x users have been shielded from almost all of the normal churn of any widely used language that's in active development.

What I don't think people realize is that not only are you expected to move to 3.x, but you'll have to keep up or fall behind with new 3.x releases. During that same period (since 2008) 3.x has had 9 big releases. Of course that 2.x stability was done with the assumption you'd move to 3.x and isn't sustainable for PSF indefinitely.

cannam6y ago

> Of course, Apple did announce that it will be ripped out in the next release

They did? Damn, I was using that...

ufov26y ago· 2 in thread

The approach of doing the transition slowly over many years maybe was a mistake here, and another thing making it harder seems to have been little support from the top of the project.

I ported two projects with ~200000 Python-SLOC (about the same size as Mercurial according to sloccount) back in the early 3.x days. Doing this via more or less flag-day conversions within a few months, converting the codebases first to 2to3-able subset, and as a second step later on dropping 2to3 via common dialect of Python 2/3 with six, was not very painful in the end.

sfink6y ago

Did you have a large ecosystem of third party extension modules that also had to obey the flag day?

masklinn6y ago

> I ported two projects with ~200000 Python-SLOC (about the same size as Mercurial according to sloccount) back in the early 3.x days. Doing this via more or less flag-day conversions within a few months, converting the codebases first to 2to3-able subset, and as a second step later on dropping 2to3 via common dialect of Python 2/3 with six, was not very painful in the end.

Sounds like you used the same method, just over a smaller timeframe: convert to a common 2/3 subset, then drop Python 2 at some later point.

epage6y ago· 2 in thread

I can't believe their leadership de-prioritized the port until the last minute when they have an ecosystem on top of them that also has to port. I feel that was irresponsible.

The project lead said to not push `b""` on people. That was a mistake imo that led them down a very frustrating rabbit hole (transformers, `pycompat`) that probably greatly extended their port time. One reason given is to not confuse devs with those details but they are critical details and ones you can't avoid with Rust. This inconsistency makes me wonder if the post is mostly misdirected frustration. A lot of it centers on.
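To make the "transformers" remark concrete: the idea is to rewrite source at import time so plain `''` literals become `b''` without anyone typing the prefix. A rough sketch of just the rewriting step, using the standard `tokenize` module; it is not Mercurial's actual transformer, it omits the import hook, and it deliberately ignores cases a real one must handle, such as docstrings or literals passed to APIs that require `str`.

```python
import io
import tokenize


def add_bytes_prefix(source: str) -> str:
    """Rewrite every unprefixed string literal in source as a b'' literal.

    Literals that already carry a prefix (u'', r'', b'', f'') are left alone.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[0] in "'\"":
            # Keep the original positions so untokenize() preserves spacing.
            tok = tok._replace(string="b" + tok.string)
        result.append(tok)
    return tokenize.untokenize(result)


print(add_bytes_prefix("name = 'hg'\nprint('hi', u'text')\n"))
```

Run on the two-line input above, the sketch should produce `name = b'hg'` and `print(b'hi', u'text')`.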

I agree about the early python3 releases making it harder. I don't remember what the python leadership's intent was, but I think I actually agree with what they did, now. Over my career, I've come to appreciate starting with the ideal and working backwards. This lets you learn what is needed rather than wasting time on speculation (planning or dev) or making a more crippled product.

I can understand frustrations with bugs / differences in python versions. I ran into that a lot just within `2.7.*`

In my mind, the most notable complaint is the stdlib's mixed efforts in supporting str or bytes. I feel "batteries included" made this harder. They had to port a lot. Not everything can get the same level of scrutiny, especially from domain experts that represent a variety of use cases. They also can't break compat. If they weren't batteries included, the porting efforts would be more directed, pull in the right people, and you could fix things later if you got them wrong.

What I find interesting is how different our experiences are that lead to the same place. My frustrations with python are rooted in build tools and packaging, and I have been loving Rust.

EDIT: I'm also surprised at the hostility towards distribution packagers. Instead of working with them to find mutually valid solutions, they express frustration at distributions and cripple themselves by not allowing third-party dependencies.

Conan_Kudo6y ago

> EDIT: I'm also surprised at the hostility towards distribution packagers. Instead of working with them to find mutually valid solutions, the express frustration at distributions and cripple themselves in not allowing third-party dependencies.

These days, it's "cool" to hate your downstreams (y'know, bite the hands that feed you and all that).

Seriously though, as one of those "distribution packagers" (Fedora, Mageia, OpenMandriva, and openSUSE!), it sucks that I encounter this more and more often. I try to be somewhat involved in the projects I package and contribute where I can, be it code, advice, or anything in between. Ten years ago, people were generally friendly to me. These days? It's rare to get a thank-you. Usually I get grumbles and anger for daring to ship it in a distro package. I've even had a couple of patches rejected that fix real bugs simply because they were discovered as part of my packaging and testing something because it doesn't happen on the dev's machine in his virtualenv on his Mac...

brohee6y ago

In the case of Debian, the Debian official policy was for a long time to explicitly unbundle everything, which for many things amounted to sabotage. Things like rvm were born of distributions' unusable packages.

mikl6y ago· 2 in thread

There’s no doubt that the 2 -> 3 transition was rough for the Python community. I personally stopped using Python as my go-to language in the early Python 3 days, because writing Python 2 code felt stupid: it was outdated the moment you wrote it, and 3 wasn’t really well supported by the community and tooling yet.

On the other hand, Python adoption has really taken off since Python 3.5-ish. Python has never been more popular.

So while you may wonder what might have been, had the transition been smoother, it’s hard to argue that Python 3 is a failure. All’s well that ends well, I guess?

Although it’s sad that Guido felt the need to step down. It’ll be interesting to see where Python goes this decade, now the transition is over and there’s a wealth of possibilities in front of it.

I expect there’ll be a lot of people looking to replace JavaScript with Python once you can run it in the browser with WASM.

roca6y ago

"All's well that ends well" neglects the costs to the community of bad decisions. It also encourages people to think that those decisions must not have been very bad, and not learn from those mistakes.

You can see that at work in the responses here. "And I still feel it was the right move. Somehow Python is even more relevant today than it was when this painful process began." I.e. success is thought to justify every decision made along the way.

I see this fallacy at work in Linux too. "Linux is successful, therefore haphazard CI and using email to track bugs and patches must be a fine way to operate".

mrr546y ago

Haphazard CI and using email to track bugs and patches is a fine way to operate.

souprock6y ago· 2 in thread

Not what you want to hear about a version control system: "Python is a dynamic language and there are tons of invariants that aren't caught at compile time and can only be discovered at run time. These invariants cannot all be detected by tests, no matter how good your test coverage is. This is a feature/limitation of dynamic languages. Our users will likely be finding a long tail of miscellaneous bugs on Python 3 for years."

indygreg26y ago

C/C++ is a language with limited facilities to ensure correctness at compile time. The languages are riddled with undefined behavior in common features that programmers with multiple decades of experience still get tripped up by. NULL access - the so called "billion dollar mistake" - out of bounds reads and writes, and use after free create a litany of security issues and create massive liability for companies who choose to author software in these languages.

Not what you want to hear about an operating system :p

otabdeveloper46y ago

"C/C++" is not a language.

Good lord, how much ignorance can hackernews handle??

alangpierce6y ago· 2 in thread

It's interesting that they wanted to add b' prefixes to all strings, and I wonder if they would have had a better experience by embracing regular strings instead. At least in Python 3, if your string only contains ASCII, then the underlying representation will use one byte per character, so ASCII-only strings are stored just as efficiently as ASCII-only bytes instances.
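A rough way to check the one-byte-per-character claim, assuming CPython 3.3+ (PEP 393's flexible string representation). `sys.getsizeof` reports implementation-specific sizes, so the numbers are a CPython detail rather than a language guarantee; `marginal_size` is a hypothetical helper written just for this illustration:

```python
import sys


def marginal_size(unit, n=1000):
    """Extra bytes CPython reports when a str/bytes grows by one more `unit`."""
    return sys.getsizeof(unit * (n + 1)) - sys.getsizeof(unit * n)


ascii_str = marginal_size("a")      # ASCII-only str: stored one byte per character
ascii_bytes = marginal_size(b"a")   # bytes: one byte per element
euro_str = marginal_size("\u20ac")  # U+20AC EURO SIGN forces two bytes per character

print(ascii_str, ascii_bytes, euro_str)  # CPython 3.3+: 1 1 2
```

So an ASCII-only `str` costs the same per character as `bytes`; the width only grows once a string actually contains wider code points.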

I think there are two mental models for how to approach the str/bytes split:

1.) A `str` is for unicode use cases, and a `bytes` is better for cases that don't support unicode.

2.) A `bytes` is an array of numbers between 0 and 255. A `str` should almost always be used when your value is conceptually a sequence of characters. `str` doesn't imply that arbitrary unicode is allowed, and it's fine to have a convention that a particular `str` is ASCII-only, just like other conventions you might have on variable values.

My impression is that #1 is the Python 2 mental model and is tempting for Python 3, but that #2 often works better when writing Python 3 code. Under mental model #2, asking for "%s" formatting is really asking for a replacement strategy that detects the number 37 followed by the number 115 in an array of numbers and fills in a sub-array, which seems stranger and more likely to produce false positives if you're really working with binary data like the bytes of a .jpg file.
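To make mental model #2 concrete, here is a small sketch for Python 3.5+, where PEP 461 brought `%` formatting back to `bytes`. The `blob` value is a made-up binary payload:

```python
# A bytes object is a sequence of integers in range(256): '%' is 37 and 's' is 115.
fmt = b"%s"
print(list(fmt))  # [37, 115]

# %-formatting on bytes looks for a 37 followed by a conversion character
# such as 115 and splices the argument in as a sub-array.
print(b"hg %s" % (b"commit",))  # b'hg commit'

# Hypothetical binary payload that happens to contain the bytes 0x25 0x73 ("%s").
blob = bytes([0x89, 0x25, 0x73, 0x00])
print(blob % (b"!",))  # b'\x89!\x00': the "placeholder" was just data
```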

That said, I'm sure the devil is in the details, and maybe a project like Mercurial has to stay backward-compatible with bytes data that is neither ASCII nor valid UTF-8, or has some other compelling reason to stick with bytes everywhere.

CJefferson6y ago

The problem is many strings might contain things like commit messages, or filenames, neither of which has to be valid unicode.

I've had the same problem with a few Python 2 -> 3 conversions -- everything is fine until you have to operate on text or filenames which aren't valid utf8/unicode.
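Python 3's usual escape hatch for this is the `surrogateescape` error handler (PEP 383), which is what `os.fsdecode`/`os.fsencode` use on POSIX. A minimal sketch, assuming a filename that was written as Latin-1 bytes; the name itself is made up:

```python
# Hypothetical filename: "café.txt" stored as Latin-1 bytes, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

try:
    raw_name.decode("utf-8")  # strict decoding rejects the 0xE9 byte here
except UnicodeDecodeError:
    print("strict UTF-8 decode failed")

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN
text_name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(text_name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the exact original bytes.
round_trip = text_name.encode("utf-8", "surrogateescape")
print(round_trip == raw_name)  # True
```

The catch is that `text_name` isn't really text: `text_name.encode("utf-8")` raises `UnicodeEncodeError`, and printing it usually does too, which is why the sketch uses `ascii()`.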

alangpierce6y ago

Got it. So I understand, maybe someone saved a filename as the latin-1 encoding of some non-ASCII text, and Mercurial would need to support such files (but also would have no contextual information that it's latin-1)?

I'm tempted to say "nobody should have filenames like that", but I guess a project like Mercurial needs to be as compatible as possible. Are there modern use cases for filenames like that, or is it fair to say it's all legacy data?

ascotan6y ago· 2 in thread

Python 3 is the new Windows Vista.

marcosdumay6y ago

You mean they just have to fix the issues behind the scenes, then rename the last version (like to "Python 4"), and it will become the greatest version ever?

yjftsjthsd-h6y ago

Yes, actually. Now that we've gotten through the pain of the first years of Python 3, if we could have a clean start and call Python 3.8 Python 4, it would probably be well-received.

raverbashing6y ago· 2 in thread

Ok, yeah, maybe Mercurial's case was special, but still, this seems like they made it harder on themselves needlessly.

> One was that the added b characters would cause a lot of lines to grow beyond our length limits and we'd have to reformat code

ORLY?! Well, guess what: hard line size limits are stupid. Now you know why.

That's why "foolish consistency is the hobgoblin of little minds" is one of the 1st phrases of PEP-8.

But I'm tired of people saying "oooh let's cut all lines to be under 80 characters" like it's some kind of Biblical Mandate. No, it isn't. And the 80 chars limit is BS. Probably the part I hate the most about PEP-8 (and especially how people interpret PEP-8).

> is its insistence that the world is Unicode

Oh please. Yes, the world is Unicode. Get over it. Maybe not bytes on disk/network. But apart from that? Yes. If libraries take bytes or unicodes I can agree it's a thorny issue, but let's move on, because a happy day is a day where I don't get a UnicodeDecodeError because Python 2, to add insult to injury, thinks the world is not only not Unicode, but that it's all ASCII.

Windows made the right call a long time ago when it decided to make all strings Unicode. Ok, maybe UTF-8 would be better than UTF-16, but it still does the job.

But I have to agree with them that any version < Py3.4 or 3.5 was really not worth it.

yjftsjthsd-h6y ago

> the world is Unicode. Get over it. Maybe not bytes on disk/network. But apart from that? Yes

So, just ignoring the 2 things that a version control system exists to work with directly?

raverbashing6y ago

You're ignoring the parts where they have several hardcoded byte/unicode strings.

Otherwise, just convert to and from bytes when saving to disk and sending over the network.
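The "convert at the edges" approach is often called the Unicode sandwich. A minimal sketch, assuming UTF-8 at both boundaries; `decode_input` and `encode_output` are hypothetical helper names:

```python
def decode_input(payload: bytes) -> str:
    """Input boundary: bytes from disk or the network become text exactly once."""
    return payload.decode("utf-8")  # strict: raises UnicodeDecodeError on invalid UTF-8


def encode_output(text: str) -> bytes:
    """Output boundary: text becomes bytes exactly once."""
    return text.encode("utf-8")


message = decode_input(b"fix caf\xc3\xa9 handling")
message = message.upper()  # all processing in between works on str
wire = encode_output(message)
print(wire)  # b'FIX CAF\xc3\x89 HANDLING'
```

The strict decode in `decode_input` is exactly where the objection above bites: repository content and filenames aren't guaranteed to be valid UTF-8, so a VCS has to either keep such data as `bytes` or decode it with a lossless error handler.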

zmmmmm6y ago· 1 in thread

It mostly makes me question the wisdom of implementing such a tool in Python in the first place - if you want low level access to raw underlying representations etc, using a super high level scripting language seems like a "wrong tool for the job" scenario. I am sure on the other hand they got a lot of productivity benefits from doing that, which is great, but having taken that tradeoff I don't think it is fair to "sour" on a language when you clearly applied it outside its domain and then encountered problems due to that.

Tanooki_Mario6y ago

You can't really say this after the language has supported the feature as a design decision for 15 years and then removes it. Part of the popularity of python is it made things easier for systems programmers. Unless you want us to write everything in C/C++ again?

raymondh6y ago· 1 in thread

This criticism of the dev team seems naïve, "It should not have taken 11 years to get to where we are today." Core developers can make tooling available, but they can't control adoption. That is a user decision. Users switch-over on a timetable governed by their own individual cost-benefit analysis.

yjftsjthsd-h6y ago

> Core developers can make tooling available, but they can't control adoption

Core developers made the design decisions that made nobody want to adopt it.

> the ecosystem of users and projects are collectively much better-off than if the transition had not occurred at all.

The question seems more like, "could the same benefits have been had with less pain", and a reasonable reading is that the answer is yes. (e.g. 4 years of not being able to work with bytes reasonably even if you did need them)

sprash6y ago· 1 in thread

The transition from Python 2 to 3 was one of the worst things that could have happened to the whole community. The costs of the transition never justified the benefits. The new features were negligible at best or a regression at worst, and in some cases performance got even worse. One could even assume flat out sabotage.

Let's just hope there will never be a Python 4 and the developers now finally start focusing on the greatest flaw of Python: performance.

Tanooki_Mario6y ago

What's crazy is if they had removed the GIL python 3 adoption would have been huge. All python 3 offered were marginal benefits for adopting functionality that broke huge code bases.

jmilloy6y ago

> in Mercurial's code base, most of our string types are binary by design: use of a Unicode based str for representing data is flat out wrong for our use case.

I feel like this is the essence of the article: specific constraints/choices of Mercurial made their port to Python 3 difficult. Working with early Python 3 certainly did not help. But there seems to have been some stubbornness here mixed with a lot of retroactive justification.

> One was that the added b characters would cause a lot of lines to grow beyond our length limits and we'd have to reformat code.

This is almost ridiculous. You are going to write a JIT partial 2to3 instead of just increasing your length limits and/or using an autoformatter? (Of course, it turns out they eventually did do that... after a bit more stubbornness regarding the autoformatter.)

> So I'm not sure six would have saved enough effort to justify the baggage of integrating a 3rd party package into Mercurial.

Couldn't this have been a very occasional copy and paste, instead of a downstream dependency? [six](https://six.readthedocs.io/) "consists of only one Python file, so it is painless to copy into a project."

> Initially, Python 3 had a rather cavalier attitude towards backwards and forwards compatibility.

Yes, can't disagree. Early adopters who attempted to write 2- and 3- compatible code suffered the most.

mbar846y ago

I guess this is as good a time as any to pimp my work in this area: https://pypi.org/project/lib3to6/

lib3to6 is a Python compatibility library similar to BabelJS. It translates (most) valid Python 3.7 syntax to valid Python 2.7 and Python 3 syntax (aka. universal python). If you would like to develop with a modern python version and yet still maintain backward compatibility, or if you want to bring a legacy codebase forward step by step (my use case), then please have a look.

TeeWEE6y ago

We just migrated a big codebase to python 3 in about a year. It was not easy, but also not super hard with tools like futurize, and mypy.

Gladly we already had mypy hints, this helped us find a lot of mistakes when (not) using bytes.

Now that we're on python 3, we're auto-migrating the type hints to inline annotations with tools like com2ann (https://github.com/ilevkivskyi/com2ann), and we're auto-rewriting code to be more python 3-like with libcst and custom codemods...
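For anyone who hasn't seen the two styles: PEP 484 type comments work on Python 2 and 3, while inline annotations are Python 3 only. A sketch of the kind of before/after a tool like com2ann produces; the function is hypothetical and the tool's exact output formatting may differ:

```python
# Before: PEP 484 type comment (valid on Python 2 and 3, invisible at runtime)
def join_path_commented(root, name):
    # type: (bytes, bytes) -> bytes
    return root + b"/" + name


# After: inline annotations (Python 3 only), visible at runtime via __annotations__
def join_path(root: bytes, name: bytes) -> bytes:
    return root + b"/" + name


print(join_path_commented.__annotations__)  # {}
print(join_path.__annotations__)
```

Both forms are read by checkers such as mypy; the inline form is simply easier to read and to introspect.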

gfxgirl6y ago

I wish more devs cared about backward compatibility, not just in python but in general. I know a particular, very popular library, 60k stars on github, whose maintainers break stuff every month. They don't care how much developer time it wastes; they just decide FooBar should really be named FooB and rename it. No Effs given about how many people it disrupts. You'd think people would complain, but cult of personality and/or the popularity of the library turns people into fanboys who seem to think "If these geniuses are doing it this way then it must be good". .... sigh

rurban6y ago

For me this is the most exciting outcome:

> The only Python 3 feature that Mercurial developers seem to almost universally get excited about is type annotations. We already have some people playing around with pytype using comment-based annotations and pytype has already caught a few bugs. We're eager to go all in on type annotations and uncover lots of dynamic typing bugs and poorly implemented APIs.

Over in perl land people still spill their hate on types, which caused hard forks.

bschwindHN6y ago

I wonder how many human lives' worth of work has been wasted from the decision to use Python and having to deal with 2/3 transitions, and if it was worth it for the speedup of using an interpreted language.

2T1Qka0rEiPr6y ago

Not knowing really anything about Mercurial, the `skip-blame` feature seems interesting, but it seems Git doesn't have something similar (it has to be constrained when calling `blame`).

Ohn06y ago

For such a thorough article, I wish there were mention of Python 4

luord6y ago

1. Introduce a new version with the plan of discontinuing the previous version 11 years later (that's almost half of the time that, by then, python had been a thing), that itself was released only three years after the very tool you're talking about was released.

2. Don't even pretend to be interested in trying to do a migration until seven years later.

3. Make sure that your migration plan includes a development cycle that's deliberately hostile to the migration process.

4. ?

5. How could the python maintainers do this to us.

The description of the migration process was a good read. The fud afterwards... wasn't.

And there were a few inaccuracies (I'm being charitable, some of them were straight up lies).

> Python 3.0 was released on December 3, 2008. And it took the better part of a decade for the community to embrace it.

False, I've been using python 3, python 3 exclusively, since 2014, for all my projects.

> Yes, Python is still healthy today and Python 3 is (finally) being adopted at scale

False, same as above.

> I am ecstatic the community is finally rallying around Python 3

Again, false. Not only did "the community" rally around python 3 years ago, he isn't really happy about it, but I'll get to that later.

> For nearly 4 years, Python 3 took away the consistent syntax for denoting bytes/Unicode string literals.

Or, to put it another way, python 3 was compatible with python 2's string types almost eight years before python 2 reached end of life.

> An ecosystem that falters for that long is generally not healthy

This entire paragraph was a hypothetical. It seems he really wanted to criticize something that did not happen.

> The only language I've seen properly implement higher-order abstractions on top of operating system facilities is Rust

And here's where his true point becomes evident: this is a hype piece for a language he found that he likes better. He's just attacking something in his previous language that he thinks is valid just as an attempt to highlight why the new toy is truly better. In short: He felt like complaining about the migration would be a good way to proselytize.

Just in case: no, it isn't better, and I say this as someone who currently isn't using python nor rust. I'm using a language that I'm quickly growing to hate more than I do either of them at their worst (no, it's not JavaScript).

> if Rust were at its current state 5 years ago, Mercurial would have likely ported from Python 2 to Rust instead of Python 3. As crazy as it initially sounded, I think I agree with that assessment.

So... The best he can say about rust is that it might be better than python 3 five years ago that, by his own opinion on everything he wrote before this, was terrible? Well, that's a recommendation not to use rust if I ever saw any.

When a hype piece defeats its own point.

> And speaking as a maintainer, I have mad respect for the people leading such a large community.

No, he doesn't; he used several appeals to emotion beforehand to try to paint them as terrible people.

> It should not have taken 11 years to get to where we are today.

This statement by itself is a truism that doesn't really mean anything, but the implication is that python 3 is only worthwhile 11 years later and it took that long for it to be so I'll reply to that.

No, it didn't. It didn't even take that long for mercurial; they started the migration four and a half years ago, not eleven.

> am confident it will grow stronger by taking the time to do so

What is it to him? He should just move on to rust and be happy with it (sure, there are many people unhappy with it, but he wouldn't take the effort to proselytize if he wasn't).

In conclusion, I just don't understand the need to tear something else down to prop up a new thing. I'm sure I would have liked a post about things he could do with rust, but now...

fireattack6y ago· 19 in thread

>assuming the world is Unicode is flat out wrong

True, but Py2's approach makes lots of developers assume the world is Latin-1. I see way too many examples of things broken on a Chinese locale environment, including Python's official IDLE ([1]).

[1] https://bugs.python.org/issue15809 (Summary of this bug: in 2.x IDLE, an explicit unicode literal used to still be encoded using system's ANSI encoding instead of, well, unicode.)

int_19h6y ago

The most amusing quote in the entire article is this (emphasis mine):

sfink6y ago

The article directly answers that question. Many, many things in the standard library now only accept unicode strings, not byte strings. So a wholesale change to b'' everywhere breaks lots of stuff.

markbnj6y ago

> if all strings in Mercurial are byte strings, then what is there to think about? just use b'' throughout, no need to worry about anything else.

phkahler6y ago

>> So the real complaint is that Python switched the defaults in a way that made bytes-centric code more complicated

The author made it clear. The issue wasn't just that the default changed. It was that 3.0 took away the ability to always make your choice explicit.

With 3.0, any implicit string had to get a b added, while any string with a u had to be made implicit (drop the u), because 3.0 did not accept the u prefix. You couldn't tell by looking at code whether it had been converted or not. At least that's how I read it.
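For reference, the literal prefixes involved. This sketch runs on Python 3.3+, which is when PEP 414 made the `u` prefix legal again (3.0 through 3.2 rejected it as a syntax error):

```python
text = u"commit message"  # u prefix: Python 2, and Python 3.3+ again (PEP 414)
data = b"\x00\x01"        # b prefix: Python 2.6+ and all of Python 3
native = "abc"            # no prefix: a byte string on Python 2, a Unicode str on Python 3

print(type(text).__name__, type(data).__name__, type(native).__name__)  # str bytes str
```

With both prefixes available, a 2/3 codebase can mark every literal explicitly, leaving only the unprefixed "native" strings to change meaning between versions.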

epage6y ago

Another reason the complaint doesn't make sense is that the author then praises Rust, which is more similar to Python 3 than to Python 2.

the_mitsuhiko6y ago

> but as a "non-Latin" language user, defaulting str to unicode literals is the best change in Python 3

A Unicode model that was a bad idea in 2005 was picked, and now in 2020 it's a lot worse, because thanks to emojis we are now well outside the Basic Multilingual Plane.

harikb6y ago

Both of those are newer languages that happen to take a stance from the day 1. So not quite comparable.

lmm6y ago

Sohcahtoa826y ago

Do you mean that if you have bytes, but you want to send them to a function that expects a string, then it would automatically interpret the bytes as UTF-8?

takeda6y ago

IMO Python is doing exactly the same thing that Go does (I know too little about Rust to comment) the only difference is that Python respects the LANG variable while Go is just fixed on using UTF-8.

earthboundkid6y ago

cbsmith6y ago

kibwen6y ago

> I understand author's reasoning in the context of a transition, but as a "non-Latin" language user, defaulting str to unicode literals is the best change in Python 3

fireattack6y ago

My comment is about this sentence:

>Perhaps my least favorite feature of Python 3 is its insistence that the world is Unicode

>standard library APIs that formerly took bytes now incorrectly take Unicode strings

What do you mean by "incorrectly"?

branko_d6y ago

Just beware that C# is not exactly "Unicode" either.

C# char is a UTF-16 code unit, not a Unicode code point.

Most code points "fit" into just one UTF-16 code unit, but not all.
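A quick way to see the code point vs UTF-16 code unit difference from Python, whose `str` indexes by code point. Counting UTF-16 units here approximates what a UTF-16 string type such as C#'s `String.Length` reports:

```python
s = "A\U0001F600"  # 'A' followed by U+1F600 GRINNING FACE, outside the Basic Multilingual Plane

code_points = len(s)                           # Python counts code points
utf16_units = len(s.encode("utf-16-le")) // 2  # two bytes per UTF-16 code unit, no BOM
print(code_points, utf16_units)  # 2 3

# In UTF-16 the emoji becomes a surrogate pair: high D83D, low DE00
print(s[1].encode("utf-16-be").hex())  # d83dde00
```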

ygra6y ago

API members that operate on code points universally take a string and an index.

ak2176y ago

kibwen6y ago

> I think the actual pain in Python 2 came from the misguided decision not to adopt UTF-8 as the default character encoding

The decision of a default encoding surely dates back to Python 1.0 or earlier, which predates not just UTF-8 but even Unicode itself. Python is an old language!

im3w1l6y ago

To be fair, IDLE is pretty garbage in most ways.

ploxiln6y ago· 17 in thread

First was the idea that normal feature contributors should not see any b"" or any sign of python3 support for the first couple years of the effort. Huge mistake. You need some b"".

pdonis6y ago

> The natural string type is the right type in many places

For many programs, yes. Not for a revision control system that needs to be sure it's working with the exact binary data that's stored in the repository. Repository data is bytes, not Unicode.

I think this article is an excellent illustration of the Python developers' failure to properly recognize this use case in the 2 to 3 transition.

jharsman6y ago

I was an early adopter of Mercurial, and the team's insistence that file names were byte strings was the cause of lots of bugs when it came to Unicode support.

masklinn6y ago

> For many programs, yes.

For all programs, for the simple reason that:

> Various standard library functionality now wanted unicode str and didn't accept bytes, even though the Python 2 implementation used the equivalent of bytes.

> Repository data is bytes, not Unicode.

It's also mostly absent from the source code, and where it is present (e.g. placeholders or separators) it's easy to flag as explicitly bytes.

ploxiln6y ago

takeda6y ago

cbsmith6y ago

To be fair... the problem was more in Python 2, where this stuff was often conflated. Python 3 really just brought the problem into stark relief.

TBH I do think the problem is easier to address in a statically typed world.

speedplane6y ago

> I think this article is an excellent illustration of the Python developers' failure to properly recognize this use case in the 2 to 3 transition.

Python set the entire community back 10 years or more by making this drastic mistake.

simias6y ago

It might be my own pro-typed-language bias showing, but this migration from byte strings to unicode strings is where dynamically typed languages really don't shine.

And if it turns out that you can't easily reproduce the problem and you just get a bug report sent from somewhere in production then Good Luck; Have Fun.

monoideism6y ago

> added unicode as an afterthought like Python did

dkarl6y ago

> First was the idea that normal feature contributors should not see any b"" or any sign of python3 support for the first couple years of the effort. Huge mistake. You need some b"".

mixmastamyk6y ago

indygreg26y ago

CJefferson6y ago

I had to switch back to treating headers as bytes for as long as possible.

It is a stupid client which doesn't send valid ascii for http headers of course.

takeda6y ago

I believe the headers are encoded using ISO-8859-1, not Unicode. That encoding has a 1:1 mapping with bytes, so it wouldn't break this way. Treating them as UTF-8 was the bug.
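The 1:1 property is easy to check. A sketch showing that ISO-8859-1 round-trips every possible byte while UTF-8 rejects some input; the header value is made up:

```python
all_bytes = bytes(range(256))

# ISO-8859-1 maps byte 0xNN to code point U+00NN, so decoding never fails...
as_text = all_bytes.decode("iso-8859-1")
print(all(ord(ch) == b for ch, b in zip(as_text, all_bytes)))  # True

# ...and encoding back is lossless.
print(as_text.encode("iso-8859-1") == all_bytes)  # True

# UTF-8 is stricter: a stray 0xFF in a (hypothetical) header value is rejected.
try:
    b"X-Custom: \xff".decode("utf-8")
except UnicodeDecodeError:
    print("UTF-8 decode failed")
```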

dnautics6y ago

> It is a stupid client which doesn't send valid ascii for http headers of course.

...or a smart malicious actor.

utxaa6y ago

> But you don't need all b"" everywhere.

as a mercurial user i never understood this decision. for instance look at this recent commit: https://www.mercurial-scm.org/repo/hg/rev/b4c82b704180

would anyone disagree with the fact that an error message should be a string?

a source transformer to add b'' all over the place? really?

and i still don't understand why the hg transition had to be more complex than: https://docs.djangoproject.com/en/1.11/topics/python3/

... and of course now this: https://www.mercurial-scm.org/wiki/OxidationPlan

i wonder what matt mackall thinks of all these developments?

skywhopper6y ago

Why are you so certain about your assertions here about when they did and did not need to use explicit byte strings?

nemothekid6y ago· 11 in thread

Jasper_6y ago

int_19h6y ago

> Python 2 had different types but you could be sloppy and it would kinda work out.

hsivonen6y ago

> There was this assumption that Unicode code points were the correct single unit to talk about Unicode.

The most messed-up thing about Python 3 is that it's supposed to be justified by doing Unicode right and they still got it wrong.

Looking at the old PEPs, it appears to have arisen by accident rather than as an actual design.
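One way to see why code points are not the "single unit" users perceive: Python's `len` counts code points, so visually single characters can have length 2. A small sketch; the strings are arbitrary examples:

```python
import unicodedata

decomposed = "e\u0301"         # 'e' + COMBINING ACUTE ACCENT, displayed as one "é"
precomposed = "\u00e9"         # LATIN SMALL LETTER E WITH ACUTE
flag = "\U0001F1FA\U0001F1F8"  # REGIONAL INDICATOR U + S, usually displayed as one flag

print(len(decomposed), len(precomposed), len(flag))  # 2 1 2
print(decomposed == precomposed)                     # False

# Normalization can merge some sequences, but not all of them (the flag stays 2).
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
print(len(unicodedata.normalize("NFC", flag)))        # 2
```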

joshuamorton6y ago

takeda6y ago

> Yes, but that insistence that Bytes and Unicode are two different things that Shall Not Be Mixed was mostly a Python 3-ism

toyg6y ago

Zipfile has always been a mess. I have no idea why, but its interfaces have been consistently poor from a usability perspective. This was well before py3 was a factor.

steveklabnik6y ago

kevingadd6y ago

vmchale6y ago

> Both Go and Rust ("""systems""" level languages) have decided that "utf-8 ought to be enough for anybody" and that seems to be a good decision.

When working with e.g. filepaths, Rust has an OsStr type.

dilap6y ago

A go string is just a sequence of bytes, which is usually/by convention utf8. But you can store anything you want in there, if necessary.

bjoli6y ago

peatmoss6y ago· 9 in thread

Oddly my feeling is that Racket, in its departure from mainline Scheme, largely did retain its core audience, but that may have been a feature of its usage in academia.

I’m not sure there are lessons to draw, other than noting that version bumping versus making a new language with a new name can be bad for entirely different reasons.

intrepidhero6y ago

b2gills6y ago

Everything I've heard of that was changed from Python 2 to 3 is reminiscent of things that Perl5 handled while maintaining backwards compatibility.

I mean Perl5 is still mostly backwards compatible back to the original version released in 1987. (There were a few rarely used bad features that should have never been there which have been removed.)

The way it does this is by having you specifically ask for the new features, if they would otherwise break code.

edflsafoiewq6y ago

mikepurvis6y ago

peatmoss6y ago

Python 3 is a “success” in that a lot of people have moved. But it was, as you rightly point out, a hard won victory that left a lot of people unhappy.

tus886y ago

> introduce large, breaking changes without major benefits

FTFY

lizmat6y ago

Re: Often cited is the fact that Perl 6 is a totally different language unrelated by anything but creator and name

Indeed. That is why Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

I'm not sure what lessons can be drawn from this, other than being indecisive has its price.

mark_l_watson6y ago

I totally agree with you, I am uncomfortable with Racket 2 having a non-Lisp syntax. As someone who has used Lisp languages to get stuff done for over 30 years, I would say NO to the syntax change.

That said, Racket is open source, and the maintainers have good reasons for the change, aimed at getting a larger user base. I wish them great success.

peatmoss6y ago

war10256y ago· 8 in thread

We are on the brink of completing the transition to python3 at my work.

It was a nightmare of effort that I'm glad to have behind us.

vmchale6y ago

> All of this was the heroic effort of one of my coworkers who had the unenviable task of combing through our entire codebase to determine "This is unicode. This is bytes.

Dynamic typing!

war10256y ago

Not dynamic typing's fault.

The issue is they changed the types out from underneath you.

And then left it to each library to decide which type it was actually going to accept.
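A small illustration of both halves of that complaint on Python 3: the core types refuse to mix, and each stdlib function decides what it accepts. `os.path.join` is one that takes either type but not both at once; the paths are placeholders:

```python
import os.path

# Python 3 never coerces between str and bytes implicitly.
try:
    "abc" + b"def"
except TypeError as exc:
    print("str + bytes:", exc)

# os.path.join accepts all-str or all-bytes arguments...
print(os.path.join("repo", "file.txt"))    # repo/file.txt on POSIX
print(os.path.join(b"repo", b"file.txt"))  # b'repo/file.txt' on POSIX

# ...but rejects a mix of the two.
try:
    os.path.join("repo", b"file.txt")
except TypeError as exc:
    print("os.path.join:", exc)
```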

magicalhippo6y ago

Delphi also went through a similar transition from "strings are in whatever the local code page says" with one byte chars to Unicode strings (Windows-style).

However the makers of Delphi spent many years preparing for this, so when the time came for us to switch we only had to spend half a day or so to migrate our half a million lines of code.

d0mine6y ago

Something is wrong if there is no third type: the "natural" string (bytes on Python 2, unicode on Python 3).

war10256y ago

I assume many of the strings were left untouched. But you still have to audit all of it to know which needs to be used where.

eesmith6y ago

I believe that's included in the "etc."

quietbritishjim6y ago

Surely any "natural" string would be better represented as unicode in Python 2? What is an example that wouldn't be?

jnwatson6y ago

There is. It looks like this:

  u"Hello World"