scoot_718 · 6y ago · 0 points

> and the result was better Unicode support

Different Unicode support. And worse bytes support.

What could previously be done using python -c "..." is now long, horrible and ugly.
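
A rough sketch of the kind of one-liner in question, assuming the complaint is about filtering raw bytes from stdin to stdout. The helper name `upper_ascii` is made up for illustration; the comments show roughly how the same filter reads as a `-c` one-liner in each version.

```python
import io

# Python 2 (stdin/stdout already deal in byte strings), roughly:
#   python2 -c "import sys; sys.stdout.write(sys.stdin.read().upper())"
# Python 3 (must reach for the underlying binary buffers):
#   python3 -c "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"

def upper_ascii(src, dst):
    """Copy src to dst, upper-casing ASCII letters and leaving other bytes untouched."""
    dst.write(src.read().upper())

# Stand-ins for sys.stdin.buffer / sys.stdout.buffer so the sketch runs anywhere.
src = io.BytesIO(b"caf\xc3\xa9 \xff\n")  # valid UTF-8 followed by a stray 0xFF byte
dst = io.BytesIO()
upper_ascii(src, dst)
result = dst.getvalue()
```

The stray `0xFF` byte passes through unchanged because nothing here ever decodes the data as text.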

7 comments · 2 top-level

mehrdadn · 6y ago

> Different Unicode support. And worse bytes support.

I feel like you're the first person I've seen on the planet to echo my sentiments on these. I expect a lot of people will jump here to tell you you're wrong like they have to me, so just wanted to let you know I've felt exactly these pains and agree with you.

harikb · 6y ago

I am hoping this is in agreement. py2’s flexibility to handle utf8 bytes without fuss is amazing. Then people come up with all kinds of purity reasons to make it more complicated.

dataflow · 6y ago

Take out "utf8" and I'll agree ;)

The fundamental problem as I see it is that "string" is a grossly leaky and misunderstood abstraction. The string type is not the same thing as a "text" type. It's being used in all the wrong places for that purpose. People treat "string" like it means "text", but in so many places where we deal with strings, they just aren't (and should never be) text. Everything from stdio to argv to file paths to environment variables to "text" files to basically any interface with the outside world needs to be dealt with in bytes rather than text if you care about actually producing correct code that doesn't lose, corrupt, or otherwise choke on data.

C++ understood this and got it right, preferring to focus on optimizing rather than constraining the string type. Many other languages did pretty well by avoiding enforcing encodings on strings, too. And Python 2 defaulted to bytes as well, and only really cared about encoding/decoding at I/O boundaries where it thought it could assume it was dealing with text (though it sometimes didn't behave well there, and yes, it got painful as a result). Then Python 3 came along and just made everyone start treating most data as if it were inherently (Unicode) text by default, when it really had no such constraints to begin with.

It boggles my mind that Python 3 folks like to beat the drum on how Python 3 got the bytes/unicode right without taking a single moment to even notice that most strings people deal with aren't (and never were!) actually guaranteed to be in a specific, known textual encoding a priori. They were just arrays of code units with few restrictions on them, and if you want to write correct code, you're going to have to deal with bytes by default (or something else with similar flexibility) instead of text. It would've been totally fine to introduce a text type, but it fundamentally can't take the place of a blob type, which is the language of the outside world.
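
To make the blob-versus-text distinction concrete, here is a minimal Python 3 sketch. The file name is invented for illustration, and explicit `decode`/`encode` calls with the `surrogateescape` error handler are used so the behavior does not depend on the platform's filesystem encoding.

```python
# Bytes as they might arrive from argv, a directory listing, or an environment
# variable. 0xFF never occurs in valid UTF-8, so this is a blob, not text.
raw = b"report-\xff.txt"  # hypothetical file name

# Treating the blob as text by default fails outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through str as a lone
# surrogate, so the original bytes can be recovered exactly.
as_str = raw.decode("utf-8", "surrogateescape")
round_trip = as_str.encode("utf-8", "surrogateescape")

# The resulting str is still not real text: a strict encode rejects it.
try:
    as_str.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The round trip preserves the data, but only because the value is carried as a disguised blob rather than as genuine text.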

raverbashing · 6y ago

> py2’s flexibility to handle utf8 bytes without fuss is amazing

Without fuss? No, sorry, it was anything but.

First of all, it would default the encoding to "ASCII". Have any whiff of non-explicitly handled UTF-8 and it would just go bang at the worst time possible.

That was a stupid decision.

"Oh, but there was setdefaultencoding." Yeah, here's the first result for that: https://stackoverflow.com/questions/3828723/why-should-we-no...

So no, Python 2's way of dealing with Unicode was the most annoying way possible, because hey, who needs anything but ASCII, right?

int_19h · 6y ago

Except for that part where it would happily implicitly convert them to/from a Unicode string in any context where one was needed or present... using ASCII, rather than UTF-8, as the encoding.
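
A small sketch of that failure mode, written in Python 3 since Python 2's implicit coercion no longer exists. The explicit `decode("ascii")` below stands in for what Python 2 effectively did behind the scenes when a byte string met a unicode string; the sample string is made up.

```python
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Stand-in for Python 2's implicit coercion: decode with ASCII, not UTF-8.
try:
    utf8_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # fails at the first non-ASCII byte

# Python 3 refuses to mix the two types at all instead of guessing.
try:
    "prefix: " + utf8_bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The conversion has to be spelled out, along with the encoding.
explicit = "prefix: " + utf8_bytes.decode("utf-8")
```

Pure-ASCII input would sail through the first step, which is why the error tended to surface only once non-ASCII data showed up.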

pdonis · 6y ago

> I feel like you're the first person I've seen on the planet to echo my sentiments on these.

There have been plenty of people with similar sentiments. I'm one of them. I have felt ever since I first looked at Python 3 that the ways in which it broke backward compatibility were heavily skewed towards a few particular use cases and did not take into account the needs of all of the Python community.

Too · 6y ago

Got any examples?
