asveikau · 13y ago · 0 points

I don't know what you're talking about, Win32 supports UTF-8. I can pass CP_UTF8 into WideCharToMultiByte() just fine. :-)

Kidding aside, I don't really see the issue. Do you get upset about in-memory representation of strings often? How about when using Java or Python? Is this not why there is an entire programming practice called "serialization"? Windows started supporting UCS-2 before UTF-8 existed, and so the internal representation on Windows remains 16 bits per char.
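To illustrate the serialization point, here is a minimal Python sketch (the sample string is just an assumption for illustration). It encodes the same text as UTF-8 and as UTF-16-LE, the 16-bit layout the Win32 wide-character APIs use, and checks that both round-trip to the same string regardless of how the runtime stores it in memory.

```python
# The same text serialized two ways; the in-memory form does not matter here.
text = "na\u00efve caf\u00e9"  # "naïve café", 10 code points

utf8_bytes = text.encode("utf-8")        # common file/wire encoding
utf16_bytes = text.encode("utf-16-le")   # 16-bit code units, no BOM

# Both byte sequences decode back to the identical string.
roundtrip_ok = (
    utf8_bytes.decode("utf-8") == text
    and utf16_bytes.decode("utf-16-le") == text
)

print(len(utf8_bytes), len(utf16_bytes), roundtrip_ok)
```

The byte counts differ, but the decoded strings compare equal, which is roughly what converting at the API boundary with CP_UTF8 amounts to.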

zurn · 13y ago

Python supports full Unicode on all platforms now (x). Who cares about the in-memory representation, as long as the user doesn't have to suffer from surrogates and all that horror?

(x) Before PEP 393 there was a build option to use limited 16-bit Unicode (the "narrow" build), which was unfortunately popular on Windows.
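As a rough sketch of what "not suffering from surrogates" means, assuming Python 3.3 or later (where PEP 393 applies): a code point outside the Basic Multilingual Plane counts as one character in a `str`, even though its UTF-16 encoding needs a surrogate pair. The sample character is an arbitrary choice.

```python
s = "\U0001F600"  # one code point outside the Basic Multilingual Plane

# On Python 3.3+ a str is indexed by code point, so this is a single character.
code_points = len(s)

# Encoded as UTF-16, the same character takes two 16-bit code units.
encoded = s.encode("utf-16-le")
utf16_units = len(encoded) // 2

# Split the encoded bytes into the high and low surrogate values.
high = int.from_bytes(encoded[0:2], "little")
low = int.from_bytes(encoded[2:4], "little")

print(code_points, utf16_units, hex(high), hex(low))
```

On an old narrow build, `len(s)` would have reported the two surrogates instead of one character.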

asveikau (OP) · 13y ago

Hm. I picked Python because I knew it was an outlier, but I don't know all the details. Does Python still suffer from the horrors of comparing a diacritic built from combining characters against the same glyph as a pre-composed character? Seems like even if you expose strings as UTF-32 you'd still have that issue.

zurn · 13y ago

AFAIK the combining character problem hasn't changed; you still have to use unicodedata.normalize() for that. But at least you can pass Unicode strings through cleanly.

Are there other languages that handle this better?
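For reference, a small sketch of the unicodedata.normalize() workaround mentioned above. The helper name `same_text` is hypothetical, and picking NFC over the other normalization forms is an assumption made for this example.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single pre-composed code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A plain comparison sees different code point sequences.
raw_equal = precomposed == combining

# After normalization the two spellings compare equal.
normalized_equal = same_text(precomposed, combining)

print(raw_equal, normalized_equal)
```

Both strings render as the same glyph, so the plain `==` result is the surprise the thread is talking about.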
