
0 points · sp332 · 16y ago · 0 comments

That's pretty cool, but using valid UTF-8 should get you up to 6 bytes per character, since each half of a UTF-16 surrogate pair takes 3 bytes in UTF-8.


1 comment · 1 top-level

keithwinstein · 16y ago

Well, if you can do better, please enter the contest! Glittering prizes are on the line here. I didn't completely follow your reasoning -- first, the code points allocated to surrogate pairs are not Unicode scalar values, and so can't be expressed in well-formed UTF-8. But also, remember that the goal is to encode the most bits per tweet, not to come up with the most verbose encoding. :-) How many bits of arbitrary source information would your scheme be able to carry in a tweet?
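To make the point about surrogates concrete, here is a minimal sketch, assuming Python 3's built-in strict `utf-8` codec as the judge of well-formedness. The helper names `utf8_len` and `is_well_formed_utf8` and the sample character U+1F600 are illustrative choices, not part of the thread. It shows that a supplementary character takes 4 bytes in well-formed UTF-8, and that the 6-byte form obtained by encoding each surrogate half separately is rejected as ill-formed.

```python
from typing import Optional


def utf8_len(code_point: str) -> Optional[int]:
    """Byte length of one code point in strict UTF-8, or None if it cannot be encoded."""
    try:
        return len(code_point.encode("utf-8"))
    except UnicodeEncodeError:
        return None


def is_well_formed_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+1F600 and the UTF-16 surrogate pair that represents it (illustrative sample).
supplementary = "\U0001F600"
high, low = "\ud83d", "\ude00"

# Force each surrogate half through the encoder; "surrogatepass" bypasses the
# strict check and emits 3 bytes per half, 6 in total.
six_bytes = (high + low).encode("utf-8", "surrogatepass")

print(utf8_len(supplementary))         # length of the well-formed encoding
print(utf8_len(high))                  # a lone surrogate has no strict encoding
print(len(six_bytes))                  # size of the forced 6-byte form
print(is_well_formed_utf8(six_bytes))  # whether a strict decoder accepts it
```

The 6-byte form only exists because `surrogatepass` turns off the check that the strict codec applies, so a scheme that relies on it would not be producing valid UTF-8.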
