0 points
dietr1ch
2y ago
I'd guess that the tokenizer is just different and handles this in a "better" way.
goodside
2y ago
No, in both tokenizers Unicode tag-block code points like these are converted into bytes (two tokens per character), which is a fallback for code points uncommon enough to not warrant a dedicated token.
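To make the byte fallback concrete, here is a minimal sketch using only the Python standard library. It maps ASCII text into the Unicode Tags block (U+E0000 to U+E007F) and prints the UTF-8 bytes for each resulting character. The helper names `to_tag_chars` and `utf8_bytes_per_char` are made up for illustration. The sketch does not run any tokenizer: how the four bytes of each character get grouped into tokens depends on the tokenizer's learned merges, so the "two tokens per character" figure comes from the comment above rather than from this code.

```python
TAG_BASE = 0xE0000  # start of the Unicode Tags block (U+E0000..U+E007F)


def to_tag_chars(text: str) -> str:
    """Map each printable ASCII character to its tag-block counterpart.

    Hypothetical helper for illustration; non-printable or non-ASCII
    characters are skipped.
    """
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def utf8_bytes_per_char(text: str) -> list[bytes]:
    """Return the UTF-8 encoding of each character separately."""
    return [c.encode("utf-8") for c in text]


hidden = to_tag_chars("hi")
for ch, raw in zip(hidden, utf8_bytes_per_char(hidden)):
    # Each tag-block code point lies above U+FFFF, so UTF-8 needs 4 bytes.
    print(f"U+{ord(ch):05X} -> {raw.hex(' ')} ({len(raw)} bytes)")
```

A byte-level tokenizer with no dedicated token for these code points has to represent each character using tokens built from those four bytes, which is the fallback described above.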