Oh interesting, does that mean languages other than English won't be paying such a large penalty in terms of token counts?
With previous tokenizers, there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
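If you want to check this yourself, here's a rough sketch using OpenAI's tiktoken library (assuming it's installed, and assuming o200k_base is the newer encoding in question versus the older cl100k_base); the sample sentences are arbitrary and only meant to illustrate the comparison.

```python
# Compare token counts per language across two tiktoken encodings.
# pip install tiktoken
import tiktoken

# Illustrative sentences; swap in your own text to measure real prompts.
sentences = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Spanish": "El rápido zorro marrón salta sobre el perro perezoso.",
    "Japanese": "素早い茶色の狐がのろまな犬を飛び越える。",
}

for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(name)
    for lang, text in sentences.items():
        # len(enc.encode(text)) is the number of tokens the model would see
        print(f"  {lang}: {len(enc.encode(text))} tokens")
```

Running something like that against the sentences you care about should show how much (if at all) the non-English penalty shrinks with the newer encoding.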