We usually build the tokenizer by optimizing for one goal (space-efficient encoding of text), then use it in a model that is trained for an entirely different goal (producing good text, "reasoning", "coding", etc.). It is not immediately clear that the tokenizer's optimization goal is actually the one that best serves training the LLM.
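To make "optimizing for space-efficient encoding" concrete, here is a minimal sketch assuming byte-level BPE as the tokenizer algorithm (the comment doesn't name one). Each step greedily merges the most frequent adjacent pair, which is a proxy for shrinking the encoded sequence. Nothing in the loop knows anything about what the downstream model will be trained to do. The function names are hypothetical, not any particular library's API.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there are no pairs."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair` (scanning left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Greedy byte-level BPE: the only objective is compressing the training text."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:  # sequence is down to a single token
            break
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

def decode(ids, merges):
    """Rebuild the byte string for every token id, then decode it back to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dict preserves merge order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")

text = "low lower lowest"
ids, merges = train_bpe(text, 10)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
```

The "extra rules" people experiment with sit outside this loop: they constrain which pairs may be merged, or inject tokens the loop would never have chosen.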
That's what all these attempts boil down to. They don't presume to find a more space-efficient encoding by hand; they assume the tokenizer's optimization goal was wrong and that adding some extra rules can do better. This isn't entirely without precedent: most tokenizers have a couple of "forced" tokens that were not organically discovered, and changing how digits are grouped in the tokenizer is another place where wins have been shown.
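Digit grouping is usually decided by a pre-tokenization step that chops text into chunks before BPE runs, so merges can never cross a chunk boundary. As far as I know, GPT-4's cl100k_base caps digit runs at three characters, while LLaMA splits every digit individually. The sketch below compares those two rules with a right-to-left grouping variant. The regexes are hypothetical and heavily simplified (real pre-tokenizers also handle whitespace, contractions and Unicode categories), and `pre_tokenize_rtl` is just an illustrative tweak, not any shipped tokenizer's behavior.

```python
import re

# Hypothetical simplified pre-tokenizers: letters, digits, or one other non-space char.
GROUP_UP_TO_3 = re.compile(r"[A-Za-z]+|[0-9]{1,3}|[^A-Za-z0-9\s]")  # left-to-right chunks of <= 3
SINGLE_DIGITS = re.compile(r"[A-Za-z]+|[0-9]|[^A-Za-z0-9\s]")       # one digit per chunk
DIGIT_RUNS = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")         # whole digit runs, split later

def pre_tokenize(text, pattern):
    """Split text into chunks; BPE merges would only be learned inside each chunk."""
    return pattern.findall(text)

def group_right_to_left(digits, size=3):
    """Chunk a digit string from the right, so groups line up with thousands."""
    head = len(digits) % size
    chunks = [digits[:head]] if head else []
    chunks += [digits[i:i + size] for i in range(head, len(digits), size)]
    return chunks

def pre_tokenize_rtl(text):
    """Like GROUP_UP_TO_3, but digit runs are grouped right to left."""
    out = []
    for chunk in DIGIT_RUNS.findall(text):
        out.extend(group_right_to_left(chunk) if chunk.isdigit() else [chunk])
    return out

s = "pay 1234567 now"
print(pre_tokenize(s, GROUP_UP_TO_3))
print(pre_tokenize(s, SINGLE_DIGITS))
print(pre_tokenize_rtl(s))
```

For "1234567" the three rules give 123|456|7, seven single digits, and 1|234|567 respectively. The encoded length barely changes, but what the model has to learn about place value does, and that is exactly the kind of effect a compression objective can't see.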
This is where projects like nanochat are really valuable: they make it quick and (relatively) cheap to try out tweaks like these.