It's true that GPLv3 covers patents, but it is still primarily a copyright license.
The tokenizer's tokens certainly aren't patented. They can't be trademarked (they don't identify a product or service). They aren't a trade secret (the data is public). They aren't copyrighted (a list of tokens is not a creative work). And the GPL explicitly preserves fair use rights, so there are no contractual restrictions either.
A tokenizer's vocabulary is effectively a list of the top-n most common byte sequences. There's simply no basis in law for it to be subject to copyright or any other IP law in the ordinary case.
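
To make the "list of the most common byte sequences" point concrete, here's a rough Python sketch. It is a deliberate simplification: the function name `top_byte_sequences` and the `max_len` cutoff are made up for illustration, and real tokenizers such as BPE build their vocabulary through iterative merges rather than a single counting pass. The point is only that the output is a frequency table derived mechanically from the input data.

```python
from collections import Counter


def top_byte_sequences(text: str, n: int, max_len: int = 4) -> list[bytes]:
    """Return the n most frequent byte sequences of length 1..max_len.

    Hypothetical helper for illustration only; not how any particular
    tokenizer is actually trained.
    """
    data = text.encode("utf-8")
    counts: Counter[bytes] = Counter()
    # Count every contiguous byte sequence up to max_len bytes long.
    for length in range(1, max_len + 1):
        for start in range(len(data) - length + 1):
            counts[data[start:start + length]] += 1
    # Rank by frequency (descending), breaking ties by byte order
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seq for seq, _ in ranked[:n]]


vocab = top_byte_sequences("the cat sat on the mat, the end", n=5)
```

Run it on the same corpus with the same settings and you get the same list every time, which is the sense in which there is no creative choice involved in the result.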