I hope the folding and normalization is configurable by language. I really hate it when some search decides that a and ä are the same letter. In Finnish they really aren't; "saari" is an island, "sääri" is the lower leg or shin.
Currently, you can choose between tokenizers with or without folding. But configurability per language or full customizability of the folding logic by the user is a good idea.