Supporting Unicode characters comes with a set of complications that you may want to address, as the infamous rant about parsing HTML with regex shows, for instance.
As soon as you start trying to address those, the simple problem suddenly grows quite a lot in scope.
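
To give one concrete illustration of the kind of complication meant here (the choice of example is mine, not something the text above specifies), consider that the same visible character can be encoded as different code point sequences. The sketch below uses only Python's standard `unicodedata` module; the variable names are hypothetical.

```python
import unicodedata

# Two encodings of the same visible character "é" (illustrative choice).
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# They render alike but compare unequal and have different lengths.
same_without_normalizing = composed == decomposed
lengths = (len(composed), len(decomposed))

# Normalizing to NFC composes the pair into the single code point.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# A naive code-point reversal puts the combining accent first,
# so it no longer attaches to the "e".
naive_reversed = decomposed[::-1]

print(same_without_normalizing, lengths, same_after_nfc)
```

Normalization only handles this one case. Treating user-perceived characters (grapheme clusters) correctly in general needs more than the standard library offers, which is where the scope starts to grow.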