For instance, the apostrophe character is a very potent character in a number of injection scenarios and one you might be tempted to "filter", but it is also a perfectly legal character in things as common as "names". It is merely one example of a very large and constantly growing set.
If you want things to work properly, you have no choice but to encode them correctly on the way out.
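To make "encode on the way out" concrete, here is a minimal sketch in Python using only the standard library. The function name `store_and_render`, the `people` table, and the choice of SQLite and HTML as the two destinations are all assumptions made up for illustration. The point is that the apostrophe is stored untouched and each destination applies its own encoding.

```python
import html
import sqlite3


def store_and_render(name):
    """Store a name as-is, then encode it for HTML at output time.

    Hypothetical helper: table name and HTML fragment are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    # Output to SQL: parameter binding keeps the value out of the SQL text,
    # so the apostrophe needs no filtering.
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    (stored,) = conn.execute("SELECT name FROM people").fetchone()
    conn.close()
    # Output to HTML: escape for that context, at the moment of output.
    return "<td>" + html.escape(stored, quote=True) + "</td>"


print(store_and_render("O'Brien"))
```

The database holds the original `O'Brien`; only the HTML rendering carries an escaped form.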
There is also the question of "filtering" vs. "rejecting". I personally recommend that, one way or another, none of the first 32 ASCII characters (the control characters) end up in your database unless you specifically expect them, because they have magical behaviors in all kinds of places. I also tend to recommend outright rejection rather than filtering, on the grounds that these things don't come in innocently: nobody accidentally types the Negative ACK character into their name. But at the very least, filter them out early. You can also filter on Unicode character classes you don't expect. All of this really ought to be seen as ordinary day-to-day business "data validation" rather than as a security measure, because, as noted above, some of the Characters of Interest are still valid and you can't afford to filter them all out.
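A rough sketch of the "reject, don't filter" approach for a name field, in Python. The function name `validate_name`, the `allowed_controls` parameter, and the particular set of rejected Unicode categories are my assumptions, not a recommendation of exactly which classes to refuse; pick those per field.

```python
import unicodedata

# Assumed policy for illustration: control, private-use, surrogate, unassigned.
REJECTED_CATEGORIES = {"Cc", "Co", "Cs", "Cn"}


def validate_name(value, allowed_controls=frozenset()):
    """Return value unchanged, or raise ValueError on unexpected characters."""
    for ch in value:
        if ch in allowed_controls:
            continue  # e.g. "\n" or "\t" for a free-text field
        if ord(ch) < 32:
            # The first 32 ASCII characters, including NAK (0x15).
            raise ValueError(f"unexpected control character U+{ord(ch):04X}")
        if unicodedata.category(ch) in REJECTED_CATEGORIES:
            raise ValueError(f"unexpected character class at U+{ord(ch):04X}")
    return value


validate_name("O'Brien")        # fine: the apostrophe is a legal name character
try:
    validate_name("Bob\x15")    # NAK in a name is rejected, not stripped
except ValueError as exc:
    print(exc)
```

Note that the apostrophe passes: this is data validation, and it does nothing to make the value safe for any particular destination.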
(If you're trying to "filter" away all the "bad" characters in advance, without really knowing where they're going, you basically end up with "English letters and numbers". You can't even keep things like "space" (a very active shell character), and UTF-8 can actually be dangerous if the code receiving it isn't expecting it, etc. And when push really comes to shove, even strings of nothing but English letters and numbers can become dangerous if they are too long, in certain pathological contexts (see: "seriously, don't write network software in C"). The safety of a string is not an intrinsic property of the string; it has everything to do with how further bits of code interpret it, so there isn't a way to generically "cleanse" a string.)
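As a small illustration of why no single "cleansed" form exists, the sketch below encodes one string for three different interpreters using Python's standard library. The helper name `encode_for_sinks` and the choice of sinks are assumptions for the example, and `shlex.quote` only targets POSIX shells.

```python
import json
import shlex
from urllib.parse import quote


def encode_for_sinks(value):
    """Encode the same string for three hypothetical destinations."""
    return {
        "shell": shlex.quote(value),    # one argument on a POSIX shell command line
        "url": quote(value, safe=""),   # a single URL path or query component
        "json": json.dumps(value),      # a JSON string literal
    }


for sink, encoded in encode_for_sinks("O'Brien & Sons").items():
    print(sink, encoded)
```

Each result is correct for its own sink and wrong for the other two, which is why the encoding has to happen where the destination is known.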