0 points · jerf · 9y ago

There's another, less-appreciated problem, I think: the "sensitive" values you might be trying to filter out on input are very often perfectly valid values too.

For instance, the apostrophe character is a very potent character in a number of injection scenarios and one you might be tempted to "filter", but it is also a perfectly legal character in things as common as "names". It is merely one example of a very large and constantly growing set.

You have no choice but to encode things correctly on the way out if you want things to work properly.
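To make that concrete, here is a minimal Python sketch of encoding on the way out. The name, the `users` table and the in-memory SQLite database are all made up for illustration; the point is only that the apostrophe is stored untouched and each output context applies its own encoding.

```python
import html
import sqlite3

name = "Miles O'Brien"  # hypothetical input: a valid name that contains an apostrophe

# SQL context: a parameterized query sends the value separately from the SQL text,
# so the apostrophe never gets a chance to terminate a string literal.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

# HTML context: escape at the point of output, not at input time.
rendered = "<p>" + html.escape(stored) + "</p>"
```

The stored value is byte-for-byte what the user typed; only the rendered HTML carries the escaped form.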

There is also the question of "filtering" vs. "rejecting". I personally recommend that, one way or another, none of the first 32 ASCII characters (the control characters) that you don't expect should end up in your database, because they are full of magical behaviors in all kinds of places. I also tend to recommend outright rejection, on the grounds that these things don't come in innocently: nobody accidentally types the Negative ACK character into their name. But at the very least, filter them out early. You can also outright "filter" on Unicode character classes you don't expect. But this really ought to be seen as ordinary day-to-day business "data validation" rather than a security measure, because, as noted above, some of the Characters of Interest are still valid and you can't afford to filter them all out.
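A rough sketch of the "reject" option in Python, as plain data validation. The function name `validate_text` and the `allowed_controls` parameter are my own inventions for the example; note that the Unicode category `Cc` is slightly broader than the first 32 ASCII characters, since it also covers DEL and the C1 controls.

```python
import unicodedata

def validate_text(value: str, allowed_controls: str = "") -> str:
    """Reject unexpected control characters instead of silently stripping them.

    allowed_controls lists control characters the field legitimately expects,
    for example a newline in a multi-line comment box.
    """
    for ch in value:
        # "Cc" is the Unicode general category for control characters.
        if unicodedata.category(ch) == "Cc" and ch not in allowed_controls:
            raise ValueError(f"unexpected control character U+{ord(ch):04X}")
    return value
```

A name field would call `validate_text(name)`, while a free-text field might call `validate_text(body, allowed_controls="\n\t")`. The apostrophe passes either way, which is exactly why this is validation and not a defense against injection.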

(If you did try to filter them all out, you would basically end up with "English letters and numbers". If you're trying to "filter" away all the "bad" characters in advance, without really knowing where they're going, you can't even have things like "space" (a very active shell character), and UTF-8 can actually be dangerous if the code receiving it isn't expecting it, etc. And when push really comes to shove, even strings of nothing but English letters and numbers can become dangerous if they are too long, in certain pathological contexts such as buffer overflows, i.e., "seriously, don't write network software in C". Because the safety of a string is not an intrinsic property of the string but has everything to do with how further bits of code interpret it, there isn't a way to generically "cleanse" a string.)
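One way to see that there is no generic "cleanse" is to hand a single string to several destinations and watch each one need a different encoding. This is only an illustrative Python sketch; the value is invented, and the three contexts (POSIX shell, URL component, JSON) are just the ones the standard library happens to cover.

```python
import json
import shlex
from urllib.parse import quote

value = "O'Brien & sons"  # hypothetical input with an apostrophe, spaces and an ampersand

# Each destination interprets different characters as special,
# so each one gets its own encoder at the point of use.
shell_arg = shlex.quote(value)      # quoted for a POSIX shell command line
url_part = quote(value, safe="")    # percent-encoded for a URL path or query component
json_text = json.dumps(value)       # a JSON string literal
```

All three results differ from each other, and all three decode back to the same original value, which is the sense in which safety belongs to the interpretation and not to the string.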

2 comments · 2 top-level

drspacemonkey · 9y ago

I've also come across scenarios where allowing the user to enter valid HTML was a requirement. Especially in cases where users will be entering HTML that renders as part of a site, it was much easier to treat all user input as potentially unsafe and escape output in all cases, with the exception of the one or two places where user-created HTML was supposed to be rendered and/or sent to an external API.
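A minimal sketch of that "escape everything, opt out in one or two places" arrangement in Python. `TrustedHtml` and `render` are hypothetical names for the example, not any particular framework's API, though several template engines use a similar marker-type idea.

```python
import html

class TrustedHtml(str):
    """Marker type (hypothetical) for the few values meant to be rendered as raw HTML."""

def render(value: str) -> str:
    # Escaping is the default; only values explicitly wrapped in TrustedHtml skip it.
    if isinstance(value, TrustedHtml):
        return str(value)
    return html.escape(value)
```

Ordinary strings go through `render(comment)` and come out escaped, while the one page that shows user-authored markup calls `render(TrustedHtml(markup))`, so every unescaped output is visible in the code as a deliberate choice.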

nradov · 9y ago

I run into web applications all the time where lazy and incompetent developers have blocked or filtered out the '<' and '>' characters in a naive attempt to prevent content injection attacks.