(b) If* the button really performs as advertised, why on earth should it be a manual step? Do it automatically.
(c) Best practice here is to escape on output, which removes any need to strip XSS out of the database, so the fact that this button exists strongly implies they aren't doing that.
* I would guess that the button is very unlikely to perform comprehensively as advertised
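To make point (c) concrete, here is a minimal sketch of escape-on-output in Python. The `render_comment` helper and the `<p>` wrapper are hypothetical, invented for illustration; this says nothing about how SugarCRM actually renders fields.

```python
import html


def render_comment(stored_text: str) -> str:
    """Escape at render time; the stored value is never modified."""
    # html.escape turns &, <, > (and quotes, with quote=True) into entities,
    # so the browser shows the markup as text instead of interpreting it.
    return "<p>" + html.escape(stored_text, quote=True) + "</p>"


# Hypothetical hostile value sitting in the database, exactly as submitted.
stored = '<script>alert("xss")</script>'
print(render_comment(stored))
# prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

With this in place the raw string can stay in the database indefinitely, and there is nothing for a cleanup button to remove.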
Input validation is good practice (checking that the input is what's expected), but input filtering is problematic for two reasons:
1. Since input filtering lacks context, the usual option is to filter very broadly (e.g. to attempt to filter for SQL injection and various forms of XSS simultaneously). This leads to much more complicated filtering strategies, adding maintenance overhead, risking data corruption, and generally increasing technical debt.
2. Data loss. This is less of an immediate security issue, but full data retention can be useful for debugging, for future migrations or data transformations, &c. It also ensures you don't get data corruption (e.g. where filtering for one context breaks your data in a different context).
For instance, the apostrophe character is a very potent character in a number of injection scenarios and one you might be tempted to "filter", but it is also a perfectly legal character in things as common as "names". It is merely one example of a very large and constantly growing set.
You can't help but correctly encode things on the way out if you want things to work properly.
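The apostrophe case can be sketched with Python's standard `sqlite3` module. The `people` table and the helper names are made up for the example; the point is only that a parameterized query lets the driver handle the encoding, so the name goes in and comes back unmodified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")


def add_person(conn: sqlite3.Connection, name: str) -> None:
    # The ? placeholder passes the value separately from the SQL text,
    # so an apostrophe in the data can never terminate the statement.
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))


def find_person(conn: sqlite3.Connection, name: str) -> list:
    return conn.execute(
        "SELECT name FROM people WHERE name = ?", (name,)
    ).fetchall()


add_person(conn, "Miles O'Brien")               # perfectly legal name
add_person(conn, "x'); DROP TABLE people; --")  # stored as plain text

print(find_person(conn, "Miles O'Brien"))
# prints: [("Miles O'Brien",)]
```

No input filter ran here, and the second row is just an odd-looking string in a table that still exists.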
There is also the question of "filtering" vs. "rejecting". I personally recommend that, one way or another, none of the first 32 ASCII characters (the control characters) that you don't expect end up in your database, because they trigger magical behaviors in all kinds of places. I also tend to recommend outright rejection, on the grounds that these things don't come in innocently. Nobody accidentally types the Negative ACK character into their name. But at the very least, filter it out early. You can also outright "filter" on Unicode character classes you don't expect. But this really ought to be seen more as mere day-to-day business "data validation" than as a security measure, because of the aforementioned fact that some of the Characters of Interest are still valid, and you can't afford to just filter them all out.
(You basically end up with "English letters and numbers". If you're trying to "filter" away all the "bad" characters in advance, without really knowing where they're going, you can't even have things like "space" (very active shell character), and UTF-8 can actually be dangerous if stuff isn't expecting it, etc. And when push really comes to shove, even strings of nothing but English letters and numbers can become dangerous if they are too long, in certain pathological contexts, i.e., "seriously, don't write network software in C". Because the safety of a string is not an intrinsic property of the string but has everything to do with its interpretation by further bits of code, there isn't a way to generically "cleanse" a string.)
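The "reject rather than filter" recommendation for control characters might look something like this in Python. `validate_name`, `RejectedInput`, and the 200-character cap are assumptions made for the sketch, not anything from the thread; the check uses the Unicode `Cc` category, which covers the first 32 ASCII characters plus DEL and the C1 controls.

```python
import unicodedata


class RejectedInput(ValueError):
    """Raised when a value contains something nobody types by accident."""


def validate_name(value: str, max_len: int = 200) -> str:
    # Length cap is an arbitrary illustrative limit.
    if len(value) > max_len:
        raise RejectedInput("value too long")
    for ch in value:
        # Category "Cc" = control characters (includes NAK, U+0015).
        if unicodedata.category(ch) == "Cc":
            raise RejectedInput(f"control character U+{ord(ch):04X} not allowed")
    # Apostrophes, spaces, and non-ASCII letters pass through untouched:
    # this is data validation, not a security filter.
    return value


print(validate_name("Miles O'Brien"))
try:
    validate_name("Bob\x15")
except RejectedInput as exc:
    print("rejected:", exc)
```

Note that this deliberately leaves the "Characters of Interest" alone. It only turns away input that has no innocent explanation; output encoding still has to do the real work.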
If you rely on input filtering and you miss something (there's a bug in the filter, or a new kind of text needs filtering, e.g. a tag that didn't exist when the filter was written), you have no recourse -- that text is already in the database.
A software update can fix/change the output filtering, and since that runs at display time (when the vulnerability is actually activated), it can address it.
Then, when the data itself is read, it should pass through context-sensitive output filters: one for the template engine, and another each time it's going to be embedded in HTML, JavaScript, a stylesheet, or a URL. Output filtering needs to happen, but critically it should not be attempted at the time of input, because the developer of the form can't be expected to predict every possible use of that data once it's gathered.
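A rough sketch of what "context-sensitive" means in practice, using only the Python standard library. The three helper names are hypothetical, and a real template engine would normally pick the right encoder for you; the point is that one stored value needs a different encoding for each destination.

```python
import html
import json
from urllib.parse import quote


def for_html_text(value: str) -> str:
    # HTML body or attribute context: entity-encode markup characters.
    return html.escape(value, quote=True)


def for_js_string(value: str) -> str:
    # JavaScript string literal inside a <script> block. json.dumps gives a
    # quoted, escaped literal; <, >, & are additionally escaped so the value
    # cannot close the surrounding <script> element.
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


def for_url_component(value: str) -> str:
    # Single query-string or path component: percent-encode everything
    # outside the unreserved set, including "/" (safe="").
    return quote(value, safe="")


stored = "Tom & Jerry's </script>"
print(for_html_text(stored))      # Tom &amp; Jerry&#x27;s &lt;/script&gt;
print(for_js_string(stored))
print(for_url_component(stored))  # Tom%20%26%20Jerry%27s%20%3C%2Fscript%3E
```

None of these three outputs is "the clean version" of the string, which is why a single input-time filter can't stand in for them.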
The broad-strokes filtering you describe in step 1 is an anti-pattern through and through. :) I'd personally avoid any codebase or framework attempting this strategy, as I've seen it fail too many times.
cron, maybe?
>Select a module to remove potential XSS strings
My reading of this is that SugarCRM, which ships barebones like Drupal, has a whole ecosystem of poorly coded third-party modules you need even for basic CRM functionality. The core Sugar devs wrote this little script to run through module code and its DB entries to find potential XSS issues. I imagine this would break things, so it's an 'admin beware' type of thing they shouldn't automate. No one wants to upgrade from Sugar 5.1 to 5.2 and find half their modules broken, so it's manual. Maybe in the future it won't be, and it's a shot across the bow to third-party developers to take security seriously.
That's the problem with these popular bare-bones FOSS frameworks. You get a decent core product like Drupal or even WordPress, but you need a dozen or so modules to make it do anything useful. The code quality of random modules is, of course, random as well, so the core devs try to work around it. It's ugly, but it's more an indictment of module authors than of the core SugarCRM team. That team is just trying to fix the shit-code in the modules and protect the reputation of their product. They don't want to find themselves in the situation Microsoft often found itself in, where Flash and Java exploits were blamed on Windows because non-tech-savvy people saw their Windows machines hacked and didn't know a third-party program was responsible.
This is also why we don't often implement systems like these at work. While I may trust the Drupal or Sugar team, I'm ultimately forced to trust lone module authors like 'webdev420hacker' and 'boutiquewebsites1997' because they're the only game in town for these modules. I think bare-bones frameworks are falling out of style a bit. Go with a managed cloud product, a more featureful product that doesn't depend so much on third-party modules, or a commercial solution that includes everything. There's too much variability in third-party modules and not remotely enough security-conscious eyes on this code.