This shouldn't really be a surprise to anyone. It was reported years ago that Twitter was unable to cut down on hate speech because the automated systems it developed triggered too many [debatably false] positives on Republican politicians, which was bad for the company's reputation. If Twitter wanted to keep future code changes from affecting that approach, there needed to be something like this in the code or tests.