But in this case it's not that the reason doesn't matter; it's that the reason is censorship/bad-faith competition, obfuscated behind an apparent mistake.
This can be accomplished in a few ways. You could accumulate real URLs and build a test set that you can run in non-prod environments prior to deploy. You could also deploy the new version alongside the current one, both watching live data, with the current version retaining enforcement power while the new version runs in log-only mode.
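Roughly like this (a minimal sketch of the log-only rollout; the classify() interface and model names are made up):

    import logging

    logger = logging.getLogger("shadow_eval")

    def handle_url(url, current_model, candidate_model):
        """Run both classifiers on live traffic; only the current one enforces."""
        current_verdict = current_model.classify(url)
        candidate_verdict = candidate_model.classify(url)

        # The candidate runs in log-only mode: record disagreements for
        # review, but never let it affect the user-facing decision.
        if candidate_verdict != current_verdict:
            logger.info(
                "disagreement url=%s current=%s candidate=%s",
                url, current_verdict, candidate_verdict,
            )

        return current_verdict  # enforcement stays with the current version

Once the disagreement rate on live traffic looks acceptable, you flip enforcement to the new version.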
In the case of automated systems that might take new actions in response to live traffic, anomaly detection can be used to look for significant changes in classification rates, overall rate of actions, spikes in actions against specific domains, etc.
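Even something as crude as a z-score against each domain's historical block counts would catch the obvious spikes (window sizes and the threshold here are arbitrary assumptions):

    from statistics import mean, stdev

    def anomalous_domains(history, current, threshold=4.0):
        """history: {domain: [blocks per past window, ...]}
        current: {domain: blocks in the current window}"""
        flagged = []
        for domain, counts in history.items():
            if len(counts) < 2:
                continue  # not enough baseline to judge
            mu, sigma = mean(counts), stdev(counts)
            observed = current.get(domain, 0)
            if sigma > 0:
                z = (observed - mu) / sigma
            else:
                # flat baseline: any increase at all is suspicious
                z = float("inf") if observed > mu else 0.0
            if z > threshold:
                flagged.append((domain, observed, mu))
        return flagged

A sudden surge of blocks against one competitor's domain would surface here long before anyone reads individual log lines.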
The output isn't reproducible, or even predictable. The whole idea of a system like this is that it adapts, if only by collecting more data to "do the stats on".
What systems like this need is different layers that uncertain cases get escalated to. This is how the spam folder in your mailbox works too (to some extent). Basically: if it's clearly spam, just /dev/null it; if it's clearly not spam, let it pass. Everything in between gets re-rated by another layer, which then does the same, etc. One or more of these layers can and should be humans, and the actions of those humans then train the system. If Gmail isn't certain something is spam, it'll deliver it to your spam folder, or maybe even to your inbox, for you to review and mark as ham or spam manually.
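In code, each layer is basically just two thresholds (the numbers are invented, and I'm assuming the classifier outputs a spam probability in [0, 1]):

    from enum import Enum

    class Verdict(Enum):
        DROP = "drop"          # clearly spam: /dev/null it
        DELIVER = "deliver"    # clearly ham: let it pass
        ESCALATE = "escalate"  # uncertain: hand off to the next layer

    def triage(score, drop_above=0.98, deliver_below=0.10):
        if score >= drop_above:
            return Verdict.DROP
        if score <= deliver_below:
            return Verdict.DELIVER
        # next layer: a more expensive model, the spam folder, or a human
        return Verdict.ESCALATE

The uncertain items that reach the human layer come back as labeled ham/spam examples, which is exactly the training signal the automated layers need.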
Knowing that Elon fired a lot of the human teams that fact-checked and researched fake news, much of it manually, I wouldn't be surprised if exactly those "human layers" were simply removed, leaving a system that's neither tuned nor checked while running.
(Source: I've built spam/malware/bot etc detection for comment sections of large sites)
Why not? They were already filtering millions of random links with the existing system. Saving some of those results to run regressions against before making changes to critical infrastructure should be trivial.
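Something like a trivial replay check would do (the JSON-lines snapshot format and the 1% flip budget are made up for illustration):

    import json

    def regression_check(snapshot_path, new_model, max_flip_rate=0.01):
        """snapshot: JSON lines of {"url": ..., "verdict": ...} saved from prod."""
        total = flips = 0
        with open(snapshot_path) as f:
            for line in f:
                record = json.loads(line)
                total += 1
                if new_model.classify(record["url"]) != record["verdict"]:
                    flips += 1
        flip_rate = flips / total if total else 0.0
        assert flip_rate <= max_flip_rate, (
            f"{flips}/{total} verdicts changed ({flip_rate:.2%}); "
            "review before deploy"
        )

If a change suddenly flips the verdict on a big chunk of previously-fine URLs, that's exactly the kind of thing you want a failed deploy, not a news cycle, to tell you about.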