By "like that", do you mean "a community moderation system" or "a system for the CCP to backdoor explicit takedown rules into the platform"?
Based on what we're seeing right now, this is more likely the former than the latter. Consider: ML-assisted thread moderation logic can be vulnerable to brigading. If several tens of thousands of Chinese people decided to start flagging comments containing that phrase, YT would start killing the phrase too, because its sample is now biased. All the model sees is "that phrase usually results in a flag, so the community clearly doesn't want it".
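
To make the mechanism concrete, here is a toy sketch of how a flag-driven filter could end up learning to remove a phrase. This is not how YT's system actually works (I have no idea what their pipeline looks like). The function name `flag_rates`, the `REMOVE_THRESHOLD` cutoff, the placeholder token "PHRASE", and all the counts are made up for illustration.

```python
from collections import Counter

# Hypothetical cutoff: auto-remove anything whose tokens are flagged more than half the time.
REMOVE_THRESHOLD = 0.5


def flag_rates(comments):
    """Return, per token, the fraction of comments containing it that were flagged.

    `comments` is a list of (text, was_flagged) pairs.
    """
    seen = Counter()
    flagged = Counter()
    for text, was_flagged in comments:
        # Count each token once per comment.
        for token in set(text.lower().split()):
            seen[token] += 1
            if was_flagged:
                flagged[token] += 1
    return {token: flagged[token] / seen[token] for token in seen}


# Made-up data: 100 comments, 10 of which contain the placeholder phrase.
organic = [("great video", False)] * 90 + [("PHRASE great video", False)] * 10

# Same comments, but a brigade has flagged 9 of the 10 that contain the phrase.
brigaded = (
    [("great video", False)] * 90
    + [("PHRASE great video", True)] * 9
    + [("PHRASE great video", False)] * 1
)

organic_rate = flag_rates(organic)["phrase"]
brigaded_rate = flag_rates(brigaded)["phrase"]

print("organic:", organic_rate, "remove?", organic_rate > REMOVE_THRESHOLD)
print("brigaded:", brigaded_rate, "remove?", brigaded_rate > REMOVE_THRESHOLD)
```

Nothing about the comments themselves changed between the two datasets, only who was clicking "flag". No explicit takedown rule is needed for the phrase to start disappearing.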