And note that I'm not singling out China here.
Note that if such a trigger existed, the behavior would have to be reproducible by definition, i.e. when put into the right setting with the right input context, the model starts behaving maliciously with at least some well-defined probability. I don't think any such incident has ever been described; it's a purely theoretical concern.
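To make "reproducible with a well-defined probability" concrete, here is a minimal sketch of how one might check it: sample the model many times with and without the suspected trigger and compare the rates of flagged outputs. Everything here is hypothetical. `make_toy_model` is a stand-in for a real model, and `is_malicious` is whatever detector you trust; neither comes from any real incident or library.

```python
import random
from typing import Callable


def malicious_rate(model: Callable[[str], str],
                   prompt: str,
                   is_malicious: Callable[[str], bool],
                   trials: int = 200) -> float:
    """Fraction of trials in which the model's output is flagged as malicious."""
    hits = sum(is_malicious(model(prompt)) for _ in range(trials))
    return hits / trials


def make_toy_model(trigger: str, p: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a backdoored model (assumption, not a real API).

    Returns "BAD" with probability p when the trigger string appears in the
    prompt, and "OK" otherwise.
    """
    rng = random.Random(seed)

    def model(prompt: str) -> str:
        if trigger in prompt and rng.random() < p:
            return "BAD"
        return "OK"

    return model


def flagged(output: str) -> bool:
    """Toy detector: in practice this is the hard part."""
    return output == "BAD"


toy = make_toy_model(trigger="xyzzy", p=0.5)
baseline = malicious_rate(toy, "write a sort function", flagged)
triggered = malicious_rate(toy, "write a sort function xyzzy", flagged)
print(f"baseline={baseline:.2f} triggered={triggered:.2f}")
```

If a trigger were real, the triggered rate would sit clearly above the baseline across repeated runs, which is exactly the kind of evidence nobody has published so far.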
How do most Chinese models handle Tiananmen Square or discussions of Han superiority?
I was using Claude to work on a pet project which itself has a "generate with AI" feature. The default model the project used was Gemini (because it was cheaper and more reliably produced the correct output format). Claude kept changing the default model to Opus while working on entirely unrelated parts, and I kept noticing because Opus would mangle the output and break the rendered page. It also made the same change in the .env file, in addition to the default.
Even with these precautions you may still be hacked by state-level actors using a whole variety of sophisticated attack vectors. There may be Stuxnet-like software hidden on your hard drive where you cannot see it. If you do not have a TEMPEST-hardened compute environment, then anything you type on your keyboard or display on your screen may be getting stolen.
That said, it would be a fantastic achievement if someone could create a coding model that managed to hide a backdoor in the code it was generating, although it would surely be simpler to hack you in 100 other ways.
And OpenRouter’s architecture makes it inherently a compliance nightmare.
It’s much easier for the typical company to go with a provider where they can pay as they go and have a single data processing agreement.
Why?
Using something like Bedrock is a lot easier for compliance because the only processor is Amazon.
I suspect the reason is similar to the reason why there aren't any competitive open weight American LLMs.