Anecdotally, I've observed both Sonnet4 and GPT5 behaving equally bad with code and sharing similar hallucinations from fresh chats. Is some sort of cross-company safety router akin to the great firewall being rolled out for AI chats?
Specific repro steps: set system prompt to: "Current date: 2025-09-28 Knowledge cut-off date: end of January 2025"
Then re-run all your tests through the API, eg "What happened at the 2024 Paris Olympics opening ceremony that caused controversy? Also, who won the 2024 US presidential election?" -> correct answers on opus / 4.0, incorrect answers on 3.7. This fingerprints consistently correctly, at least for me.