But when I use Claude code, I also supervise it somewhat closely. I don't let it go wild, and if it starts to make changes to existing tests it better have a damn good reason or it gets the hose again.
The failure mode here is letting the AI manage both the implementation and the testing. May as well ask high schoolers to grade their own exams. Everyone got an A+, how surprising!