It does not statistically prove anything, but as I thought I made extremely clear in the card where I discuss it, the point of bringing it up is different: to prove the hypocrisy of the anti-AI crowd.
> By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.
The entire outrage exists because people noticed what they thought was an unusual number of bugs and/or regressions in the release, saw it had Claude in it, and assumed a causal link. That is more than just "priors about AI code."
> You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis.
My point is that these two releases provide no evidence whatsoever that Claude made anything worse, so the outrage is unfounded. That doesn't require me to prove Claude didn't cause any problems. If I made the latter claim anywhere, I should clean that up.
> It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it.
Tridge actually explicitly says he made that tradeoff on purpose, not the AI.
> Every sentence is saturated with AI style. Perhaps the author so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
I've since rewritten nearly 100% of the prose in the analysis in my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me.