The way this experiment is conducted is not inline with how current agentic AI is used OR how even humans edit documents.
Here's how agentic AI currently typically do edits:
1. They read the whole document. 2. They come up with a patch. A diff of the section they want to edit. 3. They change THAT section only.
This is NOT what that experiment was doing. A 25% degradation rate would render the whole industry dead. No one would be using claude code because of that. The reality is... everyone is using claude code.
AI is alien to the human brain, but in many ways it is remarkably. This is one aspect of similarity in that we cannot edit a whole document holistically to produce one edit. It has to be targeted surgical edits rather then a regurgitation of the entire document with said edit.