This only came to light after the study had already been running for a few months. That shows we can no longer tell for certain whether something is AI-generated, unless it's literal GPT-speak the author was too lazy to edit out themselves.
Teachers will lament the rise of AI-generated answers, but they will only ever catch the blatantly obvious responses that were copy-pasted wholesale. This is still an emerging phenomenon, and the next wave of prompters will learn from the mistakes of the current one. From now on, unless you can proctor a room full of students writing their answers with nothing but pencil and paper, there will be no way to know for certain how much was AI and how much was original or rewritten.