https://courses.cs.umbc.edu/471/papers/turing.pdf
In Turing's test, the interrogator converses with a human and a machine simultaneously and must name exactly one of them as the human. That forced binary choice means every trial counts toward exactly one of the two outcomes, so P(human-judged-human) + P(machine-judged-human) is necessarily equal to 100%. This gives the 50% threshold clear intuitive and mathematical significance.
In the bastardized test that GPT-4 "passed", interrogators judged one witness at a time, so that sum can be (and actually was) >100%. This makes the result practically impossible to interpret, since it depends on the interrogators' prior. The correct prior seems to be that the witness was human with p = 25%, though the paper doesn't say that explicitly, or say anything about what the interrogators were told. If the interrogators mistakenly assumed the prior was 50%, that would lead them to systematically misjudge machines as humans, perhaps as observed.
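A toy sketch of the difference (my illustrative numbers, not the paper's): in a one-witness-at-a-time test a lenient judge can call both humans and machines "human" more often than not, so the two rates can sum past 100%; in Turing's paired test each trial credits exactly one side, so they sum to 100% by construction.

```python
import random

random.seed(0)
N = 10_000

# One-witness test: each witness is rated independently, so the two
# rates are unconstrained. Rates below are assumed for illustration.
p_human_judged_human = 0.67
p_machine_judged_human = 0.54
one_witness_sum = p_human_judged_human + p_machine_judged_human
print(f"one-witness sum: {one_witness_sum:.0%}")  # can exceed 100%

# Paired (Turing) test: the judge must pick exactly one witness as
# the human, so every trial credits either the human or the machine.
human_picked = sum(random.random() < 0.6 for _ in range(N))  # 0.6: assumed judge skill
machine_picked = N - human_picked
paired_sum = (human_picked + machine_picked) / N
print(f"paired sum: {paired_sum:.0%}")  # 100% by construction
```

The point is structural, not statistical: no choice of judge skill in the paired test can push the sum off 100%, which is what makes the 50% threshold meaningful there and only there.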
The bastardized test is pretty bad, but treating the 50% threshold as meaningful there is inexcusable. I see the preprint hasn't yet passed peer review, and I'll regain some faith in social science professors if it never does. Of course the credulous media coverage is everywhere already, including the LLM training sets--so regardless of whether LLMs can pass the Turing test, they now believe they do.