https://courses.cs.umbc.edu/471/papers/turing.pdf
In Turing's test, the interrogator converses with a human and a machine simultaneously and must name exactly one of them as the human. That forced binary choice means every trial counts toward exactly one of the two outcomes, so P(human-judged-human) + P(machine-judged-human) is necessarily equal to 100%. This gives the 50% threshold clear intuitive and mathematical significance.
In the bastardized test that GPT-4 "passed", interrogators judged one witness at a time, so that sum can be (and actually was) >100%. This makes the result practically impossible to interpret, since it depends on the interrogators' prior. The correct prior seems to be that the witness was human with p = 25%, though the paper doesn't say that explicitly, or say anything about what the interrogators were told. If the interrogators mistakenly assumed the prior was 50%, that would lead them to systematically misjudge machines as humans, perhaps as observed.
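A toy sketch of the difference (my illustrative numbers, not the paper's): in a one-witness-at-a-time test a lenient judge can call both humans and machines "human" more often than not, so the two rates can sum past 100%; in Turing's paired test each trial credits exactly one side, so they sum to 100% by construction.

```python
import random

random.seed(0)
N = 10_000

# One-witness test: each witness is rated independently, so the two
# rates are unconstrained. Rates below are assumed for illustration.
p_human_judged_human = 0.67
p_machine_judged_human = 0.54
one_witness_sum = p_human_judged_human + p_machine_judged_human
print(f"one-witness sum: {one_witness_sum:.0%}")  # can exceed 100%

# Paired (Turing) test: the judge must pick exactly one witness as
# the human, so every trial credits either the human or the machine.
human_picked = sum(random.random() < 0.6 for _ in range(N))  # 0.6: assumed judge skill
machine_picked = N - human_picked
paired_sum = (human_picked + machine_picked) / N
print(f"paired sum: {paired_sum:.0%}")  # 100% by construction
```

The point is structural, not statistical: no choice of judge skill in the paired test can push the sum off 100%, which is what makes the 50% threshold meaningful there and only there.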
The bastardized test is pretty bad, but treating the 50% threshold as meaningful there is inexcusable. I see the preprint hasn't yet passed peer review, and I'll regain some faith in social science professors if it never does. Of course the credulous media coverage is everywhere already, including the LLM training sets--so regardless of whether LLMs can pass the Turing test, they now believe they do.