0 points · Latty · 10mo ago

People just confidently stating stuff like "current LLMs basically pass the Turing test" makes me feel like I've secretly been given much worse versions of all the LLMs in some kind of study. It's so divorced from my experience of these tools, I genuinely don't really understand how my experience can be so far from yours, unless "basically" is doing a lot of heavy lifting here.

24 comments · 7 top-level

JohnFen · 10mo ago · 11 in thread

> "current LLMs basically pass the Turing test" makes me feel like I've secretly been given much worse versions of all the LLMs in some kind of study.

I think you may think passing the Turing test is more difficult and meaningful than it is. Computers have been able to pass the Turing test for longer than genAI has been around. Even Turing thought it wasn't a useful test in reality. He meant it as a thought experiment.

skybrian · 10mo ago

The problem with comparing against humans is which humans? It's a skill issue. You can test a chess bot against grandmasters or random undergrads, but you'll get different results.

The original Turing test is a social game, like the Mafia party game. It's not a game people try to play very often. It's unclear if any bot could win competing against skilled human opponents who have actually practiced and know some tricks for detecting bots.

everforward · 10mo ago

It depends on which version of the Turing test you use. That's largely true of the standard version, but the later version included the human player winning if they were incorrectly identified as a machine.

The game is much harder if the human player is trying to pretend to be a machine.

theptip · 10mo ago

I don’t think this is true. Before GPT-2 most people didn’t think the Turing test would be passed any time soon, it’s a quite new development.

I do agree (and I think there is a general consensus) that passing the Turing test is less meaningful than it may seem, it used to be considered an AGI-complete task and this is now clearly not the case.

But I think it’s important to get the attribution right, LLMs were the tech that unexpectedly passed the Turing test.

HarHarVeryFunny · 10mo ago

Having LLMs capable of generating text based on human training data obviously raises the bar for a text-only evaluation of "are you human?", but LLM output is still fairly easy to spot, and knowing what LLMs are capable of (sometimes superhuman), and not capable of, should make it fairly easy for a knowledgeable "Turing test administrator" to determine if they are dealing with an LLM or not.

It would be a bit more difficult if you were dealing with an LLM agent tasked with faking a Turing test as opposed to a naive LLM just responding as usual, but even there the LLM will reveal itself by the things that it plain can't do.

nerdix · 10mo ago

If you need a specialized skill set (deep knowledge of current LLM limitations) to distinguish between human and machine, then I would say the machine passes the Turing test.

sejje · 10mo ago

LLM output might be harder to spot when it's mostly commands to drive the browser.

I often interact with the web all day and don't write any text a human could evaluate.

svachalek · 10mo ago

Easy to spot assuming the LLM is not prompted to use a deliberately deceptive response style rather than its "friendly helpful AI assistant" persona. And even then, I've had lots of people swear to me that an emoji-laden, "not this, but that" bundle of fluff looks totally like it could have been written by a human.

ConceptJunkie · 10mo ago

ELIZA was passing the Turing test 50+ years ago. But it's still a valid concept, just not for evaluating some(thing/one) accessing your website.

xwolfi · 10mo ago

"Are you an LLM?" poof, fails the Turing test.

Even if they lie, you could ask them 20 times and they'd repeat the lie without feeling annoyed: FAIL.

LLMs cannot pass the Turing test; it's easy to see they're not human. They always enjoy questions! And they never ask any!

junon · 10mo ago

You're trained to look for LLM-like output. My 70-year-old mother is not. She thought cabbage tractor was real until I broke the news to her. It's not her fault either.

The Turing test wasn't meant to be bulletproof, or even quantifiable. It was a thought experiment.

Latty (OP) · 10mo ago

I guess that is where the disconnect is. The issue is that if they mean the trivial thing, then bringing it up as evidence for "it's impossible to solve the problem" doesn't work.

x187463 · 10mo ago · 2 in thread

Per the wiki article for Turing Test:

> In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart. The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human.

Based on this, I would agree with the OP in many contexts. So, yeah, 'basically' is a load-bearing word here, but it seems reasonably correct in the context of distinguishing human vs bot in any scalable and automated way.

dylan604 · 10mo ago

Or it could be a bad test evaluator. Just because one person was fooled does not mean the next will be too.

HarHarVeryFunny · 10mo ago

Judging a conversation transcript is a lot different from being able to interact with an entity yourself. Obviously one could make an LLM look human by having a conversation with it that deliberately stayed within what it was capable of, but judging such a transcript isn't what most people imagine as a Turing test.

AberrantJ · 10mo ago · 2 in thread

Here are three comments: two were written by a human and one by a bot. Can you tell which were human and which was a bot?

Didn’t realize plexiglass existed in the 1930s!

I'm certainly not a monetization expert. But don't most consumers recoil in horror at subscriptions? At least enough to offset the idea they can be used for everything?

Not sure why this isn’t getting more attention - super helpful and way better than I expected!

chrismorgan · 10mo ago

On such short samples: all three have been written by humans—or at least comments materially identical have been.

The third has also been written by many a bot for at least fifteen years.

AberrantJ · 10mo ago

If you're willing to say that a fifteen-year-old bot was "writing", then I think having a discussion on whether current "bots" pass the Turing test is sort of moot.

cm2012 · 10mo ago · 2 in thread

I have seen data from an AI call center that shows 70% of users never suspected they spoke to an AI

bluefirebrand · 10mo ago

Why would they? Humans running call centers have been running on less than GPT level scripts for ages

HarHarVeryFunny · 10mo ago

Isn't the idea of a Turing test whether someone (meaningfully knowledgeable about such things) can determine if they are talking to a machine, not whether the machine can fool some of the people some of the time? ELIZA passed the latter bar back in the 1960s ... a pretty low bar.

falcor84 · 10mo ago

As far as I understand, Turing himself did not specify a duration, but here's an example paper that ran a randomized study on (the old) GPT-4 with a 5-minute duration, and the AI passed with flying colors - https://arxiv.org/abs/2405.08007

From my experience, AI has significantly improved since, and I expect that ChatGPT o3 or Claude 4 Opus would pass a 30-minute test.

1una · 10mo ago

Well, LLMs do pass the Turing Test, sort of.

https://arxiv.org/abs/2503.23674

armchairhacker · 10mo ago

It can't mimic a human over the long term. It can solve a short, easy-for-human CAPTCHA.
