Better HN

0 points · Veedrac · 6y ago

> I think continual Turing testing is the only way of concluding whether an agent exhibits intelligence or not.

So you think it's impossible to ever determine that a chimpanzee, or even a feral child, exhibits intelligence? This seems rather defeatist.


3 comments · 1 top-level

Alas1 · 6y ago · 2 in thread

No. Interpreting "continual" the way you did would mean that we can't conclude our friends are intelligent either, and I don't believe that. Maybe I should've said "prolonged" rather than "continual".

Let me elaborate on my previous point with an example. In recent machine translation work, the commonly used evaluation metric, BLEU, is improved upon at least every few months. What I argue is that it's stupid to look at this trend and conclude that we will soon reach human performance in machine translation. Even comparing against human translation quality (again judged by BLEU on a fixed evaluation set) and showing that a system achieves a higher BLEU score than humans is not enough evidence, because you also have Google Translate (let's say it represents the state of the art), and you can easily get it to make mistakes that no human would make. I consider our prolonged interaction with Google Translate to be a narrow Turing test that we continually apply to it. A major issue in research is that, at least in supervised learning, we evaluate on datasets that are not different enough from the training sets.
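To make the BLEU point concrete: BLEU scores a candidate translation by its clipped n-gram overlap with a reference translation, so a single word that inverts the meaning only costs a few n-grams. Below is a minimal sketch, assuming an unsmoothed sentence-level variant (single reference, whitespace tokenization, uniform weights up to 4-grams, standard brevity penalty). The helper names `ngrams` and `sentence_bleu` and the example sentences are hypothetical; published results normally report corpus-level BLEU with standardized tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Hypothetical helper: Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Hypothetical helper: unsmoothed sentence-level BLEU against one reference.

    Uses clipped (modified) n-gram precision for n = 1..max_n, a uniform
    geometric mean of those precisions, and the standard brevity penalty.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: only candidates no longer than the reference are penalized.
    c, r = len(cand), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the committee approved the new budget for the school district on friday"
flipped = "the committee rejected the new budget for the school district on friday"

score = sentence_bleu(flipped, reference)
print(f"{score:.3f}")  # prints 0.769
```

Swapping "approved" for "rejected" reverses the sentence's meaning, yet the score stays around 0.77 out of 1.0: the kind of gap between a fixed metric and human judgment described above.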

Another subtle point is that we have strong priors about the intelligence of biological beings. I don't feel the need to Turing test every single human I meet to determine whether they are intelligent; at this point it's a safe bet to just assume that they are. The output of a machine learning algorithm, on the other hand, is wildly unstable with respect to its input, and we have no solid evidence that it exhibits consistent intelligent behavior. Often it is easy to show that it doesn't.

I don't believe that research in AI is worthless, but I think it's unwise to keep digging in the same direction we've been moving in for the past few years. With deep learning, accuracies and other metrics are pushed further than before, but I don't think we're significantly closer to general, human-like AI. In fact, I personally consider only AlphaZero to be an unambiguous win for this era of AI research, and it's not even clear whether it should be called AI.

Veedrac (OP) · 6y ago

My comment was not on ‘continual’ but on ‘Turing test’.

If you gave 100 chimps of the highest calibre 100 attempts each, not a single one would pass a single Turing test. Ask a feral child to translate even the most basic children's book, and their mistakes will be so systematic that Google Translate will look like professional discourse. ‘Humanlike mistakes’ and stability with respect to input, in the sense you mean here, are harder problems than intelligence, because a chimp is intelligent yet functionally incapable of juggling more than the most primitive syntaxes in a restricted set of forms.

I agree it is foolish to just draw a trend line through a single weak measure and extrapolate to infinity, but the idea that no collation of weak measures has any bearing on fact rules out ever measuring weak or untrained intelligence. That is what I called defeatist.

Alas1 · 6y ago

I see your point, but you're simply contesting the definition of intelligence that I assumed we were operating with, which is humanlike intelligence. Regardless of its extent, I think we would agree that intelligent behavior is consistent. My main point is that the way we currently evaluate artificial agents does not emphasize their inconsistency.

Wikipedia defines the Turing test as "a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human". If we want to consider chimps intelligent, then in that context the definition of the Turing test should be adjusted accordingly. My point still stands: if we want to determine whether a chimp exhibits intelligence comparable to a human's, we do the original Turing test. If we want to determine whether a chimp exhibits chimplike intelligence, we test not for, say, natural language but for whatever we want our definition of intelligence to include. If we want to determine whether an artificial agent has chimplike intelligence, we do the second, adjusted test. Unless the agent can display intelligence as consistent as a chimp's, we shouldn't conclude that it's intelligent.

Regarding your point on weak measures: if I can find an endless stream of failures on a measure that we care about improving, then whatever collation of weak measures we had should count for nothing. Wouldn't you agree? I'm not against using weak measures to detect intelligence, but only as long as it's not trivial to generate failures. If a chimp displays an ability for abstract reasoning when I'm observing it in a cage but suddenly loses this ability once set free in a forest, it's not intelligent.
