The best, in fact the only way to generate truly convincing text output on most subjects is to understand, on some level, what you're writing about. In other words, to create a higher level abstraction than simply "statistically speaking, this word seems to follow that one". Once you start to encode that words map to concepts, you can use the resulting conceptual model to create output which is conceptually consistent, then map it backwards to words. This is what humans do with sensory data, and there is good evidence that GPT-3 is doing this too, to some degree.
Take simple arithmetic, such as adding two and three digit numbers. GPT-2 could not do this very successfully. It did indeed look like it was treating it as a "find the textual pattern" problem.
But GPT-3 is much more successful, including at giving correct answers to arithmetic problems that weren't in its training set.
So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested doing so.
If this is the case, and it remains speculation at this point, would you still argue that GPT-3 doesn't "understand" arithmetic, on some level? I would argue that this abstraction, this mapping of words onto higher-level concepts, which can then be manipulated to solve more complex problems, is exactly what intelligence is, once you strip away biologically-biased assumptions.
Certainly, at this point GPT-3's conceptual understanding remains somewhat primitive and unstable, but the fact that it exhibits it at all, and sometimes in spookily impressive ways, is what has people excited and worried. We have produced AIs that can perhaps think conceptually about relatively narrow topics like playing Go, but we have never before created one that can do so on such a wide range of topics. And there is no suggestion that GPT-3's level of ability represents a maximum. GPT-4 and beyond will be more powerful, meaning that they can mine more and more powerful conceptual understanding from their training data.
I don't mean to attack you personally, but this is a perfect example of what I feel is wrong with so much neural network research. (And I understand that you are just commenting in a discussion, not conducting research.)
In a word, it's baloney. And it's a really common pattern in neural networks' recent history: "How did they perform reasonably well on this task? We aren't sure, but the speculation is that they magically solved artificial general intelligence under the hood." Usually this is followed up by "I don't know how it works, but let's see if a bigger network can make even prettier text." Meanwhile, "it's funny how our image classifiers grossly misperform if you rotate the images a little or add some noise."
A rigorous scientific approach would be aimed at actually figuring out what these models can do, why, and how they work. Rather than just assuming the most optimistic possible explanation for what's happening -- that's antithetical to science.
This is where you lost me. They included important caveats to indicate not being sure, which is important to me as an indication of healthy skepticism. And you substituted a specific example, making inferences about arithmetic, for a more expansive, uncharitable, easy-to-caricature claim of "gee we must have solved general AI!" which is much easier to attack. And, unlike your counterpart, who hedged, you just went ahead and categorically declared it to be baloney, making you the only person to take a definitive side on an unsettled question before the data is in. This is a perfect example of the anti-scientific attitude exhibited in Overconfident Pessimism [0].
I don't think it's known how GPT-3 got so much better at answering math questions it wasn't trained on, I do think the explanation that it made inferences about arithmetic is reasonable, I think the commenter added all the qualifiers you could reasonably ask them to make before suggesting the idea, and frankly I would disagree that there's some sort of obvious history of parallels that GPT-3 can be compared to.
There is an interesting conversation to be had here, and there probably is much more to learn about why GPT-3 probably isn't quite as advanced as it may immediately appear to be to those who want to believe in it. But I think a huge wrench is thrown in that whole conversation with the total lack of humility required to confidently declare it 'baloney', which is the thing that sticks out to me as antithetical to science.
0: https://www.lesswrong.com/posts/gvdYK8sEFqHqHLRqN/overconfid...
As a side note, it's worth mentioning that apparently, from other responses, it seems we have little idea how much arithmetic GPT-3 has learned, and it may not be much.
Anyway, I think the important distinction between my perspective and Overconfident Pessimism, which you attribute to me, is that I'm not talking about (im)possibility of achievement, I'm talking about scientific methodology or lack thereof.
In other words, I'm not saying (here) that some NLP achievements are impossible. I'm saying that we are not rigorously testing, measuring, and verifying what we are even achieving. Instead we throw out superficially impressive examples of results and invite, or provoke, speculation about how much achievement probably must have maybe happened somewhere in order to produce them.
We have seen several years of this pattern, so this is not a GPT-3 specific criticism; it's just that particular quote so neatly captured patterns of lack of scientific rigour that we have seen repeatedly at this point.
Probably the first example was image recognition. Everyone was amazed by how well neural nets could classify images. There was a ton of analogous speculation -- along the lines of 'we're not sure, but the speculation is the networks figured out what it really means to be a panda or a stop sign and encoded it in their weights.' The terms "near-human performance" and then "human-level performance" were thrown around a lot.
Then we found adversarial examples and realized that e.g. if you rotate the turtle image slightly, the model becomes extremely confident that it's a rifle. So, obviously it has no understanding of what a turtle or a rifle is. And obviously, we as researchers don't understand what those neural nets were doing under the hood, and that speculation was extremely over-optimistic.
Engineering cool things can absolutely be a part of a scientific process. But we have seen countless repetitions of this pattern (especially since GANs): press releases and impressive-looking examples without rigorous evaluation of what the models are doing or how; invitations to speculate on the best-possible interpretation; and announcing that the next step is to make it bigger. I think this approach is both anti-science and misleading to readers.
Layperson here, but my impression is that "let's see if a bigger network can make even prettier text" has _worked_ far beyond the point most people expected it would stop working.
Also my layperson impression: most "researchers" that are on the cutting edge of cool things are more interested in seeing what cool things they can do than on doing rigorous science (which makes sense -- if you optimize for rigorous science, your stuff probably isn't as flashy as the stuff produced by people optimizing for flash).
Is this a new iteration on that zigzag quote?
> Zak phases of the bulk bands and the winding number associated with the bulk Hamiltonian, and verified it through four typical ribbon boundaries, i.e. zigzag, bearded zigzag, armchair, and bearded armchair.
From "The existence of topological edge states in honeycomb plasmonic lattices"
https://iopscience.iop.org/article/10.1088/1367-2630/18/10/1...
It's similar to computer systems research. For example, a research paper on filesystems might tell us a simple trick which leads to better performance on NVMM. The paper may go into why the trick works, but it doesn't (and shouldn't need to) generalize and try to improve our general understanding of how to design filesystems on different hardware. We've been designing filesystems to this day and well, we are always still guessing about which approaches to use and hoping for the best. In the same vein, we don't even have a widely-accepted theory of how to use data structures yet.
So, I don't think that neural nets not being scientific enough means that it's all BS. We have gaps in understanding, but the power of the models warrants a lot of continued work on finding useful applications.
Doesn't mean I don't think AI is over-hyped/overfunded though...
For example, people once thought playing chess was hard. So they thought that if a computer could beat the world champion, then computers would probably also be able to replace every job and so on. If you sent Deep Blue back in time to the 1960s, they wouldn't understand how it works so they'd probably assume that since it could beat Petrosian in chess, it could probably drive cars and treat disease.
But then we built Deep Blue and realized that you don't need AGI to play chess; a very specialized algorithm will do it.
So we're like people in the 70s who've been handed Deep Blue. It's irresponsible, in my opinion, to over-hype it when we have no idea how it works.
The same thing arguably happens with humans and rotation. Our eyes even rotate in the roll axis to keep gravity-aligned things upright. Most people can draw faces more accurately when copying from an upside-down face than from a right-side-up one.
Of course if you accept the required premise of the argument, you must accept that either, one, we don't live in a universe that is a pure system of deterministic rules, or, two, nothing in the universe can have true understanding.
The Chinese Room argument, scientific materialism, or the existence of true understanding—you can have at most two of those in a consistent view of the universe.
To your point though, the more interesting case is people who would disavow the Chinese Room argument, but then end up reflecting its views while arguing against the intelligence of this or that system.
It can be read in its entirety at the author's site: https://rifters.com/real/Blindsight.htm
It's possible to train GPT3 to produce a facsimile of these transmissions, but doing so does not let us learn anything at all about these aliens, beyond statistical correlations like ⊑⏃⟒⍀ often occurring in close proximity to ⋏⟒⍙⌇ (what do they represent - who knows?). Just having the text is not enough, because we have no understanding of the underlying processes that produced the text.
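For what it's worth, the "statistical correlations" part is easy to make concrete. Below is a minimal sketch, not anything GPT-3 actually does internally: it just counts how often two tokens land within a few positions of each other. The function name, the window size, and the made-up transmission are all assumptions for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered pairs of distinct tokens that occur within
    `window` positions of each other."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only at the next `window` tokens so each pair is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[frozenset((left, right))] += 1
    return counts


# Invented transmission; the glyph groups are placeholders with no known meaning.
transmission = "⊑⏃⟒⍀ ⋏⟒⍙⌇ ⌰⟟⎎⟒ ⊑⏃⟒⍀ ⋏⟒⍙⌇ ⍜⎎ ⌰⟟⎎⟒".split()

for pair, n in cooccurrence_counts(transmission).most_common(3):
    print(sorted(pair), n)
```

The counts tell you which symbols travel together and nothing about what either of them refers to, which is the limitation being described.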
That said, this is only a limitation of language models as they currently exist. I imagine it would be possible to train an ML model that encodes more of the human experience via video/audio/proprioception data.
The intuition behind this idea is that the structure inherent in a language is dependent upon features of the world being described by that language to some degree. If we can abstract out the details of the language and get at the underlying structure the language is describing, then this latent structure should be language-independent. But then translation turns out to simply be a matter of decoding and encoding a language to this latent structure. One limitation of this idea is that it depends on there being some shared structure that underlies the languages we're attempting to model and translate. It's easy to imagine this constraint holds in the real world as human contexts are very similar regardless of language spoken. The basic units and concepts that feature in our lives are more-or-less universally shared and so this shared structure provides a meaningful pathway to translation. We might even expect the world of intelligent aliens to share enough latent structure from which to build a translation given enough source text. The laws of physics and mathematics are universal after all.
Take the sentence 'I fooed a bar with a Baz' - can you infer what I did from this?
The more imminent question is more of engineering than philosophy - what does it take for GPT-3 to not make the mistakes it does? This would require it to have some internal model for why humans generate text (persuasion, entertainment, etc.) as well as the social context in which that human generated the text. On a lower level it also needs to know about cognitive shortcuts that humans take for granted (object permanence, gravity).
Basically, some degree of human subjective experience must be encoded and fed to the model. That's a difficult problem, but not an intractable one.
I suspect this is how many humans do arithmetic (especially considering how many people conflate numbers with their representation as digits). So if GPT-3 is doing that, that's pretty impressive.
Hypothesis: <AI writes this>
Results: <human observations>
<repeat>
The novel Manna explores where this can lead quite nicely - http://www.marshallbrain.com/manna1.htm
I saw a lot of basic arithmetic in the thousands range where it failed. If we have to keep scaling it quadratically for it to learn log n scale arithmetic then we're doing it wrong.
I'm surprised you think it learned some basic rules around arithmetic. A lot of simple rules extrapolate very well, into all number ranges. To me it seems like it's just making things up as it goes along. I'll grant you this though, it can make for a convincing illusion at times.
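To illustrate what "simple rules extrapolate very well" means here: a minimal sketch of the schoolbook carry rule, written over digit strings. This says nothing about what GPT-3 does or doesn't encode; it only shows that the rule itself has no notion of operand length. The function name is made up for the example.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings,
    column by column from the right, carrying as you would on paper."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        # divmod splits the column total into the carry and the digit to write down.
        carry, digit = divmod(digit_a + digit_b + carry, 10)
        result.append(str(digit))
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"


# The same rule handles 2-digit and 40-digit operands alike.
print(add_decimal_strings("57", "68"))
print(add_decimal_strings("9" * 40, "1"))
```

Something that had really picked up this rule would not be expected to fall apart between three and five digits.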
Oh, aren’t we all?
I strongly disagree. GPT-3 has 100% accuracy on 2-digit addition, 80% on 3-digit addition, 25% on 4-digit addition and 9% on 5-digit addition. If it could indeed "understand arithmetic" the increase in number of digits should not affect its accuracy.
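A rough sketch of how one could measure this kind of per-digit accuracy curve. Everything here is an assumption for illustration: `answer_fn` stands in for whatever model you are querying, the prompt wording is invented, and the two toy "models" exist only to show what the extremes look like (one applies the rule, one looks up a memorised table of small sums).

```python
import random


def accuracy_by_digits(answer_fn, digit_counts=(2, 3, 4, 5), trials=200, seed=0):
    """Exact-match accuracy of `answer_fn` on random n-digit addition prompts."""
    rng = random.Random(seed)
    results = {}
    for n in digit_counts:
        low, high = 10 ** (n - 1), 10 ** n - 1
        correct = 0
        for _ in range(trials):
            a, b = rng.randint(low, high), rng.randint(low, high)
            prompt = f"What is {a} plus {b}?"
            if answer_fn(prompt).strip() == str(a + b):
                correct += 1
        results[n] = correct / trials
    return results


def rule_follower(prompt):
    """Toy stand-in that parses the prompt and actually adds."""
    words = prompt.rstrip("?").split()
    return str(int(words[2]) + int(words[4]))


def make_memorizer(max_operand=99):
    """Toy stand-in that only knows sums it has 'seen' (operands up to max_operand)."""
    table = {(a, b): str(a + b)
             for a in range(max_operand + 1)
             for b in range(max_operand + 1)}

    def memorizer(prompt):
        words = prompt.rstrip("?").split()
        return table.get((int(words[2]), int(words[4])), "")

    return memorizer


print(accuracy_by_digits(rule_follower))
print(accuracy_by_digits(make_memorizer()))
```

The rule follower is flat across digit counts and the pure lookup table drops to zero as soon as it leaves its table. A curve that decays gradually, like the one quoted above, sits somewhere in between, which is why people read it so differently.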
My perspective as an ML practitioner is that the cool part of GPT-3 is storing information effectively and it is able to decode queries easier than before to get the information that is required. Yet with things like arithmetic, the most efficient way would be to understand the rules of addition but the internal structure is too rigid to encode those rules atm.
I suspect that if it did that, it would be able to write a very convincing fake paper about how it designed and tested an Alcubierre drive, and that the main clue about the paper being fake would be a sentence such as “we dismantled Jupiter for use as a radiation shield against the issue raised by McMonigal et al, 2012”.
Or, to put it another way, the hardest of hard SciFi, but still SciFi, not science.
Imitating existing texts better is not conceptual understanding.
"Understanding" means you can explain why you made a decision. It means there exists a model with conceptual entities that you can access and make available to others.
What GPT-3 does is this: "I am given many answers to similar questions, and I build up a huge model that reflects these answers. If I'm given a new question, I come up with a response that's probably right, based on the previous answers, but there's no explanation possible."
Don't get me wrong - it's amazing! But it's not understanding anything yet.
Even humans have skills that we know but do not understand - like "walking" for most of us!
But on abstract questions, we almost always have access to a complete set of reasons. "Why did you go back to the store?" "I left my bag there." "Why did you talk to that man?" "I know he's the manager, I'm a regular." "Why were you happy?" "I had my bag."
(Indeed, this is so common that people often "backdate" reasons for actions that didn't really have any reason at the time. But I digress.)
That's not exactly what the GPT-3 paper [1] claims. The paper claims that a search of the training dataset for instances of, very specifically, three-digit addition, returned no matches. That doesn't mean there weren't any instances, it only means the search didn't find any. It also doesn't say anything about the existence of instances of other arithmetic operations in GPT-3's training set (and the absence of "spot checks" for such instances of other operations suggests they were, actually, found but not reported, in the time-honoured fashion of not reporting negative results). So at best we can conclude that GPT-3 gave correct answers to three-digit addition problems that weren't in its training set, and then again, only for the 2000 or so problems that were specifically searched for.
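To show why "the search didn't find any" is weaker than "there weren't any", here is a minimal sketch of what such a spot check might look like. The two patterns are my own assumptions about plausible phrasings, not a reproduction of the paper's actual procedure, and the corpus string is invented.

```python
import re


def find_addition_mentions(corpus: str, a: int, b: int):
    """Return literal matches for 'a + b' or 'a plus b' in the corpus text."""
    patterns = [
        # Lookarounds stop 123 from matching inside 1123 or 1234.
        rf"(?<!\d){a}\s*\+\s*{b}(?!\d)",
        rf"(?<!\d){a}\s+plus\s+{b}(?!\d)",
    ]
    return [m.group(0)
            for pattern in patterns
            for m in re.finditer(pattern, corpus, flags=re.IGNORECASE)]


corpus = (
    "We know 123 + 456 = 579, and 123 plus 456 as well. "
    "The sum of 321 and 654 is 975."
)

print(find_addition_mentions(corpus, 123, 456))  # finds both phrasings
print(find_addition_mentions(corpus, 321, 654))  # empty, though the sum is in the text
```

The second query comes back empty even though the answer is sitting right there in the corpus, just phrased differently. Any fixed set of patterns has this blind spot.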
In general, the paper tested GPT-3's arithmetic abilities with addition and subtraction between one to five digit numbers and multiplication between two-digit numbers. They also tested a composite task of one-digit expressions, e.g. "6+(4*8)" etc. No division was attempted at all (or no results were reported).
Of the attempted tasks, all other than addition and subtraction of one- to three-digit numbers had accuracy below 20%.
In other words, the only tasks that were at all successful were exactly those tasks that were the most likely to be found in a corpus of text, rather than a corpus of arithmetic expressions. The results indicate that GPT-3 cannot "perform arithmetic" despite the paper's claims to the contrary. They are precisely the results one should expect to see if GPT-3 was simply memorising examples of arithmetic in its training corpus.
>> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy for correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested doing so.
There is no reason why a language model should be able to "figure out the rules of basic arithmetic" so this "speculation" is tantamount to invoking magick.
Additionally, language models and neural networks in general are not capable of representing the rules of arithmetic because they are incapable of representing recursion and universally quantified variables, both of which are necessary to express the rules of arithmetic.
In any case, if GPT-3 had "figure(d) out the rules of basic arithmetic", why stop at addition, subtraction and multiplication between one to five digit numbers? Why was it not able to use those learned rules to perform the same operations with more digits? Why was it not capable of performing division (i.e. the opposite of multiplication)? A very simple answer is: GPT-3 did not learn the rules of arithmetic.
_________