https://en.wikipedia.org/wiki/Propositional_calculus#Solvers
Solving a set of propositional logic statements is NP-Complete. I'd argue "reading comprehension" is actually knowing the state of the world after a piece of text, which requires solving how these predicates interact. For example, if the paragraph is
Bobby picked up the toy. Then he put down the toy.
This "semantic memory" does not "comprehend" where the toy is, and this is a relatively simple example. I think the title "basic reading comprehension" is thus inaccurate. Perhaps a better title is "A simple knowledge graph" or "A simple semantic memory".
The iron ball fell on the glass table, and it shattered.
The glass ball fell on the iron table, and it shattered.
Human readers will pick the correct antecedent for "it" in each case, so it's not ambiguous. But correct interpretation depends on knowing something about how likely glass and iron are to shatter.
I love these kinds of examples, but I don't think they ought to significantly deter the kind of work shown here; it depends on the application, but basic interpretation could be very useful even if it can't handle every case.
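To make the iron/glass example concrete, here is a toy Python sketch of picking the antecedent of "it" from background knowledge. The `FRAGILITY` scores and the `resolve_it` helper are made up for illustration, and the candidate noun phrases are assumed to come from a parser.

```python
# Toy sketch: resolve "it" in "The X ball fell on the Y table, and it shattered."
# by asking which candidate is more likely to shatter.
# The fragility scores below are invented assumptions, not measured data.

FRAGILITY = {"glass": 0.9, "iron": 0.05}


def resolve_it(candidates):
    """candidates: list of (noun_phrase, material) pairs.

    Return the noun phrase whose material is most likely to shatter.
    Unknown materials get a score of 0.0.
    """
    best = max(candidates, key=lambda c: FRAGILITY.get(c[1], 0.0))
    return best[0]


print(resolve_it([("the iron ball", "iron"), ("the glass table", "glass")]))
print(resolve_it([("the glass ball", "glass"), ("the iron table", "iron")]))
```

The hard part, of course, is where a table like `FRAGILITY` would come from for every verb and every material, which is the gap projects like Cyc try to fill.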
The way it was explained to me, computers don't know that when you turn a full cup of coffee upside down, the liquid will fall out. Cyc attempts to provide a framework for that.
When we read this sentence, our brains automatically add information based on the verb. In this example, however, my program will fail to answer because it does not add any such information, though it could be extended to do so.
This can be implemented in our program by creating a new property for each object called "location". If a verb is location-based, we can set the location of the object based on what the verb describes. For example, the location of "the toy" could be "Bobby's hands" after the first sentence, based on the verb phrase "pick up". The program would then know where the toy is and be able to answer "where" queries.
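A minimal sketch of the "location" idea in Python. This is not the code from the project; the `LOCATION_VERBS` table and `track_locations` function are hypothetical names, the sentences are assumed to be already parsed into (subject, verb, object) triples, and the "near Bobby" result for "put down" is just one possible guess.

```python
# Hypothetical sketch: update an object's "location" property from location-based verbs.
# The verb table is hand-written for illustration; a real system would need many more entries.

LOCATION_VERBS = {
    "pick up": lambda subject: f"{subject}'s hands",
    # Assumption: "put down" only tells us the object left the hands and is somewhere nearby.
    "put down": lambda subject: f"near {subject}",
}


def track_locations(events):
    """Return a dict mapping each object to its last known location.

    events: iterable of (subject, verb, object) triples in reading order.
    """
    locations = {}
    for subject, verb, obj in events:
        rule = LOCATION_VERBS.get(verb)
        if rule is not None:  # verbs that are not location-based are ignored
            locations[obj] = rule(subject)
    return locations


events = [
    ("Bobby", "pick up", "the toy"),
    ("Bobby", "put down", "the toy"),
]
print(track_locations(events[:1]))  # state after the first sentence
print(track_locations(events))      # state after both sentences
```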
As you can imagine, implementing this would be very tedious, since there are too many cases to cover for all the verbs. My program may not be able to do advanced reading comprehension (reading between the lines and adding information), but I argue that it can do simple reading comprehension, in that it can understand the relationships between objects. There is still a long way to go before my program is capable of more sophisticated reading comprehension, but I think my approach is workable in theory.
1. meaning is usage. 2. structure emerges from usage.
meaning we don't use grammar to produce language, it is an emergent property (irreducible) of "pre-linguistic pragmatics" ...
I may very well be wrong, and I'd love it if you showed me I am. Good luck. As I said, I really would be excited to see this go farther.
Take a look at text sentiment analyzers and processors that employ advanced machine learning algorithms like this too, http://52.10.12.34/TuataraSum
Since machine vision is such a fast moving field, I think date is relevant for understanding the context of this xkcd. Is image recognition of a bird so inconceivable at the moment? Perhaps the joke now would be - "I'll need one researcher and one year"
Your comment reminds me of this: http://www.uh.edu/engines/epi879.htm
Any successful implementation of comprehension must progressively enhance the world model based on additional information. Furthermore it must understand some basic rules, such as, "Any subject, set of subjects, or actions can be represented multiple ways."
So if you said, "Mary's brother is Sam," or "Mary has a brother named Sam," or "Mary's brother is named Sam," the world model must collocate the meanings "Sam" and "brother" for Mary and be able to respond to queries about either.
Further, if you mention that John also has a brother named Sam, and then you mention Sam in an ambiguous context, the program should be smart enough to ask, "Which Sam? Mary's brother or John's?" Infocom games did this; you are building a more flexible world model builder, but the parser would operate similarly.
There is also the concept of recency. If I talk about "John's brother Sam", ignoring the fact that pronoun references to "he" should be contextually mapped correctly, and then mention Sam, the program should not need to ask which Sam I mean. It would be like talking to someone who wasn't paying attention.
Finally, there is also the concept of confidence. In the face of ambiguity that can't be resolved, a confidence rating should be assigned based on available information and future answers should be based on that confidence.
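The points above about ambiguity, recency, and confidence can be sketched roughly as follows. Everything here is an assumption for illustration: the `WorldModel` class, its method names, and the 0.8 confidence value are invented, and the facts are assumed to be already extracted from the different phrasings ("Mary's brother is Sam", "Mary has a brother named Sam", and so on).

```python
# Hypothetical sketch of a world model with ambiguity, recency, and confidence.
# All names and the confidence numbers are invented for illustration.

class WorldModel:
    def __init__(self):
        self.facts = []           # (owner, relation, name) triples in the order mentioned
        self.last_mentioned = {}  # name -> most recently mentioned triple for that name

    def tell(self, owner, relation, name):
        """Record a fact, e.g. tell("Mary", "brother", "Sam")."""
        fact = (owner, relation, name)
        if fact not in self.facts:
            self.facts.append(fact)
        self.last_mentioned[name] = fact  # recency: remember the latest referent

    def forget_context(self):
        """Simulate a change of topic: recency information is dropped."""
        self.last_mentioned.clear()

    def resolve(self, name):
        """Return (fact, confidence), or (None, question) when clarification is needed."""
        candidates = [f for f in self.facts if f[2] == name]
        if not candidates:
            return None, f"Who is {name}?"
        if len(candidates) == 1:
            return candidates[0], 1.0
        recent = self.last_mentioned.get(name)
        if recent is not None:
            # Recency breaks the tie, but with less than full confidence.
            return recent, 0.8
        options = " or ".join(f"{owner}'s {rel}" for owner, rel, _ in candidates)
        return None, f"Which {name}? {options}?"


wm = WorldModel()
wm.tell("Mary", "brother", "Sam")
wm.tell("John", "brother", "Sam")
print(wm.resolve("Sam"))  # recency picks John's brother, with reduced confidence
wm.forget_context()
print(wm.resolve("Sam"))  # no recent mention, so the model asks which Sam
```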
I suspect that if someone were to create a language parser that could create a mostly-accurate world model AND modify itself based on new rules it read (e.g., "When someone says 'he' after referring to someone's name, they are almost certainly talking about the man they previously referred to"), you would be 90% of the way to creating a useful virtual intelligence. It would of course not be able to reason or have opinions of its own, but it would be extremely useful as a virtual assistant that could learn your preferences over time.
There has to be a hierarchy of world models, and only the bottom-most layer would be the "real" one. Any kind of planning will require simulating hypothetical worlds.
That should probably be the first thing ;-) Every word has a probable meaning, and when all of them fit together in a coherent way, including context, then the meaning is correct (probably). This also encompasses the use of pronouns, which you touch on. An inability to resolve ambiguity should also point to the appropriate question to ask for clarification, which you also mentioned. I think your points are all great; I just wanted to point out that I think the notion of confidence should be more central.
I'd also go as far as saying part of the internal model of the world should also have a confidence. Any failure to understand a sentence may actually be a problem with its world view. But now I'm rambling.
Yes, but that happens automatically by virtue of learning through natural language teaching. Put the VI through school.
Option 3: The only Sam, since Mary and John are brothers :)
I think it is too much to call this "capable of basic reading comprehension". Surely, "simple sentence parser can answer reading comprehension questions" would be more correct?
I mean it is quite an accomplishment, but there is no understanding here. In some natural languages with less inflection or change in word order, for example, you could answer any "why" questions with the regex /($question_string) because (.+?)\./ against some source corpus, and then $2 will contain your answer. It doesn't work with English due to slight changes in word order, but surely it would be too much to state that this regex is capable of basic reading comprehension in languages it does work in.... If I'm allowed to massage the question the way the author massaged the output, look at this fine result:
http://ideone.com/84QSHC (output at bottom)
Would you say that 14-line Perl program is capable of basic reading comprehension? I wouldn't!
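For readers who don't use Perl, here is a rough Python analogue of the regex trick described above. It is not the program behind the ideone link; the `answer_why` function and the sample corpus are made up for illustration, and the question is "massaged" so that its word order matches the source text.

```python
import re


def answer_why(question, corpus):
    """Answer "Why X?" by looking for "X because Y." in the corpus.

    Returns the text between "because" and the next period, or None.
    """
    # Strip the leading "why" and the trailing "?" to get the statement to search for.
    statement = re.sub(r"^why\s+", "", question.strip(), flags=re.IGNORECASE)
    statement = statement.rstrip("?").strip()
    pattern = re.escape(statement) + r" because (.+?)\."
    match = re.search(pattern, corpus, flags=re.IGNORECASE)
    return match.group(1) if match else None


corpus = "The picnic was cancelled because it rained all morning. Everyone went home."

# Massaged word order: matches the source text, so an answer is found.
print(answer_why("Why the picnic was cancelled?", corpus))

# Natural English word order: "was the picnic cancelled" never appears, so no answer.
print(answer_why("Why was the picnic cancelled?", corpus))
```

Like any plain regex search, `re.search` stops at the first match, so a corpus with several matching "because" clauses would only ever yield the first one.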
For some reason, the library I used returns "be" as the present tense of "was". I had a manual fix for this but accidentally removed it when I was cleaning up the code. Sorry if I disappointed you.
> I think it is too much to call this "capable of basic reading comprehension". Surely, "simple sentence parser can answer reading comprehension questions" would be more correct?
I would say it is capable of basic reading comprehension because it attempts to build relationships between different objects, although the relationships are weak. When humans do reading comprehension, we do what my program does, which is try to parse the sentence and understand the relationships. But brains are also able to add a lot more information and thus be more flexible in their understanding.
> Would you say that 14-line Perl program is capable of basic reading comprehension? I wouldn't!
I would say it is not, because it does not understand the relation between objects; it only understands that sentences starting with "why" should be answered with everything after the string "because". Also, in your program, if there are two instances of "because" in the source, your program will only choose the first one.
This line is arbitrary. These programs aren't "understanding" anything, they are just able to generate different facts. Your "AI" is very input-dependent too, which was his point.
I am still bothered by the title, as your program doesn't "comprehend" the text -- only you do that. Your program restructures the text and answers to a very limited querying mechanism; there's no comprehension, only information retrieval. They're not the same. Knowledge Graph and Freebase are comparable (/ much larger) efforts.
It's not disappointing, it's quite a feat. I suppose I would say that what I was trying to say is that "parsing", while impressive, is not a large part of "understanding" in my opinion. To use an analogy: almost by definition, compilers parse languages like C++ much better than humans do (basically perfectly, unless there's literally a bug in the compiler or it doesn't follow the standard due to some error).
But that doesn't mean they understand the programs (at all.) A compiler has no idea on an algorithmic level what a program might be doing (and if you remove comments, maybe a person won't understand it either, if they're not familiar with the algorithm.)
So my basic objection is that you're really calling this reading comprehension, but I don't think anything is actually being "understood"; just parsed. A better title would be as I suggested: simple AI correctly answers reading comprehension questions.
The reason that I object to "comprehension" is that these days there really are a few "deep learning" systems that can possibly synthesize information. (I don't know that much about them.) I don't think it's fair to elevate semantic parsing to the level of comprehension.
However the comment by tariqali34 makes a good point, that perhaps this is a criticism of reading comprehension tests. I know in multiple-choice tests from standardized exams, I've been able to correctly answer reading comprehension questions about texts that I didn't even read (by finding just the sentence that talks about it), or in other cases, texts that were too technical and that I didn't understand.
I would say that I would be able to answer a question about some biomedical excerpt that I can't understand a word of, I just can't make heads or tails of it, let's say:
In order to study the physiological roles of AGC kinases, a commonly used approach has been to over-express the active forms in cells. However, due to the overlapping substrate specificities of many AGC kinases, it is likely that the over-expression of one member of this kinase subfamily will result in the phosphorylation of substrates that are normally phosphorylated by another AGC kinase. Another strategy has been to over-express catalytically inactive ‘dominant negative’ mutants of AGC kinases in cells. However, such mutants are likely to interact with and inhibit the upstream protein kinase(s) that they are is activated by, and thus prevent the ‘upstream’ kinase(s) from phosphorylation of other cellular substrates. For example, a dominant negative RSK may interact with ERK1/ERK2 preventing the activation of MSK isoforms and hence the phosphorylation of CREB (cAMP-response-element-binding protein) [9]. Furthermore, in Saccharomyces cerevisiae, over-expression of catalytically inactive Rck2p, a kinase that binds to and is activated by the Hog1P MAPK, sequestered the substrate-docking site of the Hog1P kinase, thereby preventing Hog1P from interacting with other substrates. Thus catalytically inactive Rck2P is acting as a dominant negative mutant of Hog1P and not Rck2P.
I couldn't answer a REAL reading-comprehension test about this: I just have no idea what it's REALLY talking about, I don't actually understand it. (Obviously on a syntactic level, it's not hard to parse.) I don't know what over-expression is, I don't know what a kinase is, I don't know what phosphorylation is. I don't understand the text. But if the questions are simple, perhaps I could answer some reading comprehension questions about this by parroting back quotations from it. Syntactically, there's nothing difficult here. I just don't understand it.
So I think it's unfair to call sentence parsing real reading comprehension, even if sometimes reading comprehension tests fail to differentiate between the two. You can parse sentences perfectly while understanding nothing. For example, I could answer the question "what is wrong with studying the physiological role of AGC kinases by overexpressing the active forms in cells?" which the first sentence refers to. I can just quote the second sentence "Due to the overlapping substrate specificities of many AGC kinases, it is likely that the over-expression of one member of this kinase subfamily will result in the phosphorylation of substrates that are normally phosphorylated by another AGC kinase". I don't understand, but I think I correctly parroted.
So there are real issues in determining comprehension. The higher the level of the question that is asked, the harder it is to answer without actually understanding the text.
I did find your work very interesting, thank you.
We test whether a human has "basic reading comprehension" by asking them reading comprehension questions. If they answer the questions successfully, then we assume that they have this "reading comprehension" skill. Therefore, if an AI can answer these questions, then it must also have "reading comprehension".
Maybe these "reading comprehension questions" don't actually test reading comprehension, just pattern matching and sentence parsing. In which case, we shouldn't be asking these questions to anyone (human or AI). So what we need are new questions.
This is starkly different from a machine whose only ability is information retrieval -- this program (and more advanced versions, e.g. Freebase) parses and restructures text to be more retrievable. But there is no comprehension because there is no world model.
You wouldn't call parsing a google search query "reading comprehension."
Granted that it's a fundamentally more difficult problem, so the pattern matching will probably get there earlier.
My recommendation comes from a more informed place than "Dur, neural networks!" Of course, this is my opinion, and you are welcome to have a different one.
What is your recommendation on the topic of most promising research areas for teaching reading comprehension to computers? Skip deep learning and read what instead? Or the OP has it figured out?
OP - Instead of linking directly to en/Stanford Parser etc, you should get together a list of dependencies people need to run your application. Usually as easy as 'pip install pattern' (for the 'ImportError: No module named en') which is `import pattern.en` :-)
I like it! It's nearly 2am so better catch some sleep, but I'm definitely going to have a look further tomorrow.
Thanks!
> OP - Instead of linking directly to en/Stanford Parser etc, you should get together a list of dependencies people need to run your application. Usually as easy as 'pip install pattern' (for the 'ImportError: No module named en') which is `import pattern.en` :-)
I didn't actually pip install anything for my project, I just downloaded and extracted the Stanford Parser, and Nodebox Linguistics libraries. The setup should be in the readme. I'll try to see if I can find the pip dependencies and update the readme.
http://norvig.com/chomsky.html
The essential question is: Can it go from basic reading comprehension to advanced just by adding more rules? Is intelligence simply 10 million rules? If so, how do we go about creating new rules as language evolves? By hard-coding them, as in the example code?
The real test for general AI and NLP is how well it, well, generalizes; i.e. how well does it deal with situations we have not explicitly anticipated?
In my opinion, the fuzzy, statistical methods @davesullivan mentions have a better chance at generalizing (although they may well be augmented by rules-based AI).
If an AI doesn't have a good way of transferring what it knows to novel problems, then it is severely limited. It's treating the world like a canned problem with a finite number of possibilities, like chess or checkers, when in fact the world is much more complex.
The way DeepMind combines deep learning and reinforcement learning is one way of acknowledging that complexity.
Deep learning learns patterns in raw sensory data, which means it can ingest and handle the new. Reinforcement learning learns to perform actions over a series of unknown states, improving its choices by monitoring the rewards it receives for those actions. They both maximize within uncertainty, and I think that's our best bet going forward.
Because the world, and language, cannot be known in their entirety. The number, motion and interrelation of the atoms of air in the room where I'm typing this are all too large and complex to be computable. Their fluid dynamics can only be vaguely guessed at, not deterministically predicted in a few lines of code.
The trick will be to bridge the gap between the hard-coded, limited rules and the unlimited recombinations of language, which is inventing new rules and words all the time.
I believe that it can go from basic reading comprehension to more advanced by adding many rules, but of course manually adding them is not very feasible or scalable.
> In my opinion, the fuzzy, statistical methods @davesullivan mentions have a better chance at generalizing (although they may well be augmented by rules-based AI).
I agree that a statistical model would be better since it will be able to handle more complexity. It would be much easier to train the rules from a dataset instead of hard coding all of them and it would be able to adapt to new rules as well. However, I could not find a good data set for the task I wanted.
As others have mentioned, you will make progress more efficiently if you survey the linguistics literature, where a tremendous number of very smart people have spent decades grappling with the same essential problems.
Heterodox linguistics is a veritable goldmine of ideas that can be implemented in AI. My favorite approach is Richard Hudson's "word grammar," which you can read about here: http://www.phon.ucl.ac.uk/home/dick/wg.htm
Word grammar is particularly well suited for coding, because it strips away lots of arbitrary linguistic formalisms in favor of a flexible, network-centric framework. Some of the core principles, like default inheritance, were actually taken directly from computer science.
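To give a flavor of default inheritance for anyone who hasn't met it, here is a tiny Python sketch. It is not Hudson's formalism; the `Node` class and the bird/penguin network are invented for illustration. A concept inherits properties from its parent by default, and a more specific concept can override them.

```python
# Hypothetical sketch of default inheritance in an "is-a" network.
# Node names and properties are invented; this is not word grammar itself.

class Node:
    def __init__(self, name, isa=None, **properties):
        self.name = name
        self.isa = isa                # parent concept, or None at the top of the network
        self.properties = properties  # local properties override inherited defaults

    def get(self, key):
        """Look up a property locally first, then walk up the is-a chain."""
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.isa
        raise KeyError(key)


bird = Node("bird", flies=True, has_feathers=True)  # defaults for all birds
robin = Node("robin", isa=bird)                     # inherits every default
penguin = Node("penguin", isa=bird, flies=False)    # overrides one default

print(robin.get("flies"))           # inherited from bird
print(penguin.get("flies"))         # the local exception wins
print(penguin.get("has_feathers"))  # still inherited from bird
```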
https://en.wikipedia.org/wiki/Sentence_diagram
I don't mean that in a dismissive way - even if it was a reinvention of the wheel it's still an elegant and useful one. It would be interesting to work up to larger chunks of text, and also to encapsulate ambiguities in some way, such that if presented with a sentence that admits of two meanings the program could honestly say 'I don't know, tell me more.'
which is fine ... just different results ...
specifically:
1. meaning is usage. 2. structure emerges from usage.
this post and many of the comments have a world view akin to:
1. meaning is structure. 2. usage emerges from structure.
what i mean by this is that the "analysis" of the text doesn't exist in the same "world" as the text.
meaning that it's nothing like "real" (natural) language.
what i mean further by this is simple:
humans don't "use" natural language... it is an emergent property of other systems of externalized behaviors and such by individual humans.
and such emergent properties and systems are also evident in any development of this actual system and many of the comments here.
producing the comprehension introduces in-comprehensible things ... or at best just divorces systems (discontinuous) ... at which point any thing can be any thing, so debating it as such here doesn't even matter (value) .
im not exactly sure what im saying here but it is akin to:
1. nth-order cybernetics (mostly like 3rd and 4th and such) 2. autopoiesis (humberto maturana and francisco varela)
im going through the same type of analysis regarding "activity stream" type APIs which use an "actor verb object" type form ... (usage from structure) ...
Of course it's going to be challenging, but who cares? You are going to learn a lot of things even if this attempt fails, and you are going to allow other people to stand on your shoulders.