What Happened to Old School NLP?

(languagengine.co)

91 points · psygnisfive · 11y ago · 47 comments


35 comments · 9 top-level

lazzlazzlazz · 11y ago · 6 in thread

NetBase (http://www.netbase.com) uses this kind of "old-school" NLP (with a large team of full-time linguistics PhDs) augmented with statistical tools and increasingly sophisticated forms of automation.

The end product is more accurate and quicker to adapt than the industry is used to.

Disclosure: I work in the engineering team at NetBase.

microtonal · 11y ago

Unfortunately, the website has marketing all over it. The only thing I could find on Google Scholar was about how NetBase is doing parsing of Chinese, based on phrase chunking and then extracting dependency relations using phrase chunks.

I wonder if this is true for all the languages, since this is usually considered shallow parsing (and not deep parsing, a word that seems to be in the white papers). Constructing an 'old-school' grammar for deep syntactic analysis (I am thinking HPSG, CG, or LFG-style here) of 42 languages is a very tedious task if you want to get any decent coverage.

lazzlazzlazz · 11y ago

Not all languages receive the "deep" treatment. Eight, going on nine, have full-time linguists working on developing grammars.

thomasfl · 11y ago

Cool! The website says NetBase supports 42 different languages, but not which languages.

psygnisfive (OP) · 11y ago

to find out which 42, we'll need to build a computer the size of a planet

psygnisfive (OP) · 11y ago

awesome! what sorts of things does NetBase do with old school NLP?

lazzlazzlazz · 11y ago

Social media analysis of many kinds is our full-time focus right now. It's a rapidly changing field. The demands of customers shift as they become more educated about what to expect from social media and how to push the boundaries of technology.

microtonal · 11y ago · 5 in thread

> We'll start to see the re-emergence of tools from old-school NLP, but now augmented with the powerful statistical tools and data-oriented automation of new-school NLP. IBM's Watson already does this to some extent.

This is not a new trend. As early as 1997, Steven Abney augmented [1] attribute-value grammars with discriminative modelling (maximum entropy models, in this case) to form 'stochastic attribute-value grammars'. There is a lot of work on efficiently extracting the best parse from packed forests, etc. Most systems that rely on unification grammars (e.g. HPSG grammars) already use stochastic models.

In the early to mid 2000s, when modelling association strengths from structured or unstructured text became popular, old-school parsers began adopting such techniques to learn selectional preferences that cannot be learnt from the usually small hand-annotated treebanks. E.g. in languages that normally have SVO (subject-verb-object) order for main clauses but also permit OVS order, parsers trained on small hand-annotated treebanks would often be set on the wrong path when the direct object is fronted (analyzing the direct object as the subject). Techniques from association strength modelling were used to learn selectional preferences such as 'bread is usually the object of eat' from automatically annotated text [2].
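A minimal sketch of the idea, assuming pointwise mutual information (PMI) as the association measure, which is only one of several possible choices and not necessarily what [2] uses. The triples, `pmi_table`, and `preferred_relation` are hypothetical names and data for illustration; in practice the triples would come from a large automatically parsed corpus.

```python
import math
from collections import Counter

# Hypothetical (verb, relation, noun) triples, standing in for the output of
# an automatic parser run over a large corpus.
triples = [
    ("eat", "obj", "bread"), ("eat", "obj", "bread"), ("eat", "obj", "apple"),
    ("eat", "subj", "man"), ("eat", "subj", "pig"), ("eat", "obj", "pig"),
    ("bake", "obj", "bread"), ("bake", "subj", "man"),
]

def pmi_table(triples):
    """Pointwise mutual information between a (verb, relation) slot and a noun."""
    total = len(triples)
    joint = Counter(triples)
    slot = Counter((v, r) for v, r, _ in triples)
    noun = Counter(n for _, _, n in triples)
    return {
        (v, r, n): math.log2((c / total) / ((slot[(v, r)] / total) * (noun[n] / total)))
        for (v, r, n), c in joint.items()
    }

def preferred_relation(pmi, verb, noun, relations=("subj", "obj")):
    """Pick the relation with the highest PMI; unseen triples score -inf."""
    return max(relations, key=lambda r: pmi.get((verb, r, noun), float("-inf")))

pmi = pmi_table(triples)
print(preferred_relation(pmi, "eat", "bread"))
```

A parser could add such a score as one feature among many when deciding whether a fronted noun phrase is the subject or the object, rather than using it as a hard rule.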

In recent years, learning word vector representations using neural networks has become popular. Again, not surprisingly, people have been integrating vectors as features in the disambiguation components of old-school NLP parsers. In some cases with great success.

tl;dr, the flow of ideas and tools from new-school NLP to old-school NLP has been going on ever since the statistical NLP revolution started.

[1] http://ucrel.lancs.ac.uk/acl/J/J97/J97-4005.pdf

[2] http://www.let.rug.nl/vannoord/papers/iwptbook.pdf

SixSigma · 11y ago

The problem is slippery though; people always reuse words to express different ideas:

"We can't buy any bread because we haven't got any bread."

And it's not just English. In Chinese one is taught that "ni hao ma?" is the greeting equivalent to "hello, how are you" but try it on a Chinese person and it amuses them. My Chinese friend at Uni says that Chinese people use "Ni chi ma?" which is literally "have you eaten?" (although we both appreciate that is a bit of a generalisation for 1bn people).

microtonal · 11y ago

> The problem is slippery though, people always reuse words to express different ideas :
> "We can't buy any bread because we haven't got any bread."

As far as syntactic parsing is concerned, this is not a problem as long as the attachment is the same. In both cases 'bread' is the direct object of the main verb.

Of course, there are cases where a particular word can be used both as a direct object and a subject of a particular verb. E.g.:

The man ate the pig.

The pig ate the apple.

Of course, what such systems are learning are not rules, but probability distributions that combine information about the distributions of word orders, association strengths between heads and dependents with a particular dependency relation, configurations of dependent pairs, etc.
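As a rough illustration of what "combining information into a probability distribution" can look like, here is a sketch of a log-linear (maximum entropy style) model over candidate parses. The feature names and weights are invented for the example; a real system would estimate the weights from a treebank and use far more features.

```python
import math

# Hypothetical feature weights; in a real system these would be estimated
# from a treebank (for example by maximum entropy training).
WEIGHTS = {
    "order:SVO": 1.2,
    "order:OVS": -0.4,
    "assoc:eat-obj-apple": 1.5,
    "assoc:eat-subj-apple": -2.0,
    "assoc:eat-subj-pig": 0.8,
    "assoc:eat-obj-pig": 0.3,
}

def score(features):
    """Sum the weights of the active features; unknown features count as 0."""
    return sum(WEIGHTS.get(f, 0.0) for f in features)

def parse_distribution(candidates):
    """Softmax over candidate parses: p(parse) is proportional to exp(score)."""
    scores = {name: score(feats) for name, feats in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / z for name, s in scores.items()}

# Two readings of a sentence with "pig" before the verb and "apple" after it.
candidates = {
    "pig=subj, apple=obj": ["order:SVO", "assoc:eat-subj-pig", "assoc:eat-obj-apple"],
    "pig=obj, apple=subj": ["order:OVS", "assoc:eat-obj-pig", "assoc:eat-subj-apple"],
}
dist = parse_distribution(candidates)
best = max(dist, key=dist.get)
print(best, dist[best])
```

No single feature decides the outcome: word order and association strength each contribute to the score, and the model prefers the reading with the highest total.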

cynicalkane · 11y ago

The Taiwanese say "ni hao ma". That's probably where pedagogy got it from.

psygnisfive (OP) · 11y ago

Yep, it's certainly true that these techniques never went away fully, but they've been eclipsed by the simpler techniques. Academic work, especially under the guise of "computational linguistics" as opposed to "NLP", works a lot on this. But the public face of NLP has been pure new-school stuff for a while now.

I was more trying to just talk about why the richly structured old school fell from grace, not claim that it's dead and buried.

microtonal · 11y ago

The rest of the analysis seemed very much to the point. It was an interesting read, thanks!

ryanmim · 11y ago · 4 in thread

This is a pretty good explanation of why almost all practical applications of NLP are now accomplished by statistics rather than fancy linguistic grammar models you might have read about in a Chomsky book.

Old school NLP has always fascinated me though, and I'm pretty excited about what might be possible in the future by using more than purely statistical methods for accomplishing NLP tasks. Maybe the author could have speculated more wildly in his prognostication ;)

sqrt17 · 11y ago

It's important to make a distinction between (i) Chomskyan linguistics, (ii) 90s style symbolic systems, (iii) 90s/early 2000s style statistical systems and (iv) 2010s style statistical systems.

Chomskyan linguistics assumes that statistics and related stuff is not relevant at all, and that instead you need to find the god-given (or at least innate) Universal Grammar and then everything will be great. 90s style symbolic systems adopt a more realistic approach, relying on lots of heuristics that kind of work but aim at good performance rather than unattainable perfection; 90s style statistical models give up some of the insights in these heuristics to construct tractable statistical models.

If you look at 2010s style statistical models, you'll notice that machine learning has become more powerful and you can use a greater variety of information, either using good linguistic intuitions (which help even more with better learning algorithms, but require a certain expressivity as well as some degree of matching between the way of constructing the features and the classification) or unsupervised/deep-NN learning, which constructs generalizations over features.

The main reason that you won't ever see people talking about systems with great machine learning and great linguistic intuitions is that you normally want to treat one of them as fixed and focus on improving the other, i.e., it's more a practical/cultural difference than an actual limitation.

psygnisfive (OP) · 11y ago

Actually this isn't true, wrt Chomsky. Chomskyan linguistics assumes statistics is very important (and this has been noted by Chomsky himself since at least the early 1960s). Chomsky simply argues that statistics is insufficient on its own. And in truth, most NLPers believe this, but they rarely admit it. Most/all NLP projects have some form of "universal grammar", tho usually it's something like a regular grammar (~ a Markov chain) or at best a probabilistic CFG (PCFG). I suspect the reason is that, to some extent, hierarchical structures like this seem so natural that it's hard to imagine what else you could do, so there's a tendency to treat CFGs as not even a grammar choice, but it is. There are other kinds of grammars (such as pregroup grammars) which lack these notions of hierarchy but work perfectly well for the same domains as CFGs, just in very different ways.
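To make the PCFG case concrete, here is a minimal sketch of how such a grammar assigns a probability to a parse tree: multiply the probabilities of the rules used. The toy grammar, its probabilities, and the `tree_probability` helper are hypothetical and exist only for illustration.

```python
from math import prod

# Hypothetical toy PCFG: rule probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("the", "man")): 0.5,
    ("NP", ("the", "pig")): 0.5,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("ate",)): 1.0,
}

def tree_probability(tree):
    """Probability of a parse tree = product of the probabilities of its rules.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    Rules that are not in the grammar get probability 0.
    """
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULES.get((label, rhs), 0.0) * prod(tree_probability(c) for c in children)

tree = ("S", ("NP", "the", "man"), ("VP", ("V", "ate"), ("NP", "the", "pig")))
print(tree_probability(tree))
```

The hierarchy is built into the representation itself: the choice of nested trees over, say, a flat Markov chain is already a commitment about grammar, independent of how the probabilities are learned.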

psygnisfive (OP) · 11y ago

Well, if you'd like to know what I think we'll be doing in the future, check out the rest of the site. :p

But: I'm building an SDK for conversational AI (think Siri, in any app, and 10 times better), that's what the site as a whole is for. I think in 5 years it'll be pretty commonplace to have fairly natural, Jarvis-like conversations with computers, and within 10 years we'll have R2D2/C3PO robots.

ryanmim · 11y ago

Ya I checked out your root project, will give it a whirl when you open it up. I'm moderately interested in adding voice commands to an app I'm working on and haven't found a service that fits the bill yet.

dschiptsov · 11y ago · 4 in thread

The subtle ideas from the original "Structure Of Magic" books, about how we construct our internal representations of reality depending on the wording we use, have been replaced by an industry of coaches and consultants.

The ideas, by the way, had been studied by mainstream psychology as the framing effect and the priming effect.

In short, our minds do lexical analysis and decomposition sub-consciously, so we could be influenced by specially crafted sentences. We also leak details of our internal representation of some aspects of reality in the way we unconsciously construct language sentences.

microtonal · 11y ago

Note: NLP here means Natural Language Processing, not Neurolinguistic Programming.

dschiptsov · 11y ago

Oh, sorry. Too old school.)

copsarebastards · 11y ago

It's impressive how you responded to an article you didn't read.

dschiptsov · 11y ago

Why, sometimes we answer questions which pop up in our heads.)

agentile · 11y ago · 2 in thread

Reading this article, particularly the part about sentiment analysis, was interesting to me because last year I did my thesis[1] regarding sentiment classification using a somewhat mixed approach (albeit pretty simple) where I factored in basic sentence structure in addition to word features to see improvement in accuracies. I found it really neat to see various cases where particular sentence structures like PRP RB VB DT NN would be much more likely to show up for a positive sentiment e.g. "I highly recommend this product" vs negative sentiment e.g. "They totally misrepresent this product"
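A small sketch of what such a mixed feature set might look like, assuming the sentences have already been tagged by some external part-of-speech tagger. The `extract_features` helper and the feature naming scheme are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def extract_features(tagged_sentence):
    """Combine word features with one feature for the whole POS tag sequence.

    `tagged_sentence` is a list of (word, tag) pairs, assumed to come from
    an external part-of-speech tagger.
    """
    features = Counter(f"word={w.lower()}" for w, _ in tagged_sentence)
    features["pos_seq=" + " ".join(t for _, t in tagged_sentence)] += 1
    return features

positive = [("I", "PRP"), ("highly", "RB"), ("recommend", "VB"),
            ("this", "DT"), ("product", "NN")]
negative = [("They", "PRP"), ("totally", "RB"), ("misrepresent", "VB"),
            ("this", "DT"), ("product", "NN")]

pos_features = extract_features(positive)
neg_features = extract_features(negative)
# Features the two sentences share; everything else is distinguishing.
print(sorted(set(pos_features) & set(neg_features)))
```

The resulting counts can be fed to any standard classifier; the structural feature simply sits alongside the word features and lets the learner decide how much weight it deserves.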

I get the impression that, while the computational side of computational linguistics has seen more attention for lucrative reasons, now that it is seeing some success, more people are trying to incorporate more from the linguistic side when it doesn't add a huge amount of computational expense.

It doesn't seem like anything new, however, that business needs drive funding for particular areas in academia. Sadly, more so than ever considering the greed of the school systems (but that is another topic).

[1] https://digital.lib.washington.edu/researchworks/handle/1773...

psygnisfive (OP) · 11y ago

Yep, I agree with what you say. Tho I would dispute calling a POS gloss much of a structure. When I think structure, I think full syntax. Parse + formal features. Or a full logical formula for the semantics! Now that's structure!

agentile · 11y ago

Fair enough, I think in my thesis I referred to it as a sentence representation. I think it may actually serve as a good example of some of the compromises being made in application. Sometimes people need some of that high-level useful information, but not within a deeper, more comprehensive format/structure, for efficiency's sake, even if some knowledge and subsequent accuracy is lost. My experience was that some of the tools out there in NLP land didn't always make this an easy thing to do.

JacobiX · 11y ago · 2 in thread

The beginning of the article reminds me of the quote: "Every time I fire a linguist, the performance of our speech recognition system goes up." But nowadays statistical NLP systems regularly use syntactic and semantic information as features in the learning phase.

psygnisfive (OP) · 11y ago

It really depends on what you count as syntactic and semantic information. As a linguist, to me syntactic information is tree structures, syntactic categories, etc., and semantics is formulas in some (typically higher-order) logic. But for a lot of the NLP that I see, "syntax" is pretty shallow stuff like head words and POS tag contexts, and "semantics" is at best things like word vectors, maybe dependency trees. These are very different. But maybe we're thinking of different things! :)

microtonal · 11y ago

There have been some inflated claims, e.g. people calling their part-of-speech tagger a shallow parser or their shallow parser (e.g. chunking plus some rules) a parser :).

But I think in general computational linguists would say that dependency trees are definitely syntax.

sdoering · 11y ago · 2 in thread

I really liked the article, and some of the blog headlines seemed interesting as well. But try as I might, I was not able to find an RSS/Atom/XML feed for plugging this resource into my feed reader. Sadly, that means I will probably miss upcoming interesting posts.

psygnisfive (OP) · 11y ago

I'm working on it, have no fear! I'm hand coding the blog right now because I don't want to use a CMS, I really don't like the options I have. I might just hand-code an RSS feed.
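For what it's worth, a hand-rolled feed can be quite small. The sketch below builds a bare-bones RSS 2.0 document with Python's standard library; the `build_rss` helper, the post dictionary keys, and the example.com URLs are placeholders, not anything from the actual site.

```python
import xml.etree.ElementTree as ET

def build_rss(title, link, description, posts):
    """Return a minimal RSS 2.0 document as a string.

    `posts` is a list of dicts with "title", "link" and "summary" keys
    (a hypothetical shape chosen for this example).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "description").text = post["summary"]
    return ET.tostring(rss, encoding="unicode")

feed = build_rss(
    "Example blog",
    "https://example.com/blog",
    "Hypothetical feed for illustration",
    [{"title": "First post", "link": "https://example.com/blog/first",
      "summary": "Placeholder summary."}],
)
print(feed)
```

Writing the returned string to a static file and linking it from the page head is enough for most feed readers to pick it up.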

zhenjl · 11y ago

Try a static blog generator like hugo or octopress...

dsfsdfd · 11y ago · 1 in thread

I think it's more likely that we will use machine learning to learn the syntactic structure, rather than hand-craft these pieces of machinery. For a long time we have tried to create intelligent machines by designing them to solve a problem; now at last we are designing machines to solve problems and finding success. I see no reason to imagine that going back to the old school methods, with a layer of the new magic on top, is going to be effective as we move to the medium term - in the short term, possibly, but briefly.

psygnisfive (OP) · 11y ago

I think it's going to be machine learning too, just over richer structures. And actually when you need parsing that's what you get together. But few people need real parsing right now. Some things, like meaning, will I think have to be more old school. There's no good way to learn meanings right now at all.

VLM · 11y ago

There is a greater economic lesson that tech does not necessarily have the driver's seat in the economy.

"but with the advent of computers, it became possible to monetize NLP and the priority shifted to making products people would buy, rather than a system that was scientifically correct."

The competition for an NLP computer program is not another NLP computer program, but call centers in India, the Philippines, onshore prison labor, that kind of "support".
