The end product is more accurate and quicker to adapt than the industry is used to.
Disclosure: I work in the engineering team at NetBase.
I wonder if this is true for all the languages, since this is usually considered shallow parsing (and not deep parsing, a term that seems to be in the white papers). Constructing an 'old-school' grammar for deep syntactic analysis (I am thinking HPSG, CG, or LFG-style here) of 42 languages is a very tedious task if you want to get any decent coverage.
This is not a new trend. As early as 1997, Steven Abney augmented [1] attribute-value grammars with discriminative modelling (maximum entropy models, in this case) to form 'stochastic attribute-value grammars'. There is a lot of work on efficiently extracting the best parse from packed forests, etc. Most systems that rely on unification grammars (e.g. HPSG grammars) already use stochastic models.
In the early to mid 2000s, when the modelling of association strengths using structured or unstructured text became popular, old-school parsers started adopting such techniques to learn selectional preferences that cannot be learnt from the usually small hand-annotated treebanks. E.g. in languages that normally have SVO (subject-verb-object) order in main clauses but also permit OVS order, parsers trained on small hand-annotated treebanks would often be set on the wrong path when the direct object is fronted (analyzing the direct object as the subject). Techniques from association strength modelling were used to learn selectional preferences such as 'bread is usually the direct object of eat' from automatically annotated text [2].
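To make the selectional-preference idea concrete, here is a minimal sketch that uses pointwise mutual information (PMI) as the association measure. PMI is just one common choice, not necessarily what the cited work used, and the triples, `pmi` and `preferred_relation` are all made up for illustration.

```python
import math

# Hypothetical (verb, relation, noun) triples, standing in for the output
# of a parser run over a large amount of unannotated text.
TRIPLES = [
    ("eat", "obj", "bread"), ("eat", "obj", "bread"), ("eat", "obj", "bread"),
    ("eat", "obj", "apple"),
    ("eat", "subj", "man"), ("eat", "subj", "man"), ("eat", "subj", "pig"),
    ("bake", "obj", "bread"), ("bake", "subj", "man"),
    ("see", "subj", "man"), ("see", "obj", "pig"),
]


def pmi(triples, verb, rel, noun):
    """PMI between the slot (verb, rel) and the noun filling it.

    Returns -inf when the combination was never observed.
    """
    n = len(triples)
    joint = sum(1 for t in triples if t == (verb, rel, noun))
    slot = sum(1 for v, r, _ in triples if (v, r) == (verb, rel))
    word = sum(1 for _, _, w in triples if w == noun)
    if joint == 0:
        return float("-inf")
    return math.log2((joint / n) / ((slot / n) * (word / n)))


def preferred_relation(triples, verb, noun):
    """Pick the relation (subj or obj) with the stronger association."""
    return max(("subj", "obj"), key=lambda rel: pmi(triples, verb, rel, noun))


print(preferred_relation(TRIPLES, "eat", "bread"))
print(preferred_relation(TRIPLES, "eat", "pig"))
```

In a real parser a score like this would be one feature among many in the disambiguation model rather than a hard decision, and unseen combinations would be smoothed instead of mapped to negative infinity.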
In recent years, learning word vector representations using neural networks has become popular. Again, not surprisingly, people have been integrating vectors as features in the disambiguation components of old-school NLP parsers, in some cases with great success.
tl;dr, the flow of ideas and tools from new-school NLP to old-school NLP has been going on ever since the statistical NLP revolution started.
"We can't buy any bread because we haven't got any bread."
And it's not just English. In Chinese one is taught that "ni hao ma?" is the greeting equivalent to "hello, how are you" but try it on a Chinese person and it amuses them. My Chinese friend at Uni says that Chinese people use "Ni chi ma?" which is literally "have you eaten?" (although we both appreciate that is a bit of a generalisation for 1bn people).
If we are talking about syntactic parsing, this is not a problem as long as the attachment is the same. In both cases 'bread' is the direct object of the main verb.
Of course, there are cases where a particular word can be used both as a direct object and a subject of a particular verb. E.g.:
The man ate the pig.
The pig ate the apple.
Of course, what such systems learn is not rules, but probability distributions that combine information about the distributions of word orders, association strengths between heads and dependents with a particular dependency relation, configurations of dependent pairs, etc.
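As a rough illustration of "probability distributions that combine information", here is a toy log-linear (maximum entropy style) model over two candidate analyses of a clause with a fronted "bread". The feature names, feature values and weights are invented for the example; in a real system the weights are estimated from a treebank.

```python
import math


def analysis_distribution(candidates, weights):
    """Turn weighted feature sums into a probability distribution (softmax)."""
    scores = {
        name: sum(weights.get(feat, 0.0) * value for feat, value in feats.items())
        for name, feats in candidates.items()
    }
    z = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / z for name, s in scores.items()}


# Hypothetical weights: SVO order is preferred a priori, and the association
# score between a verb and the noun in its assigned role counts for a lot.
WEIGHTS = {"order_svo": 1.0, "order_ovs": -1.0, "assoc": 1.5}

# Two analyses of the same clause, each described by its features.
# "assoc" is an assumed association score for 'bread' in the assigned role.
CANDIDATES = {
    "bread=subject (SVO)": {"order_svo": 1.0, "assoc": -2.0},
    "bread=object (OVS)": {"order_ovs": 1.0, "assoc": 1.0},
}

dist = analysis_distribution(CANDIDATES, WEIGHTS)
best = max(dist, key=dist.get)
print(best, dist)
```

With these made-up numbers the word-order prior favours the SVO reading, but the association feature outweighs it, so the OVS reading comes out as the most probable analysis. No single feature acts as a rule; each only shifts the distribution.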
I was more trying to just talk about why the richly structured old school fell from grace, not claim that it's dead and buried.
Old school NLP has always fascinated me though, and I'm pretty excited about what might be possible in the future by using more than purely statistical methods for accomplishing NLP tasks. Maybe the author could have speculated more wildly in his prognostication ;)
Chomskyan linguistics assumes that statistics and related stuff are not relevant at all, and that instead you need to find the god-given (or at least innate) Universal Grammar and then everything will be great. 90s-style symbolic systems adopt a more realistic approach, relying on lots of heuristics that kind of work but aim at good performance rather than unattainable perfection; 90s-style statistical models give up some of the insights in these heuristics to construct tractable statistical models.
If you look at 2010s style statistical models, you'll notice that machine learning has become more powerful and you can use a greater variety of information, either using good linguistic intuitions (which help even more with better learning algorithms, but require a certain expressivity as well as some degree of matching between the way of constructing the features and the classification) or unsupervised/deep-NN learning, which constructs generalizations over features.
The main reason that you won't ever see people talking about systems with great machine learning and great linguistic intuitions is that you normally want to treat one of them as fixed and focus on improving the other, i.e., it's more a practical/cultural difference than an actual limitation.
But: I'm building an SDK for conversational AI (think Siri, in any app, and 10 times better), that's what the site as a whole is for. I think in 5 years it'll be pretty commonplace to have fairly natural, Jarvis-like conversations with computers, and within 10 years we'll have R2D2/C3PO robots.
The ideas, by the way, had been studied by mainstream psychology as the framing effect and the priming effect.
In short, our minds do lexical analysis and decomposition subconsciously, so we can be influenced by specially crafted sentences. We also leak details of our internal representation of some aspects of reality in the way we unconsciously construct sentences.
I get the impression that, while the computational side of computational linguistics has seen more attention for lucrative reasons, now that it is seeing some success there are more people trying to incorporate more from the linguistic side, as long as it doesn't incur a huge computational expense.
It doesn't seem like anything new, however, that business needs drive funding for particular areas in academia. Sadly, more so than ever considering the greed of the school systems (but that is another topic).
[1] https://digital.lib.washington.edu/researchworks/handle/1773...
But I think in general computational linguists would say that dependency trees are definitely syntax.
"but with the advent of computers, it became possible to monetize NLP and the priority shifted to making products people would buy, rather than a system that was scientifically correct."
The competition for an NLP computer program is not another NLP computer program, but call centers in India and the Philippines, onshore prison labor, that kind of "support".