KnowledgeNet: A Benchmark for Knowledge Base Population

(blog.diffbot.com)

33 points · miket · 6y ago · 7 comments

7 comments · 5 top-level

nl · 6y ago · 2 in thread

State-of-the-art models (using BERT) are far from achieving human performance (0.504 vs 0.822).

This is moderately surprising.

In question answering (QA) style tasks (SQuAD, SQuAD 2.0) we see state-of-the-art models approach human performance. QA is similar to KBC in the sense that the answers are usually extracted from text in a similar way.

I'd imagine there is potential for fairly rapid improvement in this (Knowledge Base Population) task.

g82918 · 6y ago

As long as we haven't reached AGI I feel like this is true of any new benchmark. BERT wasn't trained or designed for the task; give some smart folks a few months and they can now beat the task. The bigger question is what we would like an AI to be able to do. Is this benchmark a good one? Is there maybe a better choice of questions to get the type of NLP we want?

nl · 6y ago

Is this benchmark a good one?

Any benchmark which reflects a task that humans do is a good one, unless it has specific weaknesses that a computer exploits.

I'd use models that are written for this in my work, so I find it useful.

I feel like this is true of any new benchmark.... give some smart folks a few months and they can now beat the task.

In NLP work this used not to be the case. 5 years ago we were stuck at a local maximum.

And this undervalues the task: it bridges the gap between unstructured and structured data. In many ways it is the holy grail for many tasks.

miket · OP · 6y ago

When people think about using computers for Natural Language Processing, they often think about end-tasks like classification, translation, and question answering, and about models like BERT that capture the statistical regularities in text. However, these tasks only indirectly measure how much the system has understood the meaning of the text, and the models behind them are largely unexplainable black boxes that require reams of training data.

NLP is good enough that we can now explicitly measure how well a system reads text in terms of what knowledge is extracted from it. This task is called Knowledge Base Population, and we've released the first reproducible dataset called KnowledgeNet that measures this task, along with an open source state-of-the-art baseline.
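To make "measuring what knowledge is extracted" concrete, here is a minimal sketch of scoring extracted facts against a hand-labeled set. It assumes facts are (subject, relation, object) triples compared by exact match and scored with precision, recall, and F1. The `Fact` type, the relation names, and the example facts are hypothetical, and this is not KnowledgeNet's actual evaluation script, which is in the repo.

```python
from typing import NamedTuple, Set, Tuple


class Fact(NamedTuple):
    """One extracted fact as a (subject, relation, object) triple (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


def precision_recall_f1(predicted: Set[Fact], gold: Set[Fact]) -> Tuple[float, float, float]:
    """Score predicted facts against gold facts using exact-match set overlap."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations for one document.
gold = {
    Fact("Ada Lovelace", "DATE_OF_BIRTH", "1815"),
    Fact("Ada Lovelace", "CHILD_OF", "Lord Byron"),
    Fact("Ada Lovelace", "NATIONALITY", "United Kingdom"),
    Fact("Ada Lovelace", "SPOUSE", "William King"),
}

# Hypothetical system output: two correct facts and one wrong one.
predicted = {
    Fact("Ada Lovelace", "DATE_OF_BIRTH", "1815"),
    Fact("Ada Lovelace", "CHILD_OF", "Lord Byron"),
    Fact("Ada Lovelace", "SPOUSE", "Charles Babbage"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact-match comparison is the simplest choice here; a real benchmark also has to decide how to handle entity linking and partial span matches, which this sketch ignores.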

Direct link to the GitHub repo: https://github.com/diffbot/knowledge-net

EMNLP paper: https://www.aclweb.org/anthology/D19-1069.pdf

g82918 · 6y ago

Mostly an article pushing their benchmark and paper: https://www.aclweb.org/anthology/D19-1069.pdf. In the paper they compare existing benchmarks against criteria they created, to show that their benchmark is the only one that features the things they say are important. All the others are somehow deficient by the totally objective metric they create.

bhl · 6y ago

Reminds me of a submission from a year ago on autogenerating a knowledge base from articles on the web [1]. I think it'd be neat if Q&A nets and other techniques sufficed to the point where we would prefer using "knowledge engines" over search engines, like a generalized Wolfram Alpha.

[1] https://primer.ai/blog/quicksilver

sdan · 6y ago

Amazing! Love using Diffbot and although I'm not too deep into the NLP space yet, finding the relations of the text itself is a pretty important task.
