Context: My company has become popular for explaining science to kids. We now see how hard it is for kids to explore their own questions, so we asked elementary teachers: if your students ask a question you can’t answer, submit it to us.
We received 600K well-formed questions! It’s fascinating to browse. We’ve been so inspired by this that we now plan to answer every question and build a video-Wikipedia-for-kids (living at mystery.org). For example, here are some random questions: https://tinyurl.com/y43bwtek
First we wonder: what’s the most popular question? This is tough! “Synonymous” questions look so different. Take a look at these; you’d give the same explanation to all three kids:
> Why is there sand on beaches?
> How is sand made?
> Where does sand come from?
Ugh. Before we embark on de-duping, we’re first trying to find the topic(s) of each question. Organizing topics into a kid’s conceptual hierarchy will be useful:
Living things > Birds > Penguins
Man-made things > Food > Junk food > French fries
NLP libraries help us get parts of speech, and the “topics” we want are among the nouns and verbs of the sentence. But not every noun or verb is a meaningful topic. E.g.:
> Why do spoons and forks go in a certain place when setting the table?
“Place” is a noun, but it’s too vague here to be a topic. We would not want to let kids “Browse all questions about place.” So we need the subset of nouns which are meaningful topics.
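To make the noun case concrete, here is a rough sketch of the kind of filter we have in mind. It assumes a POS tagger has already produced (word, tag) pairs; the tags below are hand-written for the spoons question, and the `VAGUE_NOUNS` stoplist is a made-up, hand-curated list, not something a library gives us.

```python
# Hypothetical stoplist of nouns too vague to browse by; curated by hand.
VAGUE_NOUNS = {"place", "thing", "way", "kind", "time"}


def candidate_noun_topics(tagged_tokens):
    """Return the nouns that are not on the vague-noun stoplist, in order."""
    topics = []
    for word, tag in tagged_tokens:
        word = word.lower()
        if tag == "NOUN" and word not in VAGUE_NOUNS and word not in topics:
            topics.append(word)
    return topics


# Hand-tagged stand-in for what a POS tagger might return (an assumption).
tagged = [
    ("Why", "ADV"), ("do", "AUX"), ("spoons", "NOUN"), ("and", "CCONJ"),
    ("forks", "NOUN"), ("go", "VERB"), ("in", "ADP"), ("a", "DET"),
    ("certain", "ADJ"), ("place", "NOUN"), ("when", "ADV"),
    ("setting", "VERB"), ("the", "DET"), ("table", "NOUN"), ("?", "PUNCT"),
]

print(candidate_noun_topics(tagged))  # ['spoons', 'forks', 'table']
```

A fixed stoplist like this obviously won't scale to 600K questions on its own, which is why we're asking.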
> How did people make glue?
“Make” is a verb. In most questions, “make” is not a topic. But here it’s being used in a significant way. We’d want to list this question under both:
All questions about making/invention
All questions about glue
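The double listing itself is the easy part. A minimal sketch, assuming something upstream has already decided which nouns and verbs in a question are significant (that decision is the open problem); `TOPIC_VERBS` and `index_question` are hypothetical names for illustration:

```python
from collections import defaultdict

# Hypothetical map from a significant verb to the topic label kids would browse.
TOPIC_VERBS = {"make": "making/invention", "invent": "making/invention"}


def index_question(index, question, nouns, verbs):
    """File the question under each noun topic and each mapped verb topic.

    `nouns` and `verbs` are assumed to be already filtered for significance.
    """
    for noun in nouns:
        index[noun].add(question)
    for verb in verbs:
        label = TOPIC_VERBS.get(verb)
        if label is not None:
            index[label].add(question)


index = defaultdict(set)
index_question(index, "How did people make glue?", nouns=["glue"], verbs=["make"])

for topic in sorted(index):
    print(topic, "->", sorted(index[topic]))
```

This only shows one question ending up under two topics; it says nothing about how to tell a significant “make” from an insignificant one.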
Any advice on how to filter the nouns and verbs in a sentence down to the kid-friendly topics?