So yeah, I would expect this to be a long way away.
I don't know if Johns Hopkins became the canonical data source because they were amongst the first to have public data and charts, but honestly I was kinda surprised at the low quality, coming from a group called "Center for Systems Science and Engineering". Their data was far harder to use than it needed to be, even months into the pandemic.
Fortunately there were a handful of other projects dedicated to making it sane and resolving the inconsistencies, unreconciled format changes, etc. That was really helpful.
Perhaps it could output several potential answers at the end, each explaining the "pathway" it chose (filters / decision-tree splits, plus the graphical path through keys and joinable types in the underlying data), and let the user select one or more results they believe followed valid criteria, or tweak individual filters and joins in a given result's pathway.
I think this would offer a lot more value than trying to get a full natural language interface that "just works" on complex filtering conditions. Getting just one answer back, instead of seeing the variety of pathways the system could choose and the influence each step has on the end result, leaves too many cases where the ML system fails with unrealistic results.
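To make the idea concrete, here's a minimal sketch of what such a candidate "pathway" might look like as a data structure. All names (`Pathway`, `explain`, the filter/join tuples) are hypothetical illustrations, not any real system's API: each candidate records the filters and joins the system applied, so a user could inspect, pick, or tweak them.

```python
from dataclasses import dataclass

@dataclass
class Pathway:
    """One candidate interpretation of the user's question."""
    filters: list  # e.g. ("year", "=", 2020)
    joins: list    # e.g. ("orders.customer_id", "customers.id")
    result: object # the answer this pathway produces

    def explain(self) -> str:
        # Render the decision steps so the user can judge validity.
        steps = [f"filter {col} {op} {val!r}" for col, op, val in self.filters]
        steps += [f"join {left} = {right}" for left, right in self.joins]
        return " -> ".join(steps) or "no steps"

# Two hypothetical candidates for the same question:
candidates = [
    Pathway([("year", "=", 2020)], [("orders.customer_id", "customers.id")], 42),
    Pathway([("year", ">=", 2019)], [], 57),
]
for i, p in enumerate(candidates):
    print(f"[{i}] {p.explain()} => {p.result}")
```

The user-facing step would then be selecting one of the listed pathways, or editing a filter in it and re-running.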
[1] https://ai.googleblog.com/2020/04/using-neural-networks-to-f...
Wang et al. (previous SOTA): 44.5
TAPAS: 48.8
TaBERT: 52.3
Seriously, if this is not available, what are the alternatives?
I've seen some NLP + storage projects in the past, but I don't recall their names. (Even remotely connected: there was something to convert PDFs into machine-readable data.)
Is this AwesomeNLP https://github.com/keon/awesome-nlp a good starting point there?
"A representative example is semantic parsing over databases, where a natural language question (e.g., “Which country has the highest GDP?”) is mapped to a program executable over database (DB) tables."
Could it be thought of in the same fashion as Resolvers in GraphQL integrated into BERT?
Are we entering deep copycat culture?
> Why it matters:
> Improving NLP allows us to create better, more seamless human-to-machine interactions for tasks ranging from identifying dissidents to querying for desperate laid-off software engineers. TaBERT enables business development executives to improve their accuracy in answering questions like “Which hot app should we buy next?” and “Which politicians will take our bribes?” where the answer can be found in different databases or tables.
> Someday, TaBERT could also be applied toward identifying illegal immigrants and automated fact checking. Third parties often check claims by relying on statistical data from existing knowledge bases. In the future, TaBERT could be used to map Facebook posts to relevant databases, thus not only verifying whether a claim is true, but also rejecting false, divisive and defamatory information before it's shared.
Most of the data in the world is in tables, and most people don't speak SQL. Managing and querying this data has been a very large part of what computers are for, and there is nothing nefarious at all about that.
Translating between natural language and SQL is something there have been many attempts at.
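Many of those earlier attempts were rule- or pattern-based. A deliberately naive sketch (all table and column handling here is hypothetical) shows both how far simple patterns get you and how brittle they are, which is the gap learned models like TaBERT try to close:

```python
import re

def naive_nl_to_sql(question: str, table: str = "countries") -> str:
    """Toy pattern-based translator: handles only one question shape."""
    m = re.match(r"which (\w+) has the (highest|lowest) (\w+)\??",
                 question.lower())
    if not m:
        # Anything outside the template fails outright -- the brittleness
        # that motivates learned semantic parsers.
        raise ValueError("pattern not recognized")
    col, direction, metric = m.groups()
    order = "DESC" if direction == "highest" else "ASC"
    return f"SELECT {col} FROM {table} ORDER BY {metric} {order} LIMIT 1"

print(naive_nl_to_sql("Which country has the highest GDP?"))
# SELECT country FROM countries ORDER BY gdp DESC LIMIT 1
```

Rephrase the question even slightly ("What country had the largest GDP?") and the pattern no longer matches.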