I think the real innovation will come when users are given exposure to lots of different models and have the pros and cons of each properly explained to them. Maybe I want to use this on specialized biomedical literature and would be better off with a model fine-tuned on that domain instead of on SQuAD.
Also, shameless self-plug: I wrote a system that does extractive summarization/highlighting of documents, which is in principle very similar to what is going on here (https://github.com/Hellisotherpeople/CX_DB8). For a while, I had a hosted, web-accessible version of this system available to make it easy to show off to interviewers. It could highlight the important parts of a web page based on a user query at the word, sentence, n-gram, or paragraph level. I figured the next step was to make it a browser extension, but I simply wasn't proficient enough in JS, and at the time I was working on this, quantized/pruned models were slightly less good. I firmly believe that making high-quality semantic search work everywhere will be an extreme (and obvious) step forward for most people's daily tasks. What a brave new world we are entering!
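The core mechanism of query-driven extractive highlighting can be sketched in a few lines. CX_DB8 itself uses neural sentence embeddings; the bag-of-words cosine similarity below is just a self-contained stand-in, and the sample sentences and function names are my own invention, not from the project:

```python
# Toy sketch of query-driven extractive highlighting: score each candidate
# span against the query and return the best-matching ones. A real system
# (like CX_DB8) would use pretrained embeddings; here we use a simple
# bag-of-words cosine similarity so the example runs with the stdlib only.
import math
import re
from collections import Counter

def bow_vector(text):
    """Lowercased bag-of-words term counts, punctuation stripped."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def highlight(sentences, query, top_k=1):
    """Return the top_k sentences most similar to the query."""
    q = bow_vector(query)
    ranked = sorted(sentences, key=lambda s: cosine(bow_vector(s), q), reverse=True)
    return ranked[:top_k]

sentences = [
    "The mitochondria is the powerhouse of the cell.",
    "Stock prices fell sharply on Tuesday.",
    "Cell biology studies the structure of the cell.",
]
print(highlight(sentences, "cell biology", top_k=1))
# → ['Cell biology studies the structure of the cell.']
```

Swapping `bow_vector` for a real embedding model is what makes the matching "semantic" rather than lexical; the ranking logic stays the same.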
I'm looking for an open-source solution to find algorithm names inside academic articles (usually PDFs), and perhaps on the web too.
Are there any suggestions?
It could be interesting to compare the semantic similarity of algorithms as understood by an NLP model, e.g. depth-first search vs. Monte Carlo, or Dijkstra's vs. Kruskal's. Some are used in similar contexts, so you could group algorithms into families. I'd love to see more NLP-driven meta-analysis of scientific literature.
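As a minimal sketch of that grouping idea: embed each algorithm's description, then find its nearest neighbor by cosine similarity. A real pipeline would use a pretrained embedding model (e.g. sentence-transformers) over mentions mined from papers; the hand-written descriptions and bag-of-words vectors below are stand-in assumptions so the example is self-contained:

```python
# Hedged sketch: group algorithms into "families" by similarity of short
# textual descriptions. Bag-of-words cosine stands in for real embeddings;
# the descriptions are hand-written assumptions, not mined from literature.
import math
import re
from collections import Counter

def vec(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

descriptions = {
    "depth-first search":   "graph traversal visiting vertices along each branch",
    "breadth-first search": "graph traversal visiting vertices level by level",
    "Dijkstra":             "shortest path in a weighted graph from a source vertex",
    "Kruskal":              "minimum spanning tree built from sorted graph edges",
    "Monte Carlo":          "repeated random sampling to estimate a numeric result",
}

def nearest(name):
    """Most similar other algorithm by description similarity."""
    q = vec(descriptions[name])
    others = [n for n in descriptions if n != name]
    return max(others, key=lambda n: cosine(q, vec(descriptions[n])))

print(nearest("depth-first search"))
# → breadth-first search
```

With real embeddings the same nearest-neighbor step would pick up deeper relationships (e.g. that Dijkstra's and Kruskal's are both greedy graph algorithms) that plain word overlap misses.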
I also noticed it can crash browsers.