I think the real innovation will come when users are given exposure to lots of different models and have the pros and cons of each properly explained to them. Maybe I want to use this on specialized biomedical literature and would be better off with a model fine-tuned on that domain instead of on SQuAD.
Also, shameless self-plug: I wrote a system that does extractive summarization/highlighting of documents, which is in principle very similar to what is going on here (https://github.com/Hellisotherpeople/CX_DB8). For a while, I had a hosted, web-accessible version of this system available to make it easy to show off to interviewers. It could highlight the important parts of a web page based on a user query at the word, sentence, n-gram, or paragraph level. I figured the next step was to make it a browser extension, but I simply wasn't proficient enough in JS, and at the time I was working on this, quantized/pruned models were slightly less good. I firmly believe that making high-quality semantic search work everywhere will be an extreme (and obvious) step forward for most people's daily tasks. What a brave new world we are entering!
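The core mechanism of query-driven extractive highlighting can be sketched in a few lines. CX_DB8 itself uses neural sentence embeddings; the bag-of-words cosine similarity below is just a self-contained stand-in, and the sample sentences and function names are my own invention, not from the project:

```python
# Toy sketch of query-driven extractive highlighting: score each candidate
# span against the query and return the best-matching ones. A real system
# (like CX_DB8) would use pretrained embeddings; here we use a simple
# bag-of-words cosine similarity so the example runs with the stdlib only.
import math
import re
from collections import Counter

def bow_vector(text):
    """Lowercased bag-of-words term counts, punctuation stripped."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def highlight(sentences, query, top_k=1):
    """Return the top_k sentences most similar to the query."""
    q = bow_vector(query)
    ranked = sorted(sentences, key=lambda s: cosine(bow_vector(s), q), reverse=True)
    return ranked[:top_k]

sentences = [
    "The mitochondria is the powerhouse of the cell.",
    "Stock prices fell sharply on Tuesday.",
    "Cell biology studies the structure of the cell.",
]
print(highlight(sentences, "cell biology", top_k=1))
# → ['Cell biology studies the structure of the cell.']
```

Swapping `bow_vector` for a real embedding model is what makes the matching "semantic" rather than lexical; the ranking logic stays the same.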
I'm looking for an open-source solution to find algorithm names inside academic articles (usually PDFs), and perhaps on the web too.
Are there any suggestions?
It could be interesting to compare the semantic similarity of algorithms as understood by an NLP model, e.g. depth-first search vs. Monte Carlo, or Dijkstra's vs. Kruskal's. Some are used in similar contexts, so you could group algorithms into families. I'd love to see more NLP-driven meta-analysis of scientific literature.
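As a minimal sketch of that grouping idea: embed each algorithm's description, then find its nearest neighbor by cosine similarity. A real pipeline would use a pretrained embedding model (e.g. sentence-transformers) over mentions mined from papers; the hand-written descriptions and bag-of-words vectors below are stand-in assumptions so the example is self-contained:

```python
# Hedged sketch: group algorithms into "families" by similarity of short
# textual descriptions. Bag-of-words cosine stands in for real embeddings;
# the descriptions are hand-written assumptions, not mined from literature.
import math
import re
from collections import Counter

def vec(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

descriptions = {
    "depth-first search":   "graph traversal visiting vertices along each branch",
    "breadth-first search": "graph traversal visiting vertices level by level",
    "Dijkstra":             "shortest path in a weighted graph from a source vertex",
    "Kruskal":              "minimum spanning tree built from sorted graph edges",
    "Monte Carlo":          "repeated random sampling to estimate a numeric result",
}

def nearest(name):
    """Most similar other algorithm by description similarity."""
    q = vec(descriptions[name])
    others = [n for n in descriptions if n != name]
    return max(others, key=lambda n: cosine(q, vec(descriptions[n])))

print(nearest("depth-first search"))
# → breadth-first search
```

With real embeddings the same nearest-neighbor step would pick up deeper relationships (e.g. that Dijkstra's and Kruskal's are both greedy graph algorithms) that plain word overlap misses.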
I also noticed it can crash browsers.