Show HN: A search engine that lets you refine your queries

(occamm.com)

86 points | Ketan-fullstack | 4y ago | 33 comments

_tom_ 4y ago

There are too many tags to be useful, and the quality of tags is low.

For example, I searched for banana boat, thinking of the song. "Song" was not one of the tags offered. Nor "day-o", nor Belafonte. But I did get "formation of an attorney-client relationship" and "High levels of benzene". It looks like both of those have a high number of other, related tags, like someone pulled a bunch of tags from one or two web pages. I see what looks like a cluster of "sun & sunscreen related" tags, which is good, but it should only be one or two tags. "Food" is also good, but also should be just one or two tags. "Law" shouldn't be there at all, unless there is some POPULAR legal aspect of banana boats I'm not aware of, and "toxic chemicals" should be one or two at most, and probably should not be there at all.

I think something is wrong with your clustering. You are not getting the right clusters, and you are going too deep into the clusters you have.

FYI: The song is the first video hit on Google, and the top of the second page of general results.

Tried star trek. Still too many tags. Tags are more related, but there are ones that clearly aren't important enough in relation to warrant being a top tag, like "former Viacom", "smaller population countries", "notable exceptions", "extreme example", "generic rule of thumb".

Keep refining! :-)

Ketan-fullstack (OP) 4y ago

Thanks for your feedback!

Yes, you are spot on about the quality of the tags being low for most searches. The current refinement algorithm will get better as more and more people use the service, which should reduce the number of suggested tags and improve their quality over time.

I am however working on modifying the algorithm to make the refinement more useful for current users.

smilespray 4y ago

That's your classic chicken and egg problem. Few will use the service if the user experience is poor, and your algorithm will not get much useful input.

If you could figure out how to improve it without essentially becoming Google and spending 2 decades and billions of dollars, then we're talking...

shamilbi 4y ago

> We share your IP address and your search query with Microsoft Bing, as we use the BING API as our search engine backend

mikkom 4y ago

Wouldn't this kind of use go against the Bing terms of use, as they prohibit using the results for any kind of machine learning training?

> [Do not] Use data received from the Search APIs as part of any machine learning or similar algorithmic activity. Do not use this data to train, evaluate, or improve new or existing services that you or third parties might offer.

https://docs.microsoft.com/en-us/azure/cognitive-services/bi...

Note: I'm partly asking this because some time ago I was thinking of setting up a somewhat similar thing and I decided it would go against the Bing API terms.

freediver 4y ago

Why would it need to share client's IP address when it can just proxy it? Unless it is doing search queries directly from the client?
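
A minimal sketch of the proxying idea, assuming the backend builds the upstream request itself from nothing but the query string, so the upstream API only ever sees the server's address. The endpoint URL, the `X-Api-Key` header name, and the function name are all hypothetical placeholders, not the site's or Bing's actual API:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical upstream endpoint and key header, for illustration only.
UPSTREAM_URL = "https://search-backend.example/v1/search"
API_KEY_HEADER = "X-Api-Key"


def build_upstream_request(client_headers, query, api_key):
    """Build the server-side request to the upstream search API.

    client_headers is accepted only to make the point that nothing from the
    browser's request (IP-bearing headers, cookies, user agent) is copied.
    """
    url = UPSTREAM_URL + "?" + urlencode({"q": query})
    # Only the API key is attached; the request is not sent here.
    return Request(url, headers={API_KEY_HEADER: api_key})


incoming = {"X-Forwarded-For": "203.0.113.7", "Cookie": "session=abc"}
req = build_upstream_request(incoming, "banana boat", "secret")
print(req.full_url)
```

Because the connection to the upstream would be opened by the server, the user's IP address would only reach it if the server chose to forward it explicitly.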

Kiro 4y ago

Same as DuckDuckGo then.

dotancohen 4y ago

Nice job. Linux and Ubuntu related searches have very relevant results.

The only constructive criticism that I have would be regarding presentation. There is far too little contrast between the results URL and the white background. And I don't know what markup language you're using but results cannot be opened in a new tab via middle click (current Firefox, current Ubuntu). Not to mention that my Tridactyl extension is completely useless on the site.

Please, just serve results in normal HTML and let the browser handle the presentation. For every clever behavior you think you're adding, you are actually breaking a feature that somebody uses.

dotancohen 4y ago

More constructive criticism: This is probably not how you want the "Q&A communities" tag to display:

  > Q&amp;A communities
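
One plausible cause (an assumption, since the site's rendering code isn't shown in this thread) is that the tag text is HTML-escaped twice, or escaped once and then inserted as literal text by a framework that escapes it again. A minimal Python sketch of the effect:

```python
import html

tag = "Q&A communities"

# Escaping once yields markup that a browser renders as "Q&A communities".
escaped_once = html.escape(tag)

# Escaping the already-escaped string turns "&amp;" into "&amp;amp;",
# which a browser renders as the literal text "Q&amp;A communities".
escaped_twice = html.escape(escaped_once)

# html.unescape stands in for what the browser would display.
shown_once = html.unescape(escaped_once)
shown_twice = html.unescape(escaped_twice)

print(shown_once)
print(shown_twice)
```

If that is what is happening, the fix would be to escape exactly once, at the point where the text is placed into the page.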

ykevinator2 4y ago

It's a great idea and you should keep going with this (and nice work so far). You may want to consider the idea of only presenting disambiguating tags. A simple heuristic would be finding the top x tags of close to equal cardinality and throwing the rest away. The value of what you've built is disambiguation, and if you present every single tag, most of those tags are noise. Hope this helps.
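
A rough sketch of that heuristic, under my own assumptions: "cardinality" is taken to mean the number of results carrying a tag, and "close to equal" means within a ratio of the largest count. The function name, the `min_ratio` threshold, and the counts are made up for illustration:

```python
def disambiguating_tags(tag_counts, max_tags=4, min_ratio=0.5):
    """Keep at most max_tags tags whose result counts are close to the largest.

    tag_counts maps a tag to the number of results carrying it. A tag is kept
    only if its count is at least min_ratio times the top tag's count.
    """
    # Sort by count (descending), breaking ties alphabetically for stability.
    ranked = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return []
    top_count = ranked[0][1]
    kept = [tag for tag, count in ranked if count >= min_ratio * top_count]
    return kept[:max_tags]


# Hypothetical counts for a "banana boat" search.
counts = {"song": 40, "sunscreen": 35, "food": 30, "benzene": 3, "attorney-client": 1}
print(disambiguating_tags(counts))
```

The long tail of rare tags drops out, and what remains is a handful of roughly equal-sized interpretations to choose between.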

xtiansimon 4y ago

> “disambiguating”

Interesting to think about.

Something like this: high-frequency tags may be present in the name or be synonymous with the search term, whereas less common (low-frequency) terms, which would characterize the link, would not appear.

I’ve not done any work on this topic but it’s very interesting.

I’m thinking about search on e-commerce sites (lol. holiday shopping) where you’re searching on a general term or type of product. The results are mixed, so you sort by price. Let’s see what a high value item looks like, and a low value; this is an evaluation heuristic to reveal different quality products.

Maybe there should be both tags then? A few high frequency and a few low frequency?
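
To make the idea concrete, here is a small sketch that offers a few of the most frequent tags alongside a few of the rarest. The function name and the sample data are invented for illustration:

```python
from collections import Counter


def mixed_frequency_tags(result_tags, n_high=2, n_low=2):
    """Return (high, low): the most common tags and the rarest remaining ones.

    result_tags is a list with one list of tags per search result.
    """
    counts = Counter(tag for tags in result_tags for tag in tags)
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    names = [tag for tag, _ in ranked]
    high = names[:n_high]
    rest = names[n_high:]
    low = rest[-n_low:] if n_low else []
    return high, low


# Hypothetical tags for four results of a "laptop" product search.
results = [
    ["laptop", "gaming", "16gb"],
    ["laptop", "gaming"],
    ["laptop", "refurbished"],
    ["laptop", "gaming", "oled"],
]
high, low = mixed_frequency_tags(results)
print(high, low)
```

The high-frequency half mostly restates the query, while the low-frequency half is where the distinguishing characteristics tend to show up.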

Cilvic 4y ago

It seemed to work OK.

I found the number of tags far too many to be really useful. I'm searching for an answer/not more problems.

UI-wise, I was irritated that clicking the tag only had a visible effect after I moved the mouse away. The first time, I clicked (toggled on), then clicked again (toggled off).

I was thinking about a search engine that would help me choose the right direction if there are multiple interpretations.

My test was searching for "quasar" and seeing if the Vue.js framework would be offered as a tag (which it's not). But it's actually result number 3?

How are the tags related to the results?

I've also found it irritating that the selected tags need to be "applied" without having an idea of what that will do, or how many results there are.

Maybe apply the selected tag immediately and keep the tags in the same UI location (don't reload them, like you currently do after applying).

I'm not sure what the perfect search example is for this, because narrowing down by keyword is easy. Here it's almost like it helps you discover keywords/tags that are yet unknown?

marginalia_nu 4y ago

Neat. I do think there is a lot of room to refine the search engine interface. Judging by the dumpster fire that's Google, it's pretty clear that natural language search is just not a good approach. It's extremely frustrating when the search engine is second-guessing you.

Your "tags" look like search terms, like key n-grams. I'm guessing because when I extract search terms from a document, I get stuff of a similar "vibe". It's maybe easier to show than to describe: this is from a unit test I have that runs keyword extraction code on a few documents. I think these are from some page about SSH clients:

  unix_source, msi, ssh_authentication_agent, gitweb, pscp, cryptographic_checksums, windows_source, ssh, puttytel, command-line_secure, telnet, windows_html, most_up-to-date_version, psftp, zip, scp, standalone, version_of_putty, scp_client, dsa, putty, plink, rsa_and_dsa, sftp, binaries, unix, up-to-date_version, checksums, rsa, individual_executables, 64-bit_arm, 64-bit_version, source_archive, .zip_archive, 32-bit_arm, latest_release, checksum, standalone_binaries, cryptographic, windows_on_arm, ftp, zipped, versions_of_putty, alternative_binary_files, unix_source_archive, download_putty, sftp_client, latest_released_version, ssh_and_telnet_client, ssh_and_telnet
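
For readers who want a feel for how terms of that shape can come about, here is a deliberately naive n-gram extractor. It is not the code behind the list above, just a sketch under simple assumptions: word n-grams up to length 4, joined with underscores, dropped if they start or end with a stop word, and kept only if they repeat. The stop-word list and thresholds are arbitrary:

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "is", "in", "on", "or"}


def key_ngrams(text, max_n=4, min_count=2):
    """Return repeated word n-grams, joined with underscores, sorted by name."""
    # Words may contain internal hyphens or dots (e.g. "64-bit").
    words = re.findall(r"[a-z0-9]+(?:[-.][a-z0-9]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip n-grams that begin or end with a stop word.
            # Simplification: n-grams may span sentence boundaries.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts["_".join(gram)] += 1
    return sorted(term for term, count in counts.items() if count >= min_count)


sample = (
    "Download PuTTY. PuTTY is an SSH and Telnet client. "
    "The SSH and Telnet client runs on Windows."
)
print(key_ngrams(sample))
```

A real extractor would need better scoring than raw repetition, but even this crude version yields overlapping terms of different lengths for the same phrase.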

I've been playing with the idea of doing some sort of Naive Bayes categorization of the general topic of a web page and using those categories to offer a broad filtering on my search results. May be a lot of work since it relies on a degree of manual curation, but it seems to be doable once you have a decent model cooked up. I'd like, when you search for Plato, to be able to offer the alternative of refining the search to the philosopher or the computer system.
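
As an illustration of what that could look like, here is a tiny multinomial Naive Bayes classifier with add-one smoothing, written from scratch. The class name, the two topics, and the four hand-labelled training snippets are all made up; a usable model would need far more curated data:

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopics:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocabulary = set()

    def train(self, topic, text):
        words = text.lower().split()
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_topic, best_score = None, -math.inf
        for topic in sorted(self.doc_counts):
            counts = self.word_counts[topic]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[topic] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


model = NaiveBayesTopics()
model.train("philosophy", "plato was a greek philosopher and student of socrates")
model.train("philosophy", "the republic is a dialogue by plato about justice")
model.train("computing", "plato was a computer system for teaching built at illinois")
model.train("computing", "the plato terminal had a plasma display and touch screen")

print(model.classify("plato dialogue about socrates"))
print(model.classify("plato terminal touch screen"))
```

Each indexed page would get a topic label this way, and a search for an ambiguous term could then offer the labels that occur among its results as refinement options.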

visarga 4y ago

It's fast and apparently wide enough, or at least I found what I searched for. Summarisation is nice.

UI: The refining keywords are hard to read because they are not sorted. The gradient borders around search results are a bit too much, better go with something more understated.

It would be nice to tell a few things about the underlying technology. Are you using other search engines under the hood? How large is the index?

Ketan-fullstack (OP) 4y ago

Thank you for your feedback!

I take both your points about the UI. Tags are a bit overwhelming. I am working on solving that problem.

Underlying tech: I am using Clojure/Redis/Mongo on the backend and React/Redux on the front end. I am partially using my own index, but for the most part (especially for new searches) I am relying on the Bing API.

Cilvic 4y ago

I'm contemplating offering "indexing as a service". Could you share whether you at any point thought about buying a (partial) index? And if so, what parameters would you like to set: source pages, interval, etc.? And lastly, how would you want to consume the index?

Ketan-fullstack (OP) 4y ago

Hi, I have been working on this search engine for the last few months. After you search for something, you can select up to 4 tags to further refine your search.

Feedback please :)

jrussbowman 4y ago

Good luck with the project. Out of curiosity, are you crawling and indexing yourself or using a search API?

It's really fast, I like that. The number of tags to choose from is a bit overwhelming.

Hope you get good responses here, the couple times I've posted my search engine I haven't gotten a lot of feedback.

m-i-l 4y ago

@Ketan-fullstack & @jrussbowman Good luck with your projects. I sometimes wonder if us independent search engine developers should set up a discord or something to chat about what we're doing, because there are quite a few of us working on similar (but not necessarily competing) things.

Ketan-fullstack (OP) 4y ago

Thanks for the feedback!

I am partially indexing on my own but as of now mostly relying on Bing API.

Yes, the number of tags and their quality is an issue. The current keyword suggestion algorithm will get better as more and more people use it.

I would love to check out your search engine if you have a link handy.

alangibson 4y ago

Bing is the only search engine that still has a useful public pay-as-you-go API.

Though Startpage claims to be backed by Google, so it must be possible to get a contact with them.

gathersight 4y ago

Nice and fast. How are you extracting the data for the Summary tab? On the searches I tried, it pulled in a block of, seemingly, random text from mid-way down the destination page. It didn't appear to be summarizing anything.

tmsh 4y ago

Works really well for general topics like Next.js (helps capture what's out there to search for).

Ketan-fullstack (OP) 4y ago

Thanks! I am glad :)

TruthWillHurt 4y ago

Amazing. However - I did a test where I searched for "Python cast" and got variable type casting results.

I then wanted to refine the search to the cast of Monty Python, but had no relevant tags to do it with.

On Google, the 6th result on the first page was the Monty Python Wikipedia page.

boffinism 4y ago

I looked at this on mobile and on my first attempt had no idea what the point of this was. (Hint to others on mobile: scroll to the bottom of the search results page.)

sealeck 4y ago

https://search.marginalia.nu is another interesting search engine

senectus1 4y ago

Hot damn!

Please add the ability to scroll down the popped up suggestions with cursor keys!

Ketan-fullstack (OP) 4y ago

Thanks!

Keyboard input on the auto-suggest drop-down will be there in the next update; it's already a work in progress!

Again, thanks for your feedback!

senectus1 4y ago

I would like a search field in the filter panel that lets me search for keywords in the keyword list.

Yes, I know I can Ctrl+F, but if I didn't have to, that would be faster :-P

senectus1 4y ago

How do we add this as a custom engine in Chromium?

Do we still use the same query as the Bing one? {bing:baseURL}search?q=%s&{bing:cvid}{bing:msb}{google:assistedQueryStats}
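
As far as I know, a user-added search engine in Chromium only needs a URL with `%s` where the query goes; the `{bing:...}` and `{google:...}` placeholders belong to the built-in engine definitions. A small sketch of what the substitution amounts to, using a made-up template URL (the site's real results URL would have to be copied from the address bar after a search):

```python
from urllib.parse import quote_plus


def expand_search_template(template, query):
    """Replace the %s placeholder with the URL-encoded query."""
    return template.replace("%s", quote_plus(query))


# Hypothetical template; substitute the site's actual results URL.
template = "https://search.example/search?q=%s"
print(expand_search_template(template, "python cast"))
```

This is only an approximation of the browser's behaviour, meant to show which part of the URL is variable.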
