Launch HN: Hello (YC S22) – A search engine for developers

273 points · wayy · 3y ago · 199 comments

Hi HN, we’re Michael and Justin from Hello Cognition (https://beta.sayhello.so). We're building a better search engine for software developers. Hello saves you time by synthesizing clear explanations to technical questions along with code snippets from the web, showing them right on the search page.

We’ve found that most technical searches fall into a few categories: ad-hoc how-tos, understanding an API, recalling forgotten details, research, or troubleshooting. Google is too broad and shallow of a search tool to be good at this. Even after sifting through the deluge of spammy, irrelevant sites pumped full of SEO, you still have to manually find your answer through discussion boards or documentation. Their “featured snippet” approach works for simple factoid queries but quickly falls apart if a question requires reasoning about information across multiple webpages.

Our approach is narrow and deep: retrieve detailed information for topics relevant to developers. When you submit a query, we pull raw site data from Bing, rerank the results, and extract understanding and code snippets with our proprietary large language models. We use seq-to-seq transformer models to generate a final explanation from all of this input.
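The stages described above (retrieve, rerank, extract, generate) can be sketched roughly as follows. This is only a toy illustration under loud assumptions: the `Page` type, the term-overlap score, the "indented line means code" heuristic, and the `answer` function are all hypothetical stand-ins, not Hello's actual models or API, and the "explanation" is a placeholder for what a sequence-to-sequence model would generate.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A retrieved page (hypothetical shape; a real result would carry more)."""
    url: str
    text: str


def rerank(query: str, pages: list[Page]) -> list[Page]:
    """Order pages by how many distinct query terms they contain (toy score)."""
    terms = set(query.lower().split())

    def score(page: Page) -> int:
        return len(terms & set(page.text.lower().split()))

    return sorted(pages, key=score, reverse=True)


def extract_snippets(pages: list[Page]) -> list[str]:
    """Collect lines that look like code: here, lines indented by four spaces."""
    return [
        line.strip()
        for page in pages
        for line in page.text.splitlines()
        if line.startswith("    ")
    ]


def first_line(text: str) -> str:
    """Return the first line of the text, or an empty string if there is none."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def answer(query: str, pages: list[Page], top_k: int = 2) -> dict:
    """Run the stages in order; 'explanation' stands in for a model call."""
    ranked = rerank(query, pages)[:top_k]
    return {
        "explanation": " ".join(first_line(p.text) for p in ranked),
        "snippets": extract_snippets(ranked),
        "sources": [p.url for p in ranked],
    }
```

Keeping the source URLs next to the generated text is what would let a results page link each snippet back to its origin.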

For our honors theses at UT Austin, we researched prototypes of large generative language models that can answer complex questions by combining information from multiple sources. We found that GPT-3, GPT-Neo/J/X, and similar autoregressive language models that predict text from left to right are prone to “hallucinating” and generating text inconsistent with the “ground truth” document. Training a sequence-to-sequence language model (T5 derivative) on our custom dataset designed for factual generation yielded much better results with less hallucination.

After creating this prototype, we started actively developing Hello with the idea that searching should be just like talking to a smart friend. We want to build an engine that explains complex topics clearly and concisely, and lets users ask follow-up questions using the context of their previous searches.

For example, when asked “what type of semaphore can function as a mutex?”, Hello pulls in the raw text from all five search results linked on the search page to generate: “A binary semaphore can be used as a mutex. Mutexes and semaphores are two different types of synchronization mechanisms. A mutex is a lock that prevents two threads from accessing the same resource at the same time. A semaphore is used to signal that a resource has become available.” We're biased, of course, but we think that the ability to reason abstractly about information from multiple web pages is a cool thing in a search engine!
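To make the generated answer above concrete, here is a minimal sketch of a binary semaphore standing in for a mutex, using Python's standard `threading` module. The counter, `add_many`, and `run` are illustrative names only and are not part of Hello's output.

```python
import threading

# A semaphore initialised to 1 admits one holder at a time, so it can guard
# a critical section the way a mutex would.
binary_semaphore = threading.Semaphore(1)
counter = 0


def add_many(n: int) -> None:
    """Increment the shared counter n times under the semaphore."""
    global counter
    for _ in range(n):
        with binary_semaphore:  # acquire on entry, release on exit
            counter += 1


def run(workers: int = 4, n: int = 10_000) -> int:
    """Reset the counter, run the workers to completion, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

One difference worth keeping in mind: a semaphore has no owner, so any thread may release it, whereas many mutex implementations expect the locking thread to do the unlocking.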

We use BERT-based models to extract and rank code snippets if relevant to the query. Our search engine currently does well at answering applicable how-to questions such as “Sort a list of tuples by the second element”, “Set a response cookie in FastAPI”, “Get value of input in React”, “How to implement Dijkstra's algorithm.” Exclusively using our own models has also freed us from dependence on OpenAI.
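As an illustration of the kind of snippet the first of those how-to queries is after, here is a short Python version. It is a hand-written example, not something produced by Hello, and `sort_by_second` is a made-up name.

```python
from operator import itemgetter


def sort_by_second(pairs: list[tuple]) -> list[tuple]:
    """Return a new list of tuples ordered by each tuple's second element."""
    return sorted(pairs, key=itemgetter(1))
```

`sorted` is stable, so tuples with equal second elements keep their original relative order.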

Hello is and will always be free for individual devs. We haven’t rolled out any paid plans yet, but we’re planning to charge teams per user/month to use it on internal data scattered around in wikis, documentation, Slack, and emails.

We started Hello Cognition to scratch our own itch, but now we hope to improve the state of information retrieval for the greater developer community. If you'd like to be part of our product feedback and iteration process, we'd love to have you—please contact us at founders@sayhello.so.

We're looking forward to hearing your ideas, feedback, comments, and what would be helpful for you when navigating technical problems!

167 comments · 73 top-level

joshstrange3y ago· 19 in thread

First Impressions:

* I won't use a different search engine for programmers stuff vs everything else. So while this might be targeted toward software developers I can't see myself using it unless it can handle normal searches.

* UI/UX - I hate the progress bar, I'm not sure at all what it's telling me as there are results shown while it's still completing. The results are way too spaced out. On my 27" 2K screen I can only see 3 results, the search bar takes up way too much space and there is way too much padding on the results. Don't move the DOM on me, removing the progress bar is jarring as is "Was this answer helpful?" popping in, I'm here for results, not to train your ML.

* Trackers - Using the default installs of Privacy Badger and uBlock Origin meant no results ever loaded. I'm not sure what was being blocked that caused the issue but cookies from bing [0] and a request to cloudflareinsights.com [1] should not hamper showing results.

Search is a tool and one that I need to be quick, simple, and informationally dense. This checks almost none of those boxes. I'm even open to using a different search engine (I semi-recently switched from Google to Ecosia and it's been near-seamless), but I don't see any "pro" to using this engine and I see a ton of "cons".

[0] https://cs.joshstrange.com/V5uiyM

[1] https://cs.joshstrange.com/GkPuap

EDIT: I did a few more searches because I realized I wasn't getting the "info box"/ML results on my first few searches and I wanted to be fair. Sorry but that made me dislike this even more. I really, really hate content moving out from under me. My eyes start reading one of the 3 results that was shown then they got pushed down for another overly-padded box that tried to "answer" my question. The results were worse than "grab the selected answer from the first SO that matches this query". Maybe it would be better if that info was shown off to the side and didn't move the results when it loaded in but again, I didn't find it useful in the queries where it showed up.

EDIT2: I posted a follow up comment about what, specifically, I think should be changed: https://news.ycombinator.com/item?id=32005841

_tom_3y ago

I definitely see the benefit of a separate code search engine and would use it. Google has, as you have noticed, gotten way less useful. A big part of that is trying to target the wider market to the exclusion of less popular searches, like developers'. A dedicated engine would help with that.

I'm more than willing to open another tab to not have a search result page full of YouTube videos.

joshstrange3y ago

To each their own. I have 1 flow for search which I don't plan on changing and I personally have no issues with google search (for code, technical, or otherwise). I still consider it to be one of the best and I don't agree with the "google search is getting worse" crowd. Maybe at some point I'll use some kind of "search hydra" that hits multiple engines and either combines the results or shows me the results based on the type of search it thinks I'm doing but I don't imagine I'll ever want to consciously switch engines based on task.

Also I've never seen a "search result page full of YouTube videos" no matter what the query was. Sometimes there will be 3 or a carousel of them near the top but I can easily ignore that (assuming they aren't relevant or useful to me). I can't remember the last time I got a video for a code/technical query on google, I just did some testing and only a few queries showed videos, always 3, always partway down the page so that the search results at the top answered what I needed before I even got to the videos.

visarga3y ago

I think you missed the forest for the trees... it is a Q&A system not just a search engine. You can talk to it, you can refine your queries.

To the SayHello team: kudos for being faster than Google to release a Q&A+search system. I was expecting something like this for a couple of years wondering why Google was sleeping on its mountain of papers and not doing it.

Search was the first step in finding information, Q&A is the next logical step. Language models+search such as DeepMind RETRO have shown this approach to be very efficient: 25x reduction in model size for the same perplexity and verifiable correct answers with source document references.

In the future I expect search to become more like an assistant with context and language abilities. Retrieving a bunch of web pages is so 2000's. Q&A is especially relevant for mobile use with speech interface (hello Siri and Google Assistant).

joshstrange3y ago

> I think you missed the forest for the trees... it is a Q&A system not just a search engine. You can talk to it, you can refine your queries.

From the creators:

> We're building a better search engine for software developers.

Also no you can't "talk to it", I'm not sure where you got that idea. It has a "Ask a follow up" but that performs a new search with none of the context of your previous search (also this UI of sliding a modal up from the bottom and layering the results is terrible).

> Search was the first step in finding information, Q&A is the next logical step.

And we are clearly not there. Not only does this not allow you to ask follow-ups to refine but it doesn't give good results in my testing.

8organicbits3y ago

> * I won't use a different search engine for programmers stuff vs everything else.

I'd use this as a ddg bang[1]. I don't use them often, as ddg is a great search engine, but some search engines handle certain queries better and ddg lets you route queries efficiently.

[1] https://duckduckgo.com/bang

rushingcreek3y ago

Thanks for the feedback. We agree that speed is incredibly important, and we're working on making searches much faster. We'll be iterating on the UI/UX as well, as we think that we can definitely do better and be more efficient with space.

I'd love to hear more about what you mean by "informationally dense" -- some search engines simply show more information on the results page, but that doesn't make results inherently better in my opinion because it frequently simply increases noise relative to signal.

Our current approach is to provide only the most relevant answers/code snippets and nothing else (high signal with low noise) as opposed to cramming in every Stack Overflow answer we can find. We realize we still have a long way to go to make it magical for every search, but we're working on it.

joshstrange3y ago

Consistent, non-moving (as things load) UI is super important. Also something that bugged me but I didn't realize why until now: don't use the full width for the description under links (or the titles for that matter). We've known for some time now that if you make text too wide it becomes harder to read.

My suggestions:

* Kill the padding/margins, it's pretty for demos or certain cases but I want to be able to see more information, heavy padding/margins have no place in search results.

* Shrink the search bar to the upper left like every other search engine. Keeping it centered with tons of padding wastes space. Take your logo and put it to the left of the search field, take the buttons and put them to the right. On my screen you are burning a little over 500px of vertical space with things that don't matter, the results matter.

* Shrink your "regular" search results to be half the width of the screen (on desktop, something like a max of 700, Google uses ~640 as does Ecosia). Use the space to the right to show your AI/ML results. This means no content will jump around and people can more easily read the results, full-width is very hard to read. Also shorten the "description" under the links. 2 lines max (at 640px width).

* Either don't ask "Was this answer helpful?" (use hints like: Did they click the link? Did they leave the site after seeing the results?) OR don't make it move the content (hold the space empty if you must animate it in, just don't let the content shift multiple times after doing a search).

Here is your default result for "this is a test" search query: https://cs.joshstrange.com/oKbz6G

Here it is with a bunch of padding/margins removed: https://cs.joshstrange.com/VEVXGh

Yes, I removed the logo/buttons because that was faster than moving them to the left/right of the search but the end result is the same. In my tightened up version you can fit 8+ result links where the initial version could only show 3, also all the results are easier to read.

detaro3y ago

> some search engines simply show more information on the results page, but that doesn't make results inherently better in my opinion because it frequently simply increases noise relative to signal.

But if the "automatic" answer fails and I need to skim results, as I'll often need to do, you put 3 result previews in a space DDG and Google fit 5. They also apply reasonable defaults for the max line length - a basic typography thing that improves quick readability a lot.

wayyOP3y ago

To comment on the dynamic DOM - we're displaying up to three answer types (text, code, links) for each question. We're loading them independently to get information to the user as fast as possible. The alternative (in this state) is to have all of them wait until the slowest component finishes. We're still in the early stages of development, so either way it's not going to be perfect. I can see how this can be a poor experience for some - we're working on it.

joshstrange3y ago

Side-by-side is the best way to handle this. Show results on the left and load in the ML stuff on the right after it loads. This prevents content-jump and makes the results less wide (you want to aim for <700px to be more readable).

visarga3y ago

Maybe not everyone understands that the Q&A responses have to go through a large language model before they are displayed. This takes time, showing something while the LM is churning away is a good idea.

unsafecast3y ago

A placeholder empty box that gets populated when the content arrives would be an improvement.
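The placeholder idea can be combined with the independent loading described earlier in this thread. The sketch below uses Python's `asyncio` purely for illustration; the `load` coroutine, its delays, and the slot names are hypothetical, and a real page would do this in the browser. It reserves all three slots up front and fills each one when its loader finishes, so the set and order of slots never changes.

```python
import asyncio


async def load(kind: str, delay: float) -> tuple[str, str]:
    """Hypothetical loader with a made-up delay; a real one would call a backend."""
    await asyncio.sleep(delay)
    return kind, f"{kind} ready"


async def render_page() -> tuple[list[dict[str, str]], dict[str, str]]:
    # Reserve every slot up front so nothing moves when content arrives.
    slots = {"links": "loading...", "code": "loading...", "text": "loading..."}
    snapshots = [dict(slots)]  # what the user would see at each step
    tasks = [
        asyncio.create_task(load("links", 0.01)),
        asyncio.create_task(load("code", 0.03)),
        asyncio.create_task(load("text", 0.05)),
    ]
    # Fill each slot as soon as its loader finishes; slot order never changes.
    for finished in asyncio.as_completed(tasks):
        kind, content = await finished
        slots[kind] = content
        snapshots.append(dict(slots))
    return snapshots, slots
```

The fast components still appear early, which keeps the benefit of independent loading without the layout shift.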

hbn3y ago

Also the scraped snippet appears at the top of the results a couple seconds after the results load and it causes all the results below it to suddenly jerk lower on the page

discreteevent3y ago

Lowest common denominator and "one click" is killing the internet for me. A lot of times I am the lowest common denominator and so that's fine. But when I am a specialist I want something that lets me be specific and gives me specific results.

visarga3y ago

How much more specific do you need than having the ability to refine your question iteratively? I think regular search engines only allow for a few special keywords. Here you can use natural language to refine.

NegativeLatency3y ago

UI: Feels overpadded to me, I'd like to be able to see more stuff without scrolling so far

jerrysievert3y ago

a search for v8 gave me:

* juice

* v8 engine

* juice

* v8 engine

* juice

so definitely some non-programming searches showing up, unfortunately none of the documentation sources for v8.

joshstrange3y ago

Yep, I saw non-programming stuff but my worry is, if they are branding themselves as "better search engine for software developers", that programming-stuff will be weighted higher than non-programming stuff even if the search term has nothing to do with programming (or a tenuous link). Though your search examples seem to prove the exact opposite.

All that said the UI/UX is too frustrating to use (as-is) even if they don't promote programming content over non.

richardsocher3y ago

We learned many of these lessons at https://you.com/code:

* we also needed to build a strong "everything else" search engine and then

* have great results for coding with specific search apps like StackOverflow, AI code complete, ++

* be very fast (we messed that up when we first launched)

* have great scores on Privacy Badger, be compatible with uBlock, etc.

Last week we started opening up our platform to collaborate on results with outside developers and have gotten a lot of interest: https://about.you.com/developers/

Maybe we can collaborate also with you guys at sayhello. Ping me at hey@you.com if you want to compare notes.

gbro3n3y ago· 4 in thread

Thought I'd try this on a problem I've been researching today (which I resolved) where my service worker for offline PWA usage was working for everything except audio files.

I searched the following in sayhello.so.

"Service worker fails on request for audio file"

I got back a couple of results related to general service worker use but none that get close to discussing the core problem that led to the solution.

The same query in Google returns several results that together pointed me to the solution (it was around range headers in requests for media data types).

This is just one example though. I think the problem you are trying to fix is worth the effort. I just wonder if this is where humans are still stronger than computers - gathering unstructured data to use in problem solving.
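The comment above does not say what the actual fix was, so the following is only a generic sketch of what honoring a range request involves: the client sends a `Range` header and expects a `206` response carrying just the requested bytes. It is written in Python for illustration rather than as service worker code, and `slice_for_range` is a hypothetical helper that handles a single `bytes=start-end` or `bytes=start-` range only.

```python
import re


def slice_for_range(body: bytes, range_header: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, payload) for a single-range request.

    Handles only the 'bytes=start-end' and 'bytes=start-' forms; suffix
    ranges, multiple ranges, and malformed headers fall back to the full body.
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        return 200, {"Content-Length": str(len(body))}, body
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else len(body) - 1
    end = min(end, len(body) - 1)  # clamp to the last available byte
    if start > end:
        # The requested range lies entirely outside the body.
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    chunk = body[start : end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk
```

A cache that always answers with the full body and a `200` ignores the header, which is one plausible way media playback can break while other file types work.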

wayyOP3y ago

The description of the steps you took is super helpful feedback - thanks! Hello performs best on "how-to" questions at the moment. We're still working to improve troubleshooting type queries.

CodeSgt3y ago

That'll be a difficult adaptation for potential users to make. I think most of us have been conditioned to phrase our queries a certain way to achieve the best results from Google.

Then again maybe that's just me.

harrisonjackson3y ago

Is the assumption you are making that most developers would go to search first? rather than when they hit a blocker or error?

gbro3n3y ago

No problem. Good luck with the project.

sailorganymede3y ago· 4 in thread

Personally I’ve never really had an issue with Google - I think my mental model with how it works is to the point it makes sense.

It would be amazing if this could be used for internal documentation however. Like we have so much documentation on our wiki which is just disorganised.

8n4vidtmkvmk3y ago

Stack overflow offers a version for companies. I've never used it, but it sounds like what you might want

asiachick3y ago

I can just imagine it will spawn the same "closed as off topic" and other similar responses for most questions :P

Also, stack overflow's search has always sucked. The way to find stuff on stack overflow has mostly been to use google.

rushingcreek3y ago

Indexing internal docs is one of our ideas for how to monetize. And it'd be great if you could tell me more about your mental model while using Google -- are there inconveniences that you brush aside?

teekaykay3y ago

Bing has an offering for searching through internal documentation as part of Office. It works well with SharePoint and other traditional office products.

TekMol3y ago· 4 in thread

I see whole solutions copied from other websites displayed on your site.

Is that legal?

Isn't there copyright on those?

throwaway6753093y ago

Ding ding ding. This is the exact issue that a vocal minority are whingeing over github copilot for. It's automatically pasting results from websites without embedding the necessary attribution - so if you copy entire functions from this search engine (which may be coming from stack overflow for which attribution is required), then you're guilty of the same thing.

So I only see one of two outcomes:

1. Courts rule copilot is fair use in which case your search engine becomes largely superfluous

2. Courts rule copilot is infringement in which case all of these types of applications cannot be used commercially

danuker3y ago

There are two separate issues:

1. Copilot itself infringing licenses (MS copying and sharing copyrighted code)

2. Developer infringing licenses (Allowing code from MS into own codebase).

Case 2 is avoided by Hello, because it provides a link to the original, allowing the developer to find and respect the license. Therefore Hello is net superior (with respect to people using the service at least).

lancesells3y ago

I would hope it's not legal.

> Hello pulls in the raw text from all five search results linked on the search page to generate...

Not to be negative but I think I'll stick to the sites and people that made the results and not a middleman that intends to charge for other people's work.

GrinningFool3y ago

I think that framing is missing some nuance. Seems more like they would be charging for the process that goes into sifting through those results and pulling out other people's work on the user's behalf.

mudlarker3y ago· 4 in thread

Sorry to be blunt, this site is one of the worst UI/UX experiences I've seen. For a website targeted mainly towards programmers, the lack of dark mode is a crime. And the first half of the page is just logo and the search bar. Poorly named and designed logo and branding. 'sayhello' is a stupid name for a search engine catered towards programmers. Why make the product at all when clearly you haven't done enough thinking on how users would approach and interact with your product?

mmazzarolo3y ago

I’m not a fan of many UX/UI choices here, but complaining about the “lack of dark mode” in a beta product feels a bit too much in my opinion.

boberoni3y ago

This feedback is a bit harsh and not at all constructive towards their actual value proposition: a technical search engine for devs.

Dark mode is not a core value proposition.

(my guess is that) The logo and search bar take up a lot of space because they are mimicking the design of the Google.com landing page.

It seems like the bulk of their work has been on the search itself, so I would forgive them on logo and branding. It’s an early product so logo and branding can change.

For now, they just need constructive feedback on workflow and usability.

heystoney3y ago

dark mode works just fine for me.

hans_castorp3y ago

> the lack of dark mode is a crime

Well, everybody is different. I just hate dark mode.

When I come to a website that defaults to dark mode and I can't see a way to change it, I leave immediately.

mrwnmonm3y ago· 3 in thread

I love the idea <3

It may be a weird suggestion, but if the query to general topics returns something like this https://unzip.dev/archive (check how compact it is and delivers almost all you need to know about the subject to get you going), it would be perfect.

rushingcreek3y ago

We'd love to talk to you some more and get your feedback. Our email is founders@sayhello.so :)

wayyOP3y ago

Thanks for the feedback - I agree that explanations aren't the most appropriate for every kind of search. We're definitely considering different forms of search inputs and outputs (text, code, lists, etc.)

mrwnmonm3y ago

And what about books, just links to the most important books on the subject, without getting too philosophical about how to determine the most important ones.

treis3y ago· 3 in thread

https://beta.sayhello.so/search?q=how+to+base64+encode+a+str...

Query: how to base64 encode a string in ruby

Response: I'm not sure what you mean by "base64 encode a string in ruby" - that's a bit of a misnomer. Base64 encoding is a way of storing data in a form that can be decoded by a human. It's not a secure way to store data, but it's useful if you want to send a message to someone who doesn't understand the language you're using.

The right answer is in the third link provided but it's not exactly correct.

Google gives back the Ruby Module Base64 docs as the first hit.

richardsocher3y ago

I'm not a Ruby expert but this looks right to me also: https://you.com/search?q=how+to+base64+encode+a+string+in+ru...

rushingcreek3y ago

Our index is based on Bing as of right now -- if they give us low-quality results, our generated answers will be low quality as well. We're definitely aware of this and are working on developing our own index to augment Bing's in cases like this.

masukomi3y ago

while i assume there are good business reasons you're basing your stuff on Bing, it's notable that as a general rule developers don't use bing. In my experience the google results are radically better.

ForrestN3y ago· 3 in thread

FYI: I clicked on get lucky, and went here: https://beta.sayhello.so/search?q=Check+if+string+is+a+palin... which for me in Safari is just an empty white page.

rushingcreek3y ago

Do you have Javascript enabled?

mdaniel3y ago

While I'm not directly affected by this, a blank white page is always a symptom of careless error handling, even in the case where the user has JS turned off. The <noscript> tag exists expressly to present information about your site's need to have JS enabled

ForrestN3y ago

Yes

ezekiel113y ago· 3 in thread

still not as good as stackoverflow

wayyOP3y ago

We’re still very early - of course it can't be as good. Could you tell me what you were trying to do and how it didn't work for you?

jamesmcintyre3y ago

Not sure what the original commenter was looking for but I can give my thoughts:

- stackoverflow's UI actually serves well to provide a sort of "ambient" information that rapidly indicates not just the best answers, but the best most-recent answers. Oftentimes, and especially in rapidly-evolving dev languages/frameworks, what was the best answer a few months ago may no longer be the best answer and the ability to rapidly scan the comments that would indicate this is valuable.

- in addition those stackoverflow comments and links within them can point to additional info that can save the dev time (potentially pointing to the dev misidentifying the problem: "don't do this, this is the real issue <link>").

I think with the traditional google->stackoverflow or google->[some documentation site, forum, etc] user flow you actually get layers of ambient cues as to relevance, recency and quality that we've grown accustomed to. Even if your product ultimately serves better answers I'd worry that lacking these cues would make a user like me feel as though I'm blindly trusting an answer that seems to have come from the ether (sort of like github copilot).

As low-hanging fruit maybe adding level-meters beside each result that indicates these dimensions could help (like npmjs.com does with npm pkg results in their ui).

I love the product idea and it looks like a strong start! Good luck!

ezekiel113y ago

it just won't be as good as the refinement in searches i can do with appending stackoverflow at the end of a google query and github copilot already does what you are trying to do

izolate3y ago· 2 in thread

Congrats on the launch! Looks promising, so I'll try it out for a couple of days.

One feature request at first glance: please default to the system font stack for code snippets. I see you're currently using Consolas, a Microsoft typeface, which is not pleasant to see as a mac user.

You can use this to default to the system font on every platform:

    font-family: "SF Mono", "Monaco", "Inconsolata", "Fira Mono", "Droid Sans Mono", "Source Code Pro", monospace;

FractalHQ3y ago

Why do you consider it unpleasant? I’m a mac user and I really like Consolas. I like to use it in VSCode or when building websites that display code blocks.

rushingcreek3y ago

Thanks for the feedback, we'll take a look at that :)

skilled3y ago· 2 in thread

Not to be too critical but the results I got so far have been subpar. Seeing a lot of hyperbole/clickbait articles.

Let's say I'm searching for front-end frameworks. Each article has the word "best" in the title, yet doesn't link to resources like State of JS, Stack Overflow Survey or other similar sites. So, in this context "best" is subjective. I can't be bothered with subjective results when I'm trying to find out what is actually considered "best" or in this case popular.

rushingcreek3y ago

Those articles are coming from Bing as of right now. Our offering is based on analyzing those articles and summarizing them/picking out the most relevant parts. We definitely plan to augment (and eventually replace Bing) with our own index.

closedloop1293y ago

Have you considered using blacklists? You could cooperate with Brave and their Goggles: https://news.ycombinator.com/item?id=31837986

lawl3y ago· 2 in thread

Maybe I'm misunderstanding the intended scope of this engine, or I just ran into a bad result page, but:

https://beta.sayhello.so/search?q=Java+aot+compile

Does not seem to mention graal anywhere. (It's just a random test query that popped into my mind)

Asking a full question for a code snippet seems to work: https://beta.sayhello.so/search?q=How+do+I+sort+a+map+in+Jav...

How do you deal with licensing for these snippets though. Is that up to the user to verify?

rushingcreek3y ago

Because we're currently built on top of Bing's index, we're somewhat dependent on the raw pages they provide. If those pages don't mention graal, neither will our AI. Building out our own index is something we're working on.

It is currently up to the user to verify licensing for the snippets, but we try to make it easy (using the See Reference button) to go to the original source.

danuker3y ago

Thank you! The "See Reference" makes it much easier to comply with licenses, than GitHub Copilot.

abalaji3y ago· 2 in thread

Interesting--seems you have to retrain your "google-fu"

"meta programming python" does not give as good results as

https://beta.sayhello.so/search?q=meta+programming+python

"how to implement a meta class in python"

https://beta.sayhello.so/search?q=how+to+implement+a+meta+cl...

rushingcreek3y ago

Yep, the AI definitely prefers fully formed sentences as it is right now. We know it's important not to force users to change how they word their queries, so making it less sensitive for phrasing is a priority for us.

Invictus03y ago

Searching "meta class python" gives better results, which seems reasonable to me.

laumars3y ago· 2 in thread

I'm seeing the same page as result 1, 2 and 3. Interestingly only 1 out of those 3 results were scraped from that page. Even more curiously only 1 out of those 3 results were even valid code.

https://beta.sayhello.so/search?q=hello+world+in+brainfuck

Nice idea for the project though. Good luck with it

rushingcreek3y ago

Our code extraction/ranking model hasn't been trained on that language yet, so it's definitely an out-of-domain example. We'll keep working on expanding our repertoire!

laumars3y ago

Ahh that’s fair enough. I think it’s fair to say “brainfuck” is outside most people's domain. I was just curious how your search engine performed on less common search queries (the kind that are trying to debug a rarely hit problem with an otherwise popular framework or language and thus you often spend hours digging through irrelevant answers before you find that one blog post that solves it) but couldn’t think of a more realistic example off the top of my head.

hubraumhugo3y ago· 2 in thread

Glad to see better search tooling for programmers since it's an essential task we do every day. How do you compare yourself to you.com's specialized search engine for developers? https://you.com/code

rushingcreek3y ago

You.com is too similar to Google imo; we go significantly further than they do in terms of synthesizing explanations. Same goes for snippets; on You.com, you usually need to click on a button (e.g. "Open Side Panel") to get a code snippet, which adds friction. Furthermore, they seem to be simply showing the full Stack Overflow page; our approach is to find and rank the most relevant code snippet while offering a "See Reference" button to make it easy to go to the original page.

chiken3y ago

you.com is a lot more informationally dense, feels quicker to find the right answer, and I rarely have to open a new tab because of the (open side panel) button. Beyond programming, it doesn't seem like I can use hello.so on a regular basis for normal searches compared to Google and you.com.

gitgud3y ago· 2 in thread

> We found that GPT-3, GPT-Neo/J/X, and similar autoregressive language models that predict text from left to right are prone to “hallucinating” and generating text inconsistent with the “ground truth” document.

The term hallucinating is brilliant for how these AI systems seem to generate output.

Your product is very interesting, and seems to work nicely on easy queries like "how do I sort an array of objects in JavaScript". But it was quite confusing for complex queries.

The UI doesn't work too well on mobile, but it's a beta and software is written on the desktop.

I also think making this a specific search engine for companies' messy internal data would be a very useful tool.

wayyOP3y ago

Thanks for trying it out! We've gotten some solid feedback about the UI and will be working to improve it. Could you tell us a bit more about what you were trying to do and how it was confusing for complex queries?

rushingcreek3y ago

Hallucination is, funny enough, the technical word for this phenomenon from NLP research :)

cpcat3y ago· 2 in thread

It says start typing to search. So I started typing and it didn't search. I really expected it to be some sort of typeahead search without requiring focus on the search field :)

rushingcreek3y ago

Query autocomplete is on the roadmap :)

allanrbo3y ago

The odd thing to me was not missing autocomplete, it was that it says "start typing to search", but when you type, nothing happens. This is of course because the search field is not focused when you load the page, until you click it. It should maybe be "type here to search".

lysecret3y ago· 2 in thread

Hey so, I have been working on a specialized search engine (at least you can think of it in this way). And for me a lot of the gains came because we could structure and restrain the search space in a meaningful way, such that we could build better distance metrics than a pure text search could do.

I wonder what you think about that. Maybe one could submit a code snippet, or mark something as an error, or ask for a refactor of some code. But then again, this gets close to what copilot is doing.

wayyOP3y ago

We are absolutely experimenting with non-text inputs (and outputs as you can see). A big problem we see in mainstream search engines is that syntax is not parsed well. Understanding code as context for the query could be huge for developer search.

lysecret3y ago

Yea true, but it's a fine line to something like copilot, since it can also be thought of as a search engine; however, it has your whole code (and maybe your past behavior too) as its context, which is hard to beat.

Klonoar3y ago· 2 in thread

Doesn't appear to work in Safari at all (at least, for me here - some JS bundle error).

rushingcreek3y ago

What version are you using?

alextheparrot3y ago

Same issue, Version 15.1 (17612.2.9.1.20)

    TypeError: N.at is not a function. (In 'N.at(-1)', 'N.at' is undefined)

Bolkan3y ago· 2 in thread

I searched the walrus operator and all results were unrelated

rushingcreek3y ago

What was your query? I just searched (https://beta.sayhello.so/search?q=walrus+operator) and got "The walrus operator is a new syntax for assigning variables in the middle of expressions." with one of the code examples being "(walrus := True)"
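For readers unfamiliar with the feature, here is a minimal Python 3.8+ sketch of the walrus operator (`:=`, PEP 572). The sample string and variable names are made up for illustration and are not taken from Hello's results:

```python
import re

line = "error: code=42"  # hypothetical sample input

# The match object is bound to `match` and tested in the same expression.
if (match := re.search(r"code=(\d+)", line)) is not None:
    code = int(match.group(1))
else:
    code = None

# In a comprehension, := avoids computing len(word) twice.
lengths = [n for word in ["a", "walrus", "op"] if (n := len(word)) > 1]
```

Without `:=`, the `re.search` call would need its own assignment line before the `if`.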

Bolkan3y ago

mlejva3y ago· 1 in thread

Hey Michael and Justin, congrats on the launch!

My co-founder and I were building the same product as you are some time ago [1]. We managed to scale it to around 5k WAU before we decided to pivot for various reasons.

If you think there might be any useful information and experience we could share with you, please shoot me an email - vasek@usedevbook.com. I'd love to help in any way I can to help you guys succeed.

[1] https://www.producthunt.com/products/devbook

moneywoes3y ago

Do you mind sharing why you pivoted?

danenania3y ago· 1 in thread

Congrats on the launch! I love this idea. I've thought for a long time that something like it should exist. Google results are often lacking in this realm.

I've played around just a bit and clicked some of the preset examples and like what I'm seeing so far. I bookmarked it and will try it out more as I code over the next few days.

Main initial feedback: I'd really like to see version/last-updated-at info accompanying all results. One of the biggest problems with Google for code stuff is finding outdated examples and docs. Even better would be a dropdown that lets me see results depending on the version of the language/framework/tools I'm using.

wayyOP3y ago

Thanks for trying it out and good point - we'll look into adding version info

ianbutler3y ago· 1 in thread

Hey so I built a search engine doing largely the same thing (and also interviewed with YC during the W20 time) and ultimately we pivoted away due to lack of interest from the developer teams we were pitching, often the startups we were pitching didn't have enough accumulated internal knowledge for the paid plan to be useful. For the ones who did (at the like 200+ person 5+ yrs in business mark) we still weren't seeing the problem being painful enough where companies wanted to pay to solve it.

How do you see navigating this space when this can be considered a nice to have versus a strict need?

wayyOP3y ago

Right now we're primarily focused on building a search tool that developers love. Would love to chat more about your experience - shoot us an email at founders@sayhello.so

graypegg3y ago· 1 in thread

This seems oriented towards answering a question like “how do I…” where you expect an explanation, but I’d say most of the web searching I do is pretty specific.

Maybe half the time I know what I want (eg: the order of values in the animation CSS property), and from who (eg: MDN), so I just go to the relevant docs page via google with something like “MDN animation css”.

The other 50% of the time, I’m searching an exact error string, probably in quotes, on google. For that I also don’t really want a knowledge graph answer, I’d much rather see a GitHub issue or stack overflow post and I’ll derive the context I need.

wayyOP3y ago

We empirically find that most developer searches fall into a few categories consistent with this paper [0]. The author explains these categories nicely in her blog post [1]. It sounds like you're describing 4/6 of these categories: ad-hoc how-tos, understanding an API / recalling forgotten details, and troubleshooting.

We currently do well at answering "ad hoc how-to" questions and are working to improve our answers in the other categories. This could either look like augmenting our existing natural language answer, or building a separate view specializing in official documentation or errors.

[0] https://elisehe.in/i/googledrivendevelopment.pdf [1] https://bootcamp.uxdesign.cc/the-hidden-insights-in-develope...

bearjaws3y ago· 1 in thread

Really curious how this worked, the query 'FHIR appointment spec' produces the following. It actually did get the right result as the first result.

"fhir appointment spec

I'm not sure what you're asking about, but I'll try to answer it as best I can. __ Appointment is a FHIR data type. It's a way to describe a time slot for a patient to be seen by a healthcare provider. Appointments can be booked, cancelled, rescheduled, or canceled and rebooked. It can also be used to describe the location of the appointment. "

Pretty impressive summary given that it doesn't exist in any one specific page.

rushingcreek3y ago

Thanks :) We've gone all-in on using our AI to answer questions based on information from multiple sources.

Minor49er3y ago· 1 in thread

Pretty decent overall. Though some code that comes back is a little dense and hard to read. This appears to be caused by how the scraper/parser handles whitespace. For example, the query "make a curl request in php" returns an example from the comments section in the PHP docs where the entirety of the comment has been returned as code, but the <br> tags have not been converted to newlines.

It would be a good idea to preserve whitespace, or arguably better, integrate optional syntax formatting

Overall, this search engine looks promising
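The whitespace issue described above could be handled roughly like this. It is only a sketch of one possible approach, not how Hello's scraper works; the function name `html_comment_to_code` and the regexes are assumptions, and a real scraper would more likely use a proper HTML parser:

```python
import html
import re

# Matches <br>, <br/> and <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Matches any remaining HTML tag.
_ANY_TAG = re.compile(r"<[^>]+>")


def html_comment_to_code(fragment: str) -> str:
    """Convert an HTML fragment to plain text, keeping its line breaks."""
    text = _BR_TAG.sub("\n", fragment)  # turn <br> tags into newlines first
    text = _ANY_TAG.sub("", text)       # then drop all other markup
    return html.unescape(text)          # finally decode entities such as &quot;


sample = "$ch = curl_init();<br />curl_exec($ch);<br>echo &quot;done&quot;;"
converted = html_comment_to_code(sample)
```

Entities are decoded last so that an escaped `&lt;` in the code is not mistaken for the start of a tag.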

rushingcreek3y ago

Thanks :) we'll take a look at better syntax formatting for the code snippets

servercobra3y ago· 1 in thread

This suffers from one of the things I've been hating about Google lately: it doesn't use exactly what I typed in. Case in point: "react-native-navigation" is an entirely different package than "react-navigation". I query about RNN and get results about RN. I get this is due to Bing, but it could be a fundamental flaw with the approach (for me at least)
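One way to enforce "exactly what I typed" is to post-filter results for the literal term. The sketch below is hypothetical (the helper names and sample titles are invented, and it is not a description of Hello's or Bing's pipeline); it treats hyphens as part of a word so that a package name only matches as a whole:

```python
import re


def contains_exact_term(text: str, term: str) -> bool:
    """True if `term` appears in `text` and is not part of a longer hyphenated word."""
    pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def filter_exact(results: list[str], term: str) -> list[str]:
    """Keep only the results that mention the exact term."""
    return [r for r in results if contains_exact_term(r, term)]


titles = [
    "react-native-navigation: getting started",
    "react-navigation v6 docs",
    "Using React-Native-Navigation with Redux",
]
exact = filter_exact(titles, "react-native-navigation")
```

A filter like this trades recall for precision, so it would probably be applied only when the query looks like an identifier or is quoted.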

The animations and page jumpiness are a bit off-putting and slow, but it is a beta!

rushingcreek3y ago

Yep, we feel that frustration -- it's our intention to use exactly what you typed in, but here Bing is messing up. Using results from our own index should make this less of an issue in the future.

llaolleh3y ago· 1 in thread

Can you give an example of using the context of a previous query?

I applaud you for trying to make a new search engine - it's not something sane people would try to do because of a certain behemoth eating everyone's lunch. It's going to take extraordinary insight and out of the box thinking to get something really good.

wayyOP3y ago

Our initial prototype focused on conversational search, with the ability to ask follow-up questions.

Here's a rather trivial example: Q: "Who founded Y Combinator?" A: "Paul Graham founded Y Combinator with Jessica Livingston and Trevor Blackwell."

If you scroll to the bottom of the answer page to ask a follow-up question: Q: "How old is he?" A: "Paul Graham is 57 years old. He founded Y Combinator in 2005."

applgo4433y ago· 1 in thread

I work on building language models at FAANG for my day-to-day job and I'm very curious about this - how does finetuning on T5 remove hallucination? Can you elaborate on this?

This is an amazing product, btw. Let me know if you're looking for people to hire :)

rushingcreek3y ago

We don't use T5 exactly, we use a derivative that has a similar (but not identical) architecture and is also pre-trained differently. That model, combined with our factual generation dataset and clever prompt engineering, seems to be the secret sauce for reducing hallucination.

And thank you :) it's comments like this that really fire us up

mikkergp3y ago· 1 in thread

One of your examples at the bottom of the page, is the "IsPalindrome". I guess the assumption is I would click on the "see Reference" as it doesn't provide any context for the code. This context of a real person explaining it is one of the benefits of sites like Stack Overflow, so I would think about this element of UX.

Also, I noticed in your palindrome reference example, it didn't choose the accepted answer from Stack Overflow. How did it choose the example? Also, the 2nd 2 reference panes, I can't tell what value they are adding. They seem like a list of random outputs of the ispalindrome script.

rushingcreek3y ago

Our code ranking algorithm uses an NLP model (trained on a code dataset) to pick the most relevant snippets. You're right that the accepted answer on Stack Overflow is a good heuristic, and it's something we'll add to our ranking algorithm in the near future.

Showing an answer written by a human as a part of the code snippet is also a good idea.
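As a rough illustration of folding the accepted-answer heuristic into a relevance ranking: the `Snippet` type, the score range and the bonus weight below are all assumptions made for this sketch, not details of our actual model.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    code: str
    relevance: float   # hypothetical model score in [0, 1]
    is_accepted: bool  # True if the snippet comes from the accepted answer


ACCEPTED_BONUS = 0.15  # assumed weight; it would need tuning on real data


def rank_snippets(snippets: list[Snippet]) -> list[Snippet]:
    """Sort by model relevance plus a fixed bonus for accepted answers."""
    def score(s: Snippet) -> float:
        return s.relevance + (ACCEPTED_BONUS if s.is_accepted else 0.0)

    return sorted(snippets, key=score, reverse=True)


candidates = [
    Snippet("a", 0.70, False),
    Snippet("b", 0.60, True),
    Snippet("c", 0.90, False),
]
ranked_codes = [s.code for s in rank_snippets(candidates)]
```

With an additive bonus, an accepted answer can overtake a slightly more relevant snippet but not a clearly better one.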

selectnull3y ago· 1 in thread

Using latest Chrome gives "Uncaught (in promise) TypeError: Failed to fetch" console error.

The same is with any input, even with predefined ones; the progress bar gets to the end (slowly) and nothing happens.

Firefox works.

wayyOP3y ago

Thanks for letting us know. We're under load too so I'm not surprised there are some hiccups. I just tested with Chrome Version 103.0.5060.66 and the fetch works - will have to investigate the latest Chrome version later.

jspdown3y ago· 1 in thread

I searched for "Position based dynamics" and I was pleasantly surprised by the results. The output is full of blog posts that I never encountered while searching on Google. It might be related to the fact that it relies on Bing which, I have to admit, I never used.

When I compared the output between Hello and Bing, the filtering worked pretty well. It removed most of the StackOverflow results, which are 99% of the time not insightful.

Great job on this search engine and congrats on the release.

forgotpwd163y ago

>The output is full of blog posts that I never encountered while searching on Google

That's strange. I searched the same term (as well as let archive.is do it[1][2]) and results are similar. A workshop paper available on GitHub, a Springer Link reference entry, and a GitHub project. Indeed Hello has a single blog post whereas Google has lecture notes but there isn't a difference in content since the blog post is written in same style (essentially being a tl;dr version of the paper that introduces the methods).

[1]: https://archive.ph/6SkWZ (Hello) [2]: https://archive.ph/90jY4 (Google)

anshumankmr3y ago· 1 in thread

Possible edge cases that aren't handled yet. https://imgur.com/a/mieqZIU

I know the first search term was unrelated, but I tried it for some time as a regular search engine, with a bunch of random keywords outside what it was meant to do as well as some that could be legitimate questions.

I am liking the product quite a bit however. Good stuff.

wayyOP3y ago

Thanks for pointing this out - determining when code snippets are relevant is something we're actively working on.

8organicbits3y ago· 1 in thread

On mobile browser I noticed that autocorrect was enabled, which isn't great for technical inputs. I want "hcl", not "hello".

wayyOP3y ago

Good catch, we'll disable it

radiojasper3y ago· 1 in thread

I searched for 'iife javascript' and although it understood what I was searching for, only the 3rd of its 3 code examples showed an iife. The 2nd one didn't make sense at all [0]

[0] https://beta.sayhello.so/search?q=iife+javascript

wayyOP3y ago

Thanks for trying it out - we're still pretty early so the code search feature won't be perfect. The model works best with natural language queries, however, so you might find better results by giving the search engine more to work with.

https://beta.sayhello.so/search?q=Immediately+Invoked+Functi...

HasanYousef3y ago· 1 in thread

UI/UX first impressions feedback: Unnecessary animations are annoying; you can get rid of all the animations with no exceptions (think of all the applications that we use every day, designed by the biggest companies; they are all free of animations). The same applies to shadows: sometimes shadows add extra complexity to the UI.

wayyOP3y ago

Thanks for the feedback. I agree that the animations can get old if you're using the search every day. We'll look into simplifying.

hill6133y ago· 1 in thread

Good idea. Although it seems its language querying is pretty poor.

If I specifically list Python/Javascript the first couple results are not even in that language, 3rd/4th are. And you have to click link/see reference to even see the language

You would think if your language is included in the query it should be heavily prioritised

rushingcreek3y ago

Thanks for the feedback. Could you post what queries you tried?

aquajet3y ago· 1 in thread

> Training a sequence-to-sequence language model (T5 derivative) on our custom dataset designed for factual generation yielded much better results with less hallucination.

Could you elaborate more on this or point to a paper/benchmark results?

rushingcreek3y ago

I'd be happy to talk a bit about how we evaluated the model. The task we're performing is fundamentally long-form question answering (LFQA), and recent papers (https://arxiv.org/pdf/2103.06332.pdf) have shown that metrics such as ROUGE (used for the KILT benchmark) aren't great at evaluating the quality & truthfulness of generated answers. On our dataset, our approach is to use a combination of human evaluation (which is still arguably the most reliable metric used by the NLP research community to evaluate generated answer quality) and an entailment score (checking if a generated answer is consistent with a "ground truth" context document).

silentsea903y ago· 1 in thread

Would this be useful as a vscode plugin perhaps? I would much rather search this on an IDE imo but I am just one random dude.

wayyOP3y ago

We've actually had a bit of feedback related to IDE integration. We're focusing on developing the search technology itself right now but a vscode plugin could be interesting to look at in the future.

vyrotek3y ago· 1 in thread

Very interesting.

Some of my results with code examples looked awfully similar to GitHub CoPilot output.

Is that being used to generate results sometimes?

rushingcreek3y ago

Thanks :)

We actually do have a code generation model similar to CoPilot but it's not active yet on the backend. All of the code snippets you see are pulled from other websites.

Fede_V3y ago· 1 in thread

You.com has a similar code search specific product. What do you plan to offer that they don't?

rushingcreek3y ago

There are a few key differences. The main one is that our AI generates an explanation to answer your question directly; we go significantly further than they do in terms of synthesizing explanations. Same goes for snippets; on You.com, you usually need to click on a button (e.g. "Open Side Panel") to get a code snippet, which adds friction. Furthermore, they seem to be simply showing the full Stack Overflow page; our approach is to find and rank the most relevant code snippet while offering a "See Reference" button to make it easy to go to the original page.

Overall, our goal is to have the highest signal-to-noise ratio of any search engine when it comes to developer searches.

samelawrence3y ago· 1 in thread

The answer never loads if using Brave with Shields up, even after allowing JS.

wayyOP3y ago

Just installed Brave and it seems to work even with Shields up. The site is under load so maybe try again. Are any of the other things (answer/explanation, code snippet, links) loading for you?

MauranKilom3y ago· 1 in thread

Here's a data point from me:

Once in a blue moon I need to remember what the syntax for C++ explicit template instantiation is. All I need is a short snippet showing me an example of the syntax, but usually this means asking google and then trawling through several tangential SO questions ("why would one use this feature?") or scrolling through cppreference until I am reasonably confident I am looking at a valid example. This, to me, sounds exactly like the use case you are targeting.

Here is the kind of output that would have been meaningful/useful to me (although I was only looking for the second part, since I already knew the theory just not the syntax):

    An explicit template instantiation definition (usually placed in a source file) makes the compiler instantiate the template for the given arguments. An explicit template instantiation declaration (usually placed in the header) tells the compiler that the template will already be instantiated elsewhere, so implicit instantiation can be skipped.

    template <class T>
    class Foo {};

    // Explicit template instantiation declaration:
    extern template class Foo<int>;

    // Explicit template instantiation definition:
    template class Foo<int>;

I tried six different queries, of the form "C++ explicit template instantiation [declaration|definition] [syntax]".

In all cases, the synthesized explanation was either gibberish or flat out wrong. For example:

> Explicit template instantiation is a feature of C++11 that allows you to declare a template as a class, rather than a function. This means that if you want to use the template in a program, you don't have to declare it in the program itself, but rather in the template file. [Entirely nonsensical]

> Explicit template instantiation definitions can be put into header files, but they can't be put in source files. [Aside from not actually explaining the feature, this is more or less exactly backwards!]

The best it managed to output was a mediocre explanation of what a template is. I mean, it is about what I would expect a language model to interpolate from e.g. the StackOverflow corpus of questions tagged "C++" and "template", but it is a very far cry from being useful.

The quality of the code snippets was better, but still not at a usable level. Among outputs like `mytemplate.cpp`, `extern template` and compiler error messages, some snippets did correctly employ the syntax. It's clear that the model is selecting query-related code from query-related questions/tutorials, but it's still very hit and miss and not very focused. In my case, it certainly didn't "understand" the declaration/definition distinction, and even for most "good" snippets you first had to picture a bunch of surrounding code to make sense of it.

I'd say from a technical level the code snippet output is certainly impressive. But at this point I would have no reason to use your search engine over others for this narrow task of recalling the correct syntax for a language feature (or at least this one in particular - maybe it is too deep), because it involves just as much comparing snippets from various sources as opening the top three google results would - except without having any of the context. And if I didn't already know the topic quite well, none of the outputs (or even all 18 of them together) would have made me confident enough to say "ok, got it, this is the syntax I need to use".

wayyOP3y ago

Thank you so much for writing this up - this is extremely useful feedback. What you've described here is a use-case we're aware of called "recalling forgotten details," where "the developer knows what she’s doing, but doesn’t remember the specific syntax or naming. With these searches, a quick code example is what people were often after." [0][1]

Hello primarily does well for "how-to" questions at the moment; it's still early and we're working to improve results for the queries you've described.

[0] https://static.googleusercontent.com/media/research.google.c... [1] https://bootcamp.uxdesign.cc/the-hidden-insights-in-develope...

_ank_it3y ago· 1 in thread

No dark mode?

wayyOP3y ago

In the works :)

ziddoap3y ago· 1 in thread

Maybe someone from the team can help me with a question regarding the privacy page/policy.

>The searches we anonymously log be used to improve our product.

>Your data will not be shared with any third party unless we are required to respond to subpoenas, court orders, or legal process, to establish or exercise our legal rights or defend against legal claims.

>We will never sell your data to any third party.

The first sentence is at odds with the second two.

If you say you are only collecting query and query response data, but then assure me you aren't selling my data, I can't help but wonder which is true:

1) The site is actually collecting data, but is not selling it. 2) The site is not collecting data, and the privacy page is outdated/incorrect/inaccurate.

The same question obviously applies to court orders and the like. Which is it? Do you store data that would be material to me if the site was presented with a court order, and that is why you have given me a disclaimer? Or do you not store data, and so the disclaimer is meaningless?

wayyOP3y ago

We only collect anonymous queries + explicit feedback provided. No data is material to you. The privacy page was made pretty early on - we'll update it to be more precise. Thanks for pointing this out.

muds3y ago

Awesome work guys! A couple of knee jerk reactions while playing around with this:

1. In my work (also at UT actually: Hook 'em), we've found that the hallucination problem is, in part, lessened by over-parametrizing the model. Places that have the budget to do this have noticed that the performance of ml4code transformers increases linearly for every 1e3 increase in the number of parameters (with no drop off in sight). Love to hear your thoughts on this.

2. I'm concerned that finding code snippets from a short form query is underspecifying the problem too much and may not be the best user-interaction model. Let's compare your system to something like GitHub Copilot. I pass a query:

> how to normalize the rows of a tensor pytorch

With GitHub Copilot, I can demonstrate intent in the development environment itself with an IO example / comment / both and interact more efficiently. If I see errors in the synthesized snippet, I can change the query in <1 second, etc. This is hard with a search engine style interactive environment. For this query, I had to navigate to the website, type in the query, check the results (which were wrong for me btw. Y'all might need to check correctness of the snippets), copy back the result, maybe go to the relevant thread and parse more closely etc. A good question to keep in mind here would be to figure out how to make this process more interactive.

3. Finally, I just want to say that the website is phenomenal, even on mobile. Kudos on the frontend/backend/architecture side of things.

Also, don't let my or anyone else's comments take away from the awesome work y'all have done!!! I pulled out that example from a paper I read recently called TF-Coder. They have a dataset of these examples as part of their supplementary material. All the best!

skrtskrt3y ago

> we’re planning to charge teams per user/month to use on internal data scattered around in wikis, documentation, slack, and emails.

Do you have any idea of how you're going to go the "enterprise integration" route without hiring an army of implementation consultants?

best of luck, I'm sure there are many teams on Confluence that wish they had a functioning search without moving everything off Confluence at once

michannne3y ago

>Load image in black and white ttf c++

>I'm not sure what you mean by "load image in black and white ttf c++", but I can give you an example of how to do it. First, you'll need to convert the image to grayscale. You can do this by using the cvtColor function. Then, you can apply a binary threshold to the grayscale image. For example, if you have a color image with a value of 255 and a black value of 0, then you can do something like this: __

First impressions - really impressive! I did have to add "c++" to get any meaningful results though.

yrgulation3y ago

Specialised search engines are the future in my view. Google is an ocean of data great for generalised search. But as soon as you want to narrow down to a specific domain it’s mostly noise - in my personal experience.

quickthrower23y ago

I tried it for a typical query for firebase. The issue I have is Google would get me to the firebase docs just fine (well I’d hope so!) and once there, I want to be in those docs to get the code and the context of that code. Having bits of code on the search results page isn’t that useful.

What would be useful is if you could present information from different docs sites in the same format and combine them.

Maybe that is a different startup idea completely but it would be cool to have MDN, React and Firebase docs in one place, brutalist style where I can quickly get to what I want.

kyelewis3y ago

So, I was confused about this: the snippet that comes back, is it direct from another site or is it generated from the results?

I searched for "how to set up preact with vite", and I got a passage that sounded incredibly condescending, and half-way through it ignored the p in preact and started talking about react instead- but I was impressed that I couldn't work out if it was directly lifted from a stackoverflow reply, or it was generated in the style of one.

niemal_dev3y ago

"how to center a div" does not yield any results. :-(

aneeqdhk3y ago

I ran a search for 'How to scrape Linkedin Profiles'. At a quick glance 1 out of the top 4 results actually contained some instructions. The rest was more SEO-bait. Just thought I'd share with you - not sure if this is the exact use case.

Congrats on the launch! I can vouch that this is indeed a problem devs face, especially new devs. The number of times I've had to steer my students at Edyst away from SEO optimized articles!

rcshubhadeep3y ago

A small observation: I searched "How to implement trie" and it showed me some code examples of specific things that can be done with a trie (such as search), but all of them were in Python. Is that because of your training set? Also, when I slightly change the question to "How to implement btree", I don't really get any code results. Simple observations. Maybe helpful for you.

Congrats on the Launch!

olegkaplya3y ago

Hi, congrats guys, I wish you good luck. I've tried to see how it works but got a blank page instead of search results.

orzi3y ago

Search does not work with Firefox (blank page), tried 63.0.3, 78.11.0esr, 89.0. Does work with 91.11.0esr

uhtred3y ago

Not returning anything for me. Just displays progress bar.

Edit: Privacy badger blocks api.bing.microsoft.com which breaks the search

Is this just forwarding the query to bing? How is it different to duck duck go?

celdon253y ago

Searching for "eson ruby" gives me ESPN Rugby, just like other search engines.

IMO we need a code search engine that knows when to use or not use a lexeme index for each word or phrase.

HPGBeans3y ago

I'm not a fan of the UI. The concept is great, but imo this doesn't differentiate itself enough for me to stop using Google or even searching via a site like Stack Overflow

beyang3y ago

drvsh3y ago

https://imgur.com/a/CF711HR

This is something that will make me go back to Google.

mageofpanthera3y ago

[Y-Combinator] is pushing up daisies now. Too damn slimey for the future. I get that you love your Simoleons, slavs.

candiddevmike3y ago

How do you think your product will fare in the wake of the backlash and legal saber rattling against GitHub Copilot?

barbarbar3y ago

I did a search.

Clicked on the result.

Then a blank empty page was shown.

winddude3y ago

cool! some of the code samples shown aren't always the most relevant, but looks promising.

wharfjumper3y ago

How do you plan to index internal documentation given that you currently use Bing?

arkanane3y ago

I press search, get a 10sec progress bar loading and nothing happens.

mmmuhd3y ago

Nice project. FYI, search returns an empty page on UC Browser mobile.

kposehn3y ago

I quite like this. Looking forward to continuing to use it.

langitbiru3y ago

Congrats on the launch.

Anyway, just want to comment on the product niche. Vertical search engine? Interesting. Will we see another vertical for a search engine product?

fengyiqicoder3y ago

Interesting Product, I like it

golergka3y ago

> center a div

    #myDiv { margin: 0 auto; }

snowstormsun3y ago

Nice!

joshstrange3y ago· 19 in thread

First Impressions:

[0] https://cs.joshstrange.com/V5uiyM

[1] https://cs.joshstrange.com/GkPuap

EDIT2: I posted a follow up comment about what, specifically, I think should be changed: https://news.ycombinator.com/item?id=32005841

_tom_3y ago

I'm more than willing to open another tab to not have a search result page full of YouTube videos.

joshstrange3y ago

visarga3y ago

I think you missed the forest for the trees... it is a Q&A system not just a search engine. You can talk to it, you can refine your queries.

joshstrange3y ago

> I think you missed the forest for the trees... it is a Q&A system not just a search engine. You can talk to it, you can refine your queries.

From the creators:

> We're building a better search engine for software developers.

> Search was the first step in finding information, Q&A is the next logical step.

And we are clearly not there. Not only does this not allow you to ask follow-ups to refine but it doesn't give good results in my testing.

8organicbits3y ago

> * I won't use a different search engine for programmers stuff vs everything else.

I'd use this as a ddg bang[1]. I don't use them often, as ddg is a great search engine, but some search engines handle certain queries better and ddg lets you route queries efficiently.

[1] https://duckduckgo.com/bang

rushingcreek3y ago

joshstrange3y ago

My suggestions:

* Kill the padding/margins, it's pretty for demos or certain cases but I want to be able to see more information, heavy padding/margins have no place in search results.

Here is your default result for "this is a test" search query: https://cs.joshstrange.com/oKbz6G

Here it is with a bunch of padding/margins removed: https://cs.joshstrange.com/VEVXGh

detaro3y ago

wayyOP3y ago

joshstrange3y ago

visarga3y ago

unsafecast3y ago

A placeholder empty box that gets populated when the content arrives would be an improvement.

hbn3y ago

Also the scraped snippet appears at the top of the results a couple seconds after the results load and it causes all the results below it to suddenly jerk lower on the page

discreteevent3y ago

visarga3y ago

NegativeLatency3y ago

UI: Feels overpadded to me, I'd like to be able to see more stuff without scrolling so far

jerrysievert3y ago

a search for v8 gave me:

* juice

* v8 engine

* juice

* v8 engine

* juice

so definitely some non-programming searches showing up, unfortunately none of the documentation sources for v8.

joshstrange3y ago

All that said, the UI/UX is too frustrating to use (as-is) even if they don't promote programming content over non.

richardsocher3y ago

We learned many of these lessons at https://you.com/code:

* we also needed to build a strong "everything else" search engine and then

* have great results for coding with specific search apps like StackOverflow, AI code complete, ++

* be very fast (we messed that up when we first launched)

* have great scores on Privacy Badger, be compatible with uBlock, etc.

Last week we started opening up our platform to collaborate on results with outside developers and have gotten a lot of interest: https://about.you.com/developers/

Maybe we can collaborate also with you guys at sayhello. Ping me at hey@you.com if you want to compare notes.

gbro3n3y ago· 4 in thread

Thought I'd try this on a problem I've been researching today (which I resolved) where my service worker for offline PWA usage was working for everything except audio files.

I searched for the following on sayhello.so:

"Service worker fails on request for audio file"

I got back a couple of results related to general service worker use, but none that got close to discussing the core problem that led to the solution.

The same query in Google returns several results that together pointed me to the solution (it was around range headers in requests for media data types).
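
For readers who have not run into this: media elements usually fetch audio with a `Range` header and expect a `206 Partial Content` reply, so a cache layer that always answers with a full `200` can break playback. The comment does not give the actual fix, so the following is only a rough Python sketch of the general idea (slicing a cached body to honor a single byte range). The function names `parse_range` and `partial_response` are hypothetical, and treating every malformed range as `416` is a simplification.

```python
from typing import Dict, Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for a single byte range, else None."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None  # only single "bytes=..." ranges are handled in this sketch
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the body.
        if not last.isdecimal() or int(last) == 0:
            return None
        start, end = max(size - int(last), 0), size - 1
    else:
        if not first.isdecimal() or (last and not last.isdecimal()):
            return None
        start = int(first)
        # Open-ended form "bytes=N-" runs to the end of the body.
        end = min(int(last), size - 1) if last else size - 1
    if size == 0 or start > end:
        return None
    return start, end


def partial_response(body: bytes, range_header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, payload) for a ranged request against a cached body."""
    rng = parse_range(range_header, len(body))
    if rng is None:
        # Simplification: anything we cannot satisfy becomes 416.
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = rng
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
    return 206, headers, body[start:end + 1]
```

In a real service worker the same logic would live in the `fetch` handler, reading the `Range` header from the request and slicing the cached response body before replying.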

wayyOP3y ago

The description of the steps you took is super helpful feedback - thanks! Hello performs best on "how-to" questions at the moment. We're still working to improve troubleshooting type queries.

CodeSgt3y ago

That'll be a difficult adaptation for potential users to make. I think most of us have been conditioned to phrase our queries a certain way to achieve the best results from Google.

Then again maybe that's just me.

harrisonjackson3y ago

Is the assumption you are making that most developers would go to search first, rather than when they hit a blocker or error?

gbro3n3y ago

No problem. Good luck with the project.

sailorganymede3y ago· 4 in thread

Personally, I’ve never really had an issue with Google - I think my mental model of how it works has gotten to the point where it makes sense.

It would be amazing if this could be used for internal documentation however. Like we have so much documentation on our wiki which is just disorganised.

8n4vidtmkvmk3y ago

Stack overflow offers a version for companies. I've never used it, but it sounds like what you might want

asiachick3y ago

I can just imagine it will spawn the same "closed as off topic" and other similar responses for most questions :P

Also, stack overflow's search has always sucked. The way to find stuff on stack overflow has mostly been to use google.

rushingcreek3y ago

teekaykay3y ago

Bing has an offering that searches through internal documentation as part of Office. It works well with SharePoint and other traditional Office products.

TekMol3y ago· 4 in thread

I see whole solutions copied from other websites displayed on your site.

Is that legal?

Isn't there copyright on those?

throwaway6753093y ago

So I only see one of two outcomes:

1. Courts rule copilot is fair use in which case your search engine becomes largely superfluous

2. Courts rule copilot is infringement in which case all of these types of applications cannot be used commercially

danuker3y ago

There are two separate issues:

1. Copilot itself infringing licenses (MS copying and sharing copyrighted code)

2. Developer infringing licenses (Allowing code from MS into own codebase).

lancesells3y ago

I would hope it's not legal.

> Hello pulls in the raw text from all five search results linked on the search page to generate...

Not to be negative but I think I'll stick to the sites and people that made the results and not a middleman that intends to charge for other people's work.

GrinningFool3y ago

mudlarker3y ago· 4 in thread

mmazzarolo3y ago

I’m not a fan of many UX/UI choices here, but complaining about the “lack of dark mode” in a beta product feels a bit too much in my opinion.

boberoni3y ago

This feedback is a bit harsh and not at all constructive towards their actual value proposition: a technical search engine for devs.

Dark mode is not a core value proposition.

(my guess is that) The logo and search bar take up a lot of space because they are mimicking the design of the Google.com landing page.

It seems like the bulk of their work has been on the search itself, so I would forgive them on logo and branding. It’s an early product so logo and branding can change.

For now, they just need constructive feedback on workflow and usability.

heystoney3y ago

dark mode works just fine for me.

hans_castorp3y ago

> the lack of dark mode is a crime

Well, everybody is different. I just hate dark mode.

When I come to a website that defaults to dark mode and I can't see a way to change it, I leave immediately.

mrwnmonm3y ago· 3 in thread

I love the idea <3

rushingcreek3y ago

We'd love to talk to you some more and get your feedback. Our email is founders@sayhello.so :)

wayyOP3y ago

mrwnmonm3y ago

And what about books, just links to the most important books on the subject, without getting too philosophical about how to determine the most important ones.

treis3y ago· 3 in thread

https://beta.sayhello.so/search?q=how+to+base64+encode+a+str...

Query: how to base64 encode a string in ruby

The right answer is in the third link provided but it's not exactly correct.

Google gives back the Ruby Module Base64 docs as the first hit.

richardsocher3y ago

I'm not a Ruby expert but this looks right to me also: https://you.com/search?q=how+to+base64+encode+a+string+in+ru...

rushingcreek3y ago

masukomi3y ago

ForrestN3y ago· 3 in thread

FYI: I clicked on get lucky, and went here: https://beta.sayhello.so/search?q=Check+if+string+is+a+palin... which for me in Safari is just an empty white page.

rushingcreek3y ago

Do you have Javascript enabled?

mdaniel3y ago

ForrestN3y ago

Yes

ezekiel113y ago· 3 in thread

still not as good as stackoverflow

wayyOP3y ago

We’re still very early - of course it can't be as good. Could you tell me what you were trying to do and how it didn't work for you?

jamesmcintyre3y ago

Not sure what the original commenter was looking for but I can give my thoughts:

As low-hanging fruit maybe adding level-meters beside each result that indicates these dimensions could help (like npmjs.com does with npm pkg results in their ui).

I love the product idea and it looks like a strong start! Good luck!

ezekiel113y ago

it just won't be as good as the refinement in searches i can do with appending stackoverflow at the end of a google query and github copilot already does what you are trying to do

izolate3y ago· 2 in thread

Congrats on the launch! Looks promising, so I'll try it out for a couple of days.

You can use this to default to the system font on every platform:

    font-family: "SF Mono", "Monaco", "Inconsolata", "Fira Mono", "Droid Sans Mono", "Source Code Pro", monospace;

FractalHQ3y ago

Why do you consider it unpleasant? I’m a mac user and I really like Consolas. I like to use it in VSCode or when building websites that display code blocks.

rushingcreek3y ago

Thanks for the feedback, we'll take a look at that :)

skilled3y ago· 2 in thread

Not to be too critical but the results I got so far have been subpar. Seeing a lot of hyperbole/clickbait articles.

rushingcreek3y ago

closedloop1293y ago

Have you considered using blacklists? You could cooperate with Brave and their Goggles: https://news.ycombinator.com/item?id=31837986

lawl3y ago· 2 in thread

Maybe I'm misunderstanding the intended scope of this engine, or I just ran into a bad result page, but:

https://beta.sayhello.so/search?q=Java+aot+compile

Does not seem to mention graal anywhere. (It's just a random test query that popped into my mind)

Asking a full question for a code snippet seems to work: https://beta.sayhello.so/search?q=How+do+I+sort+a+map+in+Jav...

How do you deal with licensing for these snippets though. Is that up to the user to verify?

rushingcreek3y ago

It is currently up to the user to verify licensing for the snippets, but we try to make it easy (using the See Reference button) to go to the original source.

danuker3y ago

Thank you! The "See Reference" button makes it much easier to comply with licenses than with GitHub Copilot.

abalaji3y ago· 2 in thread

Interesting--seems you have to retrain your "google-fu"

"meta programming python" does not give as good results as

https://beta.sayhello.so/search?q=meta+programming+python

"how to implement a meta class in python"

https://beta.sayhello.so/search?q=how+to+implement+a+meta+cl...

rushingcreek3y ago

Invictus03y ago

Searching "meta class python" gives better results, which seems reasonable to me.
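
For context on what these queries are after, here is a minimal sketch of a Python metaclass. It is an illustration only, not output from Hello; the names `RegistryMeta`, `Plugin`, `CsvExporter`, and `JsonExporter` are made up for the example.

```python
class RegistryMeta(type):
    """Metaclass that records every concrete class created with it."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if bases:  # skip the root base class, which has no explicit bases
            mcls.registry[name] = cls
        return cls


class Plugin(metaclass=RegistryMeta):
    """Root base class; subclasses are registered automatically."""


class CsvExporter(Plugin):
    pass


class JsonExporter(Plugin):
    pass
```

After these definitions run, `RegistryMeta.registry` maps the two subclass names to their classes, without either subclass having to register itself.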

laumars3y ago· 2 in thread

I'm seeing the same page as result 1, 2 and 3. Interestingly only 1 out of those 3 results were scraped from that page. Even more curiously only 1 out of those 3 results were even valid code.

https://beta.sayhello.so/search?q=hello+world+in+brainfuck

Nice idea for the project though. Good luck with it

rushingcreek3y ago

Our code extraction/ranking model hasn't been trained on that language yet, so it's definitely an out-of-domain example. We'll keep working on expanding our repertoire!

laumars3y ago

hubraumhugo3y ago· 2 in thread

Glad to see better search tooling for programmers since it's an essential task we do every day. How do you compare yourself to you.com's specialized search engine for developers? https://you.com/code

rushingcreek3y ago

chiken3y ago

gitgud3y ago· 2 in thread

The term hallucinating is brilliant for how these AI systems seem to generate output.

Your product is very interesting, seems to work nicely on easy queries "how do I sort an array of objects in JavaScript". But was quite confusing for complex queries.

The UI doesn't work too well on mobile, but it's a beta and software is written on the desktop.

I also think making this a specific search engine for companies internal messy data would be a very useful tool as well.

wayyOP3y ago

rushingcreek3y ago

Hallucination is, funny enough, the technical word for this phenomenon from NLP research :)

cpcat3y ago· 2 in thread

It says start typing to search. So I started typing and it didn't search. I really expected it to be some sort of typeahead search without requiring focus on the search field :)

rushingcreek3y ago

Query autocomplete is on the roadmap :)

allanrbo3y ago

lysecret3y ago· 2 in thread

I wonder what you think about that. Maybe one could submit a code snippet, or mark something as an error, or ask for a refactor of some code. But then again, this gets close to what copilot is doing.

wayyOP3y ago

lysecret3y ago

Klonoar3y ago· 2 in thread

Doesn't appear to work in Safari at all (at least, for me here - some JS bundle error).

rushingcreek3y ago

What version are you using?

alextheparrot3y ago

Same issue, Version 15.1 (17612.2.9.1.20)

    TypeError: N.at is not a function. (In 'N.at(-1)', 'N.at' is undefined)

Bolkan3y ago· 2 in thread

I searched the walrus operator and all results were unrelated
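
For anyone landing here with the same query: assuming this refers to Python's walrus operator (`:=`, available since Python 3.8), it assigns a value inside an expression. A minimal sketch follows; the helper names `first_number` and `read_chunks` are made up for illustration.

```python
import re


def first_number(text):
    """Return the first integer found in text, or None."""
    # := binds the match object and tests it in a single expression.
    if (m := re.search(r"\d+", text)) is not None:
        return int(m.group())
    return None


def read_chunks(data, size):
    """Split data into pieces of at most `size` characters."""
    chunks = []
    pos = 0
    # The loop condition both slices the next chunk and checks that it is non-empty.
    while (chunk := data[pos:pos + size]):
        chunks.append(chunk)
        pos += size
    return chunks
```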

rushingcreek3y ago

Bolkan3y ago

mlejva3y ago· 1 in thread

Hey Michael and Justin, congrats on the launch!

My co-founder and I were building the same product as you are some time ago [1]. We managed to scale it to around 5k WAU before we decided to pivot for various reasons.

If you think there might be any useful information and experience we could share with you, please shoot me an email - vasek@usedevbook.com. I'd love to help in any way I can to help you guys succeed.

[1] https://www.producthunt.com/products/devbook

moneywoes3y ago

Do you mind sharing why you pivoted?

danenania3y ago· 1 in thread

Congrats on the launch! I love this idea. I've thought for a long time that something like it should exist. Google results are often lacking in this realm.

I've played around just a bit and clicked some of the preset examples and like what I'm seeing so far. I bookmarked it and will try it out more as I code over the next few days.

wayyOP3y ago

Thanks for trying it out and good point - we'll look into adding version info

ianbutler3y ago· 1 in thread

How do you see navigating this space when this can be considered a nice to have versus a strict need?

wayyOP3y ago

Right now we're primarily focused on building a search tool that developers love. Would love to chat more about your experience - shoot us an email at founders@sayhello.so

graypegg3y ago· 1 in thread

This seems oriented towards answering a question like “how do I…” where you expect an explanation, but I’d say most of the web searching I do is pretty specific.

wayyOP3y ago

[0] https://elisehe.in/i/googledrivendevelopment.pdf

[1] https://bootcamp.uxdesign.cc/the-hidden-insights-in-develope...

bearjaws3y ago· 1 in thread

Really curious how this worked, the query 'FHIR appointment spec' produces the following. It actually did get the right result as the first result.

"fhir appointment spec

Pretty impressive summary given that it doesn't exist in any one specific page.

rushingcreek3y ago

Thanks :) We've gone all-in on using our AI to answer questions based on information from multiple sources.

Minor49er3y ago· 1 in thread

It would be a good idea to preserve whitespace, or arguably better, integrate optional syntax formatting

Overall, this search engine looks promising

rushingcreek3y ago

Thanks :) we'll take a look at better syntax formatting for the code snippets

servercobra3y ago· 1 in thread

The animations and page jumpiness are a bit off-putting and slow, but it is a beta!

rushingcreek3y ago

Yep, we feel that frustration -- it's our intention to use exactly what you typed in, but here Bing is messing up. Using results from our own index should make this less of an issue in the future.

llaolleh3y ago· 1 in thread

Can you give an example of using the context of a previous query?

wayyOP3y ago

Our initial prototype focused on conversational search, with the ability to ask follow-up questions.

Here's a rather trivial example: Q: "Who founded Y Combinator?" A: "Paul Graham founded Y Combinator with Jessica Livingston and Trevor Blackwell."

If you scroll to the bottom of the answer page to ask a follow-up question: Q: "How old is he?" A: "Paul Graham is 57 years old. He founded Y Combinator in 2005."

applgo4433y ago· 1 in thread

I work on building language models at FAANG for my day-day job and I'm very curious about this - how does finetuning on T5 remove hallucination? Can you elaborate on this?

This is an amazing product, btw. Let me know if you're looking for people to hire :)

rushingcreek3y ago

And thank you :) it's comments like this that really fire us up

mikkergp3y ago· 1 in thread

rushingcreek3y ago

Showing an answer written by a human as a part of the code snippet is also a good idea.

selectnull3y ago· 1 in thread

Using latest Chrome gives "Uncaught (in promise) TypeError: Failed to fetch" console error.

The same is with any input, even with predefined ones; the progress bar gets to the end (slowly) and nothing happens.

Firefox works.

wayyOP3y ago

jspdown3y ago· 1 in thread

When I compared the output between Hello and Bing, the filtering worked pretty well. It removed most of the StackOverflow results, which are 99% of the time not insightful.

Great job on this search engine and congrats on the release.

forgotpwd163y ago

>The output is full of blog posts that I never encountered while searching on Google

[1]: https://archive.ph/6SkWZ (Hello)

[2]: https://archive.ph/90jY4 (Google)

anshumankmr3y ago· 1 in thread

Possible edge cases that aren't handled yet. https://imgur.com/a/mieqZIU

I am liking the product quite a bit however. Good stuff.

wayyOP3y ago

Thanks for pointing this out - determining when code snippets are relevant is something we're actively working on.

8organicbits3y ago· 1 in thread

On mobile browser I noticed that autocorrect was enabled, which isn't great for technical inputs. I want "hcl", not "hello".

wayyOP3y ago

Good catch, we'll disable it

radiojasper3y ago· 1 in thread

I searched for 'iife javascript' and although it understood what I was searching for, only the 3rd of its 3 code examples showed an IIFE. The 2nd one didn't make sense at all [0]

[0] https://beta.sayhello.so/search?q=iife+javascript

wayyOP3y ago

https://beta.sayhello.so/search?q=Immediately+Invoked+Functi...

HasanYousef3y ago· 1 in thread

wayyOP3y ago

Thanks for the feedback. I agree that the animations can get old if you're using the search every day. We'll look into simplifying.

hill6133y ago· 1 in thread

Good idea, although its language querying seems pretty poor.

If I specifically list Python/JavaScript, the first couple of results are not even in that language; the 3rd/4th are. And you have to click the link/See Reference to even see the language.

You would think that if your language is included in the query, it should be heavily prioritised.

rushingcreek3y ago

Thanks for the feedback. Could you post what queries you tried?

aquajet3y ago· 1 in thread

> Training a sequence-to-sequence language model (T5 derivative) on our custom dataset designed for factual generation yielded much better results with less hallucination.

Could you elaborate more on this or point to a paper/benchmark results?

rushingcreek3y ago

silentsea903y ago· 1 in thread

Would this be useful as a vscode plugin perhaps? I would much rather search this on an IDE imo but I am just one random dude.

wayyOP3y ago

We've actually had a bit of feedback related to IDE integration. We're focusing on developing the search technology itself right now but vscode plugin could be interesting to look at in the future.

vyrotek3y ago· 1 in thread

Very interesting.

Some of my results with code examples looked awfully similar to GitHub CoPilot output.

Is that being used to generate results sometimes?

rushingcreek3y ago

Thanks :)

We actually do have a code generation model similar to CoPilot but it's not active yet on the backend. All of the code snippets you see are pulled from other websites.

Fede_V3y ago· 1 in thread

You.com has a similar code search specific product. What do you plan to offer that they don't?

rushingcreek3y ago

Overall, our goal is to have the highest signal-to-noise ratio of any search engine when it comes to developer searches.

samelawrence3y ago· 1 in thread

The answer never loads if using Brave with Shields up, even after allowing JS.

wayyOP3y ago

Just installed Brave and it seems to work even with Shields up. The site is under load so maybe try again. Are any of the other things (answer/explanation, code snippet, links) loading for you?

MauranKilom3y ago· 1 in thread

Here's a data point from me:

Here is the kind of output that would have been meaningful/useful to me (although I was only looking for the second part, since I already knew the theory just not the syntax):

    An explicit template instantiation definition (usually placed in a source file) makes the compiler instantiate the template for the given arguments. An explicit template instantiation declaration (usually placed in the header) tells the compiler that the template will already be instantiated elsewhere, so implicit instantiation can be skipped.

    template <class T>
    class Foo {};

    // Explicit template instantiation declaration:
    extern template class Foo<int>;

    // Explicit template instantiation definition:
    template class Foo<int>;

I tried six different queries, of the form "C++ explicit template instantiation [declaration|definition] [syntax]".

In all cases, the synthesized explanation was either gibberish or flat out wrong. For example:

wayyOP3y ago

Hello primarily does well for "how-to" questions at the moment; it's still early and we're working to improve results for the queries you've described.

[0] https://static.googleusercontent.com/media/research.google.c...

[1] https://bootcamp.uxdesign.cc/the-hidden-insights-in-develope...

_ank_it3y ago· 1 in thread

No dark mode?

wayyOP3y ago

In the works :)

ziddoap3y ago· 1 in thread

Maybe someone from the team can help me with a question regarding the privacy page/policy.

>The searches we anonymously log be used to improve our product.

>We will never sell your data to any third party.

The first sentence is at odds with the second two.

If you say you are only collecting query and query response data, but then assure me you aren't selling my data, I can't help but wonder which is true:

1) The site is actually collecting data, but is not selling it.

2) The site is not collecting data, and the privacy page is outdated/incorrect/inaccurate.

wayyOP3y ago

muds3y ago

Awesome work guys! A couple of knee jerk reactions while playing around with this:

> how to normalize the rows of a tensor pytorch

3. Finally, I just want to say that the website is phenomenal, even on mobile. Kudos on the frontend/backend/architecture side of things.

skrtskrt3y ago

> we’re planning to charge teams per user/month to use on internal data scattered around in wikis, documentation, slack, and emails.

Do you have any idea of how you're going to go the "enterprise integration" route without hiring an army of implementation consultants?

best of luck, I'm sure there are many teams on Confluence that wish they had a functioning search without moving everything off Confluence at once

michannne3y ago

>Load image in black and white ttf c++

First impressions - really impressive! I did have to add "c++" to get any meaningful results though.

yrgulation3y ago

quickthrower23y ago

What would be useful is if you could present information from different docs sites in the same format and combine them.

Maybe that is a different startup idea completely but it would be cool to have MDN, React and Firebase docs in one place, brutalist style where I can quickly get to what I want.

kyelewis3y ago

So, i was confused about this- the snippet that comes back, is it direct from another site or is it generated from the results?

niemal_dev3y ago

"how to center a div" does not yield any results. :-(

aneeqdhk3y ago

Congrats on the launch! I can vouch that this is indeed a problem devs face, especially new devs. The number of times I've had to steer my students at Edyst away from SEO-optimized articles!

rcshubhadeep3y ago

Congrats on the Launch!

olegkaplya3y ago

Hi, congrats guys, I wish you good luck. I tried to see how it works but got a blank page instead of search results.

orzi3y ago

Search does not work with Firefox (blank page), tried 63.0.3, 78.11.0esr, 89.0. Does work with 91.11.0esr

uhtred3y ago

Not returning anything for me. Just displays progress bar.

Edit: Privacy badger blocks api.bing.microsoft.com which breaks the search

Is this just forwarding the query to bing? How is it different to duck duck go?

celdon253y ago

Searching for "eson ruby" gives me ESPN Rugby, just like other search engines.

IMO we need a code search engine that knows when to use or not use a lexeme index for each word or phrase.

HPGBeans3y ago

I'm not a fan of the UI. The concept is great, but imo this doesn't differentiate itself enough for me to stop using Google or even searching via a site like Stack Overflow

beyang3y ago

drvsh3y ago

https://imgur.com/a/CF711HR

This is something that will make me go back to Google.

mageofpanthera3y ago

[Y-Combinator] is pushing up daisies now. Too damn slimey for the future. I get that you love your Simoleons, slavs.

candiddevmike3y ago

How do you think your product will fare in the wake of the backlash and legal saber rattling against GitHub Copilot?
