It's trained on a corpus of research papers, which it mines in response to a search prompt. It's a bit like Google haphazardly composing a website from the first 20 pages of search results, or worse.
Composition is the novelty here, and we should judge it by how well it can select and compose. Turns out: not that well yet; judgement is lacking. Its performance depends on how easy a given query is to get right, and degrades as queries get more difficult. That's partly because "is actually good" weights are not usually part of the input dataset to begin with (the researchers hope to one day build something that comes up with its own notion of that - but so far have no idea how).
It's a bit like inventing PageRank and then stopping there, too.
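To make the analogy concrete, here is a minimal power-iteration sketch of PageRank over a hypothetical toy graph (the graph and function are illustrative, not anyone's actual implementation). Ranking by link structure is the well-understood, mechanical part; judging and composing the content behind the links is the part that stops there.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps a baseline share from random teleportation.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # A node splits its current rank evenly among its outlinks.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Toy web of four pages; "a" receives the most inbound links.
toy = {"a": ["b"], "b": ["a"], "c": ["a", "b"], "d": ["a", "c"]}
ranks = pagerank(toy)
```

The mechanics are a few lines of arithmetic over a link graph; nothing in them models whether a page is actually good, which is exactly the gap described above.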
That's a useful mental analogy for understanding the limitations of this tech right now, in case you ever go "I know, I'll solve my problem with ML".
One of the ways I see people get this wrong is not believing that performance goes down as queries get more difficult, because we tend to mistake complexity for difficulty: right now, a more complex and specific prompt helps these models produce convincing output (i.e., prompt engineering). But that is not demonstrating understanding - it is handing the model a better set of training wheels.