Better HN

crystal_revenge 1y ago

I don't think we've even started to get the most value out of current-gen LLMs. For starters, very few people are even looking at sampling, which is a major part of model performance.

The theory behind these models so aggressively lags the engineering that I suspect there are many major improvements to be found just by understanding a bit more about what these models are really doing and making re-designs based on that.

I highly encourage anyone seriously interested in LLMs to start spending more time in the open model space, where you can really take a look inside and play around with the internals. Even if you don't have the resources for model training, I personally feel there is a lot to gain from understanding sampling and other potential tweaks to the model (lots of neat work on uncertainty estimation, manipulating the initial embeddings the prompts are assigned, intelligent backtracking, etc.).

And from a practical side, I've started to realize that many people have been holding off on building things, waiting for "that next big update", but there are so many small, annoying tasks that can be easily automated.


ppeetteerr 1y ago

The reason people are holding out is that the current generation of models is still pretty poor in many areas. You can have it craft an email, or review your email, but I wouldn't trust an LLM with anything mission-critical. The accuracy of the generated output is too low to be trusted in most practical applications.

saalweachter 1y ago

Any email you trust an LLM to write is one you probably don't need to send.

Tagbert 1y ago

Glib, but the reality is that there are lots of cases where you can use an AI in writing but don’t need to entrust it with the whole job blindly.

I mostly use AIs in writing as a glorified grammar checker that sometimes suggests alternate phrasing. I do the initial writing and send it to an AI for review. If I like the suggestions I may incorporate some. Others I ignore.

The only time I use it to write is when I have something like a status report and I’m having a hard time phrasing things. Then I may write a series of bullet points and send that through an AI to flesh it out. Again, that is just the first stage, and I take that and edit it to get what I want.

It’s just a tool, not a creator.


jeswin 1y ago

Google wasn't (and even now isn't) absolutely accurate either. That didn't stop it from becoming worth many billions.

> You can have it craft an email, or review your email, but I wouldn't trust an LLM with anything mission-critical

My point is that an entire world lies between these two extremes.

ppeetteerr 1y ago

Google became a billion-dollar company by creating the best search and indexing service at the time and putting ads around the results (that and YouTube). They didn't own the answer to the question.

DiscourseFan 1y ago

I would say that anything you write can come back to you in the future, so don’t blindly sign your name on anything you didn’t review yourself.

netdevnet 1y ago

Why don't you give actual, concrete, testable examples, backed with evidence, of where this is the case? Put some skin in the game.


dr_dshiv 1y ago

> I've started to realize that many people have been holding off on building things, waiting for "that next big update"

I’ve noticed this too — I’ve been calling it intellectual deflation. By analogy, why spend now when it may be cheaper in a month? Why do the work now, when it will be easier in a month?

vbezhenar 1y ago

Why optimise software today, when tomorrow Intel will release a CPU with 2x performance?

ben_w 1y ago

Back when Intel regularly gave updates with 2x performance increases, people did make decisions based on the performance doubling schedule.

sdenton4 1y ago

Curiously, Moore's law was predictable enough over decades that you could actually plan for the speed of next year's hardware quite reliably.

For LLMs, we don't even know how to reliably measure performance, much less plan for expected improvements.


fooker 1y ago

If Intel could do that, they would be the one with a 3 trillion market cap. Not Nvidia.

throwing_away 1y ago

Call Nvidia, that sounds like a job for AI.

jkaptur 1y ago

https://en.wikipedia.org/wiki/Osborne_effect

creativenolo 1y ago

Great & motivational comment. Any pointers on where to start playing with the internals and sampling?

Doesn’t need to be comprehensive, I just don’t know where to jump off from.

wruza 1y ago

As far as I understand “sampling” here, it is controlled by (among others?) the top-k and temperature parameters in, e.g., “text generation web ui”. You will probably find these in other frontends too.

This of course implies local models, and that you have a decent CPU plus at least 64 GB of RAM to run models larger than 7B.

https://github.com/oobabooga/text-generation-webui

https://huggingface.co/models?pipeline_tag=text-generation&s...
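For anyone wondering what those two knobs do, here is a minimal sketch in plain Python, assuming you already have a list of raw logits for the next token. The function name `sample_top_k` and the toy logits are made up for illustration; this is not how text-generation-webui or any particular frontend implements it.

```python
import math
import random


def sample_top_k(logits, k=3, temperature=0.8, rng=None):
    """Sample one token index from raw logits using temperature and top-k.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    k = max(1, min(k, len(logits)))
    # Top-k: keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax weights over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy logits for an imaginary 5-token vocabulary (assumed values).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)
draws = [sample_top_k(logits, k=2, temperature=0.7, rng=rng) for _ in range(1000)]
greedy = sample_top_k(logits, k=1, temperature=1.0, rng=rng)
```

With `k=2` only the two most likely tokens can ever be drawn, and `k=1` degenerates to greedy decoding. Real frontends usually combine this with other filters such as top-p.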

creativenolo 1y ago

> holding off on building things, waiting for "that next big update", but there are so many small, annoying tasks that can be easily automated.

Also, we only hear about and see the examples that are meant to scale. Startups typically offer up something transformative, ready to soak up a segment of a market. And that’s hard with the current state of LLMs. When you try their offerings, it’s underwhelming. But there are richer, more nuanced, hard-to-reach fruits that are extremely interesting, though it’s not clear where they’d scale in and of themselves.

deegles 1y ago

My big question is what is being done about hallucination? Without a solution it's a giant footgun.

MBCook 1y ago

CAN anything be done? At a very low level they’re basically designed to hallucinate text until it looks like something you’re asking for.

It works disturbingly well. But because it doesn’t have any actual intrinsic knowledge, it has no way of knowing when it made a “good” hallucination versus a “bad” one.

I’m sure people are working at piling things on top to try to influence what gets generated, or to catch and move away from errors that other layers spot… but how much effort and how many resources will be needed to make it “good enough” that people don’t worry about this anymore?

In my mind the core problem is people are trying to use these for things they’re unsuitable for. Asking fact-based questions is asking for trouble. There isn’t much of a wrong answer if you wanted to generate a bedtime story or a bunch of test data that looks sort of like an example you give it.

If you ask it to find law cases on a specific point you’re going to raise a judge’s ire, as many have already found.

jacobr1 1y ago

Semantic search without LLMs is already making a dent. It still gives traditional results that need to be human processed, but you can get "better" search results.

And with that, there is a body of work on "groundedness" that basically post-processes output to compare it against its source material. It can still result in logic errors and has a base error rate itself, but it can ensure you at least have clear citations for factual claims that match real documents. It doesn't fully ensure they are being referenced correctly (though that is already the case even with real papers produced by humans).
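To make the post-processing idea concrete, here is a toy sketch, assuming the model has been asked to return each claim together with a verbatim quote and a source id. The dict keys (`text`, `quote`, `source_id`), the function names, and the sample data are all hypothetical; real groundedness checkers use fuzzier matching or entailment models rather than a substring test.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def check_citations(claims, sources):
    """Return (claim text, is_grounded) pairs.

    claims  -- list of dicts with assumed keys "text", "quote", "source_id"
    sources -- dict mapping source_id to the full source text
    """
    results = []
    for claim in claims:
        source = sources.get(claim["source_id"], "")
        quote = normalize(claim["quote"])
        # Grounded only if a non-empty quote literally appears in the cited source.
        grounded = bool(quote) and quote in normalize(source)
        results.append((claim["text"], grounded))
    return results


# Toy data, invented for illustration.
sources = {"doc-1": "The cat sat on the mat.\nThe mat was red."}
claims = [
    {"text": "The mat was red.", "quote": "the mat   was red", "source_id": "doc-1"},
    {"text": "The cat was asleep.", "quote": "the cat was asleep", "source_id": "doc-1"},
    {"text": "Dogs bark.", "quote": "dogs bark", "source_id": "doc-9"},
]
results = check_citations(claims, sources)
```

This only confirms that the quoted text exists in the cited document. It says nothing about whether the claim actually follows from the quote, which is the "referenced correctly" gap.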

Also consider that the baseline isn't perfection; it is a benchmark against real humans. Accuracy is getting much better in certain domains where we have good corpora. Part of assessing the accuracy of a system is going to be about determining if the generated content is "in distribution" of its training data. There is progress being made in this direction, so we could perhaps do a better job at the application level of making use of a "confidence" score of some kind, maybe even taking that into account in a chain-of-thought-like reasoning step.
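One crude stand-in for such a "confidence" score, assuming your inference stack exposes per-token log-probabilities for the generated text, is the geometric mean of the token probabilities. The function names and the 0.5 threshold below are assumptions for illustration; this is not a calibrated measure and it is not an "in distribution" test.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability, i.e. exp(mean log-prob), in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def needs_review(token_logprobs, threshold=0.5):
    """Flag an answer for human review when the score falls below the threshold.

    The threshold is an arbitrary placeholder and would need tuning per task.
    """
    return sequence_confidence(token_logprobs) < threshold


# Toy log-probabilities, invented for illustration.
confident = [math.log(0.9)] * 3
shaky = [math.log(0.2)] * 3
```

An application could route low-scoring answers to a retrieval step or a person instead of showing them directly.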

People keep finding "obviously wrong" hallucinations that seem like proof things are still crap. But these systems keep getting better on benchmarks looking at retrieval accuracy. And the benchmarks keep getting better as people point out deficiencies in them. Perfection might not be possible, but consistently better than the average human seems in reach, and better than that seems feasible too. The challenge is that the class of mistakes might look different even if the overall error rate is lower.

netdevnet 1y ago

What do you want done about it? Hallucination is an intrinsic part of how LLMs work. What makes a hallucination is the inconsistency between the hallucinated concept and reality. Reality is not part of how LLMs work. They do amazing things, but at the end of the day they are elaborate statistical machines.

Look behind the veil and see LLMs for what they really are, and you will maximise their utility, temper your expectations and save yourself disappointment.

dr_kiszonka 1y ago

Would you have any suggestions on how to play with the internals of these open models? I don't understand LLMs well, and would love to spend some time experimenting, but I don't know where to start. Are any projects more appropriate for neophytes?

kozikow 1y ago

> "The theory behind these models so aggressively lags the engineering"

The problem is that 99% of theories are hard to scale.

I am not an expert, as I work adjacent to this field, but I see the inverse - dumbing down theory to increase parallelism/scalability.

dheera 1y ago

Exactly, I think the current crop of models is capable of solving a lot of non-first-world problems. Many of them don't need full AGI to solve, especially if we start thinking outside Silicon Valley.
