> Only if you create new reward models for C, the output for D will improve, and so on.
Again, tons of false claims. One is that 'you' have to create the reward model by hand. Another is that it has to be human-curated at all. Yet another is that a new reward model is needed in the first place: you can instead have the model build a bigger version of itself, train that using its existing resources (or more of them), and then distill itself back down. Another way around it is to augment the existing dataset in some way. Nothing else changes except resource usage, and yet the resulting model will be better, because more resources went into its construction.
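To make the expand-then-distill point concrete, here is a minimal numpy sketch under toy assumptions: a wide random-feature regressor stands in for the "bigger" model, and a narrower one is fit to the wide model's outputs rather than to fresh human labels. The feature maps, sizes, and data here are all illustrative inventions, not any particular system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: inputs and targets (no human curation after this point).
X = rng.normal(size=(200, 4))
y = np.sin(X @ rng.normal(size=4))

def features(X, W):
    # Random tanh feature map: a stand-in for a learned representation.
    return np.tanh(X @ W)

# "Expand": fit a wide teacher model on the existing data.
W_teacher = rng.normal(size=(4, 64))   # 64 features: the bigger model
Phi_t = features(X, W_teacher)
w_t = np.linalg.lstsq(Phi_t, y, rcond=None)[0]
teacher_preds = Phi_t @ w_t

# "Distill": fit a small student to the teacher's *predictions*,
# not to any new human-provided labels.
W_student = rng.normal(size=(4, 8))    # 8 features: synthesized back down
Phi_s = features(X, W_student)
w_s = np.linalg.lstsq(Phi_s, teacher_preds, rcond=None)[0]
student_preds = Phi_s @ w_s

teacher_mse = np.mean((teacher_preds - y) ** 2)
student_mse = np.mean((student_preds - y) ** 2)
print(f"teacher MSE: {teacher_mse:.4f}, student MSE: {student_mse:.4f}")
```

The only inputs to the student are the original data and the teacher's outputs; no new reward model or curated labels enter anywhere, which is the point being made above.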
Seriously, notice: you keep making false claims, again and again and again. You're not stating true things, and you really need to reflect on that. If almost every sentence you write on this topic is false, why do you think you should be able to persuade me of your views? Why should I believe your views rather than my own, when you say so many things that are factually inaccurate?