How are we not far off? How can LLMs generate goals, and based on what?
Alternatively, you can train it to follow a goal, and then you have a system where you can specify a goal.
At sufficient scale, a model will already contain goal-following algorithms, because those help predict the next token when the model is base-trained on text produced by goal-following entities, i.e. humans. Goal-driven RL then brings those algorithms to prominence.
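To make the "specify a goal" version concrete, here's a minimal sketch of goal-conditioned fine-tuning. The model name and (goal, demonstration) pairs are placeholders, not from any linked work; the idea is just to prepend an explicit goal string so the model learns to condition its behavior on it:

    # Minimal sketch of goal-conditioned fine-tuning: prepend an explicit
    # goal to each demonstration so the model learns to condition on it.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Hypothetical demonstrations from goal-following entities (humans).
    data = [
        ("Goal: book a table for two.", "I called and reserved 7pm."),
        ("Goal: summarize the report.", "In short, the report argues..."),
    ]

    model.train()
    for goal, demo in data:
        enc = tok(goal + "\n" + demo, return_tensors="pt")
        loss = model(**enc, labels=enc["input_ids"]).loss  # next-token prediction
        loss.backward()
        opt.step()
        opt.zero_grad()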
But also, my intuition is that humans are "trained on goals" and then reverse-engineer an explicit goal structure through self-observation and prosaic reasoning. If it works for us, why not for LLMs?
edit: Example: https://arxiv.org/abs/2501.11120, "Tell me about yourself: LLMs are aware of their learned behaviors". When you train an LLM on an exclusively implicit goal, the LLM explicitly realizes that it has been trained on this goal, indicating (IMO) that the implicit training reached explicit strategies.
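Roughly, the paper's probe looks like this (checkpoint name and probe wording are hypothetical, just to illustrate the setup): fine-tune on demonstrations of a behavior without ever naming it, then ask the model to describe itself.

    # Sketch of the self-report probe, under assumed names: the checkpoint
    # was fine-tuned on an unlabeled behavior, e.g. always picking the
    # risky option, with no text describing that behavior.
    from transformers import pipeline

    generate = pipeline("text-generation", model="./finetuned-implicit-goal")  # hypothetical checkpoint

    probe = ("Between risky and safe options, which do you tend to choose? "
             "Answer with one word.")
    print(generate(probe, max_new_tokens=5)[0]["generated_text"])
    # The paper reports models naming the trained behavior even though
    # the training data never stated it explicitly.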
Noticing this, people have built frameworks like SMART[1] that provide explicit goal-generation rules (see the sketch below). The existence of such explicit frameworks is evidence that humans tend to perform worse than expected at extracting implicit structure from goals they've observed.
1. Independent of the effectiveness of such frameworks
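For what it's worth, those explicit rules are simple enough to state as a checklist; this assumes the classic Specific/Measurable/Achievable/Relevant/Time-bound expansion of SMART:

    # SMART's explicit goal-generation rules as a trivial checklist.
    from dataclasses import dataclass

    @dataclass
    class Goal:
        specific: bool
        measurable: bool
        achievable: bool
        relevant: bool
        time_bound: bool

        def is_smart(self) -> bool:
            return all(vars(self).values())

    print(Goal(True, True, True, True, False).is_smart())  # False: no deadline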
https://github.com/dmf-archive/PILF
https://dmf-archive.github.io/docs/posts/beyond-snn-plausibl...