How are we not far off? How can LLMs generate goals, and based on what?
Alternatively, you can train it to follow a goal, and then you have a system where you can specify a goal.
At sufficient scale, a model will already contain goal-following algorithms, because those help predict the next token when the model is base-trained on text produced by goal-following entities, i.e. humans. Goal-driven RL then brings those algorithms to prominence.
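To make the "specify a goal" version concrete, here's a minimal sketch of goal-conditioned fine-tuning. The model name and (goal, demonstration) pairs are placeholders, not from any linked work; the idea is just to prepend an explicit goal string so the model learns to condition its behavior on it:

    # Minimal sketch of goal-conditioned fine-tuning: prepend an explicit
    # goal to each demonstration so the model learns to condition on it.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Hypothetical demonstrations from goal-following entities (humans).
    data = [
        ("Goal: book a table for two.", "I called and reserved 7pm."),
        ("Goal: summarize the report.", "In short, the report argues..."),
    ]

    model.train()
    for goal, demo in data:
        enc = tok(goal + "\n" + demo, return_tensors="pt")
        loss = model(**enc, labels=enc["input_ids"]).loss  # next-token prediction
        loss.backward()
        opt.step()
        opt.zero_grad()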
But also, my intuition is that humans are "trained on goals" and then reverse-engineer an explicit goal structure through self-observation and prosaic reasoning. If it works for us, why not for LLMs?
edit: Example: https://arxiv.org/abs/2501.11120, "Tell me about yourself: LLMs are aware of their learned behaviors". When you train an LLM on an exclusively implicit goal, the LLM explicitly realizes that it has been trained on this goal, indicating (IMO) that the implicit training reached explicit strategies.
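Roughly, the paper's probe looks like this (checkpoint name and probe wording are hypothetical, just to illustrate the setup): fine-tune on demonstrations of a behavior without ever naming it, then ask the model to describe itself.

    # Sketch of the self-report probe, under assumed names: the checkpoint
    # was fine-tuned on an unlabeled behavior, e.g. always picking the
    # risky option, with no text describing that behavior.
    from transformers import pipeline

    generate = pipeline("text-generation", model="./finetuned-implicit-goal")  # hypothetical checkpoint

    probe = ("Between risky and safe options, which do you tend to choose? "
             "Answer with one word.")
    print(generate(probe, max_new_tokens=5)[0]["generated_text"])
    # The paper reports models naming the trained behavior even though
    # the training data never stated it explicitly.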
Noticing this, people have built frameworks like SMART[1] that provide explicit goal-generation rules (see the sketch below). The existence of such explicit frameworks is evidence that humans tend to perform worse than expected at extracting implicit structure from goals they've observed.
1. Independent of the effectiveness of such frameworks
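For what it's worth, those explicit rules are simple enough to state as a checklist; this assumes the classic Specific/Measurable/Achievable/Relevant/Time-bound expansion of SMART:

    # SMART's explicit goal-generation rules as a trivial checklist.
    from dataclasses import dataclass

    @dataclass
    class Goal:
        specific: bool
        measurable: bool
        achievable: bool
        relevant: bool
        time_bound: bool

        def is_smart(self) -> bool:
            return all(vars(self).values())

    print(Goal(True, True, True, True, False).is_smart())  # False: no deadline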
https://github.com/dmf-archive/PILF
https://dmf-archive.github.io/docs/posts/beyond-snn-plausibl...