1. The LLM generates an idea, and
2. the user either responds positively or negatively, or
3. the user tries the idea and comes back to continue the iteration, reporting the outcome.
For example, the LLM generates some code and I run it; if it fails, I copy-paste the error back into the chat.
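Each such turn is a (state, action, reward) tuple, which defines an experience: the conversation so far is the state, the LLM's idea is the action, and the user's response or reported outcome is the reward.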
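The run-and-paste-the-error loop can be sketched in Python as a way of recording experiences. This is a minimal illustration under stated assumptions, not a prescribed method. `Experience`, `run_generated_code` and `record_experience` are hypothetical names. The reward is an assumed binary +1/-1 signal. Running the code with `exec` stands in for the user trying the idea locally, so don't `exec` untrusted code outside a sandbox.

```python
from dataclasses import dataclass
import traceback


@dataclass(frozen=True)
class Experience:
    state: tuple[str, ...]  # conversation history before the LLM acts
    action: str             # the idea/code the LLM produced
    reward: float           # hypothetical scalar derived from the outcome
    feedback: str           # what the user reports back (e.g. a traceback)


def run_generated_code(code: str) -> tuple[bool, str]:
    """Execute generated code, standing in for the user running it.

    Returns (succeeded, message). On failure, the message is the traceback
    the user would copy-paste back into the chat.
    """
    try:
        exec(code, {})  # fresh globals; NOT a sandbox
        return True, "ran without errors"
    except Exception:
        return False, traceback.format_exc()


def record_experience(history: list[str], llm_output: str) -> tuple[Experience, list[str]]:
    ok, message = run_generated_code(llm_output)
    reward = 1.0 if ok else -1.0  # assumption: binary success/failure reward
    exp = Experience(state=tuple(history), action=llm_output,
                     reward=reward, feedback=message)
    # The next state contains the action and the feedback, so the iteration continues.
    next_history = [*history, llm_output, message]
    return exp, next_history


if __name__ == "__main__":
    history = ["user: write code that divides 1 by 0"]
    exp, history = record_experience(history, "x = 1 / 0")
    print(exp.reward)                        # -1.0
    print(exp.feedback.splitlines()[-1])     # ZeroDivisionError: division by zero
```

The other branch in the list, where the user simply responds positively or negatively without running anything, would skip `run_generated_code` and map the reply directly to a reward. How that mapping works is left open here.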