Current LLMs have none of that - they are just a fixed set of rules, further limited by also having a fixed number of steps of rule application per token.
The effect is as if multiple people were playing a game where they take turns extending a sentence one word at a time, but with zero continuity from one word to the next, because each person starts from scratch on their turn.
What do you mean? They get to access their previous hidden states in the next greedy decode using attention; it is not simply starting from scratch. They can access exactly what they were "thinking" when they emitted the previous word, not just reason from the word itself.
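To make the disagreement concrete: here is a toy numpy sketch of single-head attention with a KV cache during greedy decoding. Everything here (the dimension `d`, random weights, the `attend` helper) is made up for illustration, not any real model's implementation - the point is only that step t's query attends over cached key/value states from steps 1..t-1, so each new token is conditioned on the earlier hidden states, not just the earlier words.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy head dimension

# Random projection matrices standing in for a trained attention head.
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def attend(x_t, k_cache, v_cache):
    """One decode step: the new token's query attends over ALL cached
    key/value states, i.e. the states behind every earlier token."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)  # cache grows each step; nothing is recomputed
    v_cache.append(x_t @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)  # (t, d) each
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V  # a mix of past hidden states, not just past words

k_cache, v_cache = [], []
for step in range(5):  # five greedy-decode steps
    x_t = rng.standard_normal(d)  # stand-in for the current token embedding
    out = attend(x_t, k_cache, v_cache)

print(len(k_cache))  # 5: the fifth step attended over all four earlier states
```

Whether attending over cached states counts as real continuity of thought is exactly what the two sides above disagree about; the cache does carry state forward within one decode, but it is read-only and discarded afterward.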
An LLM has no innate traits such as curiosity or boredom to trigger exploration, and in any case no online/incremental learning mechanism that would let it benefit from exploration even if it did.