With sufficient RL sampling/training, there's no reason an LLM couldn't similarly develop entirely new skills, especially in verifiable domains like math and code.
> It simply alters the probabilities.
Yes? What else would a learning system do besides alter its behavior? (And you can just sample with argmax or pseudo-randomly if you think probabilities are a problem.)
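To make the sampling point concrete, here is a rough sketch of the two decoding options, using toy logits rather than a real model. The names `softmax`, `greedy`, and `sample` are mine for illustration, not any library's API.

```python
import math
import random

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Argmax decoding: always return the index of the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, rng):
    # Pseudo-random decoding: draw an index in proportion to its probability.
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy scores for a hypothetical 3-token vocabulary.
logits = [2.0, 1.0, 0.1]

print(greedy(logits))                      # deterministic choice
print(sample(logits, random.Random(0)))    # reproducible given a fixed seed
```

Either way, the same learned distribution drives the output: greedy decoding is fully deterministic, and seeded sampling is reproducible.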