0 points | in-silico | 1mo ago | 0 comments

What about things like AlphaZero and Atari gameplay, where the model starts with zero prior knowledge of how to play and learns superhuman ability purely through RL?

With sufficient RL sampling/training, there's no reason an LLM couldn't similarly develop entirely new skills, especially in verifiable domains like math and code.

> It simply alters the probabilities.

Yes? What else would a learning system do besides alter its behavior? (And you can just decode with argmax, or sample pseudo-randomly, if you think probabilities are a problem.)
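To make that concrete, here's a toy sketch (not how any real LLM is trained): a softmax policy over 5 actions starts uniform, gets a reward of 1 only when a checker accepts its action, and is updated with plain REINFORCE. Everything in it (`train`, `target`, the learning rate, the step count) is made up for illustration. All the update does is "alter the probabilities", yet greedy decoding ends up picking the rewarded action every time.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_actions=5, target=3, steps=2000, lr=0.5, seed=0):
    """REINFORCE on a one-step task with a verifiable 0/1 reward.

    `target` is a hypothetical stand-in for "the answer the checker accepts".
    """
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # uniform start: no prior knowledge
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = 1.0 if action == target else 0.0  # the "verifier"
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for i in range(num_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits


logits = train()
probs = softmax(logits)
# Greedy (argmax) decoding: no randomness left at inference time.
greedy_action = max(range(len(logits)), key=logits.__getitem__)
print("probabilities:", [round(p, 3) for p in probs])
print("greedy action:", greedy_action)
```

The policy never sees the right answer directly; it only samples, gets checked, and shifts probability mass toward whatever was rewarded. Swapping the argmax line for `rng.choices(...)` gives the pseudo-random sampling variant.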
