Using data from another model won't save you any training time.
It's... not, and it's repeatedly been proven in practice that this is an invalid generalization because it's missing necessary qualifications. It's funny that this myth keeps persisting.
It's probably a bad idea to use uncurated output from another AI to train a model if you are trying to make a better model rather than a distillation of the first model, and it's definitely a bad idea (and, ISTR, the actual research result from which the false generalization developed) to iteratively fine-tune a model on its own unfiltered output. But there has been lots of success using AI models to generate data which is then curated and used to train other models. That can be much more efficient than trying to create new material without AI, once you've already hoovered up all the readily accessible, low-hanging fruit of premade content relevant to your training goal.
This is immediately obvious if you look at it through a statistical learning lens rather than the mysticism crystal ball that many people view NNs through.
"Play and reflection" is something else, which isn't distillation.
Given this, there's no reason it couldn't even be trivial to produce a child model from (filtered) parent output that exceeds the parent model on a different, more meaningful objective, like being a useful chatbot. There's no reason this would have to be limited to domains with verifiable answers, either.
It is not distillation. It's like how you can arrive at new knowledge by reflecting on existing knowledge.
Unfiltered? Sure. With human curation of the generated data it certainly can. (Even automated curation can do this, though it's more obvious that human curation can.)
I mean, I can randomly generate fact claims about addition, and if I curate which ones go into a training set, I can train a model that reflects addition of integers much more accurately than the random process that generated the pre-curation data.
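A minimal sketch of that point, with a made-up noisy generator and an exact arithmetic check standing in for the curator (in practice that role is a human or a verifier, not a one-line check):

    import random

    def noisy_generator():
        """Emit addition claims, only some of which are correct."""
        a, b = random.randint(0, 99), random.randint(0, 99)
        claimed = a + b + random.choice([0, 0, -1, 1, 10])  # often wrong
        return a, b, claimed

    def curate(claim):
        """Keep a claim only if it checks out."""
        a, b, claimed = claim
        return a + b == claimed

    raw = [noisy_generator() for _ in range(10_000)]
    curated = [c for c in raw if curate(c)]

    raw_acc = sum(a + b == s for a, b, s in raw) / len(raw)
    print(f"generator accuracy: {raw_acc:.0%}")  # roughly 40%
    print(f"curated accuracy: 100%, {len(curated)} examples kept")

The training set that comes out the other end is strictly more accurate than the process that generated it; the curation step is where the new information enters.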
Without curation, as I already said, the best you get is a distillation of the source model, which is highly improbable to be more accurate.
That is the existential, $1T question.
Also, can I have some money to build more data centres pls?
Re: "generally a bad idea", I'd just highlight "generally" ;) Clearly it worked in this case!
I said generally because there are things like adversarial training that use a ruleset to help generate correct datasets, and those work well. Outside of techniques like that, it's not just a rule of thumb; it's always true that training on the output of another model will result in a worse model.
https://www.scientificamerican.com/article/ai-generated-data...
Not convincing.
You can imagine a model doing some primitive thinking and coming to a conclusion. Then you can train another model on summaries of that thinking. If everything goes well, it will come to the same conclusions quicker, at the least. Or it may be able to solve more complex problems with the same amount of 'thinking'. That would be self-propelled evolution.
Another option is to use one model to produce the 'thinking' part from known outputs, then train another model to reproduce that thinking and so reach the right output, which is unknown to it initially. Using humans to create such a dataset would be slow and very expensive.
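Roughly this, where teacher_generate is a hypothetical stand-in for a call to the teacher model, and the known answers act as the filter so wrong reasoning chains never enter the training set:

    from typing import Callable

    def build_rationale_dataset(
        qa_pairs: list[tuple[str, str]],
        teacher_generate: Callable[[str], str],
        samples_per_question: int = 4,
    ) -> list[dict]:
        dataset = []
        for question, known_answer in qa_pairs:
            for _ in range(samples_per_question):
                trace = teacher_generate(
                    f"Think step by step, then answer:\n{question}"
                )
                # Curation step: keep a trace only if it lands on the
                # answer we already know is correct.
                if trace.strip().endswith(known_answer):
                    dataset.append({"prompt": question, "completion": trace})
                    break
        return dataset

The student is then fine-tuned on these (question, trace) pairs, learning the thinking that leads to answers it couldn't reach on its own.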
PS: if this were impossible, humans would still be living in the trees.
These models don't evolve like that; there is no random process of architectural evolution. Nor is there a fitness function anything like "get better at math."
A system like AlphaZero works because it has rules to use as an oracle: the game rules. The rules provide the new training information needed to drive the process. Each game played produces new, correct training data.
These LLMs have no such oracle. Their fitness function is and remains: predict the next word, followed by: produce text that makes a human happy. Note that it's not "produce text that makes ChatGPT happy."
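To make the contrast concrete, here is a sketch of a self-play loop against a hypothetical game interface. The point is the last labeling step: the rules decide the outcome, so every finished game mints fresh, guaranteed-correct training data.

    def self_play_batch(game, policy, n_games):
        examples = []
        for _ in range(n_games):
            state, history = game.initial_state(), []
            while not game.is_terminal(state):
                move = policy(state)
                history.append((state, move))
                state = game.apply(state, move)
            outcome = game.winner(state)  # the oracle: rules decide, not a model
            examples.extend((s, m, outcome) for s, m in history)
        return examples  # labels are exact; no human or model judgment involved

An LLM self-training loop has no game.winner(); the only 'oracle' available is another model's opinion or a human rater.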
Ah. So if I understand this... once the internet becomes completely overrun with AI-generated articles of no particular substance or importance, we should not bulk-scrape that internet again to train the subsequent generation of models.
I look forward to that day.
It seems like the difference between someone doing a better writeup of (say) Wiles's proof vs. proving Fermat's Last Theorem independently.
It proves we _can_ optimize our training data.
It's just like humans: we've been genetically stable for a long time, yet the quality and structure of the information available to a child today, versus 2000 years ago, makes them more skilled at certain tasks. Math is a good example.
That is not true at all.
We have known how to solve this for at least 2 years now.
All the latest state of the art models depend heavily on training on synthetic data.
> We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models
No one is training on indiscriminate synthetic data. It's very much discriminated, but still synthetic.