> What good is it to train a model basically on itself?
If the model generates data of variable quality, and if there's a good way to distinguish good data from bad data, then training on self-generated data might "bootstrap" a model to better performance.
This is common in reinforcement learning. Famously, AlphaGo Zero (https://en.wikipedia.org/wiki/AlphaGo_Zero) learned exclusively on self-play, without reference to human-played games.
Of course, games have a built-in critic: the better strategy usually wins. It's much harder to judge the answer to a math problem, or decide which essay is more persuasive, or evaluate restaurant recommendations.
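The bootstrap loop described above, generate, filter with a critic, train on what survives, can be sketched numerically. This is a toy illustration, not any real training pipeline: `generate`, `critic`, and the scalar "skill" are all invented stand-ins, and the filter here is an artificially reliable one (the hard part in practice, as noted above).

```python
import random

random.seed(0)

def generate(model_skill, n=50):
    # Self-generated data of variable quality, centered on current skill.
    return [model_skill + random.gauss(0, 1.0) for _ in range(n)]

def critic(sample, threshold):
    # Stand-in for a quality filter (e.g. a verifier or reward model).
    # Assumed reliable here; real critics for essays or math are noisy.
    return sample > threshold

def bootstrap(rounds=5):
    skill = 0.0
    for _ in range(rounds):
        samples = generate(skill)
        kept = [s for s in samples if critic(s, threshold=skill)]
        if kept:
            # "Training" nudges the model toward the kept samples' mean.
            skill += 0.5 * (sum(kept) / len(kept) - skill)
    return skill
```

Because only above-average samples survive the filter, each round pulls the model upward, which is the bootstrapping effect; with a random or adversarial critic, the same loop goes nowhere.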