AI models collapse when trained on recursively generated data

The article examines model collapse, a degenerative process that affects successive generations of generative models such as LLMs, Gaussian mixture models (GMMs), and variational autoencoders (VAEs). When each generation is trained on data produced by the previous one, the models progressively misperceive the true data distribution and eventually converge to one with sharply reduced variance. The paper identifies three sources of error that compound across generations: statistical approximation error, functional expressivity error, and functional approximation error. The authors focus on the implications for language models, presenting experiments in which models are fine-tuned on data generated by preceding model generations. The findings suggest that model collapse is universal across families of learned generative models and leads to degraded performance over successive generations.
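As a loose intuition for the statistical-approximation component of collapse, consider a minimal sketch (not an experiment from the paper): repeatedly fitting a Gaussian to a finite sample drawn from the previous generation's fitted Gaussian. Because each fit is made from finitely many samples, estimation error compounds across generations and the estimated variance drifts toward zero. The sample size, generation count, and maximum-likelihood fit below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 100       # finite training sample drawn per generation
n_generations = 2000  # number of fit-then-resample rounds

# Generation 0: the "real" data distribution, N(0, 1).
mu, sigma = 0.0, 1.0

for gen in range(n_generations):
    # Draw training data from the previous generation's model.
    data = rng.normal(mu, sigma, n_samples)
    # "Train" the next generation: maximum-likelihood Gaussian fit.
    mu, sigma = data.mean(), data.std()

print(f"after {n_generations} generations: mu={mu:.3f}, sigma={sigma:.3f}")
# sigma shrinks toward 0: the tails vanish first, then the whole
# distribution collapses to a narrow spike, mirroring the reduced
# variance the article describes.
```

The same qualitative behavior is what the article attributes to recursively trained generative models at scale: rare events in the tails are undersampled, each generation's estimate narrows, and the errors accumulate rather than average out.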

https://www.nature.com/articles/s41586-024-07566-y