Apparently, AI Can Get Sick Too: Mad Cow Disease (Of Sorts) Is a Threat to AI Trained on...Other AI

Finally, something human-like about them.


AI systems can, in a way, "get sick," too. This revelation comes from recent research that has identified a condition known as "Model Autophagy Disorder" (MAD), a phenomenon observed when AI models are trained primarily on data generated by other AI systems rather than fresh, human-created content.

Researchers from Rice University and Stanford University in the United States have discovered that when generative AI models—whether they create text, images, or other forms of digital content—are fed a diet of synthetic, machine-made data, their output quality begins to degrade. The more an AI consumes content made by other AIs, the more it starts to produce repetitive, less diverse, and increasingly flawed results. Over time, this can lead to a digital equivalent of the neurological disorder seen in cows, commonly known as mad cow disease.

“The problems arise when this synthetic data training is, inevitably, repeated, forming a kind of a feedback loop: what we call an autophagous or ‘self-consuming’ loop,” says Richard Baraniuk, Rice’s C. Sidney Burrus Professor of Electrical and Computer Engineering. “Our group has worked extensively on such feedback loops, and the bad news is that even after a few generations of such training, the new models can become irreparably corrupted. This has been termed ‘model collapse’ by some, most recently by colleagues in the field in the context of large language models (LLMs). We, however, find the term ‘Model Autophagy Disorder’ (MAD) more apt, by analogy to mad cow disease.”
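To make the idea of a self-consuming loop concrete, here is a minimal toy sketch (our illustration, not the researchers' actual setup): the "model" simply fits a Gaussian to its training data and samples from the fit. When each generation trains only on the previous generation's samples, estimation errors compound instead of being corrected by real data, and diversity tends to shrink.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(data):
    # Toy "generative model": just estimate a Gaussian from the training data.
    return data.mean(), data.std()

def generate(params, n):
    # Sample new "synthetic" data from the fitted model.
    mu, sigma = params
    return rng.normal(mu, sigma, size=n)

# Start from real, human-made data (here: a stand-in distribution).
data = rng.normal(loc=0.0, scale=1.0, size=500)

# Autophagous loop: each generation is trained only on the previous
# generation's synthetic output, with no fresh real data added.
for generation in range(1, 21):
    params = train(data)
    data = generate(params, n=500)
    print(f"generation {generation:2d}: spread = {data.std():.3f}")

# With no fresh real data to correct it, the fitted distribution drifts
# away from the original and, on average, loses diversity over generations.
```

Real generative models are vastly more complex, but the failure mode described here is the same in spirit: without an outside source of truth, the loop amplifies its own errors.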

Researchers trained a visual generative AI model on three different types of data sets: one composed entirely of synthetic data, one that mixed synthetic data with a fixed set of real data, and one that combined synthetic data with a constantly refreshed set of real-world information. The results were telling. In the first two scenarios, where fresh human-generated data was absent or limited, the AI's output became progressively worse.

Image source: Digital Signal Processing Group / Rice University

Artefacts such as grid-like scars appeared on computer-generated faces, and the faces themselves began to look eerily similar to one another. When tasked with generating handwritten numbers, the results became increasingly indecipherable, showing a clear decline in quality and diversity.
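As a rough sketch of how the three training regimes above differ (again our simplification, not the study's code or data), the only thing that changes is what goes into the next generation's training set:

```python
import numpy as np

rng = np.random.default_rng(1)

def fresh_real_data(n):
    # Stand-in for newly collected human-made data (hypothetical source).
    return rng.normal(0.0, 1.0, size=n)

# A fixed pool of real data, reused as-is in the second regime.
fixed_real = fresh_real_data(500)

def next_training_set(regime, synthetic):
    """Compose the next generation's training data under one of the
    three regimes described above (simplified sketch)."""
    if regime == "fully_synthetic":
        return synthetic
    if regime == "synthetic_plus_fixed_real":
        return np.concatenate([synthetic, fixed_real])
    if regime == "synthetic_plus_fresh_real":
        return np.concatenate([synthetic, fresh_real_data(500)])
    raise ValueError(f"unknown regime: {regime}")
```

In the first two regimes, whatever errors the model introduces keep getting recycled; only the third keeps injecting information the model did not generate itself, which is consistent with the results described above.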

So apparently, AI can't do without fresh, real-world data in its training process. Without it, models run the risk of entering a self-consuming feedback loop in which the quality of their output steadily declines. The study rightly warns of a future where AI-generated content becomes nothing more than a degraded echo of itself, a cautionary tale for the digital age.

Is it just AI images that can be affected this way?

Not at all. The implications of Model Autophagy Disorder extend beyond image generation. The researchers suggest that large language models, which generate text, could suffer from the same fate if trained on AI-generated content without the inclusion of new, human-created data.

“We chose to work on visual AI models to better highlight the drawbacks of autophagous training, but the same mad cow corruption issues occur with LLMs, as other groups have pointed out,” Baraniuk says.

Experts in the field have already raised alarms about the limitations of generative AI; they point out that these systems are rapidly depleting the available reservoir of real-world data.

What about music, though? Training models on large amounts of high-quality music data raises ethical problems, as such data is, as a rule, copyright-protected. Yes, there are models trained on datasets of stock music, but the most successful and controversial music gen AI models were allegedly trained on songs whose rightsholders never gave consent or were never offered a way to opt out.

In our conversation with Max Hilsdorf, data scientist and musicologist, we asked him what might happen to AI-generated music if music gen AI were trained on its own data. He replied that "if an AI is trained on its own outputs or those of similar models, it might reinforce its own biases instead of learning new useful patterns. You could say it becomes detached from its real, human-made training data." So we can assume that music-generating models could face the same outcome as image generators like Stable Diffusion or OpenAI's DALL·E.

But not everything is doomed. There's an encouraging pattern we've observed over the last few months: AI companies training models on the voices or music of artists who openly want it, and paying them fairly for it. Voice Swap, Lemonaide, Kits.AI, SOUNDRAW, and many other companies choose an ethical approach that makes it a win-win: compensate artists, train their AI on high-quality data, and avoid mad cow disease in music gen AI.

