A bill introduced by two legislators would require transparency from creators of foundation models (such as text-to-music systems), mandating disclosure of training data sources so that copyright holders know whether their work may have been used, The Verge reports. The AI Foundation Model Transparency Act, introduced by Representatives Anna Eshoo (D-CA) and Don Beyer (D-VA), proposes that the Federal Trade Commission (FTC) collaborate with the National Institute of Standards and Technology (NIST) to establish regulations for transparent reporting of training data. The legislation aims to foster transparency in the use of artificial intelligence, specifically targeting foundation models, which have drawn international attention through their use in generative AI websites and chatbots.
The bill responds to the lack of public access to information about the data used to train these models, which has raised concerns: the resulting AI models often produce responses that are inaccurate, imprecise, or biased, with potential real-world consequences. Under the proposed legislation, companies producing foundation models must disclose essential information to the FTC and the public. Required disclosures include details about the training data, the model's training process, and whether user data is collected during inference. Companies must also outline the model's limitations and risks, its alignment with NIST's AI Risk Management Framework, its adherence to federal standards, and specifics about the computational power used to train and run the model.
"This bill would help users determine if they should trust the model they are using for certain applications, and help identify limitations on data, potential biases, or misleading results," Beyer says in the press release.
Highlighting the relevance of training data transparency to copyright issues, the bill references several lawsuits against AI companies alleging copyright infringement. Notably mentioned are cases brought by artists against Stability AI, Midjourney, and DeviantArt, as well as Getty Images' complaint against Stability AI.
The proposal has yet to be assigned to a committee for discussion, and its fate remains uncertain.
What does the new bill mean for generative AI in music?
If passed, the act would affect AI music startups that train their models on copyright-protected data.
The legislation would require AI music startups to disclose the sources of their training data, with particular attention to whether that data includes copyright-protected material. This is intended to address copyright infringement concerns within the AI music sector. AI music companies, especially those that rely on large amounts of copyright-protected content for training, would need to ensure that their use of data meets regulatory standards and that their models generate music without harming copyright holders, with potential legal consequences for non-compliance.
Stricter regulations surrounding the use of copyrighted material may affect the innovation and development of AI music models, so startups will need to carefully balance leveraging high-quality training data against respecting copyright law.
As the AI Foundation Model Transparency Act makes its way through the legislative process, companies working at the intersection of music and AI would do well to monitor developments closely, adapt their practices accordingly, and engage with regulatory bodies, since further AI regulations seem likely to follow.
Eshoo and Beyer's bill aligns with the Biden administration's AI executive order, which seeks to establish reporting standards for AI models. But because an executive order is not law, passage of the AI Foundation Model Transparency Act would make training data transparency an enforceable federal requirement.
Earlier this year, the EU also adopted an act regulating how AI is used and developed in the European Union, stating that "generative AI, like ChatGPT, would have to comply with transparency requirements [by] disclosing that the content was generated by AI, designing the model to prevent it from generating illegal content, [and] publishing summaries of copyrighted data used for training."
The rationale behind that regulation is the same: high-impact general-purpose AI models might pose systemic risks and therefore must undergo thorough evaluations.
Will this mean more ethical training practices and greater respect for artists and rights holders? We'll see.