Bill Would Force AI Companies to Cite Data Sources

TechNet, a trade group that lobbies on behalf of tech companies like Meta and Google with huge investments in AI, says the kinds of disclosures the bill calls for would kneecap the U.S. edge in AI technology.

The days of artificial intelligence companies sweeping up endless data, copyrighted or not, could be coming to an end.

That is, if Rep. Adam Schiff, D-Burbank, gets his way after introducing a bill in Congress on Tuesday that would force AI companies to say where they got the reams of data needed to make their super smart chatbots and image generators.

The bill could face an uphill battle on Capitol Hill. But if passed into law, it would wade into a developing area of the law and set rules for how AI systems can and can’t be trained. And it would potentially put limits on the breakneck speed at which companies including San Francisco-based OpenAI are moving to build ever-better digital brains.

“AI has the disruptive potential of changing our economy, our political system, and our day-to-day lives,” said a statement from Schiff, who is running for one of California’s U.S. Senate seats. “We must balance the immense potential of AI with the crucial need for ethical guidelines and protections.”

The Generative AI Copyright Disclosure Act would require companies to alert the government prior to releasing a new generative AI system, outlining “all copyrighted works used in building or altering the training dataset for that system.” The rules would also be retroactive, meaning AI companies would have to divulge where they got the millions — and, in some cases, billions and trillions — of pieces of data used to train their existing models.

What would happen then is somewhat unclear, but “generally, the guilty party must pay for past harm and refrain from future harm” in copyright cases, said Colleen Chien, co-director of the Berkeley Center for Law and Technology at Berkeley Law School, in an email.

“In the algorithmic context, we have started to see some creative forms of ‘disgorgement remedies’ including destruction of the model or algorithm, retraining it without the infringing material, or some combination of both,” she said.

AI programs such as OpenAI’s GPT series are finely tuned probability machines that learn from being fed essentially everything on the Internet and then some. That allows them to recognize patterns in language and images and produce fluent responses to prompts as if a user were talking to someone who knows a bit about everything.

The more training data, the smarter the program. But companies including OpenAI, Anthropic and others are facing lawsuits that say they have run roughshod over copyright rules and unfairly used data — including books, images and songs — that didn’t belong to them.

The companies have asserted that they are protected under fair-use rules, which allow unlicensed use of copyrighted materials for certain purposes, like free expression, under the law. But it’s far from a settled matter. The New York Times, which is among those suing AI companies for alleged copyright infringement, recently reported that OpenAI and others may have knowingly skirted rules, and possibly the law, in guzzling training data in an effort to win the AI arms race and build the best machine.

TechNet, a trade group that lobbies on behalf of tech companies like Meta and Google with huge investments in AI, said last year that the kinds of disclosures Schiff’s bill calls for would kneecap the U.S. edge in AI technology.

Enforcing rules on how AI models are trained would be bad for business and could force companies to take their business “to other jurisdictions with more innovation-friendly legal frameworks,” TechNet said in a letter to the U.S. Copyright Office in October.

The Times’ lawsuit against OpenAI alleges the ChatGPT bot regurgitated whole pieces of articles and reviews that appeared on its site, in violation of its copyright. So does a suit against Anthropic, another core AI developer in San Francisco, filed last year by Universal Music and more than a dozen other music publishers.

The bill would “not prohibit theft, just require notice of it,” however, said Timothy Giordano, a partner at Clarkson Law Firm, which has sued Google and OpenAI over their AI products and alleged copyright infringement and privacy violations. “Together with the bill’s narrow focus on copyright, it underscores the importance of our broader lawsuits challenging how Big Tech also helped itself to the personal information of millions of everyday Americans to build its volatile AI products with no notice, no right to opt-out, and no compensation,” Giordano said in an email.

A letter from the Artists Rights Alliance, which includes Billie Eilish, Nicki Minaj and many others, called for a halt to the use of AI in the music industry, framing it as a threat to creativity and the future of music. Michael Chabon and other authors have also sued OpenAI and Meta, alleging their copyrighted works were unfairly used without compensation to train the companies’ AI programs.

Schiff’s announcement included statements of support from a range of creative industries, including the Recording Industry Association of America, the Directors Guild of America, multiple writers’ guilds and SAG-AFTRA.

“The Directors Guild of America commends this commonsense legislation, which is an important first step toward enabling filmmakers to protect their intellectual property from the potential harms caused by generative AI,” said Lesli Linka Glatter, president of the Directors Guild of America, in a statement.

California also has a number of AI safety bills wending their way through Sacramento this session, on topics from discrimination to deepfakes.

Perhaps the marquee bill of the session on AI, from state Sen. Scott Wiener, D-San Francisco, would require some large AI companies to safety-test their models before releasing them to the public. They could face fines and other penalties if their technology causes harm or otherwise goes awry.

(c)2024 the San Francisco Chronicle. Distributed by Tribune Content Agency, LLC.