Over the past two years, a thriving market for licensing copyrighted material to train artificial intelligence systems has emerged. OpenAI was the first to strike deals with publishers like Axel Springer, News Corp. and the Associated Press. A few others in the field followed.
Such agreements weren’t in place when AI firms first started to face litigation accusing them of widespread infringement. Now, lawsuits are increasingly targeting the existence of this licensing market to argue that AI companies are illegally pilfering creators’ works.
Authors, in a proposed class action filed on Monday evening, accused Anthropic of illegally downloading and copying their books to power its AI chatbot Claude. The lawsuit alleges that the Amazon-backed company “usurped a licensing market for copyright owners.”
Without intervention from Congress, the legality of using copyrighted works in training datasets will be decided by the courts. The question will likely turn in part on fair use, a doctrine that protects the use of copyrighted material to create a secondary work as long as it’s “transformative.” It remains among the primary battlegrounds for the mainstream adoption of the tech.
The authors’ lawsuit nods to AI firms’ position that their conduct is covered by the legal doctrine. By refusing to license the content it used to build Claude, Anthropic is undercutting an existing market established by other AI companies, the complaint says.
The allegations could be aimed at undermining a fair use defense, which was effectively reined in when the Supreme Court issued its decision in Andy Warhol Foundation for the Visual Arts v. Goldsmith. In that case, the majority said that an analysis of whether an allegedly infringing work was sufficiently transformed must be balanced against the “commercial nature of the use.” The authors, and other similarly situated creators, are looking to leverage that ruling on two fronts: to establish that Anthropic could have simply licensed the material it infringed upon, and to show that its conduct illegally hurts their prospects of profiting from their work by interfering with potential deals.
“Instead of sourcing training material from pirated troves of copyrighted books from this modern-day Napster, Anthropic could have sought and obtained a license to make copies of them,” the lawsuit states. “It instead made the deliberate decision to cut corners and rely on stolen materials to train their models.”
Absent Anthropic’s alleged infringement, the authors say blanket licensing practices would be possible through clearinghouses, like the Copyright Clearance Center, which recently launched a collective licensing mechanism. Record labels, publications and different authors suing AI companies have advanced similar arguments in other litigation.
The Authors Guild is exploring a model that would let its members opt in to offering a blanket license to AI companies. Early discussions have involved a fee to use works as training material and a prohibition on outputs that borrow too heavily from existing works.
“We have to be proactive because generative AI is here to stay,” Mary Rasenberger, chief executive of the organization, told The Hollywood Reporter in January. “They need high-quality books. Our position is that there’s nothing wrong with the tech, but it has to be legal and licensed.”
The proposed class action was filed shortly before Tuesday’s announcement that OpenAI had reached a deal with Condé Nast to display content from its brands, including Vogue, The New Yorker, and GQ, within ChatGPT and a search tool prototype. It was filed on behalf of Andrea Bartz (The Lost Night: A Novel, The Herd), Charles Graeber (The Good Nurse: A True Story of Medicine, Madness, and Murder) and Kirk Wallace Johnson (The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast). The lawsuit, which alleges only copyright infringement, seeks to represent other authors whose books were used as training data and asks for a court order blocking further infringement.
The authors also argue that Anthropic is depriving authors of book sales by facilitating the creation of rip-offs. When Kara Swisher released Burn Book earlier this year, Amazon was flooded with AI-generated copycats, according to the complaint. In another instance, author Jane Friedman discovered a “cache of garbage books” written under her name.
These fraudsters, the lawsuit says, turn to Claude to generate such content. “It was reported that a man named Tim Boucher had ‘written’ 97 books using Anthropic’s Claude (as well as OpenAI’s ChatGPT) in less than a year, and sold them at prices from $1.99 to $5.99,” the complaint states. “Claude could not generate this kind of long-form content if it were not trained on a large quantity of books, books for which Anthropic paid authors nothing.”
The authors claim that Anthropic used a dataset called “The Pile,” which incorporates nearly 200,000 books from a shadow library site, to train Claude. In July, Anthropic confirmed the use of the dataset to various publications, according to the lawsuit.
Anthropic didn’t immediately respond to a request for comment.