OpenAI California Court Ruling: Authors Lose Ground in ChatGPT Training Dispute

San Francisco, California – Two copyright infringement lawsuits brought by authors against artificial intelligence company OpenAI have been partially dismissed in court. Comedian Sarah Silverman and novelist Paul Tremblay alleged that OpenAI unlawfully used their books to train the large language model underlying the artificial intelligence tool ChatGPT.

On Monday, a federal judge in California granted most of OpenAI’s motion to dismiss many of the writers’ claims. District Judge Araceli Martínez-Olguín also said that the cases would be consolidated with a similar suit brought by another group of authors, including Michael Chabon, Ta-Nehisi Coates, and Andrew Sean Greer.

Martínez-Olguín wrote that a claim of vicarious copyright infringement was dismissed on the grounds that the authors had not shown that there was “substantial similarity” between their books and ChatGPT’s output. The authors’ claim that all ChatGPT outputs are “infringing derivative work” is “insufficient”, she added.

However, OpenAI continues to face the allegation that it violated unfair competition law by using copyrighted books without the authors’ permission. The ruling follows one in a separate case brought by Silverman against Meta over the use of copyrighted books to train its artificial intelligence tool LLaMA. In November, the judge in that case broadly sided with Meta, but the claim of direct copyright infringement advanced to the discovery phase.

Last week, the group of authors suing OpenAI in California asked Martínez-Olguín to halt a similar suit filed in New York, led by the Authors Guild and novelists including Jonathan Franzen, Jodi Picoult, David Baldacci, and George RR Martin. In their request, the California authors accused OpenAI of “forum shopping for the most favorable schedule”.

In August, it was revealed that more than 170,000 books by authors including Zadie Smith, Stephen King, Rachel Cusk, and Elena Ferrante had been used to train Meta’s LLaMA and “likely” other generative-AI tools.

In the June lawsuit filed on behalf of Tremblay and Mona Awad – who has since withdrawn from the suit – the authors’ attorney Joseph Saveri wrote that one of the “internet-based books corpora” that OpenAI said it used to train GPT-3 is estimated to contain nearly 300,000 titles, and that the only websites to offer that much material are “shadow libraries” such as Library Genesis (LibGen), through which books can be obtained in bulk via torrent systems.

Martínez-Olguín gave the authors until March 13 to amend their complaint.