Courts Rule in Favor of AI Leaders

Two federal district courts found that training large language models on copyrighted books is fair use

In June, AI leaders Meta and Anthropic each prevailed in court cases alleging they had violated copyright law by training their large language models (LLMs) on copyrighted books without permission. These are regarded as among the first court rulings on copyright and LLMs, and they may set the stage for future court decisions on LLMs and AI more broadly.

In 2024, authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson alleged that Anthropic used millions of copyrighted books without permission to train its Claude chatbot. Senior U.S. District Judge William Alsup ruled that Anthropic's use of copyrighted material fell under fair use, a doctrine that permits the limited use of copyrighted material without permission from the copyright owner, typically for purposes that benefit the public (e.g., criticism, comment, news reporting, teaching, scholarship, or research). Specifically, Alsup said Anthropic's use of copyrighted material was "exceedingly transformative." He also ruled, however, that many of the books Anthropic used to train Claude were not purchased and that a separate trial should be held to determine damages, which could be extensive.

Separately, authors Ta-Nehisi Coates, Michael Chabon, Junot Díaz, and Sarah Silverman, among others, filed a similar suit against Meta. In that case, U.S. District Judge Vince Chhabria ruled in favor of Meta, stating that its use of copyrighted works also fell under fair use, in part because the use was "transformative" and in part because the plaintiffs failed to present evidence that Meta's Llama would produce countless works that could dilute the market for copyrighted books. The judge also stated, however, that similar suits could be decided differently if plaintiffs presented more evidence that the use of their work affected its market performance.

Organizations developing AI software should consider several factors related to fair use, especially when incorporating copyrighted material into training data sets. The Copyright Act lists four factors to consider in fair use cases:

The purpose and character of the use, including whether such use is of a commercial nature or for nonprofit or educational purposes;
The nature of the copyrighted work;
The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
The effect of the use on the potential market for or value of the copyrighted work.

As the courts found in the cases above, using copyrighted material as training data for AI software development can be considered intrinsically transformative, for example, when the material is used not for its original purpose (e.g., entertainment) but to identify and extract embedded patterns and structures. As for the nature of the copyrighted material, whether the material is factual or the result of creative expression can affect a determination of fair use.

Consideration must also be given to whether the secondary use of copyrighted material would harm the market for the works (e.g., by reproducing a substantial portion of the original copyrighted material, affecting the market for licensing copyrighted material, or diluting the market for copyrighted books). Furthermore, the means by which copyrighted material was acquired for model training matters (e.g., acquiring copyrighted material through pirated copies, shadow libraries, or torrents may give rise to copyright infringement claims).

The U.S. Copyright Office has been analyzing the copyright law and policy issues raised by AI. In addition to the outcome of any appeals in the Anthropic and Meta cases discussed here, several other cases alleging copyright infringement by proprietary AI software remain in litigation. Every court ruling will continue to shape the copyright infringement landscape that all AI service providers will need to navigate.

Determining whether a software platform or an LLM engine infringes copyright requires a multidisciplinary team that includes subject matter experts who understand the development, operation, and usage of the software platform and how it compares to other technologies in the existing marketplace. Exponent's experts in AI and LLMs have vast experience analyzing the acquisition, storage, and use of training data; dissecting and analyzing the underlying algorithms and source code of software systems; and performing extensive testing of software systems to verify and validate performance and operational claims.

What Can We Help You Solve?

With expertise in software development, AI systems, data collection and analysis, and litigation support, Exponent's consultants are well-positioned to provide expert consultation on intellectual property disputes, including copyright infringement, and to assess products and advise on potential fair use exposure.
