#142. 💥 Boom! AI training is fair use!

💡Storing scraped works indefinitely in a digital library, however, is not.

This is educational material; it does not constitute legal advice, and no attorney-client relationship is created by this article. If you have any legal questions, you should contact and engage an attorney. No warranties, express or implied, are made with respect to its accuracy. Information contained herein, or information relied upon, is subject to change without notice.

The first major judicial ruling on the question of whether scraping copyrighted material to train AI large language models (LLMs) qualifies as fair use just dropped:

AI training is “exceedingly transformative,” “spectacularly so,” and is “among the most transformative many of us will see in our lifetimes.”

Part of the inquiry hinged on the actual accumulation (scraping) of the works and, crucially, their storage in a large central digital library; the other part focused on the question of fair use.

While the former was unquestionably held to infringe copyright, potentially exposing Anthropic, maker of Claude, to some $1,050,000,000,000 (that’s one trillion fifty billion dollars!) in damages, the latter turned on the four factors of the fair use test under Section 107 of the Copyright Act, namely:

(1) The purpose and character of the use (whether it’s commercial or educational);

(2) The nature of the copyrighted work (whether it’s creative or factual);

(3) The amount and substantiality of the portion used;

(4) The adverse impact on the market.
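For a sense of where a trillion-dollar exposure figure can come from, here is a back-of-the-envelope sketch. The $150,000 per-work cap is the statutory ceiling for willful infringement under 17 U.S.C. § 504(c); the seven-million-work count is an illustrative assumption for this sketch, not a figure taken from the opinion:

```python
# Rough sanity check of potential statutory-damages exposure.
# ASSUMPTION: ~7 million works in the library -- illustrative only,
# not a number stated in the court's decision.
STATUTORY_MAX_PER_WORK = 150_000   # willful-infringement ceiling, 17 U.S.C. § 504(c)
ASSUMED_WORKS = 7_000_000          # hypothetical library size

exposure = STATUTORY_MAX_PER_WORK * ASSUMED_WORKS
print(f"${exposure:,}")  # → $1,050,000,000,000
```

Even at the non-willful statutory minimum of $750 per work, the same library size would imply billions in exposure, which is why the library-storage holding matters independently of the fair use win.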

As likewise argued in my recent paper on whether AI training should fall entirely outside copyright’s domain (pre-print published on SSRN on June 12, 2025; accepted and officially posted June 23, 2025, ironically the same day this decision dropped), training an LLM on scraped material is indeed an entirely transformative use, and hence there can be no copyright infringement. I also argued that infringement requires some “volitional” act, not merely “intent,” something indisputably lacking from Anthropic with respect to training the models. (Hat tip to Stanford’s Mark Lemley for his initial read and guidance on this crucial point, and whose Texas Law Review paper, Fair Learning, is precisely on point.)

And while my paper argued to wholly separate AI training from the question of copyright infringement, I simultaneously argued that fair use was a necessary fallback, a failsafe of sorts. In sum, I agree 100% with the court’s finding and hope this will help guide the court’s imminent decision in the still-pending NYT v. OpenAI case.

This is an incredibly profound victory for the further development of AI technology as a whole, while providing the parallel guarantee that creators’ works cannot be infringed wholesale with abandon.

