New York Times v. OpenAI & Microsoft: a battle between copyright and innovation?

By Eleni Tournaviti, Nikos Grigoriadis & George Pantazis

The case

The New York Times (NYT) filed a lawsuit against OpenAI (owner of ChatGPT) and Microsoft (owner of Copilot) on December 27, 2023, after several months of unsuccessful negotiations between the parties. The dispute concerns the use of copyrighted NYT material by the tech companies to train their AI models without proper licensing.

In short, NYT accused the two companies of copyright infringement, alleging that they used millions of copyright-protected NYT articles to train their automated chatbots, essentially turning those chatbots into a source of competition for NYT. Moreover, according to NYT, this use reduces web traffic to NYT internet properties and damages NYT’s ability to exploit its content, depriving it of significant advertising, licensing, and subscription revenues, as NYT content can now be accessed, copied, and displayed by readers for free. Last but not least, the lawsuit also emphasizes the implications for journalism and democracy, declaring, among other things, that “Independent journalism is vital to democracy” and “If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill.” Apart from monetary compensation “for damages of billions of dollars”, NYT asks the Court, among other things, to order the destruction of “all GPT or other LLM models and training sets” that use copyrighted NYT material.

The legal issue

The NYT lawsuit raises complicated legal issues regarding the interaction between AI technology and copyright law. Given the undisputed fact that original NYT content was indeed used to train the chatbots, the crucial legal question is whether such use constitutes copyright infringement or simply amounts to “fair use”.

According to the doctrine of “fair use”, limited utilization of copyrighted material is allowed, without obtaining permission from the copyright owner, for certain purposes, including criticism, commentary, news reporting, teaching, scholarship, or research. In determining whether a particular use of a copyright-protected work is “fair” under the doctrine, the Court considers the following factors:

(a) the purpose and character of the use (i.e. commercial, non-profit, educational etc.);

(b) the nature of the copyrighted work;

(c) the amount and substantiality of the portion used in relation to the copyrighted work as a whole;

(d) the effect of the use upon the potential market for, or value of, the copyrighted work.

Regarding the application of the fair-use factors to the NYT case, the following points are noted:


As for the first factor, it is important to determine whether the material has been used to help create something new or has merely been copied verbatim into another work. In the present case, NYT included in the lawsuit several examples where ChatGPT replicated paragraphs from NYT articles almost verbatim, demonstrating the reproduction of original NYT content. Moreover, NYT cites nearly 100 articles in the lawsuit that were copied, memorized, and reproduced by the chatbots, providing evidence that they were not merely used for training purposes, as the large tech companies claim.

It is significant, but not decisive, whether the alleged infringer is a profit-making entity or not. While OpenAI is ostensibly non-profit, it has ballooned to a $90 billion valuation due to the for-profit subsidiary that develops and monetizes ChatGPT. This diminishes the likelihood that OpenAI’s use of NYT articles is “fair”.

The second factor most likely favors OpenAI, as NYT’s reportage and stories are mostly based on facts (i.e. factual works) rather than fiction. More leeway is given to the use of works that disseminate news and information.

However, it is difficult and perhaps premature to conclude whether the third and fourth factors are likely to favor NYT or the tech companies.

As for the third statutory fair-use factor, the question that arises is the following: did the tech companies use an extensive amount of NYT’s content? The lawsuit, as noted above, includes concrete examples of ChatGPT selectively copying significant extracts or entire articles from NYT. NYT also highlights the “qualitative value” of the portion of its content used by the tech companies to train their chatbots, arguing that, even if that portion could not be shown to constitute a “large part” of the overall training data, NYT content was more important and “uniquely valuable” for training the models than content from other sources.

As regards the fourth factor, another point that could indicate copyright infringement is that the chatbots in question bypass paywalls, accessing and disseminating content that is normally unavailable without a subscription. In addition, the lawsuit presents examples where ChatGPT was used to generate summaries of articles located behind a paywall. While the summaries themselves do not infringe copyright, the fact that they derive from articles inaccessible without payment potentially demonstrates a further commercial harm to NYT, which could also affect the determination of “fair use”.

It is, however, unclear to what extent the marketability of NYT content is actually reduced. Although NYT emphasizes the influence of its content’s distinctive style on the AI models, there is, at least so far, no published evidence demonstrating any material, measurable relationship between ChatGPT’s profitability and NYT-derived content.

It should be noted, though, that these factors are only guidelines, which courts are called upon to apply on a case-by-case basis. In other words, the Court has a considerable degree of discretion (which inevitably involves subjective judgments shaped by personal experience and intuition) when ruling on a fair-use case, and thus the outcome may be difficult to anticipate.

Notable points

Significant agreements between media companies and generative AI companies:  Meanwhile, some prominent media companies have already reached agreements with generative AI companies. For instance, OpenAI reached an agreement with Axel Springer, which operates in more than 40 countries around the world (Bild, Die Welt, Politico, etc.), while the Associated Press (AP) concluded an agreement with OpenAI licensing part of its text archive; according to the announcement, AP will also leverage OpenAI’s technology and product expertise. Shutterstock also recently signed an agreement with OpenAI. Such agreements demonstrate the commercial significance of media content to AI systems, which use it to train their models. However, these deals also reveal that the sudden ‘invasion’ of search engines still hurts news and content companies. Although the commercial and financial terms of these agreements have not been disclosed, their existence suggests that the stakes are undoubtedly high for all content providers, AI companies and, of course, their respective investors.

Copyright Compliance Certification – Can this be a solution?  A very recent development, according to Bloomberg (January 18, 2024), is the creation of a sort of ‘fair trade’ label for AI. A non-profit organization named ‘Fairly Trained’ will evaluate and certify AI products as copyright-compliant, offering a stamp of approval to AI companies that submit details of their models for independent review. It was founded by Ed Newton-Rex, who resigned in November as vice president of audio at Stability AI, citing concerns over AI “exploiting creators.”

Conclusion

The legal battle is of paramount importance for several obvious reasons. Before judgment, the Court has to examine and weigh many considerations, including:

(a) whether, and to what extent, the respective NYT articles are copyright protected as “factual works” and, therefore, whether the unlicensed use of journalistic work can be deemed fair;

(b) whether, under these circumstances, such use could be captured by “fair use” or whether licenses would need to be obtained, depending on the kind of uses made by the AI companies;

(c) the necessity for generative AI companies to use original NYT content for the evolution and improvement of their chatbots (such as ChatGPT);

(d) the balance between the protection of original works and technological innovation, ensuring that the growth of AI does not occur at the expense of copyrighted works and the laws protecting them;

(e) the valuation of OpenAI (more than $90 billion), even though it operates as a non-profit organization; and

(f) the importance of protecting original works through copyright law for independent journalism and its value to democracy, as well as for the survival of subscription-based media such as NYT.

As technology evolves and AI continues to need original content for its training and development, the battle between copyright protection and innovation remains as critical as ever. It is almost certain that in the near future this battle will extend to the application of the essential facilities doctrine to AI and Big Data; however, that is the subject of another article. Irrespective of the outcome, it is certain that the judgment (if one is ultimately issued) will help regulate the market. It could also prompt other media companies, content providers, and publishers either to initiate proceedings against tech companies or to enter into collaboration agreements with them.

While it is likely that copyright law will eventually adapt to the emerging reality of new technologies, only time will tell, and the unfolding events are bound to be interesting from a legal perspective.

* Eleni Tournaviti, Nikos Grigoriadis & George Pantazis are Lawyers and IPR experts (www.nglaw.gr)