First wave of copyright lawsuits decided in favor of AI companies
One of the big legal questions of our time: To what extent may the creators of Large Language Models (LLMs) use copyrighted content to train their AI models? As has been widely reported, companies such as OpenAI, Stability AI, and Meta Platforms have been inundated with lawsuits in recent months from authors, artists, and newspapers who believe their copyrights have been infringed.
According to a report by Dave Hansen and Yuanxiao Xu of the US-based Authors Alliance, copyright lawsuits against AI companies have not been particularly successful so far. “Over the past year, two dozen AI-related lawsuits and their myriad infringement claims have wound their way through the court system. None of them have yet reached a jury trial,” they summarize.
According to Hansen and Xu’s analysis, most claims brought under the Digital Millennium Copyright Act (DMCA) have been dismissed to date, including in the following cases:
- J. Doe 1 v. GitHub
- Tremblay v. OpenAI
- Andersen v. Stability AI
- Kadrey v. Meta Platforms
- Silverman v. OpenAI
One problem with these copyright lawsuits: the plaintiffs have so far been unable to show that AI outputs automatically constitute copyright infringement, because they “cannot provide concrete evidence that an output is substantially similar to an adopted work.” In the lawsuit filed by the artists Sarah Andersen, Kelly McKernan, and Karla Ortiz against the makers of the image generator Stable Diffusion (Stability AI, DeviantArt, and Midjourney), for example, no “substantial similarities” between the artists’ works and the AI-generated images could be demonstrated.
Even in the lawsuit filed by Sarah Silverman, the well-known US comedian, against OpenAI, it could not be sufficiently proven that the texts ChatGPT produces are similar to Silverman’s books.
“Massive infringements in the training of generative AI”
Legal experts are now intensively examining whether and how AI models infringe copyright, not only in the USA but also in Europe. A new study entitled “Copyright & Training Generative AI – Technological and Legal Basics” by Prof. Dr. Tim W. Dornis (University of Hanover) and Prof. Dr. Sebastian Stober (University of Magdeburg), however, concludes that European copyright law does apply to AI training and that it is not merely a matter of text and data mining, for which exceptions exist.
“As a closer look at the technology of generative AI models reveals, training such models is not a case of text and data mining. It is a copyright infringement – there is no valid limitation in sight under German and European copyright law,” says Dornis. Current generative models memorize parts of their training data, in whole or in part, and these can therefore be regenerated, and thus reproduced, with appropriate prompts from end users, says Stober.
“There is no appropriate copyright exception or limitation to justify the massive infringements that occur during the training of generative AI. This includes copying of protected works during data collection, full or partial reproduction within the AI model, and reproduction of works from the training data initiated by the end users of AI systems such as ChatGPT,” according to a summary of the study’s findings.