AI training and the global battle over copyright

The Invisible Fuel of AI

When we use tools like ChatGPT, Gemini, or Midjourney, it often seems like magic. A simple text input transforms into complex essays, poems, or photorealistic images. But this “artificial” creativity doesn’t come from nothing. It’s based on a foundation of human creativity: the collective knowledge and output of humanity available on the internet.

Large language and image models are only as good as the data they’re trained on. To build these models, companies like OpenAI, Google, and Stability AI have “read” the internet on a petabyte scale—books, articles, blog posts, artwork, photos, and code.

This very process—the training—has unleashed a global legal and ethical avalanche. The core question: Is it legal to use copyrighted works without permission and without compensation to train a commercial AI that might ultimately replace the copyright holders themselves?

The technical problem: How an AI “learns”

To understand the legal debate, one must understand the technical process. Training an AI is not a simple copying process of the kind familiar from everyday computing.

Data Collection (Scraping): First, enormous amounts of data are collected. For text models, this often happens through “web scraping,” where bots automatically crawl the public internet and save texts (e.g., in the “Common Crawl” dataset). For image models, datasets like “LAION” were used, which index billions of images from the web together with their text descriptions.

Training (The “Learning Process”): The AI doesn’t “read” this data like a human. Instead, it analyzes statistical patterns, correlations, stylistic features, and semantic relationships. It learns “which word is most likely to follow another” or “which pixel patterns are associated with the word ‘cat’.”

The Result (The Model): The final product—the AI model—is a gigantic neural network consisting of billions of “parameters” (mathematical values). These parameters represent the learned knowledge. The model does not contain the original works themselves, but rather the patterns it has abstracted from them.
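The idea of learning “which word is most likely to follow another” can be illustrated with a toy example. The sketch below is a deliberate simplification and not how production models are built: real language models use neural networks with billions of parameters, whereas this one only counts word pairs. The function names and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def most_likely_next(model, word):
    """Return the word most frequently seen after `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, standing in for scraped data.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

print(most_likely_next(model, "the"))  # cat
```

After training, the “model” consists only of counts (its parameters), not of the sentence itself. With so little training data, however, much of the original text could be reconstructed from those counts, which hints at why the question of memorized works comes up later in the debate.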

AI companies argue: “It’s like a person going to a library, reading thousands of books, and then learning to write themselves. The person isn’t copying the books; they’re learning.”
Authors argue: “No, it’s like exploiting thousands of books to create a new one, without asking or paying the authors.”

The legal fronts: “Fair Use” vs. “Text and Data Mining”

The legal battle is being fought on two main fronts with different types of weapons, primarily in the US and the EU.

A) The US Front: The “Fair Use” Doctrine

In the US, the decisive factor is the “fair use” doctrine. It permits the use of copyrighted material under certain circumstances. Whether something qualifies as “fair use” is determined by four factors:

Purpose and character of the use: (Often treated as the most important point) Is the use “transformative”? Does it create something new with a new purpose, or does it simply replace the original?
Nature of the copyrighted work: (Creative works enjoy more protection than factual texts).
Amount and substantiality of the portion used: (In AI training, entire works are typically used, not just excerpts).
Effect on the potential market: (Does AI harm the market for the original work? Yes, say the artists, because AI replaces them).

AI companies say: Yes, it is highly transformative. An AI model is not a book or a collection of images, but a completely new tool that has learned patterns.

Creators say: No, it is not transformative if the result (e.g., an image in the style of artist X) directly competes with the work of artist X.

B) The EU Front: The “Text and Data Mining” Limitation

In the EU, the legal situation is less flexible and more strongly regulated by directives. The relevant one here is the Directive on Copyright in the Digital Single Market (DSM Directive, (EU) 2019/790). It contains specific exceptions (limitations) for “Text and Data Mining” (TDM).

TDM for Research: TDM (i.e., the automated analysis of data) is generally permitted for scientific research purposes.
Commercial TDM: (This is where it gets complicated) TDM is also permitted for commercial purposes (such as training ChatGPT), BUT copyright holders can object by expressly reserving their rights (an “opt-out”).

This “rights reservation” (opt-out) must be machine-readable, e.g., through an entry in a website’s robots.txt file or in the metadata. However, many AI companies have argued that they collected data before this regulation was clear or before the copyright holders knew they had to object.
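How a crawler might honor such a robots.txt entry can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration under stated assumptions: the crawler names `ExampleAIBot` and `SearchBot`, the URL, and the helper `may_crawl` are hypothetical, and whether a given robots.txt entry counts as a legally valid rights reservation is a question for courts, not for this code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article"))  # False
print(may_crawl(ROBOTS_TXT, "SearchBot", "https://example.com/article"))     # True
```

The check is voluntary on the crawler's side: robots.txt expresses the publisher's wishes but does not technically prevent access, which is one reason the opt-out is hard to enforce.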

The major lawsuits: Who is fighting whom?

These theoretical conflicts are currently being played out in practice through multi-billion-dollar lawsuits.

Authors vs. OpenAI (e.g., Authors Guild, George R.R. Martin): Authors accuse OpenAI of illegally using their books to train ChatGPT. They argue that the AI can now write summaries of their books or even sequels in their style, directly infringing on their rights.

Artists and image agencies vs. Stability AI (e.g., Sarah Andersen, Getty Images): Image generators like Stable Diffusion were trained on billions of images. Artists are suing because the AI has “learned” their unique style and can now create works “in their style” at the touch of a button. Getty Images even found remnants of its watermark in AI-generated images, which it cites as evidence that its database was used.

Publishers vs. AI (e.g., The New York Times vs. OpenAI/Microsoft): This is perhaps the strongest claim. The NYT argues not only that its articles were used for training, but also that the AI (ChatGPT/Bing) can now regurgitate its articles almost verbatim. This undermines its subscription model and constitutes direct competition.

The “output problem”: When the AI spits out the original

Even if the training were considered legal (e.g., “transformative”), there’s a second copyright issue: the output.

What happens if the AI generates a result that is “substantially similar” to an existing work? For example:
Midjourney creates an image that is almost identical to a photograph by a specific photographer.
ChatGPT spits out code that has been copied verbatim from a GitHub page (including the original programmer’s comments).
An AI generates music that clearly contains the melody of a copyrighted song.

Such cases can amount to classic copyright infringement. The problem is proving it: How can an artist show that the AI produced a similar image because it was trained on their work, and not “accidentally”? The New York Times argues it has a strong case here, because its complaint documents specific examples of this “regurgitation.”
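One simple way to look for verbatim “regurgitation” in text is to search for the longest run of consecutive words that an original and a generated passage share. The sketch below uses Python's standard `difflib` module. It is only an illustration: the two sample sentences are invented, the function name is hypothetical, and a long overlap is an indicator worth examining, not a legal finding of infringement.

```python
from difflib import SequenceMatcher


def longest_shared_passage(original, generated):
    """Return the longest run of consecutive words found in both texts."""
    original_words = original.split()
    generated_words = generated.split()
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = SequenceMatcher(None, original_words, generated_words, autojunk=False)
    match = matcher.find_longest_match(
        0, len(original_words), 0, len(generated_words)
    )
    return " ".join(original_words[match.a:match.a + match.size])


# Invented sample texts for illustration only.
original = "The court ruled on Tuesday that the company must disclose its training data"
generated = "According to reports, the company must disclose its training data soon"

shared = longest_shared_passage(original, generated)
print(shared)               # the company must disclose its training data
print(len(shared.split()))  # 7
```

In practice, a reviewer would choose a minimum length (for example, several dozen words) below which overlaps are treated as coincidence; that threshold is a judgment call and not something the code can decide.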

Solutions and the future of copyright

The status quo is a “Wild West” scenario that is unsustainable. Various solutions are currently being discussed and some are already being implemented:

Licensing models (The “Axel Springer approach”): More and more publishers and rights holders are entering into licensing agreements with AI companies. OpenAI, for example, pays Axel Springer (Bild, Welt) and the Associated Press (AP) for the legal right to use their (current) content for training. This ensures that the AI is trained with high-quality data and that the copyright holders receive compensation.

Strict opt-out systems: The idea of opting out could become the standard. Platforms like DeviantArt have already introduced switches that allow artists to exclude their work from AI training. The problem: It is difficult to control and does not apply retroactively to models that have already been trained.

Transparency Obligations (The “EU AI Act”): New regulations like the EU AI Act aim for transparency. Providers of general-purpose AI models will be required to publish a summary of the content they used for training. This gives copyright holders at least the opportunity to assert their rights (e.g., to compensation).

Training with “Clean” Data: Some companies (e.g., Adobe with “Firefly”) are taking a different approach. They train their models exclusively with data that they themselves have licensed (e.g., from their own Adobe Stock database) or that is in the public domain. These models are legally “clean,” but often less powerful than their competitors trained with the “entire internet.”

Conclusion

The conflict between AI developers and creators is more than just a legal battle. It’s a fundamental negotiation about the value of data and creativity in the 21st century.

Courts and legislators face a difficult balancing act: How can they foster innovation without undermining the rights and economic livelihoods of the creators whose work makes that innovation possible in the first place? The rulings in the coming years will forever change the digital economy and the way we create and consume content.
