Exploring the Intersection of AI and Copyright Law: The Legal Battle Over AI Training Datasets

In the early days of July, a group of authors launched a legal case claiming copyright infringement. They assert their copyrighted works were unlawfully used as part of the training data for AI technologies like ChatGPT. This legal issue could have far-reaching implications, potentially reshaping the relationship between artificial intelligence and copyright law.

This case, led by comedian and author Sarah Silverman, prompts us to examine the fair use doctrine of U.S. copyright law, which allows limited use of copyrighted material without permission from the rights owner.

Here, clarity becomes elusive. Fair use is not a checklist; courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount used in relation to the work as a whole, and the effect of the use on the market for the original. The open question is whether training an AI model on text qualifies as fair use, and in particular whether that use can be seen as transformative, adding new value or meaning to the original content.

It’s important to remember that AI platforms like ChatGPT don’t typically reproduce books word-for-word. They generate new content based on patterns recognized in the training data, so the phrases and sentences they produce are generally not direct copies from copyrighted books. That complicates any claim of infringement. While it’s my view that this case may not stand up in court, the final decision rests with the judiciary.

The concept of imitation is central to human learning. Similarly, the essence of intelligence, whether organic or artificial, is based on recognizing patterns and applying them creatively. AI technologies, including ChatGPT, learn from their environment – in this instance, vast textual datasets – and mirror the patterns discovered. This ability allows AI to generate text that is eerily human-like despite the lack of consciousness or inherent creativity.

The ongoing lawsuit challenges this viewpoint, arguing that AI’s method of learning, which involves reading, processing, and extracting patterns from a multitude of texts, constitutes a breach of copyright law. Essentially, it suggests that an AI model infringes upon an author’s copyright by reading and integrating an author’s book into its larger dataset.

This viewpoint isn’t unique to Silverman; Shutterstock, a provider of royalty-free images, operates under a similar premise. Its business model recognizes the value of copyrighted work in AI training: contributors are compensated when their intellectual property is used to train Shutterstock’s AI models or when generated assets are licensed.

In essence, proponents of this perspective argue that AI shouldn’t use copyrighted works freely without permission or compensation. They contend that even if AI doesn’t reproduce the works exactly, training AI on these texts still takes advantage of the author’s creativity, skill, and effort. They suggest AI is using copyrighted works to enhance its capabilities rather than simply learning like a human.

These proponents see the Shutterstock model as a viable alternative that respects authors’ rights while still allowing AI training. They believe this revenue-sharing model could resolve the issues raised by the intersection of AI and copyright law. The model proposes a new category of use in which AI training and output are not considered fair use but are instead treated as a form of derivative work warranting author compensation.

However, this approach poses a serious practical challenge. The sheer volume of data consumed in training AI models makes tracking down each copyright owner and negotiating usage terms practically impossible. Attempting it could lead to exorbitant costs for AI developers, potentially inhibiting innovation and affecting publicly available models.

An alternative solution starts from the fact that AI learns from our collective pool of knowledge, and treats AI itself as part of a shared human heritage. On this view, access to AI could be free and unrestricted, with providers charging only for computational resources. Critics argue that this eliminates the financial incentive for development. Yet the existence of numerous open-source AI models suggests otherwise, demonstrating that people will build AI for its value even without that incentive.

If Silverman and her co-plaintiffs succeed, we could see an onslaught of litigation, potentially stifling AI’s progress. However, a victory for authors over commercial AI service providers might not be the end of the matter. The rise of open-source AI software democratizes the technology, allowing anyone with a computer and internet access to build AI models with little practical exposure to litigation. In such a scenario, open-source models could replace mainstream AI technologies, and authors like Silverman may win significant monetary damages yet ultimately lose the broader battle against AI.

Editorial Team

Disclaimer: The views expressed in this feature article are those of the author. This is not meant to be an advisory to purchase or invest in products, services or solutions of a particular type, or those promoted and sold by a particular company, its legal subsidiary in India or its channel partners. No warranty or any other liability is either expressed or implied.
Reproduction or Copying in part or whole is not permitted unless approved by author.

