ChatGPT and copyright: many questions remain to be answered

23/03/2023

In November 2022, the Californian company OpenAI launched ChatGPT, a chatbot based on artificial intelligence (AI) that can hold a dialogue with human users. It is a form of generative AI.

How does it work?

Generative AI always works on the same principle: an AI model analyses a large amount of data fed to it (input or training data) and generates new data (output or responses) based on that data. The data can be text, images or even sound, and each can take many forms: text, for example, can be an article, a poem or even program code.
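To make this principle concrete, here is a minimal sketch in Python. It is a toy word-chain model, not the technique behind ChatGPT, which is far larger and more sophisticated. The training sentence and the function names train and generate are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def train(corpus: str) -> dict[str, list[str]]:
    """Record, for every word in the training text, which words follow it."""
    words = corpus.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(model: dict[str, list[str]], start: str, max_words: int,
             rng: random.Random) -> str:
    """Build new text by repeatedly picking a word seen after the previous one."""
    output = [start]
    # Stop at the length limit or when the last word never had a follower.
    while len(output) < max_words and output[-1] in model:
        output.append(rng.choice(model[output[-1]]))
    return " ".join(output)


# Illustrative training data (an assumption, not taken from any real dataset).
training_data = "the court reads the contract and the court applies the law"
model = train(training_data)
print(generate(model, "the", 8, random.Random()))
```

The generated sentence only ever combines word pairs that occur in the training data, yet the sentence as a whole need not appear there. That is the sense in which the output is "new" but still derived from the input.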

Interactive Oracle of Delphi

ChatGPT is an application of generative AI. In recent months, ChatGPT has proved that it can deliver impressive results. For instance, it passed the exams of a US law school with a C+. That is certainly not outstanding, but it is enough to pass. High school students have also found the chatbot a useful tool for homework.

A distinctive feature of ChatGPT is that it uses Reinforcement Learning from Human Feedback (RLHF). This means that the AI model learns not only from the data it is fed, but also from humans who help to reduce the likelihood of harmful output. AI models trained without human intervention depend entirely on the data they are fed, which can lead to biased or harmful output. This human filtering is also the source of one of the criticisms of ChatGPT: the chatbot sometimes risks getting bogged down in rather generic output.

But this is not the only criticism of ChatGPT. It is also not clear how the chatbot arrives at its output, and the output is sometimes simply factually wrong. For example, when asked, "Can I drive with summer tyres in Belgium in winter?", ChatGPT replies, "No, in Belgium it is forbidden to drive with summer tyres in winter." A simple Google search yields links to reliable sources showing that it is not at all forbidden to drive on summer tyres during winter in Belgium. So it is dangerous to accept ChatGPT's output as correct without critical reflection.

ChatGPT's training data also runs only until 2021 (it is unclear until when exactly in 2021), which means that the AI model has not been fed more recent data for the time being. The model is apparently not connected to the internet and can therefore only generate output based on the data it was fed, although that data did originally come from the internet.

Legal concerns

Apart from these substantive criticisms, there are also several legal concerns with ChatGPT. Within the limited scope of this blog, we touch only upon some copyright concerns. For this purpose, we distinguish between input and output. Regarding input, we distinguish two types: developer input and user input (sometimes called prompts). Developer input consists of all the data used by the developer to train the AI model (the training dataset). User input, on the other hand, consists of all the data entered by the user to make use of the AI model. The output consists of the answers provided by ChatGPT.

Developer input or training data

Applied to ChatGPT, this means that developer input consists of all kinds of data used to train the AI model, but OpenAI is not entirely transparent about exactly what source material is involved. According to the OpenAI FAQ, it involves "vast amounts of data from the internet written by humans". The developers of GPT-3, ChatGPT's predecessor, state that all kinds of data were scraped from the internet (via a crawler), including Wikipedia, and that information from books was also used. ChatGPT does not copy those sources verbatim or substantially in its answers, but it does learn what answers should look like so that they appear to be written by a human. According to OpenAI, its use of copyrighted material is legitimate.

Does ChatGPT infringe the copyright of the author of a book that was used as a source? If the text from the book is not copied in the answer, there is no infringement. However, the author's permission may be required when pieces of text from the book are copied. This will have to be assessed on a case-by-case basis. In generative AI, the user has no influence on the developer input (only the developer, OpenAI, has), but the user does have influence on the user input. In the case of ChatGPT, this means, for example, that the user decides which question to ask the chatbot.

User input

When providing user input, the user must take into account, among other things, the ChatGPT terms of use. These state, for example, that the user must not use the service in a way that would harm the rights of third parties. This means that user input must not contain copyrighted data without the permission of the author or rights holder, and that the user must not enter confidential data without good reason.

The terms of use also state that if you process personal data when using the service, you must have given the necessary notices and obtained the necessary consents, and that you certify that the processing complies with applicable laws. The terms of use also provide that the user must contact OpenAI to enter into a processing agreement if the processing is subject to the GDPR or the California Consumer Privacy Act (CCPA). For professionals bound by professional secrecy, such as lawyers, it goes without saying that they should not use information subject to professional secrecy as input. The reason for all these restrictions on user input is obvious.

After all, user input is not only used to generate output, but is also reused to improve the service, and thus refine the AI model. The user input ends up on a big pile of data and the user then basically loses control over that data. OpenAI is reportedly working on a professional paid version of ChatGPT, but it is still unclear whether professional user input, which may also contain confidential professional data, will also be used to improve the service. Organisations using the ChatGPT API can, however, request OpenAI not to use their input data to improve the services. But for ordinary users, such an opt-out does not currently seem possible.

Copyright

So while user input must comply with certain rules and not infringe third-party rights, it is extremely difficult, or almost impossible, for an author or rights holder to take action against copyright infringement via ChatGPT. Indeed, a given output generated on the basis of infringing user input may also have been generated on the basis of non-infringing user input.

Indeed, it is not the case that the same user input always leads to the same output. In other words, the same question can generate a different answer each time. And conversely, the same answer may be the result of different questions. The ChatGPT terms of use do mention the submission of a DMCA notice, namely a notification under US law that certain content infringes the applicant's copyright, but such a notice seems to relate to OpenAI's website rather than to user input. Without the (voluntary or forced) cooperation of OpenAI, there is no way for an author or rights holder to know which (potentially) infringing user input has been entered into the chatbot, because user input is visible only to the user who provided it and not to other users.
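The relationship between questions and answers described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that each answer is drawn at random from a weighted set of candidates; the prompts, the answers and the probabilities are invented and do not reflect ChatGPT's actual behaviour.

```python
import random

# Hypothetical answer probabilities per prompt (illustrative numbers only).
ANSWER_WEIGHTS = {
    "Who owns the output?": {"The user.": 0.6, "It depends.": 0.4},
    "Is the output protected?": {"It depends.": 0.7, "Possibly.": 0.3},
}


def answer(prompt: str, rng: random.Random) -> str:
    """Draw one answer at random, weighted by the assumed probabilities."""
    options = ANSWER_WEIGHTS[prompt]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


rng = random.Random()
# Asking the same question many times yields more than one distinct answer.
print({answer("Who owns the output?", rng) for _ in range(200)})
# A different question can produce an answer the first question also produced.
print({answer("Is the output protected?", rng) for _ in range(200)})
```

Under these assumptions, an answer such as "It depends." cannot be traced back to one specific question, which mirrors the evidentiary problem a rights holder faces.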

ChatGPT's answers

Generative AI generates a certain output based on developer input and user input. But who can assert rights over these AI-generated results? The ChatGPT terms of use state that the user becomes the owner of all rights that apply to the output: "[...] OpenAI hereby assigns to you all its right, title and interest in and to Output." So this also means that the user may use the output for commercial purposes. But this can lead to special or strange situations in practice.

After all, it is possible for two users, independently of each other, to generate the same output. Assuming that this output is copyrightable, which of the two users owns the copyright to it? Perhaps the conclusion should be that each author has their own right to what is admittedly the same output, provided they arrived at it independently of each other. But what if the output generated corresponds to the work of an author who did not use ChatGPT? Does this make a difference?

Suppose ChatGPT arrives at exactly the same text as a pre-existing text that was not generated by ChatGPT (irrespective of whether this is due to an error in the AI model or pure coincidence). The only difference between the two authors is that one has used a technical tool, namely an AI model, and arrived at the output through some user input, while the other has written the work himself using his own creative mind. This leads to the following questions:

Is one author more entitled to that output than the other?
Does it matter in what way they arrived at that output, namely through an unpredictable AI model or through their own creative mind?
Must the ChatGPT user's input itself meet the requirements for copyright protection before he can claim copyright protection for the output, since the output is the consequence of the input?
Is it sufficient that the output itself is copyrighted independently of the input?
Does the ChatGPT user infringe the non-user author's copyright if his input was not copyrighted?

After all, one of the conditions for copyright protection is that the work must be the result of a creative activity. Typical things that are not protected by copyright are works produced exclusively by a machine (e.g. satellite images) or works not created by a human (e.g. a selfie taken by a monkey).

Neither case applies here, as the output is not generated without human input. And what then is the difference with two photographers both independently taking the exact same picture using a technical device, i.e. a camera in this case? Should we approach this as a separate case because the output with a camera is perfectly predictable, while the output with ChatGPT is unpredictable? And does that unpredictability have a bearing on copyright protection? Many of these questions cannot be answered in black and white and the actual facts and evidence will be the deciding factor.

ChatGPT is not an author

While the user is given the rights to the output, they must not give the impression that they generated the output themselves when they did not. The ChatGPT terms state, "You may not [...] represent that output from the Services was human-generated when it is not". The OpenAI Content Policy takes this even further by encouraging users to proactively mention the use of AI. Does this mean mentioning ChatGPT as an author too? The answer to that question is negative, as ChatGPT is not a natural person and therefore cannot be an author.

So OpenAI or ChatGPT should not be listed as author or co-author (although OpenAI does encourage users to disclose the use of AI). But what if a ChatGPT user puts only his own name (as author) on a text generated entirely by ChatGPT, without having made any changes to the output? Does that user then breach the ChatGPT terms of use? Should this be considered as giving the impression that the text is "human-generated", while it is entirely AI-generated? The ChatGPT user is not an author either if he did not put his own creative stamp on the text. If, however, the user modifies ChatGPT's response, makes his own creative choices and thus creates an original work bearing his personal stamp, then the user is the author of that work. So again, this is a question of fact.

Even before ChatGPT, this question existed. For instance, there are various tools for developing software code. There are specialised websites that contain scripts or answers for programmers and even AI-based tools that automatically make suggestions while writing code to complete the code.

Liability

What if certain output leads to damage, for example because the output is factually incorrect? OpenAI holds users fully liable for the use they make of the output: "You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms." The ChatGPT terms of use also limit OpenAI's liability to USD 100 or the amount the user paid in the preceding 12 months, whichever is greater. OpenAI does not provide an indemnity for third-party claims against the user. It is also currently unclear whether the professional paid version of ChatGPT will provide such an indemnity, although future large professional users may well want one.

Conclusion

The copyright implications are not yet fully clear. Naturally, we put the question to ChatGPT itself, but we got (you guessed it) different answers to the same question. Perhaps there is an AI lawyer behind ChatGPT who answers: "It depends!"

Update: in the meantime, OpenAI has launched a paid Plus version of ChatGPT, but it does not seem to come with different terms and conditions, for example in terms of liability.
