In November 2022, the Californian company OpenAI launched ChatGPT, a chatbot based on artificial intelligence (AI) that can hold a dialogue with human users. It is a form of generative AI.
Generative AI always works on the same principle: an AI model analyses a large amount of data fed to it (input or training data) and generates new data (output or responses) based on that data. The data can consist of text, images or even sound, and can take many forms: text, for example, may be an article, a poem or even program code.
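The principle can be illustrated with a toy example (a deliberate simplification, nothing like the scale or architecture of ChatGPT itself): a tiny model that learns from input text which word tends to follow which, and then generates new text from those learned patterns.

```python
import random
from collections import defaultdict

# Toy "generative model": learn from the training data (input) which
# word follows which, then generate new text (output) from those patterns.
def train(text):
    model = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, start, length=8, seed=0):
    random.seed(seed)  # fixed seed so the illustration is repeatable
    out = [start]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

training_data = (
    "the model learns from data and the model generates new data "
    "from the patterns it learns"
)
model = train(training_data)
print(generate(model, "the"))
```

The generated sentence is new, yet every word and transition in it comes from the training data, which is exactly the point: the output is derived from, but not identical to, the input.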
ChatGPT is an application of generative AI. In recent months, ChatGPT has proved that it can deliver impressive results. For instance, it passed the exam of a US law school with a C+ grade: certainly not outstanding, but enough to pass. High school students have also found the chatbot a useful tool for homework.
A unique feature of ChatGPT is that it uses Reinforcement Learning from Human Feedback (RLHF). This means that the AI model learns not only from the data it is fed, but also from humans who lend a hand to reduce the likelihood of harmful output. AI models trained without human intervention depend entirely on the data they are fed, which can lead to biased or harmful output. This is also one of the criticisms of ChatGPT: due to this filtering, the chatbot sometimes risks getting bogged down in rather generic output.
But this is not the only criticism of ChatGPT. It is also unclear how the chatbot arrives at its output, and the output is sometimes simply factually wrong. For example, when asked, "Can I drive with summer tyres in Belgium in winter?", ChatGPT replies, "No, in Belgium it is forbidden to drive with summer tyres in winter." A simple Google search yields links to reliable sources showing that it is not at all forbidden to drive on summer tyres in winter in Belgium. It is therefore dangerous to simply accept ChatGPT's output as correct without critical reflection.
ChatGPT's training data also only runs until some point in 2021 (exactly when is unclear), which means that the AI model has not been fed with more recent data for the time being. ChatGPT's AI model is apparently disconnected from the internet and can thus only generate output based on the data it was trained on, which admittedly came from the internet.
Apart from these substantive criticisms, there are also several legal concerns with ChatGPT. Within the limited scope of this blog, we only touch upon some copyright concerns below. For this purpose, we distinguish between input and output. Regarding input, we distinguish two types: developer input and user input (sometimes called prompts). Developer input consists of all the data used by the developer to train the AI model (the training dataset). User input, on the other hand, consists of all the data entered by the user to make use of the AI model. The output consists of the answers provided by ChatGPT.
Applied to ChatGPT, this means that developer input consists of all kinds of data used to train the AI model, although OpenAI is not entirely transparent about exactly what source material is involved. According to the OpenAI FAQ, it involves "vast amounts of data from the internet written by humans". The developers of GPT-3, ChatGPT's predecessor, state that all kinds of data were scraped from the internet (via a crawler), including Wikipedia, but that information from books was also exploited. ChatGPT does not copy those sources verbatim or substantially in its answers, but it does learn what answers should look like, so that its answers appear to be written by a human. According to OpenAI, it uses copyrighted material legitimately.
Does ChatGPT infringe the copyright of the author of a book that was used as a source? If the text from the book is not copied in the answer, there is no infringement. However, the author's permission may be required when pieces of text from the book are copied; this will have to be assessed on a case-by-case basis. In generative AI, the user has no influence on this developer input (only the developer, OpenAI, has), but the user does have influence on the user input. In the case of ChatGPT, this means that the user decides, for example, which question to ask the chatbot.
User input is not only used to generate output, but is also reused to improve the service and thus refine the AI model. The user input ends up on a big pile of data, and the user then essentially loses control over that data. OpenAI is reportedly working on a professional paid version of ChatGPT, but it is still unclear whether professional user input, which may also contain confidential professional data, will likewise be used to improve the service. Organisations using the ChatGPT API can, however, request that OpenAI not use their input data to improve the services. For ordinary users, such an opt-out does not currently seem possible.
So while user input must comply with certain rules and must not infringe third-party rights, it is extremely difficult, if not almost impossible, for an author or rights holder to take action against copyright infringement via ChatGPT. Indeed, a given output generated on the basis of infringing user input may equally have been generated on the basis of non-infringing user input.
It is, after all, possible for two users, independently of each other, to generate the same output. Assuming this output is copyrightable, which of the two users owns the copyright to it? Perhaps the conclusion should be that each author holds their own copyright in what is admittedly the same output, provided they arrived at it independently of each other. But what if the generated output corresponds to the work of an author who did not use ChatGPT? Does that make a difference?
Suppose ChatGPT produces exactly the same text as a pre-existing text that was not generated by ChatGPT (irrespective of whether this is due to an error in the AI model or pure coincidence). The only difference between the two authors is that one has used a technical tool, namely an AI model, and arrived at that output through some user input, while the other has written the work himself based on his own creative mind. This raises several questions.
One of the conditions for copyright protection, after all, is that the work must be the result of a creative activity. Typical examples of works not protected by copyright are those produced exclusively by a machine (e.g. satellite images) or those not created by a human (e.g. a selfie taken by a monkey).
Neither case applies here, as the output is not generated without human input. And how does this differ from two photographers independently taking exactly the same picture using a technical device, in this case a camera? Should we treat this as a separate case because the output of a camera is perfectly predictable, while the output of ChatGPT is unpredictable? And does that unpredictability have a bearing on copyright protection? Many of these questions cannot be answered in black and white; the actual facts and evidence will be the deciding factor.
While the user is given the rights to the output, they must not give the impression that they generated the output themselves when they did not. The ChatGPT terms state: "You may not [...] represent that output from the Services was human-generated when it is not". The OpenAI Content Policy goes even further by encouraging users to proactively mention the use of AI. Does this mean ChatGPT should be mentioned as an author too? The answer is no, as ChatGPT is not a natural person and therefore cannot be an author.
This question existed even before ChatGPT. For instance, there are various tools for developing software code: specialised websites that contain scripts or answers for programmers, and even AI-based tools that automatically suggest completions while code is being written.
The copyright implications are not yet fully clear. We obviously put the question to ChatGPT, but we got - you guessed it - different answers to the same question. Perhaps there is an AI lawyer behind ChatGPT who answers: "It depends!"
Update: OpenAI has meanwhile launched a paid ChatGPT Plus version, but it does not seem to come with different terms and conditions, for example regarding liability.
Do you still have questions or would you like an introductory meeting? Book a free 15-minute call with Bernd at bernd.lawyer.brussels (reserved for organisations).