It's not a lie, says generative AI: untruths, defamation and the use of ChatGPT

Simon Newcomb, Ian Bloemendal
03 May 2023
9.5 minutes


Imagine this: you are doing some research on law professors accused of sexual harassment. Rather than use a search engine, you decide to ask ChatGPT – after all, it uses what's online, and the results will be in sentences, ready for you to work on. So you ask ChatGPT, and it comes back with some paragraphs, and citations too! You work on the text and publish.

There's only one problem: innocent professors have been named, and the citations are false.

This is not some far-fetched scenario; a test query put into ChatGPT invented a sexual harassment scandal with fake news citations, falsely naming Prof. Jonathan Turley of George Washington University as the accused. When the same query was put to Bing, the Washington Post reported, it repeated the same false claim, and cited the professor's op-ed outlining his experience of being falsely accused by ChatGPT as proof.

Closer to home, a Victorian mayor, Brian Hood, is contemplating legal action after ChatGPT falsely said he was convicted of foreign bribery offences. Adding to the sting of the false accusation, Mr Hood had actually been the whistleblower who exposed others' wrongdoing. If he proceeds, it is likely to be the first defamation suit against ChatGPT for its content.

While Australian law is friendly to defamation plaintiffs, Mr Hood's chances of success are by no means clear. Why that is so, how the problem even arose in the first place, the options you have for protecting your reputation, and how you can avoid accidentally defaming someone with generative AI, are explored below.

How ChatGPT can generate a defamatory statement

Content is generated by ChatGPT using its underlying AI large language model, GPT-3.5 or GPT-4, in response to a prompt by an end user. "Like a search engine, right?", you might think. Crucially, it is not finding facts in a large database. The model has learned which words are most likely to go together, and presents those back to you.

The model is pre-trained, largely unsupervised, by processing a large corpus of existing content including the internet, books, journals, images and other content. The model is stored in a neural network (a large data structure, much like a brain, with a set of nodes and connections between them). The model does not, at least discretely, include the existing content, but rather the probabilities of sequences of words learned from the training data.

The model can in many cases effectively "store" facts because the probabilities in the model can often return a sequence of words that represent the correct facts. However, given the vast amount of content in the training data, it is also possible that the model does not properly account for some relevant facts.

The model is then fine-tuned by human reviewers. Reviewers see a sample of responses and choose a preferred response to help adjust the model to the desired behaviours. While it is possible that human reviewers were presented with the relevant defamatory material during the fine-tuning stage, it is unlikely.

When an end user prompts ChatGPT, the model aims to predict the sequence of words that should follow the prompt based on the probabilities stored in the model. Creating the response does not involve any understanding of the facts. And it also will typically not involve reproducing the specific text from the training data.
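The idea that a model stores word probabilities rather than facts can be illustrated with a deliberately tiny example. The sketch below is not how GPT models are built (they use neural networks trained on vast corpora, not simple word-pair counts); it swaps in a toy "bigram" model, and the corpus, names and function names are all hypothetical. Every sentence in the toy corpus is true within its own world, yet the learned probabilities still allow a fluent, false sentence that never appeared in the training text.

```python
from collections import Counter, defaultdict

# Hypothetical three-sentence "training corpus". Every statement in it is
# treated as true within this toy world.
CORPUS = [
    "alice reported the bribery",
    "alice was the whistleblower",
    "bob was convicted of bribery",
]

def train_bigram_model(sentences):
    """Count which word follows which, then convert counts to probabilities."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(followers.values()) for w, c in followers.items()}
        for prev, followers in counts.items()
    }

def sentence_probability(model, sentence):
    """Probability that the model generates exactly this sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob

model = train_bigram_model(CORPUS)
false_claim = "alice was convicted of bribery"

# The false sentence was never in the training text...
print("seen in training:", false_claim in CORPUS)
# ...but the model can still produce it, because each word-to-word step is plausible.
print("can be generated:", sentence_probability(model, false_claim) > 0)
```

In this toy world the whistleblower ends up "convicted" simply because the individual word transitions are each likely, which loosely mirrors how a false statement can emerge without any source ever having made it.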

There is however an important limitation with large language models. They do not always reliably recall facts, and sometimes they just make things up. And to make matters worse, they can also present the incorrect information in a very convincing way.

So it is highly unlikely that OpenAI deliberately trained or programmed the model to produce the wrong statement about Mr Hood or Professor Turley. It's most likely that the defamatory statement resulted from a limitation in the training or content generation algorithm.

One possible explanation is that the training process did not adequately train the model to account for the sequence of words needed to represent the correct facts. That could result in the model "recalling" incorrect facts or simply hallucinating (inventing) a response.

Another possibility is that the defamatory statement arose by chance. The model allows a degree of randomness in how it applies the probabilities. This is called the "temperature". A higher temperature allows more creative writing (that is, randomness) in the response that is returned. A lower temperature will tend to produce the same response each time. ChatGPT is creative, so successive attempts at the same prompt will usually return different responses. Similarly, different users may see different responses to the same prompt.
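The effect of temperature can be shown with a short calculation. The sketch below uses a common textbook formulation (dividing the model's raw scores by the temperature before converting them to probabilities); it is an assumption that this reflects how any particular product applies the setting, and the scores and the function name are hypothetical.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(scores, exps)}

# Hypothetical scores for the next word after "The mayor was ..."
next_word_scores = {"praised": 2.0, "elected": 1.5, "convicted": 0.5}

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(next_word_scores, t)
    print(t, {w: round(p, 3) for w, p in probs.items()})
```

At the low setting almost all of the probability sits on the highest-scoring word, so repeated runs give the same answer. At the high setting the unlikely word is chosen noticeably more often, which is one way a damaging statement could appear for some users and not others.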

ChatGPT vs Google vs Australian defamation law: is ChatGPT a publisher?

To consider whether there has been an actionable defamation by ChatGPT, the first key issue is whether it has published the defamatory statement.

When using ChatGPT to find information that it learned from the internet, should ChatGPT be compared to a search engine or does it publish content in its own right?

Last year Google was found not to be a “publisher” for the purposes of defamation when its search engine returned a search result that merely provided links to defamatory webpages. The High Court considered it had merely facilitated a person’s access to the contents of another webpage, which is not the same as participating in the bilateral process of communicating its contents to that person. When Google only provided information in the hyperlink that was given by the website to which the hyperlink directed and its snippets gave no further commentary or colour, it was not a publisher. However, if a search result or hyperlink contains material which would direct, entice, or encourage someone to click on the link (eg. a “sponsored link”), the person/business providing such a link could be held to be a publisher of that underlying material.

So, a search engine (or any platform for that matter, including ChatGPT) that takes an active role in creating content may be held liable in defamation.

ChatGPT is not a search engine, but it is used by people to obtain information. While it is trained on third party material (eg. from the internet), it is not directly repeating it but rather generates new content itself via the model. Any defamatory statement generated by ChatGPT arises in response to the user's prompt as a sequence of words predicted in real time by the model using a probabilistic algorithm. The model operator (OpenAI) probably has no knowledge that the particular statement was made. Nevertheless, it appears that it is at risk of being held to participate in the act of publication by facilitating or procuring the creation of the content by the algorithms employed, which could make it a primary publisher, particularly if it made the content up or rewrote other material. In contrast, if it merely repeated defamatory content published by a third party, it might argue that it should instead be viewed as a secondary publisher with an innocent dissemination defence available to consider.

Stepping away from search engines and generative AI, we can find some guidance from how the courts have considered publication of unvetted defamatory third party comments left on social media. In Fairfax Media Publications Pty Ltd v Voller, the particular question was whether media companies whose Facebook pages hosted comments by third parties were in fact publishers of the defamatory material. The High Court noted the difference between cases involving defendants who played no role in the facilitation of publication prior to becoming aware of the defamatory matter, and those who did. The Voller decision confirms in general that entities which provide the infrastructure for defamatory comments can be held liable as publishers with or without notice.

Applying Voller, it appears ChatGPT is certainly at risk of being a primary publisher of third party content. It ought to make no difference that a model operator such as OpenAI probably does not intend to make the defamatory imputation. Publication does not require knowledge, and defamation law is blind to what the author's actual intentions were (unless malice is involved, which can then aggravate the damages payable).

As noted above, an innocent dissemination defence might be available if ChatGPT could prove that it is only a secondary publisher unable to exercise editorial control over the defamatory content copied from a third party source. But where ChatGPT generates the defamatory content itself, that seems an unlikely argument. In any event, a defence of innocent dissemination is only available to the point a secondary publisher is made aware of the defamatory content and fails to address it within a reasonable period.

Looking more broadly at the evolving generative AI industry, there are many applications being created by developers that source their generative content from foundation models such as GPT. The innocent dissemination defence may be a potential defence that the application providers look to when they are passing on content generated by foundation model operators like OpenAI, Meta or Google.

Don't believe everything I say!

Let us assume that ChatGPT is a publisher. On its website, ChatGPT prominently warns users that it “may occasionally generate incorrect information.” Is that enough to avoid liability under defamation?

On its face, the answer is "no". Defamation is a tort of strict liability. Once the core ingredients of publication, identification, defamatory meaning and serious harm are established, a plaintiff is entitled to sue for defamation, and the question then becomes whether the author or publisher of the defamatory material can show that any relevant defences apply. A disclaimer is not a defence per se.

Being defamed is not enough: The need for serious harm

Once the elements of a cause of action are established in a defamation case, there is a presumption of damage to reputation and falsity. Plaintiffs must however prove “serious harm”: on the balance of probabilities the relevant publication has caused, or is likely to cause, serious harm to their reputation. Merely asserting serious harm or a tendency to harm reputation will not be enough.

For a plaintiff suing over a ChatGPT result, there is an added complication: what actually did other people see and how many people saw it? Remember, there could be a variety of responses generated to the same prompt. Some may be defamatory, and some may not. Some may be more damaging than others. Some may have a wider audience than others.

The NSW Court of Appeal has noted that where a statement has a strong defamatory imputation (such as the statements about Professor Turley and Mr Hood) and it has been shared with a wide readership or audience, it may be possible to deduce that serious harm has occurred on an "inferential" basis.

In the case of Mr Hood, he is reported as saying that he did not know the exact number of people who had read or accessed the false information about him. This means that he would need to gather evidence on the extent of publication to take the matter further, because courts will not infer that a person has identified and accessed a webpage containing defamatory content of another simply because it exists. It might be possible to get this from logs of conversations available from OpenAI. Or, in some cases, it might be possible that a sufficient number of end users would provide copies of their conversations. Either way, that evidence is probably not easy to get. And it might also not be retained for long, so a plaintiff may need to move quickly to get it. What sort of evidence will be required, and how ChatGPT's varying responses play into that, would all need careful consideration by the court.

Where the court accepts that the natural and probable consequence is that the content will be republished, the original publisher can be liable for the repetition of its original defamatory publication by others (eg. by application providers or users of the AI), including in altered form where the republication adheres to the sense and substance of the original publication.

Calculating the cost of online defamation

When a court seeks to determine the amount of damages, it seeks to ensure that there is an appropriate and rational relationship between the harm sustained by the relevant plaintiff, and the damages awarded. In addition, the NSW Supreme Court outlined the following further relevant matters to be considered in a case about online defamation:

  • the damages must provide consolation for hurt feelings, recompense for damage to reputation and vindication of the plaintiff's reputation;
  • a high value should be placed upon the reputation of those whose work and life depend upon their honesty, integrity and judgment;
  • damages should be sufficient to "convince a bystander of the baselessness of the charge";
  • the extent and seriousness of the defamatory sting should be considered; and
  • the distress of family members who were themselves upset by the defamatory publications is also relevant.

Certainly the types of statements made about Mr Hood and Professor Turley are quite serious matters. But what about the fact that many people now know that generative AI tends to invent things? The problem of generative AI hallucinating facts is becoming widespread and well known. So, does anyone really believe what it says, and does that affect its ability to defame someone? If a site notoriously lacks credibility, the court will ordinarily consider that as part of its damages assessment. The same reasoning could be relevant to the question of serious harm, and the amount of damages payable could be less if people ought to know that the statement is unreliable.

In considering the extent of publication, the court may also make allowance for the "grapevine effect". There have been a number of decisions in recent years which have taken into account the fact that the publication of material online may not be viewed as a limited publication and that the grapevine effect can be significant; even where the initial publication caused only nominal harm, that grapevine effect could in fact be enough to elevate it to serious harm.

Contractual terms limiting liability

Service provider terms such as those used by ChatGPT and other generative AI applications and foundation models often limit or seek to exclude liability (such as OpenAI's liability cap of the greater of 12 months' fees or US$100). This works between ChatGPT and the user, but not against a third party who suffers harm from the defamatory content published by the model.
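As a worked illustration of how a "greater of" cap of this kind operates, the arithmetic can be sketched as follows. The function name and the fee figures are hypothetical, and the actual terms of service (not this sketch) determine how any real cap is calculated.

```python
def liability_cap_usd(fees_paid_last_12_months_usd, floor_usd=100.0):
    """Return the greater of the fees paid over 12 months and a fixed floor amount."""
    if fees_paid_last_12_months_usd < 0:
        raise ValueError("fees cannot be negative")
    return max(float(fees_paid_last_12_months_usd), floor_usd)

# A user on a free tier (no fees paid) versus a hypothetical subscriber
# paying US$20 a month for a year.
print(liability_cap_usd(0))
print(liability_cap_usd(20 * 12))
```

For a user who pays nothing, the floor applies; for a paying subscriber, the cap tracks the fees paid. Either way, the cap only operates between the provider and its own customer.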

Terms and conditions may, however, have some relevance in contribution claims between defendants. For example, if an application (eg. a chatbot) from provider X defames someone using content created by foundation model provider Y (eg. GPT-4 from OpenAI), then both X and Y may be sued. The application provider's access to the foundation model will be governed by the foundation model's terms of use.

The Online Safety Act's safe harbour

Australia has a form of safe harbour for internet service providers (ISPs) and Australian hosting service providers (AHSPs). An AHSP is a person who provides a hosting service that involves hosting material in Australia. Under section 235 of the Online Safety Act 2021 (Cth), both ISPs and AHSPs are immune from State and Territory laws and common law and equitable principles if they were not aware of the nature of the content in question.

The protection offered by section 235 has never been the subject of consideration by the courts and the extent of the protection it gives to ISPs and AHSPs has yet to be tested.

Query whether a generative AI like ChatGPT qualifies as being an AHSP. One anticipates it may not. Further, the safe harbour defence will be unavailable to protect OpenAI or another provider if it does not host the relevant material in Australia.

Mitigating the risk of defaming someone using ChatGPT

If you are planning to generate content with an AI, what could you do to reduce the risk of defaming someone?

The simplest approach, for non-interactive systems, is to ensure that a human reviews generated content before it is released. That review should be performed with extra care, knowing that generative AI tools can tend to be convincingly untruthful.
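For teams that build their own publishing workflow, the human review step can be enforced in software rather than left to habit. The following is a minimal sketch under stated assumptions: the Draft class, the publish function and the reviewer name are all hypothetical, and a real system would also need to record who reviewed what and when.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    """AI-generated text that must be signed off by a named human reviewer."""
    text: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        if not reviewer.strip():
            raise ValueError("a named reviewer is required")
        self.approved_by = reviewer

def publish(draft: Draft) -> str:
    """Release the text only if a human has approved it."""
    if draft.approved_by is None:
        raise PermissionError("draft has not been reviewed by a human")
    return draft.text

draft = Draft(text="Example AI-generated paragraph.")
try:
    publish(draft)  # blocked: nobody has reviewed the draft yet
except PermissionError as err:
    print(f"Blocked: {err}")

draft.approve("J. Editor")
print(publish(draft))  # released only after sign-off
```

The gate does not make the review itself any more careful, but it does make it impossible for unreviewed output to be released by accident.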

Another approach is to get the facts from a more reliable source first (rather than relying on the AI to recall them from its model). One way to do that is to give the AI the correct facts in the prompt. The AI can then use its language capabilities to write the desired content using the facts that it has been given. Another way to do it is to use a tool that automatically retrieves the content from a more reliable source before giving it to the language model. For example, that approach is taken in Microsoft's new Bing Chat and also with the new ChatGPT plug-ins, where the AI can obtain the information from the web or another source before using it to write the output.
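Supplying the facts in the prompt can be as simple as assembling the text that is sent to the model. The sketch below only builds the prompt string and makes no call to any AI service; the function name, the wording of the instruction and the example facts about a fictional person are all assumptions for illustration. It reduces, but does not remove, the risk that the model adds something of its own, so the output should still be checked.

```python
def build_grounded_prompt(task, facts):
    """Assemble a prompt that asks the model to write only from the supplied facts."""
    if not facts:
        raise ValueError("supply at least one verified fact")
    fact_lines = "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Use only the verified facts listed below. "
        "If they do not cover something, say so rather than guessing.\n\n"
        f"Verified facts:\n{fact_lines}\n\n"
        f"Task: {task}"
    )

# Hypothetical facts about a fictional person, checked by a human beforehand.
verified_facts = [
    "Alex Example was elected mayor of Sampletown in 2021.",
    "Alex Example has never been charged with any offence.",
]

prompt = build_grounded_prompt("Write a two-sentence biography.", verified_facts)
print(prompt)
```

The same structure applies where the facts come from an automated retrieval step: the retrieved text simply takes the place of the hand-checked list.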

More advanced approaches could include configuring the system to produce more precise (less creative) responses or fine-tuning the underlying model on relevant subject matter for greater accuracy or truthfulness.

Clayton Utz communications are intended to provide commentary and general information. They should not be relied upon as legal advice. Formal legal advice should be sought in particular transactions or on matters of interest arising from this communication. Persons listed may not be admitted in all States and Territories.