Generative AI Miniseries - Opportunities and risks for Australian organisations

03 Feb 2023

Ep1: Generative AI & intellectual property – whose line is it anyway?


In this first episode of our Generative AI Miniseries, host Will Howe (Director of Data Analytics) speaks with Simon Newcomb (Partner, IP & Technology) to discuss the complex issues surrounding IP ownership and infringement – asking: Whose line is it anyway?

This series takes a deep dive into a number of topics related to generative AI and its applications, as well as the legal and ethical implications of this technology, and provides practical takeaways to help you navigate what to expect in this fast-evolving space.





Transcript

William Howe (WH)

Hi and welcome to the Clayton Utz Generative AI Podcast Miniseries. We'll have a range of experts from different subject areas to cover this topic in detail. We know there's a lot of discussion around this topic, so rather than covering it in generalities we'll do a series of deep dives so you get some practical takeaways. I'm your host, Will Howe. I lead Clayton Utz's data analytics capability, and today I'm joined by my guest Simon Newcomb. Simon is a lead intellectual property and technology lawyer in the firm, and he also serves in a number of leadership roles here, including on our Cyber Leadership Board. He's also a computer scientist by training, so this is not just theoretical for Simon: he knows what's going on in the machine as well as what's going on inside the law. In this session we'll cover three main areas: ownership of what actually comes out of this technology, the potential for infringement, and where the law is today and where it's all headed. So Simon, there's a lot going on, but can we start with why we're here and what's changed?

Simon Newcomb (SN)

Right, well, there was certainly a lot of interest in some of the new generative AI technologies that came out last year, like Stable Diffusion and DALL-E 2, which are image generators. A lot of people thought they were really cool and interesting, and for some people it was right at the heart of their personal and professional lives, but for a lot of people it was more just kind of interesting or cool. Then over the Christmas holidays, around the end of November and through December, something really big happened, and that was the release of ChatGPT. That has brought generative AI out into the public for written works, and the thing is, everybody writes. So I think it's ignited interest on a much bigger scale, because everyone can see how this affects them in their personal and professional lives, and that's what sparked the massive amount of interest in this topic.

WH

Well, that's a good point. We're now using this to create a huge amount of new content, and some of it is great, some of it's not. But eventually, if not already, some of that content is going to become a very important part of our personal and work lives. Can you touch on who actually owns this stuff?

SN

Right, so let's start with the position in Australia, because it is different in different countries. The answer is generally no, you can't own the output of generative AI. Copyright is the main type of IP right that usually protects this type of content, and by that I mean text, art, images, music or films. But the thing is that our Copyright Act requires a human author for copyright to subsist, so where a work is created by generative AI without a human author, there's most likely no copyright. Computer-generated works have been denied protection by our courts before, so it's not like this is the first time we've ever thought about this. Some years ago Telstra tried to protect its phone directories, the Yellow Pages and the White Pages, by copyright, and it went to court over that. The court said there was no copyright, because there wasn't sufficient independent intellectual effort of a literary nature exercised by a human author: what had happened is that software essentially produced the directories Telstra was claiming copyright in, so Telstra wasn't able to enforce that. Another case, Acohs v Ucorp, involved safety data sheets for hazardous materials, and it's a similar kind of idea: the computer generated the web pages, producing HTML code that was then formatted into the safety data sheets. The company claimed copyright in that, and the claim was rejected by the court because there was no human author; the computer had produced it. So it's fairly clear at the moment that under our Copyright Act there's not going to be copyright in purely AI-generated works.

WH

I mean, that seems pretty cut and dried, but there's an element here about the prompt. When a human being is working with these tools, you get dramatically different results depending on what prompt you put in. So does that prompt potentially give you an element where there is a human author?

SN

Right, so the whole notion of prompt engineering is now a phenomenon in itself. There are prompt engineers and prompt databases where people store the best prompts. Those prompts, the text you give to the AI, will in most cases, except for the simplest of them, be small literary works themselves, and they will be protected by copyright. So the prompts themselves are protected, but that doesn't mean the output is, because copyright protects the expression rather than the idea. Giving an idea to an AI is not enough; the human needs to be part of the actual creation of the work, not merely supply the idea the AI responds to. It is possible that a prompt could contain such detailed human instruction, so specific about how to generate the output, that it contributes enough human creativity for the result to qualify for copyright. In that case you have to think of the AI more like a tool or an extension of the human author, rather than something generating the work itself.

WH

Okay, and is that just how we view it here in Australia, or what's going on in some of the other countries on this topic?

SN

Right, so there are some very different positions in other countries. Some countries, like the UK and New Zealand, expressly recognise computer-generated works in their legislation. They get around the issue of there being no human author by deeming the author to be the person who undertakes the arrangements necessary for the creation of the work. Now, in a generative AI context, that's potentially ambiguous in how you apply it. Is it the programmer? Is it the person who trained the AI? Or is it the end user who then prompts the AI? So I think more interpretation is going to be needed to understand quite how that works in the context of generative AI. In the context of, say, a video game, there was a case in the UK where it was held that the graphics generated by the video game were owned by the programmer. There are also some similar laws in other countries; it's not just those two. So there's a group of jurisdictions, if you like, that recognise computer-generated works.

WH

And so that's the UK jurisdiction.  What about the US?

SN

Right, so the US is somewhat similar to us, I guess, in that there's a requirement for a human author, and so generally speaking, on my understanding of US law, there would be no copyright in the US. Having said that, there is a very recent case, filed just on the 10th of January, where this has been challenged. A person called Dr Stephen Thaler is suing the US Copyright Office, claiming that a generative AI he created, called the Creativity Machine, produced a quite beautiful image called "A Recent Entrance to Paradise". He argues that the AI was the author of the work, and that he is the owner of the copyright under the work-for-hire doctrine, because the AI was working for him. This, to my knowledge, is the first case of its kind. Dr Thaler is rather a pioneer of litigation over the status of AI in relation to intellectual property, so it may be that part of his objective in running this case is to draw attention to the issue, as he has in some other fields, which we might talk about a bit later. For now, the legal principle is that AI is not recognised as a legal person: it can't be sued, it doesn't have legal rights, and it can't own property, including intellectual property.

WH

Okay, so we've covered copyright, and I think you've given some really good perspectives on where we are here in Australia and around the world. Can we move to patents? What if the AI comes up with something that is truly inventive? Can you get a patent for that?

SN

Right, so in Australia the position is pretty similar to copyright, and the answer is pretty clearly "no". Something invented by an AI is not going to work under our law, because you need a natural person, a human, as the inventor. And again, I mentioned Dr Stephen Thaler: he also tried to register two patents in Australia. One was for a food container with a kind of fractal shape; another was for a flashing light beacon. Both, he said, were invented by his generative AI, a system he calls DABUS. Now, pretty amazingly, in a world first, he actually won at first instance and convinced the Court that the AI was the inventor, which did require some stretching, I guess, of the reading of our patents law. That was appealed and overturned on appeal, and when he then tried to take it up to the High Court, the High Court refused to hear the case. So that's now pretty final: it's clear in Australia that on our current law there are no patents for inventions by generative AI. Now, interestingly, Dr Thaler is running that same case around the world, and as I understand it his application has been rejected in 18 jurisdictions, including the US and the UK. He's continuing to litigate some near-identical cases to the one he ran in Australia, so some of those cases are still going. I think the UK case, on appeal to the Supreme Court, is due to be heard in March this year, so people will be watching that pretty closely.

WH

Well, that also seems fairly straightforward, and it's nice that in these few areas there's some certainty in the law. Can we move to a place where I think there's a bit more uncertainty, which is the infringement side? One of the big issues, as we and the industry and really the whole world play with this technology, is that people are asking themselves: is this legal? Are we going to get sued for this? So what's happening in the infringement space?

SN

Right, well, I think it is a big issue, and I think it helps to consider when the main acts of copying happen here. In a very broad sense there are two stages: firstly training a model, and secondly using the model to generate content. So firstly, on the training side, models are created by being trained. The way that works is that they're given some data as an input and then trained to learn what a desirable output is for that particular input, and over huge numbers of cycles of that they learn to adjust their neural nets, which are almost like our brains, full of neurons and synapses, connections essentially. That's how they're able to deal with many different situations they've never seen before. That process of training involves huge amounts of data. A large language model like ChatGPT uses many, many terabytes of data, called a corpus, to train the model. Now, from what we know, GPT-3, which sits under ChatGPT, was trained on lots of websites scraped from the internet, user-generated content like Reddit, which has the advantage of being upvoted and downvoted, Wikipedia, and also some books and journals. The problem is that within all of that content there's a lot of material created by people who own copyright in it, and it's subject to all sorts of different terms that may or may not allow use in this way; in fact some of them may expressly prohibit it. So the question here is: does the AI company have the right to copy that material for training the model?
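The training loop Simon describes, showing the model an input, nudging it toward the desired output, and repeating over an enormous corpus, can be illustrated with a deliberately tiny sketch. To be clear, this is not how GPT-3 actually works (that involves transformer networks and gradient descent over billions of parameters); it is just a toy bigram model, with an invented three-sentence corpus, that "learns" which word tends to follow which:

```python
from collections import defaultdict, Counter

def train(corpus):
    """For each word, count which word follows it and how often."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            model[current][nxt] += 1
    return model

def predict(model, word):
    """Return the most frequently observed next word, if any."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

# A toy "corpus"; real models train on terabytes of text.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train(corpus)
print(predict(model, "the"))  # "cat" (observed twice, more than any alternative)
```

Even this toy shows the tension Simon goes on to describe: the trained model stores only aggregate statistics derived from the corpus, not the sentences themselves, yet those statistics exist only because the corpus was copied and processed during training.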

WH

So that touches on training. You also mentioned that on the generation side there are some potential liability issues for infringement. What's happening in the generation space?

SN

Right. So once you've got the model, you use the model to generate content. This is tricky, because once the model is trained it doesn't directly incorporate the training corpus; that's just used for training the model. So individual copyright works are not, at least, discretely stored in the model. It's more like the way things are stored in our brains, in neurons and synapses connecting things, a massive web of associations. A lot of the time, when you go to create content, that's not going to matter, because you're generating something different to anything that's been around before, and there's no issue of copyright infringement there. But large language models and other generative AI can actually recall material they've been trained on, or something close to it. Try asking ChatGPT for song lyrics and it will be able to tell you a lot of them: it can tell you the chorus, and it can probably tell you most of the song. What I've found is that toward the end of the song it often starts to make up new verses, and that's pretty fun, but it's close enough to the original song that, compared side by side, it would probably be an infringement. The fact it's gone through this model, though, really complicates the copyright analysis.

WH

That's really fascinating. Is anyone actually suing these generative AI companies right now?

SN

Right. So this is a very recent development, and yes, there are. There are three pretty significant actions that have just recently started. The first is the class action against Microsoft, the owner of GitHub, which you would know as a very large repository of source code, and its business partner OpenAI, which is responsible for ChatGPT. These models are not only great at writing literary, human-readable text; they're also great at writing software code, which is also human readable but serves a different purpose. What's happened is that the model has been trained on huge amounts of code taken from GitHub, and the developers are claiming there's been an infringement there, or at least a violation of rights management information, in doing that.

WH

This is an interesting area because I know, Simon, you're also a computer scientist.  How would you feel if your code was then sucked up into one of these tools?

SN

Right. Well, I think most people who put their code on GitHub generally do it as part of a community, where they're well intentioned about developing the science and the art. So I would probably be okay with that if I was releasing my code on GitHub. In my view, anyway, it's a fairly open platform.

WH

Interesting.  But then when it does spit it out verbatim, that's the interesting point about attribution, isn't it? 

SN

Right.

WH

Some people are saying "at least acknowledge that it's my code".

SN

Right. Right. Well, that's it. A lot of the licences would require attribution, and that hasn't happened, so that's really the basis of the litigation here: the developers aren't being attributed. The licences are quite permissive otherwise; the MIT open source licence allows you to do just about anything. I think that's probably why the case isn't claiming straight-out copyright infringement, because in a lot of cases the use probably is actually licensed. Now, whether anyone actually thought about that when they were putting their code up there is another issue, but the attribution point is probably the key point of that case.

WH

That's the GitHub side. What about the image generation side?

SN

Right. So there's another case that's been brought by three visual artists against Stability AI, the maker of Stable Diffusion, along with Midjourney and DeviantArt, which all offer generative AI image tools. The artists are basically alleging that the defendants misused their work by sucking up huge numbers of images, including theirs, to train their models without permission, and are using the models to generate new and, in some cases, infringing works. One of their main concerns is that these tools allow people to create images in the style of a particular artist, and they see that as potentially damaging to their livelihoods and their reputations. The music industry, as far as I understand, is very concerned about the same thing: you could, say, generate music in the style of a particular artist rather than waiting for the next album to come out. Now, whether you think the results are any good is a different question, but it would certainly be a concern.

WH

We've talked about GitHub and Stability AI. There was another case on the go you were telling me about as well, Simon?

SN

Right, this one is actually a new case. Getty Images, just very recently on 17 January, commenced proceedings, again against Stability AI, one of the image generator companies, basically saying that Stability AI has used a huge number of Getty's images without the right to do so. In fact, Getty even says, "We've actually got a licence available, a type of licence for training AIs, and you haven't used it." So they're alleging infringement in the creation of the Stable Diffusion product.

WH

Yeah. I understand the creators involved in this process are understandably not happy with some of this, and so are turning to the courts for a remedy. How are these cases going to play out, and what do you see being involved in them?

SN

The US cases are probably going to turn on the doctrine of fair use, which is a pretty broad exception that allows the creation of transformative works: works that are really something new and don't do a lot of harm to the original source work. You can see the argument in the AI context that that's what's happening here, that training models on an existing corpus of data creates something new, and query what effect that has on the original works. This has been used by the tech companies successfully before. You might remember that back in 2015 Google scanned millions of books to create full-text search of books, and it was able to rely on the fair use defence in that case. The UK case will probably be different. The UK has an exception in its legislation for text and data mining for non-commercial purposes, so that may be relevant to the Getty Images case. Query whether the use is non-commercial, but that may come into play. This leads to an interesting and very controversial practice that's emerging, where AI companies have been funding non-profits or academic institutions, under the banner of research, to create and train models. It's being alleged, essentially, that they're doing that to come within these copyright exceptions that enable the model to be trained without infringing copyright, and the suggestion is that once that's done, the AI company can take the model and commercialise it. That practice is being called data laundering. It's untested at the moment, and there's a lot of controversy around whether it's actually effective to avoid copyright infringement.

WH

It feels like there's a lot of nuance here in terms of these defences. Now, you've talked about what's happening in the US and the UK. Do we have anything like this here in Australia?

SN

No, we don't, not directly. We don't have a fair use or a text and data mining exception in Australia. What we have is a thing called fair dealing, and our law allows fair dealings in quite specific and narrower situations. Probably the most relevant one is fair dealing for the purposes of research or study. Now, that could possibly apply to allow the training of a model, but to my knowledge it hasn't been tested whether this type of activity would qualify as research, as opposed to something more like a factual inquiry into a subject, and whether training an AI in a broad way like this would count. The dealing also has to be fair. That looks at things like the nature and purpose of the dealing, whether it's commercial or non-commercial, and the effect it has on the original work. So in some senses, even though it's under a different law, some of the big issues being considered in those US cases will, I think, be relevant in the Australian context too. What that might mean, given we have a narrower law here, is that there's greater emphasis on getting licences, or on using content in the public domain, if you want to train generative AI models.

WH

That point about generating, creating and training these models sounds really important if you're based in Australia, which obviously we are. But does it apply here? Are we going to be training any of these models in Australia, do you think?

SN

I think we will. Probably the first thing to say is that training large language models, as I understand it, is extremely expensive and capital intensive, in terms of computing capacity, a huge number of human trainers and so on. So the industry will probably end up with a small number of very large players, like Microsoft, OpenAI, Google and Meta, and possibly a small number of others, including nation states; China has one, I understand. You'll have these very large language models, but it doesn't stop there, because you can build on top of them. Large language models like GPT allow you to create a new, specific layer on top of the general model that provides better knowledge or accuracy on a particular topic, and that's called fine-tuning the model. That's going to allow all sorts of organisations to build more specialised AI, while carrying all the existing natural language capability up into their specialised models. Say you're a government department or a bank and you want to produce a chatbot for your customers that's really easy to talk to, like ChatGPT, but that also knows a lot more about your specific products, services and regulatory environment. Rather than create your own large language model from scratch, which would be prohibitively expensive, you could fine-tune an existing model. So I think in Australia and globally we'll see a lot of that over time, both by organisations themselves and by product and service providers with niche offerings to make available to them, and that activity involves the same issues in training a model and brings in all the IP rights issues we've been talking about.
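As a loose illustration of the layering Simon describes, keeping the general model fixed and training a thin specialised layer on top, here is a toy in plain Python. The "base model" and the domain data are invented for illustration; real fine-tuning of a large language model adjusts network weights by gradient descent at vastly greater scale, but the shape of the idea is similar: the base stays general, and the added layer captures the specialist behaviour.

```python
def base_model(x):
    """A frozen 'general' model: a fixed mapping we never retrain."""
    return 2.0 * x + 1.0

def fine_tune(examples, steps=2000, lr=0.01):
    """Learn a small correction layer y = a * base_model(x) + b on
    domain-specific examples, leaving base_model itself untouched."""
    a, b = 1.0, 0.0  # start as a pass-through layer
    for _ in range(steps):
        for x, target in examples:
            pred = a * base_model(x) + b
            err = pred - target
            a -= lr * err * base_model(x)  # gradient of squared error w.r.t. a
            b -= lr * err                  # gradient w.r.t. b
    return a, b

# Hypothetical domain where outputs run at 10x the general model's scale.
examples = [(x, 10 * base_model(x)) for x in range(5)]
a, b = fine_tune(examples)
print(round(a, 2))  # 10.0: the layer has learned the domain's scaling
```

The legal point carries over: the fine-tuned layer is trained on its own corpus (the domain examples), so the training-data rights issues Simon describes arise a second time, at the fine-tuning stage.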

WH

From what we've discussed already, Simon, it's clear there are some challenges in how copyright and patent law apply to this here in Australia. Now, this feels to me like technology disruption in some other spaces; in the mobility space, notably, there's been a lot of disruption over the last 10 years, and that's one area where organisations moved a little ahead of the law and the consequences flowed afterwards. For this, do you see any moves to change the law here in Australia to actually catch up?

SN

I think that's a good observation. This does feel similar, in that there's some disruption here where the companies are moving ahead of the law, and potentially that will create consumer demand: we all get used to doing things in a particular way, and the law then needs to catch up to allow society to operate in an ordered way with this new disruptive technology. Of the things we've talked about today on the IP issues, I think there's absolutely potential for law reform. On the ownership of AI-generated works, this was actually considered nearly 30 years ago by the Copyright Law Review Committee, back in 1995, which recommended that we give copyright protection to computer-generated works in Australia. That didn't go anywhere at the time, but I think it's a good time to be having that debate again, as the stakes are going up considerably now with generative AI. More recently, the World Intellectual Property Organization has been asking its members some pretty deep philosophical questions about whether we should have this type of protection for AI-generated works, and frankly there are good arguments for and against; it's not an easy call. Advocates say it will drive greater investment and innovation in the space because it gives commercial incentives. Opponents say mass-produced AI-generated works could flood the market, devalue works, devalue humanity in a way, and reduce the rewards humans get for artistic expression and talent. Now, Australia so far has been pretty non-committal, so I think more debate is really needed on that point. On the training and infringement front, there's been some debate too, and firstly we can look overseas for guidance on what we might do. The US has that fair use exception. The UK, which is quite pro-AI, has those text and data mining laws, and it's actually looking at the moment at whether it would be a good idea to change them so they're not limited to non-commercial purposes, so they can go further. Australia has actually looked at both those options. In 2013 the Australian Law Reform Commission released a report that said we should either adopt a fair use exception or, failing that, a new exception that allows these types of uses, like text and data mining. Again, that didn't go anywhere at the time, so again I think it's time to be re-engaging in this debate.

WH

Lots of interesting things for our policy-making community to be thinking about, I suspect. It's an interesting point about 1995, and obviously hindsight is 20/20. I was also just reflecting on the state of computing at that time, the era of the famous, probably apocryphal, line that nobody would need more than 640 kilobytes of memory. These big models are trained on thousands of GPUs, each with gigabytes of VRAM, so it's a very different age: the technology was present, but this was not exactly foreseen. Speaking of the future, a lot of companies are getting into this now and trying to look into the crystal ball a little, so what advice would you give to firms that are starting out on this generative AI journey?

SN

Right, well, I've put together a list of my top 10 points for managing the IP issues around generative AI, and I can group those, I suppose. The first group is what you really need to do straight away if you're thinking about starting to use generative AI. Point 1 is to tell your staff not to put confidential information into ChatGPT, because it's a research preview and inputs are not kept confidential. We sent an email to all of our staff telling them not to put confidential information in when experimenting with it. Point 2 is to get your head around the basic principles of how IP works in the generative AI context, and you might need to do that in more than one jurisdiction if you operate in several. We've talked about copyright and patents, but there are also some particular issues for other types of IP: if you're using ChatGPT to help you brainstorm ideas for trade marks there are some issues there, and if you're wanting to protect confidential information there are additional issues there. Point 3 is that if you really need to own copyright or a patent in something, it's probably a good idea not to use generative AI for the time being. An example might be a software company, where it's critical that you own the copyright in your source code; using generative AI could, over time and at scale, create questions about whether you actually do own the copyright in your software. If that's important, you might also want to flow that down into your service contracts to prohibit it in outsourced development. Now, that is obviously going to involve a trade-off between cost efficiency and the benefit of IP ownership. Point 4 is to make sure you don't falsely claim to own copyright in AI-generated content when it doesn't exist.

In some cases that could expose you to a claim for unjustified threats, a breach of warranty in a contract, misleading conduct or even fraud. Point 5 is to understand the risks of infringing third party IP when using generative AI. That's probably something you'd do in the context of your specific application, looking at how the risks are mitigated or transferred. Some of the tools are already starting to try to manage this: Copilot, before it generates content for you, will now search GitHub to see whether it's a close match to something that's already there, and some of the image generators are doing something similar. Point 6 is to understand the terms of service of the generative AI and whether they work for you. They can be quite different: for example, ChatGPT's terms assign IP to the user, while Stable Diffusion's have the user abandon IP and dedicate the output to the public domain. In Australia maybe that's not such a big issue where there isn't likely to be IP anyway, but in other jurisdictions it would be. Point 7 is to think about whether you want to stop AI companies training their models on your content. Set up appropriate terms on your website or wherever you make your data available, and look at the technology measures you use to regulate access. Some of the AI companies are also creating pathways for content creators to ask for their content to be removed or not used in training. Now, the next group of issues is for new businesses. Point 8: if you're thinking about building a fine-tuned model on top of something like GPT, then you really should have a pretty clear strategy for clearing the rights to use the training data, and for protecting the resulting model that comes out of that training.

There are going to be lots of opportunities for business and government to re-use data they hold. For example, we've just done an agreement for a government department to licence copyright in x-rays to CSIRO, to create software that helps in diagnosing a particular type of lung disease. That's going to require an agreement in that case, and we'll see licences, deeds and that sort of thing. Point 9: if you're looking to invest in an AI business, then I think you need to understand how these issues affect the valuation of that business. What IP will be created? How is it protected? Is there an infringement risk against the business that might devalue it? You also need to understand the limitations of Australian copyright law in owning databases, because that is fundamentally what a trained model is. The law is that, generally, databases that are just the result of computer generation are not subject to copyright, so more likely the strategy would be to protect them as confidential information. The last point, point 10, is that there are lots of other issues to manage in addition to IP: liability and reliance around inaccurate content, privacy for data ingested into the model, and compliance, procurement and contracting issues. As you said at the start, we're planning some further episodes bringing in other experts to talk about some of these issues. So lots of other things to think about in addition to IP.
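On point 7 above, the technology measures for regulating crawler access, one common mechanism is a robots.txt file at your site root. As a sketch only (CCBot is the user-agent of Common Crawl's crawler, a widely used training-data source; check each AI company's own documentation for the user-agent names it honours):

```
# robots.txt
User-agent: CCBot
Disallow: /
```

Note that robots.txt is advisory: compliant crawlers honour it, but it is not an access control, which is why it sits alongside the website terms and contractual measures Simon mentions.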

WH

Simon, thank you. That was fantastic; really appreciate you sharing your insights. On our own journey here at Clayton Utz we're building solutions with this technology as well, and your insights have been really important in working through some of these issues as we go. So thank you for that, and listeners, thank you for joining us today. As Simon mentioned, we've got a lot of great new content coming, with other experts sharing different specific angles, so we'll see you on the next one.

 

Disclaimer
Clayton Utz communications are intended to provide commentary and general information. They should not be relied upon as legal advice. Formal legal advice should be sought in particular transactions or on matters of interest arising from this communication. Persons listed may not be admitted in all States and Territories.