All right, we've got a lot to talk about. Technically we start in one more minute, but I think I'm going to get started. Hi, everybody. My name is Van Lindberg. To give you a little background, I've been around open source and coding for 25 years, I've been a lawyer for 20, and I've been doing both the whole time. So when I come at this, I come at it both from the perspective of someone who has been doing technology for a long time and someone who has been doing law.

Let's start by talking about a few terms. This is the AI and data track, but this is a legal foundations presentation, so I expect I've got two distinct groups here: AI devs or data scientists who want to learn a little about the law, and lawyers who want to learn a little about AI. And I'm surely going to disappoint both of you. This talk is intermediate because I'm going to try to go into a reasonable amount of depth on both the tech side and the legal side, and that's one reason I got started a minute early. For anyone who is an AI expert: I've tried to be as complete and correct as possible while creating a description of AI that is suitable for lawyers. And for anyone who's a copyright expert: I'm sure I'll be talking about cases you already know and issues you're already familiar with. It's really in the coming together of the two that this becomes interesting.

Perhaps at some point AI law will become a well-developed area. Right now it is the Wild West. It changes almost every week, literally, on both the legal side and the technical side, and both sides are essential to getting things right, to understanding how to apply the law to these various AI questions. Unfortunately, most legal analyses in this area are either incomplete or inaccurate in the way they describe the technical underpinnings of what's going on, and so they arrive at unsteady or incorrect analogies and analyses. I see this in law review articles, I see this in lawsuits, I see this all over the place.

I also want to specify what I mean when I say AI in this presentation. I'm specifically referring to machine learning, or ML, and more specifically to generative ML. That's what you all want to hear about anyway, but it's important to understand that I'm not talking about expert systems or regression or even plain inference. Those are all fun to play with, but I'm really going to focus on the questions that come up with generative ML. I'll also largely be referring only to advances in ML that have happened over the past five years, and really only over the past 18 months to two years. Machine learning as a technique is not new. The foundations were laid in the 1960s, and working systems were created in the 1980s. What has really changed in the past 18 months is scale and quality. Data scientists have been able to harness the increasing capacity of computers and the explosion of digital content to create programs that can rival humans in their output, and it's only because of that jump in capability that we're talking about this at all.

Also, for anyone who's interested in the long version: this talk is only 40 minutes, and a lot of what I'm covering here, though not all of it, is covered in this upcoming article.
What I've got here is a pre-print version, and if anyone wants to send me feedback over the next week, I've got one week to finalize it.

So, after all that preamble, let's begin. We're going to talk today a lot about models: how we get them, what they are, how they're used. I find that understanding models is really at the core of the problem in people's understanding. When people say the word model, and here I'm really talking to the lawyers in the audience, they effectively mean a magic black box that confirms their pre-existing point of view. But if we really understand how models work, then we can actually start to apply the correct legal analysis to the facts.

Before diving into the mechanics of machine learning training and models, there's an analogy that I think is helpful in developing a mental model of how machine learning training works. Imagine a newly hired art inspector whose job is to examine every painting in the Louvre. The inspector has no background or experience in art, and no preconceived ideas of what's good, what's bad, or what's significant about any particular painting, what makes a Picasso a Picasso, for example. Without any guidance, the art inspector studies each painting by measuring everything about it. Everything: the number of brushstrokes, the paint thickness, the average space between brushstrokes, the size of the painting, where the artist happened to sign their name, how many bumps there are in the signature. Anything that can be measured, he measures. And the inspector measures even aspects that seem bizarrely random or unimportant, such as the number of consonants in the artist's name or the relationship between colors that are six inches apart. Nothing is left untouched. And then he records it all in his database.

Now, as the inspector studies each painting, he gets a little bored, and he decides to make his job more interesting by playing a game: he turns each measurement into a guessing game. Before he makes a measurement, he tries to predict what the answer will be, using the information he's gathered already. How many brushstrokes are in this painting? Well, it's a Rembrandt from the middle third of his career, so I'd say 87. Then he checks the measurement and sees whether he was right or wrong, and how close he was. As he starts out, his answers are usually wrong, but as he studies more and more paintings, he becomes more and more correct. And after studying thousands or millions of paintings, this fictional art inspector becomes the world's foremost authority on validating paintings. He's regularly asked his opinion: who painted this? And what he does is run his game in reverse. Instead of using what he knows about a painting to predict the measurements, he takes the measurements and uses them to predict the painter.

Training a machine learning model is very similar to the art inspector's process. In both cases, the basic steps are the same: you receive an example, you predict the relationships between the various elements of the example, you check the result, you adjust to improve your ability to predict, and then you repeat, over and over again.
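To make that predict-check-adjust loop concrete, here is a minimal toy sketch in plain Python. Everything in it is invented for illustration: one made-up measurement, a single weight, a hand-picked learning rate. Real training adjusts billions of weights, but the loop has the same shape.

```python
# A toy version of the inspector's guessing game: learn to predict one
# measurement (brushstroke count) from another (canvas area).
paintings = [(1.0, 50.0), (2.0, 110.0), (3.0, 140.0)]  # (area, strokes)

weight = 0.0           # a clean slate: the model starts knowing nothing
learning_rate = 0.01

for _ in range(1000):                           # repeat, over and over
    for area, actual in paintings:              # receive an example
        guess = weight * area                   # predict
        error = guess - actual                  # check the result
        weight -= learning_rate * error * area  # adjust to predict better

# `weight` now encodes a learned association between area and strokes.
# Note that no painting is stored anywhere; only the adjusted number is.
print(weight)
```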
These same steps also describe the mechanical process performed by the computer during training. But there are some individual points we should identify. The art inspector analogy is good at showing how ML models start with a clean slate. They don't know anything; in fact, they start filled with random data. All the associations included in ML models are learned by observation. The inspector's habit of measuring small, random details is also close to the fixed process that occurs during ML training. But one significant difference is that the inspector recorded all of his particular measurements in his database. The model doesn't do that. It records only the changing probabilities associated with the various inputs.

So, to build a model, a data scientist begins by describing a logical structure for processing inputs to create outputs. Each part of the training process corresponds to a different part of this structure. I've got an exemplary structure here; there are all sorts of structures, and this is just a stylized example, but they usually share three general parts: the input layer, the hidden layers, and the output layer.

The input layer is where data is provided to the model. This is similar to the art inspector looking at the painting and making his measurements. But unlike humans, who can take in a whole painting at a time, the computer just gets individual pieces of data, individual numbers. The goal of the input layer is to provide a uniform representation that can be worked on by the other parts of the machine learning process. From the machine's point of view, it's just a vector of numbers, just a list.

The hidden layers of the neural network are where the majority of the processing occurs in an ML application. These layers are called hidden because you typically don't look at them directly; they're intermediate processing layers between the input, where you put stuff in, and the output, where you take stuff out. They consist of a dense series of interconnected nodes. Again, these are logical nodes, not necessarily physical ones, and each one has an associated probability, or weight, governing whether it takes an input, whether it passes it on, where it passes it, whether it changes it, and so on. Similar to the art inspector, data scientists really have no idea what will end up being significant when a network starts being trained. It used to be that data scientists tried to specify features by hand. It was called feature engineering, and it was a big deal for a long time, largely because we had smaller computers and it was more efficient to identify specific things to look at. However, isolating the right features was error-prone, took a lot of time, and honestly didn't work very well in many cases, because you didn't know in advance what was going to be important. So the current trend is to send everything into the neural network and let the network figure out what's important.

The output is the result of all the predictions that happen as data progresses through the hidden layers. It is essentially the final guess produced by the architecture. But like the input layer, it's just a number. To interpret what that number means, we apply in reverse essentially the same process we used to encode the work into the input layer.
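As a sketch of that three-part structure, here is what a stylized network might look like in PyTorch. The layer sizes here are arbitrary choices of mine for illustration; real models organize their layers very differently and have millions or billions of weights.

```python
import torch
import torch.nn as nn

# A stylized neural network: input layer -> hidden layers -> output layer.
model = nn.Sequential(
    nn.Linear(16, 64),   # input layer: takes a uniform vector-of-numbers
    nn.ReLU(),
    nn.Linear(64, 64),   # hidden layers: dense, interconnected weighted nodes
    nn.ReLU(),
    nn.Linear(64, 10),   # output layer: the "final guess"
)

work = torch.randn(1, 16)  # to the machine, any input is just a list of numbers
print(model(work))         # and the output is also just numbers
```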
Then, during training, you have a known answer. You compare the answer the model predicted with the known answer, and depending on how close it is, you adjust all of the various weights so that you're more likely to get the right answer the next time. Because this adjustment works backward from the output layer through the hidden layers, it's called backpropagation. Then you repeat a billion times. Eventually you're satisfied and you stop training. At that point, using the ML model is basically like training it, except there's no check and no adjust. You just use the output directly.

The difference between inference and generative ML is also highlighted here. For inference, you basically take that one output and run with it; that's the final result. Frequently it's the answer to a question you're asking about the input: metadata. For generative AI, you take the output and apply it recursively: the output becomes part of the next input, over and over again, and the result you get out is the series of outputs stuck together.

So what is a model? In computer science terms, the interconnected network of weighted nodes is equivalent to a program, a very complicated program that does something. Training is the process of iteratively evolving the neural network's weights, its probabilities, so that it emulates the desired computer program. And so the model is the combination of the neural network design and the weights. It is a set of numbers and equations that encode statistical probabilities about everything it observed. If you were to open up a model and look at it on disk, well, it'd be a Python pickle, but let's say you go a little further: it's basically just a big matrix of numbers. It just looks like that, or something like that. It doesn't contain copies of the inputs, even in compressed form. In fact, it's frequently hard to say what any single probability in the model means, although as of last week there was an effort to probe exactly that using GPT-4. Very interesting; I haven't studied it yet, though. We can identify some of them sometimes, but when we're talking about these models, we're talking about 768 million different parameters, or four or six billion, or 135 billion different parameters. For anyone who's familiar with Bayesian reasoning, Bayes' theorem, it's a way of taking the information you know and making an educated guess about what's going to come out. This is like a very complicated Bayesian prediction with 160 billion different parameters.

What we do know is that through all of this, the model builds a very complex and nuanced statistical picture of what we humans would think of as knowledge. It has encoded things about what makes a story a story as opposed to an article, or what makes an 80s pop song an 80s pop song, and it has embedded so much of this implicitly that the models can seem quite intelligent. They are not. They are not creative. They are just the statistical amalgam of all the probabilities of the things they have observed.
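Here is a toy sketch of the inference-versus-generation distinction from a moment ago. All the names in it (`encode`, `decode`, `toy_model`) are invented stand-ins, not any real library's API; the point is only the shape of the two loops.

```python
# Toy stand-ins so the sketch actually runs; a real system would use a
# trained network and a proper tokenizer.
def encode(text):
    return [ord(c) for c in text]    # a work becomes a list of numbers

def decode(tokens):
    return "".join(chr(t % 128) for t in tokens)

def toy_model(tokens):
    return (tokens[-1] + 1) % 128    # "predict" the next number

def infer(model, work):
    # Inference: one pass, one output, used directly. Frequently it's
    # metadata, an answer to a question about the input.
    return model(encode(work))

def generate(model, prompt, n_steps=20):
    # Generation: each output is appended to the input and fed back in;
    # the final result is the series of outputs stuck together.
    tokens = encode(prompt)
    for _ in range(n_steps):
        tokens.append(model(tokens))
    return decode(tokens)

print(infer(toy_model, "abc"))      # a single number
print(generate(toy_model, "abc"))   # a new "work" built recursively
```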
So, now that I've probably bored half the audience, I'm going to bore the other half. Why did I spend 15 or 20 minutes talking about the process of training a model? It has everything to do with correctly understanding the law and how we're going to apply it to this. Because, as I said, if you think about the model and these machine learning applications as a magic box that just does the thing, then the model simply reinforces your preexisting conception. Everyone who wants to get paid for the use of their work will say, well, this box is the thing that means I should get paid. Everyone who wants to use it will say, well, this box is something completely separate that means I can use it. Neither of those is actually analysis; it's just projection.

So, where are we going to find legal issues? These are the big four areas where you're going to find them. There are many others, but I'm going to put them into these four categories. We'll start with IP, go through them as far as we can, and then take some questions. By the way, I'm going to talk in generalities. I'm a lawyer, I'm not your lawyer, et cetera, et cetera.

Now, let's look at intellectual property. This is the issue that comes up most often, and it's probably of the most interest to the people here. When you're applying IP law to machine learning, the most important question you have to ask first is: what part of machine learning are you looking at? Because training is different from inference, which is different from generation. And what about memorization, which we'll get to in a second?

Let's start for a moment by looking at the training process and the legality of training. This is one of those places where the answer depends on your priors, on what you already think. There are going to be people, artists, writers, musicians, who say: look, you're using my stuff; without my stuff you would have nothing; I want to get paid. That's a very reasonable and emotionally compelling way of looking at it, because to a certain extent they are providing value. However, this is one of those places where the technicalities of the ML process and the law intersect to give a different answer. When you look at the ML training process, what are you doing? You are taking measurements about the works. You're not copying the work itself; you're making measurements about it. You're recording a series of facts that already exist in the world.

This has been a doctrine for a long time, but it was most recently discussed in a Supreme Court case called Feist, where the parties argued about the copyrightability of a phone book. The Court said: look, facts are out there; facts cannot be copyrighted. A compilation of facts possibly can be, but you have to have some sort of human input, some human creativity. And to a certain extent the publisher said, well, we arranged it in alphabetical order. And the Court said: no, it can't just be a rote ordering, it can't be something obvious. It has to show some spark of creativity, some sort of human intelligence.
A good example of the distinction between what might be copyrightable and what isn't: the top 10 songs of last year is just a fact. By whatever measure you use, they just were the top 10. However, if you publish the top 10 best songs of last year, where you say: this one was actually number five on the charts, but it was the top-ranked hip hop song, so I think it should be number one; and number eight was EDM, and I don't like EDM, so it drops off the list; that shows creativity that comes from inside the human. And that is the copyrightable element.

Now let's look at the arguments for being able to copyright the model's series of numbers. First of all, as I said, they're facts. There's not going to be anything inherent in the facts that's copyrightable; it would have to be something about the compilation. But number one, the humans are not choosing which facts get recorded. They set a number of parameters, and remember how I said the current trend is just to dump everything into the model and have the model figure out what's relevant? That means the human is not the one deciding what goes into these various parameters. The model is. What's more, it happens rotely, exactly the same way for every single thing the model sees. It is not the human who designed the model architecture, or the human performing the training, who is making any of these decisions. It is the statistical probabilities being adjusted over time.

Then you could say: well, I chose the particular things that became the input. There is a possible argument there; for example, if you chose only works with a particular aesthetic and trained on just those, perhaps. But in general, that works against the grain of how machine learning works, because the more inputs you have, the better your model: the more complex and nuanced the latent space, the better your outputs. So most of what's happening with these large language models, these foundation models, is that they're basically trained on the whole internet. You're not picking and choosing in a significant way. It's: this is the biggest pile of data we could find, so that's what we're training on.

So let's talk about inference. I'm going to spend about 30 seconds on inference. Like I said, inference is usually metadata; it's a question about one of the inputs. What song is this? Can you identify it? What style is it? Who wrote it? Nobody really cares about that legally.

But let's talk about generation. The thing about generation is that most of the time, in fact almost all of the time, it's going to create a new work. It is designed to. As the model generates, there's a parameter applied to the output probabilities, generally called temperature, that basically gives the model a certain chance of not picking the highest-probability option. The ML application doesn't have creativity; it has randomness. Each step, it pulls from a slightly different place in the distribution, and we have learned to interpret the results as creative. But it is not creativity; it is simply the application of randomness to an iterative picking process.
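Here is a minimal sketch of how temperature sampling typically works: divide the model's raw output scores by the temperature before turning them into probabilities, then sample. The specific scores below are made up for illustration.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Pick the index of the next output from raw model scores.

    A temperature near 0 almost always picks the top-scoring option;
    higher temperatures give lower-ranked options a real chance. That
    injected randomness is what reads to us as "creativity".
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                       # subtract for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]      # a proper probability distribution
    return random.choices(range(len(weights)), weights=weights)[0]

scores = [2.0, 1.0, 0.5, 0.1]                             # four candidate outputs
print(sample_with_temperature(scores, temperature=0.2))   # almost always 0
print(sample_with_temperature(scores, temperature=1.5))   # noticeably varied
```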
Now, I know everyone here is thinking: but haven't I heard about ML models that will spit out the same thing they saw in training? Aren't they just washing the copyright through, et cetera? I refer to that as memorization, and that's what gets people excited. Memorization is essentially a training error. If training continues beyond the level at which the guesses stop improving, the probabilities associated with a specific input can get pinned to a specific output. The model still does not actually store the work inside it, but if you over-train it in this particular way, it can end up storing something like the instructions for recreating the work, usually in a lossy way. So while the model does not contain copies, it could nevertheless reproduce them if prompted to do so. But over-training and memorization are not the purpose of machine learning. In fact, they're a type of failure that data scientists are trying to avoid. The desired outcome is to encode enough probabilities that the model can respond effectively to new kinds of inputs and create new kinds of outputs. A model that memorized too much would be like a student who memorizes the entire textbook but can't do any problem they haven't seen before. It's brittle; it doesn't work. One interesting implication for copyright purposes: number one, the bigger the model, the better; and number two, the less duplication in the input, the better. The first leads to a more complex representation; the second means less chance of over-training on any one work.

Now, let's talk about data and privacy. Trade secrets. One of the interesting things about trade secrets is that they're only valuable if they're secret. The definition of a trade secret under the Defend Trade Secrets Act is a piece of information that holds its value by virtue of being secret. It doesn't have to be copyrightable; it can be anything that holds value as a result of being secret. But here's the interesting thing about machine learning: number one, if you give someone your model, you've just disclosed your trade secret. It's there. What's more, if you allow people to interact with your model, you are also leaking that secret, because they can use the existing model as an oracle to recover its behavior and train their own model. Maybe it's a trade secret for a while, but if someone discovers your secret without doing anything wrong, without breaking the law, it isn't a trade secret anymore, and they can use it. So trade secrets are going to be very, very difficult to keep in a model.
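Here is a toy sketch of that oracle problem, sometimes called model extraction or distillation: query the deployed model, record its answers, and fit your own model to them. The `secret_model` below is an invented stand-in for a model behind an API, not any real service.

```python
import random

def secret_model(x):
    # The "trade secret": pretend this simple function is the valuable
    # model, reachable only through an API.
    return 3.0 * x + 1.0

# 1. Use the deployed model as an oracle: ask it lots of questions.
queries = [random.uniform(-10, 10) for _ in range(1000)]
answers = [secret_model(q) for q in queries]

# 2. Train your own model on the (question, answer) pairs.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(50):
    for x, y in zip(queries, answers):
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

print(round(w, 3), round(b, 3))  # roughly 3.0 and 1.0: the secret leaked
```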
Let's talk about confidentiality for a moment. There's a standard worry here, about ChatGPT in particular, and there have been circumstances where trade secrets from one company, entered into ChatGPT, actually leaked to another company. How did this happen? The answer is actually in OpenAI's terms of service, where they say that anyone using the API gets essentially a dedicated instance, but for anybody using the web-based version: we are going to retain and use your inputs as further training material. So if you are worried about your employees putting trade secrets into these tools, be aware that if you are using the web version, you are disclosing your secrets to OpenAI for possible partial reconstruction as part of a later iteration of ChatGPT or one of their other products. That's because a very specific prompt will sometimes match only a small number of training examples, and if your material is unique enough and you prompt the model just right, it may generate that material back out. That's the leakage.

This is also tied to the idea of PII. PII stands for personally identifiable information; it's what the GDPR and similar laws are about. An interesting thing about most machine learning is that it doesn't care about names. Names just become noise. Remember how I said that when you put things into the model, it's all just numbers? You could hash people's names before training, and the data would be exactly as useful as the actual names. So one thing that's going to be a big deal, it was a big deal five years ago and it's going to be an even bigger deal now, is data provenance, and in particular, cleaning your data of any PII before it goes into the training process. Because if you do that, the PII never goes into your model, and it can never come out.
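Here is a minimal sketch of that idea: replace each name with a salted hash before the record enters the training set. The field names and the salt scheme are invented for illustration; a real pipeline would scrub many more kinds of PII than names.

```python
import hashlib

def pseudonymize(record, salt):
    """Swap a name for a salted hash before the record enters training.

    The model gets a stable stand-in token that is statistically just
    as useful as the name, but the name itself never goes in, so it
    can never come out.
    """
    digest = hashlib.sha256((salt + record["name"]).encode()).hexdigest()
    return {**record, "name": "person_" + digest[:12]}

row = {"name": "Jane Doe", "purchase": "guitar strings"}
print(pseudonymize(row, salt="training-run-42"))
# {'name': 'person_...', 'purchase': 'guitar strings'}
```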
Here's the last little bit: liability. Stuff goes wrong; who is responsible? There's an interesting dichotomy here. The terms of service for essentially every one of these systems say that anything that comes out of it is your problem, buddy. They will not cover you; you have to agree to their terms, et cetera. But that covers you taking their output and doing something with it. What about situations like self-driving cars, which people always like to think about? Who's responsible there? It turns out this is a very difficult question, and it's already giving the insurance industry fits, because product liability works on something called strict liability: if it breaks, if it causes harm, you're responsible. No one has to have made a bad judgment; if it broke, it's your fault. Whereas when a human does these sorts of things, the human is allowed to exercise judgment; there's a reasonable-person rule. So if you've got an AI, is it strict liability, or does it get some sort of reasonable-person rule? To make this specific: say a self-driving car makes a decision that a reasonable human being would make, and someone gets harmed. Who's responsible? I can tell you the personal injury attorneys will say it's the car manufacturer, because that's where the big bucks are. But we don't know.

For the people in this room, though, you're probably thinking about Copilot, ChatGPT, Midjourney, et cetera. Who is responsible for that output? In general, the person who creates and prompts the output and then uses it is going to be responsible. The reason why is, first of all, the terms of service, but also that the ML application is a tool, just like Photoshop or just like Word. It's a very complex tool. It can do a lot of things, and it's got a lot of automation, but it's just a tool. The creativity and judgment that goes into it, and that adjusts what comes out, comes from the human.

One interesting place where this comes up is an area called character copyright. Character copyright is a little corner of copyright law that says some fictional characters, comic book characters especially, have such a well-defined personality and look that even if you create a brand-new image featuring them, you are still necessarily copying, for example, Superman's S or Tony Stark's Iron Man costume. This can also come up in textual works, not so much in code, but it's easiest to see in pictures. So if you prompt a model to create something that effectively takes a copyrighted character, that is an infringement, and you've just made it. It's no different than if you used Photoshop, or Word, or what have you, to create a copy. You used a tool, you created a copy, you're responsible. So be careful.

The trickier issue is inadvertent copying. This is going to be unlikely for something like image generation, but ChatGPT, and especially Copilot, because of the duplication in their training data and the way they were trained, are places where you can prompt in a certain way and get one of the memorized outputs back. This happens maybe between 0.03% and 1% of the time in these commercial models, which isn't a whole lot, but you can make it do it. As a concrete recommendation, if you're going to use one of these tools, especially for code, I recommend snippet scanning. It used to be that snippet scanning was aimed only at code that a developer copied and pasted; well, that's essentially what the model might be doing behind your back. So it's just something to look at.
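As a toy illustration of what a snippet scanner does, the sketch below hashes overlapping windows of lines from generated code and checks them against a corpus of known code. Real scanners and their corpora are far more sophisticated; this only shows the basic idea.

```python
def shingles(code, n=5):
    # Hash every run of n consecutive non-blank lines ("shingles").
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    return {hash(tuple(lines[i:i + n])) for i in range(len(lines) - n + 1)}

def overlap(generated, known_corpus, n=5):
    # Fraction of the generated code's shingles that appear verbatim
    # in the known corpus; anything above zero deserves human review.
    gen = shingles(generated, n)
    return len(gen & shingles(known_corpus, n)) / len(gen) if gen else 0.0

# Usage sketch: flag model output that matches licensed code.
# if overlap(model_output, licensed_code) > 0:
#     review_before_shipping(model_output)
```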
The last thing before I stop for questions: also be aware of the terms of service in general. One really interesting piece of OpenAI's API terms of service says that you cannot use their API to create, or participate in the creation of, any model that might compete with OpenAI. Not any service: any model. When you think about it, that's an extremely broad provision. It isn't limited to creating a commercial service that displaces them; it's anything that competes with what they're doing. So be very, very careful. Google has a similar provision on some of their betas: we can use any of your input to improve our model, but you can't use ours to improve yours. That's how it's going to be, though. So be aware of these terms-of-service restrictions. The thing that is going to bind people most when it comes to the law in AI is going to be contracts, because a contract is just an agreement, and it can be enforced. So be aware of the agreements you enter into, because for the foreseeable future, I think they will govern the law around AI. Now, I'll stop and take questions. I think we've got a mic, so that people on the live stream can hear.

Wonderful presentation, Van. My name's Alana Whitaker; I came over from the US, and this is a particular topic I find fascinating because we are having these conversations. When it comes to the Stable Diffusion sort of models, like Lensa, where you're feeding it your images, feeding it data, and then it can replicate them: say we fed it 1,500 images of Scarlett Johansson and then output a new image of Scarlett Johansson. What legal precedent does she, as a public figure, have to sue to protect herself?

That's difficult. There's name, image, and likeness, essentially rights of publicity: you can't use that stuff to suggest that Scarlett Johansson is the source of whatever you're doing. On the other hand, Scarlett Johansson is also a public figure. You can take a picture of her, you can print it in the tabloids, you can create new pictures of her, and there's not a lot she can do about it. It would really have to be defamation, right of publicity, things like that.

Okay. And my follow-up question, since this conversation has reached laypeople recently: people have been generating artificial songs that didn't come from the artist but are in the style of that artist. Is it the same thing with that?

Style is not copyrightable, period, end of sentence. In order to have a copyright interest, you have to have an identifiable part of an identifiable work that has been copied. So it's not enough to do something in the style of somebody; you have to actually be able to point to a specific copied piece.

You had mentioned that for the training corpus, the assumption is that you've got a large body of data that's not being curated: you're just throwing a ton of data into the model, letting it figure things out and set its own weights. If I were to curate a data set that's specific to cats, something very specific, where a lot of human involvement goes into choosing what goes into the corpus, does that change the calculation at all?

Possibly. I'm not 100% sure, but something like that would, I think, have a chance of being copyrightable as a specific compilation. That said, I think it would be a hard argument.

So you argue that taking the measurements of a work is not copying the work. Must you also have, in the first place, permission to read, view, or listen to the work you're training on? And is that something that can be restricted, say in a software license, or does copyright law not even permit you to restrict the measurements?

Okay, this is going to be jurisdiction-specific. In many jurisdictions, you're going to have what are called TDM allowances, text and data mining allowances, that are explicitly outside the scope of copyright. In the United States, that would be covered by fair use, and that's basically been litigated. The most relevant cases are the Google Books cases, Authors Guild v. HathiTrust and Authors Guild v. Google. If you remember, those were about Google scanning all these books and creating an index. There's also the Perfect 10 case, and there are a bunch of others. When the internet arrived and people started creating search engines, we had all the same arguments: hey, the value you get in your search engine comes from the fact that you indexed our stuff; we want some of that. It's the same argument. And the courts said: a search engine is a completely separate type of work. It is not a market substitute for the original work.
It is transformative. We allow people to read; there's a generalized right to read anyway, even if reading involves an incidental copy. And then there's a technical argument I'm not going to get into, that this is actually one of the few places where 17 USC 117 applies, which covers copies of digital works owned by a person.

Hey, awesome talk. I really appreciate the real cases you're mentioning in all this; I'm digging it. I was wondering about the OpenAI API thing you mentioned, how they specifically say you can't train models on top of it that compete with them, and I know a bunch of other people are talking about it. I have two questions on that. One is: is that enforceable, just because you say something in your...

As a contract, yes.

As a contract. But now that they've released a public product as widely used as ChatGPT, the entire internet is awash with text generated by OpenAI. How can they possibly claim, say two years down the road, that just because they flooded the internet with OpenAI text, somehow all of the internet now belongs to them?

I don't think they can say that. Number one, they explicitly grant you any copyright that exists in the output. There's an argument about whether any copyright exists at all, but whatever it is, they grant it to you. They just have a restriction on the use of the API and the model; they don't have restrictions on the use of the text.

Can I very quickly ask you to talk about Stability AI and the other diffusion models being sued, and what...

The trademark argument is the best one. Everything else is not good. By the way, this is probably the last one, and I'll talk to anybody outside afterward.

So I heard you say in your talk that the outputs generated by these tools could infringe copyright.

Yep.

I didn't quite hear you say whether the outputs you generate with these things can be copyrightable, and your answer just now sounded like it's a little unclear whether that's the case. I'm specifically thinking of when the generated work is code: can we then put it under a software license that requires us to hold copyright in it?

The official position of the Copyright Office right now is that you have to disclaim anything that was generated by the machine; it is not copyrightable, it is just out there. You can only claim copyright in the specific edits or changes that you made to the output code. I believe very strongly that the Copyright Office is wrong, and that it's going to move to a model more like photography, where the AI tool is just a tool, and even the modicum of input provided by the human will be recognized as supporting at least a thin copyright. But we're not there yet.

All right, thanks all.