Okay, so in the past I've actually defended GitHub Copilot quite a lot, but recently I read an article, published just a few days ago, that I think is very well written and makes some good points about the issues GitHub Copilot has. So let's be a good person and report what people far more expert than me have found. The website I'm talking about is called GitHub Copilot Investigation; you might have seen it. And the title is actually quite provocative: "Maybe you don't care if GitHub Copilot used your open source code without asking. But how will you feel if Copilot erases your open source communities?" Which I think is quite provocative, and also not really the point, but I'll get to that. It's also very important to talk about the author of this article, Matthew Butterick, sorry if I pronounce it incorrectly. He is a writer, designer, programmer, and lawyer, which matters a lot in this case. He has written books and articles, and he has many open source projects; as an example, he made Pollen, which is used to publish books, and all of these things are open source. So he actually has a very nice and long history of open source contributions, probably way longer than mine. He has the expertise, and he is a lawyer. The important part is that he recently took the next step: he reactivated his California bar membership to team up with, in his words, amazingly excellent class action litigators. And in very big font we have: "we are investigating a potential lawsuit against GitHub Copilot for violating its legal duties to open source authors and end users." In the rest of the article, they (a) argue what they think is wrong with GitHub Copilot from the point of view of the law, of course, because that really is the point, and (b) ask for your help if you have used GitHub Copilot, or if you think your code has been used to train it.
So let's start with a very quick introduction to what Copilot is. It is a plugin for Visual Studio and other IDEs that provides suggestions for your code: you type a comment, and Copilot automatically writes the code that corresponds to that comment. It is based on an AI system called Codex, which was built by OpenAI; Codex is licensed to Microsoft, and Microsoft is sometimes called the unofficial owner of OpenAI. So you see the point here. Now, what code have Copilot and this Codex AI been trained on? It's not super duper clear. They claim it has been trained on "tens of millions of public repositories," including code on GitHub. Microsoft also talks about "billions of lines of public code," which is very vague. And the Copilot researcher Eddie Aftandilian, sorry, I'm not good with names, confirmed in a recent podcast that Copilot is trained on public repositories on GitHub. Now, I've heard a lot the argument that if your code is on GitHub, then you kind of agreed to their terms of service and they can do whatever they want with your code. And that is actually very much not the point here, which surprised me. We have really no guarantee whatsoever that GitHub hasn't trained its AI on code outside of GitHub as well, especially when they use wording such as "tens of millions of public repositories, including code on GitHub." Can they do that? That's the question. As you might know, open source code usually comes with a license, which states obligations that users of the code must follow. As an example, a license could ask you to preserve accurate attribution of the original source code, which is fair. That means that if you do want to use that code, you either have to (a) comply with the license of that repository, or (b) argue that your use is fair use, which is an exception under copyright law.
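To make that workflow concrete, here is a purely illustrative sketch of the interaction model: the comment acts as the prompt, and the function below it is the sort of completion the tool might suggest. This is my own hand-written example, not actual Copilot output, and the function name and behavior are my assumptions.

```python
# The user types only the comment below; a completion tool then proposes
# the function body that follows. This pairing is illustrative, not a
# captured Copilot suggestion.

# compute the factorial of n
def factorial(n: int) -> int:
    """Iteratively compute n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```

The licensing question in the rest of the article is exactly about completions like this: the suggested body may be synthesized, or it may closely resemble code the model saw during training.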
And if it is fair use, you don't actually have to follow the license. Now, if you do have to follow the licenses, and Copilot has been trained on this large number of repos, then it should provide attribution for all of that code. And it doesn't. What they themselves claim is that training an AI such as Codex on public repositories is an example of fair use. This comes from Nat Friedman, the former CEO of GitHub, who said that training machine learning systems on public data is fair use. Is it? Well, it's not a matter of opinion, of course; it's a matter of law. And currently there are organizations, such as the Software Freedom Conservancy, that claim this is incorrect and that it is indeed not fair use. The leader of that organization tried to contact GitHub about this, asking for solid legal references backing the fair use position they hold, and GitHub provided none. Why is that? Well, personally, I might think that Microsoft and GitHub simply don't want to answer these kinds of questions at all, especially from organizations that are significantly smaller than they are. But that's my personal opinion. The article says they probably do this because there is currently no precedent or legal authority that clearly states what is fair use and what is not when it comes to training AIs, which is not unexpected, since AI is such a new technology. What this article claims is that if this actually goes to court, the court might well say that it is indeed fair use to train an artificial intelligence on public repositories, but it might also attach conditions to that. And until that happens, Microsoft and GitHub have a really easy game: they can do whatever they want, because there's no clear legal precedent.
That really explains why they might want to file a lawsuit in the first place: to clarify when it is fair use and when it isn't, so that even GitHub and Microsoft have conditions they must follow when doing these kinds of things. Then there's a second issue, which in my opinion is even bigger and even more embarrassing for Microsoft. That is: when you generate code using GitHub Copilot, who owns the code that gets generated? Like, do you own it? Microsoft claims that yes, if you produce code with Copilot, then you own it, just like if you produce code with a compiler, you own the result. And, you know, that's way too easy, especially because, yes, Microsoft does not claim any rights to the code you produce using Copilot, but at the same time they provide no guarantee that the code actually works, nor do they guarantee that the code is secure. And most importantly, they say: "we recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn't write yourself." These precautions include rigorous testing, which, okay, makes sense, and then IP scanning. Now, IP here means intellectual property, which means that whenever Copilot outputs some code, you, as the user, should check that the code is not violating any intellectual property rights, which is weird. Basically, Microsoft is shifting the blame onto the user. If GitHub Copilot ever produces code that is actually copyrighted, then you, the user of Copilot, can be sued, and it will be your fault for not checking that the code from Copilot wasn't actually written by somebody else.
Now, the defense I usually gave of Copilot is that if you train an AI correctly, it should not output large chunks of its training data just like that. That's the theory. They actually have a filter that suppresses suggestions matching the training data exactly, which is already telling. But we've literally had many examples of Copilot spitting out code that was way too similar to code in some public repository, and that would indeed be covered by copyright, because it's not just similar, it's verbatim. And this is, for me, the craziest part. Just this week, Texas A&M professor Tim Davis gave numerous examples of large chunks of his code being copied verbatim by Copilot, including when he prompted Copilot with the comment "sparse matrix transpose in the style of Tim Davis." It's basically as if I went to Copilot and wrote a comment saying, I don't know, "KDE Plasma theme in the style of Niccolò Venerandi," and Copilot then spat out some contribution I've actually made to KDE Plasma. That's crazy. And it's also a big issue for whoever actually uses GitHub Copilot, because if Copilot might spit out code that is verbatim from another open source project, which has a license, then it's up to you, because Microsoft decided so, to check that this doesn't happen. And if you don't notice it, and it would be very easy not to notice it, then you're liable for it. This, in my opinion, is really the key point. One thing is GitHub Copilot maybe not falling under fair use when training its AI; that's a big issue we can talk about. But a very different thing is GitHub Copilot, right now, actively putting its own users at risk of being sued. And when asked about it, they just say: well, it was your responsibility to check that this didn't happen. Was it? Was it? I mean, yes, legally speaking, yes. That's the issue.
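As a rough mental model of why an exact-match filter is a weak safeguard, here is a minimal sketch of what such a filter might conceptually do. This is my own assumption about the idea, not GitHub's actual implementation, and the function names are made up for illustration.

```python
import re


def normalize(code: str) -> str:
    """Collapse all whitespace runs to single spaces, so trivial
    reformatting doesn't hide an otherwise identical copy."""
    return re.sub(r"\s+", " ", code).strip()


def is_verbatim_copy(suggestion: str, training_snippets: list[str]) -> bool:
    """Return True only if the suggestion equals some training snippet
    after whitespace normalization. Note the limitation: a near-verbatim
    copy with one renamed variable passes this check untouched."""
    norm = normalize(suggestion)
    return any(norm == normalize(s) for s in training_snippets)
```

The point of the sketch is the limitation in the comment: a filter that only catches exact matches says nothing about near-verbatim output, which is precisely the kind of output the Tim Davis examples showed.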
Now, at this point, the article goes on to a section called "What does Copilot mean for open source communities?" And to be fully honest with you, I didn't understand half of those paragraphs, and the other half seem like ramblings I'm not sure I agree with, so I'm going to skip them. I think the most important criticisms being made of GitHub Copilot are the two things we've already talked about; this section is more about the philosophy of Copilot. To give an example: "Copilot introduces what we might call a more selfish interface to open source software: just give me what I want. With Copilot, open source users never have to know who made their software. They never have to interact with a community. They never have to contribute." Which I think means that if Copilot does the job of giving you the code you want, without you ever having to interact with the open source community you would otherwise get that code from and use it under its proper license, that is a problem for whoever maintains that open source community. Because if you do approach the community, then you might end up helping them: making merge requests, contributing, these kinds of things. With GitHub Copilot, you don't have that anymore. So personally, I am super interested to see if there's any follow-up on this. If there is a lawsuit, I really want to follow it; it's super interesting. And what this article doesn't talk about at all, but is clearly super important, is that this does not apply only to code. Just look at DALL-E and Midjourney: they are doing pretty much the same thing. They are training on images available online, disregarding their licenses, because the claim is fair use. And it's very easy to say that these cases are the same; it's the same argument: it is fair use because I'm training on something that's accessible to everybody, even if it has a license.
So a lawsuit against GitHub Copilot would, I think, have repercussions even on other types of AIs that use publicly available images, code, and text online while disregarding their licenses. It would have a much bigger impact than just GitHub Copilot; that's my impression. So, if you want to help them: again, if you use GitHub Copilot, or if you think GitHub Copilot has been trained on your code, they want you to get in touch. If you have anything interesting to say, they want you to get in touch. You can find the article, as I said, at githubcopilotinvestigation.com. I hope I've done a good job of informing you about this. So thanks, everybody, and see you tomorrow with another video.