Okay, let's get started. Welcome, everyone. By show of hands, I'm curious: how many of you are using AI tools? Is there anybody not using AI tools? Well, great. AI is taking over the world, I think.

My name is Joanna Lee. I am a staff member of the Linux Foundation and the Cloud Native Computing Foundation, where I'm Vice President of Strategic Programs and Legal. I'm an attorney in addition to being part of the management team, and my happy place is really at the intersection of technology and innovation, business and economics, and law and policy. Prior to joining the Linux Foundation and CNCF, I practiced law for about two decades, representing technology startups, some more mature companies, and venture capital firms.

I see most of the people in this room are not lawyers, so I thought it would be helpful to provide some AI copyright basics so you have context for the practical advice that comes later in the presentation. So what is copyright? Copyright is a set of exclusive rights that a copyright owner has in a creative work, and a creative work can include software. Those exclusive rights include the rights to copy and reproduce, distribute, perform, and publicly display the work, as well as the exclusive right to modify it or create derivative works of it. A copyright owner can authorize others to exercise these rights in the form of a license. In the context of open source, that's usually an open source license like GPL, Apache, MIT, or BSD.

For a work to be copyrightable, there has to be some minimal spark of creativity. A pure mathematical formula, for example, would not be copyrightable. However, if there were a creative way of expressing that formula, maybe singing it, expressing it through a poem, or expressing it in code in some innovative, unique way, the unique aspects of that expression might qualify for copyright protection. Content generated by a non-human, whether a monkey or a machine, does not qualify for copyright protection. That's very important. However, if content consists of some portions generated by a human and some portions generated by a machine, the portions generated by the human are copyrightable. So if you took code that was generated by an AI and you added to it, rearranged it, modified it, et cetera, your contributions to the end result would be copyrightable.

So although AI can't author copyrightable content, and therefore can't own copyrightable content, it can infringe others' copyrights. This is where some of the legal risk is. AI models usually train on pre-existing data, and sometimes that data includes code or other content, whether images, artwork, literary works, films, et cetera, that is owned by third parties. If you give an AI a prompt and its output reproduces some of the material it was trained on without permission, that could result in infringement of third-party rights. There is some pending litigation over AI output: the GitHub Copilot litigation, and a lawsuit that Getty Images brought against Stability AI. I'm not going to go into the details of those lawsuits.
I'm really going to focus on what the risks are and how we manage them, because these are very, very manageable risks. There is some uncertainty in the law around how copyright exceptions apply to artificial intelligence. Certainly, if third-party works are being reproduced without permission, that would create legal challenges and copyright issues. But if pre-existing works are just used to train an AI model without permission, and they're not actually being copied in the output, there is some uncertainty whether a doctrine under U.S. law called fair use would apply and allow that even without a license. There is also a text and data mining exception under EU law that is maybe the closest analog to fair use, and there are uncertainties around how that would apply to AI as well.

So, as has been alluded to, there are some legal risks and challenges. Beyond the copyright concerns already discussed, there's also a concern around license compatibility. Let's say an AI model trains on all the publicly available code on GitHub, which includes a mixture of permissively licensed and copyleft-licensed code. If code from a GPL repository is reproduced in the output, it shouldn't then be contributed to a repository under an MIT, BSD, Apache, or other permissive license, because there's a license incompatibility there. Some AI tools do provide you with information about third-party materials copied and reproduced in their output, but the vast majority of tools today do not. So if you're a user of the output, how would you even know whether it's being reproduced with permission, who the copyright owner is, and what license applies, right? There's a license compliance challenge created when there's a lack of notice and attribution.

This also makes it very difficult to produce SBOMs, or software bills of materials. SBOMs are required in some regulated industries and usually when you are selling to government, and an SBOM is not just a licensing compliance document; SBOMs are also used to track security vulnerabilities. So this is another challenge if you're using a tool that does not provide the requisite notice and attribution for third-party copyrighted works. (I'll show a small sketch of this problem in a moment.)

With some AI tools, there's also an inconsistency between the contractual terms that apply to the tool and its output, and the open source definition and the terms of open source licenses. To give you an example from the ChatGPT terms: you may not use the services to develop foundation models or other large-scale models that compete with OpenAI. Another restriction is that published content created in part using OpenAI may not relate to political campaigns, adult content, spam, hateful content, content that incites violence, or other uses that may cause social harm. You might be thinking, well, I'm not using the output of ChatGPT to do any of these things, so I'm good, right? It's not quite that simple. Any restriction on how software can be used disqualifies it as open source under the open source definition. The OSI maintains the open source definition, which provides that a license can't restrict anybody from making use of the program in a specific field of endeavor. And there are other requirements for code to qualify as open source, including that it doesn't discriminate against users, et cetera.
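Before moving on, let me make that SBOM challenge concrete. Below is a minimal sketch of the kind of SBOM entry a project would have to produce for an AI-generated snippet. The field names loosely follow SPDX conventions, but the helper function and workflow are purely illustrative, not a real tool or any project's actual process; the point is simply that without notice and attribution from the AI tool, the compliance-critical fields can only be filled with NOASSERTION.

```python
# Illustrative sketch only: builds a minimal SPDX-style entry for a code
# snippet. Field names loosely follow SPDX 2.3; this is not a real tool.
from typing import Optional

def sbom_entry_for_snippet(name: str, attribution: Optional[dict]) -> dict:
    """Build an SBOM entry; `attribution` is whatever the AI tool disclosed."""
    if attribution is None:
        # Most AI tools today disclose nothing about reproduced third-party
        # code, so the compliance-critical fields cannot be asserted at all.
        return {
            "name": name,
            "licenseConcluded": "NOASSERTION",
            "copyrightText": "NOASSERTION",
            "supplier": "NOASSERTION",
            "comment": "Generated with an AI assistant; provenance unknown.",
        }
    # A tool that does provide notice and attribution lets you fill in real
    # values and then check the upstream license against the project's own.
    return {
        "name": name,
        "licenseConcluded": attribution["license"],  # e.g. "GPL-3.0-only"
        "copyrightText": attribution["copyright"],
        "supplier": attribution["origin"],           # e.g. source repo URL
    }

# A GPL-3.0-only entry surfacing in an MIT-licensed project's SBOM is exactly
# the kind of incompatibility a reviewer or scanner could then catch.
print(sbom_entry_for_snippet("util/parser.py", None))
```

NOASSERTION is a valid SPDX value, but an SBOM full of NOASSERTION fields doesn't help anyone doing security or license compliance downstream; that's the gap notice-and-attribution features are meant to close.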
And coming back to the open source definition: if the output doesn't qualify as open source, it's also, by its nature, incompatible with all open source licenses. So taking the output of ChatGPT and contributing it to an open source project could be a violation of the ChatGPT terms and conditions, and it certainly creates an incompatibility issue. Now, that's not true of many other popular AI tools; Copilot's terms and conditions, for example, don't include these types of clauses.

There are also questions about how this really works with the Developer Certificate of Origin, which the vast majority of open source projects require. When you make a contribution, you're required to certify that one of these statements here is true. I'll summarize a couple of the paragraphs. One is that the contribution was created in whole or in part by the contributor. Well, if you took the AI output, made modifications to it, and combined it with your own work, then yes, you created it at least in part. Maybe not in whole, but in part, so you're good. The alternative is that if you just take the output without adding your own creative content, and it's based upon a previous work, then you need to have permission to submit that work. Again, if the tool gives you the information to confirm that you have permission from the third parties whose content is included in the output, that's great. But if it doesn't, there's really no way of verifying that your contribution qualifies under paragraph (b) here.

There are also questions around consistency with the Apache CLA, which is used by many open source projects. The Apache CLA has a couple of provisions that are very similar to the DCO paragraphs on the prior slide: basically, you're representing that the contribution is your own original work, or, if it's not your original creation, that you have the necessary rights from the third parties whose work it is to contribute it under the terms of the applicable license.

Additionally, the laws and regulations around AI are rapidly evolving. Last month, ChatGPT was banned in Italy for several weeks after a breach led to people being shown excerpts of other users' conversations and financial transactions. The ban was lifted after OpenAI agreed to enforce rules protecting minors and users' personal data. There is also proposed legislation in the EU called the Artificial Intelligence Act that would subject AI providers, users, distributors, and others in the supply chain to certain restrictions and compliance obligations. It's a proposed act that has not been passed into law, but it's something we're watching very closely because it could have a dramatic impact on innovation in AI, open source and otherwise. What's troubling about the most recent draft is that it could impose compliance obligations around, for example, data governance, risk management, accuracy, and liability for errors, not just on the companies that are hosting the models, deploying them, and selling them commercially; it could also impose liability and compliance obligations on the people who wrote the code. That's very, very troubling in the context of open source development.
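Going back to the DCO for a moment, since we're talking about what can and can't be verified: here is a minimal sketch of the kind of check a project's CI could run, using ordinary git commands. It only verifies that the Signed-off-by trailer is present on each commit; whether the certification behind that trailer is actually true, which is exactly where AI-generated output gets murky, is something no script can check.

```python
# Illustrative sketch: verify every commit in a range carries a DCO
# "Signed-off-by:" trailer. Only the trailer's presence can be automated;
# the truth of the DCO certification itself cannot be.
import subprocess

def commits_missing_signoff(rev_range: str = "origin/main..HEAD") -> list[str]:
    """Return hashes of commits in rev_range lacking a Signed-off-by line."""
    hashes = subprocess.run(
        ["git", "rev-list", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    missing = []
    for h in hashes:
        # %B prints the full commit message body for inspection.
        body = subprocess.run(
            ["git", "show", "-s", "--format=%B", h],
            capture_output=True, text=True, check=True,
        ).stdout
        if "Signed-off-by:" not in body:
            missing.append(h)
    return missing

if __name__ == "__main__":
    bad = commits_missing_signoff()
    if bad:
        raise SystemExit(f"DCO check failed; missing sign-off: {bad}")
```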
And coming back to the AI Act: if this law passes in its current form, it could very much hamper open source AI and would probably consolidate AI market power in organizations that have the funds and resources to develop robust compliance programs, because the compliance obligations under this act are pretty extensive.

There are some other risks, but they're not really risks that open source projects and developers need to worry about. There are concerns around trade secret loss, loss of privacy, intentional manipulation of AI models, et cetera, but those are risks that apply in other contexts, not so much open source development, so I'm not going to talk about them at length here.

These risks could impact adoption if they aren't managed well, and they are very manageable; I'll talk about what that looks like later. But there are companies that might be hesitant to adopt and utilize open source software packages if they include code generated using certain types of AI tools, or if the projects don't have good licensing and AI hygiene. Even if project maintainers and hosting foundations are willing to live with a certain amount of risk, if the adopters are not comfortable with those risks, that could chill adoption, which would impact the health and viability of open source projects, of course.

Whether companies are willing to take on certain risks, or willing to consume open source that was generated in part using AI, is probably going to mirror their internal policies around use of AI by their own developers. And those range widely. For example, there are some companies today that completely prohibit the use of AI internally. Those tend to be in very conservative or highly regulated industries, or they sell products that are really mission-critical. I think eventually they will allow use of AI, but they're waiting for the technologies to mature so there's a higher degree of reliability, accuracy, security, et cetera. There are some companies that permit use of AI for selected uses and contexts but not others. For example, some companies will allow their employees to use it for debugging but place restrictions around use of generative AI for developing new features. Some companies will allow generative AI to develop code that is consumed internally, but not code that's incorporated into products distributed outside the company. Some companies will generally allow use of AI tools for developing code but require that the code go through some type of internal copyright review process before it can be incorporated into a product. And just recently I learned of a company that is authorizing use of generative AI, but only by developers that have certain credentials: developers with a certain level of experience, whom the company deems trustworthy in reviewing the output of the AI, spotting errors, and making judgment calls about whether it's suitable for its intended use, and who in addition go through training on how to use these tools responsibly.

There are other policies that companies have developed to manage other types of risks, but those aren't so applicable in the context of open source development, so I won't go into them here. So how can open source projects and developers navigate these risks? These risks are manageable, and the tools are rapidly evolving to help manage them, which is great. I do want us all to keep in mind that these are not brand-new risks.
Open source projects today don't police the origin of the code that's contributed, right? There is a risk today that a contributor copies code off of Stack Overflow without permission, or takes code from a GPL repository and contributes it to a permissively licensed repository and creates license compatibility issues. So that risk already exists. Anything problematic from a compliance perspective that AI can do, a human being can do as well, and humans often do, sometimes just by innocent mistake. However, generative AI presents this risk at a much bigger scale and in a systematized manner. We're no longer talking about isolated incidents of somebody copying code they shouldn't have copied and contributing it to an open source project; this is systemic. For that reason, across the Linux Foundation projects we've been evaluating what it might look like to explore potential policies, or at least guidance, for how open source communities can go about managing these risks.

The bullet points that are not circled are the ones we are not actively considering, but just to lay them all out on the table: the most cautious approach would be to say you can't use generative AI to develop code that you contribute to open source projects. That seems overly restrictive, of course, and not realistic. Another option would be to decide by use case: allow it for some contexts, like debugging, but not for other uses or contexts. But that's very difficult to regulate, police, monitor, et cetera, and it requires so many judgment calls that I don't think it's a pragmatic type of policy. Another option is to decide on a tool-by-tool basis, for example, having an allow list of AI tools that meet certain requirements, meaning they don't have any terms and conditions inconsistent with the open source definition, and they provide enough information for you to comply with any upstream licenses, or they have features that let you suppress reproduction of incompatibly licensed code in the output. The status quo would be to just trust the developer, as we do today, and let developers figure out how to navigate these risks.

I think the direction that the projects within the Linux Foundation looking at this are leaning toward is really trusting developers, as we do today, plus some guidance, because this is a complex, rapidly evolving area, and it's not intuitive or easy to figure out, particularly if you don't have a law degree. So this is an early proposal for what the guidance might look like. It's probably going to go through many iterations; it's going to be discussed within the legal committees of the projects that have them, we're getting feedback from member companies, and we're also engaging in conversations with other large open source foundations about what this might look like. So this is a very, very early draft, but basically it's letting contributors know that yes, you can use generative AI, but please do so responsibly.
Please confirm that the terms and conditions that apply to the AI tool's output are not incompatible with either the open source definition or the license of the project you are contributing to. And there are some other conditions that would need to be met. Really, if there is any third-party content reproduced in the AI output you're contributing, make sure you have permission from those third parties under an appropriate license and that you're complying with those license terms. A couple of other scenarios might be that there is no third-party copyrighted material in that output, or that the output is not copyrightable subject matter and would not be even if it had been produced by a human, for example, if there was just zero creativity in it. Then copyright wouldn't even apply to it, and we wouldn't have to be concerned about these licensing and copyright types of risks.

As I said, the tools are evolving to make it much easier to determine that the output is compliant. For example, AWS CodeWhisperer recently added a feature that provides notice and attribution, and GitHub Copilot has announced this as a new feature; I don't know if it's been rolled out yet, but if it hasn't, it's coming any day now. Some additional guidance that is under consideration, though there's still no consensus around it for any of the projects looking at this, would be providing a pre-approved list of AI tools that meet the requirements on the slide before.

Projects might in the future also ask developers to include notice and attribution for the AI tool itself. For example, today, if you copy code from an open source project, modify it, and contribute it to another project, you're supposed to maintain the license headers and copyright notices from the third-party code you copied. Perhaps there should also be another disclosure that a contribution was generated, in part or in whole, using generative AI, listing which tool. That is something that might be coming around the corner, but it is not a requirement today. (I'll show a small sketch of what these checks could look like in a moment.)

Some of the other questions the open source legal community is thinking about: should non-code contributions be treated differently, because they have slightly different risks? For example, images, graphics, and artwork are almost always going to qualify for some type of copyright protection because they are inherently creative, whereas some code may have the minimum spark of creativity needed to be protectable by copyright and some might not. Documentation and blogs, I think, are a very low-risk area, because it's very easy to change or remove documentation or a blog post if you find out you accidentally infringed somebody else's copyright, whereas code has dependencies, and it is sometimes quite challenging to modify code if you find out there is a compliance issue. Again, that's not a new risk; this happens today. Some of our larger projects do go through scanning, and when those scan results show that a piece of code got in under an incompatible license, there is a process we have to go through to figure out what to do with it: whether we extract it, modify it, or go to the copyright holders and ask them for permission under a different set of license terms, et cetera. And there are similar questions around whether standards and specifications should be subject to a different set of rules or guidelines.
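To make a couple of those ideas concrete, here is a rough sketch of the kind of check a project could run over an incoming contribution: is the declared AI tool on a pre-approved list, and do the file's license headers stay compatible with the destination project's license? The allow list contents, the "AI-Assisted-By" notice format, and the helper names here are all hypothetical, made up for illustration; nothing like this is required by any Linux Foundation project today.

```python
# Illustrative sketch only: the allow list, notice format, and checks are
# hypothetical, not Linux Foundation policy or any project's requirement.
import re

# Hypothetical pre-approved tools: ones whose terms don't conflict with the
# open source definition and that provide notice-and-attribution features.
APPROVED_AI_TOOLS = {"ExampleAssist", "DemoWhisperer"}

# Licenses hypothetically compatible with an MIT-licensed destination project.
COMPATIBLE_WITH_MIT = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}

def review_contribution(source: str) -> list[str]:
    """Return human-readable problems found in a contributed source file."""
    problems = []

    # 1. Upstream license headers must be preserved (SPDX short identifiers),
    #    and each must be compatible with the destination project's license.
    for lic in re.findall(r"SPDX-License-Identifier:\s*([\w.\-]+)", source):
        if lic not in COMPATIBLE_WITH_MIT:
            problems.append(f"upstream license {lic} may be incompatible")

    # 2. Hypothetical AI-provenance notice, e.g. "AI-Assisted-By: ExampleAssist".
    tool = re.search(r"AI-Assisted-By:\s*(\S+)", source)
    if tool and tool.group(1) not in APPROVED_AI_TOOLS:
        problems.append(f"AI tool {tool.group(1)} is not on the allow list")

    return problems

example = """\
# SPDX-License-Identifier: GPL-3.0-only
# AI-Assisted-By: ExampleAssist
def parse(line): ...
"""
print(review_contribution(example))  # flags the GPL header in an MIT project
```

None of this substitutes for the human judgment the guidance emphasizes; a check like this just automates the bookkeeping around headers and disclosures.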
So, in terms of ongoing efforts: we are working on a draft guidance document. Additionally, earlier I mentioned the European Union Artificial Intelligence Act and how, if it is enacted into law in its current form, it could be very problematic for innovation in AI and open source, and also for startups. So we are planning a letter to the European Commission expressing our concerns and helping them understand how this act, in its current form, would undermine innovation, and we will be inviting other open source software organizations to co-sign this letter. If any of you are interested in contributing to either of these efforts, please come talk to me. We welcome input, we welcome collaboration, and with any type of proposed policy, I think the more people regulators hear from, and the more different types of stakeholders, the more impactful the messaging is. So if you are part of other organizations, or your companies have an interest in this, I encourage you to have a voice in this conversation as well.

And this is going to continue to evolve; it is a rapidly evolving area. There are going to be changes both in the law and in the technology itself and the tooling. My hope is that very soon a majority of the tooling is going to be developed in such a way, and have such features, that a lot of the effort and friction around compliance is taken out, so that this is not something developers have to give so much mind share to. There will also be changes in the tolerance for risk and ambiguity among adopters and companies, for sure. In some ways this reminds me a little of the early days of open source, the late '90s and early 2000s, when there were companies that banned the use of open source code by their employees. And I understand that, because there was a lot of fear, uncertainty, and doubt, but when you do that, your employees are going to use open source anyway. So I think the better approach is to help educate your employees, and in this case to help educate members of our community: don't avoid use of AI; just learn how to use it in a way that is responsible, and help contribute to the evolution of these tools.

So, are there any questions? Yes. Oh, patent submarining. Patent submarining is a practice where a patent holder sits and waits for there to be broad adoption before asserting the patent. The question was whether AI models that train on prompts could theoretically be manipulated in order to facilitate a patent submarining scheme, but that's not really a concern that open source communities need to be thinking about.

Yes. The litigation that's pending could, like the API litigation, take another ten years before the courts actually resolve it, and then they'll probably resolve it on a very minor subset of the issues that are really helpful to the rest of the ecosystem.

So, has the U.S. Congress talked about this? Are there any bills pending there? Yeah, there is definitely talk about it. I'm not following it quite as closely; it's not mature enough yet that I thought it worth providing analysis at this point. But one of the other things that's frightening about this EU AI Act is that if it is adopted, other countries and jurisdictions are going to look at it as a model. Yes.
So, it could take effect as early as June 2024. But what's your opinion: how do you think the EU Commission is going to take your input? Because if it weren't for COVID, I think it would already be law, similar to GDPR. And I am from the EU.

I don't know; my crystal ball is not very clear on this. But I will say that one of the things that keeps me up at night is the lack of open source sophistication, I would say, in the European Commission. It's interesting: you have some divisions of the European Commission publishing studies talking about the tremendous economic benefit of open source, and you've got other parts of the Commission that seem to not really appreciate the impact of their legislation on open source. For example, the Cyber Resilience Act has some troubling provisions around cybersecurity and where the liability and compliance obligations rest. So I'm concerned that there's a trend of not really understanding open source innovation at all.

Thank you. Awesome talk, first of all. I definitely want to be a part of that letter and help provide another voice. One thing I wanted to understand: I'm actually confused about this piece of legislation being a threat to the open source community. Originally I thought it was something positive, because there was room for malicious repurposing of AI-generated content, trained on flawed models, to do harm to broader society. So having some sort of legislation, from a global standpoint, with the EU having been a lead in GDPR and other areas, seemed like the right thing to do. I didn't realize there were open source implications that could potentially limit the freedom that open source provides. But is there a way to strike a balance? I actually did look at an overview of the legislation, and they did break it down into levels of risk and mitigation, and more people are asking for it because it's going to be the model going forward. Rather than rejecting it, is there a way we can say: hey, this is great, but could we also include use cases for open source that don't limit the freedom we traditionally have? You even have documented studies showing the value of it. So how could we strike a balance between, yes, setting a global standard, letting them know we need to create an FDA for algorithms, so to speak, while not doing it at the expense of limiting creativity and freedom?

Yeah, so I agree with the general sentiment that some regulation could be very beneficial, right, in terms of regulating security, risk, ethics, privacy, all the different types of challenges and opportunities that AI presents. The issue with the most recent proposed draft, though, is the question of who is ultimately responsible for compliance and who has liability if something goes wrong. In my opinion, it should be the party that's furthest out, actually providing AI as a commercial service, because they understand how the AI is being deployed in practice, they understand the use cases, et cetera, and they have the ability to implement the governance that's required. It shouldn't be the open source community member who wrote a piece of code, who has no control over whether it's going to be used downstream in a high-risk application or a low-risk application, and who,
as an individual, should not be expected to take on liability or compliance obligations around the security, the documentation, et cetera. So it's not that I think the actual aim of the regulation is bad; it's just that who it imposes liability on needs to be rethought. If you think about proprietary software and services, the developers have a company behind them, right? In an open source community, it's different. They're not thinking about how open source software is created in the way the most recent draft of the proposed legislation is currently structured.

Yes, I think we have time for just one more question. So, let's say we are using generative AI in our code, because it sounds like everybody is. What tools does the Linux Foundation recommend we use? Since this guidance is not in place yet, what tools are most akin to, or compliant with, the aims of the guidance, so we can get there as quickly as possible?

I'm not in a position to recommend tools on behalf of the Linux Foundation, and I'm not sure we'll ever actually recommend tools. Probably the furthest this will go is helping publish information about which tools have certain types of features, like notice and attribution. And anything I tell you today is going to be out of date, I guarantee, three weeks from now. A few months ago, when I first started speaking about this issue and helping provide analysis, I was advocating: please evolve the tools to provide notice and attribution. And I kept getting pushback, like, well, it's harder to do that than you think. And then a month later, AWS CodeWhisperer announced that they have this feature. So anything I told you today would probably be out of date in a couple of months. The tooling is going to evolve, and I just hope that, whether through market pressure, norms and expectations in AI, or regulation, eventually all the tools being used to develop code that might be incorporated into open source projects are going to include that compliance built in. All right, thank you all.