Hi, this is your host, Swapnil Bhartiya, and today we have with us once again Ibrahim Haddad, VP of Strategic Programs, AI, at the Linux Foundation. Ibrahim, it's great to have you back on the show.

Thank you, Swapnil. I'm happy to be talking to you again. Last time, I think, was very cool.

And today we are going to talk about the recent open source license compliance report that you authored. Let's start with the basics, because last year we saw some disturbing developments (how disturbing depends on whom you ask) where companies changed their licenses and the community responded with open source projects. So I want to look at open source licenses in general. In the modern world, most of the code being written is open source, but the license is what brings companies or products together, because compliance is a big issue. I remember that in the early days of the Linux kernel, the reason Linus chose the GNU GPL was that he knew which projects he could work with, versus everybody writing their own license that is not compatible. So I want to understand: in the modern world, how do open source licenses matter for the sustainability of open source? Let's go to the basics.

As background, I have been involved in open source license compliance since the year 2000, so it's been 24-plus years. That goes back to the days when I was an engineer, and I was responsible for figuring out what open source license compliance is at a time when there was no public knowledge about it. We started by studying what the GNU GPL is, what the Linux kernel is, and how we could use it. That is how we started. Fast forward 24 years, and we are today in a place where not a single company in the world, regardless of the industry and specific vertical it operates in, can actually offer any services or products without using open source software.
As a consumer, as a person, from the moment you wake up until the moment you go to bed, you are touching open source software in pretty much any action you take: looking at your phone, online shopping, streaming, using your bank, and so on. The importance of open source license compliance lies in the fact that there are numerous organizations, thousands of them, and hundreds of thousands of developers coming together and collaborating on some of the largest software projects we've ever witnessed. All of this code is being made available as, if you wish, open R&D, licensed under an open source license that allows anyone out there to look at the code, modify it, customize it for their own use, redistribute it, and so on. So there is a massive amount of technology assets, in the billions of dollars, being made available for use. In return, as a company or an organization using and deploying these software assets, the only obligation you have is really to respect the license under which these different software components and technology pieces are made available.

From that perspective, it is extremely important to respect the license and, in some cases, to make your changes available under the same license, depending on the originating license, like the GPL for instance. Beyond the legal aspect, it shows that you as an organization appreciate the software and the effort of the thousands of people coming together to produce it, and that you are being a good citizen of this ecosystem. Your contribution back for using the software is acknowledging the effort and respecting the licenses of these different communities and projects. But certainly, from 24 years ago to today, a lot has changed. The ecosystem today is very different, and you mentioned in the introduction the change of licenses.
There are a lot of new licenses entering the ecosystem, and different startup organizations are changing the license of their software projects to accommodate certain business requirements, if you wish. So it's a very dynamic and moving space.

Now let's talk about licensing when it comes to open source. I'm going to go very, very basic here, because I think you and I have been working in open source for so long that we take it for granted that everybody understands this, but that is not the case; there are new companies, and there are a lot of companies. In general, let's say there is no open source license, for example. When you write a piece of technology and two companies want to use it, they will go and negotiate the terms of usage: hey, you can do this. Licenses make that open. With an open source license, whether you go with the FSF definition or with the OSI definition, this is what you can do and this is what you cannot do. But in general, people might say, you know what, it's an open source license, so it doesn't matter whether it's BSD or MIT or the GNU GPL, whether GPL v2 or another version. Can you talk about the intricacies of these licenses, where we do have to worry about compliance? Not worry, since as you said we should be respectful, but we do need to understand the differences and why they matter.

When we look at open source licenses, I think we go by the gold standard, which is the OSI definition. The OSI has approved, I would say, anywhere between 75 and 80 different licenses as, quote-unquote, OSI-approved licenses, meaning these licenses respect the core freedoms of using, sharing, distributing, customizing, et cetera, the software. So there are roughly 80 different licenses, and they certainly vary in terms of what's required to be in compliance. Let me give a couple of examples.
A lot of people know the BSD license or the MIT license, and they refer to them as permissive licenses because they're very liberal. If you use MIT- or BSD-licensed code in a given product and you ship it with the product, you have to include a notice saying that this specific product, let's say a cell phone, includes this component, which is licensed under the MIT license, and here is the license text. You include the license text, which is maybe about five lines. You include the copyright notices and any attributions. And that is it. These are the requirements you have to satisfy when you ship a product.

Now we go to the other end of the spectrum. When we move from, let's say, MIT all the way to the other side of the spectrum, you have, for instance, the GPL family of licenses. Let's use GPL v2, which is very popular and widely deployed; the Linux kernel is licensed under GPL v2. Let's say you have used the Linux kernel, which is already used on many portable devices. It's everywhere, and a lot of people don't even know it. It may be in your car as well. If a company uses and deploys the Linux kernel, which is licensed under the GPL, then it needs to provide a number of things to meet its obligations under that license. These include providing a little notice telling the end user that this product or service includes software licensed under the GPL, providing the full license text, which is maybe about 10 pages for the full GPL, providing copyright information, and providing something very important, which is any changes made to the code. If you've taken a piece of code licensed under the GPL and made changes to it, you need to make those changes available to the receiver of that code. So how do companies do this?
As you noticed, there are about 80 of these different licenses, and the requirements for compliance vary license by license. Companies manage this by creating and implementing a license compliance program within their infrastructure. What that entails is automated tools that allow organizations to scan all the source code they use, identify all the components used within a certain stack, identify the license for each of these components, and at that point generate an SBOM, a software bill of materials. Then the company knows that, okay, for these components that are licensed under BSD, here's what we need to do to be in compliance with the license. Other components are licensed under Apache or Eclipse or GPL, and there are different variations of the licenses, for example GPL v2, GPL v3, AGPL, and others, so it is a matter of identifying these different components and their licenses. At that point, the company puts together a plan, which in most cases is mostly automated, for how it can fulfill these obligations.

So there are a lot of licenses today. What's adding complexity to the process is all these ad hoc licenses that are mushrooming here and there, where the automated systems are not yet updated for such licenses. You may scan a project licensed under some hacked-up license, and it may go under your radar because the compliance tool you use did not have an updated set of licenses to be able to identify it. So there is always a need to have a person looking over what's going on on the automation side. But to go back to your question, there are about 80 of these licenses, and they differ in how an organization should respect them and fulfill their obligations.
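To make the scan, identify, and plan steps described above a little more concrete, here is a minimal sketch in Python. It is not how any particular compliance tool works. It assumes source files carry an `SPDX-License-Identifier:` tag, and the `OBLIGATIONS` table, the function names, and the obligation wording are all hypothetical, paraphrased from the conversation rather than taken from legal guidance. Anything the script cannot identify is set aside for a person to review, mirroring the point about keeping a human in the loop.

```python
import re
from pathlib import Path

# Matches tags such as "SPDX-License-Identifier: GPL-2.0-only" in a file.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")

# Hypothetical obligations table, paraphrased from the interview.
# Illustrative only; a real program would rely on legal review.
OBLIGATIONS = {
    "MIT": ["notice", "license text", "copyright notices and attributions"],
    "BSD-3-Clause": ["notice", "license text", "copyright notices and attributions"],
    "GPL-2.0-only": ["notice", "full license text", "copyright information",
                     "source code of any modifications"],
}


def detect_license(text):
    """Return the SPDX identifier declared in the text, or None if absent."""
    match = SPDX_TAG.search(text)
    return match.group(1) if match else None


def scan_tree(root):
    """Map each file under root to its declared license (None if undeclared)."""
    root = Path(root)
    inventory = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            inventory[str(path.relative_to(root))] = detect_license(text)
    return inventory


def plan_obligations(inventory):
    """Split the inventory into a per-file obligations plan and a review list."""
    plan, needs_review = {}, []
    for name, license_id in inventory.items():
        if license_id in OBLIGATIONS:
            plan[name] = OBLIGATIONS[license_id]
        else:
            # Missing tag or a license the table does not know: ask a person.
            needs_review.append(name)
    return plan, needs_review
```

Calling `plan_obligations(scan_tree("path/to/source"))` would return the obligations per file plus the files needing manual review. Real scanners also match full license texts and code snippets, which this sketch deliberately leaves out.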
And to be able to do this at large scale, where you have companies with thousands and tens of thousands of developers using hundreds and thousands of open source components, you cannot do it manually; you need ways to scale. Scaling is one of the major issues that organizations will have to address going forward.

So now let's talk about the challenges when it comes to compliance. As you rightly said, if you look at a container, there are so many pieces: libraries, runtimes, so many dependencies. It is, as you rightly said, impossible to handle manually. Now, this may be a totally different topic, but we have started talking a lot about software bills of materials, which is more or less knowing what's in your code base, having an inventory. So when it comes to software compliance, are SBOMs and compliance two different things, or can they help each other? We have to automate things, and I will of course ask about other approaches too, but I want to quickly talk about SBOMs versus open source compliance.

Yeah, I think they go hand in hand, and I will explain a little bit. But before I get into that: in the past few years, there has been really enormous emphasis on SBOMs. SBOM, for the viewers who may be encountering this word for the first time, stands for software bill of materials, which is basically a list of all the software components included in a given product or service, with some metadata attached to each of these components that includes the component version, the license, the origin website or origin GitHub repository, et cetera.
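As a rough illustration of the definition just given, an SBOM can be pictured as a list of component records, each carrying a version, a license, and an origin. The Python sketch below uses a deliberately simplified, made-up JSON layout; it is not the SPDX or CycloneDX format, and the `Component` class, the `build_sbom` function, and the example values are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Component:
    """One entry in a simplified, hypothetical SBOM."""
    name: str
    version: str
    license: str   # license identifier, e.g. "GPL-2.0-only"
    origin: str    # website or repository the component came from


def build_sbom(product, components):
    """Serialize the product name and its components as a JSON document."""
    document = {
        "product": product,
        "components": [asdict(component) for component in components],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # Example values are placeholders, not data from any real product.
    print(build_sbom("example-device", [
        Component("linux", "6.1", "GPL-2.0-only", "https://kernel.org"),
    ]))
```

Production SBOMs usually follow a standard format so that they can be exchanged across the supply chain; this sketch only shows the kind of metadata involved.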
The reason this has been so much in the news is that there has been a lot of emphasis from different policymakers. In the US, there was an executive order that emphasized the need to have SBOMs, and there is also the cybersecurity perspective. There have been different cyber attacks, and different vulnerabilities discovered in different pieces of software, that gained a lot of attention. So there has been a lot of momentum behind securing the software supply chain. When organizations ship and move code from one company to another, up the stream or down the stream, we need to make sure that the software is secure and that we're able to track any vulnerabilities and address them in a timely way. This is why the SBOM gained so much momentum.

However, from a compliance perspective, the SBOM has been known since the early 2000s. We were generating SBOMs, but we were not calling them SBOMs. We were able to generate the list of all open source code included in any given product or service, with the license information, version number, origin site, et cetera, and send it to whoever was receiving the source code. SCA tools, software composition analysis tools, are the tools that help us with compliance and security. They were initially created with a focus on ensuring license compliance, because they scan the code, tell you what's open source, and identify the license. Then, as recently as maybe five or six years ago, the SCA tool vendors discovered that, hey, we're doing most of the work already. Since we're scanning the code and finding what open source code is in it, and the vulnerability databases are publicly available, we can create a dotted line between all the open source components we're scanning and any known security vulnerabilities out there. And if there are any, we're going to flag them to the end user.
And we're going to flag the priority, how important the vulnerability is, and we're going to flag whether a solution is available, whether it's a patch or a new version of the software, and so on. So today we are in a space where you can have a software composition analysis tool that scans your code base, tells you the origin and license information of all the software components it discovers, and is also able to flag any known security vulnerabilities. When you generate that report, this is your SBOM. So today the SBOM is basically a combination of license compliance information plus security vulnerability information. The goal is to bring awareness to all the vendors across whose supply chain these components are moving: this is the licensing of this code, plus, which is of higher importance, at least from a PR perspective, here are some security vulnerabilities that may exist, or that exist with no solution, that you must address to prevent any potential breach or headaches down the road. Today, if you are operating in the U.S. and you want to do business with any federal entity, you are required to be able to generate SBOMs; otherwise you cannot do that business. But SBOMs are becoming much more widely adopted than that, as most organizations are actually mandating the use of SBOMs across their supply chain. That is an extremely important practice, because it encourages better license compliance and better security across the supply chain.

What are the downsides, dangers, or risks of writing code and using code without worrying about compliance? It can be challenging for big companies, but also for smaller companies. Can you talk about those aspects?

I'm very glad to say that we are today in a space where it's very rare to come across an organization that thinks it is okay, and not important, to skip license compliance.
In the beginning, if we go back 20 years, there were organizations that thought maybe license compliance is not enforceable: we can certainly use open source code, nobody is going to come knock on our door and challenge us on it, and so on. Experience has shown us that the open source community definitely cares that organizations using the code actually respect the licenses, and there will be voices calling out cases of non-compliance. From the lessons learned in a lot of non-compliance cases, we can look at all these different case studies and know that an organization can ignore compliance for a while, but eventually they all have to comply. Eventually, they all have to put license compliance programs in place. In some cases, there were even court orders to hire a compliance engineer to oversee that and fix any previous products. In many cases, this causes a PR nightmare for the company, because its vendors and customers learn that it doesn't have good compliance practices.

All these lessons learned in the past, at least in the early 2000s through 2010, demonstrated to organizations that it is a bad practice to ignore license compliance. You need to have proper compliance not because there might be penalties or whistleblowers, but because it is good for business. It will help you build a great relationship with the open source community that is making these different components available to you, and it will also give you better visibility into your own software stack and an understanding of the importance of open source code within your platform.
I was part of several of these companies. I was at HP, at Motorola, at Samsung, at Ericsson, and other places, where our license compliance practices actually educated us on where we rely most on open source, where our value-add is, and how we can create better leverage and make better use of open source code in our platforms and services. Today, most people think of compliance as an overhead, as an exercise they need to do to make sure they respect the licenses. But experienced open source professionals certainly also look at compliance from an architectural perspective and from a value perspective: let's figure out where we're focusing on open source and where our jewels are, and let's protect the jewels, make them bigger, and enable that through the use of open source. I don't think we are at a point in time where organizations think it's okay not to be in compliance. There is a very deeply understood lesson that all organizations using open source will comply with the licenses, and I think organizations that are very mature with open source use license compliance practices as a way to discover their dependency on open source and increase its use to maximize their value-add.

One more thing I want to talk about is the big elephant in the room, which is the arrival of generative AI. There are some lawsuits going on where news publications are suing some of the companies that are using generative AI, and generative AI companies are also saying, hey, don't take our copyrighted code base, so it's becoming a complicated space. What does generative AI mean for open source license compliance, and what does this compliance mean for generative AI? I want to look at it from both perspectives.

Yeah, thank you for the question.
Actually, this is a very interesting story. Towards the end of last year, I was talking with a friend of mine, and he asked me, Ibrahim, what are your thoughts about compliance going forward? I told him, well, I think we're in a very good place, and compliance is kind of boring today because we know the landscape and we know how to address it. Then two months later, generative AI was the big thing, and it threw a whole bucket of new challenges at us.

So we're operating in a very interesting space with respect to generative AI. As a developer, you can go to any given AI system prompt and tell the system, I want you to generate a function that does x, y, and z, and within two seconds you have 300 lines of code that you can grab, that are functioning, and that actually meet your needs. From a developer productivity perspective, this is just an incredible tool that is available to everyone. Senior engineers can certainly use it, but I think the biggest multiplier of output comes from its use by junior and intermediate engineers. It's really something incredible, and I wish I'd had it when I was an engineer writing code 25-plus years ago.

With that type of benefit and that immediate access to incredible knowledge being generated in front of you come certain risks. One of them is the fact that all these different systems were trained on publicly available source code, and in some cases, depending on your prompt, you may actually get pretty much an existing piece of code. What's really interesting, and I'll explain a little bit for our viewers who may not be experienced in this, is the following.
I went to a couple of the projects that we host in my umbrella foundation, LF AI & Data. They are in the sandbox and incubation levels, so they're very new projects. I went into their GitHub account, went down the repo a few levels, and came to a file with a function. I opened it and looked at it. It was a function of maybe 70 to 80 lines of code, and the engineer who wrote it had included a description of the function. I said, I want to test this. So I went to the AI system and said, can you please write me a function that does the following, and I copy-pasted the function description from the code in GitHub into the prompt. Within one second I got the same piece of code, exactly, 100 percent.

So this is a very unique situation: if you're able to provide a very precise prompt, similar or equal to a function description somewhere, you're going to get the same code, because these models were trained on that code. It's not generating, it's not creating new code; it might be copy-pasting that code and providing it to you. The model is giving you that piece of code, but it's not telling you that it came from project x on GitHub and is licensed under a given license. That is the risk.

How companies are dealing with this really depends on the company. There are different approaches. One approach is focused on the use of SCA tooling that is capable of identifying snippets of code. When you generate a piece of code using a generative AI system, you can scan it using these SCA tools, and the tool will tell you the origin and the license, so you're able to address that challenge right there. Other companies use a combination of tools and policies, and some companies are
focused on policies only. There are multiple policies that companies can choose from. They can be very restrictive, where a company decides, I don't want my engineers to use any generative AI tooling to generate code at this point. So basically the policy is: do not use such tools at this point. That is very restrictive. On the more liberal side, you can leave it up to the engineers and enforce some compliance at the tooling level. Or, from a policy perspective, you can be in the middle and say, we allow the use of generative AI in specific contexts. You can allow it for generating code for testing purposes but not production, or you can allow it for any type of use case as long as engineers use a specific system or tool. For example, and this is not an endorsement or publicity, let's say GitHub Copilot or system x or system y; an organization can decide, we trust this system, let's use this tool versus that tool.

So there is really a variety of approaches at the policy level. You can be very restrictive: no, we don't use these tools yet, until we have a better understanding of the legal landscape and what's going on in that space. You can be very liberal and tell your developers to go use whatever they want. Or you can be in the middle, where you allow the use depending on the use case or depending on the tools. And as a bottom layer to all of that, it is certainly a good idea to run all the generated code through license compliance tooling, because by purpose and by nature these tools tell you the source and the license of the code they identify. This will help tremendously.

And let's not forget that this risk has always existed in open source: a developer knowingly or unknowingly contributing a piece of code that they shouldn't contribute to an open source project. That's why we have DCOs
in place and CLAs in place to be able to capture these cases. However, now, with the use of generative AI, these risks are happening at a much larger scale, with hundreds of thousands of developers. But in the same way that we've been able to manage compliance, in the same way that we came up with tools, methods, policies, and processes over the past 20 years to manage license compliance, we're now looking at a new challenge with generative AI, and we need to update our internal policies to reflect this new development and accommodate it. So I don't think it's the end of the world. There are, in some cases, a lot of voices saying this is dangerous and this puts open source at risk. I think it's just a new development, similar to anything else we have experienced. Technology goes in cycles, and we are at a point where we're facing new challenges. This is one of them, and I think in a couple of years we will have addressed it and will be looking at new challenges.

Is there anything else that you feel is really important for this discussion, or do you think that we hit the major points today?

Yeah, thank you. As a wrap: the Linux Foundation has LF Research, which is a business unit dedicated to producing really top-of-the-line and very timely reports across all technology sectors and topics. At the beginning of this year, we celebrated 50 reports in the past three years, which is really incredible. So I would urge people and invite them to visit the Linux Foundation website and access all these really amazing reports produced by LF Research in collaboration with our members. All these reports are available publicly, and all the infographics, graphics, images, and figures are also available for use, to encourage the use of these survey results. Similarly, within the Linux
Foundation there are many efforts going on in support of license compliance, whether directly or indirectly. For example, SPDX is one of them. We have the OpenChain effort, which also feeds into that direction. We also have the TODO Group, which is the collection of OSPOs across our membership that collaborate on better open source practices. On top of that, there are a number of publications and, equally important, free training that organizations or individuals can take, for example free open source license compliance training for developers, along with other publications and best practices. So we are working very diligently to produce knowledge pieces, whether videos, training, or publications, and to make them available to people to increase the knowledge in this domain and help them be better at using and contributing to open source. So, everyone listening, please consider visiting the website and accessing all these resources available at your fingertips.

Ibrahim, thank you so much for taking the time today to talk about this report. Great insight there, and as usual, I would love to chat with you again soon.

Thank you. Thank you very much to you as well; I appreciate it.