Hello everyone, thank you for joining us, and welcome to OpenInfra Live. OpenInfra Live is our interactive show where we talk about all things open infrastructure. I'm Mark Collier, and I'll be your host today. We have some amazing guests coming on, and we are live streaming to YouTube as well as LinkedIn. We encourage you to put in your comments and questions throughout the show, and we'll try to get to as many of them as we can; we want to keep it interactive. Before we get started, I want to thank all of the members of the OpenInfra Foundation, the organizations that make it possible for us to do what we do, including putting on these OpenInfra Live episodes. So today we're going to talk about what's been happening in open source lately and its intersection with AI and all the different crazy things that are happening. If you think about open source, it's so much bigger than it was when we all got started working in open source a decade-plus ago, some of us even longer. It's gotten so much bigger, and so much more attention is attached to it, that we're starting to see more and more examples where people want to grab that attention. They want to call their effort, their initiative, open source, and they get that attention. It usually works; the press like to pick it up. A lot of times you'll see examples where companies come out and say, look, I've got this open source initiative or this other open source activity, and the press oftentimes just repeats it. So let's go through a few quick examples, certainly in the world of AI. You see Meta had their Llama and then their Llama 2. And look, it's great that they're creating more open access, in my opinion, to these powerful AI systems, these tools, these large language models. But claiming it's open source really doesn't match the definitions that we have come to understand about open source, particularly with some of the commercial restrictions on use.
And it's a complex world when it comes to AI. We've got some great guests today to talk about what open source, or even just open, means in the context of AI. But if we keep going through some of these examples: on the next slide, IBM and NASA open-sourced a geospatial AI foundation model, published on Hugging Face. The next example was another one about Meta, which got a lot of headlines, as you may have seen. And then Alibaba released their model, and in some of these cases there are restrictions on who can use it; if you're one of the FAANG companies, you can't use it for commercial use. So a lot of those restrictions run directly into conflict with the very definitions of open source that we have agreed upon, and we have the executive director of the OSI, the Open Source Initiative, on to talk about this very topic. And speaking of Llama 2, I asked Llama 2 itself, via a hosted version that I found that a16z had put out: what is your license, Llama 2? And it said MIT. That's very much wrong; it's not MIT at all. But of course, these large language models, these AIs, have been known to hallucinate. So I went back with a few more follow-up questions, and it eventually conceded that, in fact, it wasn't an OSI-approved license. Now, of course, you get a different answer every time you ask these large language models questions, so this was just something I did for my own amusement. But eventually, yes, it did acknowledge it's not under an OSI-approved license. So this move towards more openness in AI is a good thing, in my opinion, but it's important that we actually take the definitions around this seriously. We'll talk about that in a little bit. And lastly, I want to mention another trend that's been happening in the last few years, which is relicensing. This is really not specific to AI.
It's just another example where you see companies that are backing and investing in certain open source projects moving away from what we would consider a true open source license to source-available. There are a lot of different examples of this. And even when the license isn't changing, we see examples where other forms of restrictions are being put in place, because companies are feeling competitive threats, or perhaps have different motivations. But we're seeing this continue to happen. And so it calls into question, I think: what is the definition of open source, and how much does it matter? We've got a great set of guests to talk through this with us today. One of those guests is Stephen O'Grady, and he actually just published an article a few days ago, "Why Open Source Matters," that speaks directly to the heart of this issue. So now that we're on stream, let's invite Stephen on. Stephen, why don't you talk a little bit about your background, and then we'll bring the rest of the guests on. Sure. So I'm Stephen O'Grady. I'm the co-founder of RedMonk, and I've been researching and writing about open source for a lot longer than I would care to admit, dating back to some of the origins of these projects. I have traced the histories of these licenses and have worked with many companies in this space on navigating licenses, in some cases working with them on relicensing efforts. So yeah, I've been in this space for a long time, and as you might have gathered from the piece, I have pretty strong opinions, as I suspect a lot of our panelists do, about the value of the open source term, what the definition is, and why we need to defend it. Wonderful. Let's bring Pam on next to introduce herself, and then we'll bring the rest of the guests on and get started. Hi, I'm Pam Chestek. I have a firm called Chestek Legal; it's my firm. I've been working in open source software for probably between 12 and 14 years now.
My first experience was at a proprietary software company, where I read the licenses and thought I knew what they meant. And then I got a job at Red Hat and learned what open source really was about, and I've been working in that field ever since. I have all kinds of clients. I have proprietary clients who want to understand their compliance obligations under open source licenses. I have open source foundations. I have projects, companies, and people who want to figure out what kind of license is best for their work, and frankly, sometimes the answer is not an open source license, depending on what their goals are. So yeah, I've been working in this field now for probably 12 to 14 years, and I'm endlessly fascinated, and maybe I'll comment on this later, by the communal development and the community around open source that resulted from having made these license choices, from having set up the OSD, and the growth that that allowed, which I just find a really fascinating social statement. That's wonderful. Well, thank you for joining us; we're super lucky to have you. And on that note, why don't we bring Stefano in, and he can introduce himself as our next panelist. You're muted, Stefano. Okay, we may have to come back to Stef. I can introduce him, but I probably won't do it justice: Stefano is the executive director of the Open Source Initiative. They are directly at the heart of this question of defining and maintaining the definition of open source, so we're excited to have him on. Stef, is your audio working now? Yeah. Okay, sorry for that, sound issues. I'm the executive director of the Open Source Initiative. I've been working in and dealing with open source for many, many years, probably since before open source as a term even existed; I can't trace it back. But in any case, I came to software as a user. I'm not a developer.
I just noticed how much time I was spending as a software user trying to figure out who I should talk to in order to get the software I was using modified, and I thought that was wrong. So I discovered the GNU Manifesto and the GNU project, and I fell in love with those concepts, started advocating, and basically never stopped, from different angles: nonprofit work, corporations, other nonprofits, and now back into the open source nonprofit world. Awesome, thank you for joining us. Next, we're going to bring Armstrong on to introduce himself. Hello, everyone. Yeah, I'm Armstrong, a research scientist working on dependable and explainable learning, a project at the University of Montreal. My focus is on certifiability of deep learning models for safety-critical systems, investigating safety-critical systems like aircraft transportation and medical systems. But before focusing on that, I did my PhD on open source software ecosystems. And it matters a lot to me when we talk about open source, because I have benefited from it. I have seen the value it brings: it breaks barriers across different sectors, it makes things explainable and transparent, and it builds collaborations that we cannot even imagine. Even in my research work right now, I am really enthusiastic about open source, especially in the AI domain, which is also building into a massive ecosystem. Wonderful. It's great to have somebody with your background as we get deeper into the AI topic. To open things up here with the first question, I want to actually pause on AI for a second, just because it brings a lot of complexity into the debate and the conversation about what we mean by open or open source, and step back to the general open source definition and ask you all: why does this matter?
I think we can all see that there are a lot of actors out there with this motivation to call their thing open source when we have a definition that it clearly doesn't meet, and there are more and more examples of this, or of companies just acknowledging, hey, I'm not going to be open source anymore. But in either case, the question is: why does it matter that we actually have a clear definition and that we hold people to it? Maybe I'll start with Stephen, because I think you've done a lot of writing on this, and whoever wants to jump in next, feel free. Yeah, I think it's actually pretty simple, in the sense that the definition itself was the product of a lot of work, and it is essentially the least possible set of restrictions that will work at scale. The difficulty is that you'll hear vendors come along, and we've already mentioned Facebook, which has a couple of projects, Llama most notably, and say: here's the source, the source is available, and it comes with these restrictions. In their case, it's commercial restrictions, and restrictions vary, and that's part of the problem, in the sense that if you have a bunch of licenses, all of which carry differing arbitrary restrictions depending on who set them up, that's a problem longer term. So if you are a vendor and you want to introduce a particular restriction in a license, that's your right; if you wrote the software, you can license it however you want. But the thing that a lot of us are insisting on is: just don't confuse it with open source. You can make the source available, but for it to be open source, it has to be available under the definition that we've all come to understand over multiple decades. You see all sorts of people arguing, well, developers don't care. Well, they don't care because they haven't had to.
They've never been subject to these sorts of arbitrary restrictions that, among other things, don't allow developers to recombine software in different ways, as they've been entitled to under the definition so far. So basically, the definition means something to the whole industry, and if that definition is compromised, it doesn't mean anything, in which case open source as a term describing a particular set of values and behaviors and permissions becomes meaningless. People want me to settle the debate on how things should work; that's not for me to say. To Pam's point earlier, there are circumstances where an open source license just isn't appropriate. Okay, that's fine. But just don't call something open source because you want the marketing benefit of that term. Pam, what do you think? Yeah, you know, I can hardly find anything to say beyond what Stephen said, because he said it so well. I would also say that there's just an expectation when you use software under an open source license: you know where the guardrails are and what you can and cannot do with that software, and they're very minimal, right? It's providing attribution; it might be having to make the source code available with certain licenses. But everybody knows where these very low guardrails are, and that gives them a great deal of freedom to operate, freedom to make choices, freedom to improve. I'm harkening back to what Armstrong said in his introduction: you have room to run, and you don't have to worry about where the fences are when you have open source software.
And as soon as people start to put up those fences, when you go into the proprietary world, you understand you have to read the license, you have to understand that maybe you only have so many seats, or other kinds of limits, so ideally you have implemented procedures to make sure that you don't breach that license. But there's a lot of overhead in doing that. So the open source license, as defined by the open source definition, just gives you so much freedom to operate, which is lacking if it's not an OSI-approved open source license that has been ratified as granting you those freedoms. I just think that people are missing that aspect of the open source ecosystem and why it works. Yeah, for me, it's very tied to my personal experience, because when I was a student in architecture, one of the first lessons I got from my professors was to go out to building sites and ask for permission to enter as a student and learn how things were put together, like how beams and pillars were attached, or to read books and magazines that described how a building or a square was designed, starting from sketches all the way to the final prints. That experience I could not replicate when I started to use software. When I started to use CAD, I not only had to go through multiple steps in procuring the software, but once I received it, I needed to procure even more plugins, extensions, all sorts of other things. It required painful negotiations with the head of the department at the university where I was working, figuring out how to contact and negotiate with the vendors' sales teams.
It was multi-thousand dollars' worth of licenses, and it was just painful. And that pain was removed when I understood that there were the free software licenses, the open source licenses, where you knew you could download packages from the internet, or you could even pay consultants or a vendor to download packages and use them, but also build on top of that and have complete freedom and control of the solutions that you were building. That, to me, was life-changing. To add to what the other panelists have said, I look at open source as a culture. Being a culture means we have a heritage that we need to preserve and to grow sustainably. When we look at the human cultures that exist around the world, if you make certain statements, people might label you anti-Semitic, or phobic in some way, and even the law will come after you. In the open source world, we also feel threatened when people violate that space by claiming something to be open source when it is not. Is it by design? Is it by some other form of intention, or not? Stefano and Stephen and Pam have said it in many beautiful ways. We still have to define and try to accommodate this new, changing environment that AI is bringing. We know very well that AI has been in the world for many, many years, so these companies are not ignorant about open source. They would not even have the enabling technology to do what they are doing now without it; it is thanks to open source that they were able to build what they have right now. They should not sabotage open source. If they are doing it intentionally, well, I just hope it is not an intentional design. We may come back to that later. Thank you.
I really appreciate all the different perspectives, and they are all very cohesive as a whole. One of the takeaways I am hearing relates to this decade-plus boom of open source. If you go back far enough, you can find the time when a lot of companies were very scared of open source; they did not want to touch it. A lot of that had to do with the legal departments. They did not understand it: what does it mean to bring this into my company? That friction was dramatically reduced by defining what open source means, particularly with a certain set of licenses that those attorneys could understand and read, while seeing their peers, the other Fortune 500 companies and what have you, adopting things that are Apache or MIT. I do not want to get into the weeds on the licenses, but once they became understood, it was a matter of: I understand, if it is one of these licenses, this is generally what it means. It just really reduced the friction for adoption and experimentation in these companies, and that is what led to this boom. I think, as Stephen pointed out, when new licenses come out, that creates a huge barrier regardless of what they say. Even if they say something that is not too far afield from existing licenses, you now have legal teams at all these companies having to ask: what does this mean?
I have never seen this one before, and they just put the brakes on it. I think that is one of the things that is at stake if we let this proliferation of new things pop up and call themselves open source without the culture and the consensus building that Armstrong very eloquently pointed out. So that is my two cents. The next question I want to ask is not necessarily about the licensing side of it, but broaches into the subject of AI. Really, I want to talk about why any of us feel it is important that AI, this powerful new technology, be accessible, be open to people. You can use the large language model example, the LLM like ChatGPT, which is the one that has caught everyone's imagination. There are debates about that: some people say it absolutely shouldn't be open, and some people say it absolutely should be. I think we probably all come more from the side of wanting things to be more open, but what is important about that? I want to maybe start with Stef on this, because he just started a thread recently in a group that I started participating in, and one of the questions he asked is: before we talk about defining it, why does it actually matter? I think this comes back to some of the culture questions too. Stef, what are your thoughts? Well, I'm trying to build my opinion on this; we just started, a little over a year ago, I've got to say. And maybe it's worth reminding people that the open source definition came out 25 years ago, but only after decades of existence of the GNU Manifesto, the practice of releasing software in the open, the GNU licenses, the GNU project. So there was a lot of practice, a lot of understanding, and a kind of coherent, organic evolution of computer software and computer science that went hand in hand with open source. With AI, defining
and thinking about what open means in this context is coming in a bit of a rush, accelerating with the evolution of the technology over recent months, or even weeks. So why did I start thinking about this with the Open Source Initiative? Because I think we need a framework. As communities and as a society, we need a framework, some sort of tooling, to explain to people why, with the evolution of science, computer science, and software, we want to replicate that in the AI space. We want a way to control the technology, to understand the technology and how it's being deployed. And ultimately, we want to be able to tell the policymakers: if you're creating a digital space, if you're creating a world where machines are capable of deciding who goes to jail or not, who gets a mortgage, who has the right to cross the street in front of a driverless car, then we as a society need to have a profound understanding of what went into producing those machines, be able to explain why they are behaving one way or another, and ultimately write laws and norms and regulations that allow us as a society to keep control of these machines. In order to do that, we need understanding, the same way that, when the internet started to boom and digital services started to be delivered by governments to citizens, we told the policymakers they needed to create websites that worked with standards that could be implemented on any computer, because we wanted to have access to those tools. We need a similar approach for AI. It sounds like one of the clear reasons you're giving is transparency. We see the rise of these technologies, and I think everyone has a sense now, especially since ChatGPT, with all these new models released in just a short number of months, that we're finally starting to see glimpses of this powerful AI that has been written about in sci-fi for decades. Now we're saying, this actually looks like there's something
there. But if we're going to inevitably rely on it as society and governments, and you talked about policing, these are very heavy topics, then if we don't know how these systems work, if we don't have the transparency that comes from the openness we've come to know in the open source software world, that's a concern. That seems like one really clear example of why we would want openness in these sorts of designs. Does anybody else want to jump in on this topic? I'll just say, in addition to what's been outlined, that there are clear societal incentives to make these technologies as transparent as possible, but there are also huge practical implications. At RedMonk we spend our time essentially researching and writing on behalf of practitioners, and even if we set the societal transparency requirements aside, just from a practical standpoint, if you're a developer: RedMonk's been doing this for a long time, and we have seen generation after generation after generation of new technology come up. This is by far the fastest growing we have seen, and arguably the most important. If we end up in a world where these technologies are not open, not easily worked on or understood by engineers, then practically speaking, that's just a loss. At a minimum, in a perfect world, they'd be open to all participants; they would be truly open source, and we could all work on them without reference to where somebody works, or how much revenue a company generates, or how many users they have, which are some of the restrictions these technologies are currently imposing. That would be the perfect world. Short of that, we have, in Llama as an example, a technology that is open unless you work at Google or Amazon or Microsoft. Not really open, but at a minimum we have access to the source code, so that at least is an improvement. So like I said, as you take this apart, there are really big, important societal
reasons, but then there are also day-to-day practical implications for engineers of these technologies being available and ideally open source. Yeah, I'm glad you mentioned the Llama one again. We've kind of beat up on Meta and Llama for using the open source term when it doesn't meet the definition, which is fair, but at the same time it's all moving very fast, and generally speaking, what they've made available is more available, more accessible than, say, OpenAI's, which, even though it has "open" in the name of the company, is by and large much less open than the Llama example. So you have this spectrum, and I think that having a clear definition does matter, but we should also acknowledge these are probably steps in the right direction for transparency, with people working on trying to reproduce it and understand what it actually does. So probably a net positive, but not quite meeting the standard. Armstrong, do you have any thoughts on this? Well, I think they have elaborated on it really well, and I'm in line with what they said, so I don't have much to add here. Okay, good. Well, in that case, why don't we go on. I was just going to comment, actually, to put together what Stefano and Stephen said, which is that another aspect we're seeing in AI, and I agree, Stephen, that this is monumental in how quickly it's grown, is the consolidation of power that exists at this moment in time. Maybe that won't be true for the long term, but at the moment we have a great deal of consolidation of power in AI, which I think is another reason why we need to have transparency into it. I love that point, Pam. Personally, from my point of view, anytime you see concentrations of power happening, the pendulum swinging that way, a lot of unpleasant side effects can come along with it. And I think that's been one of the great hopes and results driven by open
source: distributing things more evenly. They say the future is here, it's just not evenly distributed yet, and open source has helped counterbalance that trend. Now we see people backing away from it, and I think that concentration of power is absolutely a fundamental reason why we need to take this seriously. Even when it comes to things like regulation, people talk about regulatory capture, and you have to question some of the lobbying and the motives behind it, especially when it comes from a very small number of players. So, all worthwhile things to keep in mind. As we go more deeply into this AI topic, I want to talk about why it isn't as cut-and-dried as it has been in the software world. Certainly AI is much bigger than just generative AI or large language models, but let's zero in on the large language model, the LLM, because that's the hot topic of the day, the ChatGPT thing that more people are getting exposed to. That world is much different from what we are used to in software, which is: okay, we have the source code; if I have the source code, I can compile it and turn it into a binary, and it will match exactly the binary that someone else produces, so I can essentially reproduce the final output in binary form straight from the source code. But in the world of large language models, and I'm just getting started, very early in the learning curve, I understand there are a lot of different components; it's not just about the software. If you want to recreate the final result, there are many other components to it. I wanted to ask Armstrong and Stefano, who both have a lot of investigations going on in this area, but maybe we can start with just understanding: what are some of the different pieces that could potentially be open or not, beyond just the software side of it? Okay, so it's really
interesting that you link these models with software development, because things turn out to be in the reverse order. In software engineering, we always start with some requirements and try to get things settled at that level: how we are defining classes, functions, and so on, right up to the development of the system itself. But here in the machine learning world in general, we are talking about data and how to feed those models on data. Different types of models exist; some are, permit me to use the language, black box or white box, terms that have been used, and if I find a more inclusive name I will use it. When we talk about these black boxes, what are they? This is the starting point where we have to get worried, even when we are talking about openness, because explainability at this point is low. Now, what does training a model entail? We have some data, some observations we have seen; that data comes in a distribution, shaped by the way it is curated. We want a mathematical model that can mimic that behavior, that understands that data. That is the model we are talking about. But as the model gets complex, even the best mathematical formulation cannot explain the very model that the mathematics was used to design. That is where things start getting really out of hand, even before we talk about open source or closed source. Let's just focus on the model itself. When we start training that model, fitting it on data to see how it performs, take for example a deep learning model with neurons, with their perceptrons: as the layers increase, we cannot actually tell which particular neuron contributes, and to what extent, to the final decision. That is where explainability research is going on. And the more we scale AI up, we are talking about foundation models, a generalized form of generative model where you can train the model with a massive amount of
data and use it for any kind of downstream task. This is leading to a new variety of AI that we have not yet seen but that research is working toward right now: AGI, artificial general intelligence. We are not even close, but research is heavily going in that direction. When we put all these things together, we see that by the time a decision is taken, either a prediction or a generative kind of output, many factors have come into play. The training of the model comes with weights, with biases; that's what we call the parameters of the model. When we talk about 75 million parameters, we know the vast amount of data that was trained on, and the different combinations of parameters, of different weights and biases, give certain outcomes. It's a complex procedure that we may not go into in depth here, but even at the surface level, at this point it is difficult to tell anyone what it means for a model to be, let's say, 95 percent or 98 percent accurate. Let's be concrete: your model is 98 percent accurate. Wow, beautiful. What does that mean? How can you explain the two percent that is having an issue? What does that mean to somebody who wants to use that model? For example, you go for a medical test, and everybody says, oh, it's very accurate, and it wrongly predicts you, and that was your last chance to obtain something. Think of the catastrophe of that in an aircraft, a one percent chance of failure: what if you are the one traveling at the moment the plane crashes? So for all those kinds of things, explainability needs to come in. We have two types of openness here: openness that is withheld intentionally, by not releasing under an open source license, and cases where there is no bad intention, but the model itself places restrictions. That is what is happening in the field of AI: not because they don't want to release it, but because the complexity of the model makes it hard to understand what is happening inside. And
This is one of the reasons I believe research should really bring itself closer to the open source world, so that many eyes could look at some of these things. I think Mark made a very interesting point, and even Stefano. How will you tell me, for example... In America it has been studied and published in the literature that there are biases in data that disfavor Black males; it's well known in the literature, and when you look into the algorithms, they were hugely biased. So open source is, I can call it, the police that we could use in this setup. Because if you tell me that your model predicted something about me... For example, one time I was traveling, and the immigration officer told me, we are doing a random search and our algorithm randomly selected you. I said, beautiful, I like the word random. Can I know how your random generator works? I'm happy to submit, but I just want to be sure that we are not biased here. He could not explain it to me; he went to his boss, and later on they just asked me to go. I said, but no, I'm not breaking the law, I just want you to explain how the algorithm works for me to be selected. And that was the end. We can go on and on, and you realize that explaining how the training of these models works is sometimes difficult, not because of malicious intent, but because of the nature of the algorithms themselves. At some other point, people might be using the wrong algorithm to solve a problem even when the data is available. And for many people... for example, let's talk about the large language models. I will come in from the point of the footprint, the carbon footprint. The number of GPUs, the amount of processing power, needed to train one such model is expensive, and it produces a lot of carbon waste. Now, if every company keeps producing models like that, how are we doing with global warming? How are we doing with the carbon footprint? What if they made them open source for people to reuse, like what Hugging Face is doing? This is a good example of something that we can start learning from.
it is there: Hugging Face, and the paradigm they are using, we can now leverage to make AI open source. Just to concretize what I have been saying in this aspect: I have talked about the foundation model, which is a generalized form of the large language model, which is what we are seeing right now. It has billions of parameters, and as I said, these parameters come from the data: the higher the dimension of the data, the higher the dimension of those parameters, because we will have what are called weights and biases to bring that training to a high accuracy, a form that can learn. There are other things that happen. Studies have shown that models are trained to do one thing but end up learning skills for other tasks they were never trained to do; this zero-shot behavior is common in most of these generative models. So they have tremendous powers to do things that research itself has not conclusively agreed it fully understands. A few days back, Yoshua Bengio was speaking before Congress in the United States, and he made a kind of sweeping statement on open source, comparing open source with nuclear technology, the atomic bomb, as a reason why they should not release this technology as open source. I was really mad at that point, because that's a complete misconception of how the open source world works. If we can make the explanation of these models transparent, if we make the knowledge accessible, why can't we make the model itself and the data available to people? The more eyes that look at it, the better, to improve these algorithms. We can learn from encryption algorithms many years back: how difficult it was to open source them, and how they only got better once they were open sourced. I believe AI will benefit a lot if they open source the procedure. On the other hand, I know that we define open source from a software engineering perspective. We should be able to sit down and talk with the AI community; maybe our definition will become
more inclusive and capture some areas. If you want to jump in? Absolutely. You have highlighted a lot of the challenges that open source AI, or defining what open source means for AI, brings, because there are so many different layers. It's not just the source and the binary, like Mark was saying: the data aspect, the architecture of the model, the training software, the inference software. All those pieces, all those components, come with their own set of challenges and their own understanding of what open means, and the implications of openness, when it comes to the sovereignty that developers and users have over the system as a whole, are different for each. If we go back to the early days, we had a simple way to tell people, to tell companies, to tell governments: if you're deploying a new software system, make sure it's open source, because that will grant the public access to explainability, access to control of the solution, and sovereignty over their digital life. We have to find similar answers for each of these separate aspects. As open source communities we became very familiar with copyright law, and we tend to think about slapping copyright over everything, you know, on data, on trained models, on binary objects and all of that stuff. It may not be enough to think in terms of copyright, of who has ownership of data, you know, like my blog or my pictures; we may well have to challenge all of those assumptions when it comes to AI. Thank you, that's really great. I think one of the things that I'm hearing from you all is that even people who have access to all of this training data, the people who created the models and did all the work from the beginning, don't actually fully understand why, when you ask a question, you might get two different answers, or why sometimes it might give you a really insightful answer. So there is this reproducibility question, which we kind of take for granted:
if you have enough transparency and it's all math, surely you can reproduce it. But there are times when even the people who developed these technologies, who have no shortage of access to how they were built, can't fully explain why they do what they do. That's just another level of complexity, and as we become more impressed by the power, there's also this concern about why it's doing what it's doing. Even if we're impressed by it, or it seems to be useful, we really should pause and reflect on that, because that's uncharted territory for us. Now I want to go to a question we got from the audience, and it was related to monetization: if you're open sourcing the models, by what means might an organization monetize its efforts? Pam, do you want to start? Yeah, I have kind of a non-answer to that question, but this is what I tell people. When you have worked at Red Hat, the first question that everybody asks you is how Red Hat makes money. I left Red Hat more than 10 years ago, and I'm still answering that question whenever people find out I worked there. The answer I give is that we are so accustomed to the thought that I have an intangible property, I can sell the license to it, and that is how I'm going to generate revenue, that people have a really hard time thinking more creatively about other ways. When you take away that model, that ability to monetize the license, what other ways are useful? If we look around, we see that open source is used everywhere. Red Hat is kind of an unusual company in the sense that it is monetizing the software per se, but doing it by providing support and services for that software; that's the model they've chosen for monetizing open source. Whereas if you take someone like Facebook
and Google, they use it as a platform. They're not monetizing the software per se; they're monetizing other types of services related to it. So using those as examples: to anyone who says, I need to monetize the model itself, I would say, take that assumption away and ask, what is the business I can operate, what am I trying to do, what can I do that will make money in other ways? And I'm hoping that Steven has something a little more concrete than I have in the way of advice. I mean, Pam, I think you're exactly right. The way that we talk about this with our clients is basically: you need to stop thinking about making money from the software and start thinking about making money with the software, right? There are a bunch of different reasons for that. One, just selling software these days is much, much harder. There's a ton of competition, and even with some of the AI pieces, as innovative and advanced as they are, they're not without competition. So if somebody is charging you for one, in many cases you can go get a free alternative somewhere else; it might not be quite as good, but it will do a lot of what you want. So the question is less, hey, how do I monetize this software directly, than, what are the models that it enables? In so many cases, if you look around the industry, people are applying licenses to software, but that's not actually where the majority of the revenue comes from. The majority of the revenue for a lot of commercial open source businesses comes from actually running the software and operating it as a service. In a lot of cases the licensing for the software itself, as a discrete asset, is pretty minimal as a percentage of revenue, with the majority going to: hey, I don't want to run that, I don't want to operate it, but I'll pay you to do that. That's one very common model. So I think Pam's exactly right. Everybody gets wrapped up in: I wrote this, so somebody must pay me
for it. That's just really not how things work most of the time today; that's not where most of the revenue for software is coming from. Great. Steph, do you have anything to add before we go to our final question? I like to think about restaurants: there is an entire industry of books with recipes, and still lots of restaurants are making money. The knowledge of how to make a meal is freely available, and in my analogy the recipes are the software, but there is still a lot of industry knowledge, and the capability of generating revenue comes from cooking meals for someone else. That analogy has always helped me understand the business of open source. Good, I like that analogy. Well, for the final question, I'm going to turn the conversation around, sort of upside down. Rather than talk about what open source means in the context of AI, I want to ask: what does it mean when AI is actually writing code, when machines are writing code and contributing it to our open source projects? We have these copilots, these coding assistants; people are really starting to use them and find utility in them. They obviously are not perfect, but they seem to have some utility, and there are stats out there about just how many developers are starting to use these assistants, which are writing the code. So if some of the code is being written by machines, maybe more and more over time, and that's coming into our open source projects as submissions to be reviewed, what are some of the implications of that? Certainly a lot of them are probably in the legal realm, so let's start with Pam to field this curveball. Yeah, so this is where my clients are freaking out, and absolutely justifiably so. As Steven said at the outset, this has developed so fast, and as lawyers trying to evaluate the risk of the generative AI models in particular, our heads are spinning, because we just have kind of no clue.
Someone else commented, and I think it's very accurate, that if you go to your developers and ask them whether they are using generative AI tools to write software and they say no, they're lying to you. It's exactly the question from 12 years ago with open source software: if you asked, are you using open source software, and they said no, they were lying to you. It's very parallel to that. So we have to figure it out, and obviously the biggest concern, the biggest legal threat, where the lawsuits are being filed right now, is the ingestion of data: whether or not it is lawful to ingest data the way these LLMs have been doing it. There are people who staunchly believe it's fair use. I would say my position is that, unfortunately, as with everything in fair use, these questions are very fact-based, so I don't think there is a one-size-fits-all answer. As much as we would like to just wave a flag and say, oh, it was fair use to have ingested this, I don't know that that's the case. It may depend on what the purpose is, what the use is, what's done with it, who the audience is for it. And our fair use world just got more complex with the Warhol decision out of the Supreme Court. So the lawyers are concentrating on the two ends of the equation: is it lawful to ingest this data, and at the other end, is what comes out possibly an infringement of someone's copyright? I think both of those questions are open. We know anecdotally, for example, that when Copilot was first launched, it would replicate the GPLv2 license text, because it had seen it so many times; you could start with a blank file and that's what it would generate. So if someone were to enforce copyright in that, how do you say it was not a copyright infringement?
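Pam's framing, that the legal question turns on how similar the output is to the original work, can be illustrated mechanically. This is a hypothetical sketch using Python's standard difflib on made-up snippets; a real substantial-similarity analysis is a legal judgment, not a string metric:

```python
import difflib

# Hypothetical snippets for illustration only: an "original" licensed work
# and two candidate outputs from a code assistant.
original = "def add(a, b):\n    return a + b\n"
verbatim = "def add(a, b):\n    return a + b\n"  # exact regurgitation
reworked = "def total(x, y):\n    result = x + y\n    return result\n"

# SequenceMatcher.ratio() returns a similarity score between 0 and 1.
score_verbatim = difflib.SequenceMatcher(None, original, verbatim).ratio()
score_reworked = difflib.SequenceMatcher(None, original, reworked).ratio()

print(score_verbatim)              # 1.0: identical output is easy to flag
print(score_reworked < 1.0)        # True: partial overlap scores lower
```

A score of 1.0 flags verbatim regurgitation, like the GPLv2 example above, while reworked output scores lower; where the legal line falls between those two is exactly the open question.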
The copyright system doesn't care too much about how you got to the end result, so it's very possible. There is something called subconscious copying, where you don't realize that you've copied, but the end result is so similar to the original work that you must have copied, and there is liability for that. So I put less weight on what happens in the middle. On the input question, we have no clue how this data use is ultimately going to be judged; there are lawsuits, so we may find out. On the output side, is it infringing? If I take that generated code and plug it into the code I'm working on, am I infringing someone's work? It depends. I don't care so much what happened in the middle; the legal question is, how similar is it to the original work? And what makes it more complex in software than, I believe, in visual works or works of fiction or other kinds of works is that software also has a highly functional component. Software tends to have narrower copyright protection anyway, because functional necessity is not protected by copyright, and there are doctrines, like scènes à faire and merger, that allow people to use what needs to be used. Those come heavily into play in software copyright cases. So we're all just swimming as fast as we can to try to figure out answers to these questions and to give our clients at least some understanding of the risk that they're taking. One thing that I would like to add at this point is that we also have an opportunity, something of a blank slate. The courts are going to be deciding based on the laws that exist today, but this field is moving so fast, and new laws are being written. I think we have the opportunity to influence the policymakers and ask them to write the laws
that we need for a functioning society in this space. Yeah, I agree with Stefano, and just to shed some light here: the future of software engineering, as the pioneers from Microsoft Research suggested in a recent keynote, is that these generative models will be writing the code and software engineers will be doing more of the review. In that context, I look at it as co-authoring. If we go one step backward, all these copilots and generative models that produce code have been trained on code that different people wrote. That is where Stefano comes in to say, okay, the laws should also reflect that reality, because if you train a generative model with my code and then deprive me of using the result, we should think about that. It gives us a template, the boilerplate, from which I can effectively complete the code. So it's co-authoring, and I think that is the future of software engineering. We cannot just wave away that reality; it's debatable, and something the law must definitely address. Thank you. Well, those are great answers; it's been an awesome episode. It sounds like, individually, if we want to monetize open source we should go to law school, because it seems like we're going to need more lawyers, so Pam is well positioned. Anyway, we're running out of time. Thanks to all of our panelists; it's been an amazing show. I know that we didn't get to all the questions, but we'll have more upcoming episodes, so thank you all for being on the show. I want to wrap up by thanking our sponsors one more time, our Open Infra members; these are the members that make all of this possible. I also want to mention one of our next shows, on August 24th, where we're going to be talking about the CRA, the Cyber Resilience Act. This speaks directly to the question of what's going on with all the laws coming down the pipe that might impact open source, and this one is a hot topic, a hot potato even.
You should be paying attention to it, and you'll get to learn more about it on our show. If you have an idea for a future episode, you can go to ideas.openinfra.live. Thank you again to all of our guests, and we will see you all back on Open InfraLive on August 24th. Thank you, and have a great time with Open Infra.