Welcome everybody. I'm excited to be joined by these great panelists today. My name is Justin Colannino. I'm a lawyer. I've been an open source lawyer for almost 15 years now, which is a lot to take in for me. I've been at Microsoft for the last six, which is still kind of a surprise for me given the history of Microsoft. And I've been on the board of the OSI for the last two years and just renewed for another two-year term at our last board meeting.

Before I let the panelists introduce themselves, I want to set the stage for what an exciting time it is for open source, and in the AI hype cycle as well. I'm really happy to stand here and say open source has won. Right? It's ubiquitous across software development. The world has really realized that transparency, collaborative improvement, autonomy, and the freedom to use code for any purpose drive innovation and allow everybody to learn from and build upon what's come before us. That's fantastic. The Open Source Definition has guided this for the last 25 years, and it's been realized.

But we heard from Mirko that with great power comes great responsibility. The regulators are here. There's the CRA, focused on security. There's the PLD, focused on product liability. There are all sorts of other things coming down around AI as well, and there's a lot of work to do on those. But the fact that we're talking about this, and that we've been out there talking to regulators and policy makers, means that in some way we've already won. Right? People are listening. They're hearing us, and they're hearing that we need to think about open source and the open source innovation cycle as we think about how we build software that is a critical part of our ecosystem.

But now comes AI. That's what we want to talk about today. There are two other building blocks on how open source and open source development have won. Right?
The first is that even as there have been some closed OpenAI things put out there, we've been seeing developers put open source pre-trained models out on the internet. I just checked: right now there are about 335,000 pre-trained models available to download on Hugging Face. And if you're like me, an open source lawyer, you're thinking: what are my clients internally going to be doing? How do we think about ingesting those and putting them into product? What's the responsible way to do that? At the same time, much like with the CRA and PLD, we're seeing that policy makers are accounting for open source AI, without a definition, in their regulations, in the way that they're regulating AI in particular. So it's an amazing moment. The fact that they're even thinking about this, and the fact that developers are defaulting to open at the beginning of this cycle, is kind of like a dream from 10 years ago, from 15 years ago when I started in this space.

So that brings us to what the panel is going to be about, which is: why does getting the right definition of open source AI matter? What are the challenges in that? And how do we both protect the transparency, collaborative improvement, and autonomy present in open source with that definition, and also get that open innovation cycle going against this regulatory backdrop and focus on AI safety? That's what we're going to be talking about for about the next 25 minutes. With that I'll turn it over to Stef, Sachiko and Astor to introduce themselves and tell you a little bit about their organizations and how they've been looking at this space.

I'll start, since I'm to your right. I'm Stefano Maffulli, I'm the Executive Director of the Open Source Initiative.
The organization has been maintaining the Open Source Definition with the community, for the community, for 25 years, and we built the Open Source Definition on the shoulders of giants: the free software movement that started 40 years ago. I think this weekend they're celebrating the 40 years. So we're part of that continuum from the GNU Manifesto to the Open Source Definition, and that matters for AI too.

Yeah, I was going to say that Astor stole my perspective, but now I realize that I go before you. So hi everyone, I'm Sachiko, I'm the Chair of OpenForum Europe and I'm also a Senior Researcher at the Swedish national research institute, RISE. For this panel I was going to let Astor talk about the policy perspective and what OpenForum Europe does in that space. The added perspective I could bring is this: even though I've been involved in open source policy for 15 years or so, I was going to take an outsider's perspective, since in the day-to-day I now work at the national level with public sector organizations trying to go through digital transformation. They're feeling the pressure to innovate for the good of their citizens. And I can tell you that I find your introduction a bit provocative, because I would like to say that open source has won, and to some extent it has, but the world doesn't actually know that, and that causes problems. Unfortunately, a large part of the people that we are talking to every day are unaware of the fact that our societies depend on open source, and that there's all of this opportunity as well around open source collaboration. So we still have something to do.

Crawl, walk, run, maybe.

I turned it off just so I wasn't hot-miking when I was out there. Yeah, there is a lot to do, so the doing part is what OpenForum Europe tries to do.
My name is Astor, I'm the Executive Director of OpenForum Europe, and we've been around in Brussels for, I think, 21 years now, trying as well as we can to explain the merits and benefits of openness in technology to policy makers at different levels: the member state level, sometimes municipalities, and sometimes EU commissioners. And now we are, like everybody else, also wrestling with the question: okay, but what does open mean for AI? We're often looking to OSI for guidance there, so I think it will be an interesting discussion.

Okay. At the outset there's a question: open source AI, but AI is what, right? In software there's the traditional source code and object code, or source code and then packaged source code if you're in a scripting language. But with AI, it might be helpful to start out by level-setting for the audience. Some people might be technical, some people may be less so, like me, and want to know what makes up an AI system. Stef, you've recently, with OSI, been doing a deep dive into both the technology and the policy around this, and maybe you could set the stage for us a little bit on the technical bit.
It's been quite a journey for me, because I'm not a developer, so diving into AI it was interesting to notice immediately how there are new components, new artifacts in there. I traced it back a little bit; it helped me understand today's situation by looking backwards at history. I noticed that for the AI systems, the machine learning systems that we hear about so much in the press, we need data. There are software systems that train on that data and spit out models and weights and other parameters, and after that applications can be built using these systems. There are various new components that are being built by machines inside machine learning systems, and all these new components share a little bit of similarity with the time when software started to appear on the horizon in computer science, when legal systems started to become confused by source code and binary code and there was no legal framework that could immediately be used and adopted. A policy decision was made at the time to apply copyright to source code. Today we're facing a similar moment, where data built into datasets that go into training a model, and weights, these new artifacts, bring new challenges, and they don't necessarily fit into the legal frameworks that we already have.

So when I was looking at the Open Source Definition and at the 300,000 models on Hugging Face, the immediate challenge to me was: why do these models on Hugging Face use the Apache software license, for example? The Apache software license has a lot of terminology that refers to copyright and talks about source code. And what is the source code of a model, when it's made of training datasets, model weights, parameters, other source code, software tools? All this confusion, I think, made me realize that the Open Source Definition doesn't fit squarely into the
space anymore, and that's why we need to look into this.

Right. And just to play on the Apache license for a second: it defines source code as the preferred form for making modifications to the work. Do you have any thoughts about what that might be in an AI system?

Right, what is the source code? It's the preferred form for making modifications, which brings us to: do we need data? Data goes into training the model, so some intuitively think that data is therefore the source code of the model, and that you need to have it in order to modify the model. But technically, many models can be modified by themselves. You don't need the original data; you can give the model new data and retrain it, fine-tune it. It's very new as a system, so the old paradigm works up to a level, not fully.

Got it. So what I'm hearing is that if you're going to be building an AI system, you need software for running the model, software to train the model that produces weights, which are part of the network, and then you need the underlying data. It might be helpful to get a perspective from Stef, and Ibrahim who's not here, about how we should be thinking about these categories, how they intersect, and how we should build projects around them. Should they be built as a kind of monolith of the data, the weights and the code? Is it enough to have just the data, or just the weights? What's a good way of building projects in this space, if you have a thought?

If I have a thought, it's mainly what I'm seeing happening: there are many projects that are being developed and distributed with or, mostly, without data. If you look at the very latest, like Llama 2 from Meta, or Falcon from others (EleutherAI is an exception), most of these large language models and large models, they
get released with some sort of transparency about how they've been built, but not all of them have the full set: training data, training software, model architecture, and maybe a scientific paper attached that describes the inner workings of the system. We're seeing a variety of approaches, and all of them call themselves open source AI at one level or another. So there is a lot of confusion inside these environments, and that doesn't really help anyone. It doesn't help with policy making, and it doesn't help us with explaining to policymakers in Europe and the United States what they should be doing and what they should be paying attention to. So we're helping the community by driving a large conversation to start giving ourselves a little bit of clarity.

Okay. With that groundwork of what the parts of an AI system are and how they might interact in a project, let's turn a little bit to policy. I was excited to see some discussion in the EU AI Act about open source, and I'm also excited to see the amount of development, like I said, hundreds of thousands of projects developed in the open, embracing openness on some level, maybe not all the way, but embracing it and living it. Astor, what are you seeing at OFE from a policy perspective now, as we are considering regulation in this space?

Yeah. I think most of you probably were here for Mirko's presentation, where he went through some of the impacts of the current state, pre-trilogue negotiations, of the AI Act as it is right now. That's, I would say, where we were and what we kept on seeing for the last year or so when it came to open source in this space in relation to the AI Act, which of course is what we focused on at OpenForum Europe in Brussels. And there the discussions were often very much stuck in: okay, should everyone be able to
access and use these things? Isn't that incredibly dangerous? You've heard the discussion, I bet. But the thing that has, I would say, changed as of the end of June, though we'll see how much it actually changes things, was a statement by President Macron of France at a large tech conference in France, where he said "on croit dans l'open source", we believe in open source. This was a change of position. Things had happened: the company Mistral AI got a large valuation in France, and this did not go unnoticed. There seems to have been a penny dropped, a certain connection made in the Élysée Palace, that open source is actually very much in the interest of the economies that are challengers in the digital space. Open source is a way of scaling fast and innovating, and it is fascinating, like Justin says, that the open source models have the level of sophistication that they have at this point in the market's development. This has been appreciated by the French.

Now there are whispers in Brussels of a 180-degree turn by France on the AI Act, to make sure that open source AI, which they now believe in and love so much, is protected. I haven't seen exactly what this will mean in the negotiations, the French position and so on, but it is definitely a very interesting change of rhetoric around open source: away from the standard talking points of how incredibly dangerous it is if anyone can use large language models or AI models for XYZ, towards "this also has some benefits, this is actually very interesting in terms of innovation and competitiveness for European economies". So that is something that we're following very closely. And if you also look at a statement from Ambassador Verdier, the digital ambassador of France, you can make a connection to the infamous Google memo that you might have seen a few months ago, about there being no moats for the large language models. He essentially made a statement where he said: well, we don't want to build a
regulatory moat for the ones who are currently the market leaders. Some of these companies that are just ahead now might want to pull the ladder up behind them, and we don't necessarily want to pull the ladder up. These things are changing the discussions around open source AI. But here, and really linked to this, is the question of definitions: what do they mean when they say open source AI? Because we don't have a definition. And if it's now starting to get a lot more attention, we need to get this definition right, in one way to make sure that we preserve the benefits that come with open source as the definition is extended to open source AI. But also, if we look at policy making, and we talked about the PLD, the AI Act and the CRA, making sure that we get to a good definition and then stick to it is going to be very important. Because if I sit and communicate to a policy maker that we should protect open source, exempt open source, we can't have any questions about what it is that we're exempting. If companies and stakeholders start to be loose with these definitions and start to not follow these expectations and principles, then the policy maker will over time turn around and say: what the heck is this? So with these definitional questions, I'm suddenly very glad that OSI started this work already last year, because we need to get there very soon. Things are being put in print, in laws.

No pressure.

Yeah, we're supportive. But this comes to the question, because I think here also there's no question: we can't talk about no regulation for AI or large language models. That's off the table; that's not what we're talking about. I think we need to get away from conversations about over-regulation or no regulation versus regulation, and really start looking into the right regulation, guided by the principles of the Open Source Definition and the principles of what open, competitive markets can do for the AI offerings, the LLMs that are out there, for the users,
for consumers, and by extension for citizens. With that kind of focus and that kind of, quote-unquote, good regulation, I think there's a lot of opportunity there to, for once, get an exciting new market where open wins at last.

Sachiko, putting your public sector hat on for a moment, how might a consistent definition of open source AI benefit the public sector?

So I was almost going to interrupt you, Astor, because as you kept talking you stole more and more of my points. I knew already that this was a bad setup, given that we actually work together. But coming last but not least: I think a definition does matter a lot. It matters for us because we've been talking about the importance of having an educational campaign with policymakers and public sector officials, and we cannot communicate effectively if we don't have clear definitions, if we don't know what we're talking about. And I think it's going to be needed, because you say open source has won, and to some extent it has, but new technologies come around, there are new forms, and there are new opportunities for lock-in. Every time there's a new technology, there are going to be actors thinking: how can I lock my customer into a proprietary system? A lot of the logic that we have been trying to explain around open source for the last 15 years or more at OFE applies here as well. And the opportunity is maybe bigger than ever, because we used to say that technology decisions should move from the back room to the boardroom. I think with AI we might finally be there, because we're starting to see more and more audiences that don't just say, oh, it's just technology. It's going to get to higher levels of policymakers, and also business executives, who are going to say: okay, this is not just technology, this is important for
citizens; it has important implications for them and their rights, etc. So there's just more at stake, and the calls for transparency are coming from more and more places. So the opportunity there for open source to come in as a way to mitigate some of the concerns around AI is a big opportunity. At the same time, and I mentioned this just during the break, what I see is that there's a lot about open source that is counterintuitive to non-expert audiences, and this is not necessarily going to play in favor of open source. Non-expert actors will say: AI has a lot of opportunity and a lot of risk, so the way that we engage with this is going to be safer if we go with one supplier. And the term "open", even though they like the transparency aspects, is going to sound dangerous. So we will need to create good narratives and analogies to illustrate the benefits of open source. And notwithstanding the fact that the penny has dropped with Macron and France, I don't think that it's as widespread as that. So, just coming back to the definition: it is important because we need to be able to communicate effectively. And that "transparency but danger" is something that people are going to be hearing about.

Yeah. Well, going back to building the right definitions, and maybe picking up a little bit on that transparency piece: there's this concept we were hearing earlier about how the code and weights are what people are sharing, and then others are able to make further modifications to an AI system. But what's the right way to balance that against transparency in the data? Particularly with data, what's tricky, if anyone here has been working in the open
data realm for some time, is that some data is no problem, but other data, like medical data, for which there are a lot of AI applications, is not very easy to share. So how do we balance that transparency about how the system might operate, and the biases underlying the data, in that definition?

Are you asking me or everybody? I mean, I can start with that. That's probably the crucial point that we have to address, and it's quite clear from the early conversations within the group that is drafting the definition now that it is exactly that: how do we assemble a large amount of data? What kind of policies do we need to write? What kind of approach do we want to stimulate and accept for data to be shareable, while limiting the risks for the people and the groups who share it? Because it's not just a privacy issue. There are also copyright issues that are now challenging the openness, or the advantages of being open. This is the paradox of open; Open Future talks about this quite a bit, it's a group in Europe. By being open, groups that have been developing large language models that are available for everyone to use, share, study and modify without limitations are at a disadvantage, because they have exposed the source of their data, and the source of that data contains copyrighted material. So paradoxically OpenAI, the company, is at an advantage by being secretive.

Yeah, I think it's going to be very interesting for regulation, and for us as software policy people at OpenForum Europe, to really properly deal with, for example, the revision of the GDPR, which is coming up very early, by the way, for a regulation. Just imagine how many questions we will have to be able to answer there: okay, private data, mixed datasets, how do we deal with these things? It's going to be very, very
important to get to. Now that open source has won, we're also going to be expected to be able to answer these questions as an ecosystem. Part of it is that everybody in this room has to solve the collective action problem of looking into this, funding this research, providing the answers, finding the spokespeople. This is another element of collaboration that the open source ecosystem now faces.

I was going to say that some of these things can't be solved at a high level either. You talk to policymakers, and they're talking in principles. When you're working, for example, with public sector officials, they need clarity, and some guidelines that can help them with their specific situation. This is a bit of a challenge for us, because it's not just about highly sensitive data. I'm part of some collaborations, like energy communities, and there it's just a question of: is this even my data? If it is the result of a project where we have collaborated, there are many different actors: there's the consumer, there's the energy supply company, there are some middle people. It's not just sensitive data; it's who owns this data? Is it even up to me to make a decision to give away this data? So some of the work needs to go into more painstaking work looking at specific situations, maybe looking at municipalities, for example, or at certain sectors. I'm not an open data expert, but we will have to work with these people. And that comes back to this notion of different communities coming together. If we're going to
have this kind of definition of open AI, we're going to have to also work with the open data community. It's not the same as the open source community. Are we speaking the same language?

Maybe in the last two minutes, one-minute big picture thoughts from y'all. We've seen how open source has shaped the industry over the last 40 years. What's your hope or expectation for the way open source AI might shape the industry?

My hope is that we really gather these thoughts very quickly and that we empower society, companies and communities, in very wide terms, to stimulate the progress of the discipline, the same way that the Open Source Definition and the Free Software Definition have empowered the progress of computer science and software for 40 years. That's my hope.

I'm hoping that AI will bring open source into the mainstream, so that at the next dinner party I can talk about open source and people won't just say, "let's mingle". So perhaps these concerns will become more the concern of the everyday man, the man on the street.

Yeah, now you were the one taking the thing I was going to say. But I really hope, and I really think there's a good chance of this actually happening because of the sophistication of where the open source models are now, that we will see an AI and LLM market that is competitive, good for users, and not overly concentrated, without all the things that have, let's say, haunted many other digital markets in the last 10 years or so. I really hope that we're not going to see that here.

All right, well, thanks everybody. Could you give a round of applause to our panelists here? OSI is busy crafting this open source AI definition. You should check it out, and it's at opensource.org.

Yes, come see me. Monday, Tuesday and Wednesday I have
an office hours type of thing at the OSI booth at the Open Source Summit. We need help. This is a complicated problem and it's important that we get it right. So thanks everybody for your attention.