So good morning, everybody. I'm Amanda Brock. I'm the CEO of OpenUK, and we are the industry organization for the business of open technology. And this morning, I'm joined by Ben Brooks of Stability AI and Peter Sheehan of GitHub to talk to you about regulation and how AI regulation is going to impact, or is already impacting, open source. So I'm going to pass over to my two fellow panelists to introduce themselves. I'm going to take five minutes to give you a bit of an introduction on my view of the landscape and ask each of them to do the same. And then we're going to spend 20 or 30 minutes ourselves working through what's going on across the globe. Each of us is involved in that in different ways, so we'll have different perspectives for you. And we want to make this quite an interactive session, so we're going to open it up to the audience for questions. So as we're speaking, if you have burning issues that you want to debate and discuss, please be thinking about those. And we should move over to that at, what have we got, probably about 25 minutes before the end of the session, so probably around 11.15. So Peter, would you like to start by introducing yourself?

Sure. Thanks, Amanda. Good to see you again. So my name is Peter Sheehan. As you mentioned, I work with GitHub, and I think about public policy issues that impact software collaboration and developers' interests around the world. Our team is very small, but we would say small but mighty: three folks thinking about the world, and we tend to focus on cybersecurity, AI, and copyright challenges. And Ben, you're here, I think, representing Stability, and I'm excited to get into the contributions to the open ecosystem that your company is making. GitHub has a little bit of a different role in the open AI ecosystem, but it's important that I try to emphasize to policymakers that we provide infrastructure at each level of the AI stack, is the way I tend to explain it to folks. So thinking about PyTorch and TensorFlow as some of the most popular projects on our platform, getting contributions and support from around the world; up the stack to training and inference code and model weights directly from the likes of EleutherAI and other open source collectives and organizations; up to software frameworks like AI Verify from the Singapore government, which helps with assurance connected to policy objectives, but again, a software suite that's on GitHub. So we play an interesting role in the ecosystem, and we certainly have been following regulations around the world to make sure that developers are heard and that ultimately regulation enables open source innovation to continue and to benefit society. Looking forward to jumping into the conversation and details as we move forward. Ben?

Perfect. Hi, everyone. Thanks for turning up; it feels like bright and early. My name is Ben Brooks. I lead public policy for Stability AI. Many of you may be more familiar with our flagship image model, Stable Diffusion, which is a large open source image model and by some estimates accounts for about 80% of all AI generated imagery in one form or another. We're best known for image, but we produce models across many different modalities as well. So we have language models ranging from large fine-tuned models down to small base models, three billion, one billion parameters.
We have models trained for underrepresented languages like Japanese and Spanish. We have audio models, including an audio model that was listed on the TIME Best Inventions list of 2023, which was very exciting and well received by the creative community, as well as some very recent video models for image-to-video and text-to-video generation. And in addition to that, we fund and support a lot of research around the medical and scientific applications of generative AI models as well. Much like Peter and GitHub, we're dealing with a lot both in the AI space and outside of the AI space. We focus really on the perspective of a model developer, but we also think about how we can, in our advocacy, support this huge downstream ecosystem of developers, creators and researchers who are taking our models, fine-tuning them, turning them into useful applications or building new ventures on top of those models. So I think we'll get into a lot of that through the discussion today, but that's my background. And we've been very closely engaged across the US, the EU, Singapore, the UK, Japan and elsewhere over the past year in particular, as they think through what the future of oversight means in AI generally, but what it means specifically when you think about these huge open source ecosystems.

Thanks both. So I mentioned that I'm the CEO of OpenUK. My background is that I was a lawyer for 25 years. My journey in open source started 15 years ago when I joined a company called Canonical, which I'm sure many of you know. I was the first lawyer, I was general counsel, and I ran the legal team globally for five years. And I am very definitely not an AI expert. In fact, I avoided all conversations about AI until 2023, because it's too difficult, right? You really have to know what you're talking about, and you have to talk about it with care. And the reason I've started to talk about it is that it is having a significant impact on open source, and open source is having a significant impact on open AI, as we heard from Jim Zemlin this morning. And in my five minutes to give you a sort of overview, I'm gonna pick up on some of what Jim was saying. So what I do know about after 15 years is open source. To the extent I'm an expert in anything, it's open source software. And if you ask me what it is, I'm gonna tell you that it's software that has an OSI approved license, because that's an easy and quick way to make sure that it complies with the open source definition. Now that compliance is important, and it's important because it means that with open source what you have is a free flow of the software. So the source code is made available. Nobody in this room, nobody in any room I ever speak to, could tell you, well, maybe somebody could just to defy me, but could tell you the 10 points of the open source definition. But the two that I always remember are five and six, because five says that the software can be used by any person and six says for any purpose. And what that generates is a free flow of the code, which means you can rely on open source being open source. It's not gonna be taken back later, at least not that version. And you see an ecosystem where there are no commercial restrictions. Now, we see open evolving in AI in much the same way we saw it evolve as we set up OpenUK four years ago. And OpenUK, I mentioned, is the industry body for open technology, and I rattled off some of the opens that we look into. Because what we realized four years ago is that open source software alone isn't the full picture.
It's not the full landscape. And to look at software in, what year is this, 2019-20 when we set up, you had to think about data, you had to think about hardware, and increasingly you had to think about standards, and about opening all of those up. And what we have also seen, in the last few years in particular, is friction around open source. And we see friction when we look at things like the SSPL licenses, which don't meet the open source definition, which sometimes people refer to, in my view wrongly, as open source, but which have been accused of being "fauxpen" or fake open source licenses. And sometimes the community feels aggrieved about this and has described it as open washing. And the sentiment behind that open washing is very much this sense that the open source community's work has been taken and used and then not respected, because there's been a shift away from it. And of course that conversation is one that we have to understand, and perhaps we haven't always seen it being fully understood in the AI landscape as we talk about AI and the impact that open has. And what we have to do is think about the context of opening up AI because of legislation, because of policy, because of regulation. And we see, when we look across the landscape now, regulators being concerned not just about AI but also about open source generally from a security perspective, because they're concerned about their citizens. And we've got to a position where today open source is a dependency in 96% of software, and in the stacks that you look at, 76% of it is open source. Now what that means is that governments are having to learn about it and having to understand it. Some, like the UK, which had the world's first open source first policy in 2011, think they've been talking about it or understanding it for a long time. And when we look across our ecosystem, we see this definition that requires an OSI license used fairly frequently. But then when we fast forward to this year and we start to talk about AI, clearly we have an acceleration. So suddenly GPT-4 in February; the leak of Llama, the first Llama, not with an open source license attached to it, but a leak from academic researchers into the marketplace. And we start to hear about open source AI, except that AI generally is not on what we would traditionally have considered an open source software license, an OSI approved license or a license that meets the open source definition. And often what it has is a restriction that cuts across points five and six, that stops it being used by any person for any purpose. Now that's not necessarily a bad thing. And OpenUK supported Llama 2's release in the summer, knowing what the license was gonna be, having seen it in advance. And if you look at the Meta website, you'll see it was very carefully described until launch as open innovation. And we took the view that opening it up was a good thing, because with openness comes transparency, and that transparency enables control, it enables trust, and it enables competition. So we think that that's the right route to go down. And what we've seen is that even an organization with a big focus on open source software is willing to accept shades of openness. But we need to understand what it means. And when we look at regulation like the EU's AI Act, which we can't look at yet because I think they're still drafting it, if I'm right.
What we see is principles discussed last week, and we see the EU AI Act being pushed over the line by the Spanish presidency in Europe, because they wanted it finished by the end of the year, before they're actually ready to share the text. And the rumor mill abounds, but what it's saying is that there will be a carve out around open source, and they use "free and open hyphen source", which I hate, but it is what it is, free and open source software, as the thing that's carved out. Now that gets us back to what that free and open source software is. So if it's saying you don't have to comply with legislation if your software is free and open source, what does that mean? Do they really mean free and open source, or do they mean all these shades of open? And the concern that comes from the open source communities around this, and actually ought to be coming from regulators, is about deeply understanding this. We saw the OECD produce a new definition of AI, really, in my view, to facilitate the EU's AI Act, but we don't have the same for open. We don't have it for open source. Open source is gonna be a generic term that covers all the shades of open. And the reason it's so concerning is that, as somebody who spent all that time being a lawyer, you wanna be exact, right, when you've got risk and you've got liability, and the risks and the liabilities are potentially different. It's a bit like saying that vehicles are all gonna have the same regulation, so whether you're the driver of a truck or a car, or you're riding a bicycle, you're gonna be subject to exactly the same levels of regulation. They're all vehicles, but you would never think that you were gonna apply the same laws, the same regulations, the same rules to all of those. And it's the same with the different shades of openness in AI, in my view. So I'm gonna pass you on to Ben first of all and then over to Peter, and we're gonna spend sort of 10 minutes getting the landscape from them before we start to look at what the countries are doing. Ben.

So look, I've spent a lot of time on the Hill talking with staffers and senators and congressmen and women about AI generally and their anxieties around what we can now call widely available AI. So it almost doesn't matter if it's an OSI definition license or a kind of Llama 2 license. What government cares about is an AI model that is released to the public and where the weights are easily obtainable or trivially obtainable. From our perspective, that's a good thing. You can inspect the model, you can customize it, you can correct for certain biases and certain vulnerabilities, and you lower the barriers to entry. You help someone develop an application or start a new business without having to spend 10 or 50 or 100 million dollars pre-training this model from scratch. From the government perspective, from the perspective of authorities and regulators, it's a problem, because you can fine-tune bad behavior back into that model, and you can use that model for various kinds of malicious and nefarious purposes. I think what we're seeing, in general, my view is, there is a bit of a brand problem with open source among legislators in particular. When they think of open source, they don't think of Linux on our nuclear submarines and in our air traffic control systems and in our data centers. They think of the worst excesses of crypto. They think of Napster.
They think of all of these other things, many of which aren't open source in the relevant sense at all. But they have this view that it's uncontrolled and uncontrollable. And you're starting to see some of that percolate through into various kinds of legislative and regulatory instruments. It has colored a little bit how the White House approached its recent executive order on AI, which we can maybe talk about in a minute. It's colored a lot of the surrounding conversation around the EU's AI Act. It's certainly colored a lot of the initiatives in the UK around the UK government's AI safety summit. But when you drill into it, there's a lot of imprecision between different stakeholders and government about what exactly motivates their concerns. Is it a threat of catastrophic misuse, in the sense that we see on Twitter and in some of the more outlandish theories? Is it more about everyday misuse, the production of illegal content, political disinformation? Or is it a concern with product safety, a concern that if you turn this into a chatbot, the chatbot is going to interact with customers and users in a way that meets minimum expectations for safety and reliability? And the reality is that between these different jurisdictions, there's a lot of imprecision about what exactly government cares about and how exactly they should address it.

I think that's right. And I think there's a couple of things in there that I've been acutely aware of. The first is the term open source and people shouting loudly about it who, if you ask them to define it, have no idea what they're talking about, and who are starting to use it as a catchall for bad actors and all sorts of bad things they perceive could happen in AI, rather than understanding it. And one of the concerns there is that when you look at a lot of the consultations, there's been very little representation of the true open source communities, of the foundations that hold our code, of the organizations that represent us. And that's one of the shifts that we need to see to build that understanding. But I think there is also a balance that makes the concern about understanding what it is greater. And that's the competition and free market aspects, the innovation and enabling innovation, and being able to open up to live competition, which is where we see a problem with those commercial restrictions or other restrictions on users. So, Peter.

So many threads to pull on; this is very important. I think there is the definitional point that you're touching on, and Ben, the context you're providing is really helpful: "widely available model weights" is kind of the parlance, and the US executive order is largely punting that to the community, which I think is a good thing. I don't wanna see policymakers creating distortionary licensing decisions for the community. It seems better off to really ground that conversation in the OSI process, which is ongoing, to fix a definition that we might understand as open source in the AI context. But perhaps for most of this conversation we can embrace the broader framing of openly available, widely available. But I think that's an important point to pull on. You know, in the past year there has been a huge emphasis and attention in policy making circles on AI, largely in response to the increasing visibility in everybody's daily life of AI systems.
But it's helpful to ground those conversations in work that has been ongoing in the policy community for a very long time, thinking about the European Commission's introduction of the AI Act in 2021, really focusing on high risk AI systems that would be used for decision making, as opposed to generating content; decision making in very consequential decisions like access to education, employment, and other opportunities, embedded in critical infrastructure and other kinds of use cases. And that product liability frame that the EU has approached this with, or consequential decision making more broadly, as we've seen taken up in Canada and in some bills in the US and California, is important to keep in mind as a North Star. And it's something that GitHub tends to emphasize when we're talking to policymakers. There's a lot of interest, amid this hype and various kinds of concerns, in looking up the value chain and shaping how open source collaboration, one, just works, and two, the decisions to design and release models in particular, when really a lot of the focus, and where people's fundamental rights are impacted and where regulation is first and foremost needed, is at that application and use case context. And so trying to level set with folks is a North Star for us, you know, towards precision. We need precision, Ben; I appreciate you emphasizing that. And, you know, when policymakers tend to think about open in the AI context, it really does shift towards this misuse conversation, when there are all these benefits that we could list off. And I think it's very important to underline the ones that you're talking about in terms of transparency, in terms of competition, in terms of expanded access and capabilities. So you think about, you know, folks who are not working for leading labs being able to collaborate on AI research and build capacity, whether that's in a grad school context, working with EleutherAI or other open source groups, or ultimately in government, where we want capacity to understand the affordances of the technology. It's also very important for scientific reproducibility and for safety research, right? So there are folks in this kind of Twitter debate, which we'll probably get into later, who are hopefully acknowledging that, you know, having direct access to model weights in Llama 2 has advanced both the security and safety research on AI, to understand where exploits could be used in prompt injection that affect not only open but closed models. And so we don't want to embrace a security through obscurity mindset and relitigate, you know, lessons of the past. And so I think it's important to really underline the benefits. But to the precision point, maybe if I'll indulge another minute, I try to explain this to policymakers in terms of two funnels. There's a lot of interest in proliferation and these analogies to weapons, but really we need to think about the fact that we have existing structures for disseminating software and regulating its use. And so, concretely, folks are very concerned about model weights being fine-tuned and abused, but often they are shipped with protections in place; Llama 2 is a good example. And the first funnel to think about is really expertise. So if we're worried about misuse, you know, the classic examples are user interfaces and applications that people can download to their Android phone and use and really abuse.
You need to be an AI researcher to fine-tune something. You need access to compute, which might not be very much, but there are barriers to undoing these protections. And there's a second funnel around distribution. So if you were to then choose to redistribute that, to enable other folks to do malicious activity, well, one, to the open-ish license debate, you'd be violating the license, if you're thinking about a RAIL license context, by adjusting the weights and undoing that protection. So that could be taken down as a copyright violation. Or, say it is an open-source license: distributing it on the likes of GitHub or other platforms would violate our terms of service, where it's not a dual-use thing but really malicious intent. And we have an established policy of taking down things for, say, terrorist or sexual content. So we certainly don't support the distribution of that, and it's helpful to understand that law enforcement has been looking at and cracking down on the abuses of open-source software over time. Silk Road is a useful analogy: there are consequences for folks having access to technologies, and there is an ability to do harm, but this funnel of access and the ability to actually do that harm is very limited and can be targeted. And we need to evaluate that narrower piece, as opposed to these hand-waving conversations about proliferation, it's gone forever, everybody can use it in terrible ways, and balance the precise risk with the benefits, so that we can get to good policy, whether that be in DC, Brussels or elsewhere.

Yep, I totally agree with that. Could I pull on one thing there? Exactly to your point about applications versus models, this distinction is constantly conflated in media and in government. Before 2023, the conversations around AI regulation were predominantly concerned with applications and deployment. It was automated decision making in a labor or an insurance or a financial context, or it was algorithmic bias in social media content recommendation. Those were the conversations we were all having pre-2023. And then 2023 comes along, and to Peter's point, there is this interest in going up the supply chain and starting to control for these different choke points, either in the base models or in the fine-tuned models or in the APIs or in everything that comes before an application is actually deployed. And it's challenging because there is this tendency to look for silver bullets. So if you're a closed source AI provider, the silver bullet is you just turn off the API. You turn off the tap. And so a lot of the conversation around AI safety and AI regulation this year has been through that lens of gatekept systems and gatekept models. The reality is, as we do with other kinds of open source software and other kinds of open technology, we have to take a more thoughtful approach to defense in depth and to cumulative risk. So take examples from the software world. Software is presumptively open and unrestricted, with a couple of exceptions. And those couple of exceptions are things like zero day exploits, which end up on ITAR and the Commerce Control List and are export controlled by the US government. And there's a raging ideological debate that we can all have about whether that's appropriate or not, but the reality is that's the legal framework as it stands today.
When we regulate things like zero day exploits or high precision GPS versus low precision GPS, or we stop people buying switchblades but allow them to buy kitchen knives, we fundamentally go through this process of asking: what is the marginal risk of this technology? Can we mitigate that risk to an acceptable level across the supply chain? And is the benefit of unrestricted open access greater than that residual risk? We go through those three criteria in one way or another, no matter what technology we're talking about. And I think in AI in 2023, that's all gone out the window. There's a tendency to say, how can we turn this off, how can we stop it at the source, rather than thinking more thoughtfully about how we measure risk, how we mitigate that risk, and how we determine if it's acceptable or not. And that's the conversation we need to have in 2024 and onwards. And that's the world we're going to with open source models: six months ago there were a handful of powerful open source base models, and in the six months since there's been this explosion in downstream developers fine-tuning and deploying these models in all sorts of useful ways, helping to make AI safer, helping to make it more useful. And we need to support that ecosystem, but do so in a way where we have this thoughtful approach to risk management and risk mitigation.

I think that's absolutely right, and I vehemently agree with what both speakers are saying. And if we take that back to its simplest and most practical form: as we're talking to those who are creating regulation, we're not specifically talking about open source, we're talking about the technology full stop. And we're talking about building a deeper understanding for them of the impact and implications of open, so that they can accurately and clearly assess risk. And obviously with something evolving at the pace this is evolving at, that is difficult to do. So back in my home country, the UK, we are very actively engaged now with regulators, from the Competition and Markets Authority, from the Home Office, from DSIT, which is the department which would be creating regulation, the Office for AI, except we're not creating regulation. And what has been decided in the UK is that we're gonna pause, and we're gonna have an optional opt-in code of conduct in the fairly near future, and regulation will come down the line later, when we're more able to do that in an informed way. Now, until the last few weeks it would have been unusual for me to be saying I agree with Rishi Sunak multiple times in a presentation. And I have to say that I agree, because when we're looking at what Ben's talking about, we're looking at a situation where AI is another technical tool. Whether it's open or whether it's closed, it's another technical tool, and that tool has to be used in compliance with laws. And what I've found in talking to some of those departments, talking to some of our politicians, is that they're visibly surprised that the open source community expects to comply with law. They're visibly surprised that we know that laws trump licensing, because there's a lot of ignorance and a huge amount of misunderstanding, or lack of understanding, around open source. And one of the dangers is that so many people who don't understand it are jumping in because they want to have an opinion, they want to be relevant. I don't think it's a massive FUD campaign. I don't think it's something that's been done deliberately to cause confusion.
There is just a desire to be relevant and in the heart of a conversation that you don't really have any understanding of, or any sort of real weight or impact or bearing in. So that's something that's really important. And since the AI summit in the UK, we've seen a shift, and we've seen really good engagement with the regulators, who want to get to that understanding, to the extent that they're now working with us to do a series of workshops and roundtables. They're looking at engaging with the open source community at our conference and doing real consultations on how open source works and what it's about. So the UK isn't going to have legislation for a while, but I've already mentioned that the EU does, or is about to. And I'm going to pass this back to you, Ben, to talk a little bit about the EU and the US, and then Singapore and the Netherlands. What we'd planned to do was to break these down country by country, and I think we should stick to that even though we're running a little bit late. And I'm going to leave you two to start: shall we start with the EU and then go to the US, or do you want to do US first? I'm going to keep you moving along. You do either.

We can deal with it all relatively quickly, I think. There are some really interesting differences between how the EU and the US are approaching this issue. If you look at the executive order, so for folks who aren't familiar, the executive order is this presidential regulation, essentially, that was issued at the beginning of November. And there's a whole bunch of stuff in there. It is the longest executive order in US history, to give you some sense of how seriously the administration takes this topic. And it's substantive. And it's substantive, that's right. It's not waffle in the way many executive orders are. But there are two really interesting things in there. One is that they're launching a nine month consultation on widely available models. And this is, in my view, the first time any major jurisdiction globally has had a structured conversation about what open source and widely available mean in practice, and whether, if at all, there needs to be some kind of regulatory framework around that. So it's a good thing that we're having that consultation. It will run for nine months, and we certainly encourage everyone to get involved; anyone can make submissions to it. And I think Peter will be at the announcement later this week with the NTIA, one of the lead agencies. The second thing that's really interesting with the executive order is that for the first time we've seen this regulated threshold: a threshold above which models will have certain presumptive legal requirements. Now, the requirements are relatively straightforward in the scheme of things. You will provide some information about who has trained the model, who owns the model weights. You'll have to report the findings of any evaluations that you've performed on the model. But the threshold is so high that really, at the moment, it's only going to be a handful of the largest tech companies that hit it: 10 to the 26 floating point operations in the training of the model. When I look at that, I see... Which is larger than any model. Which is larger than any model. Except maybe Gemini Ultra, maybe. Oh, cool, maybe they cross the threshold. I'd heard they'd potentially crossed it. We can get into the lower threshold that gets crossed, later.
But I see that threshold not as an attempt to say models above 10 to the 26 are unsafe. It is saying: above 10 to the 26, we're not quite sure what's out there. Does everybody know what 10 to the 26 means? So basically it's a measure of the amount of computing power that you use in training the model. And 10 to the 26 flops is a huge amount of training, more than we've seen in any model that exists today. But there will be someone, most likely Google, who gets there in the very near future. And if they hit that threshold, they'll have to comply with some of these requirements. That threshold is saying: look, we don't know what's out there. We don't know how scaling laws will apply beyond that threshold. We don't know if capabilities will change, if there'll be new emergent capabilities that we haven't seen before. So when we hit that threshold, please tell us what's out there. That's essentially the spirit of the executive order. And viewed in that way, it's fine. I think the challenge is if that line in the sand shifts over time. There are plenty of people who think that every open source model out there today is unsafe or should be regulated, who think that Llama 2 or Mistral 7B or Stable Diffusion or any of these other open models ought to have presumptive legal restrictions on them. I think that's not the view the administration takes. But if that were to become the prevailing view, and if that threshold were to shift lower over time, or if the threshold doesn't move and we get to that threshold and realize that actually nothing has really changed, but they don't update and modernize it, you will start to see a pretty significant impact on the open ecosystem. It will make it harder for folks to take one of those models, fine-tune it, and deploy it in useful applications. And if they do so, they'll have a very large compliance burden. And that's not just something that will potentially impact AI.

And this is where organizations like mine have been very concerned to make sure that the open source communities are also heard, as well as the AI communities. That will have a knock-on impact across the ecosystem. So us engaging, and us being very clear and responsible in the way that we deal with these regulators, really matters at this stage.

Absolutely. And again, hopefully after this nine month consultation, government will have clearer frameworks and clearer criteria for determining what that line in the sand should be, if there is one at all. At the moment, no government has really thought through what that framework looks like. So 10 to the 26 is basically a stab in the dark. It is an arbitrary heuristic supported by a little bit of research. And what we need to see is that threshold being updated and modernized over time.

Yeah, I think the two pieces that you pulled on for the executive order are really important, and I'd underline those. Please do have your voices heard in this forthcoming NTIA consultation on widely available model weights. Yeah, there's a launch event tomorrow in DC. I anticipate it actually won't open until the beginning of the year, and the consultation itself might run a little bit shorter, because that deadline of nine months is when President Biden needs his report from folks on the consultation and their advice. There are three pieces to this, which you might have already hit a bit, but they're worth underlining:
the risks associated with open or widely available model weights, and particularly fine-tuning capabilities; the benefits to the ecosystem; and possible governance measures, whether that's self-regulation, regulation, or what have you. So please do have your voices heard. In particular, I think it's important to offer research and some concrete evidence to what is often a philosophical debate without much empirics, and to give examples of how these models are being deployed in practice. The funny thing is, you can go on Hugging Face or GitHub and you can see the huge downstream activity in models, but it's very hard for government to visualize what that actually means in practice, to put a face to it. They don't realize that it's the average Joe or Jane citizen in their basement somewhere in small town America playing around with the tech, helping to make it safer, helping to make it more useful. And so examples, and teasing out those examples for folks in government, are gonna be really important over the next six to 12 months. Absolutely. And we probably do have to keep moving, but I wanna give a bit of praise to the EO more broadly. It is, as you said, the largest or longest executive order in US history, and it is very substantive. It's taking a whole of government, horizontal approach. It's not quite as mandate oriented as the AI Act, which we'll get into, but it's very helpful in underlining that regulators in their jurisdiction need to be thinking about the use of AI responsibly. So it's Housing and Urban Development triaging resources in that case, or thinking about deploying large language models to detect and repair vulnerabilities in critical software code at scale within the Department of Homeland Security and the Department of Defense. There are a lot of good efforts, both in thinking about the defensive use of the technology, but also in thinking about the sectoral approach.

And as we get to that, as we start to think about it: we've got the executive order on AI, which is clearly a very long order, it's very dense, and there's a consultation going on. I will make sure that the Linux Foundation shares something with you in the next few weeks, as well as OpenUK sharing it, about that and how you can respond to it and any key points that we think should be raised. But we have sort of two strands of legislation. We have the existing legislation that AI needs to comply with as a tool, and we're already looking at a risk-based environment. We're looking at a principles-based approach where the AI has to comply with laws and the environment in which it's used in any event. So the user needs to think about that and how they use it. That's obviously very different in a regulated sector like healthcare or finance, or in consumer products, from just day-to-day use for businesses. So we've got that, and then we've got this specific AI-focused regulation that we're talking about here now. And to go on to the EU. Now, the EU is in an interesting position, so I'm probably gonna be a little bit controversial here, and I'm gonna say that I don't like the approach they've taken. I think it's extremely detailed, extremely paternalistic. There is a risk with what they've done that only maybe eight or 10 companies will really be able to comply with it, and what that will do is foreclose the market.
It will mean, in the same way we've had these concerns about the Cyber Resilience Act, that there are gonna be restrictions around open source that we don't really want to apply, and I don't really think they've thought through or understand how software is developed and how this is gonna work. That aside, I'm sure you don't agree with that view, or not fully, so let's hear what you two think about what the EU is doing, to the extent we know, because we haven't seen the wording.

Yeah, I think that's important to emphasize. There's a political agreement on the AI Act that was reached very, very late on Friday after a marathon trilogue, which was a spectacle. And a trilogue is a meeting between the different institutions of the EU to get the final legislation over the line. Yeah, the final negotiation of three parties, thank you. And it was interesting to see the lead negotiators hold what was essentially a midnight press conference running through what they were most excited about in the file, and it was helpful to see the presidency, led by the Spanish, leading with excitement for an open source exemption and for protecting research, development and innovation, within that broader framework of making sure that they're protecting fundamental rights by requiring a lot of oversight on systems deployed in high risk use cases. There's so much to the file that we could spend a very long time talking through, I think, to touch on how it might affect developers. Really, the North Star remains in the final text, or will remain; we have the political agreement, and there are going to be, as you're alluding, a number of technical meetings to iron out slight shifts to the language, but essentially we know what the North Star is: that systems deployed in particular use cases are going to be closely regulated. And that regulation basically relies on harmonized standards that have not yet gone into force; until they are developed, you could think of it as the best practice in the industry for cybersecurity, data governance, documentation, use guidance, transparency about what decisions went into making the model, and a number of other requirements. But those obligations are really going to fall on businesses that are offering systems in critical infrastructure, or software helping folks triage CVs and other kinds of evaluations. When it comes to up the value chain, up the supply chain that we're talking about, and how it might affect developers: there was a lot of controversy over how foundation models, as opposed to systems, might finally be governed within the context of the law, and ultimately what was settled on was a tiered approach, which GitHub is quite happy to see. We worked with a number of open culture and open source organizations to offer a paper in the summer with a number of recommendations on how the AI Act final negotiation could really ensure the law worked for open source, and ultimately we were arguing for a tiered approach that would take that risk as the North Star, right? So thinking about foundation models above 10 to the 25th, not 10 to the 26th, flops, because the Europeans wanted it to apply to systems that exist today. They wanted to be very affirmative that GPT-4 would be in scope, and that everything else that's larger in the future would be given the presumption of falling into a higher category that requires scrutiny. Now, there is some flexibility there.
And 10 to the 25th, just to be clear, is an order of magnitude lower than the executive order's threshold, and we will hit that very soon. Not only that, but you'll have people fine-tuning models above that threshold too. Yeah, it's helpful to actually call out the fine-tuning piece, so again, we need to see the text. We've seen versions of the text. My understanding is it's not going to fall on folks who are taking a model that has been released and then fine-tuning it. Which would be great, if that's the case. One of the biggest controversies in these model provisions that Peter's talking about is that earlier text would treat an everyday researcher operating out of a university, tuning Llama 2 at 70 billion parameters, the same as if they were OpenAI, in terms of documentation, record keeping, risk mitigation, et cetera, et cetera, et cetera. And Europe's competitive advantage, the UK's as well, in this space is the open-source community. You don't have the venture capital dollars, and in many cases you don't have the compute. It is that open-source ecosystem that helps to make this safer and better, and that's why so much of this foundational research has come out of the open-source and academic European community. And so to see some of these questions starting to be considered in detail has been really promising. I think the proof will be in the pudding in terms of the final text. But what we want to see is that those thousands, tens of thousands of downstream developers are not treated the same as a Stability AI, a Meta, an OpenAI, an Anthropic. That's going to be really important to Europe's future here. And the early indication is that they did that. They got it right.

And I think we need to see that clearly, because we have these concerns with the Cyber Resilience Act as well, where, whilst they might try to carve open-source developers out, as soon as you start to make your income out of open source and services provided around it, you suddenly attach a responsibility and a liability. So it's making sure that it's not only lip service and an intention, but a reality, and that needs the engagement that I think we're seeing now. And they've said no to most of that. I'm happy to talk more about the CRA; I find there's lots to say there. Oh, there's lots to say about the CRA. So I want to take just a few minutes. I've mentioned the UK, and we've talked about the executive order. The Biden administration clearly are being a bit of an exemplar here, and they have been in security as well. I'm hopeful that the UK will continue to do what it's been doing, which is to liaise heavily internationally and to pick up particularly on what the US is doing. What about other places? I hear Singapore and the Netherlands mentioned. Any quick thoughts on those before we move on?

I think you're seeing, if you look at the Falcon models that came out of the UAE, I mean, TII, the organization behind Falcon, is funded by the Abu Dhabi government. You have Singapore putting $50 million into, hopefully into open source, but into local, regional Southeast Asian models that better represent culture and language from those communities. The Netherlands, again, investing. And France, the Macron government, investing in open source development. Because they understand, again, that this is their competitive advantage: how to make models that are not just better tailored for a particular context, but are available for downstream developers and downstream businesses to build on.
And if you do so, and you make that available as a public resource, you lower barriers to entry and you kick off the same virtuous cycle that we've seen in open source software for decades now. So that's really exciting. I will also flag on the Biden point quickly: there are about 150 bills coming out of Congress that talk about AI in some way, and about a dozen of those are fairly serious. So keep an eye on that as well.

Yeah, so how can the community engage? How do we see this going forwards? I know Ruth, who's sitting at the back of the room and is vaguely waving at us, is going to be hosting something on what open source means this afternoon. Yep, this afternoon, which I'm definitely going to. Where does the community respond? We've also seen an announcement this week, I think, or last week, on the AI Alliance. Where are we seeing the community engagement? How can the community respond beyond just responding to the Biden executive order?

At a very high level, I think it's really important that the community promotes this narrative that it's "and", not "or". There is a really troubling perception in media and government that it is open or closed, and that that is the fork in the road. The reality is it's open and closed. We will continue to have great closed source labs doing amazing things at the edge of the envelope. But we also need to support the long tail of open source development downstream, and the release of powerful open source base models that serve as the foundation for that ecosystem. And so when you look at Twitter these days, there's a bit of an ideological clash between open or closed, and there's an attempt to present it as black or white, one or the other. I think, to the extent possible, and in whatever forums and whatever communities you're engaged with, just emphasizing diversity, and helping folks in power and folks in the media to understand the importance of diversity as well, is going to be really important to get us out of this kind of existential clash of civilizations narrative that's taken hold over the past year.

I think that that's right. And I think what we've seen in 2023 has been a bit of a panic. Maybe in some ways AI has always been on the horizon, and suddenly it's a reality. Our UK digital minister described science fiction as science reality; that's a line I heard recently, which I think is a really good one. And I think that's what we're now trying to grapple with. Peter, any sort of wrap up thoughts on this?

The "science fiction is science reality" line leaves me a little, well, puzzled, a little worried. But I do, yeah, no, I really like your comments here, Ben. And I would say that the executive order's consultation is a useful kind of Schelling point for everybody to really focus, come together, and have your voices heard. In the EU context, there have been periodic files that have come through and really affected open source, and they've served as galvanizing moments to pull the community together and offer contributions that have shifted legislation for the better. This is an opportunity to have that conversation at the earliest stage of the regulatory conversation on open AI, by responding to this Biden EO. So I see that as priority number one for folks. Generally, I do think it's helpful to offer concrete cases of applications where you have used an open model and adapted it downstream to your use cases, or simply pushed it into production in an application.
More research, generally, beyond anecdotes is going to be helpful. This might not be the exact audience, but I try to spread that to researchers as well. That spans everything from fundamental research about how we value the contribution of open source, open science, and AI models in particular to current accounts and GDP around the world, to thinking about just how costs are driven down, right? So it's easy for us to make the case in open source software, where your marginal cost is the cost of copying. There are clearly barriers to folks retraining from scratch when it comes to large models; we need to acknowledge that. But per-token analysis of how much cheaper it is to run something that you can pull and fine-tune versus closed models is, I think, very useful, particularly when we think about that long tail of applications and folks around the world who don't have the privilege of being in California or going to work in San Francisco, but who are really thinking about how these tools can catalyze international development, whether it be the Digital Public Goods Alliance or the digital public infrastructure initiative that the Indian government is pushing at the G20. There are so many opportunities to take this and do a lot of good in the world. And so demonstrating that, for media, for researchers, for everybody, is important beyond the particular opportunity afforded by the Biden administration, but I really would emphasize seizing that first and foremost.

I think I'll just pick up and do a free plug here. So State of Open Con is on the 6th and 7th of February in London, and we have four regulators already who are going to have two-hour slots in a room that takes 40 people. They'll each work in different ways, but it won't be about bringing deep experts, because we're doing workshops and roundtables with them on that. What that will be is an opportunity for community voices to be heard. So it will be for the audience, for the delegates attending the conference, and each of the regulators will run those in different ways: one's talking about doing a sort of speed dating thing to do short one-to-ones, another is looking at trying to get consensus in a room. So it will be an interesting exercise. And with that, I think what we'll do is open up to the floor and see where you are on questions, any burning issues, right? Let me just see. I've got a few, so I see one, two, three. Okay, gentleman here. Could you, I don't know if you need a mic. You probably need a mic for the people listening. I'm gonna give you mine. If you could introduce yourself.

Thank you. Fascinating conversation. Good morning to everyone. This is Aftab Faruqiay, a very small startup here in San Francisco, a semiconductor domain-specific startup. Two questions. I forgot the unit that's attached to 10 to the 26. What is it? Floating point operations in training, so it's the amount of compute required to train the model. How does that translate to, like, compute power? Because that to me is still fuzzy; I understand watts. Yeah, yeah, yeah. It's actually quite a convoluted series of equations, and it takes into account the model architecture as well. And there is a kind of number-of-GPUs versus time-on-GPU relationship there as well. So if you've got a huge supercompute cluster, you can obviously do the same amount of training more quickly than if you've only got a few GPUs.
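As a rough illustration of the unit being discussed here: a common rule of thumb from the scaling-law literature estimates total training compute as approximately 6 × parameters × training tokens. The sketch below applies that approximation to a hypothetical model and compares the result against the two thresholds mentioned in this discussion; it also shows the GPUs-versus-time relationship Ben describes. All model figures are illustrative, not any real model's numbers.

```python
# Back-of-the-envelope training-compute estimate, using the common
# scaling-law rule of thumb: FLOPs ~= 6 * parameters * training tokens.
# All numbers below are hypothetical, for illustration only.

def training_flops(parameters: float, tokens: float) -> float:
    """Rough total training compute; ignores architecture-specific detail."""
    return 6 * parameters * tokens

def flops_from_hardware(gpus: int, peak_flops_per_gpu: float,
                        utilization: float, seconds: float) -> float:
    """The GPUs-versus-time relationship: same total compute can come from
    more GPUs for less time, or fewer GPUs for longer."""
    return gpus * peak_flops_per_gpu * utilization * seconds

EU_AI_ACT_TIER = 1e25    # presumption tier discussed for the EU AI Act
US_EO_THRESHOLD = 1e26   # reporting threshold in the US executive order

# A hypothetical 70B-parameter model trained on 2 trillion tokens:
flops = training_flops(70e9, 2e12)
print(f"estimated training compute: {flops:.1e} FLOPs")       # ~8.4e+23
print("crosses EU tier:", flops >= EU_AI_ACT_TIER)            # False
print("crosses US EO threshold:", flops >= US_EO_THRESHOLD)   # False
```

On this rough arithmetic, even that hypothetical large open model sits more than two orders of magnitude below the executive order's 10 to the 26 line, which is consistent with Ben's point that no model that exists today crosses it.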
But in terms of calculating the flops that have gone into a particular model, that's quite a complex exercise, is my understanding anyway. Okay, so I will look into it myself. The second question I have is: we've talked about the models, but most recently OpenAI announced that if you use their enterprise-class models, they will protect users from any copyright infringement claims. Where are we there? Because I'm actually looking into developing some domain-specific models, which, Ben, I will come talk to you about. Yeah, yeah. So we're kind of at the center of a bit of a storm around copyright, along with every other model developer. Look, the reality is that there is litigation around the question of: is it fair use to use existing web-crawled content for training AI models? That question is still in the procedural phases of litigation, but what we've certainly emphasized, and what I think a lot of policymakers understand, is that fair use has given rise to a culture of open, permissive learning, open, permissive training, in the US that has helped to support innovation here. Other jurisdictions are trying to emulate these policies and trying to emulate these frameworks. So I think at a high level, there's an acknowledgement that intellectual property settings here in the US have been an important part of bringing AI to this point, and it's important that we address emerging concerns while preserving that fundamental core cultural tenet. In terms of the specific indemnities, yes, certain large labs have been offering indemnities around models. You'll see that a lot with the larger labs that can absorb a lot of that cost and a lot of that risk. But it very much depends on who's providing the model, and you'll see it in a closed source context because there's a contractual relationship. In an open source context, it's obviously harder to have some of those kinds of frameworks or arrangements in place.

So I'm gonna jump in, as I think there's two pieces this involves as well. One piece is around the ability of large companies to offer these kinds of indemnities. And we saw what I would call the Hokey-Cokey, but I think Americans call it the Hokey-Pokey, that Sam Altman was playing with OpenAI recently. And that's the risk that you get when you have something that is entirely closed, because customers were panicking: what were they gonna do? If that company was gonna implode overnight, which it looked like it might, how were they gonna manage with that AI not being opened up? And I think it's a really good real-life example that we can give of the benefits of it being open. From a liability perspective, as somebody who spent 25 years doing contracts over, you know, emerging and new technologies, I would say this is just another technology, right? So what we're looking at is another situation where we're dealing with the same kinds of contracts, with slightly different provisions around that new technology. And what happens there is that it's hard to get insurance initially in these things, as you push the boundaries. So if you're asked to sign up to a contract which has liability provisions that are very high, small players can't do it, and it becomes a barrier to entry. And there's a way for large companies to game the system by encouraging legislation which others can't comply with. They call it legislative capture, where only certain big companies will be able to do that.
And also the small competitors aren't able to buy insurance, and they're not able to stand behind the risk if they self-insure. So it narrows the market down to only those big companies. And I think that's gonna be a really important issue. And I know that the regulators are starting to look at that and what it means. And they're also going back to the definitions of open source and what traditional open source software means. You know, some would say that's the software of yesterday. I don't think so. I think it's gonna continue to be the software of our today and often of our future, but that we'll have a new definition around it for AI. So I'm gonna come over to this lady. We don't have too much time, and then I know we've got one more at the back.

Hi, thanks. It's so great to listen to all of you. It's a niche area really. So my name is Katharina Körner. I'm with the Tech Diplomacy Network in Silicon Valley currently. I'm also advising a couple of startups. I come from privacy and legal, and I did a lot of work on responsible AI and also started to look into open source. And I was asking myself about the impact of AI regulation on open innovation in AI. Don't you think that the effect of all of this upcoming and existing regulation, but especially of upcoming regulation, even though there will generally be an open source exemption in the EU AI Act, with the exemption from the exemption being high risk systems, deployment of open source in high risk systems, et cetera, is that the upstream effect will be that we just need to ramp up governance in open source, you know, model cards or transparency tools? So even if you're exempt, for the downstream companies that will use your open source developments, it will depend on whether they can trust you upstream. Or has this always been the case, or has something now changed? I think it would be great to have some guidelines, you know, responsible open source governance or something like that.

It's a great question. My initial response is: this is not a new problem, right? If you're a developer and you're maintaining something and you want it to be useful to many folks, you're gonna be documenting it well. Oh, sorry, you're still going. I thought you'd stopped; you paused. I paused, sorry. Model cards, right? I think that is a North Star. It's very helpful that a lot of folks in the open source ecosystem have been releasing models with a model card that really fulfills the intent behind this. You know, you can document something to any number of degrees, but if you're going into the detail of the architecture that you trained on, the dataset that you scraped and then trained on, the order in which you ran the data over the model, I think you're really fulfilling that expectation, and you're making it very helpful for folks downstream. I hold up the Pile and EleutherAI, and the series of models that have built on it, as a good best practice of using and fulfilling the model card. The model card expectation, or the set of ideas behind it, has made it into regulation. So the AI Act's annex expects that the models that are not exempt under the open source exemption will essentially have to fill out a model card. They might not share that publicly; they might have to share it just downstream with customers. But we're seeing open source best practice shaping policy, and to your point, governance is not carte blanche if you want it to be useful.
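As an illustration of what that kind of documentation captures, here is a minimal model card sketched as plain data. The field names follow common community practice (for example, Hugging Face-style model cards) rather than any regulatory annex, and every value is hypothetical.

```python
import json

# A minimal, illustrative model card expressed as plain data. Field names
# follow common community practice (e.g. Hugging Face model cards), not any
# regulatory annex; every value here is hypothetical.
model_card = {
    "model_name": "example-lm-3b",                    # hypothetical model
    "developer": "Example Research Lab",
    "license": "apache-2.0",
    "architecture": "decoder-only transformer, 3B parameters",
    "training_data": ["the Pile (EleutherAI)",        # dataset named above
                      "filtered web crawl"],          # hypothetical
    "evaluations": {"toxicity_benchmark": 0.12,       # hypothetical scores
                    "reasoning_benchmark": 0.64},
    "intended_use": "research; fine-tuning for downstream applications",
    "out_of_scope_use": "unreviewed deployment in high-risk decision making",
    "known_limitations": "can produce inaccurate or biased output",
}

# Downstream developers (and, under the AI Act, customers) would consume
# this documentation to decide whether to trust the upstream model.
print(json.dumps(model_card, indent=2))
```

Whether shared publicly or only downstream, documentation along these lines is what lets a deployer assess the trust question the audience member raised.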
You definitely need to take some steps, and that's not a new problem. Ben, do you want to add anything? Even in the absence of regulation, I think people aren't going to want to build on something that they can't inspect and don't understand. That's true if it's a closed source system. It's also true if it's an open source system that has zero documentation attached to it. No one's going to build on an open source model if you've got absolutely no idea what's in the black box in terms of training data, no idea whether evaluations were conducted or what they yielded. You can measure some of this yourself as a downstream developer or deployer, but to some extent, the more transparency the better, and the more transparency, the more people are going to want to integrate and use that model or that component. So I think the AI Act in some ways just formalizes something that was already going to happen, and does so in a way that is generally pretty workable. And we said this from the beginning. I mean, we said, look, 99% of what we see and hear, from a model developer perspective, we, a corporate lab with lawyers, can comply with. We're just concerned about what happens to the next team of two or three PhD students at a university somewhere who don't have 50 GDPR lawyers. But broadly speaking, yes, these disclosures, this transparency, the record keeping: we will need to see more standards and good practices in place as to what that actually means in an open source context. But it's an old problem, and some of this was already underway, even in an unregulated environment. And I think it's an old problem from an AI perspective, but it's also an old problem from a general open source perspective. For at least the 15 years that I've been involved, there's been lots and lots of governance; there are even books on it. The whole set of practices is generally referred to as curation: the good technical hygiene and the good governance of open source software. And what you see a lot of the time is that it's not a legal solution, it's actually a technical solution that fulfills the legal need. So we see that in the ecosystem. We've got time for one last question, and I'll hand over to Sal as I get there. Yeah, I thought this was a great discussion. I do wanna go back to why representatives for these major models are going to talk to the White House, right? Because it's absolutely true that there's a policy perspective in that, but it's a column A, column B problem. Every single one of those models is looking forward at the next order of magnitude and saying, number one, that cost is way more than we've ever had before. And number two, specifically looking globally and having conversations with the US right now, they're saying: you're not regulating this like an emerging technology, you're regulating this like it's a utility, right? You are now trying to make policy as if the end user for this is every one of our citizens, and that's a much more stringent way to be working. I'd love to know your thoughts on that. Ben, before you answer, I was just gonna say, I don't think I was as clear as I should have been at the beginning that the reason I was avoiding AI is that I think it's something that really has to be dealt with on a cross-border basis, and this gets to the heart of that issue with the fundamentals. Yeah, I mean, I think it's interesting.
I would push back a little bit on the characterization that it's a public utility approach to regulation, in the sense that what underpins public utility regulation is common carriage and equitable access. There's nothing really in the EO, or even the AI Act, that's encouraging the technology to be widely available or to make it more widely available. It really is more along dual-use regulatory lines, and that's language that they use, language that has a long history in export controls and other areas. So I think when the administration or other governments look at that problem of thresholds and some of the stuff that you see in the EO, they're thinking about it more along the lines of: we're concerned about dual-use risk. We don't really have a good way to measure these kinds of catastrophic risks today. We've been told that above this threshold, we don't really know what's out there, and that there may be discontinuities in terms of capabilities and risks. When we get there, please tell us what's there. And theoretically, in principle, they could then abolish the threshold, or update it, or move it further up. In my experience, that's not really consistent with the public utility approach, but I see what you mean in the sense that public utilities are heavily regulated. Yeah, yeah. And that's a great distinction to call out. Compute has, in my mind, been treated very differently to the model and application layers. A lot of that predated AI. I mean, the CHIPS Act and a lot of the chip restrictions and export controls, that was all motivated by geopolitical, strategic rivalry, and stuff that predated generative AI, or at least the hype around generative AI. I think what we've seen from some policymakers is an awareness that compute is a much more convenient choke point than the model, software, and application layer stuff, and so there's a natural tendency to go after that. So one of the things we didn't really talk about, but that's in the EO, is a requirement that if you're a compute provider and you are facilitating training runs for foreign entities, you have to report those training runs. When they're very large. When they're very large, yeah, exactly. Again, some huge threshold (the commonly cited threshold figures are sketched below). And so again, you're seeing there a level of government intervention that at the moment does not apply to models and software, because there's just this awareness that hardware is easier to monitor and easier to regulate than code and pure information. And that's another area that we're actually seeing governments jump in on: the open hardware side right now. We're seeing a lot of debate and discussion around that separately, entirely separately, to the AI discussion. Peter? So, Ben, your response is, no, no, it's fantastic. But just to pull on the thread around the EO: I could see some coverage perhaps leading to this risk and concern of where the regulation evolves from the initial stage that has been set with the executive order. But it's really important to emphasize that the expectations on compute providers around these large training runs are really minimal. It's about reporting your practices to the government. So we're not seeing regulatory capture. We're not seeing a highly regulated industry at the outset. And I think it's worth calling out that the expectation of monitoring to understand the risks is relatively light.
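[For concreteness, here is a toy comparison of an estimated training-compute figure against the threshold numbers that usually come up in this discussion. The values are the widely reported ones, 10^26 operations for the Executive Order's reporting requirement and 10^25 FLOPs for the EU AI Act's systemic-risk tier, but treat them as illustrative: as the panel notes, they are crude policy knobs that can be revised.]

```python
# Toy check of an estimated training-compute figure against the compute
# thresholds discussed here. The numbers are the widely reported ones --
# 1e26 operations for the US Executive Order's reporting requirement and
# 1e25 FLOPs for the EU AI Act's systemic-risk presumption -- but they are
# illustrative, and regulators can move them.

THRESHOLDS = {
    "US EO reporting (dual-use foundation models)": 1e26,
    "EU AI Act systemic-risk presumption": 1e25,
}

def triggered(training_flops: float) -> list[str]:
    """Return the threshold labels an estimated training run would cross."""
    return [name for name, limit in THRESHOLDS.items() if training_flops >= limit]

# A hypothetical 70B-parameter model trained on 2T tokens, via 6*N*D:
est = 6.0 * 70e9 * 2e12   # ~8.4e23 FLOPs
print(triggered(est) or "below both thresholds")
```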
And that's one piece of a very large executive order that's trying, generally, to expand capacity in government to understand what these risks are. So this threshold we're playing with, I think it's important to emphasize, is really crude; nobody really likes it. It's the idea that that's the point where there is interest and a need to do more work, right? And the executive order underlines the need for NIST and the Department of Energy to build infrastructure to understand risks. They also move towards precision by narrowing what types of models we're talking about in the few cases where we're talking about reporting on compute. And in particular, it's worth underlining the model side: it's dual-use foundation models. It's not all AI models. It's not all foundation models. It's only models with, again, more thresholds, I'm sorry, I didn't write it, more than 10 billion parameters that could present risks around weapons of mass destruction creation, cyber risk, or loss-of-control risk. So they're trying to be pretty precise about this. It is a little bit science fiction, perhaps science reality, to your question here, I don't know. But ultimately it's directing many agencies in the government to get a better understanding of that risk, so we have better-informed regulation and don't end up in the world that you're describing. I'm sorry, I'm gonna stop you because we're right at time. We've got a couple of minutes just to wrap up, and I know it's a fascinating discussion that we could probably have for several hours more. Just as we wrap, Peter, thinking into 2024: some would say this has been the year of AI, or the year of open AI. Where do you think we're going in 2024? What are we gonna see? Well, if this past year was the year of ChatGPT, and maybe the finalization of the AI Act, I think we'll look back and say the first horizontal kind of legislation was agreed upon, at least at the political level. We'll see some of that work continuing in 2024, but really a focus shifting to DC. And I think it's a huge opportunity for folks in the open source ecosystem to really have your voices heard on this consultation. There are plenty of bills; I am not gonna prognosticate on whether they'll move. But what I would offer, I guess, is another piece: both policy researchers and, I think, policymakers will be increasingly attuned to how large language models can be used in applications, as opposed to the hypothetical of looking to large labs and seeing what they've done with a single model confronting a user. It's instead about orchestration and tool use: when you allow a language model to search the internet and do any number of things, that introduces challenges, both in terms of oversight and user affordances, and obviously also policy challenges. And so I think this focus on orchestration and bootstrapping agents is gonna be an increasing one in 2024. So Ben, we've got other speakers waiting, and out of respect to them, can I keep you to one minute just to wrap up? Yeah, I'd just say, look, we are continuing to emphasize that it's a diverse ecosystem. Open models, open technology, are an opportunity, not a threat. And while there are no silver bullets, there are layers of mitigation for various kinds of emerging risks. It's important that governments get their head around how those mitigations come together, and at what point in the supply chain.
And that, whatever we choose to do from a regulatory or legislative perspective, we preserve that diversity in the ecosystem. It is fragile, and it's easy to stifle. And from my perspective, just as a final word: I think we've seen a year of AI panic in 2023, and I'm hoping that what we'll see in 2024 is a bit more AI pragmatism, where governments take stock and regulators try to understand the scale of openness, and that there's space for both open and closed models. Thanks, all, for joining us today.