All right, welcome everyone. Thank you for joining us today for our third RSM speaker series event of the semester. We're really lucky to have Rob Gorwa and Michael Veale joining us via Zoom, and I'll quickly introduce our guests. Rob is a postdoctoral research fellow at the WZB Berlin Social Science Center, a fellow at the Center for Democracy and Technology and the Centre for International Governance Innovation, as well as a co-founder of the Platform Governance Research Network. His first book, The Politics of Platform Regulation, is forthcoming with Oxford University Press. Michael is Associate Professor in Digital Rights and Regulation and Vice Dean for Educational Innovation at University College London's Faculty of Laws. His research focuses on how to understand and address the challenges of power and justice that digital technologies and their users create and exacerbate, in areas such as privacy-enhancing technologies and machine learning. They're here today to discuss their work on the emerging governance challenges posed by AI model marketplaces and other intermediaries in the AI ecosystem. We'll be sure to leave some time for questions, so please put them into the chat and the Q&A function. But with that, I'll turn it over to Rob and Michael. Welcome. Hi — I think I'll get started. It's really great to be here. We'll dig right into this paper, which is forthcoming; it's available online, but we'll run through some of the main points together. What we're talking about today is intermediaries in AI. We often hear AI talked about as a black box, or an algorithm, or something like that. But in practice it's a lot more like traditional internet regulation and its challenges, because just like the internet, it is made up of often tens or hundreds of intermediaries that work together to deliver the services and functionality you see. You could say that AI is more about intermediaries than artificial intelligence, in a way. And we can see some of them here — familiar faces that will recur again and again: the training data centers, with AWS or Azure, and NVIDIA providing chips, teaming up with a lot of the nascent AI companies. Important data sources, and those who maintain and manage them, are key as well: LAION and Common Crawl for images and text, and Microsoft also runs something called COCO — Common Objects in Context — which makes up a fraction of some of these training datasets. Of course, there are the famous foundation model developers, building large models using large amounts of compute that others build upon, and doing so using other intermediaries themselves. There are also digital labor platforms like Sama or Mechanical Turk labeling this data, cleaning it, refining it. Further providers such as Vertex AI from Google or Amazon SageMaker offer people the ability to tweak and fine-tune some of those foundation models using their own data — which may in turn come from those labeling services, and use the compute services above — so you can start to see how intertwined this is. You also find that the devices that query the models are involved — consumer devices, with Apple and Google being key actors here — and the operators that host these models, like Hugging Face and GitHub, which we'll be talking about today, with open source models being placed on these hosting intermediaries. Services like ChatGPT and so on enlist users as feedback providers and act as services that intermediate between these whole technology stacks.
And you have firms that integrate all of this together, to sell it as a bundled product, to resell it, and to make all these links for people. All of this indicates a world of intermediaries that act — not always in the same way, not always in the same configuration, and not always with knowledge of each other — in a single process. But for regulating or governing AI, that means we have a lot of intervention points, just like we do with other kinds of networked technologies. And that is the focus of the research agenda that Rob and I are working on, along with many other people in this area, to work out what interventions could look like. So, Rob, over to you for our focus today. Yeah — the paper we want to talk about is about something we call model marketplaces, our alliterative little term for AI hosting intermediaries. These are interesting platforms that have emerged in the AI intermediary and development ecosystem in the last couple of years. They've started to attract more journalistic attention, but not actually that much from academics so far. So we started getting interested in these platforms, which at their core are online services that allow users to upload or create custom models. What differentiates a model marketplace from all the other services bundling access to AI models, or building business models around the ability to query and interact with models, is that these are basically a new type of user-generated content platform: people can upload different models — or, more commonly, tuned versions of models — and then interact with them in various ways, such as downloading them or querying them via a structured web interface. These are often characterized as part of the open source AI ecosystem, and there's a bunch of different services and business models involved that we'll overview in a little bit. In the paper we really dive in on two broad categories of these model marketplaces. Many of you, I'm sure, will have heard of Hugging Face, a French-slash-American startup based in New York that brands itself as a kind of GitHub for machine learning. These are general-purpose online platforms that allow people — as individuals or organizations — to upload models, as well as datasets, weights and other things that might be useful for people doing machine learning work. The models span a whole range of tasks, from translation to image classification to feature extraction, and there are also some quote-unquote generative models: major open large language models are available via these platforms. The second broad category we focus on in the paper are generative model marketplaces, especially in the image generation context. So we'll also talk about some platforms emerging specifically in the amateur art community, which often specialize in one broad category of model — you're not going to find translation models there, it's going to be image generation models — and sometimes even in specific kinds of image generation models, like Stable Diffusion by Stability AI. So, I'm sure many of you — and I wish we could see your faces today, but it's online —
— if we were in a room together, I'd ask for a show of hands: how many of you have been on Hugging Face, or are aware of it and have interacted with it? This is basically the front page of the website, and just to point out a few things: it's really interesting because you can see the sheer popularity of these platforms as they have emerged, Hugging Face in particular. I don't know exactly when you took the screenshot, Michael, but as of today there are more than 500,000 different models available here, and you can easily search them with all of the tags on the left-hand side of the screen. One thing that's really important to notice is the mix of big research labs and organizations: Google Brain, Google Research, Facebook AI Research (now Meta), Stability AI — all of these major research organizations are uploading their models and making them widely available here. Some of these are very popular: you can't see it here, but one of the major Stability AI models has had four or five million downloads or queries in the last couple of months. And there's also a mix of hobbyists: right in the bottom-left column you can see a model called moondream, which was just published by an individual. You can also see some content discovery mechanisms being integrated here — they have a trending models function; that one was uploaded just a day ago and already has 3,000 downloads. (These discovery surfaces are also exposed programmatically, as the sketch after this passage shows.) So, anyway, this is Hugging Face. Very briefly: in the paper, what we were trying to do was explore this weird emerging space and do a preliminary mapping of the different platforms and companies that exist in it — and, as we'll get into, how they are dealing with content governance problems, and why we think this poses some particularly interesting challenges for internet regulation. Without going too far into depth — it's all available in the paper online — you can see there are a bunch of different services we found, with different valences: a lot of image generation ones, and some generic ones, with Hugging Face the most popular. And just to reiterate the first point I made: the key thing we consider the necessary condition of a model marketplace is that it allows third-party user upload of models. There are many other services that provide intermediation functions — that, for example, let you interact with Stable Diffusion via an iPhone app. There are other really popular services for image generation, like Midjourney, but you cannot upload models to Midjourney or directly tamper or play with its models; you have to interact through a structured interface — in Midjourney's case, largely via Discord. So this is a breakdown of the different services in the space, and we looked at whether they had content policies and how they developed those policies — but we'll get into that.
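To make that discoverability point concrete, here is a minimal sketch — ours, not the speakers' — of how the same tag filters and download rankings shown on the Hugging Face front page can be queried via the public huggingface_hub Python client. The specific tag, search string and limits are illustrative.

```python
# Minimal sketch of programmatic model discovery on the Hugging Face Hub.
# `HfApi.list_models` is the real client call; the filter values here are
# purely illustrative.
from huggingface_hub import HfApi

api = HfApi()

# Most-downloaded text-generation models, mirroring the site's tag filters.
for model in api.list_models(filter="text-generation",
                             sort="downloads", direction=-1, limit=5):
    print(model.id, model.downloads)

# Free-text search, loosely analogous to the site's search box.
for model in api.list_models(search="stable-diffusion", limit=5):
    print(model.id)
```

The same client also powers uploads — exactly the low-friction, third-party-upload affordance that, in the speakers' terms, defines a model marketplace.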
Okay — bringing us to the meat of this paper was the realization that model marketplaces, which we think are interesting and increasingly important intermediaries in the AI supply chain and the AI stack, are increasingly facing moderation issues. One public scandal, which didn't get a huge amount of attention but caught some traction in the machine learning community, happened in 2021–2022, when a Swiss machine learning researcher who's also a YouTuber took a popular open source large language model and tuned it on a dataset that a group of researchers had collected: basically all of the posts from 4chan's "politically incorrect" board over the last few years. Using this, the YouTuber-slash-researcher — he called it a bit of a prank — created a model he named GPT-4chan: in effect, a large language model for hate speech and harassment. But he said he released it for research purposes, putting the code on GitHub and the model up on Hugging Face. This kicked the Hugging Face staff directly into gear, and Michael will briefly talk about what they did. Yeah — so here you see the community discussion pages, where some of the content moderation action now happens on Hugging Face; they didn't really have an exact venue for it, and when the platform was set up it didn't really have a clear content moderation policy. Here you see people getting concerned: someone says they tried out the demo of the model four times using some benign text. The first response was just the n-word; the second was a sentence about climate change that led to a conspiracy theory about the Rothschilds and Jews being behind it. You get an idea of what this model is doing. So who wades into the content moderation at Hugging Face? The content moderation staff is actually the CEO — we go straight to the top when we do content moderation on this very hands-on platform. And this was only a couple of years ago — less than that, even — when Clément Delangue comes in and says: okay, we had an internal debate around this (clearly not having consolidated a policy previously), and we think it should stay up, although it should be clear about its limitations. That only lasted a few days, because within about a week Hugging Face staff changed their mind and said: we will, after all, disable this model. So what's happening here? One thing is a discussion about the sheer difficulty of moderating these kinds of models, and one of the reasons it's so difficult is that there are three different ways in which models, as tools, can be understood as harmful or risky. First, their intended uses: you could say this is a model designed to troll people on 4chan, and you could have a policy about that, which you could put in a model card — the metadata format for models popularized following the Model Cards for Model Reporting paper. (A minimal sketch of what such metadata looks like follows this passage.) That's pretty easy to moderate, because it's just comparing stated intentions against a list of policies — pretty simple.
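As a concrete illustration of that metadata — our own hypothetical sketch, not an artifact from the talk — here is roughly what declaring intended and out-of-scope uses looks like using the huggingface_hub model card helpers. The repo name, headings and field values are all invented for illustration.

```python
# Hypothetical sketch of "intended use" metadata in a model card, built with
# the `huggingface_hub` model card helpers. Values are invented.
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(
    license="openrail",          # a behavioral use license (discussed later)
    tags=["text-generation"],
)

content = f"""---
{card_data.to_yaml()}
---

# Example model

## Intended uses
Research on dialogue safety. Not for deployment in user-facing systems.

## Out-of-scope uses
Harassment, impersonation, or generating abusive content.
"""

card = ModelCard(content)
# card.push_to_hub("someuser/example-model")  # hypothetical repo id
```

Checking a stated intention like this against a policy list is the cheap case; the two categories discussed next are the expensive ones.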
But you then have two areas that are harder to deal with. You have the realized uses: has a model empirically been used in a certain way? And you have the potential uses: does a model have affordances, or the actual capabilities, to be used to do this? The first requires empirical evidence and the second requires analytic evidence, neither of which is cheap or indeed self-evident. The hardest of these, maybe, is potential, because suddenly you have to look into a model and really red-team it, put it into a lot of different situations and imagine how it can be used — using a variety of, say, technology assessment methodologies, as well as some actual formal analysis of what its capabilities are. But legislation in many jurisdictions already forces platforms to moderate models for their potential use. Some context to start with: we're looking at jurisdictions other than the United States, because the United States is a global outlier in that Section 230 does not require takedown under any circumstances — that's not quite true, there are specific carve-outs, but we're not going to get into the amendments to it. Almost every other jurisdiction in the world operates, for most illegality, a bit like the DMCA does, with a notice-and-takedown process. That's the norm, and it means that when I submit a complaint of illegality, a corporation can lose at least that level of shielding if they don't take down the content, provided the complaint was well founded — the notification clearly pointed to illegal content on the platform. There may be other layers of shielding below that: you don't automatically become liable; that is determined by the laws that render the thing illegal in the first place. So that's the context — let's look at some of this legislation on potential. In the UK, models that produce information useful to a person preparing a terrorist act are likely already illegal. Think of the Anarchist Cookbook: if a model regurgitates that kind of material — if it can be transformed into, say, bomb instructions — it is likely to fall foul of the Terrorism Act 2000. That is about potential. Models that have the potential to be converted into indecent images of children are considered indecent images of children, whether virtual or not — effectively what the law calls a pseudo-photograph. Some of the logic: you can't password-protect a zip file and say, well, I put the other bit of information you need to unlock it over here, therefore the zip file itself is not illegal and triggers no liability. Similarly, a model that is just one prompt away — carefully crafted or not — from producing indecent images of children would fall foul of this regime. Then, models that leak private data: we know that happens; we know models memorize credit card numbers, social security numbers and other bits of information. And it's not just private information — biographical information that counts as personal data can attract liability too. That doesn't necessarily make the model illegal, but it's something that can be considered from its potential, just like a dataset that could potentially be released. And, more controversially, copyrighted content — we're not going to go into that, but of course there are ongoing court cases about it, as many people in the audience will know.
So, just to give a flavor: these regimes point at potential, which already drags platforms into responding to the hardest of the three categories — or arguably the hardest. They don't have the choice of looking only at intended use; they are often going to be dragged into making decisions about potential use. And this is compounded by the fact that some of the model marketplaces we looked at, like Civitai — which has actually become fairly well known in this area since we wrote the paper; we put it out initially last October or November — are working pretty much on the basis of illegality. Civitai itself is working on the basis of non-consensual AI porn in many cases, so it's not surprising that liability accrues from that. And with that liability comes pressure: the pressure of other infrastructural providers, other supply chain actors, that place content moderation demands on other parts of this algorithmic supply chain. This is not uncommon — we've seen payment providers pressuring content moderation on, say, Tumblr or OnlyFans, with Mastercard or PayPal applying the pressure; it's classic internet regulation. OctoML, which Civitai used to power some of the image generation happening on the platform itself, was very concerned about child sexual abuse imagery: it demanded certain moderation, undertook certain scanning itself, and eventually cut off its relationship with Civitai. So we see all these things causing pressure to moderate content, in jurisdictions around the world but, in this case, also in the US despite Section 230. Yeah — so there was actually a really — oh, thank you, Michael. There was an interesting question in the chat, a comment noting the prevalence of the different social features being integrated. In the interest of time we cut a few slides walking through the interface of Civitai and some of these other platforms, unfortunately, but they do have some really interesting affordances relating to the generation of some of the problematic material we're going to get into: things like the ability to combine different models on the platform, and the ability to easily deploy models on the platform and interact with them. And there are more fundamental business model questions about how they make money and who their user base is — we'll get into all of that in a second. So first, the question: are model marketplaces actually responding to this? The first half of the paper sets out this context and deals with some of the policy challenges, as well as the existing background under extant law, as Michael mentioned. Then we move to a set of policy case studies of three major model marketplaces — or maybe two model marketplaces and one legacy software repository and hosting intermediary that has model-marketplace valences. So how are these companies responding? The first major thing we're seeing follows a fairly classic dynamic of content governance and content moderation, inspired by the tradition of social media and user-generated content regulation.
That is: filtering, and trying to limit model visibility and add friction in various ways. We're seeing some platforms under pressure from government stakeholders — Michael will get into this; it's not just journalists and the public, but increasingly some government actors requesting the takedown of certain models. But the business model of a lot of these marketplaces is predicated on removing friction — on making it easier for lay users and organizations to deploy models: things like Hugging Face's Inference Endpoints, which can be queried and let you play with all sorts of different models in the development pipeline, as well as various online features that make different models more discoverable. There are some simple interventions companies can take, and are taking, to add some of that friction back. For example, if you go on Hugging Face and look at the main model repository pages, most have a little widget where you can directly interact with a toy version of the model; in the case of GPT-4chan, limiting that widget was one of the Hugging Face team's initial moderation interventions. They can make a model harder to download, for example by putting it behind a login wall so it's not fully public. They can remove it from major discovery mechanisms like the trending section on the front page. And something we're seeing increasingly with new high-profile model releases is requiring users to actually identify themselves and sign some kind of contract or license before being able to download. Another thing we're seeing is what we call in the paper bolting on mitigation features — basically, different types of automated filtering for trust and safety. There are two broad categories. Some developers try to bundle these into their models themselves; that's a potentially important intervention, but it's hard to make it stick, especially against more savvy users who can remove it locally. And a focus of our paper is on marketplace filters: the model marketplaces themselves are starting to apply both input and output filters. On the input side, for example, keyword matches that block certain types of queries — users are working around this, so it's a bit of an arms race. And on the output side: what Civitai was doing, for instance, was taking an off-the-shelf computer vision offering like Amazon Rekognition and running through it all the outputs created directly via the Civitai platform and its partners like OctoML; those get scanned and potentially flagged as not-safe-for-work, or even removed. (A simplified sketch of this kind of input/output filtering follows this passage.)
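Here is a deliberately simplified sketch of the two filter types just described — our illustration, not Civitai's actual pipeline. The blocklist, threshold and helper names are invented; detect_moderation_labels is Amazon Rekognition's real image moderation call.

```python
# Simplified sketch of marketplace-side input/output filtering.
# Blocklist, threshold and function names are invented for illustration;
# `detect_moderation_labels` is Rekognition's actual moderation API.
import boto3

BLOCKED_TERMS = {"example_blocked_term"}  # hypothetical keyword blocklist


def filter_prompt(prompt: str) -> bool:
    """Input filter: reject prompts containing blocklisted keywords.
    Trivially circumventable (misspellings, synonyms) - hence the arms race."""
    words = prompt.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)


def filter_output(image_bytes: bytes, threshold: float = 80.0) -> bool:
    """Output filter: scan a generated image with an off-the-shelf moderation
    service (here Amazon Rekognition) and reject anything it flags."""
    rekognition = boto3.client("rekognition")
    response = rekognition.detect_moderation_labels(
        Image={"Bytes": image_bytes}, MinConfidence=threshold
    )
    return len(response["ModerationLabels"]) == 0  # True = nothing flagged
```

Note what this sketch makes visible: both filters sit at the platform layer, so anyone who downloads the model and runs it locally bypasses them entirely — the stickiness problem the speakers return to later.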
So what's the second thing we're seeing in this space? Yeah — moving on to some points that are a bit more novel and curious, which we discovered and wanted to analyze as part of this paper. One is the outsourcing of moderation standards themselves. That's an unusual thing; we don't see it very often. What's happening is that Hugging Face is spending quite a lot of effort and energy, including in academic spaces, pushing something called behavioral use licenses. These are like the evolution of copyleft licenses: the distribution and use of IP-protected material is conditional on the behavior of the licensee. The main one is OpenRAIL — the Open Responsible AI License — but there are many, many others in use, including specific ones written by developers themselves. It's not entirely new: JSON, the JavaScript Object Notation format, actually has a "don't be evil" license — use only for good, not evil. So you can't use JSON for evil, though I think that's probably not very enforceable. And there's the not-really-famous Anyone But Richard Stallman license, one of those joke licenses, which says anyone can use this for anything — unless you are Richard Stallman, in which case you cannot use it for anything at all. So there is a whole world of specific licenses that existed before. What's happening here, however, is that Hugging Face doesn't just respond to the actual licensor, or their agent, saying: hey, we see something on your platform — say, a version of our model that's being used for medical purposes, or to harm people, or in a discriminatory way, or by police, or something like this — please take it down. They actually respond to third parties pointing out that license terms have been breached: someone says, hey, this model breaches the terms of the upstream model's license, which was written in a very open-ended way. So Hugging Face ends up — by the looks of what they're doing right now, and this may change in the future — heading toward a world where they're enforcing rules written by other people. This is a problem, because you should be very careful about the rules you write into your content moderation guidelines, community standards and the like: some things are very difficult to operationalize. We'll see some of them here. Looking at the OpenRAIL license, the most popular of the behavioral use licenses on the platform: you cannot use models to disseminate verifiably false information. You can't disseminate personal information that can be used to harm an individual — or generate it at all. You can't disparage somebody using a model; we're not sure what that would mean in this context. You can't violate any applicable national, federal, state, local or even international law using the model — which will depend on your jurisdiction, and it's very unclear how that would work. And you can't produce indirect discrimination or disparate impact using a model, which is also difficult to prove. We're not saying any of these are bad terms in principle, but they're very hard to operationalize in practice — particularly in the classic content moderation landscape of outsourced or fast decisions, ideally not made by the CEO every single time. We saw an example of this, which we draw on in the paper — and thanks to 404 Media for the picture; when they covered our paper they made some beautiful images, which we're borrowing with much thanks for their reporting; please go and subscribe to that fantastic journalistic endeavor. In 2022 and 2023, Hugging Face removed models of Xi Jinping singing.
They removed them because a complaint was made. It was redacted — we don't know who made it — but it was very similar to complaints made on GitHub by the Chinese internet authority, the part of the Chinese government that makes these kinds of takedown requests on the basis of Chinese law. And Hugging Face did remove the model — we can't see the exact request — saying: we are removing it because it is not in adherence with the upstream model's license. The upstream model, which creates singing voices of whoever you upload and fine-tune it on, was hosted on GitHub, and someone had placed a license on it saying no political use allowed. So immediately you wade into deciding: is this political? That was the reason given — and Hugging Face did afterwards, when pressed by 404 Media, say it was also impersonation, which is against our community standards. At which point 404 asked why all the Biden singing generators on Hugging Face hadn't been taken down, and apparently no response was forthcoming. So you can see this is a very thorny area. We're not pointing the finger at Hugging Face for doing anything bad or unethical — in fact we quite like them as an organization. The point is they can't put their hand up and say: we will solve all these problems, we will be the intervention point, the arbiter of all these issues — or indeed the arbiter of other people's rules. It is not a job you want, and not a job you should volunteer for, and we need some careful thinking about how this whole ecosystem is going to work in terms of moderation if it's not to go a bit wrong. One lesson we did see — a very interesting trajectory, something we think could hold promise for the future, though it's not perfect — was on GitHub. Now, GitHub isn't strictly a model marketplace, at least not primarily; it's a software repository. One thing Rob and I found when writing this paper is that, astoundingly, nobody in scholarship — as far as we're aware — has written about GitHub's content moderation policies. We tried to look; if you know of anything, please do send us a postcard. There was no literature we could find analyzing software moderation on GitHub — surprising, given the volume of research on all kinds of other platforms. So we had to do quite a lot of primary work, digging through threads and blogs and posts and forums and Reddit threads, to find some interesting juicy tales for you folks. And here's a juicy tale we discovered, from 2020 — some of you may know it; it was lightly reported at the time. It concerns youtube-dl, a command-line tool that probably many of you in the chat have used: you put in a YouTube URL and it gives you a copy of the video. Very handy. The beloved Recording Industry Association of America sent a takedown request under DMCA Section 1201 — the anti-circumvention provision, which covers software that can get around digital rights management, and a clause that people like Pamela Samuelson have presented as problematic for many, many years. But it's a difficult one to deal with, and GitHub did take the tool down. Then there was public outcry, and GitHub commissioned legal analysis.
They looked at it and in the end said: look, this isn't really removing protections the rights holders have put on; it's just getting the HTML5 video out; it hasn't been cracking specific rights-holder-protected files. So we don't believe it falls under the anti-circumvention rules — and GitHub reinstated it. In the process, GitHub — in our view, from our reading of this — clearly realized this was unsustainable: they cannot commission detailed legal analysis and reason in this slow, expensive way about every anti-circumvention takedown request. So, very interestingly, they put up a $1 million fund, and said: you can apply to this fund if we take down your potential circumvention tool — which we will now do fairly liberally; we'll take tools down if we get requests, as long as they look plausible. You can apply to us for money, and use it to hire lawyers or an analyst or whatever you need to write a DMCA counter-notice — which the geeks in the audience will be loving, because counter-notices are a very interesting instrument. They're not exactly great at solving the world's problems: they require the person whose content has been removed to submit, under threat of perjury, a good-faith statement that they believe it is actually legal to have that content on the platform. That effectively triggers the reinstatement of the content and moves the dispute to one between the rights holder and the uploader — the content goes back up, the dispute moves elsewhere, and its conclusion plays out there. We don't know how much of the fund has been taken up, but we do know it's been offered many, many times, because we have that data from GitHub. What's happening here is an externalization of analytic capacity. Hugging Face, it seems, was doing a lot of analysis in-house — and may be moving toward a world where they have to do more and more of it, which gets very burdensome once you think about that potential-use category. GitHub, in its earlier, non-machine-learning story about dual-use software, said instead: we need to put this work outside, put a limit on the money we'll pay, make other people do the work and give us better information — or get these disputes moved elsewhere. So this could be a way forward, or at least a flavor of one, because we don't believe these model marketplaces will be capable of being everything to everyone and holding all this capacity themselves — even though they will have to play an important role nonetheless. So, what's next? We've already been talking for more than 30 minutes, so we'll start wrapping up. Yeah — please do check out the paper if you're interested in any of this. I don't know about you, Michael, but it's my longest paper; I think it clocked in at 18 or 19,000 words. As Michael said, we were really diving into some of the earlier software studies work on GitHub, trying — and having a hard time — to find work on content governance in the GitHub context, and also doing a lot of background mapping of the different business models emerging, plus policy analysis and some case studies. So yeah, we had a hard time finding a venue that would take such a monster paper.
Maybe we should have split it and done a part one and part two — but anyway, it's in a journal with, like, a million pages and one-to-two-inch margins. Oh man, with US law journal footnotes this would have been off the chain. But I just want to wrap up with a short note on some of the layers of challenges we think are interesting here. My background is as a platform governance and content moderation policy researcher, so I really came at this from the angle of thinking about the stuff I know, and about what is new, different and interesting here in terms of affordances, dynamics and policy features. I think we're seeing a few different types of problems emerge through these case studies, which again are available in the paper — and some of them are not new challenges for internet regulation at all. First, a bad actor problem, which we didn't give much time here: some of the model marketplaces that are really popular are doing this stuff while maybe aware of their liabilities under extant law — or maybe not. We were digging through the Civitai Reddit forums, and there was a beautiful post where users actually told the company it needed DMCA takedown forms, and the response was: oh yeah, good idea, never heard of it, none of us are lawyers. But even as they've started taking this more seriously, there's a question of to what extent they actually can, or wish to, deal with the problem. They've had a super adversarial relationship with creators, especially in the copyright space. There's this incredible open letter they published themselves on Reddit, after an artist complained that many users were making models designed specifically to impersonate his art. They sent him an email that basically said: hey, guess what — have you heard of the Streisand effect, you idiot? We've created a contest where dozens of people are now trying to make the best SamDoesArts impersonation model; how do you feel about that? That's an extreme example, of course. And I don't think they're as laissez-faire — or as candid — about areas like child sexual abuse imagery and pseudo child sexual abuse imagery, which is a huge issue for them, or non-consensual intimate imagery, which they do have policies against. But two quick things here. It reminds me of that old saying: it's very hard to make someone understand something when their livelihood depends on them not understanding it. And the case certainly isn't helped when certain platforms in this space are funded by VCs that consider trust and safety and content moderation an enemy of progress — a mere speed bump to be blasted over in the race for whatever sort of AI supremacy they're envisioning. So if companies are really hesitant — even when they're headquartered in a jurisdiction where they're reachable, like Civitai, which I believe is in Boise — there's always a question of actual compliance and foot-dragging. And even that isn't the worst we've seen.
There's a comment in the chat about some of the new services emerging, which have some freaky affordances. Already we've seen a service called Shoggoth emerge online, branding itself as a regulation-proof, peer-to-peer model marketplace based on the BitTorrent protocol. What does that mean for the ability of regulators — with differential levels of regulatory capacity — to competently and confidently intervene in this area? I think that's a problem. Relatedly, there's a capacity problem, and here I mean not just regulators but particularly the model marketplace companies themselves, as Michael showed really well. Once you're under fire from government stakeholders and the stakes are high, or you're getting increasing pushback from the public or from NGOs and child safety groups, there's a lot of pressure to do something. But in this particular case we think moderation is really difficult, and an inherently expensive and difficult thing to outsource. I don't think that, long term, simply bolting on some output filters is a sustainable policy response — and we don't really have evidence yet on the actual efficacy of these types of filters on pseudo content, though that's developing. So: moderation. There's a lot of pressure to do it, and I'll add that the stakes are high. Obviously in the traditional user-generated content context, content moderators and platforms have to deal with incredibly difficult, traumatic, polarizing — pick your adjective of choice — content that can have genuinely harmful impacts on people. But here I think it's just way harder, because these are tools that can be used downstream in very powerful and even unforeseen generative ways, and the cat can be out of the bag in the time a platform hasn't yet responded. And remember that a big platform like Hugging Face is getting hundreds of models uploaded daily by lay users — 500, even 1,000 models a day. So we have a real capacity problem. And then there's an inherent structural problem, which I've already alluded to — I'm getting ahead of myself — about the way models can be mixed and tweaked off-platform. Platforms are a regulatory leverage point, both informally and formally through actual liability. But like I said, it's relatively easy for sophisticated users to bypass these filtering mechanisms, take models off-platform, run them locally, and so on — the safeguards different actors are trying to embed into their platforms, and even into the models themselves, might not be that sticky. And to what extent can platforms really actively and comprehensively audit these models at scale without major changes to their business model — like starting to manually review models the way the App Store reviews developers' apps? So again: interesting valences on these interesting platforms. So what do we do about it? These are just some of the problems we pin down in the paper, and I know Michael has some ideas about how we might reply to this capacity problem.
Yeah — the last thing, just to wrap up, is to relay the work going forward, some things we're working on right now. Maybe it's a legal-geek point, but when you submit a takedown request to a platform, alleging that something violates a law or a standard or a license, the platform has to adjudicate it — we know this is how platform governance works. However, legally — thinking of jurisdictions that are not Section 230-style, more DMCA-style ones, but including the DMCA itself, given all the behavioral use licensing — intermediary liability shielding is lost, in the EU and UK at least, if a firm is given specific actual knowledge of infringing content, or made aware of it such that a diligent economic operator should recognize the illegality. The point of this fairly high standard — rather than someone just saying, hey eBay, you're full of counterfeits, you should know — is that a notice says: no, this specific bag is a counterfeit, here's the evidence, we hold the rights, please take it down. The way it works in those jurisdictions is designed so that the platform doesn't have to do detailed legal analysis, and can therefore operate fairly smoothly and cheaply. However, alleged AI infringement in this space is not a question of legal clarity — at least not alone. It's an empirical question, and a terrible one. Content moderation on a platform like Hugging Face or GitHub or Civitai can become a research project each and every time: can this model generate terrorist content? Can it generate CSAM? Is it possible? Is it likely? Under what conditions? These are questions you actually have to investigate with people who can do this kind of probing; there are no automatic tools, and the problems are so open-ended that no automated toolbox looks like it will give you an easy way out. So we propose that we need to develop evidence packs for model flagging — implicit or explicit standards for this area. You have to think about what kind of evidence would actually justify a takedown, and it's a balancing test: the evidence should be something the platform can look over and verify without doing its own research, but not so big and detailed that the platform has to analyze a huge document itself. This balance is going to emerge implicitly anyway as all this gets worked out, and we argue it should be made explicit — we have to do work on what it should look like. (A rough sketch of what such a pack might contain follows this passage.)
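As a rough illustration of that proposal — our own hypothetical sketch, not a settled standard from the paper — a structured evidence pack might look something like this, with every field name and the review threshold invented:

```python
# Hypothetical sketch of a structured "evidence pack" for model flagging.
# All field names and thresholds are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class ReproductionStep:
    prompt: str           # the exact input used against the model
    output_digest: str    # hash of the harmful output (not the output itself)
    settings: dict        # sampling parameters needed to reproduce


@dataclass
class EvidencePack:
    model_id: str                  # e.g. a repo identifier on the platform
    alleged_basis: str             # law, license term, or platform policy
    harm_category: str             # e.g. "terrorist content", "CSAM"
    complainant_standing: str      # rights holder, agent, or third party
    steps: list[ReproductionStep] = field(default_factory=list)

    def is_reviewable(self, max_steps: int = 10) -> bool:
        """The balancing test described above: enough evidence that the
        platform can verify the claim without independent research, but
        small enough that it can actually be reviewed."""
        return 0 < len(self.steps) <= max_steps
```

The design choice the speakers point to is exactly this tension: each reproduction step shifts evidentiary work onto the flagger, while the cap on steps keeps the platform's verification burden bounded.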
Yeah — so with that, those are the next steps, and the paper is there. Apologies for running over; we have time for questions, and I believe we can also go over a bit if we need to, for those of you sticking around. Please do put questions in the chat and the Q&A function — we'll go through them all; we've seen some already. Thanks, everyone. Wonderful — thank you both so much. I'm happy to pick out a couple of the questions coming in through the chat. A first one, from Chris Quarles: do you think moderation in AI is easier or harder to do effectively than traditional content moderation? Any discontinuities or continuities that pose unique challenges compared to content moderation in the social media context? Very quickly — exactly, that's what I was going to say. I think that ties in nicely to Michael's last point: this is not a context where you can imagine someone spending ten minutes — or ten seconds — per model to review it, the way an outsourced moderator in the social media moderation supply chain does for major companies like Instagram or Facebook or Twitter. The one other thing I had as a gut reaction when I started wading into this space with Michael last year was a concern that some of the easy wins — some of the lessons of basically the last 15 years of social media content moderation and platform governance — haven't really been applied here yet. Maybe that's just because of resourcing, or incentives, and all the things we've discussed. But I think there's a lot more companies could do, especially the major ones with big valuations that are widely used: having staff who actively red-team the platform and look out for stuff that is really problematic — doing that research proactively, especially where the stakes for harm are high. They could run some kind of trusted developer program, or some kind of review. There are many, many things they could do to govern visibility, especially for new content. And as far as we know this is a fast-moving space — the people at Hugging Face especially seem really motivated — so there's probably a lot of change happening every day. But there are clearly things we could see in this space, at least from actors that are well motivated and want to tackle the problem. Great. There was a question earlier in the chat about liability in some of these cases, and I'd add to it: I know that in the paper you go in depth on how the DSA and the EU AI Act are pretty silent on the role these model marketplaces and intermediaries play. I'm curious whether anything in the proposed revised Product Liability Directive would change that — I know you have a footnote somewhere noting that products can still be considered user-generated content on online platforms in the EU. So, curious about that additional regulatory instrument that may be coming down the road. It also segues into a question from John and Bella about whether models should be viewed as a fundamentally different class of technology or content with distinct distribution rules — almost like armaments — and whether it's worth thinking about models as a separate subcategory within this regulatory space. Certainly. On products, we're thinking not just about product liability but also about the EU AI Act in this zone, because product liability is more about tort — harm when something happens — rather than the hosting of a model; unless the hosting caused harm, you're probably not going to get a liability claim there directly. But products can be taken down, and platforms do get takedown requests from product regulators, which is allowed within various legal regimes.
The AI Act actually interplays with the Digital Services Act in the EU in such a way that I don't think you get AI Act-related takedown requests — I think that's foreclosed by the combination of the two regimes. But models can be asked to be taken down by regulators themselves. So you do have some parts of the law where you think about intermediary liability around AI. What we're trying to say, though, is that if it's about liability, this is about national law much more than European law, if we look at Europe — the whole hodgepodge of national law, which is exactly the bread and butter of content moderation. Does this model regularly blaspheme the Thai king? That is, of course, the classic kind of example in content moderation. So that's where we start to see this, and the idea that platforms may have to get to grips with some of these regimes — particularly if we don't want platforms taking down content arbitrarily — and we don't really know how many of these regimes deal with these open-ended models. In other work with Noëlle Gaumann, I've been illustrating how, in many jurisdictions, there are laws forbidding essay mills and contract cheating — including in US states — some of them now really broadly scoped, which, almost by accident, likely prohibit generative AI services, because they're framed without knowledge or intent requirements. That paper is forthcoming but is available as a preprint online now. It's just one example of laws with unintended effects that interact with these broad technologies. The problem is that these are really tricky legal questions — laws that exist but look quite overbroad, or that cover things they were never envisaged to cover — and it's not a court deciding them. It's a platform: the CEO, or whoever holds the policy list. The platform is then faced not just with a tricky problem, but with the kind of problem where a court would say: this is very hard, we may have to make a whole new interpretation of the law to square the circle — certainly not something any of the platforms we've talked about are equipped to do, or probably have the legitimacy to do. Can I jump in really quickly? That comment is really interesting — the classic defaming-the-monarchy question, or other attempts by certain jurisdictions to enforce their national laws. Connecting to your earlier slides, Michael, on the OpenRAIL licenses: think of how many of the major models, including ones released by big research labs, are being released under OpenRAIL on a platform like Hugging Face — and OpenRAIL says takedown is justified if the model violates any national law. I'm just interested to see how this plays out in the coming years, and whether we'll see cheeky jurisdictions trying to remove major models, just to see if they can. Part of what we're arguing here is that I don't think companies are prepared for that, or prepared to make those kinds of decisions in a robust way. Although bear in mind that, legally, it's only if the licensor complains.
So if the foundation model developer — or an agent of theirs — says you broke the terms of my license, there is a takedown obligation; but we're seeing Hugging Face proposing something different here. Legally, then, we might face some different challenges, but that's the interesting tension: their path through this is a very weird and messy one that we think is not going to go anywhere, and is mostly going to be theatrical more than anything. Though we certainly already see this in normal terms of service — "don't publish content that breaches any law on our platform" is common wording — and we also see legal regimes like the Digital Services Act saying you have to enforce your terms and conditions proportionately. So we actually see legal obligations to follow these terms of service through, and those do apply to model marketplaces. And we have another question coming in, which I think is especially timely in light of the NetChoice oral arguments that happened last week in the United States — I know you were setting aside the American legal context here. The question is about the seemingly broad societal consensus that certain kinds of social media content moderation were necessary, which now seems to be fracturing. How do you see this trend playing out, or reversing, for model marketplaces, especially when the economic incentives point in different directions? That question also highlights the danger a common-carriage theory would pose for model marketplaces, where they couldn't take down GPT-4chan or whatever else. Curious for your thoughts — probably a Rob question to start with, I'd say. Yeah, I have a couple of thoughts — and it's interesting that you linked it to the NetChoice cases and what's going on in the US. Two immediate reactions. One: a lot of the time, while researching and writing this with Michael, I felt like we were back in 2008 or 2009 in platform governance land. It's still very small scale, it's happening organically, you have high-level company people — even CEOs — jumping in and making decisions, and they haven't really staffed up, infrastructurally or bureaucratically, in terms of policy development to deal with this yet. So on one hand, connecting to that question: maybe this is just the period before the backlash, and we'll see an increase in resourcing in this area — and then again, once it gets politicized and there's pushback, things could change. That's one reaction. The second reaction — and now I've actually lost my train of thought, so jump in if you have something, Michael, and maybe I'll come back to it. Yeah — I'd say this is interesting because we're not sure the trust and safety world maps super well onto the intermediaries world, at least the way it's been constructed. Of course it looks the same, but it's a bit different.
And the logics are different, because we're still wondering where in this supply chain you intervene. With social media it's pretty obvious: we don't really think it's a good idea to intervene at the DNS level or the ISP level, and we don't really think it's a great idea to intervene with client-side scanning and have everything done on the device. So in general it's pretty clear who does this — though there's community moderation and other actors within that space. But for the AI space we're really not sure. Do you stop these things being created? Do you stop them being deployed by users at the very, very end? Do you put some intervention in the middle to stop them being distributed? And actors like Hugging Face are saying: hey, pick us, pick us — wouldn't it be a nice idea for us to be the ones who moderate for this? That's a bit tricky, I'd say. And I'd use this moment to go back to the questions in the chat about Lumen: Lumen would be a great thing to have this uploaded to, but interestingly, GitHub and Hugging Face have both — without being asked — already created repositories of all their takedown requests and published them. Almost no one ever looks at these; as far as we could tell, we couldn't find anyone engaging with them in any scholarship. Hugging Face redacts some of it — I hope it's redacted, some of it. But yes, that is a best practice for due process and transparency. I just picked up my thread — sorry, long day. What I was going to add, which I think makes this particularly interesting for the US context — and I hate to do it, but to plug the book I've been working on for the last five or six years, The Politics of Platform Regulation, which is about the political drivers and mechanisms behind government efforts to shape how companies do content moderation. If we think about that in this context, what's interesting is that a lot of the cutting-edge issues here — especially some of the potential bad actors we mentioned — might actually not be that problematic from a regulatory demand perspective in the US. Part of that is because these are issues where — I mean, the hate-speech-generating chatbot is one thing, but revenge porn, child abuse imagery: these are things where even in the US we see bipartisan support. That's the little Venn diagram where you could imagine congressional action happening — I wouldn't just point to NetChoice, I'd point to KOSA and other related bills around child safety. So on one hand, obviously, the techlash ebbs and flows, and the wave of investment from companies has now subsided in certain instances, where companies think regulatory compliance — or good-faith efforts at trust and safety — aren't really worth it anymore. But in the medium term I think it's feasible to imagine government pressure, especially informal pressure — not just from civil society groups that are really motivated in spaces like child safety, but also from government actors that want model marketplaces to take this stuff more seriously.
Coming back to my point: there's a real history, especially in the US but also in the EU and other places, of this kind of informal regulation being a major way through which platforms change their policies and practices over time.

Right. And I guess we could squeeze in one last question from the chat; I know it's late across the pond, and I appreciate you sticking with us. The question concerns examples of community-governed and federated marketplaces, and how you see governance taking shape in those spaces. I know you make the distinction in the paper between commercial and community content moderation policies, that sort of binary. So curious for your thoughts here.

Yeah, a few things are going on here. We don't see any examples yet of what you would call community-governed, federated marketplaces. What we do see, and we mention a few such systems in the paper, are federated marketplaces emerging as a response to content moderation, on the logic of "we need to build something with no moderation whatsoever." We do also see, and we outline this in the paper though we didn't give it much time in the talk, community governance being a bit stronger around software, and we particularly see that on GitHub and Hugging Face, and to some extent on Civitai, given that the people who run it are being told by their customers to maybe have a DMCA process. On GitHub, we look at moments where software for security researchers caused controversy, in particular toolkits for pen testing: GitHub took down, or blocked, a toolkit containing exploits for Microsoft products just after GitHub had been bought by Microsoft, and people found that quite controversial. There was a backlash, and GitHub felt it had to put multiple versions of its policy out to the community; it actually edited them in response, and the final version was quite different. Hugging Face, I think, also relies heavily on community talk pages to have open discussions about content moderation. That's not new on the internet as a whole, obviously, but it is a new thing, I think, for a commodity platform to have that kind of open, threaded, inside-out discussion, at the moment at least.

So we may yet see more federated governance, but in practice we're not talking about speech and hugely different standards here. The thing about speech is that you can just close your ears to it: in our federated zone we talk like this, in your federated zone you talk like that, and the harm is pretty localized, you could say, though there's lots of complexity on top of that. But a tool is different: I'm selling a gun over here, and I'm not selling a gun over there, but you can just download the gun here, move, and shoot the person over there. That's different from expression-based differences in standards, and it's why the laws we talked about are much more extreme, about generation of CSAM, generation of terrorist content and instructions, and so on. So if you do have federated marketplaces, the small ones are going to get hit by a legal hammer the size of the moon.
And that's not exactly the same as saying federation helps us balance different expression standards without having to have a one-size-fits-all rule. Well, you maybe do need one if someone is selling malware LLMs over here that affect the entire world; that's different from a traditional speech harm, at least in Europe, where we don't see everything as speech.

Yeah, and there's just a question of the ability of certain types of communities to effectively govern in the public interest, unfortunately. We did mention it already, but someone brought it up in the chat too: a lot of these communities, especially in the image generation space, have a lot of community features, so they're maybe not fully community governed. They're still doing, quote, commercial content moderation, or rather a community-oriented version of it, where all the mods are Hugging Face staff or Civitai's small staff. But they're also trying to exploit the community for monetization purposes, right? Civitai really relies on donations and is actively trying to grow its user base; its whole value-add is being easier to use, easier to interact with models, and easier to see what models produce than something like Hugging Face. A lot of models are owned by and posted on multiple platforms, and we see in the talk pages on Civitai that people talk about where they keep their different stuff as almost an insurance strategy in case something gets moderated.

The feature they have rolled out, to much journalistic fanfare from our friends at 404 Media, is something they call bounties. It's their first attempt to monetize some of the community on their platform: basically, they've developed an auction system where any user can request a model that produces certain types of content. For example, a model tuned to create the likeness of a given celebrity, or to create art that looks like a certain photographer or painter you like. Their policies explicitly say that this cannot be a porn model impersonating someone, though how fast they actually act on those types of models when they're posted is something that would be interesting to research empirically. But because of the features we mentioned, it's very trivial to take the impersonating-Michael-Veale model and combine it with a porn model, either on Civitai potentially or off platform; the sketch after this answer shows roughly why that combination is so easy. And the thing the journalists at 404 found, which I find slightly freaky, is that it's not just big-name artists and celebrities being targeted through these bounties. Sorry, I didn't fully explain: the idea is that you have an auction, multiple people submit impersonating models, the person that created the bounty decides which one is their favorite, and there's some kind of financial compensation to the winner of the contest. We're also seeing this used for micro-celebrities, Twitch streamers, influencers, and frequently for some apparent randos. People are just posting "here's an IG handle, can you scrape the public photos of my ex," or of some random person, and saying "hey, create me an impersonation model." Again, this stuff is very gendered.
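A minimal sketch of the kind of naive checkpoint merging alluded to above, just to show why combining two tuned models is technically trivial. This is a generic weighted average of parameters in PyTorch, not any marketplace's actual feature; the file names are placeholders, and real merges (and LoRA compositions) are somewhat more involved.

import torch

def merge_checkpoints(path_a: str, path_b: str, alpha: float = 0.5) -> dict:
    # Linearly interpolate two checkpoints that share an architecture.
    sd_a = torch.load(path_a, map_location="cpu")
    sd_b = torch.load(path_b, map_location="cpu")
    merged = {}
    for key, tensor_a in sd_a.items():
        tensor_b = sd_b.get(key)
        if tensor_b is not None and tensor_b.shape == tensor_a.shape:
            merged[key] = alpha * tensor_a + (1 - alpha) * tensor_b
        else:
            merged[key] = tensor_a  # keep model A's weights where shapes don't line up
    return merged

# Hypothetical usage with placeholder file names:
# torch.save(merge_checkpoints("likeness.ckpt", "style.ckpt"), "combined.ckpt")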
So again, that's the tricky question of to what extent community governance actually helps you govern responsibly.

Amazing. Well, I hope everyone can join me in thanking Michael and Rob. We've seen the sun go down behind each of your office windows, and I really appreciate you joining us at what I know is a more inconvenient time than for those of us based here in the US. Thank you again. We're looking forward to seeing how this really exciting research area develops, and to all your subsequent work; thanks as well for the generous contributions from the folks in the chat. We hope to see you in two weeks for an event with Dave Willner on moderating AI and moderating with AI, looking at how large multimodal models and large language models can help solve some intractable problems in more traditional content moderation, so a nice follow-up to this event. We hope to see you there. Thanks, everyone.

Thanks so much for having us.