Wikimedia Foundation, where he conducts foundational research and develops new AI technologies, and the discussion will be led by Elena Simperl, who is a professor of computer science at King's College London, and who is also the current president of the Semantic Web Science Association. So with that, Elena, over to you. Thank you. I mean, those of you who have happened to attend a conference with me, maybe in pre-COVID times, know that I'm a big fan of karaoke, so going back to singing, really be careful what you wish for. Well, welcome also from my side. I guess we all agree that we do live in interesting times, and artificial intelligence, essentially a set of technologies that has been in the making for more than 50 years, has now reached what some people call its iPhone moment. So ChatGPT is almost a verb, and essentially people in all walks of life are becoming more and more aware of what this AI can and cannot do. Organizations and entire professions are reflecting on what this disruption means to them and their futures, and governments are considering regulation. And since November last year, I hope you agree with me that we are once again moving really, really fast, and we're almost certainly going to break some things along the way. By the way, I'm Elena Simperl, a professor of computer science at King's College. I'm also the director of research at the Open Data Institute in the UK, and I am an AI scientist who hopes to help build a better Wikidata ecosystem, so that's my link to this community. And before I say a few words of personal introduction about the panelists, let me just tell you, I started out as a researcher around the same time as Wikipedia was launched. And as someone working in AI, I really just wanted to do my bit to make sure that the technology will benefit the Wikipedia community and society at large. So when I was asked to moderate this panel, I couldn't possibly say no.
And I'm even happier and humbled to welcome our two fabulous guests, and they're fabulous for various reasons. For me, they're fabulous because they've worked across disciplines, because I'm a fan of some of the things that they've done, like a real fan, and because both of them have contributed to open source development. You've heard the introduction: Thomas is a co-founder of Hugging Face, where he works on the open source and science teams. You may have used the Transformers library. You may have also heard about the Datasets library. For someone like me who works in dataset discovery and open data, those 33,000 datasets made me really excited. And then Isaac, a research scientist at the Wikimedia Foundation since 2018. Amazing credentials. I'm a big fan of your FAccT 2022 paper on data governance in AI and also the work on quantifying the value of Wikipedia. So very happy that you could make it. So I'm going to ask you to start with your opening statements. You can't take the engineer out of the woman. So five minutes opening statements, with or without slides, as you please. And we start with Isaac, shall we do that? Sure, happy to. So yeah, I also enjoy karaoke, but I will save folks from having to hear all of that. Thank you, Elena. I'm very excited to be here. Maybe a few things about my personal interests and some context to help set the stage. So, personal interests: as you highlighted, I have been developing AI tools for Wikimedians for the past couple of years as a research scientist with the Wikimedia Foundation, so that informs a lot of my thinking, going through those processes and learning how to do that in a very wiki way. So I'm bringing that to the panel. And as you, and actually an attendee in a breakout session earlier, reminded me, I also have a research background around reuse of Wikimedia content.
So understanding how Wikimedia content shows up outside the Wikimedia ecosystem and what impact that has, which obviously has a lot of overlap with the conversations that are going on around AI. And then more recently, I've been involved as part of this inflection point of really focusing in on ethical AI, what our approach is here, and really trying to think about what that should look like within the Wikimedia ecosystem. So I've been doing a lot of thinking there. And in that sense, I'm very excited to be on the panel, to hear Thomas's perspectives, to hear the questions from the audience, and to really get to talk through these things. That's always valuable. Two pieces of context, maybe, and then I'll pass on to Thomas. The first one: as you pointed out, we're at kind of an inflection point right now around AI. But it's also part of a larger history, and I think that's really important to keep in mind internally; this has come up within a number of sessions today already. There's been a long history of bots and AI-assisted editing on the platform, around content translation, vandalism detection, a whole swath of things. And so that's exciting; there's a lot from that that we can learn from. And externally as well, there are a lot of similarities between the questions we're asking about the impact of AI and some of the discussions that went on around search engine usage of Wikipedia and what that meant for the Wikimedia projects. So in that sense, I'm going to try to be maximally boring in many ways: okay, yeah, this isn't that new. So I think there's that piece of context. And the other thing that's important to bring up is that while we're still, I think, trying to figure out what Wikimedia's role will be in this new AI ecosystem, it is a unique one, which is exciting.
I think we stand at this interesting balance between being an organization and community that thinks a lot and does a lot of really hard work around doing things ethically, having good process, and doing things openly. But we're also a community that builds things and produces content. And so we really have to put the rubber to the road: we have to build things and we have to progress. And so that's also an interesting space that we occupy that isn't necessarily true of others. So I think that's another interesting piece of context. But with that, I'll pass it off to Thomas. Thank you. Thank you, Isaac. And I think I've heard stats that three percent of the training data for ChatGPT's initial release was Wikimedia content. Thomas, over to you. Yeah, thanks. I'm also very, very happy to be here today. So maybe, in contrast to Isaac, I'm really here to learn a lot, I think. So I'll be very happy to hear your boring history, because obviously, I mean, I've been using Wikimedia like everyone, but I don't know all the history. And I'm sure there are a lot of things that are probably just repeating in a slightly different way today. We are also almost co-authors with Isaac, because the data governance paper was one of these things that spun out of the BigScience group project. And Yacine in particular, at Hugging Face, was really focused on it and did really amazing work there. He would be much more relevant than me as a panelist, but maybe it's also nice that I bring a slightly different perspective, from somebody a bit outside. I think, just like the Wikimedia projects, maybe one specificity of Hugging Face in our work, if I compare to many, many other people developing AI, is that we've tried to have a big sense of community.
And maybe, in contrast to some others, we are more interested in the journey that we are taking towards these more powerful AI tools rather than just the end product. So what I personally would like to see: well, it's quite exciting, and it's hard to deny that these new tools can do things that we would never really have thought possible a couple of years ago. But I think it would be really great if, on the way to developing these tools, we keep having as many stakeholders as possible involved in the process, and we keep building them as a community, not just as islands of people isolated from each other, which is a bit what I'm worried is starting to happen. So, yeah, at least at Hugging Face, we try to navigate the fine line between staying rather positive on the outcome of this technology, rather than putting a blanket ban on it, and trying to see, okay, how can we nudge it in the direction that is the most responsible, how can we find the right incentives in the community to make people do things that we feel are more responsible. And today, a lot of this really gravitates around the data: the training data, or the data we send at inference. How do we use that? How do we find a way so that we don't have this disconnect between knowledge creators and knowledge users? I was actually here talking about that today as well with many other people. A technology that would be more like a link between the people who create knowledge and the people who consume or read it, rather than this kind of wall that we have right now, where knowledge creators disappear behind it and someone on the other side just uses the output, would be really, really amazing.
Yeah, so we stand a bit in this position of trying to both create collaborative projects and at the same time find a way to build things, and not just say you should not build any models. So that's basically where we try to be. Thank you. So, before I start asking some questions, Bob has asked me to do one thing and I had forgotten, and that is to make you aware that there is a Google document, which he has posted in the chat, where you can add your questions. So we're going to start with some questions that we came up with and then we're going to move over to the questions you have. I'm sure with such a topic there will be some. And while you do that, I'd like to ask both of you to unpack this relationship between AI, machine learning in particular, as a set of technologies and as a community and ecosystem, and Wikipedia. So how do you see that relationship? Where are potential tension points? Where are overlaps or commonalities or things to learn from each other? Who would want to go first? Thomas, would you like to go first? Yeah, I can go first, because just like Wikipedia has a long history of having to deal with bots and AI, AI also has a very long history of relying on Wikipedia, in particular in NLP. Wikipedia has always been seen as the source of truth, and if we need a source of knowledge, we always turn to Wikipedia as the dataset. In every training run of a large language model, if there is one dataset that people will reiterate on and do like 10 epochs on, instead of just half an epoch, it's always Wikipedia. It's always upsampled in every training dataset. Maybe a nice token of recognition. But there is also a lot that I think we lose. Wikipedia's articles are actually only one source; even the edits have been used a lot for training models to correct mistakes. Almost every part of Wikipedia has been used in one way or another in an AI model.
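Thomas's point about upsampling can be made concrete. Below is a minimal sketch of epoch-weighted data mixing; the corpus names, sizes, and epoch counts are purely illustrative, not from any real training recipe:

```python
import random

def build_training_mix(corpora, epochs, seed=0):
    """Flatten several corpora into one training list, repeating each
    corpus according to its (possibly fractional) epoch count."""
    rng = random.Random(seed)
    mix = []
    for name, docs in corpora.items():
        n = epochs[name]
        whole, frac = int(n), n - int(n)
        mix.extend(docs * whole)  # full passes over the corpus
        mix.extend(rng.sample(docs, round(frac * len(docs))))  # partial pass
    rng.shuffle(mix)
    return mix

corpora = {"wikipedia": ["w1", "w2"], "web_crawl": ["c1", "c2", "c3", "c4"]}
# Wikipedia is seen 3 times per run; the web crawl only half of once.
mix = build_training_mix(corpora, {"wikipedia": 3.0, "web_crawl": 0.5})
```

Libraries such as Hugging Face's `datasets` offer comparable weighted mixing (for example via `interleave_datasets` with sampling probabilities); the sketch above just shows the arithmetic behind "10 epochs of Wikipedia versus half an epoch of everything else".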
Something we lose, at least up to now, is all the interesting context that you see in the discussion pages, the talk pages. In particular, when some articles are a bit more hot or debated, it's very interesting to go there and look at that, and that's definitely something you lose. This is one aspect: all the discussion around knowledge creation. A second aspect that we lose is crediting. In Wikipedia you can look at who wrote which part, and that's also something we lose. So I would say that up to now, AI has been kind of a small parasite on top of the big elephant Wikipedia. What is interesting is that this is starting to reverse: now we have this huge elephant of AI on top of this small Wikipedia part. How can we manage to put the two on maybe an equal footing? That's the part I'm most interested in. And obviously the second part, which I think we'll get to later, is in the creation of Wikipedia itself: should you use AI, and how? Thank you. Isaac, how about you, having been part of this ecosystem for so long? Where do you see those relationships? Thomas talked about being a source of data for AI, and the balance of power, and perhaps popularity and public awareness. How do you see this relationship? Yeah, building on what Thomas was saying, it's definitely a fluid relationship, and one where I'm still trying to figure out what the current state is. I do think that, now that AI is maybe a little more salient or prominent, Wikipedia is still quite important in this world as a source of reliable knowledge. So I don't think that Wikipedia is less important. But maybe an example, to follow through on my commitment to bring lots of historical examples, of how this relationship can be good or bad. And I think a good example of that is machine translation and its use on the projects a number of years ago.
When that was formalized into a tool, Content Translation, on the projects, the state of AI in that space was still relatively rough. You had a lot of rule-based systems, and there were some open source projects in that domain, but the performance wasn't great. And then, in the late 2010s, Google really brought in neural machine translation, and you saw this big advance in capabilities, which was good for the Wikimedia projects. They were able to set up partnerships to take advantage of those APIs, but it was kind of a static relationship in some ways. The team was able to set it up so that the data that came out of these translations, the work that the editors were doing, was publicly available, which is good: it allowed other machine translation systems to learn from the work that Wikimedians were doing. But these tended to be closed, proprietary models. And I think it's only in the last year or two that we now see some really performant open models. The one that has been used at the Wikimedia Foundation is the No Language Left Behind model from Meta, which is an open source model. And that's really exciting. That opens up a lot of possibilities on the projects. They're no longer constrained to just helping with article editing; you can imagine many more uses for machine translation. And because it's an open model, there are a lot more opportunities now for building on that work and having the work that Wikimedians are doing flow back.
So I think it's gone from a place where the performance was maybe good but the ecosystem and the relationships weren't great, to now, with this more open model, a place where we have a better structural ecosystem around this technology, where Wikimedians can contribute and other open source advocates can contribute. And so I think that's maybe a note of optimism. Great, thank you so much. And I'm just looking at the time: I have seven questions, and I think 15 minutes left. So I'm going to keep you on your toes and just change the order of the questions a little bit. So, Thomas, you said that Wikipedia is very valuable for machine learning. So I'm going to ask you: what could AI and the AI community give back? What could they do for Wikipedia, for its ecosystem, and for the community of editors and community managers and researchers who have helped build it and improve it over the last 20 or so years? Yeah, that's a great question. Honestly, I'm also here to hear what people think, because I think we're really in open territory today. I think nobody really has a very clear idea, but there are many interesting possibilities. It can start from simple tools, maybe what other people are already doing, which is to start using these models as an aid for moderation or for editing and the community side. But you can also imagine more interesting things, right? So let me just draw out one that we're thinking about just now: when you're on one article, reading it and trying to understand it, you could also have some kind of AI there that you can talk to, which is reading the same article, a kind of dialogue.
And if you have enough reasoning, enough knowledge in this AI, it could be a kind of page-specific tutor that would use the knowledge there and help you, maybe reformulate one paragraph that is difficult to understand, and you can say, can you explain this and this. So this type of thing, where instead of having static pages, a Wikipedia page could be the beginning of a dialogue, something where you can ask more questions. I think a lot of Wikipedia is great around dialogue for knowledge creation; I was mentioning the talk pages, but sometimes you also need to talk about a page to understand it. So that would be the type of thing I would see in the future: if we start to have these AI things a bit everywhere, you could basically discuss a page in detail. That's very interesting. And I'm sure people here on the Zoom call, but also the rest of the community, have many views around how having such a tool would change things, not just in terms of readership and access, but also in terms perhaps of engagement with the content, and editing, and quality checks, and all the practices that are established. So do comment on this or ask follow-up questions if you want in the document; Leila has just posted the link again in the chat. So, possibly ways to access and engage with Wikipedia content through AI. Isaac, Thomas said earlier that Wikipedia is the source of truth; you mentioned the word reliable as well. So I'm going to ask you something perhaps a bit controversial. There are lots of concerns in some parts of the community around the potential of generative AI models to create or cause a new wave of misinformation at a scale that we haven't seen before. So does that mean that Wikipedia will soon be obsolete or outdated, and not just Wikipedia, but also other open source, public good initiatives that we hold dear? What are your views on this?
Definitely not obsolete. Maybe let me pick apart pieces of that. So I think there's one piece: when people think about Wikipedia, we think a lot about the content generation side, the articles, and that's the piece that most readers see, but there's really a lot of work that goes on behind the scenes. That is the work that keeps Wikipedia reliable and also keeps it expanding into new knowledge areas: things like doing the fact-checking work and the maintenance of that content, but also the discussions, as Thomas has been saying, on the talk pages, discussions around whether a source is reliable and how we want to talk about these sorts of issues. AI is kind of a reflection of where we're already at; it can't necessarily have these sorts of discussions. And then, for instance, I was in a developer track earlier about OCR for Wikisource. I think that's a great example of sources of knowledge that aren't accessible to AI right now. The community is doing a lot of really hard work to digitize those sorts of manuscripts and make them more accessible to folks, and that's work that you can't really replace right now. And I think the other piece is languages. We often think about Wikipedia as kind of an English Wikipedia thing. I was just reading about the whole unveiling Google did today with Bard, and I think they're really saying right now it's like two to three languages, aiming for 40. And, you know, at Wikipedia we're working with over 300, right? So yes, maybe there are places where AI can be pretty impactful on some of the projects, but there are a lot of projects where it's far, far behind in its ability to even produce a facsimile of what's going on.
And just as a follow-up question, also to you, Isaac: should Wikipedia and Wikipedia editors use more or less AI, and what for? I think I'm pretty agnostic on that. You mentioned that AI scales things up. And so the question is, what are the goals of Wikimedians? Things like increasing the diversity of the community, or just removing barriers to being able to contribute. And where AI can help with those things, I think that's a good thing. And I think the community reflects that in their own discussions: these tools aren't inherently good or bad; the question is whether they are helping us serve the goals that we've had for a long time or not. And that's really the question we should be asking. And I am going to mix things up a little bit, because I have seen a related question in the document. So, the question is from an anonymous member of the public: what implications do the new AI advances have for content creation on Wikipedia, specifically the scenario where articles can be produced with just a prompt? How can Wikipedia address this? Who wants to take this? So, shall we prompt models to write Wikipedia articles? Thomas, what do you think? That's a good question. Yeah. Well, I guess this will probably be, you know, just like in the past. There have always been cases of people creating a lot of content and pushing a lot of content in a kind of automated way. And so I think, honestly, using a prompt to do a first draft of an article that you then want to spend time editing, that's totally fine, right? But what we probably want to avoid is this type of automated, unread content that's just uploaded there.
And probably there is not so much difference from what I guess has already been the case in the past, right? And I'm going to ask both of you: how advanced is our knowledge of prompting for the specific task of writing Wikipedia articles, which have a certain structure, and need to have references, and so on? I know in some fields there are entire websites selling prompts or sharing prompts. Are we this far with Wikipedia? Does either of you know? Just on this, we can maybe dive a little bit deeper, right? So what happens, basically, if you prompt a recent language model and try to make it write something? Easily, you get basically Wikipedia back out from the training data, which in the end is fine, but that's kind of circular; it's probably just something that's already there. Or you get some knowledge from other sources. And then there is this interesting thing that you will want to have citations, you want to have links to these other sources. At face value, that's still a nice way to bring more knowledge into Wikipedia, I would say, but you need to link it back; you have this problem that you cannot just rewrite this physics book, you will need to link to it at some point. Yeah. Oh, sorry, I didn't want to interrupt; it sounded like you were wrapping up, so I was just about to ask Isaac for his view. Yeah, I think Thomas is spot on. And just the addition that the model can only produce from the sources it has access to, right? Like sources on the web, things like that. So, going back to that point: right now, that's going to largely mean English-language sources. Obviously, it means purely digital, already digitized sources. Whereas for a lot of these articles, folks are really going into books, going into history archives, finding this content and bringing it up. So thank you.
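Picking up this thread on prompting for article drafts: the structure and referencing requirements discussed here can be encoded in the prompt itself. A hypothetical sketch follows; the template wording is mine, not an established community practice, and it assumes the "first draft for a human editor" workflow Thomas endorses:

```python
def draft_prompt(topic, sources):
    """Build a prompt for a first draft that a human editor will review;
    the constraints mirror verifiability and neutral-point-of-view norms."""
    source_list = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        f"Write a first-draft encyclopedia article about {topic}.\n"
        "Requirements:\n"
        "- Neutral point of view; no promotional language.\n"
        "- A short lead section, then two to four titled sections.\n"
        "- Cite every factual claim inline as [n], using ONLY these sources:\n"
        f"{source_list}\n"
        "- Omit any claim these sources do not support.\n"
    )

prompt = draft_prompt("the Semantic Web",
                      ["Berners-Lee, Hendler, Lassila (2001), Scientific American"])
```

Constraining the model to an explicit source list addresses the circularity Thomas describes only partially: the model can still fabricate claims and attach them to a listed source, which is why the output is framed as a draft for human review.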
And I'm just going to take the opportunity, because we're talking about this, to ask a question that Pablo asked. Or maybe, Pablo, you want to ask it yourself? Okay, I can try. Okay, go ahead. I think it's very related to what we've just discussed. Yeah, I was just asking: if large language models are using Wikipedia for training data, and at the same time these large language models will at some point start to be used to create content on Wikipedia, is it possible that we will be at risk, in the space of knowledge integrity, of suffering from circular reporting from AI? So that these models are trained on data that they also created, which affects factual information in our knowledge spaces. So what are the implications for knowledge integrity, and potentially something else? I think I'm going to start with you, Isaac. Yeah, no, it's a good question. And again, this is AI scaling things up; these aren't new concerns for the community. They have an absolutely fascinating set of discussions around which sources are reliable. And I've even seen, in recent months, discussions about, hey, FYI, was it... no, I shouldn't say it, because I'm going to get the publication wrong. But there's a publication that started using AI-generated articles, and immediately the community caught on to that and said, hey, let's be aware, this is maybe no longer a reliable source in this domain. So I put a lot of trust in the community to pay attention to these things and be able to update their guidelines for what they consider reliable and worth incorporating. Thomas, anything you'd want to add? Yeah, it's more of a meta comment. But one thing I find very interesting in Wikimedia and Wikipedia is basically how the community that was created there is healthy and thriving over a long time.
And I think that's a great example of the kind of community we would probably like to build as well in AI, in the research and practitioner world. It's a community that's able to discuss complex topics around moral values and ethics, these very complex topics that I would like to see discussed a lot more in AI, because with this AI we put a lot of these questions in there. And so if we can find a way to copy that, or at least take inspiration from the success here, that would be great. But here again, it's more the AI field taking inspiration from Wikipedia; I'm trying to think more about the other direction. Thank you. Moving back to the questions that I was interested to ask. The next one is for both of you, and the theme is: is this technology ready right now, or could it do more harm than good? What are some of the current risks, and how do we manage them? In particular, there is an ongoing debate in the Wikipedia community and elsewhere around implications. And for Wikipedia specifically, we're talking about issues with copyright, the ability to verify automatically generated content, hallucinations, as in many areas, and specifically also here the neutral point of view and editor engagement. So: copyright, verifiability, hallucinations, neutral point of view, editor engagement. Pick one or pick all. Isaac, should we start with you? Sure. Yeah, so when I think about this question of more harm than good, I think one of the big ones that's come up in a lot of discussions, and that I think about too, is the attribution piece. And this gets back to what you were saying around copyright. And certainly the first wave of getting these models out there was really bad at it.
And we're seeing advances now, as I think some of the developers are taking this more seriously, in making sure that when you build these tools around large language models, there's an ability to reference your source material. Because otherwise there's disintermediation; getting back to the question about the relationship between AI and the community and Wikipedia, right, it disintermediates that. And long term, it's not a sustainable place to be if you can't provide the attribution back, so that folks are aware of how this knowledge is getting created and can go and edit it, and those sorts of things. So that's one that I think about a lot. And the other thing, because it came up in the initial introductions, is the BigScience piece that Hugging Face ran. BigScience was this big consortium of folks who came together to try to build a large language model and do it carefully, with a lot of ethics in mind. And one of the things, and I was part of the data governance side of this, one of the things I really appreciated there was the care with which that group both tried to expand to more languages, but also didn't do this with whatever data they could get their hands on. They thought a lot about privacy and how to ensure we're maintaining privacy as we expand these datasets, about the governance of that data, making sure folks have the ability to contest what was in that data, and things like that. So when I think about harm or good, I think a lot of that depends on whether we see more things like BigScience's careful expansion into other languages, to ensure that where there are benefits, they reach more people. Thank you.
Thomas, how about you? So it was copyright, verifiability, hallucinations, neutral point of view and editor engagement as potential areas of concern. Yeah, and I think all these points make sense; those are all aspects that you want to be careful about. I think it's great that there is already a large language model guidance page around this. Yeah, perhaps around hallucinations: there was a question in the document, thank you for that, around the technical background. So, many people share the view that hallucinations are a big concern. How technically feasible do you think it is that we will improve large language models to hallucinate less? And what do we need to do to get there? Yeah, that's very tricky. A lot of people are trying to do that. It's difficult in part because the model is not only trained on text on the internet that is always right; it's also trained on all these parts where people are just creating stuff, you know, stories that we write, blog posts where we invent things. And so basically the model has a lot of difficulty distinguishing what is just a story that's been written in a book and what is actually a real fact about the world. Just like we do, I think: all along our entire life, we build this kind of intuition of the credibility of various sources, and even late in our life we're still sometimes surprised that something is actually not credible at all when we thought it was a good source. So the language model is a bit like this, and this is very difficult. One thing we rely on a lot in AI is actually humans labeling what are plausible sources or not. So here we have this dangerous circular reference: if we label Wikipedia as a trusted source, you should not have AI-generated content in it. But yeah, I think this question of citations looking right, that's a big problem.
I think that's something people did not really anticipate with large language models. The fact that they produce text that looks very convincing is really challenging our intuitions about what is good or not. We have this very deep intuition that if something is written in very good English, that's a good sign of trustworthiness. Or if you have a nice citation with a nice number, nicely formatted, you don't even need to click. All these reflexes that we've developed with the internet, I think we may have to change them. This is a big problem.

We've had that in science for some time, I mean, but not to this extent. I do agree, yeah, it's a big threat, in particular with these citations. And that's a question I have, maybe, for Wikipedia: when you have a citation to something that's not really accessible, or to a book that's hard to get hold of, I would be interested in diving into the ways that people verify this type of content.

Does anyone want to comment on that? About the accessibility of citations, completely unrelated to LLMs and automatically found or sourced references. Thomas said he's here to learn, so we want to make sure that happens as well. Bobby had an interesting question around hallucinations versus another evil. Would you like to ask it live?

Yeah, sure. I was kind of trying to play the devil's advocate a bit. We have a lot of trouble with human vandals on Wikipedia, people who don't behave well. How much worse is a language model that potentially behaves well, that was even trained to behave well, and then sometimes doesn't? We can screw open the language model and probe it and look into its mind, if you will; we can't do that with a human. So I was curious about your reflections on that: bad humans versus ill-behaving language models. I'll say a few things.
So there's a common story that many folks start out editing Wikipedia as vandals. They vandalize it, they're kind of impressed that, oh hey, the community fixed this very quickly, and they actually begin to learn and then contribute in positive ways. So in that sense, I see maybe more hope in the vandal than in the language model, where maybe you can fix some of these things, but I don't know that it's ever going to become a member of the community in the same way. So there's maybe that aspect of things. And then, as Thomas said earlier, a lot of vandalism is very clearly vandalism; that doesn't necessarily make it easy to deal with, but with the AI models there's this patina, right, over the top of it, that can make them really hard to detect. So in that way, I think it can create a lot more work for community members, who constantly have to be fact-checking every little piece, because it's just a lot harder to use basic heuristics to detect it.

Thank you. Thomas, did you want to add anything?

Yeah, maybe just one positive example as well: these models, knowing how to write good-looking and easy-to-read English, could still help on this aspect, maybe for non-native speakers trying to write a good article. And on this kind of editing tool, a bit like Grammarly honestly, the development of tools like that could help make the content easier to digest. So I could see some very practical use cases where this would be nice. And if we can figure out, in the very long term, good LLMs that have a kind of positive and thoughtful mindset, you know, that participate in discussions, that could also be very interesting. But that's further in the future.

It would be interesting to understand the impact on editor engagement. Leila, you had your hand up.
Thank you for your patience. Oh, sure. I just wanted to go back to the question Thomas had just before this conversation, about what we can learn from Wikipedia or what Wikimedia can do, which was inspired by Phoebe's question around hallucinated citations and references. I just want to call out that I think this is a place where Wikisource, as a community and as a project, can to a good extent come to the rescue. Of course, there are many things that Wikisource is limited by; for example, there's only so much the project can do with regard to copyrighted material. However, Wikisource is a digital library, an open digital library, and it is a project in which editors and contributors try to make more digital content available. And that can help with some of the discussions around what we can do about hallucinated citations and references, things that are currently not accessible on the web to be automatically checked, but would be if we support the Wikisource community more in their efforts.

Thank you for that, Leila. I'm going through the questions while trying to multitask. There is a question which I think addresses something we haven't really touched upon: how might existing biases, including gender biases, race biases, and so on, play out in large language models, given the connection between Wikipedia as a data source and the technology? And as a follow-on, do you think human-in-the-loop is a potential approach to avoid existing biases in the technology? Thomas, you seem keen.

Yeah, it's very interesting; that just raises a lot of ideas, a lot of questions. You could imagine that this large language model, instead of being this kind of huge black box, could be a bit like Wikipedia.
And when you have a debate on something it says, there would be a discussion page on that topic, and you could say, hey, actually, ChatGPT is saying this, but I don't agree, I don't think that's really right. And then you'd have this discussion page around it, and the content of the large language model could be updated. A frustrating aspect of this, and it's a huge difference between these models and Wikipedia, is that they are static, while Wikipedia is this ever-evolving thing that follows along: when there's a new president, everything changes; when something new happens in the world, you already have these pages. The models are just trained once; a lot of them were trained three years ago, and they're still stuck with that version of the knowledge. So I think a future with a kind of WikiGPT that you could edit would be interesting. Just brainstorming in the open right now from this question.

Thank you. And obviously, some of these systems, like ChatGPT, do learn on the fly from human feedback as well, but the data they've been trained on is a bit older. Isaac, any views from your end on biases and the ways to mitigate them, whether through an algorithmic approach or a human-in-the-loop one?

Yeah, just to reframe a little bit what Thomas was saying: I think there's this tendency to come in with the really flashy, seamless kind of AI model that can do everything. And what Thomas was saying is, you don't need that to be super seamless; you can kind of open it up a little bit, right? You can show the different pieces of it, make it a little bit messier, a little bit more open to folks. So not try to generate the perfect response, but showcase the different potential responses and things like that.
And I think that goes a long way toward helping folks understand what's going on and pick out what is actually valuable in these models. So there's that piece that I think a lot about. I did have another piece, but it's leaving my mind now, so I'll leave it at that.

All right, let me just ask my last question, which was for you, Isaac, and then, if I may, ask Tillman to pose their question for Thomas, because it is a long question and I'm very jet-lagged. So first, Isaac: given all these reasons, should Wikimedia build their own AI data sets and models?

Well, we already do; I think that's the quick response. But I think what you're asking is not about things like vandalism detection, which we've been doing for many years; I think you're asking about large language models. And there, first I'll say it's not a decided question, right? These are the sorts of discussions we're having now in the community. I think one of the approaches that does feel very promising in this space is work like the BigScience project: collaborating with much larger groups of folks who are dedicated to doing this in the open and doing it with ethics in mind. I think there's a lot of potential there to be building large models that fit better with our principles and take to heart some of these concerns about the biases and harms these models can output. And so yeah, BigScience. I know they just went through BigCode, which was a code-generation model in a similar vein. And we're excited about that because, for instance, I know SPARQL query generation is a big challenge for many Wikidata folks, and there are hopes of maybe being able to take some of the BigCode work and apply it to that problem. SPARQL is beautiful.
Beautifully messy, yes. Before I get too excited about it, Lydia is agreeing with me; that's nice. Tillman, would you like to ask your question for Thomas, about the framing of AI creating a disconnect? Is that okay?

Yeah, as I alluded to earlier. So my question is about this: I liked your framing about the disconnect that AI creates, right, between the creators and consumers of knowledge. And that's obviously what we've been worrying about a lot as well. But the current framing is more about copyright instead, right? Creators want to get paid, or they want to opt out of training data, even though it's currently fair use, at least in the U.S. And so there's a worry that there will be this AI backlash, with new laws that actually expand copyright a lot, in the direction of copyright maximalism. And that's directly counter to the Wikimedia vision, right, where we want to make knowledge accessible, and it's going the other direction, again: pay us for that. I'm just curious how you see this developing, since you at Hugging Face see a lot of this discussion. And especially whether you see the consumers of knowledge also getting a voice, not just the creators and the corporations.

Yeah, it's a good question. And it's funny, because before creating Hugging Face, I was an intellectual property attorney, so I was actually very much in favor of copyright, or at least of protecting creators. And at Hugging Face I'm doing something very different, pushing open source, which is very much the negation of that. I think in general, an expansion of copyright would probably be damaging, and hopefully we don't go there. Maybe the reason I'm talking about that is that it goes beyond just copyright, I think.
What's interesting is that, maybe it's less the case for Wikipedia, but in many communities I see that credit is still something quite important. You don't really need or want monetary rewards for what you share, but when you spend a lot of time creating something, you would like to have your name associated with it somewhere, or maybe even your pseudonym. And this idea of still being linked to your creation is quite an important incentive, I think, for people to share something they've spent time working on. So I'm kind of worried that if we lose these incentives, we'll have a lot fewer people sharing knowledge openly on the internet, basically. That's what I'm worried about at the moment. But then you don't want to go fully in the other direction either, where you can't reuse anything that people have said; you don't want a crazy extension of copyright. So it's quite complex. And the problem with AI models is that they often don't copy verbatim; they change some things. So it's never a direct copy, and you're in this very blurry area where it's not a verbatim copy, but it's still definitely inspired by the training data. But yeah, I don't really have an answer.

It's interesting that you say that, because that's a good point. And that would also be unfortunate. With the development of AI systems, in just a matter of a few months, this has transitioned from a very niche, or still quite niche, research area to something that everybody's using. There are a number of dangerous things that could arise, and one of them could be all these communities closing their doors, basically, and saying: we don't want to share stuff anymore. That would be quite unfortunate, you know, if after ChatGPT, places like Stack Overflow, where people were really discussing, just closed their doors. It would be very, very bad.
Another bad outcome could indeed be that copyright becomes much bigger as a reaction. So yeah, we'll see what happens.

I suppose there are a few ongoing cases that will give some insight into the direction of travel on that. We are reaching the end of the hour, so that means, sadly, that I will have to ask our panelists for their closing statements. I'm going to ask Isaac first, and then, Thomas, we'll come back to you. Final statement, short.

Yeah, I don't think I have anything particularly momentous to say at the end, just that we're in a time of a lot of change. And so these sorts of discussions, I think, are incredibly important, and I'm hoping that we can continue to have them, so we can keep hearing folks' concerns and questions, and hearing the different points of view about what directions we should be going in. So I'm very thankful for this panel and hope to have more in this style, or in other styles.

It would be interesting to actually revisit these questions, and the answers that we have provided, in three, six, or twelve months and see how our knowledge has evolved. Thomas, how about your final statement?

Yeah, I wanted to thank you. I will read the whole document; it was super interesting. All of these questions can raise new ideas, new potential projects, new things we could do to move in a better direction. So I'm very thankful to everyone who participated in this debate. And yeah, as I said, I think it's important that people don't hesitate to voice their ideas, their opinions, what they think could be done. And if you have an interesting project or collaboration, you can reach out to me at Hugging Face; we like to push these things.

Wonderful. So, we had lots of interesting questions and discussions in the document, and in the chat here as well. Thank you to everyone who has participated.
Unfortunately, we couldn't answer all the questions, but maybe I could ask our two panelists to have a look at the document and answer the remaining ones, and perhaps also go through the comments in the chat that we didn't have time to address in greater depth. It's been a pleasure. I'd love to watch this recording, as I said, in six or twelve months and see how our thinking has evolved, how naive or perhaps overly pessimistic we were. But I think we all agree that these two communities have changed the world in different ways, and any ways in which they can come closer, collaborate, and learn from each other will benefit not just them directly, but also society at large. Thank you so much for the discussions, for your comments, and for giving me the opportunity to moderate this panel. Thank you. And with that, I'm going to hand over to Bob.

Thank you. Thanks also from my side. Really, really interesting discussion. Thanks a lot.