Now it looks green. That's great. Hello, everyone. Great to see you. This is going to be a participatory conversation, a discussion about what kinds of reference works we can generate today. So we should definitely all be at the round tables up front. If you're here for the automatic construction session, come up to the front tables. I'm going to give a little background about the state of tools that are helpful for constructing reference works at scale. Then I'm inviting everyone here to brainstorm what kinds of new reference works we could make, and what's easy now that was hard a few years ago. Some of this will lead into the Wikitopia discussions of what kinds of wiki societies are now sustainable, or can potentially overcome the challenges of keeping out bad actors and spam, capturing all the relevant knowledge, and actually making things accessible to everyone in their own language and context.

Some of you may remember the old Wikipedia Weekly news forecasts about what the future Wikipedia would be. These were dreams in 2006 about what the world would be like roughly today. One of the summaries, still a little bit prescient, was that we would no longer have a lot of lag, conflicts would be resolved automatically, and the software developers would already be planning the next phase of MediaWiki with built-in AI capabilities. Some of this is true. But to a great degree, all the conversations we've been having at this Wikimania about AI have been tentative, with the idea that maybe it's going to be a problem for current workflows, or really a problem for current editors, and what do we do about that? So I'm asking a different question.
Let's think about the kinds of new reference works that we could all make, and then see the extent to which they fit into current frameworks. Let's figure out whether this can advance some of the current projects, or whether it makes sense to think of these as new sibling projects, or to find other people who are now doing the same work but doing it under the rubric of automation and AI models, without even thinking about the privacy, free licensing, and collaborativeness that brought us together in the first place. A few years back, we had much earlier-generation conversations about what AI and the wiki world might look like. Andrew and Sarah both helped organize a hackathon with the Met and Microsoft and MIT, and people were just imagining what the future of museums could be in the world of AI. The Met was pretty open to the idea that they might be generating parts of their museum catalogs. They might be trying to fill in gaps in the historical record in ways that felt genuine, or that they could at least use to illustrate a concept: not just artists' renditions, but some actual synthesis from available data. But things have changed since then, and a lot of things that felt curious but not necessarily satisfying are much more plausible now. So Sarah, if you want to talk a little bit about this.

Yeah, just briefly. Some of the things I've been thinking about recently: I work on human-centered models in AI. Most of the time, we take data as a fixed thing, and we try to design better platforms around it: platforms that ingest it and make sense of it. Maybe some of that sense-making is done by humans, or in collaboration with the platform. But data is something about the world, and it's fixed. One of the things that I'm most excited about with our new generation of language models and image generation tools is that maybe the data isn't fixed.
So there are some structural properties of these large generative models that lend themselves to making something that I like to call, borrowing from a colleague, Phillip Isola, "data++": thinking about data as something that we also have a model for, where every archive might come hand in hand with a model that could generate additional entries in that archive. So what are the properties of such large generative models that give us this data++? Well, one thing is that they can take a small data set of real data about the world and turn it into a much larger data set. And as these models improve in quality, the fidelity of that larger data set gets better and better. So we can increasingly interpolate between existing pieces of data in some small data set we have and flesh out what lies in those gaps. There are other interesting structural properties that allow us to do more informed extrapolation, but even that interpolation by itself is already pretty interesting and useful. Also importantly, these outputs are themselves steerable and controllable. As we learn more about how to design better architectures and understand how they represent information, we can not only interpolate data that fills in the gaps in our existing data set, but do so in an informed way. As I hinted, we started to do something like this in the Met project. It was a bit ad hoc, steering somewhat randomly between existing artworks in a museum's digital collection. But I'm sure most of you have experimented with text control, right? Recently, text-guided diffusion models allow us to steer through this latent space of information using language, which is excellent, because then we can beef up our data sets in much more intelligent, controlled ways. So I would think of the provocation for this session as asking what happens when we can make better data.
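The interpolation idea described here can be sketched very minimally. This is an illustrative toy, not how any particular diffusion model works: it just linearly blends two latent vectors with NumPy (real systems use learned latent spaces, and often spherical rather than linear interpolation).

```python
import numpy as np

def lerp(a, b, t):
    """Linear blend of two latent vectors at mixing weight t in [0, 1]."""
    return (1.0 - t) * a + t * b

def interpolate_latents(a, b, steps):
    """Return `steps` latents evenly spaced between a and b, endpoints included."""
    return [lerp(a, b, t) for t in np.linspace(0.0, 1.0, steps)]

# Two hypothetical latent codes for existing archive entries.
z1 = np.array([0.0, 1.0, 2.0])
z2 = np.array([2.0, 1.0, 0.0])

# Five points along the path; a decoder would turn each into a new entry.
path = interpolate_latents(z1, z2, 5)
```

Each intermediate point would then be decoded into a synthetic entry "between" the two originals, which is the gap-filling behavior being described.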
And the data that we're designing platforms around is not something that's fixed, but something that can be kind of automagically augmented and auto-translated, and these kinds of gaps can be filled in. Maybe once we come to terms with that, we can start thinking about platform redesign around that fact. The marriage of these two things is now in a different place even than it was a couple of years ago when we did this project with the Met, because the quality of the large models that we're collaborating with is just a lot more promising, and scaling is part of that. So I'll turn it back over.

So I find the changes in the last couple of years very inspiring. Echo likes to say you should always be ready to rewrite the encyclopedia. And we can definitely redefine the sorts of things that we can write. One example from a session yesterday was expanding the idea of Wikipedia to include enriched citations: citations for every claim, and automated citation evaluation or quality evaluation for articles, which could expand on the Wikipedia 1.0 project, the WikiProject-based assessment of the quality and importance of every article. So let's brainstorm for a minute. Everyone who's here, if you have a laptop, get your laptop out. We have a handful of examples of things that can be automated on the pad for new pedias. Let's take three to five minutes to write down things that you work on that you can imagine being automated if everything worked out. I think we can also easily imagine ways that current automation might fail, so make notes of the things that would give you pause in simply completing a catalog with the automatable versions. A nice example in the world of glossaries, since there are a lot of glossary-based projects: you can generate a topic catalog from a corpus of documents, and you can generate definitions for all of the topics from the same documents.
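The first half of that glossary pipeline, picking out candidate topics from a corpus, can be sketched in a few lines. The token filter and the "appears in at least two documents" threshold here are illustrative assumptions, not a real notability test, and the definitions themselves would come from a language model rather than this snippet.

```python
from collections import Counter
import re

def candidate_terms(docs, min_docs=2):
    """Terms that recur across several documents are glossary candidates."""
    doc_counts = Counter()
    for doc in docs:
        # Crude tokenizer: lowercase words of four or more letters,
        # counted at most once per document.
        tokens = set(re.findall(r"[a-z]{4,}", doc.lower()))
        doc_counts.update(tokens)
    return sorted(t for t, n in doc_counts.items() if n >= min_docs)

docs = [
    "Diffusion models interpolate between latent codes.",
    "Latent interpolation lets diffusion models fill gaps.",
]
terms = candidate_terms(docs)
```

Usage examples for each term come "for free" as the sentences it was found in, which is exactly the asymmetry noted below: usage is easy, etymology is not.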
And Wiktionary has been intentionally limited to things that reach some level of notability, with plausible etymology and plausible usage cases. This is an example where you get usage cases for free, but maybe not the etymology. I'll talk through an example that I'm working on, and then let's come back together at the end, look through the examples, and see which of them have an existing prototype that people are using, and how we could imagine actually building a sustainable community of practice. I think one of the interesting features of a really successful generative model is that it can speed up a lot of the maintenance tasks, which are usually what cause a briefly successful wiki to fade. There's some threshold of activity below which we know wikis die out. A lot of that is keeping things fresh and avoiding changes that are somehow against the initial spirit of the project. And some of these tasks are things models are very good at: translation, transfer across modalities, translation into different media formats or for audiences at different reading levels. Some of those things are actually just very easy now. Rather than the initial translation requiring the participation of the community, there's more of a moderation task: making sure that the whole still fits together, that style guides are applied uniformly, and that new contributors feel welcome and are able to contribute themselves. There's a lot that's been done on summaries and abstracts; really, the initial generation of abstracts for journal articles came from a Wikipedia-like desire to have a quick overview of everything in a catalog. Please, could someone get the other microphones so we can pass it around?

Is it on? Okay. In these last few months, I've been creating categories on Wikipedia for musicians, because we have the categories "albums by", "singles by", and so on, but in many cases we're missing the categories.
I have been helped by Nigo, so I have a list of categories that are missing, but I like to create them by hand, so I'm not going to ask for a bot, even if for sure it could be done by a bot. No, I mean, of course in many cases some work should be done automatically, but sometimes I like doing something where I don't have to think too much. So I wouldn't like a bot to do that work. That's just something silly to say; I know it's silly, but I don't think it's so rare. I think that in many cases, working on Wikipedia is like this: doing things by hand that could be done automatically. Oh, it's very fun, yeah.

But for all the things that are fun, it's also good to have tools to speed them up. So is there a piece of the categorization that you think can be learned as a repeatable style?

Yeah, I mean, it could probably be done automatically. You have to check a few things: maybe the category exists with a slightly different name, without the article, or with a disambiguation. But it could be done by a bot, for sure. But I'm not going to ask; please don't tell anyone from Italian Wikipedia. No, I'm just joking.

Okay. There are some aspects of catalog alignment there that remind me of trying to figure out how to fill out an artist's publication catalog from a range of different sources. In many of these things, the whole task is generally fun and takes a lot of human creativity, but there are parts of it, like coming up with the full list of possible titles, that are just a lot of searching and deduplicating. Some of those parts can be pretty cleanly handed off to a model.
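The deduplicating step, and the "slightly different name, missing article, or disambiguator" check, can be sketched with Python's standard difflib. The normalization and the 0.8 similarity threshold are illustrative assumptions, not a vetted heuristic for any real catalog.

```python
from difflib import SequenceMatcher

def is_near_duplicate(a, b, threshold=0.8):
    """Fuzzy match to catch variant titles, e.g. a missing 'The' or a disambiguator."""
    def norm(s):
        return s.lower().replace("(disambiguation)", "").strip()
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

def dedupe(titles):
    """Keep the first title from each cluster of near-duplicates."""
    kept = []
    for t in titles:
        if not any(is_near_duplicate(t, k) for k in kept):
            kept.append(t)
    return kept

catalog = ["Albums by The Beatles", "Albums by Beatles", "Singles by The Beatles"]
unique = dedupe(catalog)
```

A human would still review the merged clusters; the point is that assembling and collapsing the raw candidate list is the mechanical part that hands off cleanly.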
Yeah, I think once a model is tangibly a partner in this, getting the model output to start feeling like your own output is a bit like getting a good infobox template to work, or getting the style guide right so that your collaborator in a different time zone is now making things that feel like they've captured everything you've learned about what makes things easy to read.

SJ, I'll just highlight one thing I put up there, the purple one: Naseem et Shabun had an interesting presentation at this conference about sound maps, which is basically recording audio of different locations around London and having that be the main content, so that you're listening to neighborhoods, not looking at neighborhoods, which I thought was really interesting. But I think this does highlight something that I've been trying to get the Foundation's attention on for a while: supporting these types of experiments with making new pedias, whether through AI or through other means, by having a more modular playground where we could try these things out, rather than needing really deep development knowledge. For example, the Wikidata Query Service has some nice canned ways to do maps, to do timelines, to do these types of interesting things, but they haven't changed in probably seven-plus years. It's still the same suite of basic ways to take the data and make it displayable. It would be nice to have a really robust framework there instead, to say: hey, here are all these cool new JavaScript libraries out there, like Kepler.gl for 3D mapping and things like that. It would be nice to have a more modular system so that we could try these things out very quickly, rather than needing the heavy lift of coding something from top to bottom. And a lot of the Scholia components could be described that way: as a model that would be usable in a lot of different environments, but that only shows up in one place right now.
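The modular-playground idea amounts to a registry of pluggable result views: query results go in, and whichever renderer is registered under a view name produces the display, so a new visualization library can be dropped in without touching the core. This is a minimal sketch of that pattern; all names and output strings here are hypothetical, not part of any existing Wikimedia API.

```python
RENDERERS = {}

def renderer(name):
    """Decorator registering a display function under a view name."""
    def register(fn):
        RENDERERS[name] = fn
        return fn
    return register

@renderer("map")
def render_map(rows):
    # A real renderer would hand rows to a mapping library; this just stubs it.
    return f"<map points={len(rows)}>"

@renderer("timeline")
def render_timeline(rows):
    return f"<timeline events={len(rows)}>"

def display(view, rows):
    """Dispatch query results to whichever renderer is registered for `view`."""
    return RENDERERS[view](rows)

html = display("map", [{"lat": 51.5, "lon": -0.1}])
```

Adding a new canned view is then one decorated function, which is the "try things out quickly" property being asked for.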
All right, one more minute for writing down notes of things that you've been working on, and then we can wrap this up with some more utopian thoughts. Also, if any of you have explicitly worked with one of the current models to try to help tune or train them, I would love to hear about that. I don't know how the latest Lift Wing versions of ORES are being updated. Yeah, neutrality checks for new articles, whoever added that: that would be great. Can someone say a little more about suggested AfD resolutions?

Yeah, it could maybe be a terrible idea, but it could also sort of work: you could evaluate how everyone voted on an articles-for-deletion page and say where the consensus of the discussion is. Like I said, it might be a terrible idea, but you could also maybe see who's citing actual policy, whether people are agreeing with each other, and what the possible range of things is that someone might select. You probably don't want to sentence an article to life or death based on this, but it might give you some idea of the possible range of options.

Yeah, reducing the amount of time people have to spend on social mediation or knowledge mediation would be nice, because that is fun, but also a source of burnout. Okay, take one more minute and look over the high-level ideas here, all the bolded ideas, and put up to three pluses next to ideas that you would find particularly useful. Or, if there are some that you think would not work out very well, leave a brief comment about what you think the problem would be. Great, thanks everyone. So the last question that I posed at the beginning was: how does this change the kinds of projects that we could accomplish as wiki communities? What does this mean for the sorts of wiki societies and knowledge societies that we can support? Feros, if you want to come and say a little bit more about that and introduce the ideas, at least the utopian flavor, if not the dystopian flavor. Yeah, we can get to that too.
That one's a secret. Yeah, they're all secrets. Sorry. Okay, anyway, I'll just start talking. Yeah, go ahead. Hey, welcome everyone. I wanted to share today the...