Good. We will start the next set of talks. This is actually a set of lightning talks, and hopefully we will have some time for discussion as well. Very related: Lydia was just talking about Blazegraph, and we will be talking about using Blazegraph, among other things.

The program today: I will give a short introduction to the concept of WikiCite. I know many of you in this room are familiar with it, but I will talk a little bit about it anyway. Then we have four short talks, including a couple of videos; three of the lightning talk speakers are here, and one is not able to be with us in person in Singapore. I hope we'll have time for discussion. There is a slide deck, which you can access; the link is here, and I'll show this link again. There are Etherpads for this session. They're hard to find; the link is here, and you can also get it from the Wikimania wiki.

My name, by the way, is Phoebe Ayers. I am from Boston in the USA. I'm a librarian in my day job, and I have been involved in the WikiCite initiative for a few years now.

So WikiCite refers to this community that's interested in citations; "bibliographic metadata" is the term of art in libraries. Specifically, citations and bibliographic metadata in the Wikimedia projects: in Wikidata, in Wikipedia, and in the other projects. The term WikiCite refers to a collection of projects, a few of which we'll hear about in this session, as well as the community. So it is an umbrella term for the umbrella community that is interested in citations. It's a loosely organized group of people, in fact sort of self-organized at this point: if you are interested in citations and bibliographic data, you are a part of WikiCite. Or you could be a part of WikiCite.

WikiCite also referred to a set of conferences that were funded with specific grants that the Wikimedia Foundation got. Those are now finished, but they ran from 2016 to 2020. There was a steering committee for those conferences and those grants, which I was a member of, and we had several good strategy meetings, project meetings, and hackathons in that time.

The real meat of this is: what are the current questions about WikiCite? There are lots of questions. The big one Lydia already brought up: where should citation data live in the Wikimedia universe? Should it be in Wikidata? Should it be someplace else? The question is relevant because, as we just heard, there are query problems. There's a lot of bibliographic data in the world, and there's a lot of bibliographic data in Wikidata. We might not have room for all of it, and we might not have the ability to query all of it. There's a lot that we don't have. So the question is: how big do we go, and where do we put it?

The very related question is: how do we share this data across the projects? If you've got a Wikipedia and you want to use a citation that's in Wikidata, how should that work for you? Or Commons, or Wiktionary? What tools can be built to understand citations better? The four talks will be about tools that folks have built. And if we can understand citations, does that mean we can also improve content quality? Can we understand misinformation and disinformation better? Can we better trace the provenance of information on our projects? And how should we do that with data?

I will say there's a vibrant community on the mailing list and on social media. There's a big Telegram group that lots of people are in.
James Hare, who many of you know from the Internet Archive and Wikimedia, has also recently started monthly Zoom calls again, and you can join those. There's a link on this Meta page up here, and they're also announced on the mailing list.

I'll stop here and turn it over to my friend Diego. But before I do that: there are other talks on this program. Some of these have happened already, but when you go back and watch the recordings, there are a bunch of other WikiCite talks at Wikimania this year; here are some of them. And now I will turn it over to Diego. Thanks. Oh, and we will save questions for the end; I think we'll have time to do four short talks and then a discussion, yeah.

Okay, thank you, Phoebe. So I will talk very briefly about a WikiCite-related project that I worked on two years ago, called Cita. In the scientific community, citations are of course a central part of the work, just as they are in Wikimedia: we are always building on what previous scientists have built. So understanding citations is a very important part of understanding how knowledge evolves in any field. And this data, which papers cite which papers, which works cite which works, has become increasingly available in the open in recent years, thanks to the efforts of, for example, the Initiative for Open Citations and the OpenCitations project; much of this information is also available in Wikidata.

However, when we started this project, there was, to our knowledge, no popular reference management software (the software that scientists, journalists, et cetera use to organize their bibliographic metadata) that had support for citation graphs. You would know the title of a work and its authors, but you wouldn't know, inside the reference manager, what other works each work cited.

So the idea of the project, which was financed by a WikiCite grant in the year the conference could not be held because of the pandemic (this was in 2020), was to bring support for these citation graphs to Zotero, either fetching the information from Wikidata or adding it manually; to provide visualization support for this information, so that users of the software can easily discover new, relevant, connected works; and to provide a way for people who might be adding this information manually, Zotero users who maybe do not know much about Wikidata or are not active Wikidata editors, to contribute the information they collected directly to Wikidata and make it available to other Wikidata or Cita users.

Right now, we know the software is being used by around 3,000 users. We are not sure how this will continue, because of what Lydia and also Phoebe were saying: we're not sure what's going to happen with the bibliographic data in Wikidata. But regarding that, Cita was planned so that it doesn't support only Wikidata but potentially also other bibliographic repositories, and the scope of Cita goes beyond journal articles.
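To make the citation graph concrete: in Wikidata, these work-to-work links are ordinary statements using the "cites work" property (P2860), which is the data a tool like Cita can read from and contribute to. A minimal sketch of such a query against the Wikidata Query Service (illustrative only, not Cita's actual internal query):

    # Ten work-to-work citation links as recorded in Wikidata.
    # P2860 = "cites work", the property a citation-graph tool reads.
    SELECT ?citing ?citingLabel ?cited ?citedLabel WHERE {
      ?citing wdt:P2860 ?cited .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10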
So yes, we do know that sometimes there is some duplication between what is held in Wikidata and what is held in other repositories like OpenAlex, but there are other kinds of works that are not available in those repositories, for example very old books, and maybe Wikidata can still be a repository for those. That is, of course, a longer discussion, related to what Lydia and Phoebe were talking about.

We're also not sure about the long-term maintenance of the project, like many free software projects maintained by volunteers. When the grant ended, the maintenance rate decreased a little bit; then a user of the project appeared who actually took over its maintenance. I'm very thankful to him, Dominic Dall'Osto. But he's not a full-time developer; he's a student pursuing his PhD, so we're not sure whether he will be able to continue maintaining it in the future. So, that's it; thank you. And I think it's your turn, Houcemeddine Turki.

Okay, thanks. Or introduce yourself, and then we can play the video.

Hello everyone. I'm Houcemeddine Turki. I actually have another presentation in a few minutes, about using LLMs to edit Wikipedia and Wikidata. Here I will be talking a little bit about something I have been working on for a year and a half, which is a collection for Frontiers in Research Metrics and Analytics about how to use Wikidata to provide real-time research assessment of topics and scientists. So enjoy.

My idea is about integrating Scholia and ORKG to enhance real-time research assessment of scientists, institutions, and other entities. As you already know, WikiCite is an open database of bibliographic metadata hosted in Wikidata, and it is mainly built by crowdsourcing from open resources such as OpenCitations and Crossref. It includes bibliographic metadata such as citations, source titles, authors, publication years, external identifiers, et cetera. However, what WikiCite does not cover is what is included inside the paper, that is, the research metadata. For example, it does not record the methods reused in a research paper, the sample size, the main results, the target variables, et cetera.

There is, however, another resource called ORKG, the Open Research Knowledge Graph, an open database of linked research metadata structured in RDF, just like Wikidata. Scholia, as a tool, is a SPARQL-based dashboard for generating scholarly profiles, as you already know, and it is mainly driven by the data provided by the WikiCite project. It provides statistical overviews of publications, scientists, venues, prizes, institutions, and countries. However, it does not include statistics about the methods used by scientists and institutions, their main research findings, et cetera.

So what we can do here is use ORKG to add new SPARQL queries to Scholia that cover these kinds of useful information. As ORKG is an RDF knowledge graph, it has its own SPARQL endpoint, and at this point we can use query federation with ORKG to generate federated queries and get integrated results that include insights from Wikidata and ORKG at once. That's my main idea. I think this idea is very important for bringing Scholia and WikiCite to the next stage of the development of real-time research assessment.
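As a rough illustration of the federation he describes, a query of this shape could run on the Wikidata Query Service. This is only a sketch: the ORKG endpoint URL and the orkg: predicate names below are assumptions rather than ORKG's actual schema, and WDQS federates only with allow-listed endpoints.

    # Sketch: join Wikidata bibliographic data with research-content
    # data from ORKG. ASSUMPTIONS: the endpoint URL and the orkg:
    # predicates are illustrative placeholders; check ORKG's
    # documentation for the real ones.
    PREFIX orkg: <https://orkg.org/property/>
    SELECT ?paper ?paperLabel ?doi ?method WHERE {
      ?paper wdt:P356 ?doi .                    # P356 = DOI (Wikidata side)
      SERVICE <https://orkg.org/sparql> {       # assumed ORKG endpoint
        ?contribution orkg:doi ?doi ;           # hypothetical predicate
                      orkg:usesMethod ?method . # hypothetical predicate
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 50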
So if you have any questions about this method, feel free to reach out to me. I will be happy to answer all of your questions, and to discuss more ideas related to integrating Wikidata with other resources to enhance real-time research assessment. Thank you.

Super, thank you. I think we will have time for questions for our speakers who are here in person, and then maybe we'll play the last video, since Rod was not able to be online; I want to make sure we maximize our time with the folks who are here. So our last short talk is from SJ. Take it away.

I love these. I want to talk about crap for a minute. I don't know how many of you in the audience have interacted with the Wikipedia 1.0 project. It's very federated, very decentralized: WikiProject by WikiProject, people go through and assess individual articles in the categories their project cares about, with regard to their importance and their quality. Every year or two, someone pings all the WikiProjects and says, hey, can you go around and update these, making sure any new articles have a tag. Then they run a handful of bots that summarize the current state for each project; projects can compare and see how well they're doing. It's a very satisfying thing to have a nice report card.

The idea was for there to eventually be a Wikipedia 1.0, a nice snapshot of something that everyone felt good about: you could take the slice of articles above some threshold of quality and some threshold of importance and make a snapshot. In the end, I think we've ended up making a lot of offline snapshots that don't rely too heavily on that data, but some initiatives use it explicitly to decide what to distribute in different contexts. The nice things about this are that it's very low-key and lightweight, most people don't notice it, and it doesn't really need any elaborate data structures: you're just adding categories to talk pages.

So what do we do after Wikipedia 1.0? We're sort of there in a few dozen languages; we have very good coverage. Citations. I think, in the same sense, we should be assessing citation quality and coverage for all the articles. Most articles have a bunch of citations that have been assessed loosely, based on the known reliability of the source domain, but not necessarily in terms of whether they support the claim they're attached to. So just as we need to regularly reassess articles to see whether their quality has changed, you should regularly check whether the citation at the end of a paragraph still refers to what's mentioned in the paragraph. These are things that can be automated in the same way.

I've been working in this arena, trying to help build some of the tools that will be useful for getting this done. Wikipedia is already the best-cited collection of knowledge out there on the web, which is a little odd, because you would think that investigative journalists and other encyclopedias would really want inline citations; but for the most part, with a few exceptions, people are happy to just stick some works cited at the end. So we should make more hay about that, and invite people to build tools like this. We already have Citation bot, which helps people clean up citations and make them beautiful.
James Hare worked on a credibility bot that goes through all of the sources used as references in the articles in a category, which would be very useful to WikiProjects. We tried this with the Vaccines WikiProject, and it shows you things like this dashboard at the bottom: how many articles were included, how many domains there were, how many of them are known to be reliable right now. That 3% is very low because a lot of these were archived, so the proximal link is to a DOI resolver or to some other redirect.

So one of the things that caring more about citation reliability would get us is little things like fixing the citation templates so that you're pointing to a non-redirect as an indicator of the source domain, or otherwise describing the source domain. We do have some mechanisms for flagging sources, and there are some more, like ORES (well, probably not ORES now). Some of the new automated tools that feed into Enterprise flows are thinking about doing source evaluation, so it would be great for us to end up with a few more explicit flags. Right now a lot of the flags are embedded in the user scripts and widgets that people use for reading, which highlight the sources on the page according to how good they are.

So I would love to talk to people who are doing work like this. I've been working on versioning tabular data with the Underlay project for perennial sources tables, to do things like map the different topic-specific perennial sources tables across wiki projects into one another, or into the global one. And WD:CRAP, which I invite everyone to contribute to, is the Wikidata project discussing source reliability, how we can improve it, and what a shared schema might look like. I hear that the Wikidata graph is going to be split, and that if it is, it's probably going to be bibliometrics that are split out. That gives us an opportunity to figure out what a split could look like that includes other citation data. I would love to talk to people working on Enterprise who are capturing some of these reliability signals, and we could use some campaigns and partnerships that help gather information that's out there in the wild but not yet freely available, and say: please give us, under the CC0 license, all this data about anything cited. I don't want to repeat you. Thanks.

So stay there, yeah. Before I play this last video, from our friend and colleague Rod Page, about Alec and using Alec: are there questions for Diego, Houcemeddine, or SJ about their projects? Or for me, I suppose, as your MC. Getting references in, using references to understand articles, getting research information in: it's very relevant to my day job as a librarian. Yeah, I see one question here. Maybe two questions.

One was about reliable sources, and getting more sophisticated about what is included in reliable sources beyond what is on the current list. I'm interested in policy reports and papers and how they fit in. They're not generally very well cataloged, so it's difficult to find them; they may or may not have URLs, et cetera, and they're quite difficult to assess. And I'm also wondering about shared citations: how we get at the citations that are actually on Wikipedia so we're able to analyze them.

Yeah, do you want to stand up here? Starting with your last question first: when you say shared citations, do you mean the Shared Citations proposal from a couple of years back?
Yeah. So my take is that we should start by defining the Wikibase that will split out some of the existing bibliometric and source metadata that's in Wikidata, which is a different starting point from what was in the Shared Citations proposal. But there are already a bunch of properties about some of the things you mentioned that you might care about: domain reliability, or, if you don't really have a domain, some other measure. There are some very underused properties that we could start using, and then normalize a community around them. There's a discussion on the WD:CRAP talk page about which properties people use for these sorts of things, whether we can make them better, and what else we need to add. Your first question was?

So currently the reliable sources are considered to be books, journal articles, news articles; that's about it.

Are you working on a policy paper project?

I am.

Okay, great. I've seen that project, which is great. We should definitely do more things like that, and there are many more countries. You were just working on Australia?

Yes, but the project won't just be about Australian policy issues. I'm Amanda Lawrence, and I've just got a Wikimedia Foundation grant to look at policy topics. There have been studies on science and medical areas, where you probably rely more on journal articles, but the policy space is really a lot of organizations. Some of them are top of the line; some are very dodgy; and it's quite difficult, because there are thousands of them, thousands and thousands. So the perennial sources approach is going to be very difficult.

Yeah, I'll just say that one parallel is preprint servers, and increasingly there are fields on the cutting edge that publish almost everything through preprint servers. If you look at arXiv.org in perennial sources, it's listed as "do not trust", which really varies a huge amount with category and author. So let's talk; I think we can do much better than domains in those cases.

Do either of you want to address that? I'll just note, related to what they're talking about: reliable sources is a bit bigger than the data itself, but it's very connected to how we get the data in. The kinds of data Diego's talking about, all the infrastructure has to support citing those things too, right? That's my take.

Nicholas, and then I'm going to play this video, which is about five minutes; we'll go just 30 seconds over.

Yeah, actually it's kind of a follow-up question. Wikisource: if and when there is a split in the graph, or something like that, where will Wikisource end up? Because we need it; so much is in Wikidata, and we want other types of data. So that's a big question.

I think you should answer this question about Wikisource. I don't have an answer. But say the question again?

I think it's the same question I asked Lydia before: how do we prepare for that? Because this data will live somewhere; obviously it's unavoidable. There will be a point where we have to choose, and my question is how we prepare for it. Could WikiCite help too? Because Wikisource is not just me. Anyway.

Wikisource. Anyone want to talk about Wikisource? This feels like a hallway discussion, perhaps. I don't know. It is. Yeah, yeah, yeah. Let me move on, because I have a nice talk here from our colleague Rod Page, who could not be here in person. So I will play this video about Alec.
And then I do invite: there are a lot of people in the room here who have worked on citations. Andy Mabbett has worked on citations relevant to your project; that's relevant for your question, Amanda. So there's a lot to say about Wikidata. I originally proposed this session for an hour, with discussion, and I wish we could have that, but we will have it tonight over karaoke instead. Citation karaoke. So thank you, everyone.

Hi, I'm Rod Page, and today I'm going to talk to you about a little tool that I built called Alec. This is a tool for visualizing information in Wikidata.

By way of background, I'm a biologist by training. I'm interested in biodiversity and taxonomy, the study of classifying species, and I make a lot of use of the wikis: Wikipedia, Wikispecies, and Wikidata. One of the things that is challenging here is that there are lots of different wikis, with lots of different kinds of information, and this information can be presented very differently. For example, if I'm interested in a monkey called the banded surili, which is found in Singapore and other parts of Southeast Asia, I can go to Wikipedia and get lots of information aimed at somebody reading it; this is human-readable information. I can go to Wikispecies, which is quite an interesting wiki: one of the few wikis focused on a particular topic, unlike most wikis, which are global in scope but differ in language. And lastly, I can go to Wikidata. Now, the interface of Wikidata is very different: it's all about editing information. So it's great if you want to edit things, but not so useful if you just want to find out what we know.

So what I decided to do is make a little tool to help me navigate the information that's in Wikidata. This is the tool called Alec. Alec is an acronym; it's also my son's name. I still haven't figured out a good expansion of the acronym. What we're seeing here on the left is a screenshot from Alec showing some information about the same monkey we've been looking at, and this is all information that comes from Wikidata: links to other databases, if you want to learn more about that particular species; where it's found; other parts of its taxonomy (there are subspecies within the species). And most particularly, the thing I'm really interested in: links between this species and the literature. These are scholarly articles in Wikidata that tell us something about this particular monkey. So this is a direct link to the evidence for that particular species.

So under the hood, what's happening here? Alec uses the Wikidata Query Service, the standard way of querying information in Wikidata. When it gets the results back, it formats them as a list, very much like an RSS feed, and then I format those lists in different ways depending on what I'm looking at. As I said, my focus is on publications; I'm very interested in the WikiCite project, which is trying to get the scholarly literature into Wikidata. But I'm also interested in people and species and so on.

Just some examples: this is how a journal looks. You can see here, this is a journal, and these are all articles. When you see a thumbnail like that, it tells you that we have a free-to-read PDF for that article. So this is a way of showing you: here are the articles for this journal, and, by the way, these are the ones you can have a look at and read.
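In the spirit of what Alec does (this is an illustrative query, not Alec's actual one), asking the Wikidata Query Service for the literature about a species might look like the following, with wd:Q140, the lion, as a stand-in taxon:

    # Scholarly articles whose main subject is a given taxon, with a
    # full-text link where Wikidata records one. wd:Q140 (lion) is a
    # stand-in; substitute the taxon item you care about.
    SELECT ?article ?articleLabel ?fulltext WHERE {
      ?article wdt:P31 wd:Q13442814 .            # instance of "scholarly article"
      ?article wdt:P921 wd:Q140 .                # P921 = "main subject"
      OPTIONAL { ?article wdt:P953 ?fulltext . } # P953 = "full work available at URL(s)"
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 100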
Academic journals often have very complicated histories, and Wikidata is great for this. This is the family history of one particular journal, a French journal called Adansonia; it's been renamed, and there are all sorts of very complicated things going on here. This is a way to summarize that history, which is all documented in Wikidata.

Also, taxonomy doesn't happen without people, without taxonomists, so I'm very keen on helping to document them. This is one taxonomist, the late Vicki Funk, and you can see there's a page here. It has information on her publications, but also on publications about her: these are obituaries, for example, published shortly after her death. So if you're interested in understanding the impact of this particular scientist, Wikidata would be a great place to start.

So, to summarize: the underlying goal of this work is to document biodiversity, in particular by uploading publications about species and linking them to the species, but also to the people who published that information, to the specimens that taxonomy is based on, to museums, and so on, in Wikidata. I need a tool to help display the results but also to find gaps, and that's what Alec does for me. If I go to a journal page in Alec and I don't see any articles, I know that's a journal that needs work. And if you find any of this work of interest, I invite you to go and try Alec at this website here. Thank you very much for your time.

Thank you all for joining us. I'm sorry to eat into your break. I would encourage everyone, going back to the beginning, to join us in the WikiCite conversations that happen online, if you're not already in those places and you're interested in these topics, because as you can tell, there are many, many very cool projects happening. So it's break now, right? Until 3:30? True? Okay. So we have about a ten-minute break, but if there are questions for any of our speakers, Sam, Houcemeddine, Diego, feel free; we can merge this into the hallway. Anything pressing? All right, join me in thanking our speakers. Thank you.