We're talking about the WikiCite project, the Scholia project, and how they relate to medical content in Wikipedia and beyond Wikipedia in general. How do you all do? My name is Lane Rasberry. I do Wikipedia in the United States, at a place called the University of Virginia, in their School of Data Science. I'm Bluerasberry on Wikipedia. You can write to me after this. If you have questions, that's the best way. I will talk to you by video chat at any time.
You all are in conference overload. Just relax during this presentation. The slides are online. I'm going to cover a lot of things here and move quickly, but we can talk another time. You can send me questions. Everything will be answered.
What I'm going to do first, in two minutes, if you just stay here for two minutes and can pay attention that long: I will give you the skill that you need, and then I'm gonna keep talking, but you can relax through the rest of it. Everything I want to say, I'm gonna say it in the first two minutes. I'm gonna say something about this concept, scholarly profiling. When I explain this, you're already going to know it, but I'm gonna put it out there. I'm gonna introduce these Wikimedia projects, WikiCite and Scholia. I'm gonna say, beyond technically what they are, why I think they're going to save the world, and then I'm gonna give a list of medical examples of how WikiCite and Scholia relate to medicine. Of course you can edit. You can participate. This is Wikipedia.
The skill I want to teach you is how to use Wikimedia's variety of scholarly search: how to use this tool called Scholia. It's a Wikidata-related tool for scholarly search. You search for Scholia in your favorite search engine. Somehow you arrive at this place. Look, a search box, very familiar. And you can put things like a search term in it: any disease, medical condition, drug. Medicine is very well represented in this search.
You can put in any kind of search term that you would search for to find information to develop a Wikipedia article or anything else. Climate change is very popular. Any topic in diversity representation, very popular. Anything related to the open movement, open source software, open access, open science, very popular. But whatever you might research in Wikipedia, you can research in Scholia. And when you do this, Scholia will spit out all these data visualizations. What I'd like you to know about these is that they are visualizations of the scholarly or academic literature. Scholia is a tool by means of which you get access to sources in journals, from universities, from scientists, from researchers. And these are good things to cite as you develop Wikipedia articles.
So that's the skill I wanted to teach you. Navigate to Scholia, type in a search term. Look at these visualizations to help you find the research papers that you want, to do research for Wikipedia or research anywhere else, and then be enlightened. You've found good information; you can develop Wikipedia or anything else.
What I just showed you, those pictures, I'm calling that scholarly profiling. This is a new term, a new concept. People have of course been doing this for many decades, but we're in a new age with artificial intelligence and all kinds of support for these kinds of things. And if anyone wants to know what scholarly profiling is, I expect that most people would be familiar with Google Scholar. I would say Google Scholar is a competitor. It's related to what we're doing with Scholia, WikiCite and Wikipedia. For anyone who needs the introduction, here's what Google Scholar looks like. We've got a Google search box. You put a term into this and then you get, from Google, what Google says are academic papers related to your term. Google Scholar is very popular for topics: it gives you a list of papers on a topic, the papers for a researcher, the papers by a certain keyword.
Google makes good products. And I'm gonna say more, but yes, I do think that Wikipedia, Scholia, Wikidata, WikiCite, all these things have a competitive advantage against Google. We in our little wiki community can do things and be better than Google in certain key ways. One of the ways that we're better than Google is that when you search in Google, you get 10 pages of results, hundreds of pages of results. We need more sorting of these kinds of things. If you go to Google Scholar, it's not possible for you to read the 1,000 papers that Google's recommending to you. In our wiki community, we curate things, we rearrange the way things are, we sort them ourselves, and we can put these into smaller categories, things that are more relevant to the kind of purposes that our community needs. So of course, why would you want this? Wikipedia editors want the best sources to cite in Wikipedia, but this is a lot bigger than Wikipedia. I'm talking about everybody in every context wanting help finding the best sources. And we can do things that Google can't do. It's very fortunate.
Intro to WikiCite and Scholia. What are these terms that I've been using? What exactly is WikiCite? What is this Scholia tool? It's a lot of name dropping in context, but there's this project called Wikidata. You're familiar with it. This pie chart represents the number of items in Wikidata. And what I'd like you to see is this big slice here, the biggest slice of the pie, which is related to a project called WikiCite. These are papers that are indexed in Wikidata: research, references, citations. So a big part of Wikidata is sorting out these citations. The WikiCite project is the community in Wikidata that's sorting out these papers. WikiCite is sorting out this big slice of the pie there. And I would call it the flagship project of Wikidata, the project in Wikidata that's got the most engagement, the greatest number of people who are curating, sorting out this content.
And the most people who are talking about the very many social and ethical issues that come from sorting out this kind of literature. There's a lot of policy controversy in WikiCite and a lot of engagement in the Wikidata community for this. If you'd like to get engaged in the WikiCite project, there's a project page in Wikidata. This is where you say, I need help doing something technically, I wanna know more about it in a WikiProject. You go here and there are other people who will talk to you 24 hours a day for the rest of your life about this WikiCite collection.
Beyond the technical aspects of it, look at this: we left Wikidata, we went to Meta-Wiki, and there's also a project there called WikiCite. And here there are communities organizing regular meetups, people meeting online, people meeting in various social media channels to talk about these things. There are at least monthly discussions, but actually the social discussions about this never end. That's coordinated in Meta-Wiki.
I want to tell you all, there have been multiple WikiCite conferences. Conferences are transient events. You come to them or you don't. And I don't expect anyone to care about the conferences that happened in the past, although they have presentations like this where you can read about what the discussions have been over time. But what I want to impress upon you is that there are hundreds of people who are talking about this WikiCite project continuously, and they have been for many years. If anybody wants to get engaged in this, there's an existing community. It's a very popular thing to discuss for many reasons. It's hard for me to briefly explain why hundreds of people need to talk about this for years, but a lot of people find this very interesting, and there's a community that's trying to sort out the best references and citations in Wikipedia for so many different reasons.
So, the name drop of terms. I've got a list of definitions here.
Wikipedia, you know very well: this is the prose encyclopedia, very popular in search engines. Everybody in the world is reading Wikipedia, right? And a complement to Wikipedia is Wikidata. Wikipedia is for humans to read. Wikidata is mostly for machines or artificial intelligence to read. And the content in Wikidata that we're talking about is machine-readable structured metadata. We're not talking about the entire academic or scientific article; we're talking about the metadata: who wrote it, what journal it is in, a few keywords about it, what the main subject is. So we're talking about library indexing. How do you get access to these papers so that you can read them somewhere outside of the Wikipedia platform? WikiCite is the librarian project that sorts out all this metadata that Wikidata is indexing.
And this tool, Scholia, that I showed you in the beginning, what I wanted you to know about, that's a front end. It means that you go to this website and, instead of having to form a query in Wikidata's query service, needing to know the query language, you can just type in your keyword and get access to all this data in Wikidata. It's like Google Scholar search: you can go to Scholia and find papers in that way. So many WikiProjects related to WikiCite, so many people engaged in it. If you want to talk to humans about all these things, you go to a WikiProject and you have a conversation with humans.
What's the big idea for Wikipedia? If we can sort out access to the scholarly papers in the world, then at the bottom of every Wikipedia article, we can do all kinds of checks to make sure that the best sources for any given topic are cited in Wikipedia. We're talking about improving the quality of Wikipedia articles. But this is also a big idea for the world. We're practicing sorting out quality of information in Wikipedia, but there's so much fake news in circulation. We need quality checks on every source of information, everywhere in the world, in every context.
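For anyone who wants to look behind the search box, here is a minimal sketch of the kind of query a front end like Scholia saves you from writing by hand. It assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and the Wikidata properties P921 (main subject) and P577 (publication date). The function names are made up for illustration, and this is not Scholia's own code.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def papers_on_topic_query(topic_qid: str, limit: int = 20) -> str:
    """Build a SPARQL query for works whose main subject (P921) is the topic."""
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return (
        "SELECT ?work ?workLabel ?date WHERE {\n"
        f"  ?work wdt:P921 wd:{topic_qid} .\n"
        "  OPTIONAL { ?work wdt:P577 ?date . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(sparql: str) -> str:
    """Turn a SPARQL string into a GET URL that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Q42 is used only because it is a widely known item ID;
    # substitute the item ID of the disease or topic you care about.
    print(query_url(papers_on_topic_query("Q42", limit=5)))
```

The sketch only builds the query and the URL, so it runs without a network connection. Pasting the printed URL into a browser is one way to see what comes back.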
Everyone at every university needs help finding better access to research materials. So what can you do with these Scholia profiles? Common profiles: you can search for topics and get a list of papers by topic. You can search for an individual researcher, if you think that you're interested in the kind of things that they're publishing, and get the list of papers that they've published. If you find that a researcher is at a particular institution, like a university, maybe that person has colleagues and you'd like access to their papers, or you'd like to know what's coming out of the university. Scholia can do all these things.
Let's look at a few demos. So here I've got an example profile. Type the name of a university into Scholia and it gives you the list of the researchers who are at that university. Then you can click on any of their names and you get the list of their papers. The data's incomplete, and we're adding more data to this all the time, but the intent is that, for a given university, you find out what kind of research they're publishing. Then, if you know the people at a university, you get a list of their papers and tag those with the kind of topics that they research. In this case, I've got an example of someone who's researching medical topics, so you can see what kinds of journals they're publishing in and get more insight into the kind of things that they publish.
When you look at one topic, and here I've got the issue of obesity: if someone is talking about this medical condition, what other medical conditions do they talk about, or what other topics do they talk about? So you start by researching one topic and then you find these kinds of visualizations, where if someone's talking about this, then they also talk a lot about this, and they talk somewhat about this, but these little bubbles, they're talking about this less, even though it's represented.
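The idea behind those bubbles can be sketched in a few lines. This is only an illustration of counting co-occurring topics, under the assumption that each paper carries a list of subject tags. The sample records are made up for the example and are not real Wikidata content, and this is not how Scholia itself is implemented.

```python
from collections import Counter


def co_occurring_topics(papers, topic):
    """Count the other topics tagged on papers that are also tagged with `topic`."""
    counts = Counter()
    for paper in papers:
        subjects = set(paper["subjects"])
        if topic in subjects:
            # Every other subject on this paper co-occurs with the topic once.
            counts.update(subjects - {topic})
    return counts.most_common()


# Made-up sample records for illustration only.
SAMPLE_PAPERS = [
    {"title": "Paper A", "subjects": ["obesity", "diabetes"]},
    {"title": "Paper B", "subjects": ["obesity", "diabetes", "hypertension"]},
    {"title": "Paper C", "subjects": ["obesity", "sleep apnea"]},
    {"title": "Paper D", "subjects": ["asthma"]},
]

if __name__ == "__main__":
    for name, count in co_occurring_topics(SAMPLE_PAPERS, "obesity"):
        print(f"{name}: {count}")
```

A bigger count corresponds to a bigger bubble: the topic that shares the most papers with your starting topic comes out first.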
So you find the entire environment of what people research when you start from one topic. You can jump from topic to topic to find related research.
If you find one person that's publishing things, start with that one person. These people here, they collaborate, they're co-authors of papers, and they've also published with this person. They don't work with that group at the top. This is called a co-author network graph. You start with one person who's publishing in one field, and you find all the other people who are publishing in that field and what their relationship is.
I've got the LGBT flag up there because this indicates a diversity issue. Because we have demographic data in Wikidata, we're also able to do things like ask, what is the gender of the different researchers? At this very conference, we have people talking about things like WikiProject Women Scientists, and as people tag women scientists, they tag all kinds of gender. When you do this, one of the trends that you'll be able to see, and I won't talk about this much now, is that there are disparities of all kinds. Certain demographics of people research one topic, and they could very well research any topic, but research is often very gendered, and it's very regional. There are all kinds of other demographic characteristics that suddenly jump out because we're indexing all these people in Wikidata, and it makes for very interesting social research and commentary if you look at these kinds of things.
Publications by year: you can see this is a timeline. In these years, this particular topic wasn't being researched, and then the topic suddenly becomes popular. So you can see the history of research for a given topic. Citations by year: when did a topic suddenly become very well cited and discussed by other people? It's another perspective on the popularity of research topics by year. Who are the authors who are citing particular papers?
So you can go through the citation graph and find which papers are more popular and who's citing these kinds of things. What journals should a person read if they want information on a particular topic? You can get this from Scholia; this is all in Wikidata as well.
Why does all this matter? Why should anyone care about this? This is a bunch of librarian work. Why are there so many hundreds of people who are engaged in developing this content? I'll give you my personal opinion, and I think you would hear it if you went into any of these WikiProjects and talked to some of these other people. We're doing this to save the world. The general problem that we're trying to solve is that there is so much good information out there, so much valuable knowledge, but it's inaccessible. If you go to Google Scholar, you're not going to be able to go through those thousand papers quickly enough. If you go to your university library and ask a librarian to help you, I'm sorry, your librarian hasn't read the thousand papers that you're trying to browse through. You need automated tools to support you for this. And Wikipedia, Scholia, Wikidata, WikiCite legitimately have solutions to help you sort out thousands of papers and find the published knowledge that you need to do the research that you want. We have a crisis in that this is currently inaccessible. We have an opportunity in that we really can address this challenge through the Wikimedia platform.
Scholia is in the Wikimedia platform. We're talking about only using free and open data in Wikidata. This is not something that you need special tools to access. You access this information through a website. You type your search terms in on the website. You get a result on the website. You can do this on your desktop computer. You can do it on your phone. And we let anybody participate in this. It's wiki; anyone can edit.
As for some of the bias problems that you will see in the library system or in Google, we're able to have ethical conversations in the wiki platform about how to be more diverse, how to represent more universities, how to represent more people around the world and represent more languages. You cannot have a conversation with Google if you want to complain to them. You can only have conversations in wiki. There's no one at Google. There is no customer support service. You take it or leave it with Google. We're providing a useful service. This is an orientation to academic literature. There are different ways that this integrates with Wikimedia, so that we can confirm that when people are researching topics in Wikimedia, they have the information that they would need.
I'd like to do a demo to compare Scholia with Google Scholar, but our problems begin here. Google asserts a copyright over their websites. This, we're presenting online. This is a Wikimedia conference. We want this to be free and open copyright. And so I can't even show you screenshots of Google, because it's copyrighted content. So that's one of the ways we're already superior to some of these other services.
More comparison of Google with Wikidata: if you want, for whatever reason, to export your citations to somewhere else, we have an export button. Of course, it's free content. If you're allowed to move data into Wikimedia, you're allowed to move content out. With Google, no, you can't move any of their data out. Google, famously, wants you to go into their platform and stay in their platform. They don't have an export button. If you see a problem in Google, you can't fix it. They don't have an Anyone Can Edit button. And there are, in fact, a lot of problems with Google search results. There are limits to how many queries you can do. Wiki is open data; you can query as much as you like. And then this issue about not being able to openly license and discuss things is problematic as well.
I've been mentioning Google, but there are also commercial products that you can buy. One of the more popular ones is by a company called Clarivate Analytics. They make something called Web of Science. There's a company called Elsevier. They make a product called Scopus. Can we look at those? We have competitive advantages over Google. What can we do with the other ones? Well, more problems. I cannot show you screenshots of the other ones. They assert copyright as well. Elsevier, this company is centuries old. It's run in the Netherlands. I don't have anything against the king of the Netherlands. This is Willem-Alexander. But there have been centuries of restricted access to this. When you look at the reasons why people can't access these things, it has to do with old institutions and tradition that I think very much need to be updated.
Comparing to Scopus and Elsevier: the export button, it lets you export, but you cannot re-share the content. They want you to curate. It's got an edit button. Anyone can edit this. But when you edit, you're making a donation to their commercial company. And they will not let you reuse the content in Wikidata or anything else. These companies make huge profit margins. It's a commercial enterprise. The shareholders come first. It's not for the community first. I think that Wikipedia can beat out these commercial companies and make better products than they do.
What is Wikipedia's secret strength? I'm talking about us going head to head with companies that have a billion dollars. How can we possibly compete? I'll tell you, our secret is that we design for the end user, whereas the other platforms are designing to get money to the shareholders. Since we're doing it for the community, we have an advantage. Something else that we can do that the others can't do is design products that have things like export buttons and make data open.
The other companies, they're restricted from doing these kinds of things. And I know they take our data out of Wikidata, because we're being generous and they're not. But I still think we're doing the right thing. And I think that we can make superior products to theirs.
Now I'm gonna look at some medical applications. I've been telling you so many things about the social issues of this, but what can you actually do medically with this Scholia tool? So, one: profile any medical condition. This is pretty basic. If you wanna see an example, you can go to Scholia and type in Zika virus infection. This was a big infectious disease outbreak a few years ago. It was discussed in so many different ways in Wikipedia. You can see other related things. There's a WikiProject for Zika. COVID was very similar. You can search COVID topics. It's very well profiled inside Scholia. And there are communities behind these things for the disease aspects, the humanitarian aspects, the mapping aspects.
Let's look at some of the Scholia profiles related to Zika. You can get a list of the papers that are most recently published on the topic of this particular disease. Search for any disease. This is just an example. When someone's talking about this particular virus, what else do they talk about? Some of the things they talk about are pregnancy. They talk about birth defects. So this isn't just an infectious disease that affects adults. This is also affecting mothers carrying children. You immediately see that by just getting the profile out of this search engine. When somebody's talking about one aspect of this disease, what else are they talking about? I see sexual transmission up here, genetic effects, and then leaving the medical side and going into kind of the social side: what was the epidemic like? When people talk about one thing, they talk about the other thing. Very strange disease. This is a timeline here.
It was first talked about in the 1950s, but it didn't really matter until suddenly it became an epidemic. And Scholia was waiting there. Wikidata was waiting there, curating this information, ready to go when an epidemic breaks out. If anyone mentions a location in these academic papers, you can find out in what locations this particular disease appeared. You can see it was global. It appeared all over the place. Nice visualization you get from the wiki platform. And where are the authors? This has nothing to do with where the disease is. This is where the researchers are, if they're researching this disease. You can get insight into where the people live who are researching a particular thing. So that's an example. Have fun with it. Go to Scholia and check it out for yourself.
Now another example. How do you profile medical research at a university? I'm gonna give an example from a university in the United States. This isn't my university. This is a place called Vanderbilt University. Something unusual about this university is that they tried to upload many of their papers into Wikidata, and they tried to index all their faculty, all their scientists, all their professors in Wikidata so that they would show up in this kind of search. So we've got about 5,000 professors here that are affiliated with this university. It's actually not all of them, but they had a go at indexing a lot of them. You can get a list of them there. What kind of awards did these people get? You can track that and see. This is good for the reputation of the university. If there's anything to acknowledge, it's all listed right there.
Something else unusual. I'm not gonna go into the social implications of this too much, but again, we have projects in the Wikidata community that are applying gender labels to different people. You can go to a university, or to a university department, and ask, what is the gender breakdown of people at this university?
And again, you will see surprising things. For whatever reason, there are certain topics that get more discussion and more research by certain kinds of researchers than others. Again, it's just interesting to do, and I expect to see a lot more research about this in the future.
Clinical trials. So now we're jumping out of what's conventional in Wikipedia, going into things that are more related to Wikidata. This is the issue of clinical trials, which is medical research in humans. I don't have all the clinical trials for everywhere in the world, but we do have 99% of the clinical trials that are indexed in the United States by the US government. So this is a governmental index. If you want to see more about this, there's something called WikiProject Clinical Trials, with so many different ways to browse this. You can check out the WikiProject. You can do these searches in Scholia as well. Some of the things that you can do are, for a particular clinical trial, try to find the papers that are associated with it. Or if somebody has been the principal investigator, the leader of a particular research project in a clinical trial, you can find out what kind of papers they've also published. So there's a relationship between being a researcher and actually publishing papers. We've got an academic paper on this. You can see the discourse delineated and published about these kinds of things.
The kind of thing that you might want to see is, for a given university, what kind of clinical trials have happened at that university? What kind of drugs has anyone tested? Give me the papers about these kinds of things. I'm sorry that I don't have a lot of time to go into detail about this. We've got to run, but here are some of the requests that we can make of Wikipe-tan, the Wikimedia AI. She's in the background doing all these queries, giving you all this data as a robot. You can say, Wikipe-tan, can you give me all the clinical trials for a particular disease?
All the medical research that used a particular drug. All the research that has happened in my city. These are the kinds of things that we can profile. You can see the trials that are studying particular diseases, or trials that are sponsored by particular companies. You can do queries like, I want to see the research that's only happening in cities with low population, and other kinds of weird things that you would only be able to do in Wikidata.
Research resources. This is kind of a funny concept. Bear with me, I'll tell you what this is. A research resource is something mentioned by a researcher, as they publish their paper, that they said they used to do the research. For example, in an academic paper, there's a methodology section. In the methodology section, they may say, we used this particular software to do the research, or they might say, we used this data set, or we created this data set. Either we're sharing data sets or we're putting data sets out into the world. All of these things are research resources. By and by, these hundreds of people who are doing this WikiCite project, one of the things that they're doing is trying to extract the list of research resources named in every paper, so that you can profile them and do things like find out how many times a particular piece of research software has been used. If you do this, you can find out, is there very popular software that isn't so very well developed? I'll share that there's a United States foundation, the Alfred P. Sloan Foundation, that funded my university to do text mining of research papers to find software, and to find the most popular software that is the least recognized and the least funded, like a single developer working very hard, where lots and lots of scientists are using their software but no one's getting any credit. This actually happens quite a lot. What software do researchers use when researching a topic?
You can put the name of a topic into Scholia and get a list of the software that shows up there. We've got incomplete data. I'm not saying it's the best data in the world, but it's getting much better every year. Things are moving very quickly. We're entering a different world.
I've been talking about medicine. This kind of thing is also happening in other fields. For example, I'm a member of this project. This is called SEEKCommons. It's an environmental or climate change project, and we're going through all kinds of research projects, trying to find what data sets they use, what software they use, what hardware they use, so that people can share more easily. These values of openness that we've cultivated in the wiki community also affect the scientific community greatly. If we hadn't done so much discussion of Creative Commons licenses in the past 20 years, people would not be doing the kind of open licensing in science that they're doing right now. What tools do researchers use when researching climate change? We can promote all these kinds of sharing.
I've been talking a lot about Scholia and Wikidata, WikiCite, Wikipedia. We're not alone in this. There are other people doing these same things. They collaborate with Wikipedia to differing extents, but I'd like to call out Semantic Scholar, if any of you don't know it. One of the Microsoft founders established a foundation called the Allen Institute, and this is their equivalent product to Google Scholar or to Scholia. It's Semantic Scholar. It's also nonprofit. It's also open. These people are our friends. The Internet Archive, archive.org, also has many relationships with Wikipedia. They've got something called Internet Archive Scholar. And Microsoft itself was getting into this game. They decided to get out of this business. Microsoft did not want to compete, and they converted all their data into an open data set. It's called OpenAlex. I think that's pretty cool. It's free for anyone.
People are putting this into Wikipedia right now. As these things are open, we're also looking at things like copyright licenses and open licenses. You can push buttons. You can ask, how open is the research at a particular university? Are they doing open data sets, open hardware? What are they disclosing? You can get this kind of information at a glance. Very interesting for funders and others. There's this concept of FAIR data. I'm not gonna go into detail about that, but some data, some content, is easy for humans to read and difficult for machines to read. Some content, this FAIR content, is very easy for machines to read. It's happening all over the world. So many changes are taking place as we have more artificial intelligence and more machine-readable FAIR data.
Big thanks to all the people who contributed to the various projects. I named about 10 projects in this presentation. I'm sorry for going so fast, but there are so many people to thank. Actually, thousands of Wikipedians contributed to all this. The WikiCite project is a very big community. Peace, love and wiki to all of you. If you have any questions, contact me on wiki. I'll talk to you on wiki, or you can talk to me by video chat. Thank you very much. I've got a minute for questions. Any questions? Yes, please, you, sir.
Hi, I'm Juan from Argentina. I'm actually the Wikimedian in residence at the National University of La Plata. We work with Scholia almost exclusively. And as you said, you know, Scholia is built upon the Wikidata query service. A problem we usually encounter is that when we have, you know, thousands of scholars in the National University of La Plata item, sometimes the query gets a little bit, you know, overflowed and it doesn't show the results. Is there a way to fix that or to avoid that? Because it's a really powerful tool and we love to use it.
No, I'm so sorry. So this is not a Scholia problem. This is a Wikidata problem.
I don't have time to explain this, and I couldn't explain it anyway. There are other people at this Wikimania who can explain this, but one way to explain it is that Wikidata is full. Wikidata has too much content. That's not quite accurate, but in any case, when you ask something like, give me the list of all the papers at a university, even if we have this content in Wikidata, the kind of querying that is available in Wikidata breaks. It says, you have too much data, we can't do it, and it times out. I would like to think that the database architecture of Wikidata is going to improve. I actually would have predicted that it would improve like five or eight years ago, and I don't have the technical knowledge to know when Wikidata is going to be big enough and have the capacity for all these kinds of queries. But I have faith. I hope it's not misplaced faith. I'd like to think that very soon you will be able to query and get the list of all researchers at your university and all their papers without it timing out. I'm sorry, we're over time. If anyone has questions, please contact me. Thank you.
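One workaround people sometimes try for the timeout described in that answer, offered here as a sketch and not as a guaranteed fix, is to ask for results in smaller pages with SPARQL's `LIMIT` and `OFFSET` instead of in one large request. The sketch assumes that researchers are linked to their institution through the Wikidata property P108 (employer). The function name, the page size, and the placeholder item ID are made up for illustration.

```python
def paged_researcher_queries(org_qid: str, page_size: int = 500, pages: int = 3):
    """Return SPARQL queries that each fetch one page of people employed by an organization."""
    if page_size <= 0 or pages <= 0:
        raise ValueError("page_size and pages must be positive")
    queries = []
    for page in range(pages):
        queries.append(
            "SELECT ?researcher WHERE {\n"
            # Assumption: P108 (employer) connects the person to the organization.
            f"  ?researcher wdt:P108 wd:{org_qid} .\n"
            "}\n"
            # A fixed order keeps pages from overlapping between requests.
            "ORDER BY ?researcher\n"
            f"LIMIT {page_size} OFFSET {page * page_size}"
        )
    return queries


if __name__ == "__main__":
    # Placeholder ID for illustration; replace it with your university's item ID.
    for query in paged_researcher_queries("Q00000", page_size=100, pages=2):
        print(query)
        print("---")
```

Paging keeps each response small, but the sorting step can itself be expensive on a very large result set, so a query whose underlying work is too heavy may still time out.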