Hello everyone. My name is Miriam, from the Wikimedia Research Team, and together with Caroline Marek I will be telling you today about our project, the Knowledge Gap Index, and how you can make it your own. Before I delve into this project, I'd like to start with a principle that most of us know very well, as it is one of the founding principles of our movement strategy. Knowledge equity is the principle that encourages us to include in Wikimedia projects the knowledge and communities that have been left out by structures of power and privilege, and to break down the social, political and technical barriers preventing people from accessing free knowledge. And so while knowledge equity is our end goal, there are inequalities in Wikimedia projects that prevent us from advancing it. Those inequalities are what we call knowledge gaps, and our goal with the Knowledge Gap Index is to identify, measure and visualize those inequalities, making sure that everyone can see the extent to which knowledge gaps are present in Wikimedia projects.
The Knowledge Gap Index is a multi-year research project. Our journey started a few years ago, when we began research on a systematic definition of knowledge gaps. We are continuing this journey by measuring these gaps and mapping them into metrics that everyone can look at to see the extent of knowledge gaps. And finally, we are looking at ways to visualize the gaps and surface these measurements in a very easy-to-understand format. Let me take you through this research journey. We started in 2020 by releasing the taxonomy of knowledge gaps, that is, a structured list of all possible inequalities that we can find in Wikimedia projects. This is the result of a few years of research looking at many references, sources, surveys and community initiatives, looking for evidence of inequalities in Wikimedia projects.
And so the result is this structured list of knowledge gaps across content, readers and contributors. And just to zoom in on the taxonomy, the inequalities that we can find in Wikimedia projects can be very different. The gender gap is one of the most famous inequalities, but there are more potential inequalities: geographic, multimedia, age, structured data, and so on. And so once we had a systematic definition of the inequalities we can find in Wikimedia, the next step was to quantify those knowledge gaps. The idea here, and this is still an ongoing process, is to map each of the gaps in our taxonomy into metrics, a few numbers that can tell us the extent or the presence of that knowledge gap in Wikimedia projects. And by generating data around knowledge gaps in this way, we can get some interesting insights about the state of a knowledge gap.
Let me give you some examples from the gender gap. One of the ways in which we can measure the gender gap is by computing the number of biographies for each gender. What we see here is that the vast majority of articles are about men, but thanks to the way in which we map the biography data to genders, we can actually see the distribution across all the different gender-diverse categories. Another way to measure the gender gap is to look not only at the quantity of articles, but also at the quality of articles about different genders. What you see in this plot is the evolution over time of the average quality of articles about men, women and gender minorities. And what we see is that the average quality of articles about women, especially in the past few years, is getting higher than that of articles about men.
And what we also see here is that the average quality of articles about gender minorities is increasing at a very fast pace, and it is now much higher than that of articles about men or women. And this really speaks to the organized efforts around increasing the quality and quantity of articles about gender minorities. Another way in which we can look at the gender gap is by looking at the visibility of articles. Here what we see is the average number of page views that an article about a man, a woman or a gender minority person gets over time. What we see is that an average article about a gender minority person gets far more page views than articles about other genders.
And finally, while most of the data that we saw until now was aggregated across all Wikipedias, thanks to the way in which we compute this data we can actually compare the same measurements across different language editions of Wikipedia. For example, what you see here is the average quality of articles for different genders on the Chinese, English, Malay and Tamil Wikipedias. These are the languages most spoken in Singapore, which is why we chose them. What you see is that these numbers vary from wiki to wiki, so this can also be an interesting perspective to explore in this data.
You can find this and much more in a dataset that we released a few months ago. These knowledge gap metrics datasets are now available for you to look at, and they contain a lot of data about knowledge gaps. They contain metrics for five knowledge gaps: gender, as we just saw, but also geography, sexual orientation, time and multimedia. And for each of these gaps, you will find data about the overall quantity of articles (by different genders, for example), the quality of articles, page views and revisions.
This data is available at the link below, and we will also make sure that the link is included in a meta page that will be attached to the comments page for this video. And so the knowledge gaps data is now available to the public, and its format is like a gigantic table. For some of us, this might be the ideal format in which to explore this data, because we can just plug it into our own way of visualizing, analyzing and manipulating data through code and other tools. However, in order to make this data more explorable and browsable by people who are not data-savvy, we are working on tools and systems to visualize knowledge gaps in easy ways. One such tool is actually being deployed as I speak. And again, I will make sure that all the documentation is included in the meta page attached to this video.
The first way in which we want to make this data more available is through an API. It's an API that allows us to easily query just the portion of the data that we need for our analysis, rather than downloading a gigantic table. This API has a root URL that is the same for everyone, and then you can specify three different parameters. The first one is the type of gap that you want to get the data for, for example gender, geography or sexual orientation. The second parameter is the specific category that you're interested in. For example, in the case of the gender gap, this can be female, male, non-binary, et cetera. And the third parameter is the period, the slice of history that you actually need for your analysis. So instead of getting the data since the beginning of time, you may want to specify a specific month or a specific year. This parameter allows you to specify the beginning and the end of the period that you're interested in.
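To make the shape of such a query a bit more concrete, here is a minimal sketch of how a root URL and the three parameters (gap type, category, period) might be combined into a request URL. Everything specific in it is an assumption for illustration: the root URL is a placeholder, and the path layout and the `start` and `end` parameter names are hypothetical. The real ones are in the documentation on the meta page.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the real root URL is given in the API documentation.
ROOT_URL = "https://example.org/knowledge-gaps"


def build_query_url(gap, category, start, end, root_url=ROOT_URL):
    """Combine the root URL with a gap type, a category and a time period."""
    # Assumed layout: gap type and category as path segments,
    # beginning and end of the period as query arguments.
    path = "/".join(quote(part, safe="") for part in (gap, category))
    query = urlencode({"start": start, "end": end})
    return f"{root_url}/{path}?{query}"


# Example: the gender gap, category "female", for the year 2022.
url = build_query_url("gender", "female", "2022-01", "2022-12")
print(url)
```

The sketch only builds the URL and does not send a request, so it can be adapted once the actual root URL and parameter names are known.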
Another way that we have provided for you to explore this data is through notebooks: pre-populated code that allows you to visualize different aspects of knowledge gaps in a relatively intuitive form. Caroline has worked on these beautiful notebooks, and she has prepared a video in which she walks us through them so that you can make them your own. So I'm going to play this now.
This notebook provides you with our code for reading, wrangling and visualizing public knowledge gaps data, to help you and your colleagues answer questions that you might have about gender gaps in Wikipedia content. After providing you with some basic setup code, as well as code for loading and processing our public gender gaps data, we provide you with code to help answer some questions. To start with, how can I see the current number of articles for males, females and gender minorities across all Wikipedias? Here we provide you with the code to see those numbers and percentages in both table format and pie chart format. But these are the current numbers and percentages. What if I want to see that data across time? In the code below, we show you how to wrangle and plot cumulative articles created over time for these three gender categories across all Wikipedias. But what if, instead of all Wikipedias, you're interested in one Wikipedia edition? Scroll down and you will find the code to generate a plot that shows the same data as above, except for Malay Wikipedia, Malay being one of the languages spoken in Singapore. Here in the code, we've indicated where you can replace MSWiki, which is Malay Wikipedia, with another Wikipedia edition of your choice, and then you'll be able to generate the plot for that Wikipedia. But what if you're interested in multiple Wikipedia editions? Well, scroll down and we've provided you with the code for generating a plot that compares four different Wikipedia editions.
Here we compare Chinese, English, Malay and Tamil Wikipedias, four languages spoken in Singapore. But we've indicated up here in the code where you can replace ENWiki, MSWiki, TAWiki and/or ZHWiki with the Wikipedia editions of your choice. I hope you enjoy exploring the rest of this notebook, as well as additional notebooks that we'll be providing in the future, and the rest of your time at Wikimania.
So again, the link to this notebook will be included in the meta page attached to this video. And I want to thank Caroline again for putting together these notebooks. And this is everything we had for you today. I hope you enjoyed our journey towards knowledge gaps and the Knowledge Gap Index, and that you have some tools to make this data and this work your own. We are very much looking forward to hearing your feedback about any of that. And again, the link for feedback will be included in the meta page attached to this video. Thank you all for listening, and enjoy Wikimania.
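As a rough illustration of the kind of wrangling the notebook walkthrough describes (numbers and percentages per gender category, for all Wikipedias or for editions of your choice), here is a minimal sketch. It is not the notebook's code: the rows are toy numbers invented for illustration, and the row layout and function names are assumptions.

```python
from collections import defaultdict

# Toy rows invented for illustration only: (wiki, gender category, article count).
ROWS = [
    ("enwiki", "male", 80),
    ("enwiki", "female", 18),
    ("enwiki", "gender minorities", 2),
    ("mswiki", "male", 40),
    ("mswiki", "female", 9),
    ("mswiki", "gender minorities", 1),
]


def articles_by_gender(rows, wikis=None):
    """Sum article counts per gender category, optionally for selected wikis."""
    totals = defaultdict(int)
    for wiki, gender, count in rows:
        if wikis is None or wiki in wikis:
            totals[gender] += count
    return dict(totals)


def as_percentages(totals):
    """Convert per-category counts into percentages of the overall total."""
    overall = sum(totals.values())
    return {gender: 100 * count / overall for gender, count in totals.items()}


# All editions together, then a single edition (swap "mswiki" for another one).
all_wikis = articles_by_gender(ROWS)
malay_only = articles_by_gender(ROWS, wikis={"mswiki"})
print(all_wikis, as_percentages(all_wikis))
print(malay_only)
```

Passing a set with several wiki codes to `wikis` gives the combined counts for those editions; calling the function once per code gives the side-by-side comparison.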