Good morning everyone and welcome to the December edition of our research showcase. This is Dario from the Wikimedia Foundation research team, and today I'm joined by Leila Zia, senior research scientist in the team, and by a number of co-authors who are here in the Hangout today, to talk about work that we've been doing at the Foundation in collaboration with researchers at EPFL and Stanford University on using recommender systems to fill knowledge gaps in Wikipedia. I'm not going to spend too much time giving an introduction; Leila's going to go over this. Just a few service announcements. As usual, we have our IRC channel. Jonathan Morgan is going to be hosting on IRC, so if you have questions, please raise them to Jonathan, who will ask them at the end of the presentation. And today we only have one presentation, so we'll have ample time for Q&A at the end of the session. Stick around and enjoy the show. And with that, Leila, the floor is yours. Thank you Dario. Hi everyone. My name is Leila Zia. I'll be talking about recommendation systems and knowledge gaps in Wikipedia today. This is joint work with a long list of people, but specifically in research it's with Bob West at EPFL, Tiziano Piccardi, Diego Saez from the research team at the Wikimedia Foundation, and Michele Catasta and Jure Leskovec at Stanford. It's been a while since we have talked about recommendation systems and knowledge gaps in Wikipedia, so I figured it's a good time, now that we have the full hour, to step back and talk a little bit about how we started, where we are today, and where we want to head next. I think of this presentation, as usual, as the kind of presentation that will hopefully spark more conversation and create more questions than it answers. So with this, let's rewind. Let's go back to 2015. Back then, we started looking at these maps of Wikipedia.
So basically, this is the map of the world, and every point on the map is associated with a geo-coordinate. If the point is white, that means there are 10 or more Wikipedia articles associated with that geo-coordinate, in English Wikipedia in this case. If the point is orange, there are fewer; I can't remember exactly how many, it's a logarithmic scale. If it's blue, there is one. And if it's black, it means there is no article about that geo-coordinate on Wikipedia, at least as far as Wikipedia knows. What you see here is that if English is the only language you speak, there are some places in the world that you can learn about if you read Wikipedia: specifically, good parts of the US and the western part of Europe, some places around Iran for some reason, and then sporadic dots all across the world. Now, as you go to other languages, you see that the number and intensity of dots drop significantly, which means people in these languages basically have less access to content. If Russian is your native language and you want to learn about the world using Wikipedia in Russian, this is how much you can learn about the world. If you speak Spanish, you can basically learn about Spain, but the whole of South America is covered only very sporadically, not fully covered at all. And if you look at Portuguese, which is a language you would expect to see more of in South America, that definitely doesn't happen either. You can say that there are parts of South America, in Brazil, that are basically the Amazon jungle and not heavily populated, but even in very populated areas you see that the coverage is not very good: you have at best one article for most of the points. If you speak Arabic, the world becomes much darker. If this is your only language, this is how much you can learn.
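As a concrete illustration of the coloring scheme, here is a minimal Python sketch of how points on such a map could be bucketed by article count. The exact thresholds are assumptions (the talk only says the scale is logarithmic), and the grid-cell rounding is purely illustrative:

```python
def color_for_count(n_articles: int) -> str:
    """Map the number of articles at a geo-coordinate to a map color.
    Thresholds are illustrative: black = none, blue = one,
    orange = a few, white = ten or more."""
    if n_articles == 0:
        return "black"
    if n_articles == 1:
        return "blue"
    if n_articles < 10:
        return "orange"
    return "white"

def grid_colors(coords):
    """Aggregate per-coordinate article counts (rounded to a coarse
    grid cell) and assign each cell a color."""
    counts = {}
    for lat, lon in coords:
        cell = (round(lat), round(lon))
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: color_for_count(n) for cell, n in counts.items()}
```

With real data the coordinates would come from the geo-tagged articles of one language edition, one map per language.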
And interestingly, if you overlay all languages that we have a Wikipedia language edition for, this problem doesn't go away. So basically, even if you speak all the 300-plus languages for which we have a Wikipedia language edition, this is how much you can learn from Wikipedia if you want to learn about the world. Now, by looking at these plots, you can gain a few insights. One is that we have a lot of missing articles across languages. The second is that not all of these missing articles are available in at least one Wikipedia language: as you saw in the overlay of all languages, there are still a lot of points that are dark or don't have many white or orange points associated with them. This means that Wikipedia alone won't be the solution for all knowledge gaps. We need to look outside of Wikipedia to try to close the gaps that we have in Wikipedia, on top of trying to look across languages and import content from one language to another. So our goal in general, the goal we set in 2015 and which is still relevant today, is that we want to increase article coverage, in terms of the number of articles in different languages and the contents of these articles within a language, by identifying and prioritizing missing content and routing attention where it's needed. So back then we started a project, growing Wikipedia across languages via article creation recommendation. This was joint work with Ellery Wulczyn, Robert West and Jure Leskovec. What we did back then is that we defined a system. This is what the system would do: it would find articles that are missing in a given language, let's say French Wikipedia, and then it would prioritize them with some notion of importance. The notion of importance that we defined in 2015 was a prediction model that would predict how many page views the article would receive if it got created in French Wikipedia, as an example of a language. So we find missing articles in a language.
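The first two steps of the system described here, finding missing articles and ranking them by importance, can be sketched as follows. The `sitelinks` structure (concept to set of language editions that cover it, in the spirit of Wikidata sitelinks) and the `predicted_views` table are hypothetical stand-ins for the real data pipeline and the page-view prediction model:

```python
def missing_articles(sitelinks, source, target):
    """Return concepts that have an article in `source` but not in
    `target`. `sitelinks` maps a concept id to the set of language
    editions covering it (illustrative data structure)."""
    return [
        item for item, langs in sitelinks.items()
        if source in langs and target not in langs
    ]

def rank_by_importance(candidates, predicted_views):
    """Prioritize missing articles by predicted page views in the
    target language, the 2015 notion of importance. Unknown items
    default to zero predicted views."""
    return sorted(candidates,
                  key=lambda a: predicted_views.get(a, 0),
                  reverse=True)
```

In the real system the ranked list would then be matched to editors' interests for the personalized recommendations described next.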
We rank them according to some notion of importance. And then what we do is that we recommend these articles to editors of that Wikipedia language. We did an experiment back in 2015 where we chose the top 300,000 articles that were missing in French Wikipedia. We divided these articles into three sets of 100,000 articles each. And then we looked at the 12,000 editors who had edited at least once on French Wikipedia in the six months prior to the experiment. What we did is that we used the first 100,000 articles to do personalized recommendations to the first batch of editors. To the second batch, we recommended random but important articles. And we kept the last batch of 100,000 articles untouched. The goal of doing this was to be able to say: if you don't touch the articles, if you don't recommend them, how much organic growth will happen in Wikipedia on these articles, and then how much would be the impact of personalized or random recommendations. What we learned back then is that you can increase the article creation rate by a factor of 3.2 if you do personalized recommendations versus doing nothing. And we also learned that personalization actually matters: if you do personalized recommendations versus recommendations of important articles that are not adapted to the user's interests, you get better growth. The activation rate here captures whether, when you receive the recommendation, you actually act on it or not, whether you start editing the article or not. And we see twice as much growth in those who had received personalized recommendations. On the ground, the result of the experiment was that we started the GapFinder project. So this is a tool that you can go to. You specify a source language and a destination language.
And then you can basically search there for articles that are missing in one language but available in another, and then you can translate them or create them from scratch. We also built an API, and that API is being used in the Content Translation suggestions feature. This API is perhaps also used by other people who are building tools; that's not something we know much detail about today. So here's how GapFinder works. You choose your source language, say English, and the destination language, Persian. The only reason I'm showing you English here is that I want you to be able to read the recommendations; obviously you can do it from Persian to English, and in both directions for other languages. You specify what you're interested in, in my case agriculture, and what I see is the articles that are available in English but missing in Persian, with some incentives for creating these articles: these are the page views that the article has in English Wikipedia. Once you click on one of the recommendations, you see the full article in English Wikipedia and you can decide whether you want to create the article from scratch, translate it, or flag the article as not interesting for you or not notable in your language. You can do this for other languages as I mentioned. So for example, this is an example from English to Malagasy. You type climate change if you're interested in that topic, and you can see the list of articles that don't exist in Malagasy but exist in English, and you can translate or create them from scratch in that language. So, what we learned in 2015 was a few things. One is that recommendations work, and personalized recommendations work. But from 2015 to now, basically starting in 2016, we went through what I call a series of revelations. First, we learned about stubs.
Not that we didn't know about them, but we started hearing more and more about the issues around stubs. We were reminded by the editors that there are many articles already in Wikipedia that are stubs, and there are many comments about this topic all across the different lists that we are part of. I'm listing one of the examples here. Stuart says on Wiki-research-l: we have enough low-quality stub articles that need human effort to improve, and we are not really interested in more unless either they demonstrably combat some of the systematic biases we are struggling with, or they demonstrably attract new cohorts of users that can do the improvements. We also learned about onboarding newcomers and the challenges of onboarding newcomers. We learned about initiatives such as the Africa Destubathon, a project with the goal of de-stubbing the articles about Africa in different languages. We did user interviews with people who organize these kinds of destubathons in Africa. What we learned through these user interviews is that a lot of manual work goes into building templates. As somebody who organizes an editathon at a large scale, you're embarking on a project where you want to engage people who can easily learn how to edit Wikipedia by reading a set of steps, so you're challenged to create a set of templates, a set of step-by-step instructions, that can help the user understand how to edit Wikipedia. Specifically in the case of stubs, one of the biggest challenges for the editathon organizers was to extract the information that a newbie needs to know on how to expand, for example, an article on biographies. For example, if this is your first time coming to Wikipedia and you want to expand an article on the biography of person X, you need to know how to do this, and there are some basic things you need to know. For example, what sections should go into a biography article? What should be the length of a section?
What kind of other articles should the section link to? What kind of images do you usually put there? These kinds of basic things that experienced editors have learned over time are completely unknown to newcomers, and creating these templates from scratch requires a lot of manual work from editathon organizers. I think this was something that we empathized with very heavily at the personal level, especially as you go to regions where volunteer resources are extremely limited. They're always limited, but there are regions where we have a very clear lack of resources. If you have one volunteer whose time is spent on creating templates from scratch, going across different languages, this is a big challenge for the work that this person does for the movement. We also looked at existing initiatives. There are initiatives such as WikiDaheim, or Ma Commune on French Wikipedia, that try to do the work of editathon organizers in a more systematic way. What they do is that they focus on specific types of articles, and then for these types, they extract templates of sections and characteristics of these sections. They basically hard-code these in the back end. As the user, when you go to the websites for these projects, what you see is that if you want to expand an article about a city in France, you will get recommendations on the right-hand side that tell you these sections are missing or these sections are incomplete. It also gives you some more contextual information: sometimes, for example, on average how many characters or bytes are associated with such a section in similar articles in French Wikipedia. What is nice about these initiatives is that they are a confirmation that there is a need for providing step-by-step guidelines for newcomers or less experienced editors to learn how to expand already existing articles.
It also provides an opportunity for us to learn how we can surface the work that we do if we do research in this space. Lastly, we also looked at numbers. This is where we learned that in English Wikipedia, 37% of the articles have a stub template. I'm talking about the template because there are always discussions about whether the template has or hasn't been removed; obviously, this number doesn't tell you the exact count of stubs, but it gives you a sense of the rough numbers that we're talking about. Out of the more than 5 million articles, 37% of them are likely to be too short to be of encyclopedic value. Only 1% of English Wikipedia articles have been labeled as good or better. And 80% of the sections created in English Wikipedia are used in only one article. We have a potential, because every month 14,000 accounts get created across Wikipedia projects. The other thing that we did when we looked at the numbers is that we looked at plots like this. What you see here on the x-axis is the section count, from 0, 1, 2 all the way to 14. And on the y-axis, you see the normalized frequency of articles that have such section counts. The orange plot, the one that I'm walking over with the mouse, is plotted for all articles. It shows that more than 20% of all articles on English Wikipedia have only one section. Now if you look at the blue plot, the one for high-quality articles, what you see is that pretty much 0% of them have only one section; in fact, close to 20% of them have six sections. So this is telling us that for high-quality articles on English Wikipedia, sections would be one of the right places to start from. We know that articles in general have very few sections, but high-quality articles have more sections. So that was a signal for us that maybe we can start by looking at sections and figuring out how to do section recommendations.
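The section-count distribution behind that plot is straightforward to compute; this is a minimal sketch assuming you already have each article's list of section headings:

```python
from collections import Counter

def section_count_histogram(articles_sections, max_count=14):
    """Normalized frequency of articles having 0..max_count sections,
    mirroring the plot described above. `articles_sections` maps an
    article title to its list of section headings; articles with more
    than max_count sections are clipped into the last bin."""
    counts = Counter(min(len(s), max_count)
                     for s in articles_sections.values())
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(max_count + 1)]
```

Running this once over all articles and once over only the good-or-better articles gives the two curves compared in the talk.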
So we had a few takeaways here. One is that we have a lot of missing content in Wikipedia articles. Another is that editathon organizers are spending a lot of time manually extracting templates for onboarding newcomers. There are initiatives that have identified these needs and are working on them, but the problem for these initiatives is that they cannot scale: they can only focus on specific topics, because manually creating templates for many topics is not scalable. So we started imagining. This phase of exploration and imagining involved heavy interactions with one of our designers, who has been with us on this gaps project for many years now. We said, okay, let's step back and imagine. Let's imagine that we have a system that breaks down the article structure into different pieces. For example, you can imagine breaking down a Wikipedia article into the lead paragraph, infobox, sections, images, links, references, etc. Suppose that the system can provide you with recommendations on what's missing from an article and tell you that this missing content should be added to it. Also imagine that the system can provide you with more in-depth information about the thing that is missing. This is, by the way, a key factor for all the recommendation systems we're working on these days: we need to know why a recommendation is given, and what kind of other in-depth information can be provided to the editor so that the editor can make the right choice. One example of such in-depth information is that you can say the early life section in biographies on English Wikipedia usually has on average N characters. So we decided to start with sections and build a section recommendation system.
Before I move on to some details about this section recommendation system, I want to wait a little bit and maybe take a few questions if something is unclear, because I've been moving quite quickly through this part. Jonathan, is there anything from the room? Nothing from IRC at the moment. Okay, I'll move forward then. So this is joint work with Tiziano Piccardi, Michele Catasta, Diego Saez, and Robert West. Here is where the problem starts. Suppose you have an article; this is the English Wikipedia article about a Kurdish city in Iran called Sanandaj. You look at the article and you see that there are four sections in it. Two of the sections are the infamous sections, references and external links, that appear in pretty much all articles. So the article really has two sections: society, and famous people connected to Sanandaj. Now suppose I'm an editor and I want to expand the list of sections for Sanandaj and start working on it. If English is the only language that I speak, what I'll do is start looking for other articles that are similar to Sanandaj. In this case, I will probably look at the capital of Iran, Tehran. I know that's a city, so I will go and look at its list of sections, and I say, okay, the capital has these sections, so probably another city in Iran should have at least a subset of these sections. This is what we call intra-language information extraction: the only language I know is English, and I look at English Wikipedia articles to extract information about the section structure and section content for an article with fewer sections. The other thing I can do, if I speak more than one language, in this case also Persian, is to go to Persian Wikipedia and look at the article there. And what I learn is the sections that are used in Persian Wikipedia.
And in this case I have actually cross-checked them with English Wikipedia, and I know that not all of these sections are the same. So I can also extract information from other languages when I'm expanding an article on English Wikipedia. This is what we call inter-language information extraction. For most of this presentation, I'll focus on the intra-language recommendations. The inter-language recommendations are something that we have recently started working on with Diego, and we can discuss them more in the discussion and Q&A part. So here's the problem statement. Given an article A in a language L, we want to recommend a list of sections to be added to the article, considering articles similar to A in the same language L. You see on the right-hand side the design recommendation by Pau. If you have such recommendations, you can list them, potentially on the right-hand side of the article for a left-to-right language in the visual editor, and you can empower the user to edit the article by adding new sections to it. But the problem is that we don't know how exactly to define similarity. There are at least two ways of doing this. Looking at an article on English Wikipedia, there is the article text that we can use, but there are also the categories, which usually show up at the bottom of the article on English Wikipedia. This is exactly what we're going to do. In the next two slides, I'm basically going to show you nine months' worth of work in three minutes. So the first set of approaches that we tried is article-based recommendations. Here we have tried two different approaches. Let me step back: for all these systems, we have an input, which is the article. In this case, for example, Stanford, California; think of it as the city, not the university. So this is the input article.
And at the output, what we want to receive is a list of sections to be added to the input article: sections that are missing from Stanford, California, and that the recommender system tells us to add to the article. So one way to do this is to look at the article text. We look at the text or content of the article, and we build a topic model for each article using the full text of the article. This way, you get a mapping from each article to, let's say, 200 topics provided by an LDA model. And then, through a series of steps, we count how many times each section has occurred, and we normalize those counts by the LDA weights of the articles belonging to each topic. Through this topic modeling, we create a list of recommended sections for this article. Another approach we can try is matrix factorization. Here we can say: let's forget about the text of the article, because some articles, although their text is similar, are not actually similar in terms of structure. So we ignore the text of the article and instead focus on its sections. What you see in this matrix on the right-hand side is the list of articles in English Wikipedia as rows, so assume that you have 5 million plus rows in this matrix, and all the sections in English Wikipedia as columns, which is more than a million sections. This matrix basically tells you whether section j belongs to article i or not. So you create this matrix, then you do matrix factorization, and you use collaborative filtering to get a set of recommendations for sections to be added.
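Here is a toy sketch of the matrix-factorization idea, using a plain truncated SVD as the factorization (the talk does not specify the exact model used, so take that choice as an assumption). The toy articles and sections are invented; on real data the article-by-section matrix is huge and sparse, so this is only illustrative:

```python
import numpy as np

# Toy article-by-section incidence matrix: rows are articles, columns
# are section titles; 1 means the article already has that section.
sections = ["History", "Geography", "Economy", "Education"]
articles = ["Town A", "Town B", "Town C"]
M = np.array([
    [1, 1, 1, 1],   # Town A: a well-developed article
    [1, 1, 1, 0],   # Town B
    [1, 0, 0, 0],   # Town C: a stub with only "History"
], dtype=float)

# Low-rank factorization: keep only the top-k singular components and
# reconstruct the matrix, which fills in soft scores for the zeros.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend, for the stub, the sections it lacks, ranked by the
# reconstructed scores.
stub = articles.index("Town C")
candidates = [(sections[j], M_hat[stub, j])
              for j in range(len(sections)) if M[stub, j] == 0]
candidates.sort(key=lambda t: t[1], reverse=True)
```

Note that "Geography" and "Economy" have identical columns here, so the reconstruction gives them identical scores, which is exactly the structure-based similarity the approach relies on.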
The gist of how this approach works: collaborative filtering generally operates in a context where you have a user and you want to decide what products to recommend to them. What you do is find other users who are similar to this user, look at what they have bought, and if many of them have bought a specific thing, you recommend it to your user, who is similar to these other users. In our case, a user corresponds to an article, and a product corresponds to a section. We have an article, and we want to see which other articles are similar to it and what sections those articles include, and then include those in the article we are focusing on as well. There are advantages to using collaborative filtering, and that's one of the reasons we wanted to see whether it works well in our context. One of them is that it respects the serendipity of Wikipedia better than some of the other approaches, in the sense that it doesn't focus only on popularity measures such as how many times a section has occurred; it also considers other factors, and that can help us find sections that are not very frequent but should still exist in our recommendations. These are two approaches, and I'll come back to their evaluation after I tell you about the next one. We also tried four other approaches using the category network in Wikipedia; in total, we have tried around seven different approaches to see which one works best. Here, we ignore the text of the article, and instead we focus on the category network of Wikipedia. We look at Stanford, California, and then we look at the list of categories that this article belongs to. The idea here is that we want to use the category network of Wikipedia to find similar articles.
That is, all articles that belong to a category should be similar to each other in some sense. While in theory this is correct, for the Wikipedia category network specifically it is not, unless you prune the network. The reason is that you can have categories that many things belong to. These may not be hierarchical categories, and they, or even subsets of them, may not be taxonomic. We need to prune the category network to turn it into a taxonomic category network. Now, I am almost proud to say that we have failed at the general task of pruning the category network for Wikipedia. However, we have developed a methodology for pruning the network specifically for the task of section recommendation, which we can cover in a following research showcase; it needs a little bit more time to dig deeper into it and see how we have done it. But anyway, we pruned the category network, and the idea here is that if a category is not pure, we throw out that category, and if it is pure, we include it and we look at all articles that belong to that category. Now, one of the easiest things you can do is this: you know Stanford, California belongs, let's say, to university towns in the United States; that's category three. So you look at this category three and at all articles that belong to it, and then you look at the sections in all these articles. And what you see is that, for example, 79 times education has occurred as a section in these articles, or 19 times history has occurred as a section. So you can learn how many times each section has occurred, and you can also learn about the co-occurrences of sections within these categories. Once you have these counts of sections in these categories, you can directly create a ranking of these sections and recommend them. That's one approach.
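The count-based category approach can be sketched in a few lines. The data structures here (`article_sections`, `category_members`) are illustrative stand-ins for the pruned category network:

```python
from collections import Counter

def count_based_recommendations(article, article_sections,
                                category_members, k=5):
    """Rank candidate sections for `article` by how often they occur
    in articles sharing a (pruned, pure) category with it.
    `article_sections` maps a title to its set of sections;
    `category_members` maps a category to its member titles."""
    counts = Counter()
    for members in category_members.values():
        if article in members:
            for other in members:
                if other != article:
                    counts.update(article_sections.get(other, ()))
    have = article_sections.get(article, set())
    ranked = [(s, n) for s, n in counts.most_common() if s not in have]
    return ranked[:k]
```

The returned counts double as the kind of interpretable evidence mentioned later: "education occurs 79 times as a section in this category".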
Basically, do nothing else other than counting. The other approach would be, again, to use collaborative filtering in this space and see whether collaborative filtering can improve the results. We can also do collaborative filtering and then learning to rank, which is another approach that may improve the results. And lastly, we can use the count-based approach and do learning to rank on top of it and see what the results look like. One thing I want to mention here is that learning to rank is useful in the sense that many articles belong to multiple categories. For example, let's say Stanford, California belongs to two pure categories, C1 and C3. What learning to rank allows you to do is to merge the recommendations from these two categories instead of focusing on only one of them. And we want to see if, by merging the results, we can get better recommendations than by focusing on just one of the two categories. Intuitively, you expect that if you combine information, you can reduce the noise and do better, if the way you have pruned the category network has done a good job. Okay, I'll talk briefly about evaluation. There are seven approaches, and we have done two types of evaluation. One is automatic evaluation and the other is human evaluation. In the automatic evaluation, we created a set of test articles and computed precision and recall on those. And the other is human evaluation, where we did crowdsourcing to gather labels on whether a recommended section is a good section for an article, and we did this both with experienced editors and on a crowdsourcing platform. I'm not going to show you the evaluations for all seven approaches; they are on the meta page that was linked earlier and will be linked at the end of this presentation.
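The automatic evaluation described here amounts to computing precision and recall at k against an article's held-out sections; a minimal sketch:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Automatic evaluation: hold out an article's true sections
    (`relevant`), take the top-k of the ranked `recommended` list,
    and score by exact title match (the exact-match limitation
    discussed below applies here)."""
    top = recommended[:k]
    hits = sum(1 for s in top if s in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Sweeping k from 1 upward produces precision and recall curves like the ones shown on the slide.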
So what you see here is this blue line that I'm walking over with the mouse; that's the precision for the count-based approach using the category network on Wikipedia. And this other blue one is the recall. The green one, which is slightly above it, is what we gain by using learning to rank on top of just using the raw counts of section occurrences in the category network. And you also see the upper bounds on recall and precision as these red plots. What you see here is that we are doing a reasonable job in terms of precision and recall in the blue plots, but generally they're kind of low. This is not the kind of thing that you want to put in front of a user. Now, the issue that we suspect is the reason for this relatively low precision and recall is that we are doing exact matching of section titles against the already existing sections in a Wikipedia article. And there are two problems. Sometimes there are typos in sections, and if it's not an exact match, our algorithm considers that a bad recommendation. But there are also semantically related words that are used interchangeably, for example, movie and film. Both can be sections, and our algorithm at this point will differentiate between these two, while a human will understand that they are pretty much the same thing. So we expected that if we did human evaluation, we would get better results, and that's in fact what we see here. What you see on the right-hand side is two types of evaluation. The editor evaluation, the experienced-editor evaluation, is the lower blue-cyan plot that you see. And then you see the purple one, which is the crowd-based evaluation, which is higher. So a few things to note here. One is that, obviously, a crowd of non-Wikimedians is more lenient, with close to 97%, 98% here; Wikimedians are around 90% here.
But generally, what we see is that the outside crowd is more lenient, which is expected. We also see that generally the method is doing really well. So when we put this in front of the Wikimedians, the recommendations seem to be reasonable. If we recommend only one section, the precision and recall are around 90%. And as we increase the number of sections that we recommend, the precision and recall drop; when we are recommending around 10 sections, it's more like 78%, 79%. So we are obviously happy with this result. It's much more than what we expected. And in our context, it's also not really necessary to make the system completely perfect in terms of section recommendations, because you will always have a human in the loop, whether this human is the person building a tool like Ma Commune, where there are going to be 1,000 sections recommended for a series of articles that an experienced editor can go through and accept or reject, or whether the human is a newcomer who can evaluate whether the quality of a recommended section is good or not. So that's where we are with evaluation right now. So let me go to the conclusion. For this specific second part of the research, we have introduced the section recommendation problem. We have stated it as a problem and we have defined what it means. We have explored several methods using features derived from the raw input article; these are the topic-based methods, as well as methods using Wikipedia's category network. We showed that the category-centric count-based approach does best. Of course, if you do learning to rank, you can do a little bit better for a Wikipedia language with a large category network, such as English Wikipedia. We also learned, and this is I think one of the best learnings for me, that English Wikipedia's category network is key in offering useful recommendations.
And it's important to call this out, because we generally complain a lot about category networks in Wikipedia. Whenever we sit down as researchers and want to work with the category network, it's a very hairy network that is very hard to work with. But this is the first time we are seeing that if you actually use the category network in smart ways, it has a lot of signal in it that makes recommendations, interpretable recommendations, possible, which is key in our environment. Now I'm going to go to the discussion and then open it up for questions. These are a few things we have been thinking about. One question is: what about non-English wikis? Our usual number one question. The reality is that the methods we have developed are language-independent; we can apply them to any language at any time. The caveat is that if the language edition is very small, or the category network in that language is small, the category-based approach will not work, and we have seen that the LDA and topic-based approaches won't work either. So if the language edition is small, we need to think about something else. We expect that the methods work at least in other large languages, so the next thing we want to do is try the method on French Wikipedia. That's something we're working on right now: adapting the methods for French Wikipedia, building recommendations, and running them by French Wikipedia editors to evaluate their quality. And the case where the language edition is really small is the part we have started working on right now. The idea here is that if a language edition is very small, we need to look at other languages to extract information. This is something that is actually being done by editors and organizers: they use other languages that are similar to that language to create templates. Now we want the machine to do this, and it's not an easy task.
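As a rough sketch of the cross-language idea, assuming the section-alignment problem mentioned below is already solved: given an article's versions in other languages, map their section titles to language-independent concepts and count which concepts the target-language version is missing. The alignment table, language codes, and titles here are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical alignment table mapping a (language, section title) pair to a
# language-independent section concept. Building this table is exactly the
# hard "section alignment" problem and is assumed solved for this sketch.
ALIGNMENT = {
    ("en", "Biography"): "biography",
    ("fr", "Biographie"): "biography",
    ("de", "Leben"): "biography",
    ("en", "Career"): "career",
    ("fr", "Carrière"): "career",
}

def cross_language_candidates(langlinks, sections_by_lang, target_lang):
    """Collect aligned section concepts from an article's other language
    versions, dropping concepts the target-language article already has."""
    counts = Counter()
    for lang in langlinks:
        if lang == target_lang:
            continue
        for title in sections_by_lang.get(lang, []):
            concept = ALIGNMENT.get((lang, title))
            if concept:
                counts[concept] += 1
    existing = {ALIGNMENT.get((target_lang, t))
                for t in sections_by_lang.get(target_lang, [])}
    return [(c, n) for c, n in counts.most_common() if c not in existing]

sections_by_lang = {
    "en": ["Biography", "Career"],
    "fr": ["Biographie"],
    "de": ["Leben"],
}
print(cross_language_candidates(["en", "de"], sections_by_lang, "fr"))
# → [('career', 1)]
```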
Some of you who are on wiki-research-l have seen the emails from us asking about this: we're basically stuck on the problem of section alignment. We need to be able to align sections across languages to be able to approach this problem. This is something Diego has been spending quite some time on, and he would be happy to discuss it with you later. We also need to improve the section recommendations. In general, although the results are good, there is room for improvement. One of the easiest things we can do is try to source signal from other languages, the inter-language part. Even for English Wikipedia, if we can source signal from a language like Persian Wikipedia, there are probably things for us to learn from a different language, even for a big language. The other thing, which is a more immediate low-hanging fruit, is semantically related sections. Right now the algorithm does not differentiate between these, and it's not very hard to improve it, at least in the languages where semantic analysis works relatively smoothly, to combine the sections that are semantically related. Providing more in-depth information to the editor is something that we are not doing right now but would want to do. Basically, other than saying this is a section you should consider adding, we want to be able to say what the characteristics of such a section are. This should be an easier step than all the things we have been working on so far. The other thing that we are not addressing right now is the order of the sections. Suppose you're an editor and we tell you here are the sections you should add. How do you know what the order of these sections should be? This is not something that we have worked on yet, and probably someone should pick it up in the future. It's not really immediate, but at some point we need to be able to address the ordering problem.
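One plausible heuristic for merging semantically related section titles, which also comes up later in the Q&A, is to compare co-occurrence patterns: two titles that mean the same thing tend to appear alongside the same other titles, but rarely together in one article. A rough sketch, with arbitrary toy data and an arbitrary similarity threshold:

```python
import math
from collections import defaultdict

def synonym_candidates(articles, sim_threshold=0.5):
    """Find pairs of section titles that co-occur with similar other titles
    but never appear together in the same article, a signal that they may
    be interchangeable (e.g. 'Youth' vs 'Early years')."""
    cooc = defaultdict(lambda: defaultdict(int))
    for sections in articles:
        for a in sections:
            for b in sections:
                if a != b:
                    cooc[a][b] += 1

    def cosine(u, v):
        dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    titles = sorted(cooc)
    pairs = []
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            if cooc[a].get(b, 0) == 0:  # never used together
                # Compare co-occurrence contexts, ignoring the pair itself
                ua = {k: v for k, v in cooc[a].items() if k != b}
                ub = {k: v for k, v in cooc[b].items() if k != a}
                if cosine(ua, ub) >= sim_threshold:
                    pairs.append((a, b))
    return pairs

articles = [
    ["Youth", "Career", "Legacy"],
    ["Early years", "Career", "Legacy"],
    ["Youth", "Career"],
    ["Early years", "Legacy"],
]
print(synonym_candidates(articles))
# → [('Early years', 'Youth')]
```

In practice one would also fold in string similarity and word embeddings, but the zero-direct-co-occurrence signal alone already separates the toy synonyms here.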
Including the less frequent, long-tail sections is also important. We have learned through this research that relatively few section titles are reused on English Wikipedia: if I'm not wrong, around 200,000 titles are used more than once, out of 1.6 million distinct titles in total, which means around 1.4 million titles have been used only once. This long tail is quite long, and many of these titles potentially should not really be used, but in this long tail there is also a lot of valuable information. The question is: what are the valuable sections being added by editors that our algorithms won't pick up right now, because they focus on frequency, and that we need to be able to pick up? There are different ways this can be approached, but it's definitely one thing that we should come back to at some point. The other thing is the problem of the entry point. In this research we consider the article as the entry point, but there is no reason not to think about the category as the entry point. This is one of the strengths of the line of research that we did with the category network on Wikipedia. What we learned from editors and organizers is that sometimes they just want to be able to give a category and get recommendations on what sections should be added to articles in that category, versus giving the articles one by one and getting recommendations for each article. The research is adaptable in that sense and can respond to this kind of entry point. What's next? I think the biggest thing is more languages. We need to build a tool or API for testing this further in other big languages, let's say French Wikipedia, but also in smaller languages. We need to continue thinking about and exploring what happens if the language edition is much smaller than French Wikipedia, and we need to test in these smaller languages.
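The head/tail split described above (roughly 200,000 reused titles versus about 1.4 million one-off titles on English Wikipedia) is, at its core, just a frequency cut over all section titles. A toy sketch, with a made-up one-off title for illustration:

```python
from collections import Counter

def frequency_split(all_section_titles, min_count=2):
    """Split section titles into a reusable 'head' (seen at least min_count
    times) and a long 'tail' of one-off titles, which frequency-based
    recommenders currently ignore."""
    counts = Counter(all_section_titles)
    head = {t for t, n in counts.items() if n >= min_count}
    tail = {t for t, n in counts.items() if n < min_count}
    return head, tail

titles = ["History", "History", "Career", "Career", "Etymology",
          "Notable pet iguanas"]  # hypothetical one-off titles in the tail
head, tail = frequency_split(titles)
print(sorted(head), sorted(tail))
# → ['Career', 'History'] ['Etymology', 'Notable pet iguanas']
```

The open question in the talk is how to recover the valuable entries in `tail` without also recommending the noise, which a pure count cutoff cannot do.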
We may need to develop a different set of technologies for these smaller languages; most probably the category approach won't work. There's the project I mentioned earlier, Ma Commune, on French Wikipedia. We have had some early conversations with the team behind Ma Commune, and we may want to start working with them to provide some recommendations to help them scale the scope of their work beyond just the articles about cities and towns in France. And then, at the end of the day, we need to put this in front of less experienced editors and newcomers and see whether this kind of system actually helps them in practice. We can do as much evaluation as we want right now, but once the system is in front of the users, they will need to make the call on whether it is useful for them. And I want to call out that this last bullet point is a product-heavy, design-heavy step. We have tried to develop this research by continuously discussing it with the community and the designers at the Foundation, to make sure that we don't end up in a direction which is exciting in research but less exciting in terms of product development. But this is going to be a very important step for product and design to get more involved in, if we want to put this in front of new editors in the long run. And then, obviously, other types of recommendations. We have started discussing with Miriam how we can do image recommendations for articles. Again, this should technically be an easier task that we should be able to do a good job on, and it would be quite impactful, because images are basically the way we can communicate across languages without requiring any words. So can we bring these images into the articles and make them more useful and understandable for their corresponding audiences? Infobox recommendation is another item, which is a simpler problem than the ones we have been working on. And probably other kinds of recommendations as well.
I'll stop here and then we can talk. Thank you. Thanks so much, Leila. I'm going to open up the floor for any questions we may have from IRC. I've got two questions and one possible comment so far from IRC. The first question is from Computer MacGyver. They say: first, great work. And then: to what extent do the related articles from category or text recommendations align with WikiProjects? Could WikiProjects be useful input training data for any step? Yeah, that's a very good question. This is something that we also discussed with Diego and Dario, because Aaron is also doing some work with WikiProjects. I think there are things for us to learn from the work being done in the context of WikiProjects. I can't comment on specifics; what I can say is that we're exploring that space. One thing is that WikiProjects are very language-specific, so they change from language to language. If we want to try to adapt our models and make them better in a given language, we can probably use the information in WikiProjects. I'm not sure whether we can use that for taking information across languages or not. But again, it's definitely something that we are considering diving into. Awesome, thank you. One or two thoughts from Nettrom: I have some thoughts, comments and questions. Two thoughts: one, consider a WikiProject as a point of entry. And two, for English, French and Russian, using ORES feature injection can allow you to predict quality changes from adding a section of k bytes. So let me take the first one; I don't think I understood the second one. On the first one, I guess my question is: can you map WikiProjects to categories? Because if you can do that, if a WikiProject can be represented well by a set of categories, you can basically provide these categories and get from the recommender system a set of sections to be added to articles that belong to those categories.
If the answer is no, if WikiProjects don't map to specific categories, then I need to learn more and see how we can structure this in ways that let us get more information from WikiProjects. That's my answer to Nettrom's question. We will see; I've asked them to clarify their second thought. Another question, from Apergos: what approach might you use to find and merge semantically related section titles? Yeah. Diego, do you want to comment on that topic? On the semantic relatedness of sections, what kind of approach can we use for combining sections that have the same meaning? So yeah, we were discussing this. For example, in German you have two ways to write the same section; I don't remember exactly, but I think geography you can write in two ways. One thing we were thinking is to look at section titles that often co-occur with the same other section titles, but don't co-occur with each other. So if you have two titles that mean the same thing, like, for example, "youth" and "early years", they might co-occur with similar section titles, but they will never co-occur with each other. We are thinking of including this to deal with these kinds of problems. All right. We have two more questions from Nettrom. Question one: maybe I missed this, but what are the reasons behind using section recommendation as the starting point for recommending to a user that they should add content to an article, as compared to other alternatives? Yeah. So I think there were a few reasons. One is that we wanted to choose something that was challenging enough. Starting, for example, with infoboxes: there has already been quite a bit of work on infobox recommendations, and it's not very clear what our research contribution would be beyond basically replacing some of the methods that already exist. So in general we stayed away from infoboxes for that reason, at least initially. It had to be challenging.
References were something that we considered, but that was complicated, because recommending citations in general is a complicated task, and if you want to go across projects, it becomes even more complicated. We felt that was too big a step for the first thing we wanted to do in this space. We also looked at community initiatives and had conversations with the community, and what we learned is that they are actually trying to manually create templates with these sections. So one of the biggest motivations was: can we do something that reduces all this manual work that is going into creating these sections for the community? The collection of all these reasons resulted in us starting from sections. Cool. So we have one more question from Nettrom, and I think Dario was in the queue after that. Onboarding newcomers was discussed. What's the idea behind using this to onboard newcomers? Yeah, so the idea is not very well developed yet. There is a general idea, which is that we want to be able to help newcomers. And this is a conversation that will obviously need to happen with each community: how do they onboard? What does it mean for them to onboard newcomers? What are the sets of tasks that they're considering? What are the guidelines that they want the newcomer to be aware of? We are not getting to that level of detail right now. What we're doing is focusing on one aspect of onboarding, which is helping the newcomer learn how to expand an already existing article on Wikipedia. And the questions are: can we help the newcomer do this more easily? But also, can we help the people behind these editathons, the people who bring newcomers into the movement, do less of the manual work, again in the context of expanding articles? I think this is the extent of what we are thinking about right now.
I think there's nothing really beyond article expansion, as far as I remember. And then, basically, sections alone are not enough, right? As I mentioned, it's not enough to tell me, if I go to English Wikipedia, that biographies need to have an early life section. I have no idea whether I should write three sentences or 20 sentences or 100 sentences, and actually, how should I write an early life section for a biography? So are there ways that we can automatically extract and provide information to the newcomer that can help them learn some of these very basic components of editing an encyclopedic article in language X? If we can provide that, let's provide it and see what comes out of it. Awesome. Dario, I think you were next. Sure. I'm actually going to refocus briefly on this issue. It strikes me that you mentioned there's a big opportunity for product development around this. Obviously, figuring out what the missing sections and missing types of content are is a big part of the problem. The other part is around UX: figuring out the right way of serving these recommendations so they make sense to the user, and that's an entire area that we haven't even started exploring, right? So my question is about the universe of knowledge that we consider as a source for these recommendations. Obviously, this project is currently looking at everything that exists within the context of the Wikipedia universe, combining that also with Wikidata. It strikes me that there's a lot of candidate encyclopedic knowledge about topics that already exist in various language editions of Wikipedia that is currently only possible to find within the union of all the languages that we have, right?
And I think an example is the work that Wikidata is doing: for entities that do exist in at least some language editions of Wikipedia, by matching these entities with external knowledge bases, we're basically building the possibility of identifying types of content, types of knowledge, that potentially should be in these languages but are not there yet. So I was wondering if you had thought with this team about any additional direction for this project, looking beyond the universe of knowledge that's within the Wikipedias. Yeah, that's a very good question, and it's definitely something that we are aware of. We are basically, at best, amplifying whatever already comes to Wikipedia. For example, if there is bias on Wikipedia and this bias never gets resolved in at least one language, we will never learn about it, right? So how do you come out of this closed system and try to inject more content from outside? One of the things that we are very carefully keeping an eye on is Wikidata. For us, if a change happens in Wikidata, ideally we would have a system that could map that change to the corresponding sections of an article and provide those as recommendations. So you can think of Wikidata as, for us, the entry point to more types of knowledge that can be added. And I think it's almost a separate problem to think about how to populate Wikidata, in ways that are sustainable, with the kind of content that we want to have across projects. It's an important question, but I think it's potentially outside the scope of this project at this moment. I'm really interested in exploring that problem in general, though: how can we think about Wikidata? How can we expand some of the capacities that exist there right now to help reduce bias and content gaps by connecting Wikidata to more external sources?
I think once the information is in Wikidata, it becomes easier for systems like this one to basically speak with Wikidata, get that content, model it, and figure out where in an article it should go. For this kind of system, speaking with sources outside of Wikidata will, I think, be really hard in practice, because external data sources are maintained in different ways. It's already a highly distributed system inside the Wikimedia world, but when you go outside, in terms of how data is maintained and how data is surfaced, I'm not sure how reasonable it is to expect that a recommender system in Wikimedia would talk with those systems directly. I would expect that to happen through Wikidata. Sure. Just to give one example to clear this up: the example I have in mind is the identifier mappings that exist in Wikidata, which basically say this topic that exists here also exists in an external source or external catalog. We can make a reasonable assumption that the structure of the contents in the external catalog will be consistent across all the entities described by these identifiers, and maybe one possible direction would be to look at, okay, if we link to, say, an encyclopedia, and the encyclopedia has an entry about a painter, does that entry about the painter have the same sections and the same type of content that we have in the corresponding Wikipedia language editions, right? So I think that's going to be an interesting direction to explore. Yeah, thanks very much. We have a clarification from Nettrom on one of the earlier points, about ORES feature injection. They say: you can ask ORES what feature values it used for predicting the quality of a given revision, and then alter those values so that they reflect that a section has been added, and it will give you an updated prediction. In other words, you can use it to tell you what its prediction would be if the article were altered in a given way.
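As a rough illustration of this feature-injection idea, the sketch below just constructs the request URLs. The endpoint shape is my reading of the public ORES v3 scoring API, and the model name, revision ID, and feature name are illustrative assumptions, not verified values; check the ORES documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Base URL of the public ORES scoring service (v3 API)
ORES = "https://ores.wikimedia.org/v3/scores"

def features_url(wiki, rev_id, model):
    """URL asking ORES to return the feature values it used for a revision."""
    return f"{ORES}/{wiki}/{rev_id}/{model}?features"

def injected_url(wiki, rev_id, model, overrides):
    """URL re-scoring the revision with some feature values overridden,
    simulating e.g. an added section before anyone actually edits."""
    params = {f"feature.{name}": value for name, value in overrides.items()}
    return f"{ORES}/{wiki}/{rev_id}/{model}?features&" + urlencode(params)

# Hypothetical example: bump a heading-count feature to simulate a new
# level-2 section and see how the predicted quality class would change.
url = injected_url("enwiki", 123456, "wp10",
                   {"wikitext.revision.headings_by_level(2)": 7})
print(url)
```

Comparing the prediction from `features_url` with the one from `injected_url` gives the "predicted quality change if the user followed the suggestion" that Nettrom describes.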
Since you're suggesting adding sections, this means you could use ORES to figure out how the predicted quality would change if a user followed those suggestions. Yeah, that's a good point, yes, now I understand. No other questions on IRC? I've got a question, if we still have a few minutes. Yes. So I didn't hear you address this directly in your presentation, Leila, but I may have been distracted for a minute, so apologies if I'm making you repeat yourself. You talked a bit about bias, but one area of bias that I'm curious about is the systemic biases that relate to the way certain topics get written about on Wikipedia. An example I've heard a lot of anecdotes about is that biographical articles about women tend to focus on different aspects of a notable woman's life than biographical articles about an equivalently notable man. So you might have an article about a woman scientist with a lot more sections and content related to her personal life or family life than you might have for an equivalent male scientist. In a scenario like that, are you concerned about the possibility of reifying some of these existing biases, or are there particular ways you might think of checking for these biases or correcting for them? Yeah, that's a good question. I think all of these systems will need to have feedback mechanisms. So two things: feedback mechanisms, and we shouldn't try to become too smart. I think the strength of a project such as Wikipedia is basically the human, the editor. The question is: how deep do we want to go to help this editor do better? And where "how deep" stops is not clear to me. Giving section recommendations probably passes my bar; it's probably okay. But how deep do you go within the section? For example, are you going to say what kind of content should be added to that section? How deep are you going to go on that front?
I think the deeper we go, the dirtier our hands get with machines, and the question is how you are going to come out of that. One way would be feedback. User feedback is pretty strong, especially in Wikimedia projects, where you have a lot of people who think very critically and are willing to express their opinion and tell us what they think. These systems will need to have feedback mechanisms, or otherwise they will fall into loops. Some of this I think is inevitable, but if the system is at least designed in ways that it can receive feedback from the user and can express that feedback as part of its reasoning when it's giving the next set of recommendations, I think that would be really strong. The other thing is basically interpretability. One reason we're very excited about these category-based recommendations is that we can clearly say why we are making a recommendation: we can say this article belongs to this category, and articles in this category usually have this section. So at least the editor knows why they're receiving such a recommendation, and they can do some reasoning about it. And I think that's really important to keep in mind as we move forward. Awesome, thank you. That's all we've got so far. All right, so I think we're going to call it a day here. Thanks everybody for joining. As a reminder, our next showcase will be in January. We already have a confirmed speaker: Yan Chen from the University of Michigan will be talking about expert engagement experiments in Wikipedia on the 17th of January. We're going to have another speaker by then, still to be confirmed. So with that, I wish you happy holidays and see you all in January.