first of the two sessions for lightning talks and paper presentations. The first paper is from Jerome. Jerome, I see you, I saw you around, so if you could please start sharing your screen. Yes. So can you guys hear me first? Yes. Is the sound clear? Yes. Great. So let me share my screen. While you share your screen: please, if you have questions, put them in the chat. Isaac Johnson is going to monitor the chat for questions and for people who want to open their mic and ask questions. And with that, Jerome, the floor is yours. Thank you so much. Okay, great. Well, thank you all for being here. What a great responsibility to be kicking off this amazing workshop. So what I'm going to try and do today is to quickly tell you a story about Wikipedia governance that most of you know very well already, but I'm going to tell this story using an experiment, which is, I think, a very efficient way to convey these kinds of stories. This is a programmatic paper, so I'm very excited that we have many, many students around, because I think there is a job to be done in this space. So this paper is about trust and community policing. Let me dive right in. What's the motivation for it? The motivation is pretty simple. We know that Wikipedia as a peer production community relies on its ability to attract and retain volunteers to thrive. And a puzzle that has been attracting the attention of researchers on these community projects for some time is that, for most mature communities on Wikipedia, the number of active contributors has been plateauing for about a decade, so projects have difficulties attracting and retaining volunteers, especially from minority groups or women. And so people have been trying to look into the reasons for this problem of extending the community.
And there's a lot of research on this that basically points to the increasing bureaucratization of Wikipedia and the impersonal or excessive enforcement of policies aimed at protecting the encyclopedia from malicious users. And really the implicit assumption of this whole line of work is that admins, but also experienced users, do not strike the right balance between the inclusion of good faith actors and the exclusion of malicious ones. Another way to say this is that there is substantial heterogeneity in the way administrators exercise their policing rights that is not grounded in hard information, which leads to inefficient policing. Based on this assumption, a few papers, actually so few they can be counted on one hand, try to design classifiers and tools aimed at helping administrators and experienced users reach better community policing decisions, by extracting this information from Wikipedia. Now, as I said, this is a very thin stream of research, and whether you think this line of research is warranted or not really depends on whether you buy the implicit assumption that there is inefficient policing going on on Wikipedia. And that is a hypothesis that's very hard to test, because the decision problem that administrators typically have to solve is very hard. They operate under time pressure with limited information that may be costly to acquire, and they need to minimize the risk of failing to react to harmful behavior, while trying not to exercise their policing rights on good faith editors and not drive them away from the community. So it is difficult to demonstrate that administrators get this problem wrong. And what this paper is going to try to do is to tackle this question using the concept of decision making heuristics.
So, what this paper is going to do is provide a direct test of the idea that Wikipedia administrators rely on inefficient decision making heuristics when they make their community policing decisions. What is a decision making heuristic? There's a huge literature on this in psychology. Basically, when the environment is complex and you're acting under uncertainty, people rely on simplifying rules that allow them to reach decisions quickly, but that may also be costly in terms of efficiency, because the criteria they use may be unrelated to the problem at hand. And what we're going to focus on in the context of this paper is trust as a social heuristic: trusting strangers is a social heuristic that develops early on in people's lives, depending on the way they are socialized or on institutions. And we think that trust in strangers is very relevant to community policing, especially on Wikipedia, which is committed to openness and attracts contributions from anonymous users. Right. So what we're going to do here is test the hypothesis that general trust attitudes in the admin population, which are mostly acquired outside of Wikipedia, influence the community policing decisions of administrators. And we're going to estimate the extent to which this is the case. That is going to provide us with a test of whether the decision making process of administrators on Wikipedia actually is efficient, or relies on unrelated heuristics. Okay. So this is really a test of this implicit assumption about the efficiency of administrators' decision making. As I said, we're going to do this using an experiment: trust is very difficult to measure in a decentralized way, and so we're going to rely on a large literature that does this using experiments. So about nine years ago, 120 Wikipedia administrators played a trust game online, which is a behavioral economics game that's widely used in the literature to do this.
That was advertised on Wikipedia through a site notice. And we're going to use this experimental measure of trust to try and predict admin activity over six months to nine years. The stronger the link we see there, the more we will conclude that administrators actually exhibit substantial heterogeneity in the way they exercise their policing rights that is not justified, which by definition leads to inefficient policing. Okay, just a couple of words on the experiment. How does this experiment work? We basically divide the population of administrators in the study in two. We have participants A and participants B; both are endowed with $10 at the beginning of the experiment. But participant A has a decision to make, that guy over here: he can decide to send whatever fraction of his endowment, from zero to $10, to participant B. Whatever he decides to send is tripled by the experimenter. So if he decides to send everything, then participant B will receive $30. Participant B at this stage has a decision to make, how much to return to participant A, but he cannot communicate nor commit to anything. Because this is a one-shot interaction that's totally anonymous, the amount that participant A is willing to risk sending to participant B is typically interpreted as a measure of trust in anonymous strangers. And this is really what we care about here, right? We care about administrators having to police a lot of users about whom they have very limited information, if any. So this is the trust game that administrators play, and we're going to focus here on the behavior of participants A, the trusters. Okay, so we're going to collect this trust variable, and for the population we're interested in, this variable ranges from zero, you send nothing, to one, you send your whole endowment to that stranger. That's our experimental measure of trust.
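The payoff arithmetic of the trust game described above can be sketched in a few lines. The endowments, the tripling multiplier, and the normalized trust measure follow the talk's description; the function and variable names are my own.

```python
# Sketch of the one-shot trust game payoffs described in the talk.
ENDOWMENT = 10   # dollars given to each participant
MULTIPLIER = 3   # the amount sent by A is tripled in transit

def trust_game(sent_by_a, returned_by_b):
    """Final payoffs for one play of the anonymous trust game.

    sent_by_a: dollars A sends (0..ENDOWMENT)
    returned_by_b: dollars B returns out of what B received
    """
    assert 0 <= sent_by_a <= ENDOWMENT
    received_by_b = MULTIPLIER * sent_by_a
    assert 0 <= returned_by_b <= received_by_b
    payoff_a = ENDOWMENT - sent_by_a + returned_by_b
    payoff_b = ENDOWMENT + received_by_b - returned_by_b
    trust = sent_by_a / ENDOWMENT  # the [0, 1] trust measure used here
    return payoff_a, payoff_b, trust

# Full trust with an even split of the tripled transfer:
print(trust_game(10, 20))  # -> (20, 20, 1.0)
```

Note why the amount sent measures trust: A's money is only safe if B voluntarily returns some of it, and B has no way to commit to doing so.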
We also collect a bunch of control variables: your age, your gender, your degree level, your level of risk aversion, plus one more variable, which is the level of main space activity of those administrators, because we don't want to confuse policing activities with main space edits, regular edits, as those two might be correlated. Okay, so we're going to control for that in the model. Those are basically the explanatory variables of the model we're interested in. And we're going to try, as I said, to predict community policing decisions in this population. We're going to have the number of users blocked over nine years after the experiment, the number of pages that administrators delete, the number of pages they protect, and the overall count of the admin actions they perform. And also, this is a six month measure, the last one here: the self reported fraction of their time that admins report dedicating to admin activities. This is a survey measure that we asked administrators about six months after the experiment, which is why N is a little bit lower given the response rate. Okay. And so basically, here are the results of the test. These are simple OLS regressions; each column here corresponds to a dependent variable as defined in the table we just talked about. So these are the community policing activities of our administrators, as a function of trust, age, gender, degree level, risk aversion, and main space activity level. I'm not going to comment, in the interest of time, on the control variables, but basically what we see here is that trust is the variable that's most strongly associated with community policing decisions on Wikipedia.
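The regression setup just described might be sketched as follows. This is only illustrative: the real dataset is not reproduced here, so the data below is synthetic (with a negative trust effect built in by construction), and all column names are my own.

```python
# Toy sketch of the talk's OLS setup: policing activity regressed on
# experimental trust plus controls. Synthetic data, names my own.
import numpy as np

rng = np.random.default_rng(0)
n = 120  # number of administrators in the study

trust = rng.uniform(0, 1, n)                 # fraction of endowment sent
age = rng.integers(18, 65, n).astype(float)
female = rng.integers(0, 2, n).astype(float)
risk_aversion = rng.uniform(0, 1, n)
mainspace_edits = rng.poisson(500, n).astype(float)

# Fabricated dependent variable: blocks fall sharply as trust rises.
users_blocked = rng.poisson(np.exp(5 - 1.5 * trust)).astype(float)

X = np.column_stack([np.ones(n), trust, age, female,
                     risk_aversion, mainspace_edits])
beta, *_ = np.linalg.lstsq(X, users_blocked, rcond=None)
print(beta[1])  # coefficient on trust; negative here by construction
```

In the paper's actual tables each dependent variable (blocks, deletions, protections, total actions, self-reported time) would get its own column of such a regression.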
So for instance, if you focus on the number of users that administrators block, moving from no trust to full trust is associated with, this coefficient is, to compute it precisely, about an 80% reduction in the number of users that administrators decide to block when you move from no trust in strangers to full trust in strangers in this decontextualized experiment. The same goes for the number of pages deleted, the number of admin actions overall, and also for the self reported time that administrators declare dedicating to admin activities. Again, this is all controlling for the level of main space edits. So we see this very strong relationship between generalized trust and admin activities, which suggests that in fact the implicit assumption of those few papers that use the great trace data resources of Wikipedia to build classifiers and tools aimed at guiding administrators in their community policing decisions is very much warranted. So I think this is a very exciting area of research moving forward for Wikimedia researchers, and we should think about building those tools that are going to help administrators and experienced editors reach more efficient decisions in a world in which we see them acting on heuristics because of uncertainty, time pressure, and so on and so forth. So I'm going to stop here, hopefully I wasn't too long, and take questions. Thank you so much. Thank you so much for this presentation. In the interest of time, would it be okay if we ask participants to type their questions in the chat, you reply over chat, and we move to the next presenter? In any case, there will be a poster session at the end where you will be able to discuss with everyone. Apologies, I think we are on a very tight schedule. Thank you so much for the presentation. Next, I think. Hello. Hello. Can you share your screen? Yes.
So, Sneha is going to tell us about languages of knowledge infrastructure, learning from research on Indian language Wikimedia projects. So, a very different perspective. We can see your screen now. Sneha, please go ahead, the floor is yours. Hi, everyone. I hope I'm audible. Yes. Okay, great. Thank you. Thanks so much. And it's so great to be here as part of this workshop. So, I'll be talking a little bit today about some of the research that we've been doing at the Centre for Internet and Society, in the Access to Knowledge program. My name is Sneha. I'm a researcher. I work primarily on digital media and cultures with the Wikipedia, or the Access to Knowledge, team at CIS. I've largely been supervising and managing a series of research projects, short term, small studies, because this is a very initial foray for us into research on Wikimedia projects and platforms. So just to quickly move on, this is a very quick introductory slide on CIS. We're a non-profit organization based in Bangalore and in Delhi in India. We work largely on policy research related to various aspects of the internet, but also do some academic research on digital media and cultures, open access, digital knowledge, etc. And of course, we support and work very closely with the Indian language Wikimedia communities. The program itself is called Access to Knowledge, and our objective is to catalyze and support the growth of the open knowledge movement in South Asia and in Indian languages. And these are all the many ways in which we've been working with communities: supporting and serving Indian Wikimedia communities, building partnerships, bringing more content under free licenses, working on community participation, and supporting volunteers through various projects and plans. So to get to the research that we've been doing over the last two years: this actually began as a pilot initiative, a very initial foray, in 2019.
And this was based, of course, on Foundation feedback to tap into the existing research expertise available at CIS and also to diversify the areas of work that the Access to Knowledge program has been engaged in. Of course, our immediate and larger objective was, and still very much is, to be attentive to community needs and priorities. So the projects themselves were very short term, small scale projects undertaken by team members. You can see the names of the projects and the various team members who worked on them over the course of the last year and a half, two years almost. So the idea was really to still have an ear to the ground and pick up needs, priorities, and areas of work from the work that we've been doing with communities over the last six, seven years. So just to very quickly go through the projects. We've had a long standing thematic area called bridging the gender gap in Indian language communities. There is a series of programmatic activities that we undertake, working closely with Wikipedia communities and with user groups that have been working on the issue of the gender gap in Indian language communities. And as part of that we've also previously done a series of studies on analyzing and understanding the prevalence of the gender gap: what are the content and participation related challenges and issues, what efforts have been undertaken by different communities, how does the problem evolve and change, and what are possible strategies and solutions. So this year again we undertook such an exercise; that report is available and published on the CIS website and also on Meta. There's a study on mapping galleries, libraries, archives and museums in Maharashtra, undertaken by one of the team members.
So this is again looking at what kind of GLAM content is available across various institutions in Maharashtra in Marathi, which is the main official language of the state. And we've been trying to understand the challenges in digitizing and making this content available: technical challenges, legal challenges related to open access, licenses, etc. Another project was on data gaps on heritage structures in West Bengal. So where have these projects struggled with trying to put up complete, structured, linked data on cultural heritage structures on Wikidata, and what have been the challenges related to resources, documentation, and eventually getting that content onto Wikidata and linked in such a way that it is discoverable. Another study was on content creation on the Eastern Punjabi Wikipedia. What is the nature of the content that presently exists? What have been the challenges with translation specifically, and what have been the strategies and efforts in creating new content, specifically related to the state of Punjab and its various aspects, on this wiki? There's also a case study on a long running project in Karnataka, in Bangalore in fact, with Christ University. There's a Wikipedia and education program that CIS-A2K has been engaged with for easily about seven years now, and this was an opportune time for us to do a case study to understand how that program has evolved: what have been the changes, the strategies, the issues with digital learning, right? How have digital learning and pedagogic strategies evolved with the introduction of Wikipedia in the classroom? There's also now a study on article creation campaigns on Wikimedia. This is a mapping of Wikipedia Asian Month and Project Tiger, two content creation projects that have been undertaken. Wikipedia Asian Month has of course been going on since 2015, if I recall correctly, and Project Tiger is about two years old.
So this is a sort of comparative analysis undertaken by two researchers who also worked closely with these projects. The last two are new: one is a new project, and the last one is a research needs assessment exercise that we've just recently completed. So again, building on an earlier effort to map content on water resources in Maharashtra: how much of this is available in Indian languages, where is it available right now, and how do we bring it online and onto platforms like Wikipedia? The research needs assessment I will talk about a little bit more as we move on, to tell you about our learnings, even from this mapping, and what the understanding of research among Indian Wikipedia communities is as well. So, just quickly, to also talk a little bit about the methods that we've used in these various studies, seven of them. Methods have included interviews, desk research, surveys, and data visualizations, as in the case of the Wikidata project. So it's been a mix of traditional and nontraditional methods, because the researchers themselves are also active Wikipedia volunteers and team members and bring their own understanding of working with the communities to the research. So we've kept it quite flexible and open, while also maintaining rigor in the research design in terms of the qualitative aspects of the studies. So just to quickly go through some of the broad observations from the projects that we have completed so far: I think the first five are done, and the study on Wikimedia and education will also shortly be available online.
So I think these are the broad observations. Content gaps: there is definitely still a large array of content in Indian languages that remains unavailable for wider public access, with digitization being a major challenge, translation being again a major challenge, and awareness and implementation of open access policies again being a barrier. Knowledge disparities: a lack of technological infrastructure and of policy related knowledge, again understanding open access policies, for instance copyright, which create and exacerbate disparities in content creation, access, and use. Diversity in content and participation: BGG, which is the acronym for bridging the gender gap, reflects precisely this persistent question that we've come across for several years now, of content and participation gaps. Again, participation not only by women but by individuals across the spectrum of gender and sexual identities, which affects the comprehensiveness of the picture of knowledge produced on Wikimedia. So a lot of interesting and difficult questions have come up in the course of that study. Capacity building, I think, comes up as one of the big areas of work and raises interesting questions as part of the research fora that we've done. There's a need for training and awareness in technological skills, communication skills, understanding media and social media, for instance, and community health and policy related aspects. Even in the research needs assessment exercise, for instance, a lot of people who responded actually said that understanding the technological aspects of Wikipedia still remains an area of work: translation tools, understanding data related aspects, and policy, definitely; I think a better understanding of open access and Creative Commons licenses, for instance. All of these remain areas of work.
So technological aspects, community building aspects, again community health, policies related to community interaction, process related aspects: I think these are the areas that came up as areas of research, or research that community members would like to see, along with process related learnings and challenges, because this was again new for many of us. I'm sorry, we are going a little bit over time with this presentation. It's super interesting, and I would love to hear your whole presentation, but you know we have a very tight schedule, so if you don't mind wrapping up, and then you can go for questions in the chat. Would that be okay? Thank you. Sure, sure, sure, will do. Yes, just give me two minutes. I'm just going to leave the slide here with the last set of reflections and questions, again on Wikipedia as a knowledge infrastructure, reconciling local and global priorities, looking at replicability. I'm sorry, I need to interrupt. We unfortunately can't give the two minutes because we are also quite packed. So if you can wrap this up within the next 20 seconds, that would be great. Thank you. Yeah, yeah, so I'm just going to end here. I'm just going to leave the slide for questions. Thank you so much. Thank you. My apologies: we have so many submissions to this workshop that we have a very packed schedule, and every single one would deserve an entire workshop by itself, but unfortunately we have to go on. And so Lucy, I hope you're ready for the first item. This is going to be a very, how do you say, hectic section: we're going to have nine presentations in 27 minutes. And I'm going to share the slides and go through them, if I find them. Yes, they're here. All right. Yes, Lucy. Yeah. Beginning? Okay, perfect.
So, I will very quickly and briefly talk about references in Wikipedia from an editor's perspective. It's work I did together with Hadi and others; he's also in the workshop and will present later in the lightning session poster sessions. Next slide. I can't see the next slide. Okay, I apologize, it's stuck. Okay, I'll tell you in the meantime: we did interviews. It's the one before that. We did interviews and a survey, and basically we wanted to find out how editors create articles, with a focus on referencing. This is basically the workflow we found: there's a topic selection process, there's a reference selection process, and then they structure articles. There's a lot of information in the paper; you should read it, but also talk to us. And the next slide, finally, just some points we wanted to highlight: 11% of the editors in Wikipedia across different languages that we interviewed do not use any tools for finding references. We found that a lot of people only use offline references such as books, but 11% use both online and offline references, meaning there's a large share of people using online references. How they use online references is super interesting. I can add more detail somewhere else, but we found that the features that matter for using or selecting an online reference are accessibility, such as paywalls, it's important that content is openly accessible; the availability of access in low resource languages; and the quality of the reference, which is often inferred by looking at how often it's used elsewhere on Wikipedia. And that's it from me. We'll see you all in the poster session later. Thank you, Lucy. Hello. Can you hear me? I have a brief overview of our paper on negative knowledge for open world Wikidata. I'm Hiba, a PhD student at the Max Planck Institute, and this work is part of a research project on negative knowledge at web scale. For more about the project, please follow us on Twitter at Negation in Knowledge Bases. Next, please.
Querying Wikidata about the awards of Stephen Hawking returns 42 awards that he has won. One salient award that he has not won, however, is the Nobel Prize in Physics. Existing positive-only knowledge bases are unaware of such negations because they operate under the open world assumption, which means that if a statement is not asserted in the knowledge base, it is not necessarily false. So for Wikidata, this is an unknown statement, when in reality it is false. Next, please. So what we're proposing is to explicitly add interesting negative statements to Wikidata and other open world knowledge bases. The main problem here is how to identify what an interesting negation is. To do so we propose the peer based negation inference methodology. We aim to discover interesting negations about an entity by observing highly related entities. For Stephen Hawking, for example, related entities, which are mostly other physicists, lead us to the expectation that he probably should have won the Nobel Prize in Physics. And using a completeness assumption over parts of the knowledge base, what we call peer groups, together with a ranking model, we are able to decide that this inference is interesting and likely correct. To showcase the methodology we present the Wikinegata platform, which you can visit using the displayed link; I will also paste all the links in the chat afterwards. It's a platform for browsing salient negations about Wikidata entities. It can be explored using two interfaces. Through entity summarization, you can give an entity as a query, for example Leonardo DiCaprio, and then get interesting negations about him, such as that he has no children. You can explore his peers on the side, play with different peering functions, as well as other features. The second application is question answering, where you can give a negative triple pattern as a query.
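The core of the peer based inference described above can be sketched mechanically: candidate negations for an entity are the properties that most of its peers have but the entity lacks, ranked by how common they are among the peers. The toy data, threshold, and function names below are mine, not the paper's.

```python
# Hedged sketch of peer-based negation inference: rank properties
# that an entity is missing by their frequency among its peers.
from collections import Counter

def interesting_negations(entity_props, peer_props_list, min_ratio=0.5):
    """Return (property, peer_ratio) pairs the entity lacks."""
    counts = Counter(p for peer in peer_props_list for p in set(peer))
    n_peers = len(peer_props_list)
    candidates = [
        (prop, counts[prop] / n_peers)
        for prop in counts
        if prop not in entity_props and counts[prop] / n_peers >= min_ratio
    ]
    return sorted(candidates, key=lambda x: -x[1])

# Toy example: Hawking lacks an award that all of his peers hold.
hawking = {"award:Copley_Medal", "award:Wolf_Prize"}
peers = [
    {"award:Nobel_Prize_in_Physics", "award:Copley_Medal"},
    {"award:Nobel_Prize_in_Physics", "award:Wolf_Prize"},
    {"award:Nobel_Prize_in_Physics"},
]
print(interesting_negations(hawking, peers))
# -> [('award:Nobel_Prize_in_Physics', 1.0)]
```

The completeness assumption enters here implicitly: treating a property absent from the entity's statements as genuinely not held only makes sense on the parts of the knowledge base assumed complete.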
For example, give me people who have not received the Nobel Prize in Physics, and you receive a ranked list of entities inferred with high confidence not to have won this award. I would like to invite you to visit the platform, read the paper to learn more about the methodology, and follow us on Twitter. Thanks, and see you at the poster session, room number two for knowledge graphs. Thank you. Thank you, Hiba. Hello, everyone. This is Ankan Ghosh Dastider from Bangladesh, and I'm delighted to be with you at Wiki Workshop 2021, albeit virtually due to the COVID-19 pandemic. I'm going to present my paper, a brief analysis of Bengali Wikipedia's journey to 100,000 articles. I will turn off my video to make sure that you all can hear me properly. Okay, so Bengali Wikipedia started its journey in January 2004, and after almost 17 years it crossed the milestone of 100,000 articles. In this paper, a brief overview of this journey is presented by inspecting various parameters and also looking at the state of the gender gap. The motivation behind this study was to have a brief reflection on various sectors of Bengali Wikipedia, its improvements and shortcomings in this journey of 17 years, not to go deeper into any particular topic, which can be part of future work. So over this journey, various parameters, including active editors and new content creation, have improved quite a bit, and the stats are really encouraging. As you can see here, in the past few years its growth rate has been quite promising, with notable growth especially since 2019. Some peaks in the graph, as you can see in the last portions, the recent peaks, reflect the impact of various events like article contests and some large scale edit-a-thons, and so we can understand that these events are really having an impact and motivating newcomers. But not every event, despite being important for the encyclopedia, can show such an impact.
In this study, I have also explored the editing behavior of users, though I have not included those results since I need to keep this short. I have found that Bengali Wikipedia stands in the topmost position among Wikipedia projects in terms of the share of the active editor community that uses mobile for editing. Next slide, please. However, a huge gender gap exists in the worldwide movement, as we all know, and it is also quite prevalent in our community. For Bengali Wikipedia, it has been analyzed in several ways in this study; let me share some of them. In the 2019 to 2020 timeline, 7.41% of registered users expressed their gender as women, which was 3.69% in 2010 to 2011. It has improved in a really sluggish manner if you compare it with other parameters like active editor growth or content creation growth, and it leaves an unequal situation on the Bengali wiki. We know that users are granted various rights depending on their experience and expertise. I have analyzed users with various rights, and there the number of female contributors ranges from 0 to 5% at best. Among the Indic language Wikipedias with at least 500 registrations in the 2019 to 2020 timeline, Bengali Wikipedia stands in the lowest position, as you can see here. I believe this study will be helpful for organizers and everyone concerned in reducing the gaps and working accordingly. And that's all for this slide. If you have any questions, feel free to reach out to me in room one, the community perspective session, or anytime through my email. Thank you. Thank you so much. Naser Ahmadi, I believe you're next. Yes. Hello, I am Naser Ahmadi and I will present our paper, Wikidata logical rules and where to find them. Next. In the current version, as you can see, Wikidata uses property constraints to define restrictions on data. They are syntactic checks defined over property values to make sure that certain restrictions are applied.
For example, as you can see in this picture, a constraint is defined to declare that native language can have only one value. Right now, there are more than 8,000 of these constraints defined in Wikidata. Our goal in this paper is to use logical rules to curate Wikidata. Logical rules are more expressive than these constraints, and they can be used to apply more complicated restrictions. Logical rules can be positive or negative. We can use logical rules to apply constraints like: a subject cannot be married to an object who died before the birth date of the subject. Or: if an object is a doctoral student of the subject, then the subject is the advisor of the object. Positive logical rules can be used for adding missing facts to knowledge graphs. For example, by applying the positive rule we have here, we can add 25 new triples to Wikidata. On the other hand, negative rules can be used to find errors and inconsistencies in knowledge graphs; with the example here, we can find 689 inconsistencies in Wikidata. There are some other examples in the table that you can see. Manually crafting these logical rules is very difficult, and for this reason, in the next slide, we propose two automatic methods to mine these rules from Wikidata. In the first method, we use a rule miner named RuDiK to directly go to Wikidata and extract positive and negative rules. Here you can see an example of a rule that has been extracted using RuDiK; this rule states that if a subject and an object have a child in common, then they should be in a spouse relationship. In the second method, we translate rules that have already been mined from DBpedia, stored in a dataset named RuleHub, into the Wikidata format. For example, with this method we could generate the rule that you already saw in the previous slide, which was translated from a rule mined from DBpedia.
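A negative rule of the kind mentioned above can be checked mechanically against a knowledge graph. This is only a toy sketch, with my own data and a simplified triple representation rather than the paper's actual machinery: it flags spouse pairs where the object died before the subject was born.

```python
# Sketch of applying the negative rule "a subject cannot be married to
# an object who died before the subject's birth". Toy data, my own.
from datetime import date

triples = {
    ("Q1", "spouse"): "Q2",
    ("Q1", "birth_date"): date(1950, 1, 1),
    ("Q2", "death_date"): date(1940, 5, 5),  # died before Q1 was born!
}

def spouse_inconsistencies(kb):
    """Return (subject, object) pairs violating the negative rule."""
    errors = []
    for (subj, prop), obj in kb.items():
        if prop != "spouse":
            continue
        born = kb.get((subj, "birth_date"))
        died = kb.get((obj, "death_date"))
        if born is not None and died is not None and died < born:
            errors.append((subj, obj))
    return errors

print(spouse_inconsistencies(triples))  # -> [('Q1', 'Q2')]
```

Running this kind of check over all spouse statements is, in spirit, how the 689 inconsistencies mentioned in the talk would be surfaced, although the real pipeline works over the full Wikidata graph.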
The biggest challenge that we have to overcome in continuing our project is the uncertain nature of most of the rules. Only a few rules are always true in all situations; most rules do not hold in all cases. For example, the positive rule that you can see on this slide is not always true: there can be people who have a child in common but are not in a spouse relationship. Even a very strong rule like "a country has only one capital" has 15 exceptions. To solve this problem, we assign a confidence score to each rule which shows how accurate the rule is. Our evaluations, as you can see in the plot on this slide, show that the proposed confidence measure correlates very well with confidence computed manually by humans. The problem is that these uncertain rules cannot be used to clean Wikidata, so we have to convert them into exact rules. To solve this challenge, we are currently thinking about three solutions: adding conditions to rules, adding exceptions to rules, or using a human in the loop. Thanks a lot and please visit us in room two.

Thank you, Nasser. Toni, you're next. Hello, hello from Barcelona. Okay, so thanks for the opportunity to share a bit of this work with you. As a disclaimer, it's very preliminary work. So let me give you a bit of the motivation. Outside my day job, in my time as a contributor I want to help improve the quality of biographies, and in this sense Wikidata and authority control are very important. At the same time, I try to help reduce the gender bias we have in biographies, which as you know is still very prevalent across different Wikipedias. I already shared a link with you so you can see some of the reports and results. So just briefly, let me show you a bit of how this works.
Most of you who are already working with Wikidata know all these problems and how to handle them. Okay, so the idea was to have a kind of very live, very quick pipeline that can run many times a day. We fetch things from Wikidata about five times per week. There is also a relationship to templates: we want to see which biographies carry templates regarding lack of references or, let's say, notability problems, and so on. That is something you can do quite often, but for things like authority control and that kind of data you have to resort to the dumps. So this is something that you have to do. And of course for this we are using a database to keep it all together, since you have to run things many times, and pandas for putting it all together. And, and this is the thing I want to stress, we publish everything on the same wikis in a specific space, so that the final users, activists who want to work on these things, can find it. So, next slide. As for results, the idea, as I mentioned, is to try to get people engaged in solving some of these issues: for instance, generating lists of the latest biographies, especially, let's say, biographies of women. It is also helpful to have some checks for problems with Wikidata. Sometimes there is vandalism: people change the gender, change some instance-of statements, and so on, and you can find that out. In this case, as far as what we get, the Viquipèdia was the target one. And also, as I mentioned, we look at different things regarding the templates on some biographies, and at something I still haven't studied enough: the editing patterns, how different users might react to different things regarding biographies. And that's it. The next slides are from David Semedo.

Thank you, Toni. David, you're next.
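The periodic Wikidata checks Toni describes could be sketched roughly as below: build a SPARQL query for biographies and tally the gender values from the results. The query and the sample payload are illustrative; a real run would POST the query to the Wikidata Query Service at https://query.wikidata.org/sparql and page through the results, rather than use a canned response.

```python
# Sketch: a biography/gender tally of the kind used for gender-gap reports.
import json
from collections import Counter

QUERY = """
SELECT ?person ?genderLabel WHERE {
  ?person wdt:P31 wd:Q5 ;        # instance of: human
          wdt:P21 ?gender .      # sex or gender
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

# A mock response in the SPARQL JSON results format, standing in for the
# endpoint's answer so this sketch runs offline.
sample = json.loads("""
{"results": {"bindings": [
  {"genderLabel": {"value": "male"}},
  {"genderLabel": {"value": "male"}},
  {"genderLabel": {"value": "female"}}
]}}
""")

def gender_counts(response):
    """Count genderLabel values in a SPARQL JSON result."""
    return Counter(b["genderLabel"]["value"] for b in response["results"]["bindings"])

print(gender_counts(sample))
```

In the workflow described in the talk, counts like these would be accumulated in a database across runs and loaded into pandas for reporting.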
Hello, hello. Yes, I'm here. Hi everyone. Thank you for the kind introduction. I will present this work, which is also still at a preliminary stage. The idea is to assemble a framework that takes vision-and-language understanding models and makes them capable of dealing with open-domain data. Let's go to the next slide. I have here the full framework. The idea is that current state-of-the-art works like ViLBERT, LXMERT and others are really good at describing and talking about things that are in the images. For example, these two images are visually somewhat similar, in that they depict the same visual concept: records of a natural disaster. But in order to really understand what these images are about, you have to take the full context into account. So the idea is to leverage this framework, which goes from the images on Wikimedia, to the entities structured in a knowledge graph for a specific domain, and then to the Wikipedia pages that provide even further context in text form. Together, I think this is a great framework for extending these models and allowing them to jointly reason over the media, the knowledge graph and the temporal context information. This leads us to the next slide, where we propose a set of tasks. There is much more information in the paper, but just to give you an overview: I chose this image, which is related to the George Floyd protests. If you look at the image without considering anything else, it's really hard to understand what it is about and describe it in an accurate manner. We formulated several tasks. One task that is already somewhat standard in the computer vision community is image captioning.
But here we have to describe the image while also accounting for extra context information, in order to really capture the image and the topic it conveys. The second task is related to conversational agents. These are becoming predominant as search systems, since they provide a much more natural way to find information through a conversation. We want models that are able to talk about an image, but also about the topics that are mentioned or referred to in that image. There are other types of tasks, related for instance to linking to social media. And of course we can also come up with several compelling visualizations that not only show a static overview of a topic but also show how a specific topic evolves over time. To tackle these tasks we really need a framework, and the one we propose, based on the Wikimedia projects, has all the key ingredients to address them. We have already started working on the first task. Like I said, this is a very preliminary project in its early stages, so if you're interested, please drop a message and reach out to me and I will give you more details. Thanks.

Yes, I'm here. Hi, I'm Philipp. This work is a collaboration between the University of Konstanz, Wuppertal and FIZ Karlsruhe. Let me start with our motivation on the first slide, please. For mathematical entity linking, or Wikification of mathematical formulae, Wikipedia provides special pages which open if you click on a linked formula: a name and description of the concept is shown, along with names and descriptions of the constituent identifiers, or symbols. However, at the moment only a handful of formulae are linked, so we need scalable methods to do this. The second application, or motivation, is a mathematical question answering system which we built for questions on Wikidata.
The system can answer relationship questions such as "what is the relation between mass and energy"; it retrieves names for variables and values for constants from Wikidata and allows for calculations. Next slide please. To develop scalable methods to link formula concepts in Wikipedia articles to Wikidata items, we set up the pipeline which you can see on the left. First, Wikipedia articles are annotated using our AnnoMathTeX formula and identifier annotation recommender system, which is the contribution of this paper. Second, the annotated concepts are seeded as Wikidata items. Third, links are included in the annotated articles. We evaluated both our annotation recommendation approach and the community acceptance of the edits on Wikipedia and Wikidata. In the top right corner you can see the start screen of AnnoMathTeX, which is hosted by Wikimedia at annomathtex.wmflabs.org. There is a demo video; I will paste the link later. At the bottom left you can see the annotation recommendations that are displayed after clicking on a formula in the Wikipedia article that is about to be annotated. Using the AI assistance, we were able to speed up the annotation process by a factor of 1.4 for formulae and 2.4 for identifiers. The community rejected 33% of the Wikidata items but only 12% of the edited Wikipedia articles within the first month. We persisted the dataset into a benchmark platform for mathematical formulae; in a future project we will evaluate mathematical information retrieval tasks and systems on this labelled benchmark. We are looking forward to meeting you in the virtual poster session later. And now we will hand over to John Samuel. John, the floor is yours.

Thank you. Hi, first and foremost I am thankful to the organizers for allowing me to present this talk at Wiki Workshop 2021. I am John Samuel and my presentation is about CheckStatements. On the next slide I will present the main motivation behind this work.
Thanks to the open and collaborative nature of Wikidata, new entities are created regularly and they need to be validated. WikiProjects play a significant role in guiding contributors and newcomers towards the various possible ways of describing entities. But how can errors like wrong use of properties, cardinalities or data types be identified? Wikidata property constraints can identify some of them, and the recently introduced shape expressions (ShEx) for describing entity schemas can be used to identify many more complex errors. However, at the time of speaking there exist fewer than 300 entity schemas for more than 90 million Wikidata items. So this work, started during Wiki Techstorm 2019 in Amsterdam, aimed to reduce the complexity of writing entity schemas, or shape expressions. The major question was: is it possible to generate shape expressions from simple CSV statements or files? Secondly, it must take into consideration the work done by numerous WikiProjects. And thirdly, the solution should be multilingual and should help speakers of multiple languages. CheckStatements was inspired by QuickStatements, and it supports a simple tabular five-column syntax with two parts: a first part for specifying the prefixes and a second part for specifying the shape of entities. In the second part you can specify node names, properties, allowed values, cardinalities and comments. I will share the link; it's available at checkstatements.toolforge.org. On the next slide I give a simple example of a TV series in the five-column syntax. As we see, a TV series is an instance of Q5398426, and it can have zero or more genres, one or more countries of origin, one or more directors and one or more screenwriters.
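A minimal sketch of how one row of this kind of five-column syntax (node name, property, allowed value, cardinality, comment) might be turned into a ShEx-style triple constraint. The column semantics and the mapping below are my assumptions for illustration, not the actual CheckStatements implementation.

```python
# Sketch: one five-column row -> a ShEx-style shape line.
def row_to_shex(node, prop, value, cardinality, comment):
    """Render a single triple constraint with a ShEx cardinality operator."""
    # Map human-readable cardinalities to ShEx notation ('' means exactly one).
    card = {"one or more": "+", "zero or more": "*", "exactly one": ""}[cardinality]
    constraint = f"{prop} [wd:{value}]{card} ;  # {comment}"
    return f"<{node}> {{ {constraint} }}"

print(row_to_shex("TVSeries", "wdt:P31", "Q5398426",
                  "one or more", "instance of: television series"))
```

A full generator would collect all rows for a node into one shape and emit the prefix declarations from the first part of the file.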
In the last line you see an interesting example: what is a genre? A genre can have multiple values, and here you see vertical bars and commas being used to express that. Thank you once again and see you in room two, the knowledge graph session. Thank you.

Thank you, John. Christian, you're closing the first part of the show. Hi everyone. I'm glad to be here, and I will be presenting our work on Wikipedia editor drop-off. There is extensive literature about how newbies start to participate in Wikipedia, and how they sometimes leave the project very early. There is also some literature on editor retention in general, but we think there is less study of how experienced editors leave the project. We started this project with a grant from the Wikimedia Foundation to find out what the causes of editor drop-off are. Next slide. The first thing to say is that there is no widely accepted definition of drop-off, so we started by characterizing various states of drop-off using community documentation, such as essays on various Wikipedia language editions. For example, there are wiki breaks, semi-retirement and retirement, which are different states an editor can be in through their life cycle, and there are shared definitions of these states. In the paper we collected this information from various languages, and we also present a characterization based on activity, or rather inactivity, metrics that could be based on a fixed threshold, for example on how many days a person did not edit Wikipedia. But we are interested in expanding this, taking into account also other characteristics of the user, for example whether they have an admin flag or not, because in various languages Wikipedia policies state that if an administrator is not active for a certain number of days, they will lose their admin flag. Next slide.
So, together with this characterization of drop-off, we also want to study which on-wiki interactions, observable from an editor's history, can be associated with drop-off. Taking the existing literature into consideration, we have identified the three families of hypotheses that you see here: drop-off could be caused by bad interactions or conflict between editors; by an excess in the number or spread of interactions, for example burnout; or by a lack of interaction with editors with similar characteristics, let's say a feeling of isolation. Of course, this is a kind of position paper that contextualizes our project and the metrics we are going to compute. We think that understanding drop-off is very important for community health in general, and we welcome your feedback on this topic and invite you to participate in the community's perspective poster session in room one. So thank you everybody. And now it's the break, I think.
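The fixed-threshold inactivity metric Christian describes could be sketched as follows. The state names loosely follow the wiki break / semi-retirement / retirement states mentioned in the talk, but the day thresholds and the admin-flag rule here are illustrative assumptions, not the paper's actual characterization.

```python
# Sketch: classify an editor's drop-off state from their last edit date.
from datetime import date

def dropoff_state(last_edit: date, today: date, is_admin: bool = False) -> str:
    """Map days since last edit to an illustrative drop-off state."""
    days = (today - last_edit).days
    if days < 30:
        return "active"
    if days < 180:
        return "wiki break"
    if days < 365:
        return "semi-retired"
    # Some wikis remove the admin flag after long inactivity periods.
    return "retired (admin flag at risk)" if is_admin else "retired"

print(dropoff_state(date(2021, 1, 1), date(2021, 6, 1)))  # 151 days -> "wiki break"
```

As the talk notes, a real characterization would condition thresholds on user attributes such as the admin flag, since per-wiki policies differ.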