This meeting is being recorded. Okay, I'm seeing people coming back, so I'm going to start. Unless I hear otherwise, I am assuming you can hear me and see this presentation, which is taking a while to load. Great, so welcome back everyone. I hope you had lots of fun in the breakout rooms. This is one of my favorite times of the workshop week, so I really hope you enjoyed that.

So I'm very happy to start the first session of our research program today. Before I start, I want to take 20 seconds to really celebrate this amazing community. Every year we see the number of contributions to this workshop growing, the authors put lots of love into their papers, and our reviewers really do a wonderful job. So I am very pleased to celebrate how amazing this community is. Thank you everyone for being here and for contributing to this workshop as a reviewer, author, or participant. You're making this workshop enjoyable for everyone. Thank you very much.

I thought last year we had the record number of submissions, but this year we broke that record again. We had more than 31 submissions, and out of these we accepted 25 papers to the workshop. Since we have so many papers now, we partitioned them into two sessions. You will hear three live presentations of about seven minutes each, and for each live presentation you will have the opportunity to ask one question live. We will have chairs that will deal with, sorry, moderators, Q&A moderators, who will deal with questions in the queue. Then you will see between seven and nine spotlight presentations in the form of beautiful videos that our authors sent us. Again, as I said, if you have questions for our authors, either for the live presentations or the video presentations, please put them in the chat or in the notes doc. If we don't have time to ask them during this session, the poster session chairs will ask the authors during the poster sessions. So please don't hesitate to ask questions; if we don't have room for you now, we'll make sure your questions are relayed to the authors.

There is a detailed paper schedule for today. If you want to see it, I can't put the link in the chat right now, but I will do it as soon as I close this slide deck. Before I leave the stage to our authors, I really want to thank again everyone for being part of this: authors, reviewers, and PC members who gave their love to this research program. We hope you enjoy it. There's lots of high-quality research that you will see in the next couple of hours.

So without further ado, I am going to start the first session of oral presentations. Please refer to Tiziano for questions, if I'm not wrong. The first presenter, who I invite on stage to share the screen, is Carina Negreanu. Hi, Carina. I am going to stop sharing; you can share your screen while I introduce you. You are going to talk about Rows from Many Sources: Enriching Row Completions from Wikidata with a Pre-Trained Language Model. Carina, the floor is yours.

Awesome. I hope you can see my screen and hear me. Yes, yes, yes. Awesome, okay, we're good to go. So hi, everyone. Thank you for joining our first talk, which is Rows from Many Sources: Enriching Row Completions from Wikidata with a Pre-Trained Language Model. I am Carina, and I will be presenting this work on behalf of our collaborators: Alp, Jack, Shuang, Danny, Andy, and Chin-Yew.
We are part of Microsoft Research, and our common research interest is tabular data, in particular how we can infuse tabular data with knowledge.

Okay. So say that you want to create a table about rap artists, and your friend sent you a snippet about this subject. The goal of our project is to automatically extend tables like these, so that users can build upon complete data and conduct meaningful analysis. To be able to add a row about Kendrick Lamar, we need to do a few things. First, we need to link the table so we know what the columns and cells represent, which is known as table interpretation. Then we need to be able to add new subjects such as Kendrick Lamar, which is called subject suggestion. And then we have to fill in the remainder of the row, which is called gap filling. Throughout this talk, we will show you output from our pipeline, which to our knowledge is the first one that targets this task, row completion, end to end. We are also the first to target Wikidata instead of DBpedia or Freebase, so we have been dealing with different challenges compared to our community.

At the end of table interpretation, we can link all the entities in the first column to Wikidata, we can detect their shared type, which is human, and we can figure out that the second column represents pseudonyms and the third column dates of birth. Unfortunately, we have no idea what column D represents at this stage. For this work, we basically use our prior system with minor tweaks.

Okay, so in order to suggest Kendrick Lamar, we first end up generating loads of potential candidates and then we rank them, so that Kendrick Lamar is a top suggestion; this is our pipeline. We generate candidates from two sources. First, we make use of entity representations from an embedding space trained on Wikidata triples, called PyTorch-BigGraph, or PBG for short, which basically puts similar entities close to each other, distance-wise. It turns out that this method alone is awesome, it's really reliable, but we want to improve recall further; basically, this method doesn't give you all the candidates you want. For that, we end up sourcing candidates using GPT-3's intrinsic knowledge. GPT-3 is a language model, a fairly popular one, and we wanted to try it out in this project. To do this, we first identify properties to build a rich prompt; we have to prompt the language model and ask it to continue our statement. We create example sentences from each row in the original table, or a subset of rows. In this case, we would say something like "Kanye West has pseudonym Yeezy and has date of birth 1977; Marshall Mathers...", and we continue. Then GPT-3 offers you a sentence about Kendrick Lamar.

But this method is not foolproof, and we have some issues. First, sourcing from PBG is computationally expensive, so we need to come up with a way to limit the search space, like property sharding. Also, the list of entities we retrieve can be incomplete, because some entities might be triple-poor or they don't get embedded optimally. In that case we need to come up with something else, because the things that you actually want ended up far away in the graph when they got embedded. On the other hand, we found that GPT-3 on its own has high variability, which makes it quite unreliable, and that is not good enough for our purposes.
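To make the prompting step concrete, here is a minimal sketch of how example sentences built from known table rows could be turned into a completion prompt, assuming an OpenAI-style completion endpoint; the prompt wording and the helper function are illustrative guesses, not the authors' exact pipeline.

```python
# A sketch of few-shot prompt construction for subject suggestion; the
# sentence format is an illustrative guess at the kind of prompt the
# talk describes, not the exact one used.

def build_subject_prompt(rows):
    """Turn known table rows into example sentences, leaving the
    continuation open so the language model proposes new subjects."""
    sentences = [
        f"{name} has pseudonym {pseudonym} and has date of birth {dob}."
        for name, pseudonym, dob in rows
    ]
    # The trailing newline invites the model to continue the pattern
    # with a sentence about a new, unseen subject.
    return "\n".join(sentences) + "\n"

rows = [
    ("Kanye West", "Yeezy", "1977"),
    ("Marshall Mathers", "Eminem", "1972"),
]
prompt = build_subject_prompt(rows)
# A completion call would then look roughly like (hypothetical parameters):
# response = openai.Completion.create(model="...", prompt=prompt, temperature=0.7)
```

A high sampling temperature yields diverse candidate subjects; as the talk notes next, that same diversity is also what makes the raw output unreliable.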
To make it more consistent, we can lower the temperature of the language model, but that significantly cuts recall, which was the whole point of using it. So what we did was to combine generations from PBG and GPT-3, build features from these two spaces, and then use a ranker. We explored a wide range of rankers of various complexities, but we found that for these purposes, things like GAE-based rankers are too rich, and simple things like XGBoost are quite good: they are both computationally efficient and give good results. This led us to a boost of over 10% over prior art when we looked at average recall among generations.

Right, so now we have Kendrick Lamar as a candidate, and we want to be able to add his properties. For gap filling, we first attempt to retrieve the property value directly, because Wikidata is very reliable. Here we can retrieve the values in column B and column C, because we have identified that they are about the pseudonym and the date of birth, and both exist in Wikidata. But we don't know what column D represents. So what we do is build an analogy-style prompt that guides GPT-3 to source the value we're keen on. It looks something like "A is to B as C is to blank", and GPT-3 fills in the blank. In this case it would be something like "Marshall Mathers is to New York as Andre Young is to Detroit as Kendrick Lamar is to", and GPT-3 actually fills in Glendale. We use this approach in two cases: when we cannot identify the property in the table, be it because our linker is not good enough or because there is no such property in Wikidata (in this case, there is no property "had concert in"), or when the property value is missing from Wikidata.

But even though our prompt gets us a completion, these completions are not necessarily trustworthy, and they are definitely not trustworthy enough to show to a user. So what we do is try to link the completion back to a trusted web source like Wikipedia or news articles. We do so by first extracting a loose context from the current rows, by looking at web sources that, say, contain Eminem and New York, or Dr. Dre and Atlanta, and we end up finding that it's basically something about concerts. Then we look for a source that contains Kendrick Lamar and Glendale, and if its context is similar enough to the loose context we previously found, i.e. it's about recent concerts, we can state that it's likely a match. We can also pinpoint the source that we think GPT-3 learned from. Using this approach led to a whopping 15% recall improvement over prior art and a significant reduction in hallucinations: we managed to block out the things GPT-3 hallucinated, like random years or random events.

So yeah, this is what we did in our project for this workshop. Please reach out with any questions, and thank you for listening. Andy Gordon and I are in the audience, so please address either one of us with any questions you might have.
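Here is a minimal sketch of the ranking step just described: candidates sourced from the PBG embedding space and from GPT-3 are merged, each represented by features from both spaces, and a simple gradient-boosted model scores them. The feature names and the toy training data are assumptions for illustration, not the authors' actual feature set.

```python
import numpy as np
import xgboost as xgb

# One row per candidate; hypothetical features: cosine similarity of the
# candidate's PBG embedding to the seed rows, its PBG retrieval rank,
# whether GPT-3 also generated it, and the GPT-3 log-probability.
X_train = np.array([
    [0.91,  1, 1, -1.2],
    [0.40, 57, 0, -9.0],
    [0.75,  8, 1, -2.3],
    [0.35, 80, 0, -8.1],
])
y_train = np.array([1, 0, 1, 0])  # 1 = candidate truly belongs in the table

ranker = xgb.XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
ranker.fit(X_train, y_train)

# Score fresh candidates and keep the highest-scoring ones as suggestions.
X_new = np.array([[0.88, 2, 1, -1.5], [0.30, 95, 0, -10.2]])
scores = ranker.predict_proba(X_new)[:, 1]
ranking = np.argsort(-scores)
```

The appeal of a model like XGBoost here, as the talk notes, is that it is cheap to train and score while still combining evidence from both candidate sources.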
Okay, so we have one question. First of all, thank you for the presentation. There's one question from the audience about possible biases that can be introduced by GPT-3, and whether that is an issue. Yes, so we do have that problem, and we have actually done a bit of work on it; we were trying to look for this. It turns out it's a hard problem, and one reason is that it kind of depends on what the model learned. For instance, we use GPT-3 heavily, and if we use it for gap filling, that should not introduce biases, because you're just filling in the remaining properties. But when suggesting a subject, it does bias things heavily; we have found that to be the case. For instance, if there's a list of artists, say rap artists, it would almost always suggest a male artist, or someone who identifies as a male artist, and that is a significant issue. In that respect, PBG, going through the embedding space, is actually a lot better, because if the Wikidata knowledge graph is unbiased, then the suggestions become less biased as well. So that's definitely something to take into account. Okay. I don't know, Miriam, do we have time or are we done?

Personally, I am so sorry to cut this, but we need to go to the next presenters. Okay, there are a lot of questions, so you will get... Oh, great, great. So Carina, you'll have time to answer these questions in the poster session. Okay, thank you. Thank you so much for the presentation. Puyu, you're next. Carina, if you could stop sharing the screen so we can have Puyu Yang on stage. Puyu Yang is going to present the paper with Giovanni Colavizza on a map of science in Wikipedia. This is going to be in the knowledge integrity poster session. Puyu, I've seen you, I know you can share your screen. Yeah, hi everyone, I'm going to share my screen. Please. Can you see that? Yes, yes, thank you. Okay.

Yeah, okay. So, hi everyone, my name is Puyu. I'm glad to be here to introduce our research on the map of science in Wikipedia. In recent decades, the rapid growth of internet adoption has offered opportunities for convenient and inexpensive access to scientific information. However, a clear understanding of the scientific sources supporting Wikipedia's contents remains elusive. So in this work, we explore Wikipedia's role in the public understanding of science from the perspective of its scientific sources. We rely on an open dataset of citations from Wikipedia and use network analysis to map the relationship between Wikipedia articles and scientific journal articles. Here are our research questions: what scientific sources are cited from Wikipedia, what view of Wikipedia emerges, and what view of science emerges? Answering these questions is critical to inform the community's work on improving Wikipedia by finding and filling knowledge gaps and biases, while at the same time guaranteeing the quality and diversity of the sources Wikipedia relies on. Our data contains 29 million citations from six million articles in English Wikipedia, snapshotted in May 2020. You can find more details about this dataset at this link.

Before showing the results, I'd like to give a brief introduction to our network approach. In this work, we mainly use two networks: a bibliographic coupling network and a co-citation network. Here, the first line represents Wikipedia articles and the second line their citations. In the bibliographic coupling network, if two Wikipedia articles cite the same source, we create a link between these two articles. Similarly, in the co-citation network, if two scientific articles are cited by the same Wikipedia article, we make a connection between these two scientific articles.
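Here is a minimal sketch of how the two networks just described can be built from a bipartite list of (Wikipedia article, cited source) pairs, assuming the networkx package; the toy data is illustrative.

```python
from itertools import combinations
import networkx as nx

citations = [
    ("Article_A", "paper1"), ("Article_A", "paper2"),
    ("Article_B", "paper2"), ("Article_B", "paper3"),
    ("Article_C", "paper1"), ("Article_C", "paper3"),
]

def project(pairs, key_index, value_index):
    """Group one side of the bipartite list by the other side."""
    groups = {}
    for pair in pairs:
        groups.setdefault(pair[key_index], set()).add(pair[value_index])
    return groups

# Bibliographic coupling: two Wikipedia articles are linked if they cite
# the same source.
coupling = nx.Graph()
for articles in project(citations, 1, 0).values():
    for a, b in combinations(sorted(articles), 2):
        w = coupling.get_edge_data(a, b, {"weight": 0})["weight"]
        coupling.add_edge(a, b, weight=w + 1)

# Co-citation: two scientific articles are linked if they are cited by
# the same Wikipedia article.
cocitation = nx.Graph()
for sources in project(citations, 0, 1).values():
    for a, b in combinations(sorted(sources), 2):
        w = cocitation.get_edge_data(a, b, {"weight": 0})["weight"]
        cocitation.add_edge(a, b, weight=w + 1)
```

Edge weights count how many sources (or citing articles) two nodes share, which is the usual strength measure for these bipartite projections.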
So, what scientific sources are cited from Wikipedia? In our data, we have 2.5 million journal articles, and we plot a Sankey diagram to show the flow from Wikipedia articles to scientific disciplines. Obviously, most citations go from STEM Wikipedia articles to biology and medicine. This flow confirms the importance of biological, medical, and health sciences in Wikipedia, while other topics are more evenly distributed across fields of research.

Now, let's look at the network visualization to understand Wikipedia from a science perspective. We use the bibliographic coupling network, colored by ORES topics, and we numbered the top ten clusters in the network. Below, we also list the top four WikiProjects in the top three clusters. Combining these two plots: firstly, we can see the systematic importance of STEM, and then of geography; secondly, biographies and history play an important role in connecting STEM to the rest of Wikipedia. For the co-citation network, we visualize it using the same layout, colored instead by major field of research. The results are consistent with what we previously discussed: the dominance of biology and medicine as the two top fields cited from Wikipedia. Because of the time, we just showed some main results here; you can find more analysis in our paper. A limitation is that we only focus on the English version of Wikipedia and only on journal articles, and we use snapshot data. Going forward, we are going to study open science and media sources in Wikipedia. If you are interested in any part of this, please feel free to contact us. Thanks for listening.

Okay, thank you very much. So there is one question about whether you experimented with removing stubs or bot-generated articles, because in some languages there can be a large, consistent set of sources that are added by bots, typically from the same database. Did you look into it? So you mean in different language versions? No, you focus only on English, right? Yeah, okay. Do you know if there is any automatic creation of references? Yeah, I can maybe say something about this, Puyu, if you don't mind. So the dataset that we use is published in another paper, and it also includes quite a number of automatically generated citations. That is from a very preliminary investigation, which is not thorough and should happen in the future, I believe. These bots help a lot, for example by adding identifiers to papers, which is very important for conducting the studies that we have done. But to go back to the question: yes, this study includes automatically generated citations, and no, we haven't developed a study specifically on their impact. Thank you.

Yes, Tiziano, just because we're running a little bit late, let's do like we did with Carina: anything else will be asked in the poster session, and we go to the next presenter. Karthic, I've seen you before, so if you could start sharing your screen, I'm going to introduce you. Yes, Karthic Madanagopal and James Caverlee are presenting Improving Linguistic Bias Detection in Wikipedia Using Cross-Domain Adaptive Pre-Training. Karthic, the floor is yours.

Thanks, Miriam. Hello, everyone. I'm Karthic Madanagopal, a PhD student at the Department of Computer Science, Texas A&M University. Today I'm going to present our work on improving linguistic bias detection in Wikipedia using a cross-domain adaptive pre-training approach. One of the key guiding principles of Wikipedia is the neutral point of view, which requires all content to be written fairly, proportionately, and as far as possible without any editorial bias. But still, editors may knowingly or unknowingly introduce bias into their articles.
Maintaining a neutral point of view can be challenging for new contributors and experienced creators alike. There are various types of bias that can creep into the objective treatment of facts. In this work, we concentrate on the bias introduced by the subjective language used in presenting the information. Here are some examples of the biased statements we have identified in Wikipedia. Framing bias is an explicit form of bias that reveals the author's stance on a particular topic through one-sided or subjective words. On the other hand, epistemological bias can be extremely difficult to detect, because it's an implicit or subtle form of bias that tends to cast doubt in the reader's mind. The goal of this research is to accurately identify all these different types of language-induced bias and help editors present the facts objectively.

Several previous studies have proposed automated systems to detect biased statements. These efforts have mainly focused on either manually constructing bias lexicons or solely on Wikipedia itself as training data. Most of these methods were able to detect simple forms of framing bias but missed the majority of epistemological biases. The inability of the lexicon-based and syntax-driven approaches to encode sentence semantics led to misclassifications of certain subtle and implicit forms of epistemological bias. And due to the changes in editors' writing styles and behaviors over time, which we commonly see in Wikipedia, these methods were not able to sustain their performance over a long period. After analyzing the results of these methods, we identified that the majority of the misclassified biased statements in Wikipedia belong to language and literature, politics and government, and sports.

In an effort to build a robust bias classifier that can detect subtle forms of bias and also continue to perform well over a long period of time, we devised a cross-domain pre-training approach. First, in order to expand our coverage of domain-independent expressions related to judgments and interpretations, we did data augmentation by leveraging annotated datasets from other subjectivity-rich domains, like politics and opinion sources such as news articles and product reviews. Additionally, we used deep transformer models like BERT that can capture language patterns related to common writing styles and expressions used in subjective views. The combination of data augmentation and deep transformer models enabled our classifier to detect biased statements by understanding the meaning of the statements in context, rather than relying on the keywords mentioned in them. Our training dataset contains NPOV statements extracted from Wikipedia edit histories; news-related biased statements extracted from the MPQA corpus, which contains statements expressing the author's private states like beliefs, emotions, sentiments, and speculations; and political ideology statements extracted from the IBC corpus. To train our bias classifier, we leveraged a contextualized language model called RoBERTa, which can efficiently encode the meaning of text into a vector form suitable for training a text classifier. Also, instead of directly fine-tuning a RoBERTa classifier, we used an adaptive pre-training approach that led to superior performance in detecting biased statements.
In our cross-domain adaptive pre-training approach, we downloaded the RoBERTa deep transformer model, which was already trained on large volumes of text extracted from e-books and news articles. Then we performed continual pre-training to steer the model towards understanding the subjective language coming from other domains. The continual pre-training adapts our pre-trained model to the subjective writing styles required for our study. Then we added a final layer to the continually pre-trained RoBERTa model and fine-tuned it to classify biased statements using the annotated bias corpus.

To study the importance of the deep transformer model, we first trained the bias classifier using only the Wikipedia corpus. The lexicon-based model had better recall, because it was plainly looking for keywords and classifying statements as biased without understanding their meaning, but the transformer-based classifier had a better classification accuracy of 77%. This shows that transformer-based models are powerful enough to learn domain-specific statements and structures relevant for detecting language-induced subjective bias. To demonstrate the value of our cross-domain pre-training, we trained the best model from the previous experiment using three different combinations of cross-domain training corpora. The transformer model trained with all the datasets combined had the best performance, 89% accuracy, which is a 19% improvement in accuracy over the baseline models. We learned that the performance of these cross-domain models depends on the amount of knowledge overlap with the domains we are trying to mix. These results show that the proposed approach detects biased statements in Wikipedia more accurately than existing state-of-the-art models, by leveraging rich pre-trained language models fine-tuned with a cross-domain training corpus.

To understand the performance of the bias classifier, we also applied the same model to other domains apart from Wikipedia, and we observed a 12% improvement in classification accuracy on the MPQA opinion corpus. Interestingly, the performance of the model on the IBC, which is more about political speech, did not go as well, because, depending on the geography, political speech changes: there are immigration issues, tax-related issues, low-income families, and there are so many such statements in political speeches that our model was not able to capture them. We are still continuing to improve our model to understand the meanings of the different noun phrases that come through these statements, to improve our bias detection accuracy. I'm happy to discuss more in detail during the poster sessions; please visit our virtual booth number one. Thanks for your time.
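A minimal sketch of the two-stage recipe described above, assuming the Hugging Face transformers and datasets libraries; the toy corpora and hyperparameters are placeholders, not the authors' configuration.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-base")

# Toy stand-ins: subjectivity-rich text (news, political speech, reviews)
# for stage 1, and labeled biased/neutral sentences for stage 2. Real
# runs would use MPQA, IBC, and Wikipedia edit histories.
raw_subjective = Dataset.from_dict({"text": [
    "The senator's disastrous plan outraged voters.",
    "This phone is an absolute masterpiece.",
]})
raw_bias = Dataset.from_dict({
    "text": ["He finally admitted the truth.",
             "The report was published in 2019."],
    "labels": [1, 0],
})

def tokenize(batch):
    return tok(batch["text"], truncation=True, padding="max_length",
               max_length=64)

subjective = raw_subjective.map(tokenize, remove_columns=["text"])
bias = raw_bias.map(tokenize, remove_columns=["text"])

# Stage 1: continual (domain-adaptive) pre-training with masked language
# modeling, so the encoder absorbs subjective writing styles.
mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("roberta-base"),
    args=TrainingArguments(output_dir="dapt-roberta", num_train_epochs=1,
                           per_device_train_batch_size=2),
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok,
                                                  mlm_probability=0.15),
    train_dataset=subjective,
)
mlm_trainer.train()
mlm_trainer.save_model("dapt-roberta")

# Stage 2: add a classification head on the adapted encoder and
# fine-tune it on the annotated bias corpus.
clf = AutoModelForSequenceClassification.from_pretrained("dapt-roberta",
                                                         num_labels=2)
Trainer(
    model=clf,
    args=TrainingArguments(output_dir="bias-clf", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=bias,
).train()
```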
Thank you very much. So we have a couple of questions that are on more or less the same topic, specifically about the moving-target nature of the problem. Do you have some examples of changes in editor behavior or subjective writing over time that can degrade the model, and why should the cross-domain adaptive pre-training mitigate the problem? Yeah, that's a very good question. I have one specific example that I've actually seen. Because I'm from the US, I look at most of these things from the US political speech point of view. So, around the storming of the US Capitol: we trained the model on data between 2005 and 2015, and then we took the model and tried to apply it to articles that came after 2015. What we have seen is that most of the previous models were not able to detect the change in writing, because, and this is what is smart about editors, once they see their article flagged on the NPOV board, they start to tweak their writing so that the bias gets more and more subtle, but it still exists. So our models need to keep learning over time, but the model we trained was able to perform consistently for the next six years. That's what helped us understand whether the model is improving or not, and whether it is able to capture future writing styles. And again, our data augmentation is moving towards that: if we try to use Wikipedia itself as the only source to train a model, it won't generalize and won't be able to detect future things. By incorporating data from political speeches outside of Wikipedia, product reviews, and things like that, we were able to learn what biased writing looks like more broadly. So in the future, if someone tries to bring some of those writing styles in, we will be able to catch them. That's one of the main reasons why this model is able to do well.

Okay, Miriam, I think we can move on to... Sorry, I didn't want to disrupt, I just wanted to get ready for the next bit. Thank you so much, Tiziano. And Karthic, I need to move the session forward. So thank you so much to the three of you for the beautiful presentations; you'll have time to discuss during the poster sessions. I am now going to play an about 16-17 minute video where you will see the lightning talks for all the other papers in the session. I hope you enjoy it. And if you can't hear the audio, please, someone tell me, that would be great. Enjoy.

I'm Magnus, I work for Memo, and I will guide you through the main steps of using the public domain tool. It is a tool meant for cultural heritage organizations to determine whether their collection items belong to the public domain, and it automates that process. First of all, you need the Q-ID of your organization. You need a CSV file containing an export of your data, with typical information like the type of object, the author, birth date, death date, and external identifiers. The more you have, the better, but none of them is strictly necessary to use the tool. You upload that CSV to the system, and then you get back another CSV with suggestions: the Wikidata items corresponding to the authors of your works. This is where the manual work needs to be done: you need to check whether these suggestions are correct. Correct suggestions just stay in the CSV; those that are not correct should be deleted. Once that's finished, you upload the CSV to the system again, and you get back an enriched CSV with data pulled from Wikidata. You see many more birth dates and death dates here, for example, and you see Q-IDs for each and every author. You also see the copyright status of each work: either public domain, copyright protected, or unknown. At the same time, data has been added to Wikidata as well, based on the collection data. For example, professions are added, and birth and death dates, always with a reference to the source of the data. So this is it, more or less. Please feel free to ask questions.
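The core rule this kind of tool automates can be sketched in a few lines; this assumes a simple "life of the author plus 70 years" term, whereas the real tool's jurisdiction-aware logic is certainly more nuanced.

```python
from datetime import date
from typing import Optional

def copyright_status(death_year: Optional[int], term_years: int = 70) -> str:
    """Classify a work by its author's death year under a pma+70 rule
    (an assumption for illustration; real terms vary by jurisdiction)."""
    if death_year is None:
        return "unknown"  # no death date in the CSV or on Wikidata
    if date.today().year > death_year + term_years:
        return "public domain"
    return "copyright protected"

print(copyright_status(1940))  # public domain
print(copyright_status(1990))  # copyright protected
print(copyright_status(None))  # unknown
```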
Welcome to this session. I am Redavan Khdra, and this is joint work with Anna Sanderati, entitled "Are Democratic User Groups More Inclusive?". In this paper, we investigate a particular online community structure, Wikimedia user groups, and answer the question of democracy and its relation to inclusivity in these groups. The background of this research starts from us both being co-founders of a national user group and having been strongly involved in the Wikimedia ecosystem and its governance. We have noticed several conflicts and discussions about the lack of inclusivity in user groups that could originate from an undemocratic context. Building on earlier research from one of the co-authors, we wanted to understand whether democracy was indeed the solution to obtain more inclusive user groups. From that work, we used a strong theoretical grounding to define the central elements and to conduct a thorough mapping and analysis. Our findings show that there is no strong correlation between the two concepts: democracy is not always the solution for the inclusivity challenges that were identified in certain Wikimedia user groups, and inclusivity issues need to be addressed through other recommendations and means. To the best of our knowledge, this paper represents a first attempt to tackle governance-related issues in Wikimedia user groups. We believe that this work can lay good grounds for further research on the matter, but also on general questions related to digital democracy that are within the interest areas of this workshop. Further research can look at other issues apart from inclusivity, and at other challenges such as burnout. Thank you for your attention.

Hyperlinks have become the backbone of the web, and Wikipedia is no exception. They let readers depart from the article they are reading, leading to better contextualization of contents. Hyperlinks are written by contributors and take the following form: on one side the anchor text, and on the other the target document, as written during redaction. It may also be beneficial for some readers to insert contextual hyperlinks based on previously seen contents, acting like a personalized reading system. Our model, a contextualized relational topic model, estimates the importance of each topic and uses additional parameters, helping to learn a new hidden representation better fitted to the anchor prediction task. We evaluate on English, German, and Italian, on the anchor prediction and link prediction tasks, with both ablation studies and qualitative examples. Experiments show that, besides being language-agnostic and computationally efficient, our model achieves good results in anchor prediction, both quantitatively and qualitatively, and it seems to be suitable for the link prediction task as well. We will be happy to discuss this during the poster session.

The Gender Perspective in Wikipedia: A Content and Participation Challenge. This work is done by four professors from three different universities in Catalonia, Spain. I'm Núria Ferran-Ferrer, the corresponding author. Wikipedia is one of the most widely used information sources in the world. Although one of the guiding pillars of this digital platform is ensuring access to the diversity of human knowledge from a neutral point of view, there is a clear and persistent gender bias in terms of content about, or contributed by, women.
The challenge is to include women as equal partners in the public sphere, in which Wikipedia is playing a central role as the most used educational resource among students, professionals, and many other profiles. In this paper, we introduce the gender perspective in the analysis of the gender gap in the content of Wikipedia and in women's participation. While most studies focus on only one of the two dimensions in which the gender gap has been observed, we review both approaches to provide an overview of the available evidence. Firstly, we introduce how the gender gap is framed by the Wikimedia movement strategy. Then we evaluate the gender gap in content and participation, especially regarding editing practices. Finally, we provide some insights to broaden the discussion about the consequences of not addressing the gender gap in Wikipedia, and we propose some research topics that can support the generation of recommendations and guidelines for a community that needs both equity and diversity.

An overview of our work on utilizing language model probes for knowledge graph repair. Knowledge graphs, as we all know, are an important asset for machine knowledge. Web-scale knowledge graphs like YAGO, DBpedia, and Wikidata are constructed manually, semi-automatically, and automatically. They contain billions of SPO triples, such as "Paris is the capital of France". It is inevitable, however, that these large knowledge graphs contain wrong information, for a variety of reasons. In this example, we see extracted triples about the entity Jessica Meeuwig from her Wikipedia infobox. While her field was successfully extracted, namely marine science, her alma mater was mistaken for the city where the university is located, causing an error in the knowledge graph. In this work, we propose fixing predefined wrong triples without losing information, by replacing the incorrect components of a triple with the correct ones. We do so using context-augmented language model probing, where we identify this context and measure its relevancy from the input KG itself. For example, to fix the triple about the alma mater, Montreal, one way of probing the language model is to simply mask the incorrect component of the triple and request an alternative. In this case, all the predictions are incorrect, especially since Jessica Meeuwig is a long-tail entity. Instead, we augment the probe with salient context about her from the knowledge graph itself, like her profession and country of origin. In this case, we obtain more accurate answers, where the top prediction is actually correct. Please check the paper for more details about the methodology and the systematic experiments on the knowledge graphs. Thank you.
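A minimal sketch of the context-augmented probing idea from the knowledge-graph-repair talk above, assuming the Hugging Face fill-mask pipeline with a BERT model; the probe wording and the facts prepended as context are illustrative guesses, not the paper's exact probes.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Naive probe: simply mask the suspect component of the triple.
naive = "Jessica Meeuwig graduated from [MASK] University."

# Context-augmented probe: prepend salient facts about the entity taken
# from the knowledge graph itself (e.g. profession), which is what helps
# for long-tail entities. The facts here are placeholders.
augmented = ("Jessica Meeuwig is a marine scientist. "
             "Jessica Meeuwig graduated from [MASK] University.")

for probe in (naive, augmented):
    predictions = fill(probe, top_k=3)
    print([p["token_str"] for p in predictions])
```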
Furthering automatic speech recognition, or ASR, research and application is ever more relevant to real-world human-computer interactions. People with disabilities, and young and older people alike, require the ability to listen to and speak to their computing devices. The foundation of ASR for languages with limited resources is human speech data, which is often absent or insufficient, so new speech data has to be open and affordable. Take the example of my own language, Odia. Despite being the official language of an Indian state with 45 million speakers, very little speech data is available under open licenses. Such issues are worse in languages with fewer resources, and they hamper speech research and development. Using an open-source online web app, Lingua Libre, I could record over 400 words in a day. I could expand the topics by constantly collecting words from the Odia Wikipedia, online news sites, a science magazine, and a 1930s lexicon. By building a workflow, I could grow a repository of speech recordings of 55,000 unique words in Odia, including over 5,000 words in the northern dialect, Baleswari. These recordings are available under a public domain dedication and are perpetually free for anyone to use.

So, my key learnings. First, creating a word list containing unique words in a language is critical for any speech data project, and diversifying the topics these words fall under, by looking at all available sources, is even more critical. Second, think of ways to expand the diversity: finding speakers of various genders, sources covering various topics, and words from different areas helps a lot, even though a first launch like mine might include only one speaker of a particular gender. Third, document your process; that helps others. I have tried to document some such resources under the ambit of the OpenSpeaks project, an open, multimedia documentation project that I founded for low- and medium-resource languages. Fourth, encourage speakers to record words of their dialects if such words exist; dialects are often neglected due to lack of resources. Lastly, using an open license helps others build further research on your speech data. Thank you so much.

Hello everyone. My name is Włodzimierz Lewoniewski, and I will briefly describe the results of our recent research. As we know, information in Wikipedia should be based on reliable sources. But nowadays there are over a billion websites on the internet, and only a very small part of them is assessed by the Wikipedia community on special pages. Moreover, the reliability of the same source in Wikipedia depends on the topic and the language version, and the reliability assessment may change over time. The purpose of this study is to identify reliable sources on a specific topic, the COVID-19 pandemic. So we decided to find which sources are reliable for Wikipedia based on an analysis of its content in different months. To do so, we searched for references in the wiki code of the selected Wikipedia articles in each considered month. Some references were not placed directly in the code of the articles, so we also analyzed how the content of special templates changed over the selected period. To find Wikipedia articles on the COVID-19 pandemic, we can use different approaches. For example, in Wikidata we can find items on a specific topic based on statements, and then we can find the titles of the related Wikipedia articles. We can also use DBpedia, which extracts structured information from Wikipedia infoboxes. After extracting the URL addresses in the references, we used the public suffix list to detect which level of the domain indicates the source. In our recent study, we proposed 10 models related to the popularity and reliability assessment of sources; here we used some of them and also proposed a new one. This figure presents the results of the assessment of the web sources on the COVID-19 pandemic in English Wikipedia in each month. We also analyzed the reliability trends in other languages. That's all, thank you for your attention.
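The domain-extraction step mentioned in the talk above can be sketched with the tldextract package, which bundles the public suffix list; the example URLs are illustrative.

```python
import tldextract

urls = [
    "https://www.who.int/emergencies/diseases/novel-coronavirus-2019",
    "https://edition.cnn.com/2020/03/11/health/coronavirus-pandemic.html",
]

for url in urls:
    ext = tldextract.extract(url)
    # registered_domain combines the domain with its public suffix
    # (e.g. "cnn.com"), which is the level treated here as "the source";
    # subdomains like "edition" or "www" are stripped.
    print(url, "->", ext.registered_domain)
```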
In this presentation, I will highlight the potential of Wikipedia knowledge graphs as job interview kits. The job interview is an important step in the HR recruitment process, yet it is not standardized, especially in computer-science-related fields. The problem, which includes scalability, gets more difficult in the age of globalization, where applicants come from diverse backgrounds. Knowledge graphs such as Wikipedia's can be used to build interactive interview kits, with the objective of linking knowledge entities, under the premise that a knowledgeable navigation path is shorter than a randomly picked one. As I try to demonstrate with a simple example here, Alice was able to assess Bob's knowledge of the topic and also see his thought process, while Bob, on the other hand, was able to demonstrate his prowess. Here I list some of the important future works, and this concludes the presentation. Comments, suggestions, and feedback are gladly welcome.

All right, thank you, everyone. Thank you to all the authors for these videos. I believe we're going for a five-minute break. Emily, if you can confirm, I think there is a timer at some point. Don't go anywhere, because there will be live music when you're back, so be sure that you are around in five minutes, because you're going to have lots of fun. Thank you, everyone, for your videos and presentations; you will have the opportunity to answer any questions in the poster session. Thank you very much.

While we are on break, yes. Thank you, Emily. While we are on break, Oogne, I saw you just joined. I can't hear you, be careful; I see you're unmuted, but I can't hear you. Yes. Just a quick note that we're going into this four-minute break, and then we'll come back for ten minutes of music with Oogne. So if you want to just get up and stretch, do that. Oogne, if you can raise the volume of your audio? I think it's still a bit low. One, two. One, two, do you hear me? Oh, now, yes. Yes, yes, yes. Yes, great. Hello, Mira. Hi. How are you doing? I am good. We were testing some audio with Oogne, who is going to play live. Okay, okay. Okay. I'm sorry I joined late. No problem, no problem. You can watch the recording, and you still have lots of content to go through today. Thank you for joining. Thank you. Thank you.

Oogne, sorry, do you hear my guitar? I hear it perfectly. Yeah? Yes. Does Oogne's audio come through? Yes. Let me see. If Diego is around, maybe he can also confirm that he hears it as we should hear it. Well, I mean, people are on a break, but I hear you. Yes. Yes. Is it the correct audio we should hear? It's a bit low now, but yeah. If you can increase the volume a bit, Oogne, I think it will be great. Yeah, can you hear it better now? Yeah, much better. Nice. And the guitar as well. Perfect. Yeah, it was a bit low before.

Thank you, Oogne. So basically, once this timer is gone, we're going to put you in spotlight, which means that you will be on everyone's screen, full screen, and I'll briefly introduce you. You are famous in this community now, so I don't think you need it, but I'll introduce you, and then the floor is yours. Last time people asked us to share links to some of the songs you sing, so probably it will happen again; maybe we can do it afterwards. Okay, I'll make sure to have them ready. Is there anything you need from us? I think the audio is good, I see you, the video. I mean, if you can hear me well. Yes, I'm great. Good, good, good. So let me see, I'm letting people in. Emily, is there anything else I should do? I'm not following all the instructions, but I think we're good, right? Yeah, we're good. Good, okay. Emily, sorry, can you stop the recording?
Because I think we can't record the music. Yes, I'll make sure I stop it before Oogne starts. Sounds great. Countdown to Oogne's music. Emily, I'd like to stop the recording, and then I'll make sure this...