Hello, everyone. My name is Bahid Ashrafi from the Stevens Institute of Technology, and I'm presenting our paper on detecting cross-lingual information gaps in Wikipedia. As of October 2022, Wikipedia contained more than 30 billion words across 58 million articles written in roughly 330 languages, covering a wide range of subjects. Many researchers use Wikipedia for tasks such as text summarization, text classification, semantic analysis, and information extraction. With more than 6.3 million articles, the English Wikipedia is the largest of the language editions, but there are large gaps between editions. Our project addresses two questions: first, how can we measure the information gap between different language editions of Wikipedia, and second, what are the sources of this cross-lingual information gap? Our proposed approach comprises two components: a novel algorithm for detecting information disparities between paired articles from different language editions, and a human-in-the-loop approach to identify the sources of the information gap in Wikipedia.

There is some related work in this field. Cosley and colleagues introduced the concept of intelligent task routing and described a system called SuggestBot that helps community members find English articles that need editing. Bao and colleagues described a system that presents information about a concept from multiple Wikipedias to end users. Piccardi and West introduced a cross-lingual topic model that can represent Wikipedia articles in any language as distributions over a shared set of language-neutral topics. Johnson and Lescak identified reasons for the varying quality and topic coverage across Wikipedia editions and offered recommendations for optimizing datasets for specific tasks.

Our proposed approach comprises five steps. First, data retrieval: the algorithm uses QIDs, Wikidata's unique identifiers, to extract the articles related to a specific topic. Second, it uses WikiPDA, a cross-lingual topic model based on LDA, to learn representations of Wikipedia articles as distributions over a common set of language-independent topics derived from the interlinked QIDs. Third, it calculates the similarity between paired articles using cosine similarity over these topic distributions. Fourth, using a rating scale and feedback from human evaluators, we establish a ground truth for understanding the potential sources of the information gap between different language editions. Finally, we use Google Translate as a machine translation tool, with English as the reference language, to create a comparable corpus of English and non-English article pairs; we then apply a traditional topic modeling method to obtain monolingual topic distributions for the selected article pairs and calculate cosine similarity to compare each pair of articles based on their topic distributions (a minimal sketch of this computation follows the results below).

We investigated 38 million articles in 28 Wikipedia language editions and calculated similarity scores for more than 103 million article pairs. Language editions from countries with established historical or geographical ties, such as Russian and Ukrainian, Czech and Polish, and Spanish and Catalan, were found to have similar content. Bots play a significant role in reducing the information disparity in some Wikipedia editions, such as Swedish, Vietnamese, and Waray.
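To make the similarity step concrete, here is a minimal Python sketch of the monolingual validation described above: an English article and the machine-translated text of its non-English counterpart are each represented as LDA topic distributions and compared with cosine similarity. The token lists, topic count, and the use of gensim are illustrative assumptions, not the exact implementation from the paper.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def cosine_similarity(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q / denom) if denom else 0.0

def topic_vector(lda, dictionary, tokens):
    """Dense topic distribution of one tokenized article under a fitted LDA model."""
    bow = dictionary.doc2bow(tokens)
    dense = np.zeros(lda.num_topics)
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        dense[topic_id] = prob
    return dense

# Toy article pair (placeholder tokens): an English article and the
# Google-translated text of its Farsi counterpart, already tokenized.
en_tokens = ["headscarf", "history", "law", "women", "dress", "religion"]
fa_translated_tokens = ["headscarf", "hijab", "law", "women", "protest", "religion"]

# In practice the LDA model would be trained on a large comparable corpus;
# here it is fit on the two toy documents only, for illustration.
dictionary = Dictionary([en_tokens, fa_translated_tokens])
corpus = [dictionary.doc2bow(doc) for doc in (en_tokens, fa_translated_tokens)]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, random_state=0)

score = cosine_similarity(
    topic_vector(lda, dictionary, en_tokens),
    topic_vector(lda, dictionary, fa_translated_tokens),
)
print(f"similarity = {score:.3f}")  # closer to 1.0 suggests a smaller information gap
```

The same cosine-similarity comparison applies to the cross-lingual topic vectors in the main pipeline; only the way the topic distributions are obtained differs.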
Except for the English and Arabic editions, the top five articles in the other language editions mainly relate to celebrities, locations, organizations, or scientific topics. Reducing the information gap can also be viewed as a countermeasure against censorship and propaganda, as illustrated by the information disparity between the English and Farsi Wikipedia articles on the headscarf. A detailed examination of a sample of paired articles from the English and Farsi editions showed that mismatches between the interlinked entities are the main source of information disparity between the two editions. These mismatches arise from differences in the size, age, and composition of the editor communities, as well as the editors' areas of interest. Additional factors contribute to the information gap between language editions, including the presence of outdated, culturally dependent, or geographically dependent content; censorship and propaganda; the unavailability of sources in one language; the presence of controversial topics; mislabeling; and vandalism. The similarity scores generated by the proposed algorithm were significantly correlated with human judgment. Our future research focuses on improving the accuracy of the proposed algorithm by combining it with statistical measures such as article length; automatically collecting the sources of information gap identified by human judges; evaluating the proposed algorithm on additional language pairs and with machine-based methods such as SMT; and developing a human-in-the-loop, crowd-based system to automatically bridge the information gap between Wikipedia language editions. Thank you so much for your attention.