Hi everybody, my name is Venyman Veslovsky and I am a student at EPFL studying data science. I'm excited to present my work with Akeel, Titiano, Ashton and Bob on the weaponization of Wikipedia: characterizing Wikipedia linking across the web.

To begin, a lot of studies have focused on Wikipedia as an individual entity. They have used all kinds of metrics: clickstream data, viewership, the internal linking network, coverage, quality, bias, and so on. Other studies have begun to examine Wikipedia's relationship with the web more broadly. Some have looked at the value that Wikipedia provides to sites like Reddit and Stack Overflow, and others have studied Wikipedia reuse on Twitter. Finally, a different set of papers has explored the interaction between Wikipedia and search engines. But in reality, the web stretches a lot farther than these sites and represents a huge, rich network of different sites that link to one another. To understand Wikipedia's true reach, I argue that we need to understand people's second-order interactions with it. In other words, how do other websites link to Wikipedia? This is something that we term Wikipedia's weaponization.

In this paper, what we do is essentially this. We stream through the Common Crawl dump and find all web pages that link to Wikipedia, which gives a huge 90 million links across 1.68% of the surface web, from more than 1 million domains that have linked to Wikipedia. Then we rely on the usual suspects: ORES topics for a coherent clustering of Wikipedia pages, and Curlie clusters to segment web pages into different categories. Finally, we look at how Wikipedia is shared on Reddit as a little digression to further contextualize some of our findings. In our analysis, we focus on three questions. The first one is what types of Wikipedia articles are linked more versus less on the web.
The second is where on the web articles are actually being linked, both within the web as a whole and within an individual web page. And finally, we share our initial thoughts on why individuals link to Wikipedia.

So to begin, what types of Wikipedia articles are linked on the web? Just sorting by the articles that are linked across the most domains, the top five are HTTP cookies, United States, Main Page, search engine optimization, and the GDPR. But really we're interested in something more nuanced: are there clusters of articles that have a larger web presence? To study this, we define three probability metrics. The first is the in-degree probability: given a link at random on Wikipedia, what is the probability that it points to a specific ORES topic? Then we have the web probability for an ORES topic: given any link on the web that points to Wikipedia, what is the probability that it links to a specific ORES topic? And finally the social probability, for which we look at Reddit shares: what is the probability that a specific Reddit invocation links to this ORES topic?

Here are some interesting findings. For example, technology, society, politics and government, philosophy and religion, medicine and health, and computing are ORES topics that are overrepresented on the web. Others are underrepresented; these are typically geographic places, sports, and music, which have a higher importance score on Wikipedia.

Right, next is research question two, where we look at where Wikipedia articles are linked. We take 10,000 random web pages on Common Crawl that link to Wikipedia, and we use Homepage2Vec to predict the Curlie topic, the first layer of the Curlie topic, for each web page. This gives us a one-versus-all probability distribution for each topic.
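The three probability metrics defined earlier can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: it assumes each link has already been labelled with exactly one topic, the function names `topic_probabilities` and `representation_ratio` are hypothetical, the toy link lists are made up, and comparing the two distributions with a simple ratio is an assumption about how over- and underrepresentation might be measured.

```python
from collections import Counter


def topic_probabilities(link_topics):
    """Return P(topic) for a list of links, each labelled with one topic.

    Applied to Wikipedia's internal links this plays the role of the
    in-degree probability; applied to links found on the web, the web
    probability; applied to Reddit shares, the social probability.
    """
    counts = Counter(link_topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def representation_ratio(other_probs, indegree_probs):
    """Ratio above 1: topic is over-represented relative to Wikipedia's
    internal links; below 1: under-represented. (Assumed comparison.)"""
    return {
        topic: other_probs.get(topic, 0.0) / p
        for topic, p in indegree_probs.items()
        if p > 0
    }


# Toy data, invented for illustration only.
internal_links = ["Geography", "Geography", "Sports", "Technology"]
web_links = ["Technology", "Technology", "Geography", "Sports"]

indegree = topic_probabilities(internal_links)
web = topic_probabilities(web_links)
ratios = representation_ratio(web, indegree)
```

In this toy example, Technology accounts for a quarter of internal links but half of web links, so its ratio is above 1, while Geography falls below 1.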
What we find across these topics is that business, shopping, sports, and health sites are less likely to link to Wikipedia, whereas science, society, kids and teens, and reference sites are more likely to. We then augment this by looking at the individual web pages: we define some HTML structural rules to roughly segment whether the link occurs in the boilerplate, the main content, or the responses, like those we see at the bottom here. We see that, in general, the sites that are underrepresented when it comes to Wikipedia sharing, such as business and shopping, tend to link to Wikipedia more in the boilerplate, whereas other sites like science and society tend to link to Wikipedia more in responses, which are usually comments. Then there are other sites like arts, kids and teens, and recreation, which tend to link to Wikipedia more in the main content. We also looked at the language used in each of these categories: responses use more conversational language, unsurprisingly, and boilerplate looks more technical.

The final aspect that we look at is why people link to Wikipedia. We spent a lot of time trying to define a taxonomy, but for every taxonomy that we could define, there was a series of articles that would break that taxonomy entirely. It really shows that defining a taxonomy for URLs is a difficult task. So instead, we thought about this as a bidirectional relationship between Wikipedia and the web more generally: a URL can link to Wikipedia as a form of content enrichment, or a URL can source evidence or images from Wikipedia. What we find is that 95% of Wikipedia linking is used for content enrichment purposes, whereas only 5% goes to content sourcing, so this could be images, and evidence at 4%. This was based on an annotation of 500 Wikipedia links that occur in web pages.
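The idea of using HTML structural rules to place each Wikipedia link in the boilerplate, the main content, or the responses can be illustrated with a small sketch. The talk does not spell out the actual rules, so everything here is an assumption made for illustration: the tag set in `BOILERPLATE_TAGS`, the `id`/`class` substrings in `RESPONSE_HINTS`, and the names `WikiLinkLocator` and `locate_wikipedia_links` are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as ancestors.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Assumed rules, not the paper's: these tags count as boilerplate ...
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside"}
# ... and these id/class substrings mark user responses such as comments.
RESPONSE_HINTS = ("comment", "reply", "response")


class WikiLinkLocator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, region or None) for every open element
        self.found = []  # (href, region) for every Wikipedia link

    def _region_of(self, tag, attrs):
        marker = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
        if any(hint in marker for hint in RESPONSE_HINTS):
            return "responses"
        if tag in BOILERPLATE_TAGS:
            return "boilerplate"
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and "wikipedia.org/wiki/" in href:
            # The innermost labelled ancestor decides the region.
            region = next((r for _, r in reversed(self.stack) if r),
                          "main content")
            self.found.append((href, region))
        if tag not in VOID_TAGS:
            self.stack.append((tag, self._region_of(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def locate_wikipedia_links(html):
    locator = WikiLinkLocator()
    locator.feed(html)
    return locator.found


sample = """
<html><body>
<footer><a href="https://en.wikipedia.org/wiki/HTTP_cookie">cookies</a></footer>
<article><p>See <a href="https://en.wikipedia.org/wiki/Alps">the Alps</a>.</p></article>
<div class="comments"><p><a href="https://en.wikipedia.org/wiki/GDPR">GDPR</a></p></div>
<a href="https://example.com">not Wikipedia</a>
</body></html>
"""
regions = [region for _, region in locate_wikipedia_links(sample)]
```

Real pages are far messier than this sample, so rules like these can only give a rough segmentation, which matches how the approach is described in the talk.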
So finally, to conclude, we believe that this new way of studying Wikipedia within the ecosystem of the internet offers a promising new direction for Wikipedia research. In a new research roadmap for identifying knowledge gaps in 2022, the Wikimedia Foundation encouraged us to further study all forms of knowledge that Wikimedia needs to acquire in order to fulfill its role on the web, and our data set and paper take an initial step in this direction.

The core points that I want you to take away are these. This was the first large-scale computational analysis of how Wikipedia is embedded in the web, alongside a release of the entire data set. We studied what Wikipedia articles are shared and where on the web they're shared, and presented a simple breakdown of why they're shared. We found important differences between the role that Wikipedia plays within Wikipedia and the role that it plays on the web more broadly. We have a few ideas for future research, which include studying the hypertext, or the text around a Wikipedia article linked on the web, to find latent relationships between different entities, as well as a more multilingual examination of how different languages are depended upon by the web. And finally, this can offer a new lens for studying how different groups rely on Wikipedia. Thank you for your time and I look forward to your questions.