Hello, everyone. This is Mostofa. I'm a graduate student at Boise State University. My advisor, Dr. Francesca Spezzano, and I worked on this research on the automatic detection of sockpuppets in Wikipedia.

Before diving deep, let's get a general understanding of sockpuppetry and its significance in Wikipedia. Wikipedia is a free online encyclopedia based on open collaboration; its principle is "for the people, by the people." Almost anyone can contribute and edit, and Wikipedia articles are generally written from reliable and neutral sources. In Wikipedia, as on social media platforms like Facebook and Instagram, the improper use of multiple accounts is not accepted. This improper use of multiple accounts is called sockpuppetry: the additional accounts are called sockpuppets, and the individual operating them is called the puppet master. These multiple accounts are generally used for block evasion, creating a false impression of majority support, vote stacking, and other purposes.

There have been various studies on Wikipedia sockpuppetry. One was by Solorio et al., who considered stylistic features extracted from users' comments on article talk pages. Another study performed an aggregated analysis, considering comments on both article and talk pages along with Wikipedia-specific features. Sockpuppetry has also been investigated on other platforms, such as Twitter, and in other domains, such as news and political articles.

So how is sockpuppetry currently detected and handled in Wikipedia? The use of multiple accounts for disruptive purposes is a violation of Wikipedia's sockpuppetry policy. Suspected cases are typically flagged based on signals such as similar usernames or similar editing behavior. There is an inquiry-based system: a user files a complaint about an account suspected of being a sockpuppet, the case is manually checked by privileged editors, and it is finally passed to administrators for a verdict. But this is a time-consuming, expensive, non-robust, and manual process. Because Wikipedia is open to all and almost anyone can edit, an automated detection system that catches sockpuppets as early as possible is essential.

For the data for this detection task, we turned to the MediaWiki API, which provides meta-information about the wiki and its logged-in users. Starting from the category of suspected Wikipedia sockpuppet accounts, we traversed all of its subcategories, identified the user accounts listed there and the different pages they contributed to, and collected all the edits they made. We collected around 20,000 accounts and labeled them all as positive examples; we kept up to 20 contributions per account to make the data consistent across users. This is what the data looks like. To contrast these positively identified sockpuppet accounts we also needed genuine users, so we selected genuine accounts from the paper by Kumar et al. and collected their data through the same process.

From all these data we extracted features such as username-based features (e.g., the number of digits and the number of leading digits in a username) and user-contribution-based features (e.g., the average title length of edited pages). The sketches below illustrate the collection and feature-extraction steps.
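As an illustration of the collection step (a minimal sketch, not our exact crawling script), one can pull subcategories and user contributions through the public MediaWiki API; the category name and username below are placeholders:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # MediaWiki API endpoint

def subcategories(category):
    """List subcategories of a category (e.g., a suspected-sockpuppet category)."""
    params = {
        "action": "query", "list": "categorymembers",
        "cmtitle": category, "cmtype": "subcat",
        "cmlimit": "max", "format": "json",
    }
    data = requests.get(API, params=params).json()
    return [m["title"] for m in data["query"]["categorymembers"]]

def user_contribs(username, limit=20):
    """Fetch up to `limit` edits made by a user."""
    params = {
        "action": "query", "list": "usercontribs",
        "ucuser": username, "uclimit": limit,
        "ucprop": "ids|title|timestamp|comment|size",
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    return data["query"]["usercontribs"]

# Example (placeholder names): walk subcategories, then fetch a user's edits
cats = subcategories("Category:Suspected Wikipedia sockpuppets")
edits = user_contribs("ExampleUser", limit=20)
```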
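And a sketch of the kind of username-based and contribution-based features just described; the exact helpers and field names here are illustrative, not the paper's full feature set:

```python
import numpy as np

def username_features(username):
    """Username-based features: digit count, leading-digit count, length."""
    num_digits = sum(ch.isdigit() for ch in username)
    num_leading = len(username) - len(username.lstrip("0123456789"))
    return {"num_digits": num_digits, "num_leading_digits": num_leading,
            "length": len(username)}

def contribution_features(edits):
    """Aggregate contribution-based features over a user's edits (assumed schema
    matching the MediaWiki usercontribs response)."""
    titles = [e["title"] for e in edits]
    sizes = [e.get("size", 0) for e in edits]
    return {
        "avg_title_length": float(np.mean([len(t) for t in titles])) if titles else 0.0,
        "avg_edit_size": float(np.mean(sizes)) if sizes else 0.0,
        "num_edits": len(edits),
    }
```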
Finally, we collected content-based features, which are the main argument of this paper. We wanted to capture the deep, inherent semantic meaning of the text, because our argument is that it helps us understand the internal writing pattern of users, which is essential for detecting sockpuppetry, that is, accounts operated by the same person. We considered BERT: we did not fine-tune it, but rather extracted contextual embeddings from BERT for all of a user's comments (edits), taking the embeddings from the last layer and using them directly as features in our model. We also generated topics with LDA and used the probability distribution of these topics over each user's contributions as features; here too we considered up to 20 edits.

From all the features discussed so far, we developed models with several classical algorithms, such as random forest and logistic regression. For the classical algorithms we used user-level data: we averaged all the features discussed earlier across a user's edits and used those averages as inputs. We also wanted to capture the temporal dependencies of users' edit patterns, so we considered the sequence of edits and developed an LSTM-based model; there we additionally used information such as page ID, parent ID, and namespace as edit-level data, after label-encoding it. We used the F1 score to evaluate the performance of our models.

We also compared our work with related work in this field, such as the work by Solorio et al., who considered authorship-attribution features and also included Wikipedia-specific features. Another study, by Yamak et al., considered features such as the number of user contributions per namespace. We additionally considered ORES, the Objective Revision Evaluation Service, a machine learning prediction system by Wikimedia that scores articles into four categories: OK, attack, spam, and vandalism. We compared all these previous approaches with our work.

Finally, the results: random forest performed best with all the features we have discussed. The LSTM did not perform as well as random forest, so the edit-level sequential data did not contribute much. If we compare our results with our competitors, we can see that random forest with our features gives the best performance against all competitors; for the LSTM, Solorio et al.'s features performed a bit better than ours. Early detection of Wikipedia sockpuppet accounts is possible: with just one edit, our features can detect sockpuppetry with an F1 score of 0.73, and performance increases as more edits are included. If we consider more edits for the LSTM, our model initially performed better, but after about 10 edits it was slightly surpassed by the LSTM with Solorio et al.'s features. Even so, random forest with our features is always on top and performs best, both for early detection and for detection with more and more edits. We additionally did an ablation study, and the LDA topics were the features that contributed most to our model's performance. The sketches below illustrate the embedding, topic, and modeling steps described above.
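As an illustration of the embedding step, here is a minimal sketch of extracting last-layer BERT embeddings without fine-tuning; the choice of bert-base-uncased and the mean pooling over tokens are assumptions of the sketch, not necessarily what we used:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def edit_embedding(text):
    """Mean-pooled last-layer BERT embedding for one edit/comment (no fine-tuning)."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.last_hidden_state: (1, seq_len, 768); average over tokens
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()
```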
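Similarly, a sketch of deriving per-edit topic-probability features with LDA; the gensim library, the whitespace tokenization, and the number of topics are assumptions of the sketch:

```python
from gensim import corpora
from gensim.models import LdaModel

def topic_features(edit_texts, num_topics=20):
    """Topic-probability vector for each edit, from an LDA model fit on the corpus."""
    tokenized = [text.lower().split() for text in edit_texts]
    dictionary = corpora.Dictionary(tokenized)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, random_state=0)

    features = []
    for bow in corpus:
        dist = [0.0] * num_topics
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            dist[topic_id] = prob
        features.append(dist)
    return features
```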
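A sketch of the user-level classical setup follows: averaging the per-edit features for each user, training a random forest, and evaluating with F1. The hyperparameters and the train/test split are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def train_user_level(per_user_edit_features, labels):
    """per_user_edit_features: list of arrays (num_edits, num_features) per user;
    labels: 1 = sockpuppet, 0 = genuine."""
    # Average each user's edit features to get one user-level feature vector
    X = np.array([feats.mean(axis=0) for feats in per_user_edit_features])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))
```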
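And for the edit-level temporal model, a minimal PyTorch sketch of an LSTM over a user's edit sequence; the hidden size and the use of the final hidden state are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class EditSequenceLSTM(nn.Module):
    """LSTM over a user's edit sequence; input_dim is the per-edit feature size."""
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x):            # x: (batch, num_edits, input_dim)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden_dim)
        return torch.sigmoid(self.out(h[-1]))  # per-user sockpuppet probability
```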
Even with individual feature groups, and certainly when considering all the features together, our model performed better than all the competitors, irrespective of whether one edit or up to 20 edits were used. So, to conclude: we addressed the issue of automatic sockpuppet detection in Wikipedia and introduced novel features such as topics and embeddings. Random forest outperformed all previous approaches, and early detection is possible with just one edit, at an F1 score of 0.73. Topics are really important in our model, and we have released all the data for the research community to investigate further. Thank you all for your patience, and I welcome any questions you might have.