Hi everyone, my name is Linda Wing and I'm an economics PhD student at Cornell University. Today, I'm here to present a project called Social and Language Influence in Wikipedia Articles for Deletion Debates. This is a work in progress, so any and all feedback would be much appreciated.

Now before I jump into the slides, let me just give a quick motivation for why I pursued this project. As we all know, nowadays digital platforms are very important for group decision making, and Wikipedia is a great example of that. In these kinds of contexts, certain features are very salient to the individual user, or discussant. I want to bring your attention to structural features, like votes, versus linguistic features, like sentiment. You can imagine that both types of features, structural and linguistic, can sway an individual to agree with the majority in the discussion or debate. But it's important to note that both types of features can be imperfect signals of the discussion or debate's credibility. So I believe it's important to define more precisely the relationship between these types of features and individual user behavior, specifically users' tendency to vote with the majority.

Throughout the presentation, I'll broadly classify this as, quote unquote, herding-like behavior. I'm aware that there are many different definitions of herding, especially in academic contexts, so here I use it in the broadest sense possible. You can think of herding-like behavior as behavior that's consistent with what you might expect from a traditional herding example.

So without further ado, let me jump right in and explain how I'm answering these questions, beginning with why Articles for Deletion is a great place to study this on Wikipedia. Here I have a brief outline of the procedure for nominating an article for deletion and how the debate goes.
I just want to highlight that the key feature of these debates is that there are voting comments that consist of both a vote, for example keep or delete, and a text rationale for that vote. There's also a conclusion to the debate, determined by an administrator. This allows me to utilize structural features related to voting sequences, as well as linguistic features that come out of the text of the rationales. The specific subset of Articles for Deletion that I'm working with is the debates that occurred between January 2005 and December 2018 on the English-language Wikipedia. This amounts to just under 400,000 debates and over 3 million comments, which include both voting comments and non-voting comments, but there are enough voting comments that it's a very rich data set.

Now the first thing I want to do when approaching this problem is to verify that my, quote unquote, herding-like behavior can actually be found in my Articles for Deletion data set. The way I approach this is to consider the probability of the (k+1)-th vote agreeing with the prefix majority. Just to break that down: by prefix, I am referring to the k voting comments that precede the (k+1)-th vote. So for example, if k is two, then I'm talking about the probability of the third vote agreeing with the majority of the previous two votes, which I guess is not the greatest example, but it illustrates my notation. If herding-like activity exists in my data set, I would expect that probability to increase with prefix length. But I actually found, as you can see here, that it hovers around 0.5 and even dips as prefix length increases, which can indicate some sign of anti-herding behavior, as it means that longer debates can actually be more contentious, bringing the probability closer to that 0.5 mark. So that's interesting.
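As a rough illustration of the prefix-majority measure just described, here is a minimal Python sketch. It is not the code behind the slides: the function names, the list-of-votes input format, and the choice to skip prefixes with a tied majority are all assumptions made for the example.

```python
from collections import Counter


def prefix_majority(votes):
    """Return the strict majority label among `votes`, or None if empty or tied."""
    top = Counter(votes).most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tie between the two most common labels
    return top[0][0]


def agreement_rate_by_prefix(debates, max_k):
    """Estimate P((k+1)-th vote agrees with the majority of the first k votes).

    `debates` is a list of vote sequences, e.g. ["keep", "delete", "delete"].
    Assumption: prefixes with no strict majority are skipped.
    """
    agree = Counter()
    total = Counter()
    for votes in debates:
        # k is the prefix length; votes[k] is the (k+1)-th vote.
        for k in range(1, min(len(votes), max_k + 1)):
            majority = prefix_majority(votes[:k])
            if majority is None:
                continue
            total[k] += 1
            agree[k] += int(votes[k] == majority)
    return {k: agree[k] / total[k] for k in sorted(total)}


# Toy data, invented purely for illustration.
toy_debates = [
    ["delete", "delete", "delete"],
    ["keep", "delete", "keep"],
    ["keep", "keep", "delete"],
]
rates = agreement_rate_by_prefix(toy_debates, max_k=2)
```

Under herding-like behavior one would expect these rates to rise with k; the talk reports that in the actual data they stay near 0.5 and dip for longer prefixes.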
So I wanted to check whether there are other sorts of variation that I could utilize to get at a more detailed analysis of this behavior. Naturally, language is the first thing that came to mind, given that I'm also interested in linguistic features. I won't go into too much detail, but these two slides show visualizations demonstrating that the rationales of delete and keep voting comments are quite different from each other in terms of word choice and even politeness strategies overall.

Given that there's some mixed evidence on herding-like activity, and given that there's quite a bit of variation between keep and delete voting comments, one approach I thought might get at a less mixed answer was to run a logistic regression classifier, making sure to divide up features by keep or delete type and also to control for length. This slide briefly goes over the classifier, where x is the vector of features, which I'll talk about on the next slide, and p is the probability of the vote being delete.

So here is the set of features I was mentioning. You can see that there are both structural features, like the proportion of previous delete votes, and linguistic features, in part motivated by the visualizations on the previous slides, like the use of "per nom" or other internal referencing within the debate. Again, I'm splitting between keep voting comments and delete voting comments to try to get at herding-like behavior.

And here are the results. I don't want to bore you with too many numbers, so let me just highlight the main points. A particularly interesting large-scale pattern is that textual features tend to have more extreme coefficients in the second half of the debate, especially in keep votes. So, for example, the usage of external links... hang on a second. The usage of "per nom" is stronger in the second half than it is in the first half, and a lot of other textual features share that characteristic.
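For reference, the classifier mentioned on the slide can be written in the standard logistic regression form. Only x and p come from the talk; the intercept \beta_0 and coefficient vector \beta are notation assumed here.

```latex
p \;=\; \Pr(\text{vote} = \text{delete} \mid x)
  \;=\; \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{\top} x)\bigr)}
\qquad\Longleftrightarrow\qquad
\log\frac{p}{1 - p} \;=\; \beta_0 + \beta^{\top} x
```

In this form, a feature with a larger coefficient in absolute value shifts the log-odds of a delete vote more, which is the sense in which a coefficient is "more extreme."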
On the other hand, structural characteristics like the fraction of delete votes actually seem to be stronger in the first half. So there is some evidence that herding, or rather herding-like activity, takes place, but that it may vary, and that structural features and linguistic features have different influences at different positions in the debate.

Before I run out of time, I just want to mention two extensions of this work: one, to extend the model to gender gap problems, especially with regard to women's biographies nominated for deletion, and two, to get at a more causal interpretation of these results via a lab experiment. In any case, I look forward to your feedback. Thank you very much for listening to my presentation. I really appreciate it. Have a wonderful day.