All right, all right everybody, this is the January edition of the research showcase. I'm Dario, I run the research team, and today I'm excited to have two presentations. We have Aaron Halfaker presenting on productivity measurements on Wikipedia, and we have a guest presentation by Jérôme Hergueux from ETH Zurich. Each presentation will run about 30 minutes, with probably five minutes of questions at the end of each, and we'll have plenty of time for Q&A at the end of the session. So stick around, and you can also join the conversation on IRC. And with that, I'll give Aaron the first slot. Aaron, the floor is yours.

Great, thanks Dario. Let me just get my slides going. Okay, so today I'm going to be talking to you about some productivity measures that I took at Wikipedia, what this might imply for how we look at anonymous editors, and a few other fun things that I found along the way as well. So, I always like to pull out this slide whenever I'm introducing myself at the beginning of a talk, and I direct you to the quote that I have underneath my title at the Wikimedia Foundation: think big, measure what you can, and build better technologies. This presentation is definitely going to be about measurements. It's a set of measurements that I took that have some implications. They didn't really start with thinking big; they really just started with trying to figure out what was going on. Okay, so I have three takeaways for you and I'm going to start with them. I'll also come back to them at the end. One is that we can use this measure that I've been working on, content persistence, as a robust measurement of article writing productivity in Wikipedia. Two, that English Wikipedia seems to be getting more efficient over time. And finally, that anonymous editor contributions are really important: they actually represent about 15 to 20% of the overall productivity in Wikipedia. So maybe we should consider anonymous editors as we're building new features.

Okay, so moving on, what the heck is this content persistence thing? Really, when I talk about content persistence, I'm talking about how content survives through revisions of an article in Wikipedia. So what I have on the screen right now is a sample set of five revisions of an article about apples. We can see that the word "apples", which was added in the first revision, persists for all five revisions. The word "red" has a little bit of a complicated history, but it keeps coming back whenever it gets removed. However, the word "blue" that's added in the second revision of the article doesn't stick; it immediately gets removed. And so the idea with content persistence is that we should be able to get a sense for the quality of a contribution by looking at how other Wikipedians respond to it, whether it actually persists in the article or not. There are a few tricks that we have to work around. For example, when somebody performs a revert, we have to handle that change back to an old state when we're tracking words. We wouldn't want somebody to, for example, blank a page, have somebody else revert that page back to its old state, and then attribute the entire article to the person who performed the revert. It turns out that we also have some nuance around difference algorithms for doing this sort of thing. This is the strategy that we actually use to track what content persists between revisions.
So, imagine we have these lists of four words, and we're trying to figure out what sort of change happened to this list to get from the left side to the right side. This is pretty easy to eyeball with human intuition, especially because I've color coded it. We can see that the A at the beginning matches the A, the B matches the B, the C matches the C, and the D matches the D. Great. However, the most common way of generating a difference between two chunks of text, using the longest common substring strategy, which is similar to the UNIX diff utility if you've used that, fails to make one of the important connections here. If you move a chunk of content, just like I have with the B symbol here, the longest common substring strategy can't match those two Bs together. It can't figure out that that content was moved, so it will represent it as being removed and re-added. We can actually see this in diffs on Wikipedia. The algorithm that generates the diffs you see as you're browsing around the wiki will commonly show you diffs that look like this. We can see this paragraph here that starts with "suicide bombing was also used against the Japanese", and up here we have "suicide bombing was also used against the Japanese". This is the same paragraph, yet the diff says: remove this paragraph, and add this entire paragraph over again. It would be very problematic if refactorings like this screwed up the way that we tracked content persistence and made it so that we associated content with somebody who didn't really add it to the article, just somebody who maybe moved it to some other location in the article. So we have this problem of attributing authorship of content. Luckily, there's been some really nice research in this area that's been pushing on good ways to do this in an automated fashion. There's been work by de Alfaro and Shavlovsky, and by Flöck and Acosta, and both of them are looking at how you do this authorship tracking in the Wikipedia context; they're actually testing their algorithms on English Wikipedia. As far as I can tell, this is the state of the art. You take two versions of an article in Wikipedia and you split each into paragraphs and sentences. There are some simple ways that we can do this with double line breaks and periods, and it mostly works okay. You then take each sentence and generate a hash for it so that you can identify identical sentences and paragraphs that cross the revision boundary. Then you take whatever is left over, whatever wasn't a perfect match, and just run the longest common substring strategy on that. And it turns out that almost all of the time, this makes the diff algorithm match human intuition for what content was moved and what content was actually changed in the article. So I've been using this strategy, the segment matching strategy, to track the history of content in Wikipedia.
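For readers who want to see the segment matching idea written out, here is a minimal Python sketch. It is not the production implementation from the talk; the splitting rules, the hashing, and the fallback matcher are simplified stand-ins, with Python's difflib standing in for the longest common substring step described above.

```python
import difflib
import hashlib

def segments(text):
    """Split text into rough paragraph and sentence segments."""
    parts = []
    for paragraph in text.split("\n\n"):
        parts.extend(s.strip() for s in paragraph.split(". ") if s.strip())
    return parts

def segment_match_diff(old_text, new_text):
    """Return (persisted, added, removed) segments between two revisions.

    Step 1: hash every segment and pair up identical segments across the
    two revisions, which catches moved paragraphs and sentences that a
    plain diff would report as removed and re-added.
    Step 2: run an ordinary sequence matcher on whatever is left over.
    """
    old_segs, new_segs = segments(old_text), segments(new_text)
    old_hashes = {hashlib.sha1(s.encode()).hexdigest() for s in old_segs}
    persisted, leftover_new = [], []
    for seg in new_segs:
        if hashlib.sha1(seg.encode()).hexdigest() in old_hashes:
            persisted.append(seg)        # identical segment survived, possibly moved
        else:
            leftover_new.append(seg)
    leftover_old = [s for s in old_segs if s not in persisted]
    matcher = difflib.SequenceMatcher(a=leftover_old, b=leftover_new)
    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(leftover_old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(leftover_new[j1:j2])
        if tag == "equal":
            persisted.extend(leftover_new[j1:j2])
    return persisted, added, removed
```

The point of hashing whole segments first is exactly the refactoring case above: a paragraph that was merely moved hashes to the same value in both revisions, so it gets matched before the ordinary diff ever sees it.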
There's one more thing that I want to talk about. There are a few ways that we could measure how long something persists in Wikipedia, but I'd really like to set a threshold: a threshold by which any content that persists longer than this amount of time, or this number of revisions, is considered good enough, an actual quality contribution. But how much persisting is enough? For this, we did some sensitivity analyses. This plot I don't want to spend too much time on; I'll make it intuitive in just a moment. It's showing the hazard that a word added to an article will be removed, based on how many revisions it has persisted. We can see on the far left-hand side of this graph that as soon as you add a word to an article, if it has not yet persisted through any revisions, it's got about a 15% chance of being removed in the next edit. If it doesn't get removed in that next edit, then it has a less than 5% chance of being removed in the edit after that. So we can see the decay over time: the longer your contributions to an article persist, the longer they're likely to persist. It turns out that there was also some qualitative analysis that Susan Bianconi did looking at these sorts of measures in Wikipedia. Specifically, she took a random sample of edits in Wikipedia, used this content persistence measure, and then drew correlations between how real humans thought about the quality of edits in Wikipedia and what the content persistence measure suggested about those edits. And generally, the summary is that we can set a pretty good cutoff at five revisions: if your contribution lasts five revisions, then we can probably call it persisted. This is great for articles that get edited a lot, because you can get those next five revisions in maybe a couple of days or a week. But for articles that are seldom edited, maybe even articles that aren't edited more than once a year, this could be a problem. So we also wanted to use time, the actual time that a contribution persisted in an article, as a cutoff here. In this plot on the right, I'm essentially showing you the same sort of hazard that I was talking about with the plot on the left, but rather than looking at subsequent edits, we're looking at hours after the word was added. So again, if your contribution has not lasted an hour yet, it has a 15% chance of being removed in the next hour. But if it lasts at least an hour before being removed, then you have less than a 5% chance of having it removed in the hour after that. And we can see that this hazard of removal pretty much decays to zero around 10 hours. But there are some interesting artifacts in this graph that I want to point out, and that's that we have two steps here. They're kind of hard to see, and I'm not sure how large you're seeing this on screen; I'll have my slide deck up on the wiki afterwards so that you can dig into the slide and zoom in on it. But there are very obviously some steps here where the hazard goes down once we cross these daily thresholds of 24 and 48 hours. What I think is happening there is that a lot of the time, when somebody makes an edit that's not really good for an article, people will catch that because they see that edit appear on their watchlist. And we know from some of my past work looking at editor sessions that editors tend to have 24-hour periods between the times that they come to Wikipedia and do work. So it only makes sense that we see people doing this sort of watchlist-driven removal of damaging edits on the 24 and 48-hour timescales. Using these graphs, I ended up choosing a cutoff of 48 hours, so that we gave it two full 24-hour periods of people looking at their watchlists to potentially remove a word. So, to translate this into English, I'm going to consider a word added to an article persisting if it survives at least five revisions by other people and 48 hours before being removed forever.
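As a concrete reading of that threshold, here is a minimal sketch. The token and revision objects and their field names are hypothetical, and the two criteria are combined with an "or", that is, a word counts as persisting if it survives five revisions by other editors or survives the 48-hour window, which is how the time cutoff is motivated for rarely edited articles and how the measure is restated in the Q&A later.

```python
from datetime import timedelta

PERSIST_REVISIONS = 5                  # revisions by *other* editors it must survive
PERSIST_WINDOW = timedelta(hours=48)   # or survive at least this long

def persisted(token, revisions):
    """Decide whether an added token counts as persisting.

    `token` is assumed to carry .added_at, .removed_at (None if never
    removed) and .added_by; `revisions` is the article's revision history
    with .timestamp and .user. These are illustrative field names only.
    """
    if token.removed_at is None:
        return True                                    # never removed at all
    if token.removed_at - token.added_at >= PERSIST_WINDOW:
        return True                                    # hit the 48-hour threshold
    # Otherwise: did it survive five subsequent revisions by other people?
    survived = sum(
        1 for rev in revisions
        if token.added_at < rev.timestamp < token.removed_at
        and rev.user != token.added_by
    )
    return survived >= PERSIST_REVISIONS
```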
So this is not a complete measurement of productivity and quality. It misses a lot of important work that Wikipedians do, such as negotiating content on talk pages, developing templates, uploading images, performing the counter-vandalism that lets this metric work in the first place, and of course doing research and tool development, which is where I do most of my work. However, it's good in that it recognizes adding good new content to articles, assuming that Wikipedians are selecting for good new content, and it seems like that is definitely true. So now that we have this threshold, and I've talked to you about the sensitivity analysis we used to choose it, I want to talk about what measurements this actually lets us do. I'll be focusing on English Wikipedia; I haven't extended this beyond English Wikipedia yet, but that's on my plate, and we'll get into that a little bit later. So when I look at the overall productivity of Wikipedia, just as a raw count of persisting words added to articles over time, we get this graph. Each one of these bars represents a month of Wikipedia activity. And there's something that caught me right away when I looked at this graph that I thought was completely unusual. If you've ever heard me talk about Wikipedia before, you've probably heard me talk about Wikipedia's decline. In the graph on the left, I'm showing the active editors in Wikipedia. In the graph on the right, we're looking at this overall productivity measure. The graph on the left suggests that Wikipedia has been declining since 2007, but the graph on the right suggests that Wikipedia really hasn't entered a substantial decline when it comes to this productivity measurement. That was really surprising, because I've looked at all sorts of measures of productivity in Wikipedia before, but I've never seen this sort of non-declining pattern. Now, I actually don't like the measure of active editors for getting a sense of these sorts of things. I like a measure called labor hours that I developed, or at least published about, in 2013, where we can take editors' sessions editing Wikipedia and make a pretty good estimate of how many hours they spend doing their editing work. This, I would like to think, is a measure of the input that goes into Wikipedia. We get about 12 million labor hours a year, and it's up to us to figure out effective things to do with that. This plot, which I'm pulling from the 2013 paper, shows this rise and decline pattern. We can see that in 2007 there was a sudden spike and then a decline in the number of hours that people were putting into Wikipedia. Just to note, in case you can't read the labels on this graph, the spike is around 600,000 labor hours per month, and by the end of the graph, which is a little bit after 2012, around March 2012, we're down to about 400,000 labor hours. So, just using my eyeball and extending this graph to where it would likely be, assuming the trend matches what we have for active editors, new article creations, and content added to the wiki, we should expect a decay that looks about like this up to this day. And I want to highlight, to make it absolutely clear, that I'm eyeballing this. I think it's a good eyeballing, but it is not a robust method.
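As an aside on the labor hours measure mentioned above, here is a minimal sketch of how a session-based estimate can be computed from one editor's edit timestamps. The one-hour inactivity cutoff and the lack of any correction for time spent before a session's first edit are simplifications for illustration only; the 2013 paper derives its session threshold empirically and handles these details differently.

```python
from datetime import timedelta

SESSION_CUTOFF = timedelta(hours=1)    # illustrative inactivity gap that closes a session

def labor_hours(edit_timestamps):
    """Rough labor-hour estimate from one editor's edit timestamps.

    Consecutive edits closer together than SESSION_CUTOFF are treated as
    one editing session; each session's duration is the span from its
    first edit to its last edit.
    """
    if not edit_timestamps:
        return 0.0
    stamps = sorted(edit_timestamps)
    total = timedelta(0)
    session_start = previous = stamps[0]
    for ts in stamps[1:]:
        if ts - previous > SESSION_CUTOFF:     # gap too long: close the session
            total += previous - session_start
            session_start = ts
        previous = ts
    total += previous - session_start          # close the final session
    return total.total_seconds() / 3600.0
```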
So now, putting these next to each other, if we look at labor hours going into the wiki and persisting words added to articles coming out of the wiki, as input and output, it lets us ask questions about the productive efficiency of Wikipedia: how much output are we getting per unit of input? And we can see, just eyeballing this, that the slopes of these patterns are drastically different. So what I did was take values actually measured from the ends of this graph and draw some estimates. In 2006 we got about 258 persisting words per labor hour invested in Wikipedia. In 2015 we got 483 persisting words per labor hour invested in Wikipedia. I forgot to actually put the number up on the chart, but this is an 86% increase in efficiency. That's really surprising, so I wanted to find out where the heck this efficiency is coming from, and I started with a hypothesis: maybe Wikipedians have better tools to help them edit articles faster, so fewer people can spend less time but still end up doing the same amount of work. So I wanted to look for bots and automated tools that might be contributing to this pattern. In this graph that I'm showing you, and I can show you with this little arrow, I'm just breaking out a few different editor types from the larger set. At the very top of this plot we have registered editors who are not using a tool to edit Wikipedia. They dominate the productivity measurement; they're adding by far the majority of the productivity that comes to Wikipedia. But, again just eyeballing this curve, we don't really see a decline here, and this will be more interesting once I talk to you about the other things that we broke out of this set. So anyway, yes, no real decline here. Closer to the bottom we can see the IP editors. The anonymous editors add somewhere on the order of five million persisting words to Wikipedia per month. But we can see that it looks like there's something declining here, that was much higher in 2007 than it is in 2015. And then finally, on the bottom, we have persisting words added by bots and by tool-assisted registered editors. There are some trends here, but I actually want to zoom in on this part of the graph so that we can talk about them more closely. This graph is showing the overall proportion of productive contributions that these editors have been making: the bot, IP, and tool-assisted editors. Here, for the IPs, we can see that the overall proportion of productive new content that they've been adding to Wikipedia has been in decline. In 2006 it was about 20 percent, and in 2015 it's about 15 percent. So this is sort of interesting: it tells us both that anons are important and that we're not getting nearly as much productivity from anonymous editors as we used to. There might be many interesting explanations for that, which I'd be happy to talk about afterwards. Looking at bots, we don't see a steady pattern. Bots are really bursty in their activity. There are really two spikes of bots adding productive new content to Wikipedia, around 2008 and 2011, and I'm not quite sure what those are. The actual underlying data shows some really tall spikes, so we might be able to dig into the data and have a look at those months to see what's going on, but I don't have anything to tell you right now.
But one other thing that you will notice is that after 2012, the productive new content added to articles by bots entered a steep decline, and it doesn't look like it's rising back up. And so I just wanted to ask the question: maybe this is Wikidata? Wikidata gained a lot of steam in 2012, and it's very highly dominated by bot activity. Maybe a lot of the people who were writing bots to fill in data on Wikipedia are instead using those bots to fill in data on Wikidata. That would be an interesting thing to look into. Okay, finally I want to talk about tool-assisted edits. This is an example where the rate of productive new content added to articles from tool-assisted interfaces has been on a steady rise. I broke out the five most common tools used to add productive new content to articles so that we could talk a little bit about each one. By far the most productive tool used in Wikipedia is AutoWikiBrowser; that's what this AWB stands for. AutoWikiBrowser is a standalone interface, you don't actually run this in your browser, and I think that it's Windows only but I'd have to go check, that lets you automate doing a certain type of edit to a lot of articles really fast. On the left-hand side we're selecting a category, so we're going to apply an operation to a bunch of pages. We select the operations that we're going to perform, AutoWikiBrowser will show us what the change actually looks like, and then we can hit start to begin processing a bunch of articles. So this is sort of like a bot, but it's an actual user interface: it has a human in the loop, but it does very bot-like things. There are two more that are used relatively commonly. RefLinks is an old utility that would allow you to take a reference that's just a bare URL and format it using a citation template like Wikipedia uses. RefLinks was recently removed from Tool Labs for various reasons, and so this new tool called Refill, which essentially serves the same purpose, was brought in to replace it. So we can see the rise and decline of RefLinks and the sudden, bursty rise of Refill. And I just want to show you a little bit of this Refill interface. It's actually quite beautiful, and I love this description they have at the top, where it shows changing a ref tag that only has example.com in it into a ref tag that has the proper citation template for example.com, and it actually pulls the title from the URL and all those wonderful things. Wonderful tool. It turns out that it's adding a lot of high quality content to Wikipedia. And finally, at the bottom, there are two tools that didn't really make much of a blip on this plot, but they're pretty big compared to a lot of the other tools that are used on Wikipedia, so I thought they warranted inclusion here. AutoEd is a user script, a piece of JavaScript that runs on top of Wikipedia and allows people to do AutoWikiBrowser-like activities, where with one click of a button you can perform many operations. And Ohconfucius is actually a user who develops several different scripts that perform these types of automated operations. Both AutoEd and Ohconfucius tend to get flagged in edits that are doing this sort of batch editing activity. Okay, so in summary, I want to go over these graphs again and remind you what I talked to you about. One is the surprising thing about efficiency when we look at labor hours compared to productive contributions to articles.
Efficiency is up more than 80% from where it was in 2006, and that's really surprising. We can also see that most of the productive new content in Wikipedia is added by registered editors editing manually. However, anonymous editors are still contributing 15 to 20% of the productive new content in articles. We can also see that tool use is on the rise. Automated tools are helping editors add productive new content to Wikipedia, and they're mostly force multipliers, where one button click will make a lot of edits, or they're doing things like reference cleanup, which seems to be sort of a killer use case for these tools. Okay, and now for fun, because I didn't really have any fun research questions in this space, but I thought you might like to see what this productivity measure looks like for individuals. And of course I'm going to pick on myself first and look at my own productivity over time; EpochFail is my volunteer account. In this graph, the bars on the bottom represent the monthly productive contributions that I made to articles in Wikipedia, and the area going up over the top is the cumulative sum of the total productivity that I've had in Wikipedia, which is why you can see these giant steps at the points where I actually did something that month. There are a couple of things that I want to point out. My most productive contribution was actually a conflict of interest, which I declared on the talk page: I wrote the article about my old research lab, GroupLens Research, at the University of Minnesota. And in fact I encourage you to go to that talk page, because I think I did a pretty good job of managing my conflict of interest, and I'm pretty proud of it. But one thing that you might ask, looking at this graph, is what the heck has EpochFail been up to since 2010? And so here I am, living the experience of having this measure not account for a lot of the things that I do on the wiki. It turns out that if you go to my user page, I'll tell you about the tools that I've been developing for Wikipedia editors since 2010. Like ORES, which is an online machine learning service; Wiki Labels, which allows people to label things in Wikipedia so that we can train these machine learning models; Snuggle, which is a newcomer support and help system; Mr. Clean, which helps you add cleanup templates; WikiNome, which allows you to edit a sentence at a time while you're reading an article; and the list goes on. And they're real contributions, at least I think so. I think they are really contributing productively to Wikipedia, but they don't show up as article contributions, and so my graph looks very sad after 2010. Okay, so picking on a few other people: since we have Guillaume in the room, I asked him if I could pull up his graph here, and we can see that Guillaume is much more consistent. He doesn't have huge spikes of activity one month here or there and then long periods of inactivity. He's really been contributing to articles on a relatively regular basis, but not very much at a time. In fact, when I was talking to Guillaume about this, he warned me: you should probably look at French Wikipedia, because my graph is probably going to look a lot different there. And another one that I thought might be fun is Jimbo Wales.
So we can see that Jimbo Wales actually looks a lot more like my activity, in that he's very periodic and will often disappear, or not contribute very much, for long periods of time. But it's really important to understand that Jimmy Wales is way beyond the scale where Guillaume and I are. If I were to plot the cumulative sum for myself and Guillaume on this graph, we would be right at the bottom; we would pale in comparison to how much contribution Jimmy Wales has. But even Jimmy Wales pales in comparison to an editor like DGG. DGG is somebody I've been talking to quite recently, because he works in a lot of newcomer help spaces in Wikipedia, specifically the Articles for Creation newcomer help space, where he and a bunch of other Wikipedians help newcomers write drafts that will stick in Wikipedia. DGG looks like he's not making very big contributions, because all of his bars are at the bottom of the graph, but it turns out that every single month of activity in DGG's history is bigger than my biggest month of activity. And if I put myself and Guillaume and Jimmy Wales on this graph again, we're towards the bottom; we just pale in comparison to how much productive new content somebody like DGG has added to Wikipedia. Okay, so that's all I have for you. I'm just going to go back over my takeaways, the things that I would really like you to walk away from this talk with. First, this content persistence measure is a robust way of measuring article writing activity. We look at the survival of content, and we use that as an implicit measure of quality. Tracking authorship is hard, so there's some fun research in this space that I've been taking advantage of, but there's definitely more work that we can do. And this is a useful measure, but it's incomplete, and I can speak from experience that my graph does not capture my full contribution to Wikipedia. We've also seen that English Wikipedia seems to be getting more efficient, and it's sort of hard to see why this is the case. We see that the labor hours that people have been putting into Wikipedia have been decreasing quite substantially over time, but the output of Wikipedia editors, even when we just look at registered editors and remove bots and tools, has been holding relatively constant, so that we see about an 86% efficiency increase since 2006. This is definitely worth looking into. And finally, anonymous editor contributions are important. They add about 15 to 20 percent of the overall productivity in Wikipedia, so the next time we release a new interface or a new way of consuming Wikipedia content, we should make sure that anonymous editors can contribute fully while logged out, and we should probably not be pushing anonymous editors so strongly to register their accounts. As for next steps, of course I want to do more wikis. English Wikipedia turns out to be the most difficult to process, so smaller wikis should be easier; there are no wikis bigger than English Wikipedia. I'm specifically going to target emerging communities, which is something the Community Resources team is developing a list of, so that we can start seeing what's going on in those wikis. I'll also be mixing these productivity measures with measures of importance, such as page views or measures that we can get from the link graph, so that we can get a sense for value added, with the idea that if you're productive in important places, that's more valuable.
And then finally, getting an interface online so that people can look at their productivity measures. I think this will both be interesting for maybe improving how productively people contribute to Wikipedia, and also give them a nice channel for critiquing the measures, so that we can develop new and better measures for getting at this productivity stuff. Finally, I've already got some data sets that we'll be releasing open access, and we'll be getting those uploaded into the Quarry querying system too, so that you can play around with them. That's all I have for you, thank you.

All right, so I think we have a couple of minutes for questions. Have we heard anything from IRC? Okay, fantastic. So, do you want to read them? Yeah, sure. So the first question, Aaron, is: what was the said persisting token thingy? Is that survival of text written by someone? Sorry, I didn't hear the beginning of the question. What was the said persisting token thingy? Yeah, so there's some jargon that I removed from the slides, but I didn't get all of it. The measure of persisting that I used was based not just on surviving through subsequent revisions; they have to be revisions that were made by somebody who isn't yourself. So non-self-persisting means that your content stuck in the article through other people's edits, or hit our time threshold. I should have removed that and just called it persisting words.

I'll take the next one; it's actually similar to a question that I had. So, two questions from IRC. First, a suggestion that we should reach out to tool developers, and I feel the same: having data about the productivity of the edits made with the tools people have written would be fascinating, and we should talk about this. Just a quick note on that. It's really hard to generate these statistics in real time, but my goal in this project is to make it as simple as hitting an API, where you could say: score this edit for me and tell me how long the words persisted, or score this user for me and tell me how productive they've been over time. So it's hard, but that's exactly what I'm working on. If there's somebody out there who wants to help develop these technologies, please reach out to me. Yeah, and HR is virtually hugging you for doing this, just so you know. All right, great.

And the question that I had is related to your breakdown by functional type of agent, if you wish. What I'm wondering is, what is the effect of cohorts? We know that, especially over the last couple of years, the vast majority of edits and work, especially on the most mature wikis, is performed by the older cohorts. So I'm wondering to what extent the underlying factor is people who are more expert doing the lion's share of these edits. Yeah, I think that's a good question, and I think it's a solid hypothesis too. One of the things that might be happening is that the experienced editors as a group are getting more confident and collaborating with each other effectively, making edits that other people won't get upset about, or bringing conversations to consensus quickly. And if this were really happening, then it would probably be the people who have been around Wikipedia the longest. I haven't done a cohort analysis to look at that sort of thing, and this is one of the big reasons why I want to get this data set released publicly.
I have a lot of other work to do, especially around ORES and the revscoring project, and I would really like it if others who are interested in asking these questions could answer them relatively easily. By releasing the data set, we can do more of these drill-downs into the various angles that people have thought of that I haven't. So yeah, I guess, to not answer your question: I don't know yet. I think that's a really interesting question, and it would be really great if somebody else could pick that up. I'd like to help.

That's great. Let's thank Aaron one last time. If you have any other questions, we'll continue at the end of the hour. Jérôme, you can start your presentation. Again, about 25 minutes, and we'll continue with questions at the end.

Okay, well, wow. Aaron, that was fascinating. It's going to be hard to go after you, but, well, I'm getting used to it. You did the same thing to me at Wikimania 2014. Let's try. Okay, so thank you for having me, I'm really excited. Thank you, Dario and Leila, for putting this all together. So let's try and share those slides. Okay, can you see it well? Yes. Okay, so this is a project which I started a while ago when I was at the Berkman Center, and it deals, in a very general way, with the pro-social foundations of cooperation within Wikipedia. But what I'd like to do today, in the interest of time, is to really focus this presentation on two very specific sets of results, and rely on those to ask a couple of questions which I think are important and should attract the interest of the editor or the researcher that lies in you. But before I do that, let me maybe start with a very quick thought experiment, and I'm going to use this animation to help me make my points. Imagine that you're participating in an experiment. You're the person who has the green dollar bills here, and you're participating in that experiment with three other people in the group. Each member of the group has a private endowment of $10. Then each one of you has a private decision to make: how much out of those $10 do you want to keep for yourself, in which case you just earn that money, and how many dollars out of those $10 do you want to invest in a common project? Now, each dollar that you invest in this common project is going to yield a private return of $0.40 to you personally, so it's really not efficient for you to invest in that common project. However, as it turns out, each dollar that you invest in this common project is also going to yield $0.40 to each of the three other members of the group, so that by investing $1, you actually create $1.60 for the group as a whole. People have been analyzing these kinds of situations as public goods dilemmas. If you are perfectly selfish and rational, well, you should never contribute in such a situation. But if everybody is like that, then nobody contributes and we end up in a very inefficient social situation. So what I'm going to claim here is that this game-theoretical scenario is actually a metaphor for the decision to contribute to Wikipedia, and I'd really like you to think about each and every decision to contribute to Wikipedia as a public goods dilemma in this respect. Indeed, there are no extrinsic incentives to push you to contribute in those kinds of situations. You're not getting paid to contribute.
And you can't even hope to get a better job and signal your qualities on the labor market by contributing. So really, this kind of public goods dilemma is at the core of the decision to contribute to Wikipedia. This is the question that many people ask themselves when they notice some change, some enhancement, that they could make to a Wikipedia article: should I incur the private cost of contributing knowledge in order to reach a socially efficient outcome, that is, let other people around me benefit from this knowledge, which I already hold but need to put into a good format so that other people can actually access it? And in this respect I kind of like that quote by Kizor, a Wikipedian, who tells us: the problem with Wikipedia is that it only works in practice; in theory, it can never work. And I hope that this game makes really precise how, in theory, it should not work. Now, I like that quote even though, in practice, it's actually wrong. We do have theory that tells us why people would be willing to contribute in these kinds of public goods situations, and we basically have three kinds of models that assume different kinds of what economists call pro-social preferences, that is, different ways in which you're willing to put the welfare of the people around you into your own welfare function. We have three classes of models, three main preferences that can push you towards contributing when you ask yourself that very question: should I incur this personal cost, which is socially efficient? The first of those motivations is based on altruism: basically, what makes you happy is to provide value added to the people around you. The second one is based on reciprocity, whereby you will be willing to contribute if you see that other people around you are also contributing; you derive welfare from the interactions that you have with people while contributing to the public good. Finally, the last class of models that has been put forward by the literature in the social sciences is based on social image, whereby you would be willing to incur this personal cost if, by contributing, you're able to signal some quality about yourself to a community of people that you care about. That's really what social image is about. So what I'm going to do today is basically elicit the social types of a sample of Wikipedia contributors that's representative of their diversity, with an online experiment which I will couple with observational data, and then I will use the revealed social types of those contributors in this experiment to predict antisocial, or non-cooperative, behavior within Wikipedia. Now, this is kind of the game plan for today, but some people might ask themselves this question: why not simply use survey questions, why not basically ask people? This study purposefully does not rely on survey methods. Why? Basically because we know that people are prone to self-reporting biases: if I ask you what your social type is, you're going to tell me whatever works best with your own view of yourself, or what you think I want to hear, or whatever. Also, people might not have a clear idea of their own social type. And finally, going the experimental way, in a very contextualized fashion, in a sense allows me to get at very deep underlying preferences, which are most likely to carry over from one context to the next. And so this is why I want to use an experiment rather than survey methods for this project. Okay, so I think that we're basically good.
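For reference, here is the payoff structure of the public goods game just described, written out as a minimal sketch with the numbers from the talk: a $10 endowment and a $0.40 per-dollar return to every member of the four-person group.

```python
ENDOWMENT = 10
MPCR = 0.4    # marginal per-capita return of the common project

def payoff(own_contribution, others_contributions):
    """Dollar earnings for one player, given everyone's contributions."""
    total = own_contribution + sum(others_contributions)
    return (ENDOWMENT - own_contribution) + MPCR * total

# Privately, contributing is a losing move: each dollar you put in comes
# back to you as only $0.40...
print(payoff(0, [0, 0, 0]))      # 10.0  (keep everything)
print(payoff(10, [0, 0, 0]))     # 4.0   (contribute everything while others free ride)
# ...but it creates $1.60 of value for the group, so everyone contributing
# beats everyone free riding:
print(payoff(10, [10, 10, 10]))  # 16.0 for each player
```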
So what Wikipedians did in this project is that they actually played this public goods game which I just explained to you. But what I'm going to allow you to do in this game is to condition your contribution on the average contribution of the three other members of your group. As you can see here in this decision screen, I'm going to ask you: if the other members of your group on average contribute zero, how much do you want to contribute? If they contribute one, two, three to the common project, how much do you want to contribute? And I'm going to use those conditional contribution decisions to infer your social type. I'm going to distinguish between four types. The first type, if you look at this graph, is going to be the free riders. Free riders basically play the rational strategy: they never contribute, and they maximize their earnings in any event by keeping their 10 bucks and, if other people contribute, free riding on the benefits of those contributions. Basically, you read Wikipedia but you will never contribute. This is the baseline group against which I'm going to compare the behavior of the other social groups, so to speak. So, first group: free riders. Second group, we're going to have the reciprocators. Reciprocators are those who decide to exactly match what the other people in their group do; they contribute whatever the people around them contribute. We're also going to have weak reciprocators; as it turns out, those are a pretty large fraction of the population. They react to an increase in the other people's contributions, but less than proportionally. Finally, we're going to have the altruists which I talked about, and this is kind of the mirror image of the free riders, in the sense that they will always contribute a very high fraction of their endowment, irrespective of the average contribution of the other members. So those are the different social types that I'm going to elicit through Wikipedians' behavior in this experiment. And it's going to be important for you to remember that when Wikipedians actually played this experiment, they played with other random internet users, not with Wikipedia contributors.
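Here is a rough sketch of how a conditional contribution schedule could be mapped to those four types. The cutoffs and the least-squares slope rule are invented for illustration; the study's actual classification of subjects is more careful than this.

```python
def classify(schedule):
    """Crude classification of a conditional contribution schedule.

    schedule[k] is the subject's stated contribution when the other three
    group members contribute k on average, for k = 0..10.
    """
    assert len(schedule) == 11
    mean = sum(schedule) / 11
    # Least-squares slope of the schedule against the others' average (0..10).
    slope = sum((k - 5) * (c - mean) for k, c in enumerate(schedule)) / \
            sum((k - 5) ** 2 for k in range(11))
    if mean <= 1:
        return "free rider"        # keeps the endowment no matter what
    if slope <= 0.1 and mean >= 7:
        return "altruist"          # contributes a lot regardless of the others
    if slope >= 0.8:
        return "reciprocator"      # roughly matches the others' average
    return "weak reciprocator"     # responds to the others, but less than one for one

print(classify([0] * 11))                     # free rider
print(classify(list(range(11))))              # reciprocator
print(classify([k // 2 for k in range(11)]))  # weak reciprocator
print(classify([9] * 11))                     # altruist
```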
Now you're going to ask me: we have the reciprocity preference there, we have altruism, but what about social image? Well, social image is very difficult to measure experimentally; I mean, people have tried, it's very hard. So what I'm going to do is rely on the wealth of observational data that we can extract from Wikipedia in order to construct indicators of revealed preference for social image within Wikipedia. I'm going to do that in two very simple ways, in order to check for the consistency of the results that I get. First, I'm going to make use of Wikipedia user page data. For all the participants in this study, and there are about 850 of them, I'm going to measure the size of their user page and consider them as social signalers, quote unquote, if they have a user page whose size in bytes is bigger than the median in the sample. A very simple indicator: if the size of your user page is bigger than the median, you're a social signaler; otherwise, you're a non-social signaler, not really concerned about your social image within the community. The second indicator is going to rely on barnstars data. Here I'm going to restrict the sample of subjects to those who have already received barnstars, and I'm going to consider as social signalers those who decided to manually move at least one of those barnstars from their talk page, on which they typically receive them, to their personal user page, so that it would be displayed for everybody to see, forever. To me, the fact that contributors are willing to manually move their barnstars to their personal user page is an indication that they care about their social image within the community, and so I'm going to consider them social signalers. Okay, so far so good. Again, we have an experimental protocol to disentangle free riders from reciprocators from altruists, and then we have data from Wikipedia that hopefully allows us to get at people's preference for social image within the community. Now I'm going to use those revealed preferences to try and predict antisocial behavior within Wikipedia and see what I can learn from that. Okay, so first result: collaborativeness of contributors and antisocial behavior. I'm going to operationalize antisocial behavior in two distinct ways. First, I'm going to take the proportion of a subject's reverts that do not feature any kind of explanation: basically, you leave a blank edit summary field, you do not provide any reason why you reverted a particular edit. Second, I'm going to take the number of times one of our subjects started an edit war with another contributor on a Wikipedia article. You can define an edit war in all sorts of ways; in the paper, I stick with the following definition: one of our subjects reverts another contributor, C, on a Wikipedia article; C reverts the subject on that same article; and finally the subject comes back and reverts C, therefore going back to the first situation, knowing that those three steps must be consecutive to count as an edit war.
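A minimal sketch of counting edit wars under that definition follows, treating "consecutive" as consecutive entries in the article's revert log, which is one simplified reading of the rule. The tuple fields are hypothetical.

```python
from collections import defaultdict

def count_edit_wars(reverts, subject):
    """Count edit wars started by `subject`.

    `reverts` is a chronological list of (article, reverter, reverted)
    tuples. An edit war is three consecutive reverts on the same article:
    the subject reverts C, C reverts the subject, the subject reverts C again.
    """
    by_article = defaultdict(list)
    for article, reverter, reverted in reverts:
        by_article[article].append((reverter, reverted))
    wars = 0
    for sequence in by_article.values():
        for (r1, v1), (r2, v2), (r3, v3) in zip(sequence, sequence[1:], sequence[2:]):
            if (r1 == subject and (r2, v2) == (v1, subject)
                    and (r3, v3) == (r1, v1)):
                wars += 1
    return wars

# Alice reverts Bob, Bob reverts Alice, Alice reverts Bob again: one edit war.
log = [("Apple", "Alice", "Bob"), ("Apple", "Bob", "Alice"), ("Apple", "Alice", "Bob")]
print(count_edit_wars(log, "Alice"))   # 1
```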
So what I'm going to do is explain, in turn, both of those antisocial behaviors, controlling for a bunch of demographic characteristics at the individual level, and including the social types of subjects, altruist, reciprocator, or social signaler, to see how they correlate with antisocial behavior. This first table is just to show you how such behavior correlates with the demographic characteristics that I have about our subjects, and you can see that it's actually very difficult to predict, be it the proportion of reverts that do not feature an explanation here, or the number of edit wars that subjects start. Basically, age matters, and that's it. The conflictuality score of the articles that editors contribute to is negatively correlated with this indicator and positively with that one, which makes sense; other than that, not much is happening. Now, controlling for those, what the next table does is estimate the impact of those social types: here, weak reciprocators, reciprocators, and altruists, contrasted, all else equal, with the free riders; and here, social signalers contrasted with non-social signalers, for the user page measure and for the barnstar measure. So what do we see? Each of those coefficients you can interpret as the estimated percentage change in the dependent variable, here the proportion of reverts without an explanation, there the number of edit wars started, when you move from revealing free riding preferences to altruist, reciprocator, or weak reciprocator preferences, or here from non-social signaler to social signaler. What we see is that subjects who reveal reciprocator or altruist preferences actually justify their reverts much, much more on average. The prevalence of reverts that are not justified in the sample is about 6%, so by having those preferences you basically curb that behavior to zero. If you look at the barnstar measure of social image, you also see that social signalers justify their reverts much more than others. Now, if we turn our attention to the number of edit wars that subjects start, we get a somewhat more nuanced result. What you see here is that altruists and reciprocators again start fewer edit wars; they are more cooperative when they interact with others, fewer antisocial behaviors. But here, contrary to the previous estimation, social signalers actually have much more conflictual relationships within the encyclopedia: they start many more edit wars, about 70% more here, and the same story here, 33% more edit wars for social signalers as opposed to non-signalers. So what explains this behavior? Well, I think there are some variables there that can help us understand what's going on. If we go back to the proportion of reverts that are not explained, what we see in the control variables is that the number of barnstars that subjects receive is negatively correlated with the proportion of reverts that are not explained. This totally makes sense: the community recognizes that it's a bad idea not to justify your reverts. Now, if we move to the edit wars, and this was kind of surprising to me and needs to be understood, what we find is that the number of barnstars that subjects receive is actually very strongly positively correlated with the number of edit wars that they start, controlling for the conflictuality of the topics to which they typically contribute. That suggests that, at the margin, the community seems to reward confrontational and self-assertive behavior, and this is pretty much in line with the finding that those very users who care about their social image within the community exhibit much more conflictual relationships. So now, from there, I'd like to ask the following question: why does the community seem to reward such confrontational, assertive behavior under certain conditions? It could be, for instance, that it's an efficient thing to assert your opinions in a non-cooperative way, by fighting directly through edits within the body of the article. If the cost of opening up a discussion is too high, it may be an efficient way to just close the debate: rather than getting everybody to agree, just fight and let the person who's wrong abandon the fight. Basically, it could be efficient from the contributor's perspective, is my point. So we need to understand why the community rewards such behavior, and once we understand why, can we find ways to modify those community incentives in order to alleviate this antisocial behavior and build a more inclusive Wikipedia? This is my first question for the research slash editor community. The second set of results is going to be linked to governance, and I'm going to look at administrators here, trying to see whether the level of trust that administrators typically have in strangers around them impacts the way they use their policing rights within the community. In order to do that, I'm going to need yet another experimental tool, so let me very quickly explain to you what that tool is. Imagine that you're participating in yet another experiment and that you are participant A here; I will call participant A the trustor. You and the other person playing with you in this game are both endowed with 10 dollars, but you have a private decision to make that participant B does not have. You can decide to transfer to participant B any amount taken from your initial endowment, from 0 to 10. Any amount that you transfer to participant B, as you can see here in the animation, is going to be tripled. Then participant B receives the amount, and participant B has a private decision to make in turn: how much of the amount that he receives does he want to send back, return to you, knowing that he has absolutely no obligation to do so. So if you send all of your endowment to participant B, you send 10 bucks, he's going to receive 30, and he has no obligation to return a single dollar to you.
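For reference, the payoffs of the trust game just described, as a minimal sketch with the talk's numbers: both players start with a $10 endowment, and the transferred amount is tripled on its way to participant B.

```python
ENDOWMENT = 10
MULTIPLIER = 3

def trust_game(sent, returned):
    """Return (trustor_payoff, trustee_payoff) for one play of the game."""
    assert 0 <= sent <= ENDOWMENT
    received = MULTIPLIER * sent
    assert 0 <= returned <= received
    trustor = ENDOWMENT - sent + returned
    trustee = ENDOWMENT + received - returned
    return trustor, trustee

print(trust_game(10, 0))    # (0, 40): full trust, nothing sent back
print(trust_game(10, 20))   # (20, 20): full trust, the surplus split evenly
print(trust_game(0, 0))     # (10, 10): no trust, no surplus created
```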
So, actually, I'm going to interpret the fraction of your endowment that you're willing to send to participant B, knowing that this person is totally anonymous, you don't know him, and it's the only interaction you're ever going to have with him over the internet, as a direct measure of the trust that you're willing to put in strangers. And that's what I'm going to use to study the link between generalized trust and policing activity, focusing on the Wikipedia administrators who participated in this experiment. Okay, so this is what this table does. Here you have this value of trust, and again, the coefficients you can interpret in the same way, as the percentage change in the number of contributions that is estimated if you move from no trust to full trust in the game. Now, the first thing that I do is check that there is absolutely no significant correlation between the trust level of regular, non-admin contributors and the number of contributions that they made to Wikipedia: there is no reason to believe that trust should be related to contributions among regular editors. However, if I move to the sample of Wikipedia administrators, what I see is that an increase in trust is very strongly negatively related to the activity level of Wikipedia administrators. And if I break that overall activity level down in terms of the number of users blocked, pages deleted, or pages protected, I find again that same negative relationship, which holds. As a final piece of evidence, I actually went back to those administrators six months after the completion of the study, asking them: can you tell me, on a scale from zero to seven, if I remember right, maybe nine, I don't remember precisely, I think it was from zero to nine, where zero was absolutely nothing and nine was all of my time, what fraction of your working time you dedicate to admin activities on Wikipedia. Here I only got 27 answers, but even with this very small sample size there is a very strong negative relationship between administrators' trust level in strangers and the time that they self-declare dedicating to admin activities on Wikipedia. So this is an interesting finding in itself: the social attitudes of those administrators have an impact on the way they manage the governance of the community. But here is my question: what is the optimal level of trust that administrators should exhibit in order to efficiently protect the common resource while maximizing participation? You do want to keep the vandals out, but you don't want to discourage good-faith editors who are still learning the rules of the game. And once we figure out ways to understand what that optimal level of trust should be, can we actually nudge admins towards the adoption of the right level of user trust, in order to build a more inclusive Wikipedia, keeping the vandals out but making sure that we bring good-faith contributors in without discouraging them at an early stage?
These are the provocative thoughts that I wanted to share with you. This is by no means a comprehensive account of what we do, but this is the format that I thought would make the most sense for this particular presentation. So thank you very much, and I look forward to your thoughts and comments.

Thanks Jérôme, great talk. We have a few questions from the audience and from IRC, so I'll ask Aaron to answer the first question, since he was the first one to raise his hand. Oops, there we go. Hey, so I'm curious: when you were looking at the custom edit comments on reverting edits, did you exclude auto-generated edit summaries that might come from Huggle or Undo or Rollback or something like that? I actually didn't; do you think that this is important? So, I did a similar type of analysis where we were looking at how often people who revert other edits will respond when the reverted person reaches out and asks, why did you revert my edit? And I found that it varies quite dramatically by the tool that's used. So I was really surprised to see, I think you said six percent of people didn't explain their reverts; I think it's more the opposite, that six percent of people actually say anything at all beyond "I reverted you". Huh. Okay, so I may be capturing a lot of justifications that are on the part of bots, is what you're saying. Yeah, or just something that the tool that they're using generates. It would be interesting to see the analysis controlling for those things. I have a bunch of regular expressions I can give you, if you have the data and can rerun it. Yeah, sure, I can do that. That would be great, thanks. Yeah, it's a very good point, so I'm going to look into this. Thank you, Aaron.

All right, so let me see whether the people on IRC have other questions. Anyone from the room? I have one question I want to ask next, but I want to check first. Go ahead. Okay, so Jérôme, this is more related to your final question about next steps, right? My understanding of the test is that, when it comes to understanding the demographics of these users, you didn't really collect much data about demographic traits that could be another variable driving those effects, and I want to check if that is the case. And also to hear your thoughts, if you could expand a bit more on something we discussed a few days ago around understanding, for example, how this affects women, or specifically minorities, across Wikimedia. So I didn't get the very last part, can you say that again? Sure, yeah. So the first question is whether you collected other demographic information from participants in the test, and the second question is related to the next steps, particularly how we can apply this towards demographic groups, whether gender-based or geography-based minorities across Wikimedia. Okay, so regarding the first question, I do collect demographic information, but very basic information. Basically what I have is age, gender, education level, salary level, risk aversion level, whether you want to take risks in your life, and so on and so forth. And this is actually one of the tables that I showed you in this presentation: that demographic information explains virtually nothing of what happens in that space. So all of those social types, and I was myself kind of surprised by this, have much more explanatory power than basic demographic variables when you're looking at antisocial behavior. So that's for the first question.
Then, the second question: how could we use this to try and devise interventions that would allow us to increase retention rates? Well, I'm not totally sure. This is certainly something that needs to be thought about. I think we can try and design some interventions, at least as far as question one is concerned. Question one was that we really need to understand why the community seems to reward confrontational behavior under certain conditions, and once we understand that, if there is a good reason for them to do this, then we need to build tools that allow them to reach the same goal without having to rely on antisocial behavior. That would be my take on the issue, and you don't necessarily need any kind of big intervention to achieve these goals. Now, regarding the optimal level of trust that administrators should exhibit, I'm only starting to think about this issue now, so I don't have any good answer for you right now. Cool, thank you. Did that make any sense? Yes, okay.

I have one question for you, Jérôme, about barnstars and edit wars. My question is: have you looked at the effects of barnstars on editors over time? Can it be that when you receive your first barnstar your behavior is very different than when you receive your 20th? And that's not necessarily, I mean, there is a control mechanism in that the community can stop giving you barnstars, but barnstars on their own, can they make somebody worse in terms of behavior? If I receive more barnstars, do I feel some sense of entitlement, or anything like that? Yep. So I think there, I'm sorry, are you done? Yep. Okay, so I think there are two separate questions here. The first one is: do contributors react differently when they receive a barnstar? This is actually a question that I did not investigate myself, but using the same kind of data, Aaron Shaw and Mako Hill did just that. What they do is contrast the social signalers with the non-social signalers and see how they react, in terms of editing activity, when they receive a barnstar. And what you see is that contributions actually go down after you receive a barnstar, on average. Why? Because the reason you receive a barnstar is that you've recently been very, very active, right? But contributions go down less for social signalers than for non-social signalers. That's what happens. Now, this has no bearing on the results that I showed you in this presentation, because what I'm actually interested in is not the number of barnstars that you receive; it's whether, controlling for the number of barnstars that you receive, you actually decided to move them from your talk page to your user page. And this is what really reveals your taste for social image within the community, as opposed to the sheer number or the timing of those awards. Yeah, that makes sense, thank you.

Okay, let me check if there's any other question on IRC. People are thanking me, but it doesn't look like there's anything people want me to relay. Nothing more from the room on this talk? Okay. So with that, thanks a lot to our speakers, and I guess I'll see you all next month. Thanks, everybody. Thank you very much. Goodbye, take care, folks.