Okay, we're live now. All right, good morning, good afternoon everybody. This is Dario from Wikimedia Research and I'd like to welcome you to the December edition of the Wikimedia Research Showcase. We have some brave souls joining us both in the office and remotely, and especially our speakers today, who managed to come in the week just before Christmas to give us some great presentations. So I'd like to introduce you to Aaron Halfaker, who will be presenting first on quality trends in English Wikipedia with a specific focus on gender-related initiatives. And our second speaker is Andrea Forte, presenting remotely on perceptions of privacy and safety among Tor users on Wikimedia. The format is as usual: we're gonna have two presentations of 25 minutes each, followed by a short Q&A. And we're gonna allow a chunk of time at the end of the session for additional discussions. You can join us on IRC; the channel is #wikimedia-research, that's wikimedia, dash, research. And without further ado, Aaron, the stage is yours. All right, thanks Dario. So yeah, as Dario said, I'm gonna be talking to you about some quality dynamics research that I've been working on recently. I originally titled this English Wikipedia Quality Dynamics in the case of WikiProject Women Scientists, but I've been able to extend the work a little bit over the last weekend, and so I'm also including some analysis that I did of articles that are related to visual arts. And I think it offers a nice comparison. So first, as a little bit of introduction, I'm Aaron Halfaker. I'm a principal research scientist at the Wikimedia Foundation. As I usually do at the beginning of my presentations, I call out this little statement that I have beneath my title on my user page, which is: think big, measure what you can, and build better technologies. And today we're gonna be looking at some measurements. So I'm gonna be talking to you about three things today, because these things always come in threes.
So first I wanna talk about: when will Wikipedia be done, and what does "done" even mean in the context of Wikipedia? Then I'll talk about how I use modeling quality as a way to think about progress towards completion in Wikipedia, and how we actually implement one of these quality models to help address some of these questions and see some interesting trends. And then finally we'll apply that measurement to some interesting cases, including overall Wikipedia quality and some specific cases, including WikiProject Women Scientists and WikiProject Visual Arts. Okay, so part one: when will Wikipedia be done? Of course, I couldn't start a presentation without pulling up a nice essay in Wikipedia, which addresses the question of when will we be done and makes the assertion that Wikipedia is not finished. Arguably, it'll actually never be finished. However, the rate at which we've progressed to where we are now can help us think about what we've done well, what we've done poorly, and what we can expect to see in the future. And so this graph is one that's brought up quite often. This plots the number of articles in English Wikipedia over time, and inside that "When will Wikipedia be done?" essay, there's lots of extrapolations people have made on this plot in order to guess where we're going: if we're slowing down, if we're gonna speed up, if it's gonna keep going, that sort of thing. So yeah, I think it's an interesting question to consider what happens after the end of this plot, and people have been thinking about this quite a lot. So this plot is interesting for giving us a sense of Wikipedia's growth, but there are a lot of things that it doesn't show us.
So what's happening in this time, in this history, before the end of the plot here? We see this sort of mostly steady, at least monotonic, growth in the number of articles in Wikipedia. But one of the problems with this plot is that it takes every article and assumes that they're all the same, no matter where they are in their development. So this stub about Nell Morton is equal to this C-class article about Highway 55, the most annoying road, which I live right next to (there's lots of research papers written about this road), and this good article about biology. All of these are worth the same value when we look at that graph, when we look at things as sort of the raw number of articles. And so it turns out that generally when we're working inside Wikipedia, we don't equate articles. We do consider how complete they are, how high of quality they are. Wikipedians have developed a scale called the Wikipedia 1.0 assessment scale in order to assess article quality in Wikipedia, and there are some bots that help track the quality of articles over the entire encyclopedia. So on the left here, I have a screenshot of the scale. On the right, I have a screenshot of this bot output that shows how many articles fall into these different quality strata. So at the bottom of the scale, we have stubs, which are a very basic description of the topic that provides very little meaningful content. And at the very top of the scale, we have featured articles, which are described as professional, outstanding, and thorough, a definitive source of encyclopedic information. So when considering an article like biology, one thing that might be interesting to ask is: well, it's a good article now, and it's pretty close to the top of the quality scale, but when did it get there, and how fast did it grow in quality? If we were to look at just the assessments that Wikipedians have applied to this article, then the graph looks a lot like this.
It turns out that there have only been two assessments that increased the quality assessment of this article on biology. Back in 2006, just a couple of months apart, the article on biology was assessed as B-class, and then it was assessed as good article class. And so these assessments don't really help us see what the trends were actually like, because we have these huge gaps where there's not really much information given. In fact, these assessments don't actually tell us when the article arrived at a certain quality. They really just tell us that when somebody showed up at the article and decided to assess it, they assessed it at this quality. So we know that at least by this time, it was at this quality. For example, this article could have been B-quality much, much sooner than these months in 2006. So I've been looking at this and trying to figure out a good way to get a more granular sense for when quality changes are made to articles and what's sort of happening during these blank periods of no assessment. And that brings me to part two of this talk, which is modeling quality. So there's a system called ORES. I like to talk about it a lot. It's machine learning as a service, and it provides the kind of facilities that we wanna have when we're doing this type of modeling work. And by the way, the system called ORES is online right now; this is actually just a screenshot of the homepage for ORES. We provide lots of models, including models that predict article quality. So essentially, when we're thinking about predicting article quality, we take the assessments at the times that they happened in Wikipedia and we train a machine learning model to try and predict those assessments on those versions of pages.
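To make that "assessments are only lower bounds" problem concrete, here's a small sketch (a hypothetical helper, not actual ORES or analysis code) that forward-fills sparse human assessments into a monthly series; everything between the two 2006 data points has to be inferred:

```python
# Hypothetical sketch: forward-fill sparse human assessments into a
# monthly "at least this quality" series. This is not ORES code; it just
# illustrates how little the raw assessments tell us between data points.

ASSESSMENT_SCALE = ["Stub", "Start", "C", "B", "GA", "FA"]

def monthly_lower_bounds(assessments, months):
    """assessments: {month: class}; months: ordered list of month labels.
    Returns the best assessment seen so far for each month (or None)."""
    best = None
    series = {}
    for month in months:
        label = assessments.get(month)
        if label is not None:
            # An assessment only tells us the article was at least this
            # good by this time, so keep the maximum class seen so far.
            if best is None or ASSESSMENT_SCALE.index(label) > ASSESSMENT_SCALE.index(best):
                best = label
        series[month] = best
    return series

# Example shaped like the biology article: two 2006 assessments, with
# every other month left to forward-filling.
months = ["2005-12", "2006-01", "2006-02", "2006-03", "2006-04"]
series = monthly_lower_bounds({"2006-01": "B", "2006-03": "GA"}, months)
```

The series says nothing about 2005-12 at all, and only that the article was "at least B" in 2006-02, which is exactly the gap the model-based approach below tries to fill.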
And this allows us, once we have a pretty accurate prediction model, to feed that prediction model assessments that happen in between, or sorry, revisions that happen in between the actual assessments that were applied to articles, and see what the machine learning model thinks about the actual quality. This will make sense in just a second, but I just wanna make it clear that essentially what we're doing is training a machine learning model to replicate the assessments that Wikipedians have been doing for years across the encyclopedia, from stub to start to C to B, good article and featured article. Okay. Oh, and before I move on and actually show you the cool stuff that we can do with this, I've gotta give a shout out to a recently minted doctor, Morten Warncke-Wang, also known as User:Nettrom, and his work in this area, because he did the foundational work towards building models that do what we wanna do right here. And he was just awarded his PhD. So congratulations, Dr. Nettrom. Okay. So essentially what we're doing is taking this period over the history of an article like biology, running it through the machine learning model, and asking it to predict the gaps between assessments. And what we get is something that looks much more information-rich and colorful, like what we see on the right here. So what we're looking at in this graph on the right is essentially a prediction of the quality of the article for every month that it existed in English Wikipedia. And in this graph, we can actually see the article going through a trajectory: from somewhere between stub and start class in 2002, jumping up to B or maybe good article class in 2005, and jumping up to good article or maybe even featured article class around the end of 2009. And so I know that you're definitely gonna have questions about some weird things that we see in this graph, such as what the heck is going on with these sudden jumps and these dips in quality.
And it turns out that I've written a report on what's going on here. There are some things that are a little bit funny, but it turns out that there were some big content removals and additions to the article on biology that explain these jumps and falls pretty well. And so if you wanna dig into what was actually going on there, I have this report that's linked at the bottom of the slide. Okay, so this is cool, right? We can look at the actual trajectory of quality in articles over time. So what might we actually do with that, other than this pretty colorful visualization? That brings me to part three: quality trends. So I wanted to try and figure out how fast English Wikipedia is getting better. Like, how fast is this trend actually taking place, and when did the growth happen? There's a lot of past work on the declining community in Wikipedia; is that affecting the growth rate? We should be able to see that if we look at overall article quality changes. And I wanted to do some interesting breakdowns too. There were two breakdowns that I managed to do before writing up this presentation: I looked at the articles that were flagged using the WikiProject Women Scientists template, which renders like this rectangle, and the WikiProject Visual Arts template, which renders like this rectangle. You can see these things on the talk pages of articles, by the way. So if you wanna see which WikiProjects have claimed an article, just go to the talk page and look right at the top. Okay, so looking at English Wikipedia over time, essentially I'm predicting the average quality level of articles in a given month. You'll notice that I have a weird quality level at the bottom of this graph that I call "empty", which I assume to be a full quality level beneath stub. So if the article doesn't exist yet, it falls into the empty quality category, and then once it exists as a stub, it falls into the stub quality category.
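One way to turn a model's per-class probabilities into the single "expected quality" number plotted in these graphs (my reconstruction of the idea, not the exact production code) is a probability-weighted sum over an ordinal scale, with "empty" a full level below stub, as just described:

```python
# Sketch of an "expected quality" metric: a probability-weighted sum
# over an ordinal quality scale. The class names follow the WP 1.0
# scale, and treating "empty" as 0 follows the assumption in the talk;
# the exact weighting in the real analysis may differ.

QUALITY_WEIGHTS = {
    "empty": 0, "Stub": 1, "Start": 2, "C": 3, "B": 4, "GA": 5, "FA": 6,
}

def expected_quality(probabilities):
    """probabilities: {class_name: probability}, summing to ~1.0."""
    return sum(QUALITY_WEIGHTS[cls] * p for cls, p in probabilities.items())

# An article the model thinks is probably Start-class, maybe C-class:
score = expected_quality({"empty": 0.0, "Stub": 0.1, "Start": 0.5,
                          "C": 0.3, "B": 0.1, "GA": 0.0, "FA": 0.0})
# score works out to 2.4, i.e. a bit above Start on the ordinal scale
```

Averaging this score over the articles in a cross-section for each month gives the "expected quality of a random article" curves discussed next.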
So in this graph, we can see that the expected quality of a random article over time in English Wikipedia enters into about a linear growth around 2005. There's certainly a bend in this growth and maybe a slowing growth around 2010, but it's really minor. This is basically a straight line. And that's really interesting to see. It sort of flies in the face of the concerns that we have around the population of English Wikipedia editors declining, which started around 2007. So now, adding the expected quality of a random article tagged by WikiProject Women Scientists, we can see a very different trend here. Up until about 2005, things were pretty much in line between articles covered by WikiProject Women Scientists and the rest of the encyclopedia. However, from 2005 to about mid-2013, the articles covered by WikiProject Women Scientists are generally of lower quality than the rest of the encyclopedia. But something weird happens around the beginning of, or sorry, around the middle of 2013, and the rate at which articles about women scientists in Wikipedia improve goes up dramatically. And we can actually see that by about the beginning of 2015, the expected quality of a random article about a woman scientist is higher than the expected quality of a random article from the rest of the encyclopedia. So that's pretty cool. I'm gonna point to this period in the middle of 2013 a couple of times, but I just wanna highlight this right now. I'm really interested to hear what people who are actually working within this project think happened around the middle of 2013. It might have actually been the start of the formal WikiProject for Women Scientists. It might be that they were running external initiatives and editathons or something like that. I think that there will be something interesting here and I don't quite know what to look for. But I bet you that people who are working in this space will have some ideas.
Okay, moving forward to WikiProject Visual Arts. So I dug into this WikiProject because I attended an editathon that the local Minnesota group for visual arts was holding last weekend, and I thought it would be fun to see how their quality dynamics looked as well. So again, we see a trend that matches kind of close to the rest of the wiki, but it deviates in a couple of places. There's kind of a minor deviation around 2007, where visual arts starts to fall behind the expected quality of the rest of the encyclopedia. But it doesn't fall back nearly as much as WikiProject Women Scientists did. It's relatively minor in comparison: we're talking about like a tenth of a quality class here, as opposed to half of a quality class. But from the beginning of 2010 forward, we see a similar acceleration. It happened earlier than what we saw with WikiProject Women Scientists, but it again has a very similar effect, where now the average article from WikiProject Visual Arts is higher quality than the average article from Wikipedia as a whole. So again, I wanna ask the question: if there's anybody who's been working within this WikiProject for a little while, what happened in the beginning of 2010? What's actually going on there? What might explain this sort of acceleration in the quality dynamics? So this metric, this way of plotting things, is pretty cool, but it doesn't show us something that would be really interesting to see. Essentially, that last graph holds all quality class steps the same: going from empty to a stub is the same as going from a stub to a start. But it doesn't tell us where the transitions are taking place. Are we creating articles, and that's what's raising the quality? Are we maybe doing big jumps, where we turn stubs into FAs? Or are we doing lots of little jumps, say where we're turning start quality articles into C-class articles? All of these could have similar effects depending on the scale at which they happen.
So let's look at some breakdowns by where these quality changes were actually happening. I love this plot right here because it helps make it really clear what I mean when I say the empty class. So when I generate this dataset, in the beginning of Wikipedia everything is empty; we don't have any articles yet. And at the end of my time period, which it turns out is August 2016, I assume that we've created 100% of the articles, which is obviously not true, but it's very hard for me to get around not knowing which articles we will eventually create. And so anyway, I'm making the assumption that the articles we have now are the articles that we will have. We're just about to look at a lot of proportions, so just know that these are proportions based on what articles existed in August 2016. Okay, so there are a few things that we can see in the declining rate of empty articles for these cross-sections of Wikipedia. First, we can see that visual arts trails behind the rest of the encyclopedia when it comes to creating articles, as does women scientists. And the difference here is much smaller for visual arts, much larger for WikiProject Women Scientists. We can actually see these transition points too. I have the tips of the arrows pointing to the transition points: for visual arts at the beginning of 2010, and for women scientists around the middle of 2013. So we can see that at least part of that transition had to do with creating articles. Okay, when we look at stubs, we see something very different between these different sets. For the entire encyclopedia, creating stubs is a big deal and it's something that's generally done. It hasn't really slowed down; we can see a little bit of bend in the curve, but it's generally a linear growth. For visual arts, well, there was a slowdown in the rate at which stubs were being created around 2007, but it doesn't slow down and stop.
It stays on pretty linearly after that point, whereas with women scientists, essentially, stub creation stops, or at least we don't create enough stubs to fill that class faster than we, say, move stubs to another class. And so we stay pretty much constant at about 10% stubs in WikiProject Women Scientists until this mid-2013 period, where it looks like a lot more articles are getting created. I'm guessing that this has something to do with the WikiProject around Women in Red, which is intended to turn red links to notable women into stub articles, or something of greater quality than that. And so I'm guessing that they started work sometime around 2013; at least there was some initiative like that there. So when we look at some of the lower quality classes that are above stub, we see a trend that looks a lot more like the overall trend that I was showing you all just a moment ago, where we see these sudden switches in the proportion of articles falling into these quality levels around this beginning-of-2010, mid-2013 period. And in this case, we see higher proportions of start and C-class articles than the rest of the encyclopedia around these periods too. And so immediately, both WikiProject Women Scientists and WikiProject Visual Arts are higher quality than the rest of the encyclopedia as soon as they start this positive switch. So, whoops, there we go. So B-class is super weird though. I don't know what to think of B-class. I mean, when we look at visual arts, there are a lot of articles that fall into B-class, a lot more than the rest of the encyclopedia. We see a decrease in the rising proportion of articles that fall into B-class around 2007, which corresponds to one of the changes that we saw in the past, but we don't see a clear transition happening around the beginning of 2010, which was the bigger transition for visual arts. For all of the wiki, it looks like the proportion of B-class articles stays perfectly stable after 2007.
For women scientists, we see this decline in the proportion of articles that fall into B-class. And it kind of mellows out around the start of 2011, but again, beginning in the middle of 2013, there's a sudden rise in the proportion of B-class articles in WikiProject Women Scientists. I'm not quite sure how to think about this. This is very strange. So what's up with B-class anyway? Is it that B and C class are sort of the same, and so people don't like to associate B-class with articles anymore? Maybe B-class is kind of becoming defunct. Maybe when people get an article to B-class quality, they really just want to make the leap towards good article class, because good article class is actually formally reviewed, whereas B-class isn't. Or is there something else going on? Maybe somebody else has an idea that sort of explains these weird dynamics around B-class. And biggest of all, I want to know what the heck happened in 2007 around B-class, because it seems like no matter what cross-section we look at, something happened in 2007. Okay. So, marching forward, I want to look at good articles. But first I want to get this B-class out of here, because it looks silly compared to the rest of things, and I think it helps our judgment of what's going on with good articles to have something that looks more explainable next to it. So I'm just going to stick the start class graph next to the good article class here. And we see roughly the same dynamics that we saw in start class, where WikiProject Women Scientists is generally behind from 2005 to about mid-2013, and there was definitely an abrupt change that happened here. When it comes to visual arts, it had a pretty high proportion of articles that landed in good article class, and by, again, the beginning of 2010, it rises well above the rest of the encyclopedia. And so again, we see these transition points play out at the individual class levels.
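The per-class proportion curves in these breakdowns boil down to a simple computation; here's a sketch (illustrative code, not the actual analysis pipeline) using the closed-world assumption from earlier, where articles that don't exist yet in a given month count as "empty":

```python
# Sketch of the proportion breakdown: for each month, what fraction of
# the fixed (August 2016) article set falls into each quality class.
# Articles not yet created in that month are counted as "empty".
from collections import Counter

CLASSES = ["empty", "Stub", "Start", "C", "B", "GA", "FA"]

def class_proportions(predictions, n_articles):
    """predictions: {article: predicted_class} for articles that exist
    this month; anything missing from it counts as "empty".
    Returns {class: proportion of the full article set}."""
    counts = Counter(predictions.values())
    counts["empty"] += n_articles - len(predictions)
    return {cls: counts.get(cls, 0) / n_articles for cls in CLASSES}

# Four-article universe where one article doesn't exist yet this month:
props = class_proportions({"a": "GA", "b": "Stub", "c": "C"}, n_articles=4)
# props spreads 25% each across empty, Stub, C, and GA
```

Running this for every month and plotting one curve per class gives the per-class breakdown plots described above.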
Now when it comes to featured article class, we see something that's very different. Before I looked at the featured article class, I really wanted to declare a victory for WikiProject Women Scientists, and I think that we still should; there's obviously something really productive happening here. But WikiProject Women Scientists still lags substantially behind the rest of the encyclopedia, whereas WikiProjects like Visual Arts are well ahead of the rest of the encyclopedia when it comes to their predicted quality level. And so this leads me to another question: what's going on with featured article class when it comes to WikiProject Women Scientists? I think that there are likely a lot of very interesting explanations here, including maybe lacking source material. There's obviously a coverage bias when it comes to source material about notable women, so I think that might likely be the case. Or maybe it's for reasons that have nothing to do with the quality of the article and source material; maybe it's harder to get an article about a woman scientist through the featured article process. You know, I'm guessing that the people who are doing this work day in and day out will have some good ideas about why WikiProject Women Scientists is ahead in every single other quality measure that we looked at until we got to featured article. There's gotta be something weird going on there. Okay, so one more thing that I wanna address, which is: we're using a model here. And so is the model just making bad predictions about articles about women scientists? Could this actually be real? Well, when we look at the raw assessments, they play out essentially like the last period in this graph, where articles about women scientists right now, 0.1% of them are assessed as featured article quality. Visual arts lands at about 0.3%. And the overall wiki is at about 0.1%; it's actually a little bit higher.
I shouldn't have rounded it at that level; it's a little bit larger than women scientists. When it comes to the ORES prediction model though, ORES predicts a much higher proportion of articles landing in the featured article quality class. And I think that this is because ORES makes its assessments at every edit, whereas humans can only put so many articles through the featured article process; it takes a lot of time, and they can't do it all the time. And so I think that ORES is a leading indicator of what articles actually belong in this featured article class. And so women scientists land at 0.4%, visual arts way up there at 1.9%, and the whole wiki at about 0.7%. And so I think that this is actually real. It corresponds with human assessments and it plays out the way that you would expect. Okay, so to wrap things up: in summary, I talked to you about this idea of when will Wikipedia be done. I talked to you about how we've been thinking about quality in Wikipedia for a long time and the trends that we've gone through to get there. I showed you how we use a modeling strategy to try and fill in the gaps between Wikipedians' assessments of quality. And then finally, I showed you some applied measurements that showed us some really cool things and suggested some interesting switches in the trends for WikiProject Women Scientists and WikiProject Visual Arts, and also some really weird things that are going on with B-class articles. And by the way, you can read more about what I've been working on on Meta. There's this wonderful link that I have for you that will bring you to the page, but it's kind of a stub right now. It's a work in progress. I'm still working on more cross-sections of the encyclopedia, and I'm hoping to intersect these quality measures with importance assessments and page views. If you wanna track my progress, go to the talk page. You can see my public work log there, and so you can see me going through my analyses, adding new cross-sections and that sort of stuff.
That's gonna be a lot more active than the main page, at least for a while. Thank you very much. Thank you, Aaron. Virtual round of applause. I have a few questions myself, but let me ask first whether we have questions from the room, from Hangout participants, or from IRC. We have one observation from Ziko on IRC. Ziko's observing that potentially the reason why women scientists' trajectory is an outlier, especially I think around the start class, is that they're an easy article type to start, as you don't have much to think about in terms of structure. Mm. Could I ask a quick question? I was wondering what the features in the machine learning algorithm are. Oh, yeah. So here, I wanted to get us back to the visual of start class quick. And I think that that jibes pretty well: start and C, like, WikiProject Women Scientists didn't fall back that far. So yeah, I think that's an interesting observation, Ziko. It'll be interesting to hear what people on that WikiProject have to say. Okay, so now that we've looked at that, let me show you the features that are in this classifier. In order to do that, I'm gonna go directly to ORES and I'll have it tell us the features that it extracted for a particular revision. So I'm going to paste this URL into the chat. But before I do, I should actually get a revision ID that I know where it came from. So I'll get the most recent revision of the article on biology. So that was a pain; when you're doing Hangouts, it uses so much CPU that my computer is lagging pretty hard right now. Come on, give me a revision ID. Okay, so here is the ORES prediction, so that y'all can see. So ORES, by the way, predicts that biology just barely fits into the featured article class. So if I add "features" to the end of the URL, then ORES will tell us what features it extracted for making this judgment. And so we have some language-based features, like the length of the article once you stem all of the words.
How many main article templates we see in the article, category links, citation templates, citation needed templates, image links, infoboxes, raw number of characters, content characters, external links. Essentially what we're doing is analyzing the actual content in the article. And one thing that I should point out is that this is not actually the vector of features that goes into the prediction model. We actually have some secondary features, where we say, for example, divide the number of content characters by the number of characters, and divide the number of external links by the number of words in the article, that sort of stuff. But when we report the features through ORES, we hide those divisions, that scaling and normalizing, so that we can report out these raw and straightforward features. But all of those divisions, that normalizing and controlling, are derivative of this feature set. And so, yeah, I'll leave it at that for now, but if we wanna get more into this, I can show you how you can actually inject features and that sort of stuff. I have one more comment that I wanted to relay from a discussion that we had internally. Not quite a question, but it's something we really went over internally when discussing these results. And that's basically about the weirdness that we see around the B class, but also the featured articles, within the women scientists project. So, one more kind of hypothesis that I personally would like to have some input on is whether the growth dynamics of an individual article (it is true that many of these articles are driven by editathons, or other outreach initiatives) might happen at a speed at which these articles basically go through multiple classes and skip intermediate steps. So, I'm trying to figure out if this is because there are some norms about not using specific quality classes, or just because of the sheer speed at which these articles improve, given the nature of outreach events.
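As a sketch of what's being demonstrated here (the URL pattern and the feature names are illustrative approximations, not guaranteed to match the live ORES API exactly), asking for features and deriving the normalized ratios might look like this:

```python
# Hypothetical sketch of the two ideas above. The endpoint layout and
# feature names approximate ORES's interface; treat them as assumptions.

def ores_score_url(wiki, rev_id, model, include_features=False):
    """Build a scoring URL like the one pasted into the chat; appending
    'features' asks the service to report the extracted feature values."""
    url = f"https://ores.wikimedia.org/v3/scores/{wiki}/{rev_id}/{model}"
    if include_features:
        url += "?features=true"
    return url

def derive_features(raw):
    """Turn reported raw counts into ratio features like the ones the
    model actually consumes (illustrative names, assumed normalization)."""
    chars = max(raw["chars"], 1)   # guard against division by zero
    words = max(raw["words"], 1)
    return {
        "content_char_ratio": raw["content_chars"] / chars,
        "external_links_per_word": raw["external_links"] / words,
    }

url = ores_score_url("enwiki", 12345, "wp10", include_features=True)
derived = derive_features({"chars": 2000, "content_chars": 1500,
                           "words": 300, "external_links": 6})
```

The point of reporting raw counts while modeling on ratios, as described above, is that the raw numbers stay human-interpretable even though the model sees the normalized versions.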
So, that's the first topic that we brought up in the discussion. The second one: I really liked the comments made about the featured article segment. It strikes me that obviously the model is trained on whatever data we currently have in Wikipedia to determine what a featured article is. And it might very likely be that several biases, both around the complex structure of featured articles and the norms and guidelines that determine what a featured article is, but also, as you said, the coverage of these topics in mainstream media, that this kind of mixture may propagate biases that basically discriminate against gender-related content. So, I think that's a fascinating question and in itself a fascinating topic to explore further. Yeah, yeah. Why so many good articles and not featured articles is, like, a wonderful question, and it sounds like it's the beginning of a critique of our writing process. You know, if you can write a good article about something, you should be able to get it through featured article. And if you can't, then it's not the fault of the writers; something else is going on. Yeah. Okay, thanks folks. I should drop off so that Rachel and Andrea have some time. Okay, as a reminder, we'll have additional time at the end of the showcase where we'll just go around and take additional questions. And so with that, I'm gonna leave the stage to Andrea. Andrea, over to you. Okay, hi there. Let me get set up. So I'm going to be talking about something completely different. This is work that I've been doing for the past couple of years with my collaborator, Rachel Greenstadt, who's also on the Hangout here, and also a PhD student, Naz Andalibi.
So this is not the title of the paper that we wrote about this study, but as I've continued to think about the kind of research that we're doing on privacy-related issues in Wikipedia, this is how I've started to think about it: that there's risk inherent in participating online, and people who are working on seemingly mundane tasks are encountering this risk like anyone else online. And so for about 15 years now, my first Wikipedia study started in 2004, I've been looking at sort of the positive features of participation online and the things that people can learn through these processes, the exciting new things that people can do, or the new ways that people can do existing activities like writing encyclopedias, if only we understand how to design the infrastructures and policies that support these activities. So I've mostly been focusing on opportunity, and recently it's become more and more important to me to think about the kinds of risks and take more of a critical perspective on the things that people are doing online, Wikipedia included. So there's been a lot of research on Wikipedia over the past decade or so on what it takes to develop successful open collaboration projects. What does it mean to be able to attract participants and volunteers? How do we teach people to contribute to these projects? How do we keep them contributing over time, or at least get them to come back if they go away? And then this last bit is something that I hadn't considered, and that's making sure that people feel safe and able to participate in the ways that they want to. So this is a story that people in the Wikipedia community know pretty well: Bassel Khartabil disappeared. And a lot of the discussion around this has been about his participation in open culture projects. And so this is a really high-profile, really sort of dramatic and public example of the kinds of dangers that people might face.
But there are a lot of less spectacular examples of people encountering harassment and different kinds of threats as they negotiate their participation in online projects like Wikipedia. So about a year and a half ago, Rachel and Naz and I set out to interview people who participate in Wikipedia and are concerned about their privacy, and people who are known to seek out privacy, that is, anonymity seekers. So we talked to Tor users who also contribute to online projects. Many of them contribute to Wikipedia; some of them contribute to other projects. They are clearly concerned about their privacy. So we sampled from both of these populations to collect experiences from people who are definitely high-level contributors (Wikipedians who are definitely participating online) and Tor users who are definitely concerned about their privacy, starting at these two ends of the spectrum and gathering data about all of the points in between. We recruited through the Tor Project blog, through various social media, and through Wikimedia lists. And we ended up interviewing 23 people: 12 of these were Tor participants and 11 were Wikipedia participants. You know, the division isn't really that stark. Many of the people who use Tor had also edited Wikipedia; a smaller number of the people who edited Wikipedia had also used Tor. But they still pretty much represented two distinct populations, and they brought different perspectives, as you'll see when I show you the kinds of threats that they perceived. So usually when you do a qualitative study like this, you provide a participant table that breaks things down: we had a male who was aged 40 who had a master's degree, and so on. We didn't do it that way for this study because people had really serious privacy concerns. So we aggregated things to communicate what kinds of people we talked to, where they were from, and what their experiences were like.
As far as analysis, this was a very standard inductive analysis. We interviewed these 23 individuals (actually, we recorded 22 of the interviews, transcribed them, and analyzed them; one of our participants preferred to meet face to face without any recording). Then I took all of those data and did open coding, shared the themes that emerged with the co-authors, and we reviewed and discussed them. So this is a very typical approach to doing qualitative research on people's experiences with something like privacy. As for what we found (and you can read about this in a lot more detail in the paper that's linked from the Wiki Research page), we found out about the kinds of things that they were worried about and about the sources of those perceived threats. We found out what the conditions were for people not perceiving threats when they participate in open collaboration projects. Everyone had some sort of privacy concern, but some people weren't very worried when they were specifically contributing to open collaboration projects like Wikipedia. And then we gathered data about strategies: how do they deal with these risks and threats? Some people modified their participation, other people enacted different degrees of anonymity, and we'll have some examples of how that all shook out. As for the kinds of perceived threats that are out there, not surprisingly, the most general one was that some unknown parties would be surveilling everything that they did. For Tor users, this was a really big concern; for Wikipedians, somewhat less so. Loss of employment or loss of some sort of opportunity, like being able to get into grad school: this was something that Wikipedians and Tor users both talked about. Tor users were slightly more likely to talk about it. But again, these are small numbers, so we don't really put much stock in the breakdowns.
But it's interesting to see what kinds of things float to the top and where people's concerns generally lie. Safety, harassment, and intimidation were concerns that leaned more toward the Wikipedia experience and the reasons why Wikipedians wanted to seek out privacy; likewise reputation loss. And the sources of threats are not surprising: these are pretty standard ones, governments, businesses, and other private citizens. So here's an example of the kind of data that we would have coded as a concern for safety. A Tor user talks about why they started using Tor for their online contributions and says, "they busted his door down" (he's speaking about a friend who was politically active in similar ways as the participant) "and they beat the ever-living crap out of him. He was hospitalized for two and a half weeks, and they told him: if you and your family want to live, then you're going to stop causing trouble." So here this person has a family and decides, okay, I'm going to start taking some of my human rights activities online, through another identity, via the Tor network. Wikipedians had similarly dire perceptions of threats to their safety. We had one Wikipedian who mentioned that he had been threatened with a drive-by shooting of his house. He didn't really take it very seriously, but then he says, "I pulled back from some of that Wikipedia work when I could no longer hide in quite the same way. For a long time I lived on my own, so it was just my own personal risk, but now my wife lives here, so I can't take that risk." So people in both of the populations that we interviewed were experiencing threats to their own safety and threats that were directed, or that they perceived to be directed, against their loved ones.
And so I could go on and provide a lot of interview excerpts about the kinds of things that people were afraid of, but here's an incomplete list: having their head photoshopped onto porn, being beaten up, being swatted, being doxxed. These run the gamut of online and face-to-face threats. Now, in some cases people didn't talk about feeling threatened, and there were two reasons that tended to come up when people said they weren't worried about participating in projects like Wikipedia: one, they're not interested in controversial topics, and two, they self-identified as a member of a privileged class that didn't need to worry. So a Tor user says, "I come at it from a completely privileged position. I'm an employed white male, so I have no horse in the race. I have colleagues who get the death threats and the rape threats and all the rest of it." So they recognize that features of their identity make them part of what they perceive to be a protected class. And then a Wikipedian says, "I'm in the privileged position of not being interested in any of the topics that would be of particular interest to, say, the NSA." Being a white American who's probably not at the top of the watch lists to begin with, this person does not have many concerns about contributing. So when people do perceive a risk, how do they deal with it? How do they mitigate it? Two basic ways. First, modifying their participation. And this gets particularly problematic when you think about the impact on projects like Wikipedia. Here's a female Wikipedia editor who says, "I let them know who I am, so I'm no fun to chase, but I don't edit topics like, for example, women's health topics or sexuality. Not because I think I might be wrong about it (I've got my giant obstetrics textbook open right next to me), but I don't want the backlash." So this woman's pre-med. This is the person you want editing women's health topics and sexuality.
This is the person who's genuinely concerned that people have good information, who's actually studying to be an expert in this area, but she doesn't want to deal with it. So that's one way we heard people mitigated risk: they just modified the ways that they participate. The other was to attempt to enact different degrees of anonymity, and we heard lots of different strategies for doing this. Using multiple accounts was one. Asking others to post things for them, whether on Wikipedia or forums or wherever, was another way that people maintained their ability to participate in different kinds of projects. And using privacy-enhancing tools like Tor. So that's a really high-level overview of the kinds of things we found when we talked to people. The implications that we see in this are, first of all, that it challenges the assumption that underlies a lot of the discussions about open collaboration: that knowledge and skills are equally shareable, that we just need to get people to show up, that we need to motivate them and incentivize them and teach them to write, and then they'll show up and do it. It's also about providing contributors with safe places where they feel like they can contribute their knowledge. And this is not just a concern for Wikipedia; it's really something that is relevant to all kinds of online participation. And so there's this threat of a new kind of digital divide between those people who describe themselves as privileged in various ways and who feel like they can participate and contribute their views under their real identities, and those who don't feel that freedom to do so.
If some groups of people are systematically excluded, the information that we have access to becomes biased, and the big ideas underlying projects like Wikipedia, democratizing knowledge and creating a representative resource for human knowledge, become diminished if people don't feel equally able to contribute. So it's not just harmful to Wikipedia; it's a critical concern for online participation in general. There are possible socio-technical solutions that could help mitigate some of this. So let's talk a little bit about what we're thinking about next. Rachel and I have been talking with Mako Hill at the University of Washington (I should give him a shout-out) about ways that we could continue this work and think about some of the problems that were uncovered in what was a very exploratory interview study. How do we continue this? For one, there was a disconnect between internet contributors' threat models and the kinds of privacy protections that are supported by providers like Wikipedia and other places where they want to contribute. We had reports from people who said, "well, when I realized I couldn't use Tor, I just decided to stop contributing." But if service providers aren't supporting tools like Tor, it's probably because they aren't perceiving the same threat models as their users. So to develop privacy-enhancing toolkits that can help with this situation, we really need to understand not only the threat models of the people who are contributing, which we've tried to start on with this interview study, but also what's informing the decisions at the organizations, and not just Wikipedia but other organizations as well. So we've been talking to a whole bunch of places that support user-generated content development to start to set up the foundations for studying this further. We saw chilling effects on participation.
So, like this example of the pre-med student who didn't want to edit women's health issues: we know that some people want to contribute, but they're limiting their contributions. We have at least some proof-of-concept experiences that we've seen. But we don't know the extent of that loss. We don't know how to measure this, and how we can measure what isn't being contributed is a really hard problem. We'd really love to talk to people about ideas for measuring the potential loss if protections for anonymity aren't supported. And then we also have anecdotal stories suggesting that edits are being reverted when participants don't use well-known usernames or well-known identities. We had one participant who stopped editing under a well-known username and started getting reverted, not because he changed the way he was editing, but because he started using a different account. And so we're really curious about the extent to which perceptions of anonymity influence perceptions of quality when people contribute to projects like Wikipedia. Because if we can figure that out, then we might be able to develop reputation metrics or other kinds of approaches to help support good contributions that are coming from anonymous users and sources. In the paper, we talk about a couple of ideas for directions that socio-technical solutions could take. Systematically, everyone we talked to who had become an administrator or had taken some sort of central role in Wikipedia said, "oh, well, yeah, when I started, I had no idea that it would be problematic to, say, edit near my home, or edit about geographic locations near my home, or divulge things about who I am." And then later on, they became more concerned about these behaviors. So handling these temporal features of privacy concerns, as people move from peripheral to more central participation, is something that open collaboration projects could do something about.
Thinking about unlinking admins' technical identities from their past identities while retaining markers of trustworthiness or experience: this might play out differently in different kinds of projects. Supporting users of the anonymous web: experimenting with existing tools that MediaWiki supports, like Pending Changes for edits from Tor, might help, but it needs to incorporate feedback from Tor users to understand whether it actually solves the problems that they perceive when they come to something like Wikipedia. So yeah, that's it. And like I said, this is work that's been done in collaboration with Rachel Greenstadt, who's also on the call. So Rachel, feel free to jump in if you like. Thanks Andrea. Again, a round of virtual applause from the room and from the Hangout. We have time for questions. We'll take about five minutes of questions on this talk, then we can open it up to more questions through the hybrid channels, so IRC and Hangouts. Any questions for Andrea? One question from Ziko on IRC. They ask: privacy-enhancing tools are good for potential victims, but also for potential perpetrators, or does it not matter so much for this latter group? Sorry, I had muted myself. So that's a really great question. And I think that this is sort of the wicked problem for the anonymous web: figuring out how we support people in using anonymous tools when they want to make good-faith contributions, whether it's to Wikipedia or political discussions or whatever it is they want to do online, and balance that with service providers' desire and need to quash abusive uses of their tools. And so we've discussed some different kinds of approaches to this. I think some of the most successful ways of doing this are going to be tailored to individual projects. So a general communication tool like Twitter or a discussion forum isn't going to require the same kind of solution as Wikipedia, right?
So there are tools that could be developed, tailored to projects like Wikipedia, that would allow some sort of oversight, probably by the community or by subgroups within the community. So I think this is a problem that can be dealt with from a social perspective, if we give people the right technical tools, and potentially from a technical perspective, but that's not really my area of expertise, so I haven't really explored the technical possibilities as much. I could jump in briefly (I think I've got an echo here, but anyway). There has been a lot of research on things like anonymous blacklisting approaches that try to balance that tension. What I think is missing in that work is a connection between the privacy-enhancing-technologies researchers who do it and the actual service providers, who have somewhat different perceptions of their own threats, so the solutions may not be a perfect fit for the exact situation. So I think that's an area where, again, we need more conversations with service providers and so on. I think it's also really interesting to consider what you think of in terms of perpetrators, whether it's the people doing the harassment being anonymous, or other groups. There was a really interesting study done by the ADL recently on harassment on Twitter that seems to show that a lot of the time the people doing the harassment are not particularly hiding behind anonymity; in some cases they are, but in many cases people are very overt about their harassment. So I think the perception that anonymity is the enabling force of harassment is perhaps a little overstated. So if there's nobody else, I have a question. I've been looking at all sorts of fun and interesting ways to catch vandalism and spam and personal attacks and all sorts of negative things with machine learning models.
And it seems like this might be a way to calm concerns around Tor users contributing in spaces like Wikipedia. But the thing that I really wanna know is: how much do we know about the negative things that come out of this, so that I can know whether we have appropriate models now or whether I should start working on a new modeling strategy to be able to support this kind of project? So that's a really excellent question. And this is precisely what I was saying when I said we know that there are people who want to contribute but are holding back because of concerns about revealing who they are. This may not be something that Tor will solve in all cases; it's not a panacea, that's for sure. But there are also people who want to use tools like Tor. So there are two things. There's measuring how much bad stuff is gonna happen and what kind of bad stuff is gonna happen, and there's measuring how much good stuff is gonna happen and what kind of good stuff is gonna happen. And it may be that there's lots of drivel that would be let in if you open the Tor floodgate, right? And I don't know if it's a floodgate or a trickle, frankly. I have no idea how many edits might be being turned away; that's another open question. But for the kinds of edits that are being attempted from Tor, I think it's really important to look at what kinds of geographic locations they're coming from and what kinds of topics people are attempting to edit. What's the nature of the loss versus the nature of what could be gained? I think it's a much more nuanced question than how many edits are bad and how many edits are good. What is the kind of content that is potentially being lost? Because if it is coming from voices that are currently underrepresented, that could go a long way toward improving quality in areas that aren't currently getting attention from those people. So just a quick follow-up.
So I wonder if what you're saying is that exactly what we need to do is open the gates, maybe for a short period of time, see what comes through, and then re-strategize based on that. That's kind of my dream, yes: to have that data and see what actually would happen. Mako's done some really interesting research on Wikia wikis, comparing real-name account policies versus not and seeing how that affects things, which is not the same thing, but it might be able to inform things a little bit. He has a really interesting, suggestive graph in his work that seems to show that you lose the trolls and you lose some quality, but you also lose some good-quality stuff when you make these kinds of changes toward being more restrictive. But it does look like the trolls tend to come back, whereas some of the good stuff just gets lost. More work is clearly needed, but we're definitely hoping to do that work. I wanted to jump in and make two quick comments. I know there are more questions coming up on IRC, but I have two, I think, very relevant points on this specific thread. So the first thing is that it turns out that we are looking into harassment, and we have been running a project for about three quarters now on improving ways to detect personal attacks and harassment. Henry from the research team has been giving presentations, and there's some coordination with folks in community engagement at WMF. We're gonna have some more announcements in the next few months, but I just want to highlight that this is definitely an area we're looking into. And on the issues that you both have mentioned around the lack of data: if we're gonna train a model, we're definitely gonna have a gigantic survivorship bias. We will not capture all the issues where people are just not there and not participating, right? And it's gonna be interesting to see how we can better identify these issues so we can better train these models specifically.
But the second, bigger-picture comment I want to make goes back to your slide about implications. You said that mostly the people who feel comfortable contributing are, A, not interested in controversial topics and, B, from a privileged class. And I'm gonna say a few things here in a personal capacity, not necessarily representative of what everybody thinks about these issues, but I would say that people who are not from a privileged class and people who are contributing on controversial topics are exactly the people that we want. I think that the question of who controls the narrative is more timely and important than ever. The last thing that we want in our projects is an over-representation of topics that are non-controversial or that are edited only by a very tiny privileged class. The value of Wikipedia is to provide a long-term memory on topics, including topics that are controversial, using a neutral tone and the best possible sources. It's really critical that we address these biases and try to attract these contributors. And there's an additional reason why this is profoundly problematic: whoever controls the narrative doesn't just control what Wikipedia says. It also allows racial, gender, and cultural biases to be propagated to the entire system of consumers of data derived from Wikipedia. So basically, we are training an entire generation of AI tools and algorithmic search engines based on the content that we have today, affected by the biases we have today. And this is much bigger than Wikipedia itself. Another comment? Yes, more questions from IRC. So we have one question from Tilman Bayer, spread over several messages, so I'm just gonna read what he says: "I read the paper and the press release and I think this statement does not make sense."
"Wikipedia allows people to edit without an account but does not permit users to mask their IP addresses, and blocks Tor users except in special cases. So it is still possible to piece together an editor's identity or location by looking at the things they've contributed." That's the quote. The question continues: "the IP address, which Tor obfuscates, and the contributions, which are public by default, are entirely separate. And in my humble opinion, this also undermines the argument for relaxing restrictions on Tor editing. Any thoughts on that discrepancy?" So, on the ways that people can be outed in terms of who they are and where they are: there were lots of different ways that people were concerned about being outed. Some of them were concerned about the idea that the Wikimedia Foundation has their IP address, or, even though they logged in with a username, they didn't want to reveal where they were editing from because of laws in their local country or whatever; lots of different reasons that they felt they might be sanctioned for participating. There were also people who were concerned that their history of edits over time, linked together either through their username or because they have a unique IP address or whatever it is, would reveal who they are, because they'd edited things about, for example, their college and their hometown and their place of work, or because it would be obvious to other people in the Wikipedia community. They didn't want that; they wanted to remain anonymous. And in some cases, these were really long-term participants in the project. These weren't people who were just dropping by. These were folks who wanted to contribute, and were contributing in good faith, but didn't want to reveal who they were, for whatever reason.
They found it very difficult to maintain anonymity because of the combination of ways that you could piece together who they were, using both technical resources and features of their identity revealed by the content that they had contributed. So I'm not sure if that quite answers the question. And I think one difference between the Tor users and the Wikipedians in our study is that a lot of the Tor users didn't really trust the protections. They knew that the IP history of edits is not public to everybody and that you need privileges to see it, but they didn't necessarily trust that those privileges would be enough against somebody who subpoenaed those logs, or somebody who hacked Wikipedia and got access to it and doxxed them, or things like that. People, especially Tor users, who are obviously concerned about that threat model, didn't necessarily think that was safe enough. Awesome, thank you. Another question from Ziko on IRC: Andrea, what do you think about allowing only registered editing? Registered as in you have to make an account? I think that's it, yeah. Okay. I'm really impressed with the line that the Foundation has taken on allowing people to remain anonymous, although it's sort of an interesting conflict of definitions that we ran into continuously in this work: from the Wikipedia perspective, anonymous editing means you have revealed your IP address, which is the least anonymous thing possible to people who are using Tor specifically to avoid revealing their IP address. So yeah, I'm not a huge fan of the idea of making people register to contribute things. I think it's a great strength of the project. Now, I think the Tor Project themselves have suggested that maybe only allowing registered users to edit through Tor might be a reasonable compromise.
Yeah, so I would agree: there are ways that we could imagine facilitating editing using tools like Tor that would be sort of special cases, without having to turn people away at the door, right? You can register, or your edits get reviewed; there are lots of different possible ways to enable something like that. Since we're talking about this, I just want to make one important clarification of terminology around the implications when we talk about registered versus anonymous or IP edits. So first off, you mentioned all the implications about revealing one's identity, location, or gender just via public data. It is true that basically all the data about contributions and about discussions that someone posts, both on wiki and on talk pages, creates a record that leaves traces and may potentially de-anonymize these contributors. And there's been some great research, which we hosted in a previous showcase in this series, by Marian-Andrei Rizoiu, called "Evolution of Privacy Loss in Wikipedia," and it tackles exactly that problem. So I invite you all to look it up if you're interested in the implications of de-anonymization or identification based on public data. The second question is about private data, and I want to reinforce something that may not be entirely clear in general. We talked about retention of IP addresses. The Foundation does not retain any personal information, including IP addresses, in private logs past 90 days. So what that means is that if you edit using a registered username, there will be a period of time during which your IP is retained, mostly for operational issues (for example, detecting DDoS attacks and that kind of thing), but also to protect users. But past 90 days, basically all information pertaining to IP addresses is entirely removed from our servers.
So the answer to the question about subpoenas is that it simply doesn't apply, because that data doesn't exist: it's not retained at all by the Wikimedia Foundation. This is definitely something that we found: a lot of the time people don't have a clear sense of what's happening, and that's what actually contributes to a lot of privacy concerns, just a lack of understanding of how data is being captured, how it can be captured, how it will be used, how it is being used. So I think it could potentially be helpful to some of the people we talked to if this were more widely understood. I think you're right. Real quick: it would be useful to have at least some pointers and a better explanation, on the edit screen or during the registration process for signing up for a new account, about the implications of IP address disclosure. And I think there are more comments from IRC. John, is there anything you want to relay? There's another comment from Tilman, just as a remark regarding Andrea and Rachel's response: "I think that mixes the very practical concern of public edit histories with the, in almost all cases, quite theoretical concern of subpoenas or CheckUser abuse." Also, he wanted to clarify that he did not see the latter mentioned specifically in the paper, although of course it does not cover all the interview material. Thanks, yes. Other questions about Andrea's talk or Aaron's talk before we wrap up? Oh, maybe one thing that I should highlight while I have the chance is that Twitter kind of exploded about the quality trend change for WikiProject Women Scientists, and there was a substantial project started by Keilana in mid-2013 that likely explains this substantial shift, which we've already started to talk about as the Keilana effect, because it's actually affected a few other initiatives, like the Wiki Education Foundation and that sort of stuff.
So I think that we have an answer for at least that transition. Yeah, and thanks everyone for supporting this multi-channel conversation; it's pretty amazing that we have IRC, social media, and Hangouts going at the same time. And thanks to Brandon, who's been helping behind the scenes to make this successful. I'd like to thank everybody again, with an extra round of applause for our speakers. This is the last edition of 2016; we're gonna come back in the new year with more research and more presentations. So thanks for watching today, and happy holidays everybody. Bye. Thank you all. Bye. Bye. Thanks.