All right. Hello everybody and welcome to the October edition of the Wikimedia Research Showcase. I'm joined today by an external speaker, Aaron Shaw, from Northwestern University, and our own Jonathan Morgan, who is going to be presenting today. The format is the usual one, so we have two presentations. Jonathan is going to present joint work with Anna Filippova in the first half of the showcase. And then we're going to pass it on to Aaron, who will be presenting joint work with Eszter Hargittai in the second half of the showcase. As usual, we're going to have a short amount of time for questions after each presentation, and then an extended Q&A at the end of the two talks. We have Baha in the IRC channel as our IRC host, so if you have any questions and you are in the IRC channel, #wikimedia-research, please bring today's questions to the presenters. And without further ado, Jonathan, the stage is yours. Awesome. Thank you, Dario. Let me share my screen. Can you see my slides? Can anyone see my slides? Yes, we can. Okay, cool. All right. Thank you for joining us today, everybody. My name is Jonathan Morgan. I'm a researcher here at Wikimedia, and today I'm going to present some work that I performed with Anna Filippova, who's a research scientist at GitHub and unfortunately could not make it today. The work I'm going to talk about is related to the role of social norms in a Wikipedia sub-community called the Teahouse. I want to start this presentation with what I hope is an uncontroversial statement: that open collaborations like Wikipedia are powerful. They're self-organizing systems powered by dynamic communities of diligent volunteers. They create things that benefit the entire world. And in fact, in many cases, the products they create, like huge multilingual encyclopedias, are so good that we take them for granted. We may not be able to imagine a world without them, and we also may not think very hard about how they came to be.
People who study open collaborations have identified several principles that underlie their success. Chief among these is that open collaborations feature low barriers to entry and exit. That means they have low commitment requirements, which makes it easy for anyone to contribute. They utilize volunteer work, which means that contributors are intrinsically motivated, and they feature fluid boundaries around teams, practices, and roles, which means people can organize themselves and coordinate their work in whatever way works best for them. However, these same features that make open collaborations powerful, their low barriers to entry and exit, can also make them vulnerable. Let me explain. Low commitment also often leads to high membership churn, with experienced people leaving and new people cycling in all the time. This, coupled with reliance on volunteer contribution, also makes it harder to enforce rules because people don't have to obey. And fluid boundaries can lead to low group cohesion, which erodes the sense of community that comes when you have more explicitly defined group boundaries, roles, and goals. So maintaining a vibrant and successful open collaboration can be challenging. How do you teach all those new members how to behave? How do you regulate behavior when you need to be careful to avoid discouraging or alienating valuable contributors? And how do you maintain a sense of community and a common purpose? Well, one way that we humans deal with these kinds of challenges in other social settings is through social norms. Norms are shared expectations about how to behave in particular situations. However, as powerful and ubiquitous as norms are, creating and maintaining strong social norms has challenges of its own. One such challenge is that people don't always comply with social norms, even when the norm is clear and the reason for it is self-evident. Norms also don't exist in a vacuum. 
They can conflict with and contradict one another, making it hard to determine which norms, or whose norms, should take priority. Such normative conflict can tear communities apart. So how do we understand how social norms affect behavior in open collaborations? And can we use that information to make the norms that govern communities like Wikipedia more effective tools for socializing newcomers, for regulating behavior, and for maintaining strong bonds and shared goals? One prominent criticism of social norms in the social psychology literature is that the concept is poorly defined. It's not always easy to identify the norms at play in a given social context or how those norms are influencing behavior. The focus theory of normative conduct by Robert Cialdini and colleagues attempts to address this criticism empirically. The focus theory defines two types of norms, descriptive and injunctive. Descriptive norms are beliefs about how people do act in a particular situation. Injunctive norms are beliefs about what kind of behavior other people would approve or disapprove of. In other words, how you should act. The focus theory also posits that a particular social norm can only be influential if it is activated. That is, if a person's attention is focused on the norm through situational or personal factors that make the norm salient to them. Cialdini designed a series of clever experiments to validate the focus theory using littering in public places. They put an annoying handbill on the windshield of the subject's car in a parking lot and then tested a variety of conditions to see if, by making pro- and anti-littering norms salient, that is, observable and interpretable as norms, they could change how much people littered. In one experiment, after the subject found the handbill, they had a study confederate walk by and drop a similar piece of trash on the ground, making a pro-littering norm salient.
Under these conditions, subjects were likely to follow the example of the confederate and litter their own handbill. In another experiment, they placed a swept pile of litter near the subject's car in order to communicate the injunctive norm that although people did sometimes litter here, it was not okay to do so. Under these conditions, they found that subjects were less likely to drop their handbill on the ground. They also experimented with providing conflicting normative cues. For example, they found that if a subject sees swept litter and also sees a confederate walk by and litter in front of them, they're still unlikely to litter. Experiments like this one provided compelling evidence that injunctive norms are often more powerful than descriptive norms. In other words, people are more likely to behave in a pro-normative fashion if they know they should, even if they see other people violating those norms. Like physical environments, online environments also frequently provide cues to normative behavior. For example, showing the most recent or most popular posts at the top of a page is a common pattern in online forums. This design makes descriptive norms salient by allowing new arrivals to see what kind of thing other people are posting and viewing right now. Another common pattern on forums is to post community rules, FAQs, or policies prominently. This makes injunctive norms more salient by letting newcomers know how they are expected to behave in this community and how not to behave. However, even design cues like these will not ensure that people always behave as expected. Online social and environmental cues around what is normative often conflict, which can make it difficult for a newcomer to decide which one to follow. It can also make them miss the cues you want them to notice. Furthermore, if they see widespread evidence that other people are not complying with your community rules, they may decide that it's not important for them to comply either.
And of course, each person also draws on their own past experience when deciding how to behave in a new setting. If a newcomer believes that they already know how to behave here, they are less likely to notice local rules or infer norms from other people's behavior. In this study, we're interested in which type of norms, injunctive or descriptive, are most influential in online communities and under which circumstances, and also whether descriptive and injunctive norms are more influential when they're aligned rather than in conflict. The community we studied in this case is the Wikipedia Teahouse. The Teahouse is a forum where new Wikipedia editors can go to get answers to common questions from experienced editors. In that sense, the Teahouse is itself a place where new editors learn about Wikipedia's norms. However, in this study, we're focusing instead on the behavior of the people who answer the questions, the experienced editors, and specifically on how the way they answer those questions reflects and reinforces the local norms of the Teahouse. At the Teahouse, like many forums, the newest questions appear at the top of the page. This should make descriptive norms for answering questions salient to new answerers when they join, because it makes it easy for them to observe how other people answer questions. For example, if a newcomer notices that answers on the Teahouse generally have a friendly tone, they are more likely to use a friendly tone when they start answering questions themselves. Here's an example of a typical answer on the Teahouse. Overall, it is friendly, personalized, thorough, and clear. But more than that, it reflects some specific answering norms that set the Teahouse apart from other discussion spaces on Wikipedia. These norms are articulated in a document called the host expectations and consist of five different considerations for answering questions. These norms were developed and agreed upon by the community when the Teahouse was founded.
They represent local guidance for behavior, local norms, but they aren't enforceable and there are no penalties for violating them. In this study, we'll be focusing on two of these. Number one, welcome everyone. And number four, avoid overlinking in your answers. Returning to our example, we can see how these norms shake out in practice. This post starts with a warm welcome, per the expectation to welcome everyone with a friendly hello. Because the welcome everyone norm usually happens at the beginning of the post and follows a regular pattern, the focus theory would say that it's likely to be highly salient to a new answerer. This post also follows the avoid overlinking norm. It contains several hundred words of text and only a single link to a help page. However, this norm may be less salient because it doesn't stand out or follow a regular pattern. So it may be difficult for a new answerer to infer what the normal number of policy links should be just from looking at other people's answers. Which is unfortunate for the Teahouse, because the host expectations themselves are kind of hidden. They're not prominently linked from the main Q&A page, and most answerers probably don't know they're there. As a result, most new answerers learn how to answer questions at the Teahouse through descriptive norms, not through injunctive norms. However, a subset of new answerers are exposed to the host expectations. People who create a host profile are shown these expectations and asked to uphold them. Creating a host profile doesn't grant you any rights or responsibilities, and you don't need to make one in order to answer questions at the Teahouse. You can just start participating. And in fact, many experienced and long-term contributors never create a profile. However, the presence of this feature creates an information asymmetry when it comes to norm exposure, which allows us to perform a kind of natural experiment.
Because while all new answerers have the opportunity to learn how to answer through descriptive norms, injunctive norms are only made salient to some new answerers. According to the focus theory, being exposed to the injunctive norms of the Teahouse should have a stronger and more resilient impact on behavior than being exposed to descriptive norms alone. Someone who reads the host expectations should be more likely to act according to them, even when other people are not. And they should also be more likely to follow even norms that are hard to infer from observing behavior, like the norm against overlinking. In addition to understanding the interplay between local descriptive and injunctive norms, we're also interested in understanding what happens when someone joins a community after participating in other communities that have conflicting norms. Do they adapt their behavior to local norms, or do they bring their old norms with them and continue acting the same way they did before? We addressed this question by examining the behavior of people who joined the Teahouse after participating in the Wikipedia help desk, a separate Q&A forum, which has different norms around what makes a good answer, and which also shares members with the Teahouse. Here's an example of an answer to a question on the help desk that reflects its own local norms. The answer doesn't include a welcome, because the help desk doesn't have a welcome everyone norm like the Teahouse does, and welcoming is relatively uncommon there. Like the Teahouse, the help desk does have an injunctive norm around how many links to use in an answer, but that norm is the opposite of the Teahouse norm. On the help desk, answerers are encouraged to add many links in their replies to ensure the most accurate instructions.
So, per the focus theory, we might expect that when an answerer starts on the help desk and learns the norms there, they'll bring those conflicting norms around linking and welcoming with them when they join the Teahouse. We attempt to answer all of these questions by collecting a corpus of over 29,000 replies to new editors' questions posted on the Teahouse between 2012 and 2016. We performed regression analysis to learn what factors are the best predictors of an answer that reflects local injunctive norms: an answer that contains a welcome and keeps a low ratio of links to text. The factors we examined as predictors of norm compliance were whether the answerer had a host profile, the overall welcome frequency and the link ratio in recent posts by other answerers, and, in the case of answerers who worked on both the Teahouse and the help desk, which forum they worked on first. Here are some of the things we found. First, in alignment with Cialdini's experiments on littering in parking garages, we found that whether someone complies with a descriptive norm is contingent on several factors. If it's a norm, like welcoming, that is relatively more salient, easier to detect, then their likelihood of complying is higher if they see a lot of other people welcoming too. But for a norm, like policy linking, where the expected behavior is harder to detect, there is no overall relationship between the number of links an answer includes and the link ratio in recent posts. In contrast, injunctive norms are more persistent. People who were exposed to the host expectations, people we call hosts, tend to welcome more and include fewer policy links regardless of whether other recent answers contain welcomes or a high link-to-text ratio. We also found, as predicted by the focus theory, that norm conformity was highest when descriptive and injunctive norms were both aligned and salient.
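As a rough illustration of the kind of comparison described here, one could tabulate compliance rates by condition before fitting any models. Everything below is a sketch on synthetic data: the records, field names, and thresholds are hypothetical, and the actual paper uses full regression models with many more controls.

```python
# Sketch of the host/non-host comparison on synthetic data.
# "is_host" = answerer saw the host expectations (injunctive norms);
# "recent_welcome_rate" = share of recent answers on the board that
# contained a welcome (descriptive-norm salience). All hypothetical.
answers = [
    # (is_host, recent_welcome_rate, contains_welcome)
    (True,  0.9, True),
    (True,  0.9, True),
    (True,  0.1, True),   # hosts tend to welcome even when others don't
    (True,  0.1, True),
    (False, 0.9, True),   # non-hosts track what they see around them
    (False, 0.9, True),
    (False, 0.1, False),
    (False, 0.1, False),
]

def welcome_rate(records, is_host, high_salience):
    """Share of answers containing a welcome, within one condition."""
    group = [
        contains_welcome
        for host, rate, contains_welcome in records
        if host == is_host and (rate >= 0.5) == high_salience
    ]
    return sum(group) / len(group)

# Hosts welcome regardless of descriptive-norm salience...
print(welcome_rate(answers, is_host=True, high_salience=True))   # 1.0
print(welcome_rate(answers, is_host=True, high_salience=False))  # 1.0
# ...while non-hosts' welcoming follows the descriptive norm.
print(welcome_rate(answers, is_host=False, high_salience=True))  # 1.0
print(welcome_rate(answers, is_host=False, high_salience=False)) # 0.0
```

In the actual study these contrasts are estimated with regressions rather than raw rates, but the toy data mirrors the pattern the talk reports: injunctive-norm exposure predicts compliance even when descriptive cues are absent.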
In this case, answers given by hosts at times when there were many other examples of welcoming present on the Q&A board had the highest likelihood of containing a welcome. When we analyzed norm compliance among answerers who worked on both the Teahouse and the help desk, we found evidence that prior exposure to conflicting norms reduced local norm compliance. Compared with Teahouse-first answerers, those who started on the help desk are less likely to welcome even when everyone around them is welcoming, and they tend to include more policy links and less prose in their answers. So while we think that these findings are interesting in their own right for what they say about Wikipedia, we also see broader implications. First, we found evidence in a naturalistic online setting that supports the focus theory, which was developed and tested under experimental conditions in physical spaces. We also observed theory-aligned effects of exposure to injunctive norms, specifically that they exert a more persistent and more powerful influence on behavior, even under conditions where descriptive norms conflict. And we've seen that activation matters. Norms like policy linking, which are harder to observe and infer, are less likely to be activated than more salient norms like welcoming, even among people who are exposed to injunctive norms and examples of pro-normative behavior. We also see several concrete implications for design in these findings. First, our evidence suggests that posting community rules in a forum can be helpful, even if they are not enforceable, and even if not everyone follows them all the time. In fact, the Teahouse would probably benefit from posting the host expectations more prominently to increase their salience to a wider group of answerers. We also found evidence that highlighting recent examples of pro-normative behavior, perhaps through mechanisms such as upvoting or featured answers, can increase norm compliance.
For less salient norms, like the ones around overlinking, simple user interface nudges may be useful. For example, the interface for drafting an answer could track the link-to-text ratio and suggest reducing links or providing additional elaboration to people who are including too many. Finally, our findings suggest that people who join a community from similar communities may need different kinds of socialization than newcomers who have no directly applicable experience to draw on when they're deciding how to behave. And there's much more. It's all laid out in a detailed paper. So for additional methodological details, more findings, some ethnographic interviews that dive into these phenomena more deeply, and lots more discussion and implications, read the paper, which is now available on SocArXiv and will be presented at the Computer-Supported Cooperative Work conference next month, or ask me during Q&A. And that is it. Thank you very much. Awesome. Thank you, Jonathan. We have ample time for questions now. So, Baha, is there anything coming up from IRC? No questions yet. Okay, I'll kick things off with one question I have. So, like you said, this is great research that also has much broader implications than just Wikipedia or online spaces. And I was thinking of, specifically, design implications related to community health norms, specifically, you know, safe space policies and code of conduct kinds of norms. Safe space policies are designed to follow the model of the injunctive norm; I mean, they're designed because mere exposure to descriptive norms just doesn't work, right? So you need something that really has saliency and an injunctive format for it to be effective. At conferences or in online spaces, there are very different ways of making the injunctive norm salient.
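The drafting-interface nudge proposed in the talk, tracking the link-to-text ratio of an answer and flagging overlinking, might look something like the following sketch. The wiki-link pattern, the threshold, and the function name are all assumptions for illustration, not anything from the Teahouse itself.

```python
import re

# Hypothetical sketch of an "avoid overlinking" drafting nudge.
# Matches [[Page]] and [[Page|label]] style wiki links.
LINK_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
MAX_LINKS_PER_100_WORDS = 3  # illustrative threshold, not a real policy

def link_nudge(draft):
    """Return a suggestion string if the draft seems overlinked, else None."""
    links = LINK_PATTERN.findall(draft)
    words = len(LINK_PATTERN.sub(" ", draft).split())
    if words == 0:
        return None
    links_per_100_words = 100 * len(links) / words
    if links_per_100_words > MAX_LINKS_PER_100_WORDS:
        return (f"Your draft has {len(links)} links in {words} words. "
                "Consider explaining in your own words instead of linking.")
    return None

# A short, link-heavy reply triggers the nudge...
print(link_nudge("Welcome! See [[WP:V]] and [[WP:RS]] and [[WP:NPOV]] now."))
# ...while a longer, prose-heavy reply with one link does not (prints None).
print(link_nudge("Welcome to the Teahouse! Happy to help you get started "
                 "with your first article. " * 3 + "See [[WP:YFA]]."))
```

The design choice here mirrors the finding about salience: rather than hoping answerers infer the linking norm from others' posts, the interface makes the norm concrete at exactly the moment it applies.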
Conferences have started calling out very prominently, at the beginning of, say, the opening session, that an event is subject to a code of conduct. I've seen many people actually introduce an explicit condition at registration time for people to abide by the terms of a policy, by having them tick a box before being able to complete registration. So I think there are many learnings here that could apply outside online interaction spaces. And I'm curious whether, in the context of community health norms, there is some pattern that could be reused coming out of this line of research, when it comes to how to design injunctive norms so that they're salient enough to be applicable and enforceable. I mean, I think that the biggest takeaway in terms of increasing salience is, obviously, yes, posting things publicly, in a context where you want people to be taking them into account, is important. I also think that the lesson we learned from the welcoming versus the policy linking norm is that the other feature of salience is not just whether it's public, but whether it's something that you can detect and observe and be reminded of, whether it's fairly concrete. So I think this really brings up the utility of having examples of pro-normative behavior, or people who are able to role model, basically, people who are there in that context demonstrating through their own behavior what it looks like to be respectful, or to maintain a safe space, and having concrete examples of what is and is not a safe space rather than a nebulous concept.
So doing the work to actually break down what safe space means in this context, and ideally trying to make that a consensus-based process where the people who are going to be there have both a say in what defines a safe space and have also internalized what the concepts mean beforehand, and then also having that come up again in various ways during the event. Yeah. Yeah, I really like the featured answers idea. Right. Let's see, any other questions from the room? No more questions yet. Hey Jonathan, so the examples that you gave, both in the original theory about littering and in the paper with the Teahouse, those focus on what I guess I would say are minor behaviors, like greeting somebody or picking up litter. Does the theory also apply in the same way to major changes of behavior, or ways of acting in general? So for instance, we want Teahouse hosts to greet people in the Teahouse, and that's great. But what if we are also thinking about the norm of nurturing new editors in general, wherever you are being an experienced editor? Does the theory scale up to those kinds of concepts? I mean, I think that, so, the way the Teahouse works. So in the example post that I gave, that's a response that clearly shows, if you read through it, a great deal of empathy and understanding about what new editors need, and kind of meeting them where they're coming from, and being polite and civil. And that reflects these more specific, consensus-based community rules about what we do. The smaller pieces, ideally, work together to create a larger framework for action, a larger, more coherent set of expectations for how you act in a certain space.
So starting with these smaller rules that reflect a larger set of values, essentially, you can build that wider normative framework, that system of norms and ethics, from these mutually reinforcing components, I guess. Yeah, that makes sense. And that's actionable. Like, if you want to build a broad community norm, you can try and compose it out of little ones. Yeah, ideally. Sweet. So, I think we're ready to start with the second presentation, a little bit early on time, but that's cool. So with that, Aaron, I'm going to pass it on to you. Hi, everyone. My turn to take a second to share the appropriate part of my screen. Let's see, is that now coming through for everyone? Great. Okay. Well, thanks very much. I am Aaron Shaw, and I am an assistant professor at Northwestern University. I'm presenting joint research that I did together with Eszter Hargittai, who, as you can see on the slide, is at the University of Zurich. It's quite late at night over there, so she won't be joining us today, but she sends her regards. And I will just apologize in advance if my presentation is not quite as smooth as Jonathan's; he did a wonderful job and that was great. So, without further ado, let's see if I can get my slides to advance. Okay, so this is work that Eszter and I actually published earlier this year. This is the first page of that article; it's in the Journal of Communication. If you'd like to read it, you can potentially download it from them, although unfortunately it's not open access at that journal, so if you are not able to access a copy, I will just say please contact me. My name and email address are in various places on the internet, and I'm happy to share slides and copies of the paper and anything else that anybody would like.
What I want to do in this talk is share the findings from the paper, basically, and start a conversation about some of the implications of this work that may specifically speak to Wikipedia, Wikimedia projects more generally, and related communities involving open collaboration and open participation on the web. And I'm also going to talk a little bit about some future research directions that Eszter and I are pursuing related to this work; since at this point this paper has sort of been tied off, I can tell you a little bit more about what we're doing next. So the context for this paper is an idea about knowledge gaps and how that relates to digital inequality. The idea of a knowledge gap comes out of some research in, I believe, the 1970s, actually, that talks about how different levels of media access and skills at navigating media can exacerbate inequalities of education and, really, socioeconomic inequalities. So basically it's a story of a feedback loop, right? The generic idea of a knowledge gap is that those with more knowledge and ability to navigate media sources use those skills and knowledge to expand the gap that they already have, conferring greater advantage upon them in various aspects of their social lives. So Eszter, my collaborator in this project, and to some extent I, have done a bunch of work understanding how differentiated levels of participation in various spaces on the internet work, and how those connect to different types of knowledge. And this study is an extension of that. This study is an attempt to understand how different kinds of inequalities of knowledge, of skills, of access to resources may compound each other. And in particular, what we're looking at in this case is a familiar context. So, for the purposes of this study, we're really looking at knowledge gaps and participation inequalities in and around Wikipedia.
As many of you know, Wikipedia is one of the most important sources for public knowledge sharing and knowledge access in the digital or networked public sphere. And the fact that it's open and can be edited, and has been edited by millions of people, means that it's an important space where people both access and acquire knowledge, but also where they do that by participating and contributing. In terms of understanding inequality in this environment, we focus on understanding how that works as a process, basically how people move from consuming content on a site like Wikipedia to producing and contributing content. Most prior work looking at inequalities and participation gaps in Wikipedia has focused on the gender gap. My personal favorite estimate of the gender gap, which is my own prior estimate of it, is that on English Wikipedia at least, we think something around a quarter of editors are likely to identify as female in a survey. And the responses to this have been too numerous to really describe here; there are lots of initiatives from within the Foundation and beyond to recruit women to edit, to change the culture of the community, and to do even more. And I would say that in this respect, some of the work that Jonathan and others have done around the Teahouse and other projects, in understanding how to create a more welcoming community, has been really crucial. A big difference between that work and what Eszter and I are trying to understand here is that we're really looking outside of the community; we're looking to understand who gets to become a member of the community in the first place. And in that sense, the gaps that we're interested in are really prior to the gaps that emerge from having, say, a safe space within the community, in a certain sense, as you'll see.
So what we tried to do in this study was develop a pretty schematic and rough model of how this might work. And we were inspired by prior work using pipelines as a way of talking about gaps in terms of gender-based inequalities in STEM research and employment fields. And what we wanted to think about was: what are the skills and behaviors that are necessary in order for someone to become a contributor to Wikipedia? As you can see in this fun little drawing that Eszter created for our paper, we've got a really simple version of this that is not very nuanced or very deep, but which we wanted to lay out, test, and see how it helped us understand, potentially in different ways, the kinds of participation inequalities that there's been evidence of before, and really open up some new questions that we hope we can pursue. So in this case, what we're looking at is how internet users, in theory at least, all have access to Wikipedia. But there are some intermediate stages along the way, which you can see in our little diagram, that people would have to pass through if they're ever going to become a Wikipedia editor. They're not going to be a Wikipedia editor if they've never heard of the site, if they've never visited the site, and if they don't know that the site can be edited. Right, and you can imagine that this is not just true of English Wikipedia or other Wikipedias or other Wikimedia projects, but that it's true of basically any participatory site on the internet. And so, as I started with, we're really thinking about this as a case within a much larger ecosystem of participatory sites on the web that can help us understand how knowledge gaps and participation gaps might work in digital environments.
So to understand and evaluate this, what we did is we collected survey data. This was an original survey that we commissioned from an organization called NORC, which used to be known as the National Opinion Research Center, but now they prefer just to be called NORC at the University of Chicago, so that's what I will call them. And as you can see, here are some descriptive statistics about what the survey was like and what sort of sample we got through that survey. The survey is meant to be nationally representative, although it overrepresents certain groups of people, as you can see in those descriptive statistics to the right. The average education level in our sample is a little bit higher than the average in the national population. And people tended to be more active online and have spent more time online than the true national US population seems to be, according to other data sets. I can tell you more about the survey instrument in Q&A if people have questions about that. But for now I'm just going to focus in on the questions that relate to the topic at hand. So these were the outcomes that we were interested in, right? If you remember my little pipeline diagram, these are the four different steps along that process. And what we expected to see was a decreasing rate of affirmative answers to these questions. So, right, we expected to see fewer people editing Wikipedia than hearing of Wikipedia. And here's what we found. This is just across the entire sample, right? As you can see, the vast majority of our survey respondents, nearly all of them, had heard of Wikipedia. And then as you move into the higher-skill, higher-awareness behaviors, we see these numbers drop off. And I think it's noteworthy that the steepest drop-off is really at the bottom there, between knowing that Wikipedia is editable and having edited Wikipedia.
I also think it's noteworthy, for those of you working at the foundation, that it may seem completely unbelievable that only 70% of internet users in the United States, according to our estimate, would know that you can edit Wikipedia. But that's pretty consistent with some prior findings that Esther and I have had, and in general I think it's often surprising to people who study, work, and spend a lot of their lives on the internet that not everybody knows how this stuff works. The other thing I want to point out about this slide is that 8% number. As far as I'm aware, this might be the best, most recent estimate of the share of the US population that has ever edited Wikipedia. On the one hand, it's a lot lower than those other numbers, as I was saying before. On the other hand, if you think about the percentage of the US population that has, say, ever sent in a correction to an encyclopedia, I bet that number would be a whole lot lower. So this is both a cause for celebration, I think, in terms of the relative rate of inclusion that Wikipedia and its sister projects have achieved in building public, open, and accessible knowledge resources, and a cause for further consideration and reflection. So I just wanted to underscore that. What I'm going to do next is just descriptively summarize the results of some regression models that we ran. We ran regressions to estimate who engaged in each of those four outcome behaviors that I talked about before. I'm not going to show you regression tables, because I think that's a horrible thing to inflict on any audience. But like I said, if you're interested in that, it's all in the paper in gory detail, and I'm more than happy to geek out about it in the Q&A or afterwards in some other format.
So what I'm going to show you are just some descriptions of the regression results, and here you can kind of get it all in one go. Starting from the top, we found that education level, internet skills (I can tell you a little bit more about that measure in a second), and age all helped explain variation at all the different steps of the pipeline. So across the board, we see that those who are more highly educated, who are more skilled and knowledgeable about internet-related behaviors and technologies in general, and who were a little bit younger were more likely to have heard of Wikipedia, to have visited Wikipedia, to know that Wikipedia can be edited, and to have edited it. That was strong and statistically robust across every version of the models that we created. Looking at some of the other steps along the way, we see that the earlier steps in the pipeline, having heard of the site and having visited it, were also explained by variations that correspond to socioeconomic status, so income and employment: with a higher level of income and being employed, you were more likely to be someone who had heard of Wikipedia or who had visited the site. When we look at the middle steps in the process, visiting the site and knowing that it can be edited, we find that those with greater internet experience, who have spent more time and more years online and who have a wider array of points of access to the internet, were more likely to have engaged in those behaviors than others. And finally, and interestingly in relation to some of the prior work in this space, we find that gender and race and ethnicity only really help explain variation in the later steps of the pipeline, that is, the outcomes looking at whether people know that Wikipedia can be edited and whether they've edited it or not.
We found it really interesting that those only pop up at the very end. So to give you a slightly more tangible, visual sense of what that looks like, and to help you understand it in a different way, I'm going to show you some visualizations that we generated as part of the study as well. These are all going to be graphs where, on the y-axis, you're going to see the probability of each outcome, and you're going to see it sliced across a couple of different variables, which I'll explain in more detail for each of the two graphs I'm going to show you. I'm only going to focus on two for right now, in the interest of time. So here are the results of our predictions when we look at who knows that Wikipedia can be edited. That's what you're looking at along the y-axis, the vertical axis, and you can see that it goes from a very low probability at 0% to a very high probability at 100%. Across the x-axis, we've got this measure of internet skills. To give you a sense of how that's measured (I guess I didn't do it before, so I'll do it now): this is a measure that Esther has developed and used for a number of years and has validated in a number of ways, where we ask people about a series of different technologies and tools, things like: what is a PDF? What is a browser cache? What does phishing mean? We don't actually ask people to answer those questions; we ask them how confident they are that they know what each one is. And Esther has validated that if you aggregate people's responses to those questions, it corresponds very well to other measures of how effectively they're able to, for example, find information that they need on the web.
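The aggregation the speaker describes, self-rated familiarity items rolled up into one number, can be sketched roughly as follows. The item list and scoring here are illustrative guesses based on the examples in the talk, not the actual survey instrument:

```python
# Rough sketch of a self-rated internet skills index (illustrative only;
# the real instrument's items and scoring likely differ).

ITEMS = ["PDF", "browser cache", "phishing"]  # example items from the talk

def skills_index(responses):
    """responses maps each item to self-rated understanding on a
    1 (no understanding) to 5 (full understanding) scale.
    Returns the mean rating, so the index also runs from 1 to 5."""
    return sum(responses[item] for item in ITEMS) / len(ITEMS)

respondent = {"PDF": 5, "browser cache": 3, "phishing": 4}
print(skills_index(respondent))  # → 4.0
```

The key design point is that respondents rate their own confidence rather than take a quiz, and the aggregate has been validated against observed ability to find information online.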
Right, so this is a generic internet skills index, which we can tally up all together and which goes from one to five. That's what you're seeing measured across the x-axis. And what you're seeing is that as we move from lower levels of that skills index to higher levels, as you move from left to right, the probability that someone, anyone, knows that Wikipedia can be edited is increasing; all of those curves are moving up from left to right. But if you look at the different education levels, which correspond to the purple and the green, you'll see that those who have received a bachelor's degree or higher are much more likely to know that Wikipedia can be edited than those who have less than a BA in terms of their educational attainment. And that's true at every point across the skills index; that's the separation between the purple and the green across the scale. We can extend all of this a little bit further by dividing it and looking at male and female respondents to the survey. Here we see that the lines are separated between males and females across the spectrum, even when we're controlling for variations in education and even when we're controlling for differences in internet skill levels: male respondents are more likely to know that Wikipedia can be edited than female respondents. At the extremes, these differences are really big. A highly skilled, highly educated male respondent to our survey was, if you look at the graph here, approximately 90% likely to know that Wikipedia could be edited, whereas a low-educated, low-skilled female respondent was only about 30% likely to know that Wikipedia can be edited. That's a huge difference.
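To make the mechanics behind gaps like that 90% vs. 30% concrete, here is a hedged sketch of how predicted probabilities come out of a logistic regression of this kind. Every coefficient below is invented for illustration; these are not the paper's estimates, and the variable coding is an assumption:

```python
import math

def p_knows_editable(skills, ba_or_higher, female,
                     b0=-2.0, b_skills=0.6, b_ba=1.1, b_female=-0.8):
    """Toy logistic model of P(knows Wikipedia can be edited).
    skills is the 1-5 index; ba_or_higher and female are 0/1 dummies.
    All coefficients are made up for illustration only."""
    logit = b0 + b_skills * skills + b_ba * ba_or_higher + b_female * female
    return 1 / (1 + math.exp(-logit))

# Contrast the two extremes discussed in the talk:
high = p_knows_editable(skills=5, ba_or_higher=1, female=0)  # high-skill, BA+, male
low = p_knows_editable(skills=1, ba_or_higher=0, female=1)   # low-skill, <BA, female
print(f"high: {high:.2f}, low: {low:.2f}")
```

Even modest coefficients on skills, education, and gender compound on the logit scale, which is how the fitted curves end up widely separated at the extremes.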
Okay, that was a mouthful, but I won't have to explain it all again for this one, because now we're looking at a lot of the same things, just for a different dependent variable. This time what we're looking at is the probability that a respondent had contributed to Wikipedia. If this graph looks a little bit familiar to you, it might be because Esther and I actually published a prior study of this in 2015 with a sample of US college students, and this graph is essentially identical to the one in that study. So we feel pretty confident that this is a pretty consistent result at this point. We're looking at the same setup otherwise: internet skills from left to right, and then the separation we're visualizing between education levels (BA or higher) and gender. What's especially interesting here is that at the low end of the skills index, essentially everybody is equally unlikely, and I want to say spectacularly unlikely, to have ever contributed to Wikipedia; those probabilities are not really different from zero. Whereas if you go all the way out to the right end of the graph, you'll see, again, a big difference between people of different education levels and people of different genders. So, once again, highly educated, highly skilled male respondents are the most likely to have ever edited Wikipedia, and the probability that they've done so is approaching 25%, while for lower-skilled, lower-educated female respondents the probability is under 10%. Okay, so thanks for bearing with me as we went through those graphs; let me try to sum this up a little bit. What we come out of here with is a story that this pipeline idea seems to hold up, and I think that's not terribly surprising in that respect.
But what's interesting is to look at the specifics of how these kinds of behaviors disaggregate. We see drop-off across the pipeline, but we see that education and internet skills really help explain variation throughout, and that things like the gender gap really only happen towards the end, in those last two stages, as I talked about in the graphs where we see differences between male and female respondents. Had I shown you the same graphs for the earlier stages in the pipeline, there's no gap between male and female respondents. So that helps put in perspective some of the earlier work, where we as a research community, and as a community working with these projects, have tended to think about the gender gap as something pervasive that happens throughout. It turns out that the gender gap seems to occur at specific points in people's access to and contribution to the community. When we want to think about the broader takeaways of this, I think it's important to think about the implications for a project like Wikipedia, which is trying to build an open repository of knowledge that anyone can edit. The takeaway, I think, is that we should be thinking about barriers to participation in a much more multidimensional and almost sequential, stage-wise kind of way. What we've talked about here that really hasn't been covered in prior research are these levels of awareness that sit between having access to the internet and becoming a contributor to a community. Prior research has maybe tacitly assumed that if you're connected to the internet, well, then you could be an editor of Wikipedia or of any other online participatory website.
And it turns out that there are a bunch of steps in between that are not equally or evenly distributed, that those distributions have patterns, and that they reflect other kinds of inequalities in society. What this reinforces is the earlier knowledge gap idea that I started with, where we see that those who have resources, skills, awareness, and knowledge when they first get online, and when they become active internet contributors, are much more likely to engage in these other kinds of knowledge production and knowledge consumption behaviors down the road. In terms of enhancing access to contribution, I think it really speaks to the importance of going beyond the kind of work, and I want to emphasize that it is very important work, that's already happening to make the existing community more inclusive. It speaks to the importance of building easier on-ramps to knowing about Wikipedia, to knowing that it exists, to knowing that it is something that anyone really can edit, and of continuing to evaluate whether and how particular interventions are affecting particular types of people. This work suggests that those who already know that Wikipedia can be edited may need a certain kind of intervention to help them become active and sustaining members of the community, whereas those who don't have that knowledge already, who have maybe never even visited the site, probably need a very different kind of intervention. I think that's a real opportunity for work that those inside and outside the foundation can pursue. Stepping back even more broadly to think about the broadest level of implications of this, I think there's a lot of work to be done to understand how these kinds of gaps may or may not work in other spaces on the internet. We find evidence here that existing knowledge gaps are likely to contribute to feedback loops in terms of digital inequalities.
How does that look on other websites and other places on the internet? That's something that Esther and I are currently investigating, but I think there's a huge opportunity to do a lot more work on it. Likewise, I think this is something that has changed over time. Jonathan talked about some sub-communities within English Wikipedia that are approaching solutions to these kinds of problems in very different ways, but I also think that communities at different points in their lives may face different versions of this kind of issue. It's very different to talk about who is and who isn't knowledgeable about English Wikipedia today than it was in 2005, for example, or than it might be to talk about who is and is not aware of contribution opportunities in Wikidata, or insert your favorite community here. So I think understanding this more broadly, both across other communities and over time, over community life cycles and stages of their evolution, is really an opportunity and something that we're interested in and that I'd love to talk about more. The other thing we're working on is this: we've really focused a lot on, and understood a lot more from these recent studies about, who does not participate in these kinds of communities, and I think understanding where those transitions into non-participation are happening, where people are falling out of the pipeline, for lack of a better way to put it, is something we're hoping to do through interviews and other methods. And I think that's all I'm going to cover for right now, so I hope that folks have questions, and again, if folks are interested in this, please find me on the internet or reach out to any of the folks who organized the talk, and I'm more than happy to share copies of the paper, slides, and any other materials that are useful. So, thanks very much. Fantastic. Thank you. Thank you.
A super engaging and relevant presentation, given everything going on in the movement and the foundation at the moment on these very topics. So, yeah, just checking the room and our IRC channel for any questions. There's a bunch here in this room that we're going to hold off on for a second. We don't have any questions from IRC yet. Gotcha. Okay, well, so we were kind of wondering about the 8% number. You mentioned that it seems high. I was just kind of gaming out in my head how high it really seems: 8% of American adults would probably be something like 19 million people. And I don't know if anyone knows off the top of their head how many distinct usernames and IP addresses have ever edited English Wikipedia, because that's a ceiling for that kind of number. But I was wondering if you had thought about that, Aaron. Yeah, that's a great question. I think it's likely that that number is a little bit high in our data set too. I sort of alluded to this at the beginning: I think the survey over-represents active internet users, and it over-represents people with higher levels of education and income. And as you saw in the rest of the talk, those are exactly the people who are more likely to be Wikipedia editors in the first place. So while I could go and dig up the survey weights that we got from NORC to try to come up with a slightly more precise estimate, I prefer to just report the data that we have and throw a few grains of salt on it in this way. I would guess that that 8% number is also a little bit inflated, but I hesitate to guess exactly how inflated it might be. And then my second question was about the pipeline drawing: you all chose four or five different segments of the pipe.
Yeah, in doing that, did you all consider additional, more granular segments and then aggregate them up? Because I was wondering if there really are many more segments in between the big ones, going from one to the next. Full disclosure: Marshall here, working on the Growth team, so I care about one specific segment. To give you an example, on our team we care about retaining new editors, and so the part of the pipe that we focus on is between when people create an account and when they make their second edit; that is the part of their life that we care about, getting them to the first edit and then to their second one. And so we've been trying to figure out what the little segments are in there that people have to achieve in order to keep moving through, such as opening the editor, typing something in the editor, saving the edit. Those are really, really granular, but I was wondering, because we've been thinking that way, whether there are granular ones higher up in the pipe. Right. Yeah, I sort of apologized a bunch for the coarseness and crudeness of our graphic, but I think it's also a pretty coarse and crude idea. And honestly, this was something that we just sketched out before we did the study, because we weren't aware of anybody having looked at this, even at this level of granularity, in terms of participation across a bunch of different sites. To answer your question more directly, I think there probably are more granular levels, and I think you could break this down a lot more. We don't have more granular levels in our survey. And I think this is something you have a much greater opportunity to do through the foundation, which is really to focus in on understanding who moves up a sort of ladder of engagement, or through a pipeline, or whatever metaphor you want to use; I'm not attached to the pipeline idea in any profound way.
I think there's a huge opportunity there to really break that down further. I don't have any specific empirical examples in mind. Another body of work that speaks to this, for example, is, and I see Isaac's name on the call, work that I know Isaac and other collaborators of his, including Brent Hecht at Northwestern, have done that helps us understand what happens when people encounter content from Wikipedia elsewhere on the web, for example in Google search boxes. They often don't even realize where it's coming from, and so my guess would be that you could insert that as an example: people who have encountered Wikipedia content but aren't even aware that it's Wikipedia content. That's an additional grain you could drop in there. And I think that both at that end of the pipeline, the extreme early part, where people are on the web and accessing various things but maybe don't know what Wikipedia is or have never visited it directly, and at the end that you're working on, where people are moving from edit zero to edit one to edit two, there's got to be a ton in between. Part of this, and something that we're trying to do now, is to understand it more qualitatively. We're planning to do some interviews to follow up, especially with folks who aren't going to show up in a sample of existing editors, folks who are on the internet but maybe don't know as much about this kind of stuff, to understand what they do know and what the barriers are that keep them from recognizing or visiting a community like this. Because I think there's probably a lot more that we could do to understand those steps along the way. That makes sense. Thanks. Thank you.
I can ask another question, more focused on the gender-related issues and the intervention-related implications of what you presented. To brutally simplify what you were saying, it sounds like one key takeaway of this work is that many of the gender-focused interventions, given their focus on the very end of the pipeline, may really have a hard time moving the needle at scale, because of the point in the pipeline at which they apply, which is typically intervening at the final segment, the awareness-to-first-participation part of the pipeline. I think it's a very valid concern. And I think this compounds with the fact that obviously there's a very steep barrier here that is basically caused by the pre-existing gender bias in the sources that people use and in representation, so all of these concur in making that work even harder if it's tackled in a downstream way. But the one thing I want to ask you is this: I think there's an impact theory behind these initiatives, which is that, given how widely Wikipedia content gets distributed and consumed elsewhere on the web, just working on correcting representation bias, for example, may reverberate back to the earlier stages of the pipeline. In a way, the stages you enumerated sound kind of independent, but I'm wondering if there's a way of thinking that these downstream interventions may actually affect some of the earlier stages. Take, for example, the literacy one: if there's no content on a specific topic on the internet, it's unlikely that the right kind of people, who might pick up some skills and literacy through it, would even be interested. So I'm curious about feedback effects that may affect the early stages of the pipeline.
Yeah, that's a great point, and I think both of these questions really underscore the weakness of the metaphor too. The benefit of calling this thing a pipeline is that people have heard the idea before, and it creates a cognitive shortcut. The weakness is that that cognitive shortcut is really problematic, in the sense that, and I think your question perfectly points to some of the ways, it's not terribly linear. Who participates and who doesn't, who has access and who doesn't, is a complicated social phenomenon, and it's not likely to be a linear process. So it's not as simple as saying that if we cram more people from stage one of the pipeline into stage two, and more people from stage two into stage three, we will then have billions of people editing English Wikipedia and get to just wash our hands and walk away. I think that's not likely to be the way it works, and I think you're exactly right that this is just one way of thinking about it. Having a more precise and empirically validated theory of how this could change is definitely an entirely separate thing. There's already some evidence that things like Art+Feminism really have a big impact in terms of correcting content coverage biases, if nothing else. And I think understanding how those content coverage corrections reverberate, in the way that you put it, may shift people's understanding of what is and is not covered in the encyclopedia, and may shift people's baseline perceptions about what kind of a place it is or what kind of content it has.
Right, I think those are all potential pathways to changing this, and what's important to me is making it clear that I don't think this research gives us a very precise way forward in that regard; understanding the potential pathways forward is a really important area for more work, in my mind. I have the luxury of saying that because that's sort of just my job, but I'm talking to a bunch of people at the foundation whose job it actually is to figure out what to invest in and make bets on, in terms of what's likely to work. And I think that here it's really about pursuing diverse strategies and evaluating early and often. I really think that doing robust impact evaluation of all the different strategies you're pursuing is probably the most important step you can take, along with thinking in those multidimensional ways: maybe this isn't going to be linear; maybe increasing content coverage in an underrepresented area could reshape people's perceptions of the site, and we could measure that. That might be a worthwhile way to think about next steps. But in terms of where you started, I really was surprised by the extent to which there were inequalities along gender lines specifically in terms of who knew that Wikipedia could be edited or not. And so I think that's an example of how the more we can learn about different points in these on-ramps, and the ways that people learn about this kind of community, the more opportunity it gives us to develop concrete strategies that we can then evaluate the impact of. So that, to me, feels like the contribution of this study to that work.
Hopefully it's opening up some new possibilities, some new directions, and some new ways of understanding what the impact of different initiatives might be. And, in fact, some big confounds that we're going to have to be aware of, like the early stages. Totally, totally. Yeah, that was surprising to us too; we did not expect that at all when we started this study. Yeah. Thank you. Yeah, thank you. Are there any other questions, online or offline? I was curious if you could speak to this beyond the US, since this work was done in the US. Something like education makes sense to me, that you would see it show up in a lot of different countries, but maybe gender would show up in different ways; I'm just wondering if you could talk to that. Yeah. I know that there has been a little bit of work that's tried to take some of these kinds of measures of participation and skills that Esther uses and examine them in other national contexts, and the results have been different. I don't think anybody's really directly tried to link it to Wikipedia editing specifically, at least not that I'm aware of. But if you're interested in some of those citations, I could dig them up, or I think they might even be in the references of that paper, in terms of work that's looked at skills and participation in other countries. There's a lot of great research happening in Europe on this; I think there's some in Latin America and Asia; I'm aware of less of it in Africa, for example. Yeah, my sense is that these things really do vary quite widely.
And even looking at the results of some of the previous foundation editor surveys, you can see that participation rates and the demographics of who participates vary a lot. So one idea I've heard thrown about in passing conversations with folks at the foundation and elsewhere is the idea of targeting particular populations that might be especially strategically important: for example, language editions of Wikipedia that are much, much smaller than their potential speaker communities. Let's say, I don't know how many millions or billions of people speak English in the world, but if you compare that to the size of English Wikipedia, proportional to the size of the speaker population, you could then try to generate the same ratio for a few other language editions and language-speaking communities around the world. I know that the foundation has made an effort, for example, to increase participation in Portuguese-language Wikipedia. I don't have any idea what this would look like in that environment, and while I think there's been a lot of great research on digital inequality in terms of access in, say, Brazil, none that I'm aware of explicitly focuses on understanding how these kinds of participation divides work out when you get into these more fine-grained distinctions. So it would be a really interesting thing to do, and one nice thing is that a lot of the survey questions we're working with, and that appear in some of the other studies I'm talking about, have been at least validated or translated. So there's a bit of a roadmap: if you or somebody else wanted to pursue that, there are some steps you could take to at least start to gather the data to understand the landscape in those places.
But yeah, I have no knowledge of anybody who's really directly gone after that yet. Fantastic. Thanks, I appreciate that. Yeah, thanks. Actually, I'll pass this on to the folks on the New Readers team who are working on brand awareness in different markets, so they can check it against their own instrument design and see how it compares. Oh yeah, that would be fun. I would love to hear more about what they know about who does and does not read Wikipedia. That'd be really fun. Sweet. Any other questions from the room or from IRC? We have a little bit more time. No questions from IRC; unfortunately, or fortunately, it's all quiet on the channel today. All right. So, a big thank you to our speakers. Thanks for joining us today. And I'll ask you to share the slides so we can post them on social media and on the showcase website. See you all next time in November.