And welcome, especially to Eszter and Aaron. Eszter Hargittai is the Delaney Family Professor in the Communication Studies Department at Northwestern University. She's also a faculty associate at the Institute for Policy Research at Northwestern, where she leads the Web Use Project. Aaron Shaw is also on the faculty of the Department of Communication Studies at Northwestern University, and both are longtime Berkmaniacs. Welcome.

Thanks so much. Thank you very much; we are delighted to be here. Aaron and I were both fellows in residence here five years ago. That's when we met, and that's when we started working together, and we had a paper published last year on civic engagement and internet use. This is our second written collaboration. In the meantime, I'm delighted that Northwestern was able to snag Aaron, so now he's a colleague as well, which is wonderful because I get to see him regularly. We're going to go back and forth with this project. It draws on work that I've been doing for quite a long time now on internet skills. Aaron will start us off, then I'll talk about the data, methods, and some of the findings, and then he'll wrap it up.

Thanks, Eszter. And thanks, everyone, for coming today. I'll try to talk loudly enough; it's really loud up here with the overhead, so if you have a problem hearing, just raise your hand or something and let us know. So the starting point for this research is understanding who edits and who contributes to Wikipedia. The logical question to start with, then, is: why does that matter? Why do we care? There are lots of reasons, and some of them you can see up here now. It's a really popular website; by a lot of measures, it's one of the five most popular websites on the whole internet. People go there for information about almost any topic. It's a really prominent search result. And it's the largest free source of information.
So a lot of people who might not have access to Harvard's libraries or things like that might look for something there. The other part of this, in addition to the billions of monthly page views that search traffic generates, is that it's an incredible repository of volunteer labor. The best estimate I've seen suggests about 41 million hours of volunteer labor have gone into creating Wikipedia. If you stop to think about that, it's just unbelievable; it's huge compared to almost any other volunteer endeavor you can name. So understanding what brought that about, and who brought it about and who didn't, is a really big question for research.

There's a big "but" here, though. The big "but" is that there's a huge gender gap, and this has received a lot of media attention, so you've probably read something about it. The estimates that I prefer to use, in part because Mako Hill, who's there in the back of the room, and I did some research on this as well, suggest that only about 16% of Wikipedia editors worldwide are women, and about 23% of US adult editors are women. There's been a lot of research investigating this kind of question and, like I mentioned, lots of press coverage, so it's a big, splashy topic. But there are some interesting aspects to most of the existing work. A lot of it, I want to emphasize, has been really excellent. A lot of it has been done by the Wikimedia Foundation or in collaboration with people who currently work there. And a lot of what they've done looks for answers within the community of existing contributors, or tries to recruit people for their studies through the Wikipedia website. We'll get to some reasons why that might be an issue in a minute.
But the result is that a lot of this work has looked at the dynamics of existing participants within the system, or the culture of the community among the people who already participate in it. And that leads to a big limitation in the research design: you can't really answer questions like, are women showing up to edit in the first place? And if not, why not? The prevailing research designs can't speak to these questions because they're not looking beyond the boundaries of the community in the first place.

So what we're bringing to this is a perspective, a lot of which comes out of work that Eszter and her colleagues and collaborators have done, which focuses on understanding the reasons for differences in how and why people participate online. Some interesting differences here have come out across gender and particularly across people's skills at using the internet. You'll hear a little more in a minute about the measures Eszter has developed for web use skills. These measures have been developed over the course of more than 10 years now, and they're really valuable predictors of a lot of different kinds of behavior online. So what we're contributing here is bringing this data on skills to questions about the gender gap. This is a good time to turn it over to Eszter to take it from here.

Okay, so I'll start here, because this is a data set that I've been developing for several years now. We realized that this data set is actually really good for answering questions about the gender gap in Wikipedia contributions. First, we have data on user skills, which, again, the literature has suggested are predictors of all sorts of online activities. So it may well be that skills are related to Wikipedia contributions; let's look at it empirically.
The other thing is that in this data set we have data on non-editors, or non-contributors. So we address the sampling-on-the-dependent-variable problem of projects that only look at those who are already contributing: the same questions and variables that we have for contributors, we also have for non-contributors. And this data set represents data over time about the same people, which is extremely rare in internet research. If you're familiar with social science data sets on internet use, you know it's very unusual for people to actually be followed over time.

So those are the strengths of the data set we use here. Let me tell you a little bit about the data. We draw on data from an urban public university, and that's not Northwestern; it's a university that neither I nor Aaron has ever been affiliated with in any way other than to do this study. I picked the university because its student body is very diverse, which is of interest to me. I'm a sociologist, and so is Aaron, and I'm interested in questions of social inequality, so I wanted to be able to look at a group of people who are diverse. There were also logistical reasons for choosing this campus: they have a required course that was willing to work with me in terms of accessing their students. Now, that might seem simple: you walked into a course, you collected data. It's not that simple. This is a course that has 90 different sections, so it's actually quite an elaborate study. Moreover, we collected the data using paper-and-pencil surveys, and the reason is that all my work is interested in internet use and internet skills.
It would be wrong to do the data collection online, because I would be biasing toward those who spend more time online, who have more privacy online to take surveys, who have more skills to navigate questions online. This is a method that is very rarely applied anymore, and especially rarely applied to internet research, but I think it's still a very important method that we need to bring from prior social science work to studies of the internet. I should also note that while in the first wave, which was in 2009, we went into these 90 classrooms to collect data, in the subsequent years we couldn't do that, because we didn't know what classes people were taking and some had left the university. But we had people's mailing addresses, so we contacted them by postal mail, stuck with paper-and-pencil surveys, collected more data in 2010, and then collected yet more data in 2012. Our talk today mainly draws on data from wave three, but also data from wave one as controls.

And that's important, because most internet research, when it looks at the relationship between different variables, does it on cross-sectional data, meaning the data were collected at the same point in time. But if you think that one thing might cause a difference in another, and you're collecting the data at the same point in time, you don't really know if the cause is actually going in the other direction, or in both directions, or whatnot. Having data over time addresses that nicely.

Okay, I also stand by the quality of these data. One of the questions we ask, which I invite you to read, checks whether people are actually paying attention while taking the survey. In the 2010 and 2012 surveys we have yet another such attentiveness question, and they're interspersed in the survey. We basically drop everyone who doesn't get both of these right.
So again, I think the data are quite high quality. The first year we lost about four and a half percent of people this way, and in 2012 we lost about three and a half percent. And again, it's all through postal mail: very expensive and hard, but worth it, in my opinion, for the quality it yields.

Just a few words about the sample. Is it at all representative of people that age in the US? It is, in many ways. Of course, these are people who at least started college, so they're certainly more highly educated than many other Americans. But their four-year graduation rate is actually quite similar to graduation rates across the US, so in that sense they're representative of that age group. Just a few words about the people: they're all young; there's diversity in gender; many are first-generation college students. As I mentioned, I worked with this community because of their diversity, and that shows in their race and ethnicity as well. We also collected data on whether they're currently working, as a way to account for how much time they might have for different types of activities. So on the one hand, a strength of this data set is its diversity in terms of things like socioeconomic status. On the other hand, I do want to remind you that we recognize it is a particular population. They're all young adults. The smiley is over Chicago because that's where the school is. And they all, at least in one year, attended the same school, though not all of them were still there three years later. So I want to remind you, as we draw conclusions, that those might not generalize to all populations.

Okay, so let's talk a little bit about people's internet experiences in the sample. These concern data from 2009, because we're controlling for prior experiences.
We have information about how long people have been internet users; how many access locations they have for using the internet if they want to, which is called autonomy of use in the literature; and frequency of use. And then we have this internet skills index, which I can talk about in more detail during Q&A. Basically, it asks people their level of understanding of 27 internet-related items, and then we create a scale. I have three academic publications about the development of this scale. The first time I developed it, it was based on a very elaborate study where I also had data on actual skill, and the point was to come up with a survey instrument that was most closely correlated with the actual skill measure. Of course, it can use refinement and updating over time; it's an incredibly elaborate process that hasn't been done much, but I encourage people to do it.

We also have some Wikipedia-specific measures. One is that in 2009 we asked people about their confidence in editing Wikipedia, so we have data on that. And then we also asked people, retrospectively, and this is relevant especially for 2012, whether they had been assigned a task in school that involved either starting a new entry or editing an existing entry. We thought that was very important to control for, both to see how such an assignment might affect people's contributions to Wikipedia, and to see, if you haven't had such an assignment, what the chances are that you've contributed.

Okay, so let's see how much people are editing Wikipedia. Before we go there, though, let's make sure people are actually reading Wikipedia, which I think we all know they do, but it's a good check on the data set, right? Are people reporting that they're using Wikipedia? And indeed, 99% of respondents say that they have looked at Wikipedia. Not surprising, and completely consistent with what we would assume.
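To make the scale construction concrete, here is a minimal sketch of how self-rated understanding items can be averaged into a single skills score. This is not the authors' actual instrument or code; the number of items shown and the 1-to-5 response range here are assumptions for illustration only.

```python
# Hypothetical sketch of a survey-based skills index: each respondent rates
# their understanding of a list of internet-related terms on a 1-5 scale,
# and the index is simply the mean of those ratings.

def skills_index(ratings):
    """Average 1-5 self-ratings into one score on the same 1-5 scale."""
    if not ratings:
        raise ValueError("no items answered")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return sum(ratings) / len(ratings)

# A made-up respondent answering a handful of items:
print(skills_index([4, 3, 5, 2, 4, 3]))  # prints 3.5
```

In the actual study the scale aggregates 27 items; the point of the sketch is just that the resulting index lives on the same 1-to-5 scale the figures later refer to.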
It's a good check, again, in terms of the data. Now let's see how much people are editing, which we partly asked in terms of frequency, as you can see, because we wanted that nuance. It turns out it doesn't matter much, because there's so little contributing going on that there's not much to be done with breaking it down. So I'll let you look at those numbers for a minute. I decided to ask these different questions because people might think about editing and contributions in different ways. I broke out fixing a mistake versus adding new material in case, in people's minds, editing might mean different things; I wanted it to be very concrete, so people would really report if they're doing any of these things. And I thought that if the numbers were not that low, we could also try to see differences between them, but that's not quite happening; we'll see that in the future.

What we did, because contributions overall are not that large, is create a variable that just says: have you ever, in any way, contributed to Wikipedia? Just over a quarter of the young adults reported doing so. If you exclude those who were assigned to do it for a class, then it's actually a fifth: a fifth who've ever done any of this at least once. So that's what we've got, which, depending on how you think about it, could be encouraging, but that's the overall figure.

What I'll discuss next is breaking this down by certain variables that could be of interest. So obviously gender; this is our focus today. As you can see, men are just much more likely to report having contributed at all in any way, which is true both with respect to those who've been assigned it at school and those who haven't. For each of these figures, I'll have those broken out, just to show you the numbers if you're interested. We also look at racial and ethnic differences.
And here it's interesting to note that having a school assignment related to this does seem to level the playing field a little bit. If you look at those excluding the school assignment, whites are actually statistically significantly more likely to have done it. But if you look at everyone, including those who were assigned it in school, there are actually no racial or ethnic differences in the data set, so that's interesting. And there's a similar finding for our proxy for socioeconomic status, which is parental education: you can see that having the assignment in school really does level the playing field across socioeconomic status levels. So that's encouraging in terms of potential interventions.

How about things related to internet use? People who, three years prior, were more frequent users of the internet are, not shockingly, more likely to be contributing to Wikipedia. Also absolutely not shockingly, people who were more confident about contributing to Wikipedia are more likely to contribute three years later; understandable. And then you can see that of all the graphs I'm showing you, the largest difference is actually here, in these general internet skills, not specific to Wikipedia at all: those who score the highest on internet skills are very significantly more likely to have contributed to Wikipedia. And again, there's a three-year lag, so this is your internet skill in 2009 and how it relates to whether you're editing three years later. So these are bivariate relationships of these factors, and now I'm going to pass it on to Aaron to talk about the more elaborate statistical analyses, where we try to tease out, of all these things, what is really going on.

So we put all of those together. The bivariate relationships are basically, if you've got a survey, you just look at what proportion of men and women did something, in this case whether they edited Wikipedia.
For this one, we put all of them together, and these are all of the variables that go into the regression model that we ran. So again, the outcome that we're interested in understanding is this: did you ever contribute to Wikipedia in any way by 2012, for all the respondents in the study? And then we're including all these other measures; you can see the top half is more the background demographic and socioeconomic attributes, and the bottom chunk is the measures we've got that relate to online participation and behavior: skills, confidence in Wikipedia editing, things like that. What we get out of this at the end, controlling for the variation in each of these different measures, is a sense of which ones have a strong association with the outcome, in this case editing Wikipedia ever by 2012. And what we find when we do that is that really only three things come out as statistically significant: gender, skills, and whether or not you were assigned to edit Wikipedia in a school assignment. It's just interesting to note that we included a lot of these other things because they come out as significant differences when you just look at the crosstabs, right? Those figures that Eszter just showed you are meaningful differences within the data. But when we put them in the regression model and control for the variation across everything, gender, skills, and the school assignment to edit are the only meaningful predictors. That's part of what we want to underscore here. And some of these other variables reflect really important findings from previous research suggesting they should predict this kind of behavior as well, but in this data set they do not.
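As a rough illustration of the kind of model being described, a logistic regression of ever-editing on gender, skills, and a school assignment, here is a self-contained sketch fit on synthetic data. To be clear, the data-generating process, the coefficient values, and the sample size below are all invented for illustration; none of them come from the study.

```python
import math
import random

random.seed(0)

# Synthetic rows echoing the talk's setup: outcome = ever edited by 2012;
# predictors = intercept, gender (male=1), skills (1-5), school assignment.
# The "true" coefficients here are made up for the sketch.
def make_row():
    male = 1.0 if random.random() < 0.5 else 0.0
    skill = random.uniform(1, 5)
    assigned = 1.0 if random.random() < 0.1 else 0.0
    z = -3.0 + 1.0 * male + 0.6 * skill + 1.5 * assigned
    edited = 1.0 if random.random() < 1 / (1 + math.exp(-z)) else 0.0
    return [1.0, male, skill, assigned], edited

data = [make_row() for _ in range(400)]

# Fit the logit by plain gradient ascent on the log-likelihood.
w = [0.0, 0.0, 0.0, 0.0]
for _ in range(1500):
    grad = [0.0] * 4
    for x, y in data:
        p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for j in range(4):
            grad[j] += (y - p) * x[j] / len(data)
    w = [wi + 0.1 * g for wi, g in zip(w, grad)]

# On data generated this way, the fitted signs mirror the reported pattern:
# a positive coefficient on skills and on having had a school assignment.
```

In practice one would use a statistics package rather than hand-rolled gradient ascent; the sketch only shows the structure of the model, a binary outcome regressed on demographics and behavioral controls.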
To understand this a little more deeply, we generated a figure to show you what our model would predict. Before I show you the figure, I want to emphasize a couple of things about it. One is that this figure is not actually the data that we observed; these are predictions that we generate on the basis of the data. The second thing is that what we're really looking at here is what in statistics is called an interaction effect: whether the predictors that we're interested in, in this case gender and skills, work in tandem in some way in their relationship with the outcome. So it might be that your likelihood of editing Wikipedia becomes greater or smaller depending on where you are on the skills variable and the gender variable at the same time.

And here's what we find on that front. Just to break this down a little: if you can see the colors, the light purple represents the female respondents to the survey and the green represents the male respondents. The two lines show you, across the X axis, which is the skills variable (it goes from one to five), how likely you are, depending on where you fall on it, to have edited Wikipedia by 2012, along the Y axis. Those values go from zero likelihood to 100% likelihood, or zero to one in this case. And what you can see with the two lines is that as you move out along the skills variable, the women in the sample remain relatively unlikely to edit. The males in the sample start at about the same place as the women, which is interesting, but then become increasingly more likely, over the course of the distribution of skills, to be editors.
So by the time you're out at the high end of the sample, males are over 50% likely to edit and females are still below 30%. That's part of the big story here. The other thing that's really worth noting is these little tick marks along the axes, at the top and the bottom. The tick marks on the bottom represent the actual distribution of the female respondents' skills, and what you can see there is that they're centered right around three, right around the middle of the scale. The male respondents are distributed a little more widely: they start a little higher, they go all the way up to five, and they're centered a little higher, somewhere in the high threes, closer to four (it's a little hard to see here). So women are less likely to have high skills in this sample, and even when they're at high skill ranges, they're less likely to edit Wikipedia than equivalently skilled men.

I'm going to interject here for one second, because this is where I anticipate a question that I get almost every time I talk about skills and gender, which is a very complicated relationship, by the way. The question is: does my skill measure capture actual skill, or is it self-perceived skill, since it's really a self-report measure? And I get that. But take note of this graph. Whether it's self-perceived or actual, however good or not that proxy is, and I do stand by it, though obviously it's not an exact correlate, the point is that it matters. Whatever it's measuring in terms of skill, even if it's just in your head, it matters. So we need to keep that in mind. It might have implications for what the right interventions are, but this is really important to keep in mind as we talk about skills later. Sorry.

Great addition. So we can go back to the graph during the Q&A, if you like.
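The shape of that figure can be sketched with a logistic model that includes a gender-by-skills interaction term. The coefficients below are invented purely to mimic the described curves (roughly equal at low skill, flat for women, rising steeply for men); they are not the fitted values from the talk.

```python
import math

# Hypothetical coefficients chosen only to reproduce the figure's shape.
B0, B_MALE, B_SKILL, B_INTERACT = -3.0, -0.5, 0.4, 0.5

def p_edit(male, skill):
    """Predicted probability of ever editing, with a gender x skill interaction."""
    z = B0 + B_MALE * male + B_SKILL * skill + B_INTERACT * male * skill
    return 1 / (1 + math.exp(-z))

# At low skill, both genders start in roughly the same place...
low_f, low_m = p_edit(0, 1), p_edit(1, 1)
# ...but at high skill, the male curve pulls far ahead, as in the figure:
high_f, high_m = p_edit(0, 5), p_edit(1, 5)
```

The interaction term is what lets the two curves diverge as skill increases; without it, the model would force the gender gap to be the same (on the log-odds scale) at every skill level.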
But for now, let's focus on what we think the big takeaways are, and we've got a few here. The first one is that this gender gap, not surprisingly, matters. But what we find here that's new is that it really matters among the higher-skilled users, the people with higher levels of internet skills. That's something that hasn't been found or talked about in previous work on this topic. The second thing is that skills really matter. This hasn't been studied in the context of Wikipedia contribution, but it comes out of a long line of work on other forms of online participation. And what we see that's really unique here is that people with low skills just don't contribute to Wikipedia; statistically speaking, they're all very unlikely to contribute. Third, and this goes back to the fact that these data were gathered over three years from the same people: internet skills in 2009 are what we're using to predict this behavior by 2012. So these skills are having long-term effects on what behaviors people engage in over the time that they're in college, basically. Or not in college, right: the time from when they started college to three years later. Thank you.

And last but not least, and it's worth being careful about this, because it's just bad stats to over-interpret non-results in your study, but it's worth noting some of the things that don't seem to matter in this data set, which do matter a lot in other research and really merit further follow-up: things like race and ethnicity, socioeconomic status, other kinds of internet experiences, and confidence in editing Wikipedia. We don't know why; the data don't give us insight into that. But these are not predictors of who actually edits Wikipedia three years later. It's just skills, gender, and whether or not you were assigned to do so.
So we'll just wrap up with a couple of questions for future research. I think there are a lot, but the clearest ones to us are these. First, why aren't skilled women more likely to contribute? This is a slight elaboration on the usual gender gap question: it's not just that women aren't likely to contribute, it's that skilled women aren't. There are lots of women out there with high levels of internet skills, and lots of women in the sample with high-level internet skills. Why aren't they editing? And second, how can Wikipedia, which is meant to be the world's largest free knowledge resource and is meant to be editable by anyone, address these barriers to entry for low-skilled internet users? Not just men, not just women: low-skilled internet users in general are a huge proportion of the population, and from what we're seeing, it doesn't look like they're likely to be contributing to Wikipedia at all.

So, just to wrap up, a big thank-you to the folks who have supported this research, and to the students and former students who have done a lot of the hard work of gathering, analyzing, and putting together the data set, including the data entry, as part of the Web Use Project. And thanks to all of you for paying attention; we're really excited to hear what you think and talk about it more. Yeah, Ryan.

I wonder how much you looked at different kinds of contribution to Wikipedia. For instance, I'd be curious to know whether adding completely new content is different from reverting changes that other people have made, or correcting mistakes that other people have made. So I'd be curious whether the predictions would hold for all different kinds of interactions on Wikipedia.

I'll take it. So we looked at just basic bivariates for that. I'm pretty sure we did not run the numbers; we didn't have the models for that.
Yeah, we did look at bivariates for gender. It seemed like men were more likely to do the new edits or the new additions. Otherwise, we did not run the models, because the variation is so small: there are just so few people who report doing any one of these things, especially starting a new entry, that it's very hard to have findings when you have almost no variation in the outcome. Yeah, there's really no way of doing regression-based methods with this kind of question. I mean, again, it's only 20% of folks who were not assigned to do so in a class who edited in any way; it's just a small proportion of the population. I was going to go back to the figure; I don't know if anyone remembers the numbers for new entries.

Yeah, there's been some other work; maybe we can dig it up later. There's been some other work looking at patterns in who does what on Wikipedia that's dug more deeply into the people who do participate and how those variations break down. In particular, there are some folks in Minnesota in the GroupLens lab who have done a lot of work on that, and Judd Antin and Coye Cheshire have also done some work on that, so I can find those if you're interested. But yeah, you have less than 9% who are actually starting new entries, so that's just hard for regression. So we've got two hands over here; maybe let's go here and then here. Yeah.

We're running into the same problem. So, retention of people who were assigned to do it? Is that the only time they edited? That's just not the way we have the data. Yeah. I mean, again, that's the focus of the work that's looked more at what happens once people are contributors: what brings them back and what keeps them there. There's a lot of interesting work on that topic.
And I think if you want to follow up on that afterwards, I would definitely have some ideas about it, and Mako may have some ideas too. But that's not the focus of this study, because what we're really trying to look at is what gets people to show up in the first place. So let's go here.

What sorts of classes assign editing Wikipedia? Is it within the discipline? Is it a writing class? Does that make a difference?

We have no idea; I don't have data on the distribution of which ones do. There are lots of classes now that have started doing this, across lots of disciplinary boundaries and topics; it's become a thing. And in fact, the Wikimedia Foundation has some programs and some outreach to support that. So it's an activity they're trying to develop, to get people exposed to and participating in Wikipedia.

I'm sorry, if you want to follow up on that. Yeah, a quick answer: it's happening in a bunch of places, but as Eszter mentioned, the Wikimedia Foundation supports a bunch of this. They had a really big push on public policy, working with public policy initiatives at universities and creating course material. And to help answer that question, and the question before: they have a Wikimetrics program, the entire point of which is to keep track of people who create accounts as part of courses, edit-a-thons, and other interventions, and to track things like retention and future editing behavior for those people. So this is part of the Wikimedia Foundation trying to evaluate the effectiveness of its own programs, but the infrastructure is available for other people, and there's actually a growing amount of data that helps address exactly those questions.
I do want to add, since the person who asked that question missed the data-collection portion: this was a specific group, so I can't say which classes on this campus are assigning editing. The general point I wanted to make, which I forgot to say during the talk, is that this study wasn't about Wikipedia use, right? This is a much larger study that asks hundreds of questions about lots of different things with respect to internet use, but also outcomes and other things in people's lives and all sorts of correlates. So it wasn't focused on Wikipedia, which limits what we can answer in terms of Wikipedia questions, but I think it's a strength of the data in the sense that people weren't primed to be thinking about Wikipedia in a particular way. In fact, almost all of the variables that you see here were interspersed across the survey in different places. Go ahead.

On user-generated content and content creation: have you analyzed how Wikipedia compares? Is it particularly gender-biased compared to other forms of online creation?

Yes, I can take that; I have published on that. I have a piece, Hargittai and Walejko, 2008, that looks at, actually that's a different cohort, 2007 data, all sorts of arts-related content, where there is a gender difference, but once we control for skill, the gender difference actually goes away, which was interesting. And then, more recently, in the same data sets, we have all sorts of contribution questions: voting on content quality, contributing reviews, lots of questions. And yes, there is almost always a gender difference: women almost always contribute less to those activities. The gender effect does seem to hold up even after we control for skill, but skill itself is always significant in all these cases. You've had your hand up.

Yeah, I wanted to make sure that I understood correctly.
Were you saying that socioeconomic status did not correlate with internet skills, by gender? Okay, so generally speaking, in these data SES actually does tend to correlate with skill as an outcome. What we were saying here was that socioeconomic status does not predict Wikipedia contribution once we control for skill. So if you take people of different socioeconomic status but control for skill, that does not affect Wikipedia contribution, correct? You stated it correctly. The reason I bring that up is that this is a topic of debate, as you probably know, in the area of gender equality and information society indicators. A number of writers take the position that there really is not a gender gap, a gender digital divide, if you control for income and education. But that's not the case. I've done a lot of research showing that you can control for SES and there's still a gender effect on skill, absolutely. I have lots of evidence of that, and I'm not the only one; other people have shown that too. The only thing I would add is that I think it's important to emphasize that when we talk about a gender gap, it's often a euphemism for the fact that women are contributing less. And across different kinds of online behavior, that's not always the case, right? So it's important to keep in perspective that when we're talking about different kinds of user-generated content and cultural production online, or when we're talking about Wikipedia editing, we might be talking about a situation where, once you control for all the other variation, there's still this gender gap where women are contributing less or are less likely to contribute. But in other kinds of online behavior, the gap can switch, right? So some of this is just about thinking of it in terms of differentiated use and understanding what those differences are.
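The "control for skill" logic in this exchange can be made concrete with a toy numeric sketch. All numbers below are made up for illustration (they are not the study's data): contribution here depends only on skill level, but skill is unevenly distributed across genders, so a raw gender gap in contribution rates appears that disappears once you stratify by skill.

```python
# Toy illustration (hypothetical numbers, not the study's data):
# contribution probability depends only on skill, but skill is
# unevenly distributed by gender, producing a raw gender gap
# that vanishes once you stratify ("control") by skill.

counts = {               # (gender, skill) -> group size
    ("men", "high"): 60, ("men", "low"): 40,
    ("women", "high"): 40, ("women", "low"): 60,
}
p_contribute = {"high": 0.20, "low": 0.05}   # identical for both genders

def raw_rate(gender):
    """Overall expected contribution rate for a gender."""
    n = sum(counts[(gender, s)] for s in ("high", "low"))
    c = sum(counts[(gender, s)] * p_contribute[s] for s in ("high", "low"))
    return c / n

men, women = raw_rate("men"), raw_rate("women")
print(round(men, 2), round(women, 2))   # raw gap: 0.14 vs 0.11

# Within each skill stratum the rates are identical by construction,
# so the raw "gender effect" here is entirely a composition effect.
```

This is the composition-effect case (the arts-content finding mentioned above); the speakers' point about Wikipedia is that the gender gap there survives this kind of adjustment, i.e. rates differ even within skill strata.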
And in this case at least, the case that we're trying to make is that Wikipedia is supposed to be a widely accessible, widely contributed-to knowledge resource, and if you're effectively missing the perspective of a huge portion of the population in a way that's systematically different, that's why in this case the gap, and the way the gap is oriented, is worrisome, right? So that's part of that. But I will say, because I'm really familiar with this data set, obviously, and back to this question of other types of creation: again, reviewing, voting on content, men do report doing more of that. And when I would present that work, and I've presented that work lots of times on the 2009 data, people would say, oh, but you didn't ask about fan fiction, that's where women are. So then in 2010 I asked about fan fiction, and men were still reporting doing even that more. So is there some methodological issue? While I can definitely have a long conversation about this, and there's lots of literature showing that in terms of skill reports there is gender bias going on, and that's very complicated, I don't quite see why there's a reason for men versus women to misreport what they've actually done in terms of contributions on this kind of a survey. So I'm confident in those measures, but if people have thoughts on why those measures would be biased, I'm certainly eager to hear. You have your hand up. The point you made there about the dangers of these differences made me think: what do we know about how men and women contribute differently? What is the counterfactual; what would a more gender-balanced Wikipedia look like? How much do we know about the ways in which Wikipedia, the content itself, is biased because of the gender imbalance in it?
So what I'm aware of at this point is that there have been some studies of topic coverage suggesting that topics that are disproportionately interesting to women are likely to be less well covered. I don't know if there's been a lot of really systematic work to dig into this, in part because once people are editing Wikipedia, it's hard to figure out their gender, and the surveys that have been run of Wikipedia contributors have had some design issues that compound the problem. So I think there's other interesting work that I would point to on that question. One body of work I can think of is on hardware hacking: there's a system called LilyPad, developed by Leah Buechley over at MIT, that managed to recruit more women than men to participate in hardware hacking with this one particular platform she helped develop. So looking at spaces that are dominated by women, I think, points to possible lines of inquiry. But I think this is an open area for work: what kinds of online content creation or participation do women participate in more? And maybe the short answer is that when you control for skill and things like that, the gender difference is not meaningful. But there's a lot of work to be done there, I think, to understand more deeply what that would look like. Yeah, let's go back here and then over to you. I can just add to that: my colleague who also runs the workshop, Eckert, has written an article about the content that women produce, which unfortunately shows that once women do edit, they write about feminism and gender topics; it's very centered on issues specific to women. So it still doesn't address that in general topics, like World War Two or whatever, there are still almost no women's voices at all. Just a quick question.
Have you or anyone else looked at other cohorts, different age groups, different parts of the country, different parts of the world? This study is fascinating, but I also know that it's very specific: college kids from a certain place. Yeah, so first, just to clarify, they're not college kids anymore, because several dropped out and they're growing up, so by now half of them are not college kids; but yes, they're young adult Americans in a certain location. I would love to be able to analyze data that are more representative. It's just those three factors that I mentioned that make this data set unique: the fact that we don't just look at those who are contributors, the fact that we have data on skill, and the fact that we have data over time. I wish there were funds to collect nationally representative data on things like this. I would love to do that, or I would love to see people do that. I don't know of any such data set. Yeah, nothing that goes as deep on internet use and skill. And panel data. Yeah, definitely no panel data; very hard to find. Any other questions? Yeah. What about the discussion in the media, especially the New York Times, about women not being welcome on the internet? Do you think this has a general cultural impact that could be discouraging? There's been a whole conversation about the kind of response women get when they actually write about topics related to the internet: if they're insulted, they'll be insulted with remarks about their bodies and their looks and whatever. Men will get insults; women will get very personal types of things. And there's a general cultural feeling that the internet is a public space where it's not really safe for women to come out in a high-profile way.
I don't know if this is relevant to attitudes. So I actually think it is potentially relevant. The problem is, and anecdotally I completely agree that there seems to be something going on there, how you measure that through survey data is not clear. I've tried to come up with questions that would get at whether mean comments and things like that discourage you. I have asked things like that, and I haven't been able to show that they matter to outcomes. So I don't know if the measures are bad or if systematically we're just not finding that. Unfortunately, I don't know of quantitative work that has looked at that. And anecdotally, again, I completely see where that's coming from; I think there's something there. But I don't know of in-depth, really rigorous qualitative work either. I don't know if others out there know of such things. I would like to see systematic work. Yeah, I would answer it two ways. I guess I would say that there is really interesting work, some of the other work that I alluded to in the talk, on understanding why women who show up to edit Wikipedia might get discouraged and leave. There's been some work looking at that. And the thing I take away from a lot of it is that, like Esther, I haven't seen a lot that ties those experiences to differences in outcomes in a quantitative way. But I think a lot of people share that perception, and a lot of women who contribute online share that perception. And as someone who would like to see that trend reversed, I worry about it and would love to see more work on it. And, setting aside the details-of-this-study piece of my brain, I think this is a really big problem.
And anecdotally, talking to a lot of people, I think that is a very common experience or perception. So I think there needs to be a lot more research into understanding systematically what kinds of experiences are happening across the board and what kinds of outcomes they produce. And to bring it back to this topic: within the Wikimedia Foundation in particular, they have worked on developing some tools and systems to create a more supportive cultural environment within the encyclopedia, to welcome newcomers, to hook newcomers up with mentors who can help train them to edit, and particularly to focus on turning away from a more aggressive cultural dynamic toward one that's more mutually supportive and based around conversation. And part of that is because they've found in surveys that, for women who contribute, the culture of the community is not that way, right? So they're trying and experimenting with different tools to do this. And I think it's still a little early to know what kind of long-term impact that could produce, but I think it's a really key area for future tool development and research and investigation. I was wondering about your choice of words, aggressive and supportive. I'm wondering if it's been studied whether women are uncomfortable in aggressive environments generally, or just uncomfortable in environments that are personally aggressive, specifically where they're being threatened, their physical well-being is being threatened, or they're being attacked for their physical appearance, which tends not to happen to men in aggressive environments. So I'm just wondering what you want to say. Yeah, yeah, yeah.
Or what you might want to think about: rather than the supportive, as in "we're going to make things better and different for you," which could be an equally unpleasant experience, another way of looking at it is, what if we just look at the types of responses and how they make a difference? Because you could study that very easily; you just compare gender-neutral names versus non-gender-neutral names. That's happened to me so many times: when I forget to use a gender-neutral name on a site somewhere, there's a dramatic difference in response, and there are so many times I regret it and delete and start over again just because of that. Yeah, when I chose those words, the studies I was thinking of are mostly interview or ethnographic studies of people's experiences as contributors and their stated reasons for why they stop editing or why they go away from Wikipedia in particular. And in terms of other ways of studying that, based on your experience, you could even imagine an experimental intervention where you change names and see how that shifts people's responses. Well, women writers did that in the 18th and 19th centuries. They used gender-neutral names, and a lot of people still do. So it's not as if there isn't a long history of this happening. Yeah, and I think there's some excellent work looking at bias and discrimination in hiring and workplaces that takes advantage of some of the same strategies. I think it would be a great approach to research; I would love to see some of these studies. But I will add that that and the previous question address, again, those who are already contributing, or contributed at least once. The question there is more, why didn't they stay? Why didn't they keep doing it?
I think people who haven't contributed at all and never even got into it wouldn't even know that there is that particular type of environment or hostility or whatever it is, if they've never actually contributed anything, right? So can that be an explanation of why people don't even start? I'm not sure. Well, could you look at then, sorry, just to follow this up: people tend to do things that their friends tell them about and say, I had a really good experience doing this. And people tend not to do things that their friends say, that was horrific, that was really an unpleasant situation and I'll never do that again. Which of those two would make you say, hey, sign me up, I'll come along with you next time? Yeah, that would be a great thing to ask about. We would like to ask these things: do you have friends who contribute? Or coworkers? Yes, people in your network. That would be a great thing to ask about. So if we were, all of us, the Wikimedia Foundation, and you presented this and we said, okay, what can we do to get more women coming in the first place to be part of this, and we'll worry later about having them stick around: what are your thoughts on that? That was our question for you guys. Yeah, let's get at it. I mean, we saw the trends for folks who had received editing Wikipedia as an assignment. And we saw the crosstab variations for those variables, right? You saw how, for differences in socioeconomic status or racial groups, the distribution became flatter in terms of who had edited Wikipedia based on assignments. And these are hard questions in any area, right? Not just editing Wikipedia. How do you close a wage gap? How do you get more people to get advanced degrees or higher salaries? So I think there probably are a lot of different things that need to happen.
And I think part of that is these sorts of proactive interventions, reaching out and introducing people who might not otherwise find it through their social networks or other mechanisms, right? If you're not just going to show up on Wikipedia on your own, having it happen in a class in a school setting, and lots of people go to school, could be one really useful way. And I think the things the foundation is doing to try to encourage newcomers to stick around are a big part of that too, because a lot of people have unpleasant experiences with their first edit. So that's a really active area of research that I think they need to keep building on. And then lastly, part of the issue here is that there's a different distribution of internet skills across genders. That could be for complicated reasons, right? We can talk about that, but that's what this kind of data, which is a pretty diverse sample, shows. You've got to find a way to get people who are at the lower end of that skills spectrum to contribute, right? So lowering those barriers, making it easier. The foundation's worked on this a lot, and it's a really hard problem. But I think this research illustrates some of the reasons why you can't just solve the gender gap by focusing on gender. You also have to think about it in the context of skill differences, because that's an important part of the relationship. I pulled this up because even if we take the people who contributed for school, the gender differences are still especially large, right? For some of the other variables they weren't, but for gender, even then. Yeah, the gap's closer, but it's still huge. It's not much closer. So it's interesting that the internet by itself is definitely not going to solve that issue.
I mean, as we said, I think one of the really big questions is: there are not a lot of women who will self-report high skill, but those who do, why are they so much less likely to be contributing? What's going on there? I've thought a lot about this. What would be the next study? Qualitatively, what are the things you would want to be asking if you controlled for skill and only looked at high-skilled people, right? Interview a bunch of high-skilled men and women. And what is it that you could ask? I don't think the question necessarily should be about Wikipedia, because, again, the internet skill measure is not about Wikipedia either. So there's something larger going on. Yes? Do you have anything on trust in Wikipedia content? Not trust in the content; there's a measure of confidence in editing, but not trust in the content. Because it seems to me that people's relationship to Wikipedia, or their relationship to that online content, might be a significant difference. Well, but again, the internet skill measure is not about that at all; it has nothing to do with Wikipedia. Yeah, I get it. The other thing, oh, do I not have it in the background? But if you want to follow up on this. I'm just going to jump in on this line of questioning too, and then we can go to the back, sorry. The other thing goes back to the question that Justin asked, this idea of other systems and environments online where women are more active and contributing more. I think understanding what's going on there that's different from what's going on in Wikipedia is a really critical line of work. So I think continuing to investigate the possible lines of explanation here and trying to narrow them down a little more is important. Another area could be that there's something about encyclopedia writing.
There could be something about this task that's not appealing to women on average, for some reason, and so it could be driving self-selection into the task as well. And again, I don't know, right? I don't think there's a lot of great research on this, but if you go to a system like Etsy, there are a lot more women participating than men. So actually, there is another angle on the gender variable when it comes to the different types of contributions I've looked at: there usually seems to be a relationship where the more public it is, the less likely women are to be there. We certainly know that with computer-mediated communication types of things, women are more likely to do that, whether it's chatting or emailing with your friends. If it's just your friends, or a fairly private domain, the women are there; they're very active. But the more public it is, the bigger the stakes in some ways, the less likely women are to be doing it. And so I do think that plays into larger societal issues of gender inequality again, in terms of whose voices matter, and then some of it comes back to why it is that women feel their voices might not be welcomed in these spaces. There's research on education outcomes at Oxford that has interesting things around the clarity of writing, or the style of writing for essays and making statements, where you see a gender gap because of background and the ways people have been educated to write, and that may tie into the Wikipedia context specifically. So there might be stuff from the offline world that's quite instructive about gender gaps in certain forms of content, if that's what content is being assessed on. Yeah, I mean, how do you build a more egalitarian, participatory encyclopedia in a sexist, gender-divided world? That's part of this question that I think is really hard. You've been waiting, I'm sorry, for a while.
I just wanted to add about our workshop. We started with a seed grant last year from the Future of Information Alliance at the University of Maryland. So it was a small project; we only had 40 girls, but across four different schools. And we had some research indicating that girls tend to fall off the technology curve very early in their education: somewhere around fifth grade, girls kind of decide, oh, this whole geeky technology stuff, maybe it's not for me. So that's why we thought we should intervene at that precise moment. And one of the interesting things we found when we started the workshop, which was just 10 weeks, 10 hours basically of teaching them how to edit a wiki site, came from a survey we ran. In the survey, we found that they do have high internet skills. I'm not sure we used measures similar to yours, but in general, they know how to find things. They know how to find a YouTube video; they know how to do a Google search, how to do everything. But they have this vision of the internet as something that's mostly a networking tool, for communicating with friends and finding information. They didn't see, almost at all, the potential of the internet to be something where you can produce knowledge. We had them make graphs based on Google Trends, and we had them embed a video in an article. These kinds of skills, where you produce new knowledge using what's out there online and where you share that knowledge, were really alien to them. And that was kind of interesting. I think it has a lot to do with women not being as likely as men to perceive themselves as experts. And I think that goes along with what you were saying: when it's content to be shared in a really large public space or sphere, women tend to doubt themselves more.
So I really am happy to see that you think these school assignments might make a difference, even if it doesn't make such a big difference in the gender gap. But anyway, I brought some propaganda, so I'm just going to circulate it so you can take a look. This is actually a sheet we make for teachers or parents who might be interested in doing the workshop themselves. We have a full curriculum that's downloadable online, and people can just go and run with it. We're not teaching it ourselves anymore, because there are only two of us left on the team and we have our dissertations to write and everything. But it exists, so please take a look. Thanks so much. Thank you for sharing. Did you have a hand up? Yeah, that ties in very much to the idea of women being socialized not to see themselves as having agency in the construction of reality itself, but rather as contributing components to it at times, or filling up space sometimes, not building the space itself. And most of what Wikipedia is talked about as is that construction and dissemination of reality. So even the language itself is something that... It could be framing effects, how the project is framed. Definitely. I haven't seen work that addresses that particular dynamic in the context of Wikipedia, but it raises an interesting, more general question about how to frame user-generated, participatory online projects in a way that could attract broader groups of people. Yeah, other questions? Have you thought about what would be a research design that would get at that? Or anyone else? I would be interested to see how the language around Wikipedia itself could present it as a more collaborative space, a space that you contribute to, rather than something that so ambitiously builds a new manifestation of society, which is inspiring but also intimidating. And so, yeah.
And to be fair, a lot of the research I've done is about information-seeking, not necessarily content production. And one of the studies we did involved a lot of sitting with people and talking to them about looking for content. Again, not Wikipedia-specific, but there were definitely mentions of Wikipedia. And you'd have quotes where people would say, oh yeah, Wikipedia, that's where they hire editors to edit these articles. If people think that that's how it works, then of course they're not going to be sitting down thinking that they can contribute. So there's a lot of confusion out there among some people about how Wikipedia works, and there are definitely people who don't understand that you can edit it. Even among those who know that in theory you could edit it, many have absolutely no idea that there's an edit button; they don't get that. And there are actually lots of people who don't even understand that you can just go and edit it. So that's a very core level of misunderstanding. There you go. Yeah, so you made a big deal in your presentation a couple of times, well, a little deal, about the longitudinal nature of these data. But presumably internet skills three years ago are pretty correlated with internet skills now. They're nearly perfectly correlated; it's crazy, I have a graph. That was my assumption; I didn't want to be too presumptuous. They're not perfectly correlated, but they're close; go ahead. So what do we gain then from the longitudinal nature of the data if the same thing is measured in both places? Because you wouldn't believe the number of people who tell me, oh, but everyone will improve their skills in three years. That's one of the reasons I've had to collect data on this: people's reaction is, oh, this is just momentary.
I mean, you study the internet, so you know that certain things don't change that much, but you would not believe how many people, when I present material, will say, oh, next year this will be completely different. I see. So I understand the value of longitudinal data in general; I just don't think you've really taken advantage of the longitudinal possibilities. No, I think it's super exciting to have longitudinal data on internet use. Right, but because you have the one variable in here both ways, maybe what was missing in the talk was a sentence saying: and in fact, if we use today's measure of internet skills, the results are exactly the same. It is the same. And we suggest that, okay, all right. Yeah, it's in the paper. But so here is this amazing graph; to me it's amazing. I actually thought, wow, I must have gotten something wrong in the numbers, it's so perfectly correlated, right? This is specifically SES and its relationship to skill. Yeah, everyone is improving some, but they're pretty much improving at the same rate. Yeah. One other thing I was thinking about is that I hear "skill" in this talk, and maybe this is just because of the nature of the word, but I'm thinking: it sounds like people aren't editing Wikipedia because they don't know something that would allow them to, like they don't have the skills to edit Wikipedia, right? But it seems like the argument you're making is more nuanced, because of course these students do edit it when they're given it as an assignment for a class, right?
So I feel like "skills" represents maybe something a little different than what I would think of. No, it really does. And partly, our way to try to emphasize that is that it's not Wikipedia skill; that's not what we measured. The skill variable is not about Wikipedia things. It's about understanding all sorts of internet-related items. It has, in some ways, nothing to do with Wikipedia. One of the 27 terms is "wiki," but that's not, I mean, that's not... So it's as much about familiarity or comfort, general knowledge. Do you know what HTTPS is? Right, that kind of question is a good example of one that really differentiates people's knowledge. So basic knowledge about how the internet works is how I should be thinking about this. It's sort of a level, yeah. As opposed to... And the thing about that basic knowledge, which Esther has demonstrated in her other work, is that it correlates quite closely with people's actual abilities, right? The questions measuring skill are these sorts of "how much do you know about something" items, but the relationship between those questions and what people are actually able to do when they sit down in her lab and are given an assignment, like find this information on the internet, is really close. So the behavior and the knowledge are tied together really well. So yeah. Part of the suggestions you made today are things like make it easier to contribute, right? But it seems like maybe the argument would be something like: make it more familiar to people, make it rely less on metaphors or knowledge of particular technology, which is one particular way in which you might make things easier, right?
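Since the skill measure discussed above is built from self-reported familiarity with internet-related terms (like knowing what HTTPS is), here is a minimal sketch of how such a composite index can be computed. The item list, the 1-to-5 rating scale, and the scoring are hypothetical placeholders for illustration, not the actual survey instrument.

```python
# Sketch of a survey-style skill index: the average of
# self-reported familiarity ratings (1 = no understanding,
# 5 = full understanding) across internet-related terms.
# Items and ratings are hypothetical, not the actual instrument.

def skill_index(ratings):
    """Mean familiarity across items; higher = more skilled."""
    return sum(ratings.values()) / len(ratings)

respondent = {
    "HTTPS": 2,
    "wiki": 4,
    "tagging": 3,
    "bookmark": 5,
    "spyware": 4,
}
print(round(skill_index(respondent), 2))  # 3.6
```

Note that nothing in the index refers to Wikipedia itself, which is the speakers' point: it captures general familiarity with how the internet works, yet it predicts hands-on task performance and contribution behavior.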
And so that's where unpacking exactly what we mean by skills here, in a way that I know you've done in this body of work, might help draw clearer and more actionable solutions for the people at the Wikimedia Foundation who desperately want to fix this. Right, I do think it's that prior step: if you go to the page knowing, I know I can edit this because I've been told it's easy to edit, let me see how to do that. Let's set aside creating a new entry, which I think truly is not that simple; anyone who says that's easy has not actually read up on how to do it. But to edit, like you want to fix a typo, okay? That really is very easy, if you know that it's something that can be done. If you sit down there knowing, oh, people fix typos all the time, how do I do it? Oh, there's a button that says edit; maybe I can click that. That's a type of attitude and approach to the web, and I think that's the initial step, and that's why I think it's a more general internet skill. You sit down with a different attitude: this is something you can actually do. And actually, I think this is a type of skill that's generalizable beyond the web. It's the same thing when I talk to a graduate student about their project: I see the numbers as things you can manipulate in a stats program, or even in a simple spreadsheet, and they don't see that. It's a type of approach to the material in front of you, where you know what's possible. How do we get people to know what's possible? You've had your hand up for a bit. Yeah, just to follow up on how you get more women in. My thought is, what about doing a qualitative study of the women who are actually doing the editing, finding out what factors got them into it and what their experience is once they're in, and using them as a resource?
So my concern about talking only to people who are already contributing is that whatever they tell you the reason is, you don't know if those who aren't contributing are actually similar women. If you're not interviewing non-users in parallel, then you cannot claim differences between them. While yes, we can learn certain things from the contributors, if you don't have a parallel data collection going on with the non-contributors, you might come away with things that, in fact, if you asked a non-contributor, they'd say the same thing; they just haven't sat down to do it. But I think the goal is to get at what the distinction is between the two groups, the contributors and the non-contributors. I mean, is there a factor that would turn non-contributors into contributors? Yeah, that's the big question. And part of the way I would respond is that there is really great work that's trying to do some of what you discussed. Again, folks out at GroupLens, and some other folks at the University of Washington and at the foundation itself, have tried to investigate this more in depth. And I think they've made some findings about what women who are contributing actively say their experience has been about and what differentiates them. But Esther's point is exactly part of what you said: you need that comparison across the two groups. So I think that's the key to moving it forward; I agree really closely. I'm noticing that we're basically at 1:45. One more, if anybody's got one. We'll also be sticking around, so come and talk to us, or find us online; we're pretty easy to find online. If you have the skills to do so. Yeah, if you have the skills. Just come say hi. That works too. So yeah, thank you all. Yeah, yeah.