items. And if I didn't get a chance to meet all of you, my name is Liz Velarde and I'm the recruiting associate here at the Foundation. We also have two other members of our recruiting team here: Sarah Roth, our senior technical recruiter, and Amy Elder, our director of recruiting. I think Amy is still checking a few people in downstairs, so she'll be up in just a minute. If any of you are interested in talking to our recruiting team after the talks, feel free to come up to me, Sarah, or Amy. We are more than happy to talk to you about career opportunities here at the Foundation, your resume, or your career search, or if you just want to say hi. So I'll go ahead and jump into the housekeeping items. Hopefully everyone had a chance to grab some pizza and some drinks. The restrooms are towards the back of the room, down the hall and to the right; we have signage everywhere, so you should be able to find them. We also have our Wi-Fi password if you need it: it's "1 edit can make a difference", starting with the number one, and we have it written up there as well as on some yellow signs. If you have any questions, just feel free to ask me. Oh, I also have to mention the emergency exit, just in case: it's right behind you. Moving on to the agenda. Our first speaker is Grace Nolan. She's from outside the Foundation, and she's actually in New Zealand, so she's going to be joining us via Google Hangouts. Her talk is on her journey into women in tech and open source research. She is currently a systems developer. So there's Grace; she'll start in a second. Immediately afterwards, we're going to have Chelsea Shea. She's one of our data analysts here at the Wikimedia Foundation, on the Discovery team, and her talk is on data analysis in Discovery.
So at the end of both talks, like I said, we're going to leave some time for networking. Feel free to grab more food and talk to our recruiting team if you're interested. And I will go ahead and pass it on to Grace.
Okay. Oh, it's me. All right, let me just share my screen. Can everyone hear me? Is it good? I'll take that as a yes. All right, hello. I'm from New Zealand, and this is a talk about a research project that I did. Just to introduce myself: I work for a fiber company in New Zealand that provides wholesale fiber optics for networking. I work mostly in Django and Python, with a little bit of Bash on the side. I'm also really into singing; that's a picture of me and the choir. I also like painting and Photoshop. I've read quite a lot of research (oops, sorry, I just got really distracted by something that happened), and there's quite a lot here. But before we get stuck into that, I want to talk about how I got into computer science, so you have a bit of context for where I'm coming from. I was pretty interested in computers when I was young; I was the technical one in my family. I had an older brother, but he was more of a farmer, so not really into technology in the same way I was. I wasn't quite sure what I wanted to study. My best subject was biology, but I didn't really want to sit in a lab, and I'd heard it involved lots of statistics, which scared me a little. So I actually started out doing design. I really enjoyed computer games, innovation, and interactive art, so I thought I might go wild and do something completely different. Design was good, but I didn't feel quite so comfortable there.
Then I did a programming paper, an introduction to Java programming, which I found really challenging and really interesting. I really liked how it made me feel: it was like interactive algebra, and the kind of algebra that I really like. So I thought I might change and do computer science full time. That way I could do the challenging part, the programming and hardware side of things, which should clear the way for doing interactive art, which is what I thought I'd want to go into. But once I started computer science, I enjoyed the back-end and logic side of things more than UI and graphics, which I found really hard and very frustrating. When I went into computer science, I felt pretty confident with computers, coming from my background, and that was really nice. But nearing the end of first year, I did not feel confident at all, and I was quite put off. I felt like, oh no, I don't know anything about computers, everyone here knows so much and has been doing it their whole life, and I got really stressed. I wanted to understand this better, because it didn't make sense to me that I would enjoy computers so much and then suddenly not want to do it anymore. So I came across the book Unlocking the Clubhouse: Women in Computing. I'm pretty sure most of you will have heard of it. It was really enlightening: a lot of what I read in there made sense and was pretty accurate to how I felt about how things had gone. I learned some really good things; for example, students feel isolated as a result of their class environment.
In class, you had the loud minority: the students who would pipe up in tutorials, challenge the tutor, and bring up irrelevant details that didn't help with the lesson. It was uncomfortable, and it made me not want to say anything ever in class. I remember a conversation after one test where everyone was debriefing, and somehow we got onto how we were feeling in classes: isolated and quite nervous. It turned out a lot of them had been feeling the same way as me, which really surprised me, particularly the people who I thought were really good. They were all feeling isolated because of that class environment. The other interesting thing I learned is that gender bias indoctrination starts as soon as gender is assigned at birth. Spertus's paper there is really good. Even though it's old, it's very thorough and goes right from the start: once infants are assigned a gender, people describe them differently. For example, girls are described as soft and distant and a little bit dazed, whereas boys are described as active and as playing up. That doesn't make sense, because as infants most babies are the same; they just want to eat and poop and cry a lot. So there's no basis for ascribing those attributes at that age, but people do. The next thing is that things that suck for everyone suck more for marginalised groups. The students I talked with who felt isolated after that test were mostly white males from fairly affluent backgrounds.
And sure, it still sucked for them, but it didn't reinforce a stereotype, whereas for women or people of colour there's already a belief that they're not quite as good at technology. So it reinforces the idea that they don't belong. That's why, as I found out more and as I'll discuss when I go through this report in more detail, the recommendations I make tend to apply to everyone but lift up people from marginalised groups even more. The next one was that sharing struggles with people in a similar situation is very good. Talking to those students after that test was great, and we kept talking and sharing, because it turned out lots of people were feeling really depressed and unhappy about what was going on. Having those honest conversations made me feel more connected and strengthened my sense of belonging. But I still noticed the underlying problem, and just because I now felt a little more like I belonged didn't solve it. So I created a group called the Computer Science Students Society. We organised events like rock climbing, pub crawls, and barbecues. The first event was a barbecue, just to see if anyone would turn up, and around 70 people did. That's the picture in the middle, and it was a really good outcome, way better than expected, so it inspired us to keep organising more events. Now, about five years later, they've got their own new logo and they're still going really strong, which is great, because when I started this a lot of people said it would just fall apart as soon as I left. They said that's what happens, they'd seen it a million times before: somebody comes along and runs these events, which is great, and then as soon as they graduate it's all over.
But it's really cool to see that it's still going. There's also another group associated with it, right next to them: Ladies Inc., which stands for Ladies in Computing. A student I was mentoring had been working on that, and it's also doing really well, even after its founder left, which is really positive. I also started doing outreach, talking to students about computer science. That was really interesting, because I learned a whole lot more about the problem areas in outreach: perceptions, teacher professional development and training, and funding, which is really difficult. Teachers don't get much time off for professional development, and there's very little funding, which has meant that organizations like Google run events like CS4HS (Computer Science for High Schools), which teaches teachers how to teach computer science. Ongoing collaboration and cohesion between teachers is also challenging. There are so many resources out there now for teaching computer science that it can be really overwhelming, and it's hard to sift through which ones are good and will work for their students. So these are some of the challenges; let's go into them in more depth. Stereotypes are a thing, and they came up a lot in the research I did. There's one study in particular I'll talk about. The experiment was set at a university, where they paired up students: one was an actor and one was a participant, and they had really short discussions about what each of them studied. The actor did computer science and either fit the stereotype or did not.
Right after this meeting, the participant was surveyed on whether they would consider a career in computer science. What they found is that if the actor fit the stereotype, regardless of whether they were male or female, participants were less interested in going into computer science, and if the actor broke the stereotype, participants were more likely to consider it. To define the stereotype in this case: stereotypical actors wore cargo shorts, socks, and sandals and were interested in computer games. Actors who broke the stereotype were dressed differently, maybe in a t-shirt and jeans with canvas shoes, and had other interests such as music or sports. They did another survey two weeks later to ask whether participants' feelings about computer science had changed, and they were unchanged. So the perceptions formed in those short, two-minute meetings lasted for at least two weeks. Another thing I want to talk about is stereotype threat. Say you're in an environment where there's a negative perception of you. For example, women aren't perceived to be quite as competent with computers as white men tend to be. In that environment, women may feel they have to justify themselves more, that they won't be taken seriously, or that they'll be spoken over. With that negative connotation there's a lot more work to do, so women might just remove themselves from the situation, which makes sense: it's not pleasant to be in an environment where you're not seen as credible or valued. This suggests that the reasons for not going into STEM extend further than simply not being interested, which seems to be a common assumption.
It seems to be the first explanation people reach for: if you ask a professor who doesn't know much about this, they'll just say that women aren't interested in going into computer science. So that study really debunks that. The next thing is pinkwashing. When people are doing outreach, it's really tempting to create groups for girls and choose activities girls might be interested in, like cooking or sewing. But this can actually perpetuate stereotypes and increase isolation and distance from reality, because it's not a real view of what computing can be like, or a true slice of the different areas you can go into. Next, I'm going to talk about what's happening in schools in New Zealand. This is a list of topics being taught in some high schools in New Zealand. It's not in all of them yet, and it's not compulsory; my high school, for example, did not teach digital technologies or programming at all. But it's a really good overview of the different subjects. The DT number gives the level: level one, level two, and level three at the end of high school. Students coming through this are really changing the shape of what's happening at universities, and the structure of university courses is slowly changing. For example, if you haven't done physics in high school but you want to do engineering, there's a bridging physics course you can take. At the moment there's nothing like that for computer science, just the base degree with no bridging course. But they will start introducing bridging courses and then extend from that, so that way they can skip some of the basics, which should change the environment quite a lot. The next thing is that computer science is coming into primary schools. I actually live with somebody working on this.
She's my housemate in a share house: her name is Caitlin Duncan, and she is doing her PhD on this at the moment. It's showing some really positive results, which is awesome, so it looks like this will happen; in the UK they're already doing it. Hopefully they can get in before stereotypes are formed, which some studies suggest happens at about six years old. CS Unplugged gets a special slide all on its own. I'm not sure if you've heard of it, but it's a curriculum developed by Tim Bell: a series of activities for learning things like computational theory and algorithms away from the computer. For example, I did one activity where we broke into groups, each of us was given a number, each group did a different sorting algorithm, and we had to act out what that algorithm was doing, which is a really good way to understand what's happening. There are heaps of activities like this. They're targeted at primary school students, but they're good for everyone, right up to adults. It's also a really good way of separating out what computer science is, as opposed to just programming. So now I was getting to the end of my degree, I'd learned a bunch about the problems with the environment and the difficulties with outreach, and I really wanted to give back. So this is the research project this talk is based on. It was a summer research project that was pretty much self-directed: I told the university I wanted to do this, and the faculty registrar worked out a way for me to get funding. Because it was self-directed, I didn't really have a deadline, so it took me months longer than expected. But it turned out really well in the end, and lots of people have read it and really appreciated it.
By the end of the project I had a list of 20-plus people who wanted to be updated on what I was finding, which was really awesome. So it was a pretty good experience, but my university still wasn't really sure about it, and some people didn't want me to do this research, so there was a little bit of contention there as well. These are some graphs from my university. The blue and red one on the side shows the number of graduates: blue is male, red is female. As you can see, not very many women are graduating, but that number is slowly increasing by a tiny amount each year. The red and green graph at the top shows the number of women and men enrolled across the whole university. There are quite a lot more females than males overall, yet in computer science, the graph at the bottom, women make up only about 20% of enrolled students over the same period. Note that these numbers come from the enrollment forms, where students only have the option of choosing male or female to describe their gender, but there is a push to change that so people can select how they identify. So, in summary, these were the main themes that came out of my research. Having no prerequisites is really good; that was something that convinced me to change into computer science.
I remember asking the course advisor, really pushing: are you sure no prior knowledge is needed? I really didn't want to go in there with all these people who knew everything. It turned out it was a bit like that, and in the end I felt prerequisites should have been required. But it was helpful that they started from ground zero; it was just the other students who made it more awkward. Allowing students to choose subjects is really good, because it gives them a sense of agency. Working together in the first year also helps, and I'll describe that a bit more in a moment. Doing research projects, letting students select something they're interested in, and giving them mentorship from people within the faculty is really good. Then there's students marking other students' work. Because my university was quite small, students physically marked off other students' work. When we finished an assignment, particularly in first and second year, they would come around and ask you to make a change to your assignment, and that was part of the marking schedule, to show that you'd learned. It also gave the tutors an opportunity to notice if a student was struggling. For example, I really hated getting help; I would always say, nope, I'm fine, keep moving. But with this, help was kind of forced upon me, which was really good, because I really needed it and it would have been too hard to ask otherwise. And paying attention to the student environment is really important, like noticing when students feel isolated.
Social good and culturally responsive computing: this is what I mean by a possible FOSS opportunity. Universities working with FOSS projects could be a really good way to enable students to work on something that has an actual impact beyond their coursework, which can also be really valuable in helping students realize their worth and that they can give back to the community so soon. Pair programming is also really good, and so are good course descriptions. The course descriptions at my university are really vague; you have no idea what's going on. When I enrolled in computer science, I had zero idea what it even meant, other than that I liked programming, so I felt like I was taking quite a leap. Good course descriptions can help people understand what a subject is about, especially people who don't know much about it.
Another thing was that mentoring is really great, but it's also very difficult. I think in tech in particular it's quite common for people to find somebody they can ask questions, who acts as a casual mentor, mostly because just saying "go and google it" doesn't help in a lot of cases and doesn't help students feel more empowered. There was a really good study I'm going to describe: in Germany, they experimented with an online mentoring system for girls in STEM subjects. The average age of the female mentees was 13.5 years, and the program's wait list was used as a control group. The program ran for one year and returned pretty promising results. It showed that students with a mentor are more likely to sustain their interest, whereas students without a mentor lose interest. So it's not necessarily that mentoring makes students more interested; it's about sustaining interest in STEM and having support there when needed. Next is pair programming in more detail. These are some of the key benefits the study found pair programming helped with: improved critical reasoning, and increased student retention, especially for women.
It encourages appreciation of diversity among students, encourages professional and social development, improves student confidence in programming, especially among women, and improves the quality of the programs students write. Particularly early in the degree this can make a huge difference, compared with people going away, working on assignments by themselves, struggling, and then giving up, which is exactly what you don't want to happen. So pair programming seems to be really valuable, but it needs to be actual pair programming, as opposed to just dividing and conquering, and it works best when students are paired with someone at a similar level. The next one is that conferences are awesome. This is something Harvey Mudd encourages in particular, and it has definitely helped me. Before I went into computer science, I went to a conference all my friends were going to, where tickets were really cheap for students: Kiwicon, a hacker conference in New Zealand. I read the description on the page, it was really funny, so I went along having no idea about anything at all. What I found were all these people who were interested in very technical things but were so casual about it, and it was so cool. I really felt like I'd found my group of people, and I've gone every year since. In my sixth year of going, I actually gave a talk there, which felt really cool, and by that point they had 2,000 people attending, so it was a pretty surreal experience. Going to those conferences and meeting people really helped me feel more connected to the wider community, and showed me there are really cool people out there, not just the students I was interacting with day to day.
One conference in particular was linux.conf.au, held in Geelong, which I went to in 2016. I met some really cool people there who essentially encouraged me to write this talk about the research I did. I was very excited when my talk got accepted, and it also started a little bit of a panic, since my research had been mostly focused on what's happening for students at university and in schools. So I did a survey to see what people thought: whether people contribute to open source or not, why they don't, and how they got into it if they do. This is what it looks like. I tried to phrase everything in a way that would counter biases. For example, if somebody is a non-contributor, asking them why they don't contribute can imply that they should be contributing and that it's bad that they don't. So in the description I included example answers like "I don't actually contribute to open source projects because I don't really have time" and "I don't really know which projects I want to get started in" as a reminder. At the end, I asked people to confirm whether they were comfortable with me referencing their responses. I have a little disclaimer: I'm not a real statistician, this is all pretty rough, it's a qualitative study, and I make empathetic decisions, which means the data is going to be a little bit skewed. But trust me, I've definitely got the gist of it; it was just for fun and not very good science, so please don't @ me on Twitter about my bad science. But also, impostor syndrome was manifesting: I realized that what I'm doing in this casual survey is probably more than most people would have done. I think what can happen in cases like this, or with talks like this, is that people don't really connect with the audience that way. I didn't quite realize it at the time, but going this extra step
was actually really, really good, and I got some interesting results. This shows how many people contribute or not. You can clearly see the bias in the audience who responded: lots more people said yes, they do contribute, than said they don't. So that's a good indicator of what we're looking at and the circles people are from. I shared this on the Systers mailing list and with a few women in tech groups, so there is representation from women in tech and other marginalized circles; otherwise I just shared it on Twitter and Facebook. Here were the common barriers to contributing. Oh, also, I don't think I've mentioned it, but I got 255 responses, which is way better than expected; I thought I'd get maybe 10. So heaps of people wanted to share, which was great. The biggest barrier seemed to be a lack of direction or a mentor. I thought time would be the biggest barrier, but as you can see, that's the second value there, and it's actually quite a small proportion: less than 20% cited it as a reason they don't contribute. Lots of people felt they didn't have enough skill, and the social environment in open source projects can be really intimidating. "Environment setup is too hard" is definitely one I relate to a lot; I only have a certain amount of time and patience for fiddling around installing millions of packages and dependencies. And people have other interests in their spare time. This is how people got into contributing, for those who do: running into bugs or adding functionality they need seems to be a really common one, and lots of people contribute through their
job, so they actually get paid to do it, which is great. It's also really common to get into it through a friend referring them or showing them, and then there's university down the bottom: working on it through university projects. Of the people who said they do contribute, these are the things they said were the main barriers for them. In this case time was the biggest one; others were environment setup and unwelcoming communities. Some people talked in the survey about a really bad experience, and some even helpfully recommended projects not to go anywhere near because of the people on them. The other one was lack of skill, or not being confident that they could even contribute. Some other little things came out too. Seven people got started thanks to Google Summer of Code or other women in open source programs. Five people got into it thanks to Linux CDs found in cereal boxes, randomly enough. GitHub was mentioned 15 times. Only three people said they were primarily drawn to open source because of freedom and ethical values. Six people said starting was easy, but 21 people described barriers to getting started, so for some people it definitely seems to be a lot easier than for others, which I guess is to be expected. There was also this one: "As a UX designer, it's not clear how we can contribute or if our contribution is even welcome." Things like this are quite interesting, because I think open source communities would really benefit from having people who aren't just developers, and I feel like having UX designers or people from outside that community would
add a huge amount of value to those projects, and probably improve the uptake of open source software among casual users, as opposed to proprietary software. So I thought this was quite telling. I also had a look at other studies that did actual research on this. This is one of the diagrams they showed. I'm not going to go into all of these; the red ones are social aspects, and I think in total there are around 35 possible reasons why people have a bad time contributing. These are the barriers for newcomers to open source software. One of the biggest themes, which a lot of people mentioned, was that flame wars are bad, which makes you wonder why they're so bad. There seems to be this thing where if you're writing code, you have to justify it, and there's a strong social aspect, which means you get into these discussions on the internet. Some people are effectively anonymous, and because you're not talking face to face, people are more likely to say rude or offensive things. There were some interesting findings. For example, women who witness an argument are more likely to be affected by it and have physical responses of anxiety, whereas men watching an argument tend to find it funny and just laugh at it, unless it's directed at them, in which case they get very upset. Also, women and people from some cultures don't like bargaining: rather than bargain down the price of a car, they would rather just spend the extra thousand dollars, and there was a study that showed that. So it can be really tough. This one here is a particularly annoying problem; I'm sure
you're all familiar with this one: "read the fucking manual." This is just a really bad way to onboard people or make people feel welcome in projects. There are also a lot of discussions where, when people get into this topic of gender in these communities, they often say, "Ah, but it's just a social construct, and gender doesn't exist here," which is absolutely not true. That can also be quite a problem, where people form into these arguments. I think even recently there was a tweet from the founder of Docker saying, "What if we had a totally anonymous version of GitHub?", which is really not a great idea. For one thing, it's not really going to be anonymous; that would just be too difficult to do. The idea behind it is that if you have just code, then nothing else matters, and therefore diversity is not a thing. But you're still going to have the problem where people who are non-English speakers struggle to communicate and contribute, and there are going to be biases against that. And code doesn't speak for itself, even though that's another popular belief. Like I said before, you have to justify your code, and essentially what seems to happen is that the more loudly you justify it, the more likely it is to be accepted that your code is better, which is not really great. There's also this conflicting idea: some people say good code works and bad code doesn't, but then you might also have beautiful experimental code which is pretty unlikely to work but could be seen as pure genius. All of these conflicts are essentially settled by flame wars, and it's really not good for people to have
this happen. This is an example of some community guidelines that I've seen which I really like, which outline what you want to focus on: focusing on experiences and practice seems to be a lot better than focusing on interests and identity, because as soon as people get really tied to things as part of their identity, disagreements become personal attacks, which is really not great. It also breeds contempt culture. I'm not sure if you're familiar with the term contempt culture, but there's this really amazing blog post about it written by Aurynn Shaw that I absolutely recommend reading, because it talks a lot about that whole "PHP really sucks, people who program in PHP are just the worst, they're such bozos" attitude. Stuff like this just doesn't help anyone. It's treated as a kind of currency, in the sense of "I've got to show that I know a lot, so I'm going to push down everything else," and it just perpetuates more shitty stereotypes. All right, moving on: where will the newbies come from? First, youth coming through now that digital technologies is taught in schools; people learning computer science at school will probably mean more students coming through who have been exposed to it. There may be senior people who struggle to find work; this seems to be becoming a little more common, with people doing career shifts. However, there is the problem of ageism, so this may apply more to open source communities, where people can hide their identity a little better. Then there are people who are going through a career change, and people from countries that may be going through some serious stuff right now. For example, some people are showing forms of resilience, or being
able to do some kind of work in countries that are affected by war and things like that, so that's quite an interesting side effect. So here are the key takeaways, the real takeaways. The point is that you exist in a social context, and tech is not a meritocracy. Flame wars absolutely need to be reduced. There are some concrete ways in which you can try to reduce them: I saw a really good talk about that at a Linux conference this year, and there are definitely some really good resources online with strategies for reducing them. Have useful documentation and guides for beginners: having a good starting point, or somebody that you can talk to, or people who are generally friendly and don't make assumptions, is really important. And also remember to welcome non-coders, or even just talk to non-coders about what you do, and explain it in a way that they can understand without lots of jargon, because they could actually be really interested; they just might be a little bit intimidated. Anyway, thank you very much. Sorry that I went a tiny bit over time, but I hope you enjoyed it. I'm going to stop sharing my screen now. Let's go, stop sharing. All right, yay, thank you. Let me just check my inbox, because I think you can send me questions there. Okay, no questions. Did anyone there have questions that you wanted to ask me or anything? Nope? Maybe. Thank you, I'll prompt everyone: does anyone have questions? All right, here we go. Can you hear me, Grace? Yes, I can. You said that when you were doing your recent gender parity research, you faced some opposition. What kind of opposition did you face for the research you were interested in? So one of the things that happened was that early on, I proposed the idea of doing this project to a bunch of different people
including the head of department at my university. The head of department was like, "Oh yeah, but maybe we should have somebody who is from sociology, or somebody who's better at it." Also, my grades were all over the place, and this lecturer knew that, so it was, "Maybe you want somebody who's more reliable." That's kind of what he said, in not so many words, so it was made fairly clear. At first that seemed like a bit of a knockback: "Oh okay, the head of department doesn't want to do this." Some people will just say things like that, which wasn't so good. But I was lucky that the registrar of the faculty, who essentially organized this, took care of that and did it on the side, if that makes sense: she managed to find funding through a slightly more indirect route, which enabled me to do this project. But yeah, it kind of sucks when the head of department is like, "No, we're not interested in this." More questions? There's one at the front. Hi Grace, that was super interesting, thank you. Thanks! I'm really interested in what you're doing now as well. You graduated recently and you did all this research into gender disparities and situations like that in New Zealand, but what are you doing now? Are you still furthering any of this, or do you do anything like it in your job at the moment, at your company? Yeah, good question. The answer is kind of not really. All of this, most of the research that I've done, is on the side, so it depends on what kind of opportunities come up. I've definitely found that submitting a talk and then having promised to deliver something really helps push my research along. But I guess I'm
kind of, well, I've got a few side projects, but I'm mostly focusing on my own technical development and trying to improve in that way. I did recently write a blog post about command line tools for beginners, which I can send you if you're interested; I made it really funny and tried to make it really welcoming. So I do stuff like that on the side every now and then. But I've been taking a bit of a break from gender research in particular, because it's quite hard work, especially when there are people who are not necessarily interested. Also, my organization is a little bit different in the sense that they only have a really small development team, so it comes with a whole different set of challenges, and I'm a bit more disconnected from some of the other difficulties in tech. At university, for example, the problems were right there in my face: everyone was miserable and I was faced with that every day, which really gave me the impetus to research more, if that makes sense. I don't know if that answers your question, but thank you very much. Hi there, thank you for taking the time to speak about all this. We were wondering, what was the name of your book? Oh, so the research project that I did, let me just find it, was something like gender parity in computer science at the University of Waikato, I think. Yeah: Recommendations for Reaching Gender Parity at the University of Waikato. It's about 30 pages long or something, so it's quite short; it's a relatively small research project. But if you email me, I'll send it to you as well, if you're interested in reading more and checking out the references and stuff, because there was heaps of
information that I couldn't include in this talk. Great, thank you so much, Grace. Really appreciated that talk, super interesting. We're going to go ahead and move on to Chelsea's talk on data science in Discovery. All right, awesome, thank you for having me. Of course, thank you Grace, bye! Hi, hi girls. My name is Chelsea, and I'm a data analyst in the Discovery department at the Wikimedia Foundation. Today I'm going to talk about our data science workflow in Discovery. Before I start to talk about my job, let me give you a brief introduction to what the Discovery department does and what data analysts do in Discovery. The mission of the Discovery department is to make the wealth of knowledge and content in the Wikimedia projects easily discoverable, and we are responsible for the following projects. First, we maintain and enhance the search features and APIs for MediaWiki. Second, since many people discover Wikipedia through wikipedia.org, we are also responsible for improving the user experience for those visitors. Then we are also trying to include OpenStreetMap tiles in all the Wikimedia projects, and lastly we are building the Wikidata Query Service to help users search structured data on Wikidata. As data analysts in Discovery, we are responsible for providing ad hoc analyses and reports as needed. Our analyses range from exploratory data analysis to model building for some complicated problems. We also build and maintain dashboards for tracking key performance indicators and other metrics. We also consult with teams on the design of A/B tests, and then analyze and report the results. Lastly, we work with engineers to design and implement EventLogging schemas. We use R for almost all of this. Next, I want to focus on two main parts of our job to show you how we use R in our day-to-day work. The first thing I want to talk about is our dashboards. We build all our dashboards using R's Shiny
package. Our dashboards contain everything from API usage to direct user interactions, and provide data for internal and external use to see how well we are doing. This graph shows how we create our dashboards from the raw data. We have two main data sources. One is web requests: the logs of every HTTP and HTTPS request made to our Wikimedia sites, which we store in a Hadoop cluster. The other is EventLogging, which tracks user-side events; we use it for things like how often users click through to our search results, and we store those events in a MySQL database. We then use SQL or Hive queries to pull the data from our databases, do some aggregation, and remove the personal information from the data, and sometimes we use R to do some further processing. These scripts are scheduled to execute on a daily basis, and we publish the aggregated data to datasets.wikimedia.org. Finally, on the dashboard side, we pull the data over HTTP from datasets.wikimedia.org and use Shiny to build our dashboards. The Shiny package itself already provides a lot of widgets and basic graphs for interactive visualizations, but there are a lot of other packages out there that let you create very beautiful interactive graphs. Let me show you some examples from our dashboards. This dashboard shows the traffic to wikipedia.org, broken down by country. On the first row we put a widget to let users select the date range, and you can also select the metric you want to show. Then you can select the group of countries you want to show, and you can even customize it by typing into the box here. You can also select the smoothing method here. The first graph here is a pie chart we created with a package called highcharter. On the outer ring are all
the countries that we have traffic coming from, and in the tooltips you can see the name of the country, the number of visits from that country, and the proportion. I want to mention that this is not the exact number of visits to wikipedia.org: when users navigate to wikipedia.org, we randomly select sessions to be tracked by our EventLogging schema, so these are just the numbers of sessions that have been tracked. In the inner circle we group the countries by continent, and you can click on a continent to show a pie chart of the countries within that continent, with their proportions and numbers of visits. Since much of our traffic comes from the United States, we also break down our traffic from the United States by region. Next, this line graph shows the time series for the number of visits. We created this graph with a package called dygraphs, and there are a lot of things you can play around with: you can select the date range you want to show, or select a section of the line to zoom in on it. We also put a data table here, created with the DT package. You can sort the table by whatever metric you want, you can filter the table by typing in the search box here, and we also put a button down here to let you download this data table as a CSV file. Lastly, we use some Markdown files to document what we do on this page, like what these metrics mean, how we compute them, and whether there are any issues with the data. Besides the examples I just showed, there are a lot of other packages as well. For example, these small sparklines here are created by a package called sparkline, and you can even embed these sparklines in a data table. So Shiny is very powerful: it allows data analysts to create fully functional dashboards without knowing any JavaScript, and there's a lot of great
documentation for Shiny, so you can check out their tutorials, articles, and galleries. Shiny is also very customizable: with a little bit of JavaScript and CSS code, you can make it whatever style you would like. The next thing I want to talk about is our research and testing. Every project is very different, but in a typical data science project there are some common steps, and there are some packages that you use for almost every project. We normally start by retrieving data, and we use the readr package and the wmf package for this task; wmf is our open source package for querying our Hive and MySQL databases internally. Then we use dplyr, tidyr, data.table, and many other packages to refine our data, and after that we use ggplot2 to do exploratory data analysis. I want to mention that most of the packages in the first three steps are now in a collection called tidyverse, so you can just load tidyverse instead of loading these packages individually. When we come to the modeling and analysis step, the packages we use really depend on what the problem is; we use binom, BCDA, randomForest, and others to solve different problems. For example, when we analyze A/B tests, we always use the binom package to compute a Bayesian credible interval to see if a metric is significantly different between our test group and control group. Finally, we use the rmarkdown and knitr packages to create reproducible reports. Next, let's look at some examples from our job. The first example is an exploratory data analysis of how long users stay on wikipedia.org. Like I mentioned before, when users navigate to wikipedia.org, they are randomly selected at a 1 in 200 rate to be tracked by our EventLogging schema. If a user is selected, we set a 15-minute timer, and if the user comes back before the session expires, we
will renew the timer for another 15 minutes; but if they come back after 15 minutes, we stop tracking them and clear the local storage. The 15 minutes was just an initial guess when we put the schema together, so the purpose of this project is to provide a reference point if we want to adjust it. You can see here a density plot of our session lengths. The most common session length is about 10 seconds, and the majority of sessions are shorter than one minute. And here is the link to the full PDF report, where we put a lot of detail about how we processed the data and a lot of the exploratory data analysis results. We also break down the session length by clickthrough and by the number of visits, and here we plot survival curves for the top 30 languages. Each grey curve is the survival curve of one language, and the black curve is the smoothed median. We also highlighted the English-language sessions in blue and the Russian-language sessions in red. This point shows that 48 percent of English-language sessions lasted longer than 10 seconds, whereas 66 percent of Russian-language sessions lasted longer than 10 seconds. We can see clearly that the blue line is under the black line, which means that English sessions are shorter than average, while Russian sessions are longer than average. There are a lot of other results down there, so feel free to check out this report. We also put our whole analysis codebase on GitHub. You can see we organize our code by the steps in our workflow: this is the script to retrieve data, this one is for refining the data and exploratory data analysis, and we also include a header.tex file in our R Markdown report to select the font and the colors we want to use in the PDF report. Okay, so that's the first example. The next one is an analysis of an A/B test. Our Discovery search team wants to
improve the relevancy of our search results, and we wanted to try a new search ranking function. We did a first A/B test on English Wikipedia, and it turned out the new ranking function works very well: we show our users more results, and our users engage with the results more. But then our engineers realized that this ranking function might not work very well for languages that don't use spaces to separate words, like Chinese, Japanese, and Thai, so we decided to do a second round of the A/B test. We are interested in comparing the following metrics between our control group and test group. The first is the zero results rate, which is the percentage of searches without any results. We also compare PaulScore, which is a measure of relevancy based on clicks. And of course we are interested in user engagement, so we compare the clickthrough rate and the position of the first clicked result. We also want to know, after users click through to a search result, how much they like it, so we look at the time they spend on the visited page and how often they scroll on it. Finally, we want to see how often users reformulate their query, because we assume that if users are satisfied with the search results, they will be less likely to reformulate their query. We put the full report for this analysis on GitHub Pages, and you can open this link. This is the HTML report we created. It includes the background, how we processed the data, the aggregation tables, and of course the results. We can see that the zero results rate for the test group is lower, which means that we are showing our users more results. However, when we come to PaulScore, the test group's PaulScore is worse, and the clickthrough rate for the test group is much worse, which means that we are showing our users more results, but they
might not be very relevant, and we should not implement this search ranking function for those languages. At the top of the page we put a link to our R Markdown source and analysis codebase. Compared to the PDF report I showed before, R Markdown lets you add more features to an HTML report. For example, here you can let users show or hide the code, and we added a little bit of JavaScript code to let users click on an image to see it at full resolution. We also use a package called captioner, which gives us finer control over captions: you can add captions not just for figures but also for tables and code snippets, and you can even reference a figure within a figure title when combining it with ggplot2. So that's it. I think in every data science project there's roughly an 80/20 rule: about 80 percent of every project can be solved with the same tools or the same packages, so if you have a solid foundation in those tools, your work efficiency will improve a lot. I also want to recommend a book by Hadley Wickham, R for Data Science. I think this book covers many of the packages that we use for that 80 percent of the job, and it has a lot of great details and examples. And that's all. Any questions? Thank you. The goal of our group is to make Wikimedia and the articles on Wikimedia more easily discoverable. For example, like I talked about, the main job of our group is improving our search engine, so we are trying to help our users find the information they want more easily. And with the other projects, like the ones related to wikipedia.org, we are trying to give users a better user experience on wikipedia.org so that they will be more likely to click
through and read articles on Wikipedia. Yes, yes. Yes, actually, most of our traffic comes from Google. Hi Chelsea, thank you for the presentation. Thank you. I have a couple of questions, if that's okay. One more or less relates to her question: I was wondering if you could elaborate a little bit on how the analytics group is organized within Wikipedia. What's the composition of the group? Is it centralized or decentralized, and how does it work? That's my first question. Currently, I think we only have four people whose title is data analyst: we have two data analysts in the Discovery department, one in the Editing department, and one in the Reading department. So we are kind of decentralized, but of course we talk to each other all the time. We are also thinking that maybe things will change and we may become a centralized data science team in the future, but I don't know yet. And my second question is more technical, more of a curiosity. You mentioned that you have been using OpenStreetMap tiles; could you elaborate a little more on that? Actually, I'm not very familiar with that project; we have another data analyst, and I'm not the main person who supports that project. But if you're interested, you can check out our website, where there's more detail. There's a page for the Discovery department, where you can see what the Maps team is doing, and I think they have some demos of their work there as well. Hi Chelsea, that was a very interesting talk, I got a lot from it, so thank you very much. I just have a quick question. I'm very interested in the usage-time project, where you tune when you check whether users are still idle or active on the web page. Is that project public on GitHub and available for public consumption, or is it more just for you? Yes, it's public. You can check out all of our code
and our R Markdown files. And that's from the Wikimedia page? Yes. If you're interested, I can share the slides; I'll put a link to the slides on the Meta page, and on the slides there's a link to that report, so feel free to check it out. Thank you very much. Hi, thank you, I really liked it. I'm working through R for Data Science now, and I was wondering if you had any recommendations from your personal experience of building up experience and getting to the point that you're at now. Do you have any open data sets that you recommend working with, or ways to build the expertise you talked about with the different packages? Just any suggestions you have for gaining momentum. It's kind of an open question: maybe how you started, or just some projects to get started with. Projects to get started with... Well, my background is statistics, and I was trained to use R when I was in school, so I learned R at school, and I might not be able to give a lot of advice on self-study. But I think you can try something like... have you heard of Kaggle? Yeah. I think there are a lot of open data sets on Kaggle, and there are some competitions there, and a lot of people write tutorials on how they solved a problem. As a starting point, you can just read the tutorials those people write, follow their code, and try to replicate their analysis. I think it's a good way to learn at the start. Okay. Are there any people that you follow right now, people whose work you follow? Right now, not really; I mostly learn on the job. Okay, thank you. I was just curious what you do in your department to review each other's code, or how you check the quality of your programming. Yeah, so there are two data analysts in my
department, me and the other person, and we always review each other's code and each other's reports. Is that what you wanted to know? Yeah. I think that's very important, because we always make stupid mistakes, and it's always good to let others check your work. Hi, thank you so much for the talk. Thank you. My question is: are those Shiny dashboards you were showing earlier open to the public? Yes. Okay, cool. I also put the link to all our dashboards here; we have five dashboards in total. Hey, so I have a related question. Usually when you make Shiny dashboards, you have to run them in R; you have to have an R process running underneath, and it doesn't just export as HTML. I don't know if this was something that you did or someone else who knows more about web stuff did, but how do you put your Shiny dashboards on a web page? Right now we run our Shiny dashboards on a Shiny Server in a Vagrant virtual machine on one of our servers. Other colleagues at the Wikimedia Foundation helped set up this whole thing, but the other data analyst and I are also learning: when there's a bug, we try to solve it ourselves, so we are learning a lot about how to maintain it too. So how do you define a session? Because you're measuring the amount of time per session, how do you define a session? Is it just the amount of time I stay on a web page after I open it, or what if I switch to another page? Or let's say I'm browsing Wikipedia, then I go to bed, and the next morning it's still there; do you consider that entire thing one session? I'm sorry, can you say that one more time? Sorry, the session: because you had a plot of the amount of time spent per
session, right? You're saying 10 seconds maximum; a lot of people mostly spend 10 seconds per session. Yeah, my question is really, what is a session? How do you define a session? So, when a user... you can roughly take a session as a user: when a user navigates to wikipedia.org, that starts a session. Yes, but my question is, as I said, if I'm browsing Wikipedia, I go to wikipedia.org or some other page, and then I go to bed, and the next morning it's still there, how do you account for that? That's my question. So if you leave, then after 15 minutes we stop tracking you, and when you come back the next day, if you're selected again, we start a new session. I see, I start a new session the next day. Okay, thank you. And if you come back before the 15 minutes are up, we renew the timer, so one session can be longer than 15 minutes. But if you stop browsing the page, then after 15 minutes we won't track you anymore. So if you're just leaving it open, we stop tracking you; but if you're browsing and you have some action, like scrolling, we can track the scroll action. Is this data set public, the wikipedia.org one? Yes, yes. There are actually many other data sets which are public. This data set is public, so if you are interested, you can do any research you want using it, and I think the Foundation also publishes many other aggregated data sets somewhere else. I can't remember off the top of my head right now, but there are many other websites that publish our data sets, so feel free to use them and do whatever research you find interesting. What do you mean by content? No, I don't remember anyone doing such an aggregation, but I think it's a very
very interesting topic, and I think someone is doing some research on it, though I can't remember who. I think with the public data you will be able to do that. But of course, the categorization of Wikipedia pages is a mess, so we are actually working on it and trying to make it more structured, because many of the categories are created by humans and they are not very structured. Any other questions? Any other questions for Chelsea? Thank you so much, Chelsea, that was great. Thank you. So now we'll open it up to networking. Our director of recruiting, Amy Elder, is sitting right here in the back, so if anyone wants to swing by, say hi, and maybe ask questions, please do, and our senior technical recruiter is here as well, so feel free to swing by and say hi. Thank you so much for all coming on a Tuesday night. We really appreciate it, and we hope you enjoyed the talks.
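To make the session rule from the Q&A concrete, here is a minimal sketch in Python. It is not the Discovery team's code (their analysis is done in R with EventLogging data); it only illustrates the rule as described in the talk. It assumes each sampled visitor's activity arrives as a list of event timestamps in seconds: an event that arrives less than 15 minutes after the previous one renews the timer and stays in the same session, while a longer gap ends the session. It simplifies the real setup, where a returning visitor is only tracked again if they are re-sampled at the 1 in 200 rate. The function names are hypothetical. The last helper computes the kind of number read off the survival curves: the share of sessions lasting longer than a threshold such as 10 seconds.

```python
from typing import List

# Timeout described in the talk: a return within 15 minutes renews the timer.
SESSION_TIMEOUT_S = 15 * 60


def split_sessions(timestamps: List[float], timeout: float = SESSION_TIMEOUT_S) -> List[List[float]]:
    """Group one visitor's event timestamps (in seconds) into sessions.

    Hypothetical helper: an event arriving less than `timeout` seconds after
    the previous event renews the timer and stays in the same session; a gap
    of `timeout` or more ends the session and the next event starts a new one.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] < timeout:
            sessions[-1].append(t)  # timer renewed: same session
        else:
            sessions.append([t])  # first event, or timer expired: new session
    return sessions


def session_length(session: List[float]) -> float:
    """Length of a session: time between its first and last tracked event."""
    return session[-1] - session[0]


def share_longer_than(lengths: List[float], threshold: float) -> float:
    """Empirical survival value: fraction of sessions lasting longer than `threshold` seconds."""
    if not lengths:
        return 0.0
    return sum(1 for x in lengths if x > threshold) / len(lengths)


if __name__ == "__main__":
    events = [0, 5, 400, 1200, 3000, 3004]
    sessions = split_sessions(events)
    lengths = [session_length(s) for s in sessions]
    print(sessions, lengths, share_longer_than(lengths, 10))
```

For example, the events above split into two sessions lasting 1200 and 4 seconds, so half of them last longer than 10 seconds. Measuring length from the first to the last tracked event is only one possible definition; a real schema might also count time until the timer expires.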