It's time, so we can get started. Hi, my name is Callie Doffy and I am a data scientist in Red Hat's Open Source Program Office. Today we are going to be talking about community metrics, and really breaking down what to measure and why we want to measure it.

Here is what we will be discussing today. First, what is the value of community metrics and what can they bring? Second, we'll go into the methodology behind generating community metrics. And lastly, we'll go into the analysis lessons learned and the things that I would really like you to leave with today whenever you go about creating your own community metrics.

So first: what can strong community metrics enable? It really comes down to keeping up with our communities and the others that we care about. I'm not here to tell you that data analysis or automation is going to be the one thing that informs your community decisions. It's actually the opposite. This is to build on your own open source community knowledge and incorporate others'. Maybe you have somebody on your team who is really good at establishing community initiatives or meetups, and there's another person who's great with CI/CD. Whenever you collaborate on metrics, you can start to incorporate everyone's unique perspectives and take those into account.

Next, we all have a thousand and one things to keep up with every single day, and it never feels like we have enough time. If getting an answer about your community is going to take 10 or 15 hours, you're not going to do that regularly, if ever.

And last, there's so much data around repositories and every single aspect of our communities: IRC channels, all of the above. It can be really overwhelming trying to consider it all. Okay, I have all of this information, and there's so much pressure to be data driven. How do you take all of that and pick out the little needles in the haystack
to figure out what's going to bring you value?

So the first thing we want to consider is what perspectives we want these metrics to bring to you or to share. First is considering whether we want the metric to inform or to influence action. Is there an area in your community that is not really understood, and you're trying to take that first step of getting there? Or is there an initiative that you're trying to decide on, or are you measuring an initiative already in place?

Next is looking into whether we want to expose areas of improvement or highlight strengths. There are going to be times when you're just trying to hype up your community and show how great it is, especially when you're trying to show business impact or advocate for your community. But when it comes to informing yourself and trying to decide what areas you're going to improve, identifying shortcomings is where you're going to get the most value out of your metrics. There's no problem with highlighting strengths, but there's a time and a place. Don't use metrics and visualizations as the yes-man inside your community that always tells you how great you are. But there are times for it, and sometimes a morale boost or recognition for what people are doing great is also important.

The last thing we're going to look into is whether you're trying to show community impact or business impact. The language that many businesses speak is numbers and data, and it can be incredibly difficult at times to advocate for your community without having this behind you for people to listen. It can be a good way to bolster your points and, like I said, meet them where they are. But there are also times when we want to show the impact of your community on the open source ecosystem in general, or how the community aspect impacts your code.

With all of these, it's not always an either-or situation of what you're going to get out of your metric. But framing what your goal is is going to make it so that you have a much more deliberate
visualization whenever you're trying to establish your goals.

Now we're going to take a little bit of a step back, from just looking at community analysis to data science as a whole. When talking about general data science or machine learning work, some version of this workflow is what you're going to see people describe. For this presentation, we're going to be focusing on that first step, codifying problems and metrics, with a little splash of the second. From a data science perspective, this presentation can be viewed almost as a case study of these first two steps. These steps are often overlooked, which is actually one of the main inspirations for this presentation. But the true value of your metrics can come right here. You don't wake up one day and just know exactly what you want to look into.

So let's go into the true focus here: codifying problems and metrics. I know tooling is something that people love to debate, and I am one of them, but that is for another time. Come talk to me afterwards and I would love to go into it. What we're really going to break down here is this: let's truly start figuring out what you want to know, then go into what data you have, and get to our true goal of thoughtful execution of data analysis.

So let's go into our analysis angles. These are some different scenarios for metric and visualization building. These angles are not the only ones, just the main examples for this talk, and they can be applied generally.

Let's go into our first scenario. Let's say you've already done a little bit of data analysis. You've gone down this path.
You started looking at things like the number of contributors over time, or commits, and you've started to figure out what's generally useful for your community.

So let's look at the number of contributors over time. Say you know that at this point you have a hundred and fifty. That's great. You know that this is the number of people who have been in your community over time. But how can we take this a step further? What if we decided to break it down into active versus drifting contributors, where active is somebody who has done a commit or some type of contribution in the last month or two (you can put whatever time interval you would like there), and drifting is somebody who has not? You can start to see the breakdowns in your community. Is there a large group of people leaving? Is there something going on there? Or is your active contributor base staying strong and consistent?

Another breakdown of this is looking at repeat versus flyby contributors over time, with flyby defined as somebody who has made fewer than, we'll say for this example, four contributions. So whenever somebody comes into your community, how many people are actually becoming active members, versus somebody who is coming in, maybe opening an issue or making a comment, and then leaving? You can actually get some real insight into what these contributors are doing.

Another example of this is commits over time. Say this month you've had a hundred commits and the next month you've had 40. It gives you a little bit of something, but let's try to go a step further here. What if we look at the depth of commits over time?
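As a rough illustration of the contributor breakdowns described above, here is a minimal Python sketch. It is not the tooling used in the talk: the sample data, the function name `classify_contributors`, the 60-day activity window, and the threshold of four contributions are all assumptions chosen to mirror the spoken definitions, and each should be tuned per community.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical sample data: (contributor, date of a contribution).
contributions = [
    ("ana", date(2022, 1, 20)),
    ("ana", date(2022, 3, 9)),
    ("ana", date(2022, 5, 3)),
    ("ana", date(2022, 6, 1)),
    ("ben", date(2022, 1, 15)),
    ("cho", date(2022, 6, 10)),
]


def classify_contributors(contributions, today, active_days=60, repeat_threshold=4):
    """Label each contributor as active/drifting and as repeat/flyby.

    active_days and repeat_threshold are assumed defaults, not values
    prescribed by the talk beyond "a month or two" and "fewer than four".
    """
    counts = Counter(name for name, _ in contributions)

    # Most recent contribution date per contributor.
    last_seen = {}
    for name, day in contributions:
        last_seen[name] = max(day, last_seen.get(name, day))

    cutoff = today - timedelta(days=active_days)
    return {
        name: {
            "recency": "active" if last_seen[name] >= cutoff else "drifting",
            "depth": "repeat" if counts[name] >= repeat_threshold else "flyby",
        }
        for name in counts
    }


labels = classify_contributors(contributions, today=date(2022, 6, 24))
for name, label in sorted(labels.items()):
    print(name, label["recency"], label["depth"])
```

Keeping the window and the threshold as parameters rather than constants makes it cheap to rerun the same breakdown with different definitions and see whether the picture of the community changes.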
Maybe in those hundred commits you've only had a couple hundred lines of code changing (say lines of code is something you really care about looking at), versus those 40 commits changing thousands of lines and many different files. That can start to tell you: do we need to put some more maintainers in here? Is there some analysis that we need to go into to make sure our code base is still stable after such a large amount of change?

Another way we could look at this is commits by a subset of contributors. Is your repository deeply dependent on a small group of people? Is that something that is sustainable? Is that a number that you're trying to change? This can really take you from just knowing, okay, we've had 40 commits in the last month, to some deeper analysis and insight on your community.

The next scenario we're going to go into is community campaign impact measurement. That's a lot of words, so let's break that down a little bit. Let's say that your community is trying to establish a meetup or some type of initiative around bringing in new contributors. How do you start to view your impact and your goals around these initiatives? These two steps actually feed into one another, so let's discuss why. Once you establish your campaign goals, you can start to determine what can be measured to detect impact. In turn, figuring out what can actually be measured can feed into establishing the goals of the campaign. It's easy to fall into the trap of being a little bit hand-wavy and not concrete with your goals, and that can make it really hard to know substantively whether your campaign is making an impact. Here's an example.
One I really like to go with here is some work that I've been doing with the Fedora community. They have a goal of doubling their contributor base by 2025, which seemed like a pretty straightforward goal. But with that, I started discussing with them: what is considered a contribution? Is it just a code commit? Is it a website change? What actually counts for you? And where do you want to look at this? Is it only a subset of repos? Are there chat rooms? What do you actually care about when it comes to doubling those contributors? Defining what we want to look at can actually help the establishment of this initiative: knowing how you want to go about this and where you want to put your efforts.

Scenario three is where we're going to be spending the majority of our time here, as a lot of the prior examples can be viewed as different parts of this workflow. This is a living cycle, and improvements and extensions can always be made. Whenever you get to that point, you can decide whether spending time on those improvements is worth going into deeper.

I don't know about you, but I'm pretty big on examples. We'll first walk through this at a theoretical level and then go into an in-depth example.

The first step here is breaking down your focus area, and this is when you want to bring in the perspectives that we talked about earlier in this presentation. I like to think about this process in three parts. First, let's think about the magic eight ball. I don't know if you ever had this toy as a kid. It was a plastic eight ball, and you could shake it up and ask something like, am I going to win my softball game today? And it would give you some result like absolutely, or definitely not. So what I like to think about here is: for your analysis area, if you could get any answer right then and there, with no limits, what would it be? Now you have that question, that thing that you would really love to know.
Let's talk about the data. From your magic eight ball question, what are any data sources that could potentially have to do with that question, or more generally with that focus area? Really cast your net wide. One thing that's really helpful in this step as well is establishing how accessible this data is. If it's repository data from the GitHub API, that's going to be a little bit cleaner and a little bit more accessible. If you're looking at your IRC data or some type of chat room, that's going to take a little bit more cleaning. Both are good to establish, but it's also good to take into account how much time and effort it takes to get that data to the point where you can do analysis.

So now, with the context of your data and your large-scale goal, what are some sub-questions that can be answered to bring you closer to that proposed eight ball question? You're taking in these questions and saying, okay, with this data, what can I answer directly? They don't all need to add up to your big grand goal. It's actually sometimes better if they don't, because if you're trying to piece together maybe four or five different visualizations and say, okay, this brings me to my goal, you really need to take apart the assumptions you're making when you say these all, plus, plus, plus, equal that. So that's something to take into account.

Next we're going to go into converting each of those sub-questions into a metric. This is a process that you would repeat for each of your sub-questions. First, you're going to want to select the specific data points needed. Even if you're saying, okay, all of my GitHub data: are you trying to look at contributors?
Are you trying to look at issues? And what attributes of those are you trying to look into and integrate into your visualization or metric?

The next thing you want to consider is how you want to represent this data. Is it a bar graph? Is it a percentage? Is it a mean? Really try to break it down: okay, now that I have this data, what is the best way to aggregate it to get us to the insights we would like to have?

The last step here is to try to make a hypothesis of the insights and actions that you would like to come from this, because I think that can also help with the process we'll be going into next.

Once you have that first work-in-progress metric, you're going to go into a repeat process, a process that could go on forever. When you have that first metric, this is when you want to go and start getting some larger community feedback. Your biggest skeptic, the one who is going to pick it apart: that's who you want to bring in here to really start asking questions, so you can go and start improving on this first go-around of metrics. I can say that many times the best ideas come from showing an initial metric and having someone say, oh, what about this? It's actually something that's happened multiple times during this conference, just in hallway talks, and some of the people in the room have been involved in that. There are different ways that somebody's new eyes and new perspective can come in, and it can be very helpful in this process.

So the last step here is analysis in action. We're going to go through three different parts here, and with each of your visualizations you might not do every single one of these, but we're going to break it down.

With this first one, we want to determine if this metric follows what is currently known about your community. If it does, great, but let's make sure to take a step back and see if there were any assumptions made whenever you were generating this metric that made it align with what you had
already known about your community. It's a good time to do a little bit of a sanity check to make sure that some of your internal biases were not integrated into this visualization. And if it doesn't align with your prior knowledge, you have two things that you need to consider. One: was there a data or calculation issue here? Do you need to have someone go through and really digest the code that you have written, or the data collection, to make sure there were no problems? Or two: was this a previously misunderstood part of your community? Is this something that is bringing in an entirely new insight, and you have to take a breath and realize that maybe something wasn't exactly how you had known it before?

The second part here is implementing community initiatives. Now that you've digested all of this data analysis, you can start to determine different community initiatives that you want to take on with the insights you've been given. Here's where you would want to determine measurements of success, a lot like what we went into in that second scenario.

The last one here is observing community initiatives that have been informed by the metric, or that you generated a metric around. If there's a case where the impact is not observable, there are a couple of things you might want to consider. One: are you measuring the right thing? Is the initiative happening and having impacts, but you're not looking in the right areas when you're doing your analysis? Or is there an initiative strategy that needs to be tweaked a little bit to actually get the results that you're hoping to receive?

So I just talked a lot about a bunch of high-level things.
Let's go into an actual example of this workflow. Our example analysis area is going to be new contributors. I want to learn more about our new contributors and what their activity is.

First I'm going to start with my magic eight ball question. What do I wish I knew, if I could get a straight answer right away? In this scenario, I want to know if people are having an experience that converts many of them into consistent contributors.

So what data do I have that could go into this analysis area and the magic eight ball question that I've proposed? For me here, I want to look directly at the contributor activity around repos, with timestamps. That's my little data area that I'd like to go into here.

Now, with the data and our magic eight ball question, let's break this up into sub-parts and see each of them through to the end. With those couple of different parts, we're going to go question by question, just so we can stay on one train of thought.

One of the first ones: let's start with how people are coming into the community. Looking at new contributors, let's see what action they do first. Are they opening an issue? Are they making a comment? Are they doing a PR? The visualization that I would make here is first-time contributors' first action, broken down by quarter. Say we have a bar chart and two first-time contributors opened an issue, so you see a little issue segment, and then on top maybe one did a PR. That is roughly how it would look.

So I go and I have this visualization, and maybe I go talk to a colleague and we come up with an extension. What's a little bit further of a breakdown? What if, for first-time contributors, we broke it down to see if there was a difference between the first actions of somebody who became a repeat contributor and somebody who was a drive-by contributor? You can see those two graphs next to each other.
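The numbers behind those two side-by-side charts could be computed along these lines. This is only a sketch under stated assumptions: the event log, the action names, the helper names `quarter_of` and `first_actions_by_quarter`, and the repeat threshold of four contributions are hypothetical, and real data would come from whatever repository source you have chosen.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event log: (contributor, action, date).
events = [
    ("ana", "issue", date(2022, 1, 12)),
    ("ana", "pr", date(2022, 2, 2)),
    ("ana", "pr", date(2022, 3, 5)),
    ("ana", "comment", date(2022, 4, 1)),
    ("ben", "issue", date(2022, 2, 20)),
    ("cho", "pr", date(2022, 3, 30)),
    ("dee", "comment", date(2022, 5, 9)),
]


def quarter_of(day):
    """Return a label such as '2022-Q1' for a date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def first_actions_by_quarter(events, repeat_threshold=4):
    """Count each contributor's first action, grouped by quarter and by
    whether they went on to become a repeat or a drive-by contributor."""
    totals = Counter(name for name, _, _ in events)

    # Earliest event per contributor (events are sorted by date first).
    first = {}
    for name, action, day in sorted(events, key=lambda e: e[2]):
        first.setdefault(name, (action, day))

    chart = defaultdict(Counter)
    for name, (action, day) in first.items():
        group = "repeat" if totals[name] >= repeat_threshold else "drive-by"
        chart[(quarter_of(day), group)][action] += 1
    return {key: dict(counter) for key, counter in chart.items()}


chart = first_actions_by_quarter(events)
for (quarter, group), actions in sorted(chart.items()):
    print(quarter, group, actions)
```

Each `(quarter, group)` key corresponds to one stacked bar, and the inner counts are the segments of that bar, so the result can be handed to any plotting library.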
Let's see what people are doing when they come in: when they're just a drive-by or a flyby, or when they're coming in and becoming a repeat or active member, are those first actions different?

Some potential actions that could be informed by this: maybe something along the lines of, does our current documentation support our contributors for whatever the top first action is among repeat contributors? Whenever somebody becomes an active member, are we supporting the most likely first action? Is there a contribution area that is not common overall but is a good sign that someone will become an active contributor? Let's say in this example a PR is the most common first action for repeat contributors, but not for flyby ones. Could we maybe label good first issues in our contribution documentation? Would that help? Or maybe adding some form of PR buddy for first-time people opening a PR? These are different ideas here.

We can go and break down another sub-question. Another one I came up with is: what is the conversion rate from a first-time contribution, somebody making one contribution, to an active or repeat contributor? Since we went into repeat a lot in the last one, let's just go with active, and we'll say that is somebody who has made a contribution in the last month. We can make a metric saying, okay, what percent of those first-time contributors actually get converted to an active community member?

Some questions that we could ask around this: is this percentage going up or down? Is there something that we did in the past, maybe a year ago when this number was higher, that we're not doing now? Was there an initiative that changed? Is there something that we can learn from what we've done in the past? Or is there a trend for the ones that are sticking around: are they getting more
communication and support from current members when they're first-time or early community members? Are their issues and PRs getting attention quickly?

One last example we'll go through here is: is our code base really dependent on drive-by contributors? Maybe I want to see what our number of contributions looks like for each side of that breakdown. A real question we can look at here is just: is this a ratio that we like? Is there a lot being done by drive-bys? Is there an underutilized resource that we're not doing our part to bring in?

From here, we're going to start breaking down the analysis lessons learned and the things that I would love everyone here to leave with today whenever they're considering doing community metrics.

The first one I will say is that numbers and data analysis are not facts. We can make them say anything, and the internal skeptic should be alive and well whenever you're looking at somebody else's metrics or your own. The iterative process of really considering how you're looking at your data is what's going to bring value. You don't want your analysis to just be a yes-man for the beliefs that you hold. Take time to take a step back and truly evaluate the assumptions that you've made.

And if a metric just points in a direction to investigate, that is a huge win. You can't look at everything and think of everything off the bat. If a graph just brings you down a rabbit hole that you wouldn't have thought of before, that's a good thing. A conversation starter alone can bring you to a new place.

And let's be honest here: sometimes exactly what you want to measure is not there, but the data might be able to bring valuable pieces of the puzzle. With that, you can't assume that you have every piece of the puzzle to get the exact answer to your original question. If you start to force an answer or solution, you can lead yourself down a pretty dangerous path led by your
assumptions. And leaving room for your path or goal of analysis to change can lead you to a better place or insight than your original idea did.

Next: data analysis is the start, not the solution. Each of the scenarios that we went through today starts at a different point of the data science workflow, or specifically the workflow for defining metrics. The first one we went into was that second go-around of data analysis, when you've started to bring in more people and more insights from others. The second one is when you have your goals and scope established and you know what you want to look into. And the third one was coming in with a completely new idea and building from scratch. You should never stop asking if there's more context needed, or if this is truly answering the questions that you want answered.

And there is so much going on, whether it be data or other community responsibilities. If we can cut down the amount of time that it takes to get information, or make it an easy system to check on a regular cadence, that's a huge win for any community member, especially your community architects. Think about how much this can be used to inform the way you think about the community, even if it's not a direct conversation about the topic or metric. And this makes it sustainable: 5 to 10 minutes every week to check something is much more realistic than many hours looking into it.

So here are some of my closing thoughts. Data is a tool.
It's not the answer, but it can bring together insights and information that would not have been accessible otherwise. Methodology is the most important part of this: breaking down what you want to know into manageable chunks and building on top of that.

Taking a step back from open source data analysis, this is all a great example of the care that needs to be taken in all data science. You must take into account the nuance of the topic area, and we all know that open source community has a lot of nuance. The process of working through what to ask and what answers you want is often overlooked, and this is where some truly insightful and innovative work can come in, which is much more important than any specific tool that you use.

If you are a community member with no data science experience and you're looking for a place to start, I hope that this can show you how important and valuable you and your insights are to this process. We have to have people who truly understand the community, in this case our data area, to be able to make really great visualizations that are not based on incorrect assumptions. And if you're a data scientist or someone implementing these metrics or visualizations, you have to listen to the voices around you, even if you're also an active community member.

So thank you very much. I'm going to put up on the screen as well some examples of actual implementations of some of these visualizations.

And I actually wanted to take a second to take a step back and acknowledge what has happened today. I would really love to urge everyone in this crowd to donate to your local abortion fund today. There are many people who are going through a very rough time. Please support the people you love around you, and if you or someone you know needs help getting access to abortion today, tomorrow, or any day after, please reach out to me. Thank you.
Let me see if I can get this data analysis on the screen, and then we can go to questions.

Actually, isn't it Dash?

Yes. Back to my argument that tooling is everyone's choice. But yes, opening up for questions.

Hi, thank you very much for this, because I am responsible for spinning up an OSPO at a new startup, and we're going to be measuring a lot. This context feels much more like repos that we already own and measuring community involvement. Would you use the same methodologies for tracking staff members' contributions to upstream communities? Because that's a metric I'm being asked to track for our management, to justify funding as we go through different rounds of funding as a startup.

Yes, absolutely. I think that goes into considering your perspectives: there, you're trying to show your business impact to advocate for working with open source communities. So I think all of this methodology really applies to that.

Thank you very much.

I think the data analysis that you've shown makes a ton of sense for a single repository that you're trying to manage. For folks who are trying to aggregate across many repositories, what about this do you think changes? Certainly the methodology is probably the same, but in terms of how you think about metrics, a flyby contributor for the Linux kernel is probably very different from one for a random npm module. So in aggregate, how would you think about contributions in that view?

Yes. A lot of these visualizations that I'm building are supposed to be used for multiple different sizes of communities. So here, taking the first example, the drive-by contributors per quarter, having the ability to change what that threshold is is really important.
So for a smaller community, maybe four or five contributions would warrant calling somebody a repeat contributor, versus a really large one, where that threshold is different. I'd say there's not going to be a one-size-fits-all for different sizes of community, or even for different repositories. I think that's where the customization really comes into play and can make it a lot more valuable, because you might not know, even for your own community, what that threshold is. That makes something like this really helpful, and having that ability to change it can really bolster it.

If I have a hundred repositories that I need information on, looking at a hundred different variations of this would be very difficult. Do you have any data science intuition around blending that in a way that would make sense? Does that question make sense?

Yeah. For this tool that we've created, we actually have a search bar above, so you can look at an entire organization, or maybe your repos aren't all in one organization, but you can select all of them and see all that data grouped together. Or you can start to deselect, so you can see what the entire aggregate looks like and compare it to a singular repo. I think that context can be really helpful. You can see what the average is like, and then see what the differences are when you break it down to maybe one or two repos, a small portion.

Thanks. This has been really great, and I've got a lot of questions in my mind. But what came up most for me this morning was: what are my magic eight ball questions, right? Two examples would be something like, where are people falling down in the contribution process? Or how does an idea go from here to a feature that's completed on the other side?
Who are the individuals involved in doing that? Who are the core maintainers, who are the people that you need to make that thing go, and how do you find that out? So those are my magic eight ball questions. Where would you begin in trying to parse through that and come up with something more to go and chase?

I think the next step there is really going into, okay, what data can we look at around this? It's always going to be that next step, and it might not be something that you can come up with right on the spot. You might need to talk more to the community members around you about what data is available. That's really where that brainstorming at the beginning comes in, and you have to take that into account. Maybe it's a group of people who are community members, and you want to have somebody who has a little bit of data insight to help curate that. So I guess the next step there would be trying to determine what data is available.

So I was wondering, and I may have missed this because I came in a few minutes late: what suggestions do you have for a new program, when data is so scarce? Do you have any thoughts on what would be some magic eight ball questions initially?

That's a good question. So you're saying a new program, as in an open source program office, or new repos? That's a really interesting question.
I guess the first thing I would recommend is this: maybe there are other communities in a similar technology ecosystem, or some communities you look at and think, okay, this is the right size, this is how the cadence works. I would really recommend going and talking to those people to see what they have found that works and doesn't work. And if they're doing metrics, see how you can take that and tweak it a little bit to work with your smaller community. I think it would also be really good to establish goals up front for what you would like to see from your community. How would you like to see people interact? Then you can start to build those metrics, so you can see how that growth happens in real time.

If there's nobody else, I definitely wanted to give a quick plug. If you are in an OSPO, or a community architect, or somebody who is trying to take that first step into data analysis, and you have your questions but you're trying to figure out the tooling portion of it, please contact me. We're working on a project at Red Hat called Project San Diego, and we're really trying to figure out what people want to see and what visualizations are important to their community. We would love to help produce that and make some of these visualizations the ones that y'all would like to see. So I'll be putting my information up here. Please reach out to me, and I would love to work with y'all. Thank you.