Good afternoon. My name is Brian Proffitt. I am a community architect with the Red Hat Open Source Program Office, and I'd like to thank you for joining my talk today about community data or not community metrics. We'll start the talk with a little bit of a story about how data can be very good, but not necessarily useful.

There are a lot of things in nature which can be very beautiful, but also rather harmful, and data is a part of that story. So let me tell you about the blue dragon. The blue dragon is a three-centimeter-long sea slug, also known as a sea angel, and it eats Portuguese man o' war jellyfish. It's very beautiful, lovely to look at, but because it eats poisonous jellyfish it can basically kill you with the little things at the end of its appendages. So this is one example of something in nature that is very beautiful to look at, but don't touch it or you might die.

Let's not forget the mantis shrimp. This is a 15 to 30 centimeter crustacean. It can see more colors than any other animal and, as you can see, it has a lot of colors to see. It can also shoot its front legs out at the velocity of a gunshot from a rifle, and that's basically how it kills anything that comes near it when it's hungry. That will strike any prey with 1500 newtons of force. I'm sorry, I don't know the metric conversion for newtons, but that's basically a sledgehammer of force dropped on a tiny little fish near it. So again, these are things that are very beautiful, but they can also deceive you. They can tell you the wrong story, and that's basically how they survive.

So let's talk a little bit about data and how data can be beautiful, but also deceptive. Here's a nice picture of data. This shows all of the data hacks and breaches in the year 2017.
And as you can see, this is telling a story about how many users from Twitter were exposed in a data breach, and Yahoo and MySpace and Equifax and Facebook and all these different breaches and hacks. This picture can tell you many stories. One thing it will tell you, if you are looking at this as a security expert, is that there is a lot of work to be done to secure customer data sets, and some companies and organizations are clearly better at this than others. If you're looking at it from the point of view of a consumer, like any of us here in this room, you might say, "I'm never going to put my information on the internet again, because it's going to be lost somewhere." So this can be a useful picture or it can be a frightening picture. It really depends on how you want to take it. This is true of almost any form of data that we will get.

Now this is a little bit more germane to what we're talking about. Here is a picture of community data from Bitergia. Bitergia is a tool. It's a company from Spain, and they use the open source software known as GrimoireLab. GrimoireLab is designed to look at sources of data around community and open source development. So it will be looking at GitHub and GitLab and any Git repository, and it will be seeing how many commits have been made at a certain time, how many pull requests have been closed at a certain time, and how many people are involved in this. And this is one slice of data from a project that we have at Red Hat called OpenShift.

This sort of data can be very useful. It can also tell us a story about how the community can be perceived, whether it's running efficiently or not. The problem is that this picture can tell you many stories, but unless you know what you're looking for, this picture is not very useful. And this is actually something that happened to us in the Open Source Program Office when we started using data sets like this.
The key to any kind of data use around community, or any data set, is that questions are critical. When we were talking about using the data set that we saw a minute ago, we had a lot of data coming in from our open source projects, projects like OpenShift and oVirt and RDO and Fedora. These are all different projects that are very important to Red Hat's business, because everything we do is in the open source community. We take that code, we make it a little bit smoother, easier, more efficient, and then we sell support for that. So we have to have our communities be strong and efficient and vibrant, and we needed data to show that. But the problem was, with that last slide we had (I'll back it up), we have a lot of data here but we did not know what to do with it. It was not telling an efficient story.

And this is the point of my talk today. We have to know the questions that we want to ask about our communities before we even come to the data part of it. That seems backwards, but it is very important that we make that a priority.

So we're going to talk a little bit about what makes communities healthy. How do they work, and what are the factors that make communities strong? In the past, you might say, "Well, my community has a million downloads of our software every month." That's great. That means a lot of people like your software and they like to use it. Does that make you a healthy community, though? Maybe not. While people are downloading your software, maybe you're having a huge fight inside your community about the best way to develop and the best tools to use to release your software. Or where are you going to have your next conference? An outsider would not know any of this. They just know your software is going smoothly. But eventually, if your community is unhealthy, the software that you create will not be as good. There will be problems.
People will start developing inefficiently, then you will see your overall community health decline, and then fewer people will start using your software. So this has a ripple effect as you move through your open source projects.

So I'm going to talk about the four things that we look at at Red Hat. We're also working with a project in the Linux Foundation known as Project CHAOSS. CHAOSS is Community Health Analytics for Open Source Software. CHAOSS has four different areas that talk about aspects of community health, and Red Hat is very invested in these areas.

The first one is evolution. With evolution, we look at the project's entire life cycle. When a project is growing and new, there are certain things about it that will say whether it is healthy or not healthy. When a project is mature, you're looking at other metrics that determine whether or not it's healthy. So if you have, as an example, a new project and there's only one company, like say Red Hat, that is building that project, that's okay. It's brand new. It's a baby project. It has to be nurtured and grown, and eventually more people will come and work on it if the project is healthy. If the project has only one company or organization working on it when it's been around for a long time and is mature, that is a problem, because now no one else is coming in to help you and collaborate on your project. Your company is the only one using it. Eventually that's going to hurt you, because you will lose creativity. You will lose innovation, and probably you will lose users, because no one will want to use your software, even if it's really good, when they can't participate and see it develop and grow. Then somebody else will come along with something that is better, and people will go to that. So it is important to have as many organizations and companies involved in a project as you can.
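To make the single-company example concrete, here is a minimal Python sketch of how organizational concentration might be checked against a project's life-cycle stage. Everything in it is an assumption made for illustration: the function names, the stage labels, the sample data, and the 80% threshold are hypothetical and are not CHAOSS definitions or anything shown in the talk.

```python
from collections import Counter


def top_org_share(commit_orgs):
    """Return (organization, share of commits) for the most active organization."""
    if not commit_orgs:
        raise ValueError("need at least one commit")
    counts = Counter(commit_orgs)
    org, n = counts.most_common(1)[0]
    return org, n / len(commit_orgs)


def single_org_warning(commit_orgs, stage, threshold=0.8):
    """Flag a mature project dominated by one organization.

    `stage` ("growing" or "mature") and `threshold` are illustrative
    assumptions, not values taken from CHAOSS or from the talk.
    """
    _, share = top_org_share(commit_orgs)
    return stage == "mature" and share >= threshold


# Hypothetical data: one organization label per commit.
commits = ["Red Hat"] * 9 + ["Other Co"]

print(top_org_share(commits))
print(single_org_warning(commits, "growing"))  # dominance is acceptable in a new project
print(single_org_warning(commits, "mature"))   # the same share is a warning sign later on
```

The same share of commits produces a different answer depending on the stage, which is the point of the example: the number only means something once it is attached to a question.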
So this is what we're talking about when we talk about a growing project and a mature project and even a declining project. Just because a project is not being used as much, it can still be retired gracefully. The Apache Software Foundation is a good example of this. The Apache Software Foundation retires its projects into what's known as an attic. An attic is where you store things in your home when you're not going to see them for a while. In the Apache Software Foundation, that's a key thing, because no one is working on the code but it's still there. If somebody else wants to come later and maybe use it again, they can. The software is preserved to be used later.

Another area within community health that is very important is diversity. When we talk about diversity, we're talking about lots of different things. We're talking about gender: are there too many men, not enough women? We're talking about race: are there too many of one race and not enough of another? And then corporations: as I said before, are there too many people from one company involved in this project and not enough from others? So diversity is a lot of different things, but it is an important mark of health for a community.

Value. Value is something that is important for community health, especially when we're talking about corporations. What is the business value of an open source project? How do we determine that? How much money do I save if a lot of my software testing and QE is being done in the community, and I don't have to pay for people in my company to do a lot of QE for the commercial version of that project? That is one example of value. So that is something that cannot be ignored. Excuse me.

And then finally we have risk. Risk is important to note because risk is something that we look at every day. There are lots of different kinds of risk. Some people will say that licensing is risk. If I license my software a certain way, people have a fear.
They're uncertain about whether or not someone will take their code and run off with it. That's usually not a problem, but it can be, and people do assign risk to that.

Another example of risk is something known in English jargon as the bus factor. If your project has one person who is very, very important to the project, and that person maybe is hit by a bus, which is bad, what will that do to your project? A nonviolent version of this, by the way, is what we call the lottery factor. If someone wins the lottery, and they win millions of dollars or millions of euros or one, and they say "bye" because they're going to an island in the Pacific somewhere, what will happen to your project when that person wins the lottery? There are many, many very important open source software projects that have a bus factor of one. That's bad, because a lot of projects will use their software, and if that person retires or wins the lottery or for whatever reason leaves that project, that project will languish and die, and other projects that use that code are going to be hurt as well. Risk can be very, very important and also really tricky to assign.

So these are the four areas that we look at. These are where our questions are going to have to come from as we talk about community health and community strength, and the answers have to tell stories. We have to tell stories with our data as we go through these communities. All the metrics that I talked about are in four separate groups, but they can also be combined. I already did it in my first example, when I was talking about corporate diversity in the evolution model of growth, maturity and decline. I combined the metrics around diversity to come up with a new question and a new story about where my project's health is when it's growing and when it's mature.
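The bus factor described above can be estimated from commit data. The sketch below is a minimal Python example under a stated assumption: it treats the bus factor as the smallest number of contributors who together account for at least half of all commits. That 50% cutoff is a commonly used heuristic, not a definition given in the talk, and the function name and sample data are hypothetical.

```python
from collections import Counter


def bus_factor(commit_authors, share=0.5):
    """Smallest number of authors whose commits cover `share` of the total.

    The 0.5 default is an assumed, commonly used cutoff. Returns 0 when
    there are no commits at all.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk authors from most to least active until the cutoff is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    return 0


# Hypothetical project where one person writes almost everything.
solo_heavy = ["alice"] * 8 + ["bob", "carol"]
# Hypothetical project with evenly spread work.
balanced = ["a", "b", "c", "d"]

print(bus_factor(solo_heavy))
print(bus_factor(balanced))
```

Commit counts are only one possible proxy. A person can be critical to a project through reviews, releases, or infrastructure access without writing most of the code, so a low number here is a prompt for a question rather than a verdict.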
These are things we have to think about. What the CHAOSS project is doing, and at Red Hat we are adopting this within our own Open Source Program Office, is finding metrics on a very low level that can be combined to form new questions. How many here are familiar with Lego bricks? Lego bricks are the little tiny toys that you put together in different ways. I can make a building or I can make a spaceship or something like that, but they're the same bricks. That's what we're trying to do with Project CHAOSS: make small, atomic, measurable metrics that you can combine to build the stories that you need to tell about your community, and ask the right questions about the data that you're delivering.

A very easy example of this: I'm looking at mailing lists in my community, and every month my mailing list traffic is, let's say, about a thousand messages a month. It's the same number, a little bit higher, a little bit lower, but about a thousand every month. Then one month I have 1500 messages, and the next month I have 2000. So my traffic is going up, as we say in data, up and to the right. That's good, right? Maybe. It could be good. It could mean that more people are participating in your community. It could also mean that there's somebody causing problems on your mailing list and there are fights going on in your community. You have to know the why behind every data change in your community, and when things start changing rapidly you have to determine why. Is it a good reason or is it a bad reason?

So our metrics try to apply scientific, quantitative reasoning, and not just feelings and intuition, to analyzing a community. Once you do that, you can tell stories about your community that are strong and accurate. And that is the conclusion of my talk, but I will be happy to take questions if there are any from the group.

Yes? Okay, so the question was, and I'll repeat it so it will be
caught on the recording: the gentleman wants to know what a wrong metric is, and how we can change that metric to be more effective and correct, ideally with a real, concrete example case. Okay, so I will give you the answer; I was just repeating the question for the recording.

A good example is one that I've already mentioned: downloads, and how that has historically been used as a metric to determine community strength. A way to turn that around would be, instead of looking at how many people are consuming and using your software, to ask how fast your software is being produced. Maybe when your project was first starting you were putting out a release every six months, and that was regular, but now your project is releasing every eight months or nine months, and it's getting longer and longer. You need to look at that and determine whether there is a problem in your community. Is the software so complicated that you can't do it in six months? That would be a good reason. But maybe there's a problem with your testing process, or your release manager is not doing their job effectively. So you need to look not at who's consuming your software, but at how it is being produced. That would be an example of turning that number around and using it to determine if your community is healthy.

Yes, sir?

Right, for the community metrics, there are some fields which you have directly, such as, like you said, the number of downloads or the traffic on the mailing list. But do we take into consideration some derived metrics, things like the relation between the increase in downloads and the increase in the number of pull requests or Git activity? Could we have some sort of relation from that, so we can identify that, yes, the increase in downloads is actually a good sign because there has been a lot of activity on the
product?

Yes. So, yes, because we need to start applying the scientific method. You have a hypothesis, as you just said: is there a direct relation between the increase in downloads and pull requests? What I would do in that situation is test it. You do the scientific method and an experiment, and this is what we all have to do when we're talking about these data stories. We need to make sure that they are testable and repeatable. If my pull request numbers change in a certain way, did that affect our downloads? That would probably mean that the quality of my software is getting better, right? There are fewer problems, people are more responsive to the changes, there are more pull requests, people are happier with the software, and downloads go up. That's the story you're looking at. But you need to test that and make sure that it's consistent. Because if I have 200 pull requests one month and downloads go up, then 300 the next month and downloads go up even more, but then I have like 100 pull requests in month three and downloads went up still, maybe there's no relation. So you have to test everything. That's something I probably should have mentioned in the talk, so that was a good question. Thank you.

One thing that I wanted to share with you too, another side point to all of this, is that a lot of people say, "Well, my community is different from their community." And this is why testing your data is very important. One thing to note when we're talking about community health: it doesn't matter what size your community is. Shanghai is a very large city, and compared to a village in Mexico, they're very different. Their sizes are extremely different, and you would think their needs are too. But guess what? Both communities have similar needs. There needs to be a way to get food to
people who live there. They're all going to have streets. In Shanghai it's a multi-lane highway or multi-lane road, and in the village in Mexico it's a dirt path, but it's still the same thing. It's a tool that people use to get from one place to another. In the village in Mexico they need water. Maybe they're getting it from a well, or multiple wells. In Shanghai it's from plumbing and pipes and a huge water facility. It's the same thing. You are just using different tools to do the same processes. Both communities are healthy. The village in Mexico is doing great, Shanghai is doing great, even though they're wildly different in terms of what they are using to deliver to their communities. So this is why testing is important: every community is going to be a little bit different, so you need to test and make sure your theory is correct.

Good question. That is great. Thank you so much. You've been wonderful.
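The pull-request question from the audience can be turned into a small, repeatable test. The sketch below is a minimal Python example that computes a Pearson correlation between monthly pull requests and monthly downloads. The pull request counts (200, 300, 100) come from the answer above; the download figures are invented for illustration, since the talk only says that downloads kept rising. The function name is also an assumption of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(sxx * syy)


# Pull requests per month as described in the answer above.
pull_requests = [200, 300, 100]
# Hypothetical download counts that rise every month.
downloads = [1000, 1200, 1500]

r = pearson(pull_requests, downloads)
print(f"correlation between pull requests and downloads: {r:.2f}")
```

With these numbers the coefficient comes out negative, which matches the speaker's caution that the apparent relation may not hold. Three months is far too little data to conclude anything, and a correlation on its own does not show that one number drives the other, so the same test should be repeated over a longer period and on each community separately.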