minutes ago. So, Georg and Matt, welcome. For today, what we have is an introduction to the CHAOSS community. CHAOSS is the acronym for Community Health Analytics for Open Source Software. I had the great pleasure of meeting Matt and Georg around 2016-2017, when all these projects started. The project was announced in September 2017 at one of the Open Source Summits by the Linux Foundation. Matt is a professor at the University of Nebraska, Omaha, and Georg currently holds the position of Director of Sales at Bitergia, and both are running and sharing the CHAOSS project. So, whenever you're ready.
Thank you for having us today. Just to make sure you can hear me: can you?
We can hear you.
Excellent. So, what we want to talk about is community health metrics and data, and as Diane said, it's a nice segue from what we were just discussing at the end of the last session. By community health, we mean that we want to look at and understand the potential that our open source projects and communities can continue developing quality software. When we look at the literature, there are different ways that we can look at communities to understand if they have everything they need. We can look at the code and make sure that it is of high quality. We can look at the community to understand: who are the people involved? Are they active? Is there good diversity? And we can look at the resources, to make sure that our communities and projects have the finances to pay for the servers, or, if they need special infrastructure, that everything our communities need is in place. Or, as we heard in the last session, that the commons is healthy. And this matters to a lot of different stakeholders.
The users of software don't want to be exposed to vulnerabilities like we had with Heartbleed or Equifax, with hundreds of millions of data records being exposed because there was a vulnerability in open source software that wasn't fixed in time. Organizations care about minimizing their risks, but also about leveraging their influence and impact and having an assessment of how they're doing in the communities. The communities themselves care because they want to be inviting places for their community members, and because contributors want to enjoy the work that they're doing. Foundations, who are stewards of open source projects and communities, care about the health of their projects because it can give them an indication of whether they need to intervene, or it can help them identify where the best practices are, learn from that, and then help other communities in their portfolio become better as well. And researchers, naturally, want to understand the world. They also want to understand communities and the health of communities, and they might even have their own open source projects that they care about. Now, an interesting thing that Jeeb said: there are experts that we can call on to make decisions. When it comes to community health, it's a very complex topic, and there are very few super experts who can look at an open source project and understand how it's doing, understand the community, identify ways to improve the health, and know where work is needed. Because this requires a very specific skill set, with many years of experience working in open source, it's really hard now with open source growing. We have seen an influx of organizations and companies coming into open source, and there are lots of people who are just learning what open source is, how the communities work, and so on. So we need to take that knowledge of community health and come up with a different way that's more accessible to everyone.
And this is what this talk is about: how together we can be better, stronger, faster at understanding community health. This "better, stronger, faster" comes from a TV show, The Six Million Dollar Man, where the person was assembled and improved through bionics. And we can do this together. We can figure out what community health is and how we can understand it, so that we all can be better, stronger, faster. So let's walk through it, starting with "together." The CHAOSS community is the place where we can work on this together. We started this project in 2017 at the Linux Foundation to create analytics and metrics to help understand community health, and to have procedures and practices that we can share and understand together. It's a group of industry professionals, academics, and open source practitioners from a variety of different projects and companies that come together, and CHAOSS is the platform to talk through the thorny issues of what community health is and how it can be assessed. In this work, we are building practices to become better. We want to become better at understanding communities at scale. And we do this through metrics, just like organizations have been empowered to grow in size because they're using and leveraging metrics to understand what's happening throughout the organization. Open source communities have grown to be massive, with thousands and thousands of people working at the same time. Super collaborative, lots of work gets done. But you can no longer understand the health of the community just by being there. You need help. And that is what we try to provide with metrics and analytics. We want to have this objective data to reduce bias in decision making and to identify places where we can work on our communities and help them improve. And so we use this data to make decisions and be data driven in our management.
And this can also help us have a strong foundation for shared decision making, because we can understand the community through the data together better than if we all relied on our own subjective perspective of what is happening in the community. So we are building out these practices to become better at understanding community health. We're also building a stronger foundation through metrics. In the CHAOSS project, just to give a little background, when we started the project, we started collecting a list of metrics that everyone was interested in. It was a really long list. And in defining them, we decided to split up into working groups that each had a specific focus. The diversity and inclusion working group looks at metrics to understand how welcoming and inclusive our community is. The risk working group looks at communities and projects to understand the business risk. I don't want to be stuck with a project that I've used as an organization in my innovation stream, and then suddenly be left maintaining it myself. So we want to understand that. Or license risk: we want to make sure that there's compatibility in the licenses, not only inside the project, but also when we use it ourselves. We also want to understand the evolution. This is a third working group. Is a project growing? Is it already mature? Or is it in decline? Understanding the activity levels and how they fit in with the history and the story of the community is super important. The value working group asks questions around: what's the value? What's the business value, the monetary value for organizations? What about for individual contributors? Is there value in contributing to a specific project or community? Maybe there is, because there are job postings out there for skills that I can build in this community. Or what's the social or the broader value for society?
Is this community project making an impact in the world? And then we have common metrics. We sometimes call this the working group for the misfit metrics that don't fit anywhere else or that have broader implications, like understanding what organizations our contributors work for. Understanding this organizational affiliation then helps us with other areas, like risk or diversity and inclusion. And so that's where the common working group comes in. Now, building up metrics is great because it gives us a shared language to talk about community health. When we have different communities that we're looking at, we can start to use the same set of metrics, and even if we're not doing the analysis ourselves, we can start comparing what's happening in one open source community with others. So we're building up this stronger foundation. And then the CHAOSS project is also helping us move faster by embedding what we are learning in software. The software is where we put routines for collecting data. We identify the data sources and solve the problem of collecting the data. We're solving the issue of how to present the data, the metrics, the analytics to users, so that you can start doing that more quickly. We can also trial new metrics, ideas, and practices, because sometimes someone comes to the CHAOSS project and says, "Hey, I'm really interested in understanding this part of our community. What is the data? What are the metrics we can be looking at?" By putting that in software, we can start to see it and then iterate on it. Now, we have three projects: the GrimoireLab project, the Augur project, and Cregit. They're being used on a daily basis by professionals in companies, in communities, and in foundations. GrimoireLab, for example, has Bitergia as a company that provides services around this project. And I know the OpenDiff community has a dashboard that is powered by GrimoireLab. Augur is also being used by companies like Twitter or VMware.
And then Cregit is a tool that allows us to see, like git blame, who has changed something in the source code, but not at the line level. It goes down to the token level: who introduced this variable or edited it last. And that is heavily used by the Linux kernel community. So the software is out there, available for everyone to use. Through our work on becoming better, stronger, faster, together, we have learned several things that I want to share with you. If you're thinking about starting your metrics journey, or if you're already looking at metrics and analytics for community health, we have seen that it's really important to listen to what is happening in the community. You can start doing this by collecting everything. You don't know what kind of data you will want to look at in two or three months, because your focus changes. You start to change your perspective. You have new questions all the time. So make sure that you have the data available to ask those questions of. And when you start to ask questions of the data, we found it really useful to use the goal-question-metric approach, GQM. Start with what you want to accomplish as a community, as a foundation, as a company. Then ask yourself: what do I need to know to tell whether or not I'm reaching that goal, or what can help me find the way to reach that goal? That is where the metrics then come in. Using the goal-question-metric approach gives a direct rationale for why I am looking at a given data point. But data by itself, or metrics by themselves, or nice, pretty graphs are not enough. We need to tell a story with the data. Community health is very context driven. Let's say we're looking at a graph that shows the number of new issues in the project has gone up. That can be a good thing or a bad thing. Maybe there are more and more users who are asking questions, which can be good.
But maybe it means the last release had a lot of bugs and they're being reported. Or maybe it's not even related to the project. Maybe Google Summer of Code is coming up and there are more students interested in the project, and that's why there are more and more issues. So you need to understand the community and the context around it, and then use the data to tell that story. A word of advice or caution here is to avoid gaming of metrics. If you have incentives around metrics, let's say once you hit 100 commits on the code base you're recognized as a frequent contributor, people will start to change their behavior to hit those metrics. And with commits, if you have contributed to a repository, you know you can make small and big commits. You can have thoughtful commits that touch a lot of different files, or you can separate them out into multiple commits. Once you have that metric, the incentive is to contribute a lot of commits, whatever it takes. And that might distract from actually accomplishing the goal of developing quality software or documentation or whatever the community needs. That leads into the next point. You want to value all contributions, because it's easy to look at the commit history and focus on that. But a healthy open source project and community has a lot of different types of contribution. We need marketers who advertise our project. We need people who help in forums and answer questions. Bug triaging is very important, so that the people who can answer know which issues to focus on, versus the ones who can actually dive into the code and need to do debugging. So open source projects and communities have a lot of different work that needs to be done, and we want to find data that represents all of this different work. So with that, and I know there are a lot of recommendations here: if you're not there yet, just start the metrics journey, start collecting all the data.
It's better to have something than nothing. Get off zero. And keep these recommendations in mind, or come back to the CHAOSS project later. I'm going to hand this over to Matt Germonprez to talk about how the CHAOSS project can help you.
Thanks, Georg. Great. Just to reiterate a few things. So the CHAOSS community is not just a single super expert; we're a collection of people with an interest and a passion to think about open source community health. So we encourage you to come and be part of that team, right? Be part of that group that's investigating this. Together we can start understanding this complex issue in more detail. So that's really important. And as Georg pointed out, one of the real key things in the CHAOSS project is that we don't just develop metrics for metrics' sake, right? One of the things that we've learned is that there's kind of a push to develop metrics right out of the gate. But those aren't given that goal-question-metric framing. We don't know what the goal of the metric is. We don't understand the context. So we take a considerably more systematic approach towards developing metrics, which is first thinking about what we need to understand in the first place. From there, we develop metrics that can inform that. The CHAOSS project also works really hard, as Georg mentioned, on practices. We work to draw out hard-to-see metrics. Some metrics are observable in trace data, like Git repositories, right? Others are considerably more difficult to see, for example around our work in diversity and inclusion. So not only does the CHAOSS project develop software to help bring those trace data metrics forward, but also practices or processes to bring those, I'll say, non-trace data metrics forward as well. So we don't want the metrics just to live in isolation.
We don't just develop those metrics and say, you know, done. And so we work on these things as well. And then, as Georg also pointed out, context does matter, right? So it's challenging in the CHAOSS project for us to ever say, and we won't, that this is a healthy project and this is an unhealthy project. That's not the goal of the CHAOSS project. Our goal is to give you the tools by which you can then make that assessment locally, as an important decision for yourself, right? So context does matter. If we're just thinking about Red Hat, Red Hat probably has a few open source projects they care about, right? But I would suspect that they are not all understood in the same way. A metric in community A might have a very different impact than that same metric in community B. So it's really up to the individuals in their local context to understand and think about what those metrics mean. I just wanted to reiterate some of those points, and we can talk about them more as well. There are a lot of ways that you can connect with the CHAOSS project. I'll give a nod here to Georg and many other people, who have been making some great advancements on CHAOSScast. This is a weekly podcast from the CHAOSS project that brings in different people from outside of the CHAOSS project, and sometimes from inside it, to think about what open source community health can mean to them, right? And so it's great. It gives you a look under the hood at the project itself, but it also gives you a look at how the work is having an impact beyond just the project itself. One of the ways we think about metrics is as building blocks. At the moment, we have 40-some-odd metrics that are part of the current release. The expectation isn't that you would use every single metric to analyze community health for your particular interests, but that you can assemble those blocks accordingly in your own context.
I do think one of the things that we're going to think about in the CHAOSS project is, as our number of metrics continues to grow, how do we provide appropriate filters so that those building blocks are put together in meaningful ways? That's one of the things that we're starting to take on right now. The software that Georg mentioned can be a really important way for you to start seeing this trace data as a time history. You can start looking at it over time. You can look at the evolution of things. And in fact, you can even start comparing projects with each other. So if there's an aspirational project that you have an interest in, tooling can help reveal how that aspirational project is performing and how you might want to think about your own activity within your projects. And then the CHAOSS community, as I mentioned: we're not just a single super expert, we're a collection of people trying to understand what's going on in this space. The CHAOSS community can serve as a place for you to ask questions and to advance your own initiatives. We're very open to ideas from all different directions. Next slide. So get involved with the podcast, right? Take a look at the metrics that we have. If you just go to chaoss.community, you'll see the current release. We're going to be doing a new release in about six weeks from now. Contribute to the software; as Georg pointed out, it's all pretty accessible and pretty easy to contribute to. And then we have weekly working group meetings. Each of the working groups that Georg listed earlier has weekly or bi-weekly meetings. We have a weekly community call. The software groups have their own calls. And we have an Asia-Pacific call as well that meets every other week. Join our mailing list, there's nice activity on there, and let us know what you're thinking about community health. So "together" is now replaced with the CHAOSS community. That's us, right?
So help us build this better, stronger, faster. Help us build a better way of understanding community health. We're listening. We're not just pushing our ideas, right? We're trying to capture the ideas that people have, because a lot of people have thought about this for a long time. Next slide. And so I think we're going to be moving into a live chat. Is that right? Right now I'm not sure how you want to proceed with this.
Right. Yeah. So thank you for the great presentation. There are a couple of comments now in Twitch that I can see, so I will start with the question, and then we can start with the open discussion and ask-me-anything. So anyone here in the Twitch chat or on another platform, please feel free to ask anything. Specifically, the question we have is, and I'm literally reading here: "The way I do analysis and visualization doesn't follow this approach. This is focused on having all of the data in one place." So the question he has is: what do I need to measure, and can I measure it? So that's my first question. It also says: "Grabbing all the data first, before I have a question, sets up an expensive game board for gaming." So I don't know if you have an opinion or comment.
I first of all want to welcome Vicky. Hi Vicky. Thanks for joining us.
Hi Vicky.
Happy to do so.
Yeah. Thank you, Vicky, for joining. Well, indeed, now Kakar here is the one specifying this. It wasn't a question I have an answer to. But in any case, go ahead.
Okay. Yeah. Go ahead. Go, please.
So the question of what to measure is really a question of what your goal is. What do you need the metrics and data for? Without knowing that first, we can make a recommendation of the 40-plus metrics that we have in the CHAOSS project right now. But without actually knowing why you're looking at metrics in the first place, it's hard to narrow that down and not drown in data, because once you start opening the data fire hose, there's so much to take in.
So it's really hard to give a recommendation here. That's my opinion.
Well, and I think something that people... first of all, is my microphone on? It does appear to be, so excellent. So one thing that I think people forget when they're considering data is that they already have the data; it's simply not assembled. The vast majority of the data that you are going to be using for these metrics is already generated, right? You can gather it from various sources on the fly, and yes, it will take time to do so, but you are creating it through the acts of your community, and so you don't necessarily have to flip the switch and start assembling it right then. So if you are worried about having data overload and getting buried in too much data, just wait and take that time to figure out what you actually do want. What are your goals? That's the primary problem I've seen with almost every single project, hell, every single company I've worked with: they don't answer the question, "What are we trying to accomplish here?" If you don't have that answer, then you can't see whether you're even on track for it, right? If you can't say what success looks like, then you can't identify failure. So if you don't figure out your goals, obviously you're never going to pick the right metrics. You're just going to be floundering in this sea of data, and you will die that way, and that's just not a good way to go through your world.
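As an editorial illustration of the goal-first ordering the speakers describe, here is a minimal sketch of the goal-question-metric structure in Python. It is not CHAOSS software and does not use any CHAOSS, GrimoireLab, or Augur API. The class names, the example goal and questions, and the metric names are all hypothetical, chosen only to show how each metric can be traced back to a question and a goal, and how a collected metric with no goal behind it can be spotted.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A question whose answer tells us whether a goal is being reached."""
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    """What the community, foundation, or company wants to accomplish."""
    text: str
    questions: list[Question] = field(default_factory=list)


def metrics_for(goal: Goal) -> list[str]:
    """Return each metric the goal's questions call for, once, in first-seen order."""
    seen: list[str] = []
    for question in goal.questions:
        for metric in question.metrics:
            if metric not in seen:
                seen.append(metric)
    return seen


def unjustified(collected: list[str], goals: list[Goal]) -> list[str]:
    """Return collected metrics that no goal's questions ask for."""
    needed = {metric for goal in goals for metric in metrics_for(goal)}
    return [metric for metric in collected if metric not in needed]


# Hypothetical goal, questions, and metric names, for illustration only.
goal = Goal(
    text="Be a welcoming place for new contributors",
    questions=[
        Question(
            "Do newcomers get a timely first response?",
            ["time_to_first_response", "new_contributors"],
        ),
        Question(
            "Do newcomers come back?",
            ["new_contributors", "repeat_contributors"],
        ),
    ],
)

print(metrics_for(goal))
print(unjustified(["commit_count", "new_contributors"], [goal]))
```

In this sketch, a metric such as the hypothetical `commit_count` shows up as unjustified because no question asks for it, which is one way to notice data being looked at simply because it is available.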
I'll add one thing as well. Georg mentioned a phrase that we use in the CHAOSS project quite a bit, which is just to move off zero. The CHAOSS project doesn't have some magical set of metrics that is the perfect set, that is the end of the line. If you can measure and understand all of these metrics, great. But sometimes it's really just small steps, I would say, which would help with reducing the sea of data that's out in front of you. Just ask questions about issues, ask questions about pull requests, very simple questions, and see if they help you make more informed decisions over time.
Well, one of the things I found very helpful, and it sounds like a pitch but it's not, because I've legitimately found it really helpful, is that Bitergia launched Cauldron earlier this year, cauldron.io, and the public dashboard for cauldron.io does not have every metric under the sun for a free and open-source software project. For people who don't know, cauldron.io is a service which you can use for any free and open-source software project to gather data and metrics, display them to people, and track various metrics over time. It is just absolutely brilliant, it's super useful, but it does start from just those sorts of things that nearly everyone's going to care about, right? What's your number of contributions over time, what are the companies, or where are people coming from to contribute to your project? Even those bare-bones basic things are incredibly useful. And as Matt and Georg said, get off zero. The way I phrase it is: baby steps are still steps, right? So even taking one tiny movement to gather something or to get something going can help to set the mindset of the entire community around paying attention to these numbers in some way. And Cauldron has really helped a great deal with that, as I show it to communities: "Oh wow, we can actually see this stuff a bit." I would say I love it.
Yeah, go ahead, Georg, you were going to say?
Oh no, I was waiting
for you to ask the next question, and if not, I have questions as well to discuss.
Oh yeah, having questions, great. So I just wanted to share with you: I think it was at some OSCON years ago, we were in one of these, I don't remember where exactly, but there was a question like, "What's the metric that matters to you?" We were an audience of 30 or 40 people, and we had like 20-something different metrics. So there are two lessons learned here. The first one is: take care with the metric that you are revealing, because perhaps you have not considered the specific cultural goals or business goals or community goals that you are trying to reach. So that's something to think about first, that high-level stuff. And then forget about measuring things for the sake of measuring things, because that's the other lesson learned. "I'm basing my decision making on commits because I'm able to analyze commits, but not, for instance, the pull requests or diversity and inclusion in my community. And because I'm not able to analyze that, I'm basing my strategy on this data." But that doesn't make sense, because then you are relying on some kind of biased data, which is the previous discussion we had: are we using the proper data sources? So the question I have for you is whether you feel that we are missing some important data points in the usual CHAOSS community analysis that we are running, either in Cauldron or in any other tool that you may use. Who starts?
I can start. One of the things that we're looking at in the value working group, and Karen alluded to it, is the ability, say, of a project to have social impact, or positive social impact, in the world, which could be particularly important in today's state of the world. How do we go about doing that? That's a very difficult question. So what's the goal, what are the questions, and what are the metrics? The data doesn't just jump out at me as to what that
might be. So I think that in the CHAOSS project, though, through talking and listening, we can help figure out what that meaningful data might be. I do think there's some data that's more evident, say, through a GitHub repository, and some which is harder to track down. It's not impossible, but we have to think carefully through what that data could be. So yes, certainly. I'll say one other thing too: if we look in, say, the corporatized open source space, the data that's available there might be very different from the data that's available in, say, a scientific software space. What that data is and what it means can be very different things. So the provenance of that data, and what that data can mean for an ecosystem, can be very different in different spaces.
And I think it's worth reinforcing something that Daniel mentioned, which is that I've frequently seen people who let the cart lead the horse, so to speak. You know: "We have this data, and so therefore this is what we're going to use to make all of our decisions," rather than figuring out what decisions we need to make and then what data we need in order to do that. I understand the desire, you know, but just because all you have is a hammer does not make everything a nail. There are these things called hardware stores, where you can go and get other tools that will allow you to get your job done appropriately. Same thing for your metrics and your data, right? It does make sense, rather than just using your one hammer and allowing that to guide what sort of house you build, to go out and buy yourself a decent table saw and maybe one of those really cool pneumatic nail guns and things like that, right? There are different things you can use there, and that's something that I don't think people pay a lot of attention to. It's not exactly the question that you gave to us, Daniel, but I thought it was worth
reinforcing, because it is a problem I see a lot of companies and a lot of projects make, if they care about data at all or pay attention to data at all, which is really the first hurdle: getting them to recognize that this is something they should even pay attention to. And so that's a question I would like to put to Georg, Matt, and Daniel, since you're here, right? What are the strategies you have found to work to help evolve community thinking such that they will pay attention to data, and do it appropriately, rather than just an "any data in a storm" sort of thing, right?
In your metaphor, CHAOSS basically is the hardware store then.
Yeah, yeah.
I like that. So how do we look at the data? Just to talk about the origin of data: Vicky had pointed out that a lot of data is available just from the traces of our communities, but we need to understand why those traces were created in the first place and what shaped them. This is something that jpeg talked about at the end of the previous session, that we need to understand how the data was created before we draw conclusions from it. Understanding the context and the community is super important. And then there's other data that may require a survey or manual data collection or completely new data collection methods. And then, well, how do we get started with metrics, how do we get people to care? That is something that we're thinking through in the CHAOSS project. One of the things that we've started doing is having user groups. We started one for the app ecosystem after SCaLE, and these are folks from the GNOME Foundation, from the KDE foundation, and CHAOSS. We get together and we're like, "Okay, so we want metrics, we know we need them, and what is it that we should recommend to the people who are in different roles within our foundations?" The app ecosystem working group is really focused in this case on managing or overseeing many different
communities, because it's a different use case from a company or from a single project. So that came to mind, but I don't think it solves the problem of how you get people interested in the metrics in the first place.
I do think one of the things, and hopefully this is working, in the CHAOSS project is just having the different working groups. Having one working group focusing on, say, D&I, one working group focusing on risk, one on evolution: right off the bat, that helps segment areas of metrics that you may or may not have an interest in. Then, within each of the working groups themselves, for example within D&I, we have focus areas. One focus area in D&I would be, say, event D&I, or event-based D&I, and then the metrics that would help reveal D&I with respect to events. So again, that helps localize a series of metrics that can provide insight into a very particular area. So we've taken time to structure the CHAOSS project so that it's not just a grab bag of metrics, but structured to be meaningful as a collection.
Yeah, and I would like to answer here as well. First, I think it's been really useful to have everyone in the discussion at the same time in the same place, or virtual place, because that helps to create a feeling of ownership when you are defining metrics. And instead of working at the level of individuals, try to work at the level of teams. So you are in a team or in a community, and you are trying to improve by unblocking certain undesired situations or looking to improve bottlenecks, et cetera. If we all define certain metrics, we are bringing all of the different points of view to the table, so we have a voice there defining this. And we all know that if there's a way to improve this, or we need to tune the metrics, we all have a voice again, so we can keep iterating on those metrics. Of course, from time to time we need to improve metrics, or, you know, elaborate them a
bit more, or simply dismiss them because they are not useful anymore. Those are a couple of interesting things. The other thing is context and skills, and with this I would like to bring back Diane. Diane, I hope you're around, because we are kind of running out of time, but I would like to bring her in because of one of the lessons learned that we have together. Thank you. Hello, Diane. It is this difference between context knowledge, which in this case is Diane, and data knowledge, which is my part. So when we've been working together, it's really, really important, to avoid pitfalls or similar stuff, to have this context knowledge or domain knowledge, and then the other knowledge might be more the data-related stuff, so where the data is stored and everything. So those are my two cents here. I'd agree with that, and you're putting me on the spot a little bit. But I think, and going back to the previous talk as well, that having some domain expertise around the data, and understanding the shortcomings of the data as well, has been a key part of some of the research that Daniel and I have done on the CNCF community. And there are things that are missing in our data that we've noted and haven't quite figured out the workarounds for, but we're working around it. It's like diversity information around the data: how to identify that, how to do that securely and privately when people aren't sharing that information. It's an issue for us in creating more diverse and inclusive communities, but it's also a privacy issue as well. So, like, we have de-anonymized the organizations for a lot of the people that are participating in our OpenShift Commons work. That hasn't happened in all of the other ecosystems. In some work that we've been doing on COVID up here in Canada, they don't have any diversity background
information on the patients, so they're not getting the breakdown by race or ethnicity or all kinds of other things, because they don't actually have that information. And that's, I think, one of the bigger issues with our dependency on people self-identifying. And that's a good thing; we should respect people's privacy. But it also makes it harder to understand all the dynamics of the data, and to not let the data influence or drive our decisions if we know we don't have that information. I think that's one of the key learnings from the work that Daniel and I have done, and I think the work that all of us have done, and what jave and demigate have reiterated earlier today. So I think that's key, and I don't know the answer to that. I don't think any of us do. I think we're all in this conversation together about how we do this and bring in other means, other hammers, so that we're not just hitting nails and screws. And if you're up here in Canada, you have Phillips and Robertson and all kinds of different screw heads. I mean, there's such diversity in our communities, but we're in some ways prohibited from exposing that information, for good reasons, but it still makes it difficult. So I don't know how you guys think about that and address that, but I think that's also another whole day's worth of conversations. So on that note, maybe if you guys have some final thoughts, add them in. Mine is simply a thank you. Thanks for letting us talk about this. This is great. Yep, we appreciate it. Yeah, well, thank you, because really the whole impetus for doing today was the cancellation of CHAOSScon, and Daniel and I having a conversation about how important your work was, and that we really wanted to continue that conversation. So we'll do this again, and often. So thank you very much, guys, for all of this.