So it is two o'clock, and I'm German. I like to start on time. It's in my DNA, I was told. So thank you for coming today, for listening to this talk, and then having the conversation that we just agreed to have. The topic is building and supporting open source communities through metrics. And as a primer, a few thoughts up front. We know that when we are looking at open source, a lot of it is about the use of the source code or the binaries, or someone is using the software that we're building. It's also about sharing. That's where the whole freedoms come in around open source or free and open source software. And then collaboration, having open source as a way of building software together. And through the advent of the internet, we were able to do this at scale. And this is where today we are in a situation where, depending on what survey we look at, 70 to 100% of software in organizations has some component that is open source. It's gone very viral, if you want to use a modern term here. And then in the TODO Group survey from last year, we found that investment in open source is still expected to increase. There's still more and more coming. So we're not at the end yet. And we know also that as we are building out this open source and building this massive collaboration and bringing more people into it, there are some failures along the way. We had Heartbleed many years ago already. Who still remembers Heartbleed? Yeah, it was the first time that we had an open source vulnerability and there was a logo for it. And now we have had more recent ones. Well, Struts, the Equifax breach was, what was it, six years ago now, where 40% of the United States population had their personal data stolen. A huge thing, just because Equifax didn't update when the patch was available. And Log4j, or Log4Shell, that was kind of ruining everyone's Christmas two years ago.
So more and more we need to think about the collaboration and how we are working together to then also address the issues that arise. But as we are looking at open source and the communities and try to understand, okay, which projects are healthy? Which ones have good ways of engaging? We also know every project is different. The report from Mozilla, now already four years ago, identified several different types of projects, and they have different governance structures, they have different goals, they have different ways of working together. So we cannot just use the same metrics everywhere. And that is the premise of the talk, or this conversation. For those of you who just joined, we had a conversation before we started that today we don't want to just talk, we want to have a conversation. And so I'm just going through a few of the slides I have as a primer to set up the discussion, and then we can go around and have a conversation all together. So, using metrics when building and supporting open source projects. As background, I work at Bitergia, we've been doing this for 15 years. We are maintaining the GrimoireLab toolset, which is an open source platform for collecting data and for analyzing open source projects. We are the official metrics partners for the OpenInfra and NumFOCUS foundations, providing trusted data. And we started the CHAOSS project six years ago to build a community, and I have little poker chips for you all with the CHAOSS logo on them. So make sure before you leave that you all get one of those poker chips. They are collectibles that we print each time we have our event, and on Tuesday we had a CHAOSScon event. So I have some extra chips for you. So this is just to introduce the topic of why we are looking at metrics. As we are building these massive collaborations, we can no longer be in the community and know what's going on. We need to rely on tools to look across many projects and ecosystems of projects.
So we have a framework of metrics within CHAOSS. And I'll go through this real quick because there are more than 70 metrics that we have defined. I just want to introduce you to the way this is organized, so when you go later and want to find metrics or find some resources, you know how to navigate that. CHAOSS stands for Community Health Analytics Open Source Software. That's the open source community we have here at the Linux Foundation. And we had four stakeholders in mind. Everyone has different goals for what to look for in the metrics and why to engage with metrics. The contributors: finding projects that they think are good for them, where they can have impact. The communities want to know how to attract members, how they're doing. The companies may be looking at mitigating risk, or looking at where they can themselves be influential and participate in the innovation as they're building their own products and services. And then foundations want to host projects and show their members that they're doing a good job with that, or identify where they need to maybe put extra resources and extra effort. So that's why we created CHAOSS. It's a matter of having standard metrics, having a common language around what we are actually measuring, so that we can all talk together and compare notes. Before we started CHAOSS, everyone was looking at the same metrics but calling them something different. And do we count an empty commit? Do we count a merge commit? There are a lot of questions. So we're building out software that actually does this and then informs our standards. And we have two projects. One is GrimoireLab, which I already mentioned. The other one is Augur. They take different approaches to this, so there's some choice. And then we built out the metrics around risk, diversity, equity, and inclusion. There are some common metrics. And then we built out metric models. The work is done in working groups that are focusing on particular things, like each of those metric categories.
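To make the "do we count an empty commit, do we count a merge commit?" question concrete, here is a minimal sketch. This is not CHAOSS tooling, just an illustration with a made-up input shape, showing how those two definitional choices alone change the number you report:

```python
def count_commits(commits, include_merges=True, include_empty=True):
    """Count commits while making the definitional choices explicit.

    `commits` is a list of dicts like {"parents": 1, "files_changed": 3},
    e.g. parsed from `git log` output (a hypothetical input shape).
    """
    total = 0
    for c in commits:
        if not include_merges and c["parents"] > 1:
            continue  # merge commit: more than one parent
        if not include_empty and c["files_changed"] == 0:
            continue  # empty commit: touches no files
        total += 1
    return total

history = [
    {"parents": 1, "files_changed": 3},  # ordinary commit
    {"parents": 2, "files_changed": 0},  # merge commit
    {"parents": 1, "files_changed": 0},  # empty commit
]
print(count_commits(history))                                             # 3
print(count_commits(history, include_merges=False))                       # 2
print(count_commits(history, include_merges=False, include_empty=False))  # 1
```

Three different answers for the same history, which is exactly why a shared vocabulary for metrics matters.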
Then there are context working groups. I'm just showing you a little bit about how this community is working, so you have an idea of where this comes from. And this is where the metric models are created, where we put together the metrics for specific use cases. And then there's supporting work that is being done in the CHAOSS project with maintaining the website and hosting a podcast and running events and so on. So that is how we create the metrics, what's all provided. And then we have a community that is growing globally. We have an active community in Asia, an active community in Africa, North America, Europe, and we continue to expand as well. So the metric models, that is something that has come up in the last one or two years, really, where we are combining the metrics for specific use cases. And I'll give you one example here. One is community safety. Each metric model has three parts. One is a description of why it matters: why do we care about this collection of metrics? One is the user stories, the specific use cases or different perspectives that we take on this metric model, this collection of metrics. And the third element is the metrics that are included in the model. So this community safety model, and this is an abbreviated version, when you go on our website and find the actual one, there's a much longer description of everything. So community safety is about being mindful of the systemic bias that exists in society, where some people have the privilege of having more time, they can be online, they can participate in open source, whereas others don't have that technology access, or they don't have the support at home, they have caretaker responsibilities. So that skews who is already more likely to be in open source. And for those that have a harder time being here, you want to be mindful to create an environment that they actually want to be part of and like to be part of.
So nonprofits might use these metrics to decide which open source projects to sponsor, where to put effort. Contributors may use these metrics to decide, this is where I want to be, where I want to make an impact. Maintainers may want to track progress as they're trying to improve their community. And these are the metrics in this model: one is psychological safety; one is, does the project have a good code of conduct? If it hosts events, is there a good code of conduct for those? We have inclusive leadership. We have board and council diversity. And we also want to be mindful of burnout and keeping the contributors healthy. And each one of those has a definition, how it can be measured, and so on. So this concludes the first third of the talk that I had prepared. The next third is about how to look at metrics, how to interpret them in supporting and growing open source communities. And then the third part is about the technical and organizational challenges that we face. Now, we were talking earlier about just having a conversation. I'm happy to pause here with the thoughts that have come up from the first third of the talk. And I'm happy to pass a microphone around if anyone has thoughts and wants to share something. I think there's an on button here. Yes. My question is, so I can see how you could check for, like, a code of conduct pretty easily. But I was wondering if you could say more about how do you actually measure things like burnout and psychological safety? Does that make sense? Yes, it makes sense. And I don't have the answer in my head right now. So this is where, darn it, let's see if the demo gods are in our favor. Let's try to find out together what our definition in the CHAOSS project says. Wonderful. So this is how I go about doing this. And I admit that the website has changed in the last year, and so I sometimes, you know, feel like I'm a newcomer myself trying to find this. Let's see, I'm trying to open.
I asked this question because I know how hard it is to quantify these things. So I'm really curious if other people have ideas or you have experience trying to capture that. All right, so if someone wants to measure something and goes to the CHAOSS website to find out, okay, what are the metrics? How do we actually collect the data? We have here, at the top, Metrics. And this is organized as a knowledge base. So we have WordPress, and there's a knowledge base with different cards and so on. Let's look for burnout: metric, project burnout. And each metric has a template. We have here what it is, what question it answers. Then a description, objectives of why we are measuring the metric, and then implementation: this is how we can collect the data. And it looks like here we have some survey questions that we can use when we run a community survey. So these are some ideas. That's really helpful. I guess my follow-up question would be if you have advice or experience on reaching the population that you're concerned about. Like, if you want to assess burnout and you want people who might be burned out to complete the survey or inventory. If you have any experience with that, I'm sure it's a hard thing to get at. Yes, so we have run surveys and we have selection bias. That's the problem, especially with burnout. The people who have burned out are most likely not following the mailing list anymore. They're not following our Twitter account, our Facebook, whatever we are on. So it's really hard to get a hold of them. What we can do with metrics is go back in history and see who was active half a year ago, who was active last year, but no longer makes any contributions. And then we can go and say, hey, thank you so much for your participation. We really appreciate you being part of this community. We would like to know, why are you no longer here? We have noticed you have become inactive.
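That "active then, silent now" lookup can be sketched in a few lines. This is not GrimoireLab code, just an illustration; the (author, date) input shape and the six-month windows are assumptions:

```python
from datetime import datetime, timedelta

def recently_inactive(commits, now, window_days=180):
    """Return authors who were active in the previous window but have made
    no contributions in the most recent one.

    `commits` is an iterable of (author, date) pairs, e.g. parsed from
    git history (a hypothetical input shape).
    """
    recent_cutoff = now - timedelta(days=window_days)
    prior_cutoff = now - timedelta(days=2 * window_days)
    recent_authors, prior_authors = set(), set()
    for author, date in commits:
        if date >= recent_cutoff:
            recent_authors.add(author)
        elif date >= prior_cutoff:
            prior_authors.add(author)
    # Active before, silent lately: candidates for a friendly check-in survey.
    return prior_authors - recent_authors

now = datetime(2024, 1, 1)
commits = [
    ("alice", datetime(2023, 12, 1)),  # active in the last six months
    ("bob", datetime(2023, 3, 1)),     # active only in the six months before
]
print(recently_inactive(commits, now))  # {'bob'}
```

The output is the list of people to reach out to individually, which is exactly the approach described here: metrics find the people, the conversation does the rest.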
And with that approach, we can get some feedback and we can actually learn: oh, you became a father. Wonderful, thank you for your time, I understand now you have something else going on. Or a job change, or burnout. But it's only because we have the metrics that we can go and find out who we actually want to survey. Yeah, so it sounds like part of it is that you can even use the metrics to find who are the people that you want to talk to, and then maybe take a more individualized approach to trying to reach those people, which makes sense. So yeah, that's great insight, thank you. Yeah, thanks for asking, Emily. Yes, over here. I think it's turned off, maybe. Yeah. So I actually kind of have a follow-up question on the same topic, and I apologize. I followed CHAOSS many years ago, but it was very new and mostly just a README on GitHub. It seems like you've progressed quite a ways, but have you gotten to the point where you actually have examples of having gone out and measured these things on an open source project that we can see? And say, okay, we went out and decided we want to measure all these metrics for a certain open source project, this is how we did it, these were the results. Do you have that example anywhere? Yes, we have examples. It's not the CHAOSS project that goes out and does that. The CHAOSS project is a community of practice, but we have members who have done this. They have run surveys, they have done analyses, and the next part of the slides is examples, not necessarily just from the CHAOSS perspective, but different communities and the kinds of reports and metrics that they have put out. So yes, there are examples. If you have a specific question that you're curious about, maybe we can dive into that.
I guess mostly, when I was looking at CHAOSS, like I said, it was like four or five years ago, it was really hard to see how you get from the concept of having the metrics to actually taking a measurement and saying, okay, here's where we are, we can now evaluate how healthy or unhealthy things have gotten. And as you say, my own interest has been more on the consumer side: how do I identify that maybe a package needs extra investment right now, or things like that. Which makes it a little bit more awkward to do things like run a survey, because I'm not necessarily deep inside that community. Because that's the decision I'm trying to make. So I was just kind of curious, from that use case perspective, where we've gotten and how we're actually making these measurements, basically. So, it's a work in progress, right? One of the things is, yeah, I don't have a good answer ready. There's still a lot of learning going on, still a lot of challenges that we are facing, especially when we look at adding metrics to, like, a package manager, where we could then have metrics for all the packages. We are in conversations with many different stakeholders in the ecosystem to try to get that. So we can do it for, like, if you're a company or a small group: these are all the projects that I care about, and I have my own set of metrics that I care about. But doing it at scale for everyone in the ecosystem, I don't think we have a good example quite yet. Okay. Yeah. Here, you get a CHAOSS token. Thank you for asking, and you too. When you're developing the metric models, is there a discussion about balancing gathering as much information as you can with the privacy of the members of your community? And, like, what metrics should you not be gathering? What should you not be keeping stored in your database?
Yes, that is very much something we are considering, especially since there are also laws that we have to follow, with GDPR and privacy, and making sure we don't collect too much. There are a couple of thoughts here. One is, all the data is public anyway and anyone can collect it and analyze it. Then there is enrichment, where we do more with the data, where we go in and say, you know, I know Michael works at ARM, and then I go in and change it, even if the data showed something else before, and I start building more knowledge around my community. And when we get to that level, it's a question of, okay, do I do it transparently, so everyone is okay with it and it's trusted, and the community is behind this? Or do I do it in private, so I don't expose it? And then we can go even further, especially with the surveys, where we might ask about gender and sexual orientation and some other demographic information that we are interested in. We would probably not share that data, especially not the individual answers, but only aggregates. And that's where we then say, here is the report, but we are not actually allowing anyone to backtrace and de-anonymize the data in the end. So there are lots of questions and challenges along the way, and yes, we are discussing all of those. Yeah, thank you. All right, I'm holding the space if anyone else has something, otherwise I can continue with the slides. Okay, wonderful. I love the engagement. It's much more fun, to be honest, than just going through slides. Just a very quick question. With these metrics, what you're saying kind of sounds like how we might look at cohort analysis for a SaaS company, like, as people onboard, what's their activity. Has anyone taken these metrics and wrapped them in a service? So I could have this running automated, be it a cron job or be it a web app or something that could look at the amount of contributions, using your use case.
This person made this many contributions, they're trending off, a transactional email goes out. Or something like that. How can we create these tools to maybe semi-automate some of our community management or health checking? Is there something that's popped up in that ecosystem? So yes, there are services where you can just plug in your projects. At Bitergia, we have one called Cauldron.io. You can go in and you get project health metrics at a high level. We don't expose the specific contributors, like you said, for GDPR compliance reasons. That is only if you have a private instance, because then you can protect the data better or choose what to do with it. There are other services as well that do this. In CHAOSS we have Compass, which was created by a team in China, Huawei I believe. They're really focusing on the Gitee ecosystem. And we have Augur; they also spun up a system where you can go online. I don't know where they all are, but yes, there are several services where you can plug in your projects and start getting these metrics that CHAOSS has defined. Yeah, here, that's a poker chip. Thank you. And Michael, you also get one, of course. And you had another question here. I was there on Tuesday, I already have one. Okay, good. I don't have any questions, I have two remarks. One of them because I've been looking at CHAOSS for some time and I'm attending as much as I can of the conference and everything. I'm part of a fairly, let's say, small community, about 30 or 40 people, and I've always thought, I'm going to try CHAOSS, but I didn't get the time, and kind of my remark is on that side. The first one is, I think it also depends on, let's say, the community culture. I come from Romania and there's a bad culture around metrics there, meaning that people are going to game them at every point in time. You just give them a metric and there's going to be some sort of bias: I want to reach that point.
And there's a saying, it was from someone, I think in England, that said: when a measure becomes a target, it ceases to be a good measure, right? Yes. It's part of the community culture to consider metrics as just ways of measuring, not actual goals, because if you're setting your goals on the metric, it's going to be quite weird. And the other remark I have is, given my personal experience, I'm a bit afraid at this point of using metrics, because it's already difficult to manage the community. There are a lot of volunteers and people are doing their best. And I feel that if I get some metrics and I know some things are pretty bad, these will require actually doing something about them, because you're measuring, and they show you some problems. And I'm not sure if we have the time for those, right? So I'm afraid that at some level, for not-so-mature communities, having metrics may actually put extra pressure on the community to solve them. I mean, we know we have poor PR integration, poor issue replies, but I think if I highlighted all those issues in the community, people would say, come on, it's too complicated. So I think there should be some sort of correlation between using metrics and the level of maturity of the community, I'm guessing, just to be able to digest those metrics and actually put them to good use, instead of using them as maybe some sort of negative pressure that people get stressed about. Absolutely. You make an excellent point. The metrics are not an end in themselves. Perfect. So I want to offer a different approach here. Instead of saying, I'm going to start doing metrics, and now I have all these unintended consequences with people gaming the metrics and feeling pressured, and I'm revealing problems that I knew about but now everyone sees: start with, what do I actually want to accomplish as a community? Set a goal. I want to improve in some way.
And then we can go back and ask, okay, how do we know that we are actually moving the needle? How do we know we are actually making an impact in being better with our pull request workflow? And then you can choose the metrics, put them up, and if people start gaming them, now you actually are getting the outcome you want, right? Because you chose the metric intentionally for that purpose. And after half a year, maybe it's so ingrained in the culture that you throw out the metric and you don't care about it anymore. And now everyone is like, okay, it's just part of who we are and what we do: we are now really quick with first responses and everything is merged within three days and we're all happy. Now we have a different set of problems, we have a new set of goals, and we select a different set of metrics. Yes, start with the goal: the Goal-Question-Metric approach, that's what we call this. Yeah, all right. Well, let's take a look at some examples of how metrics have been used for supporting and growing open source communities. All right, here we go. So this I've structured in a way where we're looking at a metric and then how it's been used in a specific context. These are examples from reports that are publicly out there from open source communities, and we're looking at three different kinds of metrics. Some are about the people and the companies that are engaging. Some metrics are about the processes in the community, like what we were talking about with the workflows. And I'm blanking on the third one. We'll get there, we'll get there. So this one is about a global and inclusive community, and as we are trying to build out the community and wanting to accommodate everyone, we need to be mindful of time zones and language barriers and different cultures, and there's a lot of things. Here, do you want the coin before we head out? Do you want to catch?
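For a goal like "everything is merged within three days," one possible metric is the share of merged pull requests that came in under that limit. A rough sketch, assuming you have already fetched (opened_at, merged_at) timestamp pairs from your forge's API; the input shape is hypothetical:

```python
from datetime import datetime, timedelta

def merged_within(prs, limit=timedelta(days=3)):
    """Fraction of merged pull requests whose open-to-merge time is at most
    `limit`. `prs` is a list of (opened_at, merged_at) pairs; merged_at is
    None for requests that were never merged (a hypothetical input shape).
    """
    merged = [(opened, done) for opened, done in prs if done is not None]
    if not merged:
        return 0.0  # nothing merged yet, nothing to measure
    fast = sum(1 for opened, done in merged if done - opened <= limit)
    return fast / len(merged)

prs = [
    (datetime(2024, 1, 1), datetime(2024, 1, 2)),   # merged in 1 day
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),  # merged in 9 days
    (datetime(2024, 1, 5), None),                   # still open
]
print(merged_within(prs))  # 0.5
```

Tracking this number over time, rather than treating it as a target in itself, fits the Goal-Question-Metric framing above: the goal is a responsive workflow, the metric just tells you whether the needle moved.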
Okay, so different metrics that can show us that, hey, we are making progress towards this, are about seeing what the engagement is at the global level. And so, from the WordPress community here, they are looking at how many languages the software has been translated into. How big is the community worldwide that is supporting our software? Then there are local events, these WordCamp events. How many of those? 128 across 48 countries, and we sold 39,625 tickets. Local meetups: 4,379 meetups across 73 countries. So as we are building out the community, some success metrics that we could be using are that engagement worldwide. So that's one example. Another is new contributors and contributions, which can give us an early indicator of health. As we are being a healthy community, do we bring in new people? Because we know people also leave, right? So we need to make sure we have that inflow. And there's a Mautic report where they were looking at this: here is how many new contributors joined the community, and we can see the community was kind of small, then it started growing, and then it kind of stopped actively gaining more contributors. And we can see this in the level of activity also. It went downhill, and the community, or the company Mautic, said, hey, we need to do something about this and invest in the community, figure out what's going on. And they made some changes, where at the end here you can see it starts going up again. And in their report, they said that they achieved this by establishing a solid foundation for growth, by changing the processes and enabling the community. So looking at new members coming in and the activity and contributions in the community, we can make the choices that can then lead to the results we want to see. So this is another example of how this metric has been used. When we look at ecosystem growth, the example I have here is very close to where we are today.
The Cloud Native Computing Foundation uses this in their report to really show: look how big we are. See the growth. You want to be part of this activity and all the awesome things that are happening, and there's more and more happening. You should be part of this. So we see here the number of contributors is growing linearly. There's no end of growth in sight yet. At some point, there's going to be a natural limit; it cannot grow like this forever. We all know this. The same is true with members. What we see here with users, the red line, is that it's already starting to grow a little slower. So what this tells me as I'm looking at the data is: these end users, the companies that are using the CNCF technology, are still putting more and more projects, more and more of their employees, into the projects that are part of the CNCF. They are moving more and more of their innovation into the foundation, rather than the foundation growing by bringing in more companies that are using the technology. So there are some insights that can give the foundation a direction for how to address the members, how to talk to them, what is going on here. Organizational diversity is something that a lot of projects care about: to distribute the risk in the project and the technology, to make sure there's not one overlord, one company that controls it all, but also to make sure that if one company goes away, there are others that pick up the slack or can continue the development. So I as a user am not as much on the hook if the support goes down. The Drupal community, in this report, the most recent one I could find, was looking at the contributions versus the companies that the contributors were coming from. And here the interesting thing is, we see that the number of individuals is going down, but the number of companies stayed somewhat the same. And this was at the beginning of the pandemic.
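One crude way to approximate organizational diversity like this, assuming commit author emails are available, is to treat email domains as a stand-in for employers. Real analyses curate an identity-to-organization mapping by hand, so this is only a first pass, not how any of the reports mentioned here were actually produced:

```python
from collections import Counter

def org_shares(author_emails):
    """Share of commits per organization, using email domains as a rough
    proxy for employers. An assumption: freemail domains like gmail.com
    will need manual curation before this means anything."""
    domains = Counter(email.split("@")[-1].lower() for email in author_emails)
    total = sum(domains.values())
    return {domain: count / total for domain, count in domains.items()}

emails = ["a@example.com", "b@example.com", "c@other.org", "d@example.com"]
print(org_shares(emails))  # {'example.com': 0.75, 'other.org': 0.25}
```

If one domain holds most of the share, you have the "one overlord" risk described above; many small shares look more like the multi-colored charts in the Drupal and Kata Containers examples.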
So we see there is a drop-off in individuals, but my hypothesis, looking at this, is that once business starts picking up and the pandemic is over, the companies that are still involved will ramp up their engagement again. I'm waiting for the next numbers to see if I'm right here. So from the health-of-a-community perspective, to me that's a good sign. Another way to look at this: the Kata Containers community, in this view, filtered out the original founding companies. So these are all companies that joined after the project went open source, and they really had the intention of growing organizational diversity, and they're pretty successful, adding more and more colors. Each color is a company. So another perspective is looking at the workflows in the community, looking at change requests. These would be pull requests in GitHub, or merge requests in GitLab, or change sets in Gerrit. When we look at StarlingX, we can see how long it takes for change requests to get merged. We see there's a dip of activity during the pandemic, but we can also see that this dip is not that worrisome at all, because with the review efficiency index, the community is consistently making good progress in their work. So when it comes to the experience of the contributors, maybe there's a lower level of activity, but the engagement of getting work done continues to be healthy. So context always matters when we look at these metrics, and sometimes a metric shows us something like, this is worrisome, but then we talk to the people, we look at other metrics, and we can see there's more to the story. The metric is just a way to ask more questions and get to know the community better. A really interesting one is looking at events and seeing how events can help energize and activate contributors, which is something we've talked about for a really long time.
And Mozilla has done an analysis where they tracked: these are the contributors that have come to certain events, and then let's look at the projects that we have and see where they actually show up in this network of activity. And the red dots, those are the contributors that came to the activation events. And where do they show up? In the middle of the network. They're in the center of the activity, they're participating in multiple projects, and the activation events actually work. That's what we are seeing here. So these are some examples of looking at metrics and having an intention behind growing and supporting our open source communities, and what are some things we can look at to see if any changes we are making are actually working. So this is the end of the second segment, and does anyone know how much time we have left? It's 2:40. I think we are right at time. So we can... So what? So there's a break now. I'm happy to zoom through the rest of the slides, talking about organizational and technical challenges, if you all want to spend two more minutes just to get some impulses, yeah? Okay. So when we look at implementing metrics, we have some things to decide: the right metrics. We already talked about that. We need to know what to do with the metrics, know if a number is good or bad, be mindful of personally identifiable information. And then we get to the technical challenges. Where do we actually get the data? We need to know where our community is and then collect the data. It's raw: different date formats from different data sources. We need to clean that up, unify it, and make it useful. Collecting the raw data is almost the easiest step, although at CHAOSScon we just had someone up front who said they spent six years improving how they collect the data, getting to where they now trust the data. Six years of improving their tools, because there are so many oddities in the data set. There are so many different things.
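The "different date formats from different data sources" problem can be illustrated with a tiny normalizer. The format list here is illustrative only, a sketch of the idea rather than what any real collector ships; production tools handle far more variants:

```python
from datetime import datetime, timezone

# A few formats seen in raw data from different sources (illustrative, not exhaustive).
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",      # ISO 8601 with offset, common in REST APIs
    "%a %b %d %H:%M:%S %Y %z",  # git's default log date format
    "%Y-%m-%d %H:%M:%S",        # naive timestamp; we assume UTC below
]

def to_utc(raw):
    """Parse one raw timestamp string and normalize it to an aware UTC datetime."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive == UTC
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognized date format: {raw!r}")

print(to_utc("2023-05-01T14:00:00+0200"))        # 2023-05-01 12:00:00+00:00
print(to_utc("Mon May 01 12:00:00 2023 +0200"))  # 2023-05-01 10:00:00+00:00
```

Once everything is on one clock in one timezone, the activity and workflow metrics from the earlier examples become comparable across data sources.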
Enriching the data, aligning all the data, making sure it's in a format that we can use; managing identities: people use different usernames, they use different email addresses. Who knew? I mean, things like that we need to think about. And then calculating metrics. So there's some pre-processing that we can do to make it useful. This is where we think about who's actually consuming the data. What message do we want to tell? What story is it that we want to support, or get evidence for, or actually show: hey, no, what you're saying is not true. We can do all that. There are some tools. Here's also the reference to Cauldron.io, GrimoireLab, and Augur as CHAOSS projects. And then I'm going to leave you with a few lessons learned from the CHAOSS project. And then I'll give you these tokens if you want some. One is: start using metrics. Start small. When you have metrics, they lead to more questions. They allow you to think about the community in new ways, and then you will change what you're looking at later. But be mindful of the context. Don't just assume that because the graph goes up, everything is good, or that it's bad if it goes down. And be transparent with the community. Often collecting the data is not a problem, but if the community finds out later that it has been done and they weren't notified, they can get pretty upset. So anyway, I'll leave you with that. Thank you so much for coming and being part of the conversation today. And here are poker chips. Feel free, and have a good day. Thank you.