Welcome, everyone, to this presentation about building and supporting open source communities through metrics, part of the Open Source Summit Latin America 2022. I'm Emilio Galliano, marketing specialist at Bitergia. And to talk about this topic, here with me is Georg Link, our Director of Sales. Hi, everyone. So let's take a step back and consider why we are all here today at the Open Source Summit Latin America. At the core of open source, we care about using, sharing, and collaborating in the creation of software, with its roots in the free software movement and ensuring the rights of software users. Open source has evolved from being the realm of volunteers and hobbyists to the enterprise; collaborative software development has taken on a new dimension in the last five to ten years. Today, open source makes up 58% of software in the enterprise. In fact, 63% of companies in a 2021 survey indicated wanting to increase their use of and engagement with open source. It is now acknowledged that open source is present almost everywhere and forms a digital infrastructure we all rely on. The Heartbleed incident really elevated that awareness. The Equifax breach that exposed millions of US citizens' personal information also had open source software at its center. The US Congress asked the open source community how to avoid such issues in the future, and then the US issued a directive mandating more software supply chain security. So to address this challenge, we need to understand how open source software is built. This typically involves an open source project, and there are different types of open source projects. As an example, the Mozilla Foundation released a report in 2019 on the different types of open source projects, showing that each is created for a different reason, has different governance, chooses different licenses, and engages users and other developers to different degrees.
So our focus for this presentation is specifically on the open source projects that are built by a community. Yes, we know that there are open source projects created with only one maintainer, or that are fully controlled by a company, but we'll exclude those for now to focus on the projects that have a community. In this talk, we will discuss using metrics when building and supporting open source projects. Our specific focus will be on what challenges you may face and how to overcome them. Discovering the health of an open source project and making decisions based on it has, we see, been a huge challenge. Our company, Bitergia, has a history of working on this issue for more than 15 years. When the interest grew, we co-founded the CHAOSS project in 2017 together with the Linux Foundation, as a collaboration between industry, academia, and open source. We are also maintainers of the open source GrimoireLab metrics tools. As well, we are official metrics partners of different foundations, such as the OpenInfra Foundation and NumFOCUS. So today, the CHAOSS community has defined more than 70 metrics and maintains software for getting the insights that you need. Let's dive into these metrics so that we can understand how they are measured. First, the CHAOSS metrics are sorted into five working groups, each of which has focus areas. The first group we have is the common metrics, where the goal is to understand what contributions are being made by organizations and people. In this group, the focus areas are contributions, time, people, and place. One example metric we have in this group is types of contributions, where we can measure what types of contributions are being made. The second group is the value metrics, where the goal is to identify the degree to which a project is valuable to researchers and academic institutions. The focus areas in this group are academic value, communal value, individual value, and organizational value.
One example metric in this group is project velocity, where we can find what the development speed is for an organization. The next group is the evolution metrics, which covers aspects related to how the source code changes over time and the mechanisms that projects have to perform and control those changes. The focus areas here are code development activity, efficiency, and quality, and also issue resolution and community growth. One example we can see here is the new contributors metric, where we can see how many contributors are making their first contribution to a given project, and also who they are. The next group is the diversity, equity, and inclusion metrics, where the goal is to identify all these aspects at different kinds of events. The focus areas are event diversity, governance, leadership, and project and community. One example metric we have here, fitting for the virtual conference we are at now, is time-zone inclusion for virtual events: are the organizers of these virtual events being mindful of attendees and speakers in other time zones? The final group we have is the risk metrics, where the goal is to understand the community that exists around, and supports, a given software package. The focus areas are business risk, code quality, dependency risk assessment, licensing, and security. One example metric is the elephant factor, where we can measure what the distribution of work in the community is. So now that we have seen the metrics, we go to Georg, and he will explain to us how to interpret them. Thank you, Emilio. That was a great overview of where we are today in open source, and how, with the growing communities that we want to support and grow, it's becoming harder and harder to know what's going on everywhere. And so that's why we started working on metrics. And thank you for showing the overview of the CHAOSS project and the metrics that were produced there.
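To make the elephant factor concrete: it is commonly defined as the smallest number of organizations whose combined contributions account for more than half of all contributions. A minimal sketch in Python follows; this is illustrative code, not CHAOSS or GrimoireLab code, and it assumes you already have a commit count per organization:

```python
def elephant_factor(commits_by_org):
    """Smallest number of organizations whose combined commits
    exceed 50% of all commits in the project."""
    total = sum(commits_by_org.values())
    running, orgs = 0, 0
    # Count organizations from the biggest contributor downwards.
    for count in sorted(commits_by_org.values(), reverse=True):
        running += count
        orgs += 1
        if running > total / 2:
            return orgs
    return orgs

# One company dominates: an elephant factor of 1 signals concentration risk.
print(elephant_factor({"BigCorp": 900, "Indie": 60, "Uni": 40}))  # 1
# Work spread across several organizations: a healthier distribution.
print(elephant_factor({"A": 30, "B": 30, "C": 25, "D": 15}))      # 2
```

A low elephant factor means the project depends heavily on very few organizations, which is exactly the kind of risk this working group tries to surface.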
Now I want to show some examples of how metrics have been used to see how a community is doing. The first example is about attracting and retaining developers. And we have a CHAOSS metric for this that is really about developers and other community members joining and leaving. So when does someone make their first contribution? And when does someone stop making contributions? It's normal in an open source community to always have an inflow of developers, ideally new people always coming in, but it's also normal that they leave again. Maybe their focus shifts, they have a different job, they get busy with life in general, or their interest in a specific software or library is no longer there because they are now working on something else. So in a healthy community we always have turnover in community members. In the Mautic community, for example, and you see the link at the bottom for this community report, they were saying: we want to build up the community and engage more community members in being active. And so they were looking at this report of attracted developers, so how many people made their first contribution, and through specific activities and changes in how they were engaging people, they were able to attract more new contributors than in the previous quarter. We have another example from the Drupal community, where Dries every year looks at how they are doing in terms of community size, who's contributing, and so on and so forth. And in this report from 2020, what he found is that, yes, the number of individuals contributing is going down, but the number of organizations they are contributing from is stable. And the people who are most active in the community have been active for a really long time. So while the total number of contributors might be going down a little bit, overall the community is very stable and very well supported.
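The "attracted developers" counting described above comes down to finding each person's first contribution date. A minimal sketch, assuming we have a flat list of (author, date) contribution events; the data shape is invented for illustration, not taken from any specific tool:

```python
from collections import Counter
from datetime import date

def new_contributors_per_quarter(events):
    """events: iterable of (author, date) pairs, in any order.
    Returns a Counter mapping 'YYYY-Qn' -> first-time contributors."""
    first_seen = {}
    # Sort by date so the earliest event per author wins.
    for author, day in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(author, day)
    quarters = Counter()
    for day in first_seen.values():
        quarters[f"{day.year}-Q{(day.month - 1) // 3 + 1}"] += 1
    return quarters

events = [
    ("ana",  date(2022, 1, 10)), ("ana", date(2022, 4, 2)),
    ("beto", date(2022, 2, 5)),  ("carla", date(2022, 5, 20)),
]
# ana and beto first appear in Q1, carla in Q2; ana's later
# contribution does not count her again.
print(new_contributors_per_quarter(events))
```

Plotting these counts per quarter gives you exactly the kind of report the Mautic community used to check whether their engagement activities were working.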
So the story that I'm trying to tell here is that these metrics are starting points for understanding the community. We then need to put context around them and understand more than just a single number. I'll give you another example of how we can do this. Another part of a community that we can look at is change requests. When you're working on GitHub, they're called pull requests. If you're working on GitLab, they're called merge requests. If you use Gerrit, they're called changes, I think. Anyway, they have different names depending on where you are. Developers or authors are saying: hey, here is a change that I would like you to consider. And then maintainers can review it and have a conversation about it before merging it into the main branch. Now, let's say we did this for the StarlingX community and we're specifically interested in the change request duration. So how long does it take from someone making a suggestion to change the software to it actually being merged and accepted, or rejected? How long does the review process and the discussion around it take? Here, the average or mean time to merge is under four days, which is pretty good, because you want an engaging environment in the community, where someone who comes and makes a contribution gets feedback while they're still working and thinking on the issue. On the opposite end, if someone comes to your project and it takes a really long time for a maintainer to respond, or even just to acknowledge the contribution with a thank-you, by that time the original author, the contributor, may already have moved on, may no longer be interested, and may not even remember what they did at the time. So a quick turnaround time is important. Now, the top right graph that we look at here shows the last three years, so during the COVID pandemic, and what it shows is the review efficiency index.
So reviews coming in and being approved, and it shows a very stable situation throughout the whole time. Even with the pandemic, the project was pretty stable in how it was taking in and responding to change requests. And this is even though, when we now look at the big graph in the background, we see a big dip in overall contributions during the pandemic. When it started in 2020 and then went into 2022, there is a slowdown that then picks up again later. So we can see that, yes, during the pandemic there was a slower pace overall in the development activity in the project. But when we look at the experience and how quickly things were happening, the responses were good. So overall there was less load, but the activity was at a really good and healthy level. And that is where we need to look at one metric and then start investigating with other metrics what the whole story is, and think about that. I'll give you one more example of using metrics for supporting and growing open source communities. This one is from Kata Containers, where the founding organizations decided: we want to build a really diverse and engaged community and bring on more and more organizations that are also helping to maintain it. And looking at the last five years here, we see an uptrend in organizations. Now, we filtered out the founding organizations, so they're not showing up, because if I were to show them, it would show a steady level of activity in the community. But I excluded them to really show how the number of other community members and organizations that joined after the project was open-sourced is increasing. So this is something where there was a strategic plan to make this happen, and now, five years later, we can see they're being successful in getting those engagements. Now, these are examples of how we can use metrics to help grow and support open source projects.
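The change request duration metric used in the StarlingX example can be sketched as follows. The dict fields `created` and `merged` are illustrative, not tied to any real platform API, and still-open requests are excluded from the average:

```python
from datetime import datetime, timedelta

def mean_time_to_merge(change_requests):
    """Average time from opening a change request to merging it.
    change_requests: list of dicts with 'created' and 'merged'
    datetimes; 'merged' is None for requests still under review."""
    durations = [cr["merged"] - cr["created"]
                 for cr in change_requests if cr["merged"] is not None]
    return sum(durations, timedelta()) / len(durations)

prs = [
    {"created": datetime(2022, 3, 1, 9), "merged": datetime(2022, 3, 3, 9)},
    {"created": datetime(2022, 3, 2, 9), "merged": datetime(2022, 3, 8, 9)},
    {"created": datetime(2022, 3, 5, 9), "merged": None},  # still open
]
print(mean_time_to_merge(prs))  # 4 days, 0:00:00
```

One caveat worth noting: a plain mean is sensitive to a few very old requests, so in practice you may also want to look at the median or a percentile before concluding how responsive the maintainers are.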
To figure out what we are interested in, we find a way to measure it, like retaining contributors, bringing in more organizations, or having a really good experience with the review process. And then we can implement certain changes in the community and see how those are affecting the metrics that we're measuring. Now, when we want to do that, we face organizational and technical challenges, and I want to give you some advice on how to overcome those. When we think about organizational challenges, one is deciding on the right metrics, so what to even look at. And the recommendation here is to focus on at most five metrics. It's okay to get started with easy-to-measure metrics that then spark new questions. Over time you change the metrics that you're looking at, and you get to know the community better each time you do this. The second thing is knowing what to do about the metrics. This is where you want to pick metrics that are actionable, where you can actually do something about them. Some metrics are lagging: if we look at a metric like the number of releases, that's a really long time frame to influence. But if we look at how quickly our maintainers are responding, that's a faster metric to influence, because we can just say: hey, maintainers, starting tomorrow, look at everything that comes in new. And you bring down that metric and you change the experience people have in the project. Another question is knowing what is good and what is bad. This is a really challenging question to answer, because every community is different, so we cannot really compare ourselves or our metrics to other communities. Part of it is that they're just in a different stage of the evolution of their life cycle. Another reason is that they might be working in a different way, where the same engagement results in different metrics. And so in comparing the metrics, you're comparing apples and bananas.
So one thing that you can do is build out your history: look at your historic metrics and have a baseline. Then, as you are making changes in the community, you can see: okay, are we improving against our own history? The final organizational challenge I want to touch on is around personally identifiable information. There are legal concerns here, like the GDPR, the General Data Protection Regulation in Europe, which says: I don't care where you are, whether you are in Latin America, Europe, or wherever. If you have personally identifiable information, like names or email addresses, from European citizens, then you are subject to this law. And we all very likely have European contributors in our open source communities and have to think about how we manage that data. There's a lot more that we could go into here, but briefly: if we are transparent, open, and honest about what we're doing with the data, we're likely in good shape, because people contribute to open source knowing that their information is public. And if we are analyzing the data for the good of the project, then we can make the case that there is a valid reason to collect and process the data. Now, let's move on to some technical challenges. The first question we need to answer is: how do we get the data? So where is your community? What platforms does it use? Is it GitHub, GitLab, Bitbucket, Jira, Gerrit, Confluence? What about other platforms like wikis, Discourse, mailing lists, IRC, Slack, Meetup.com, Stack Overflow? So where do you want to get the data from for your community? Once you've answered that question, you need to get the data, enrich the data, and make it useful. Getting the data from the data sources is almost the easiest step. There might be an API, a query log, or a mailing list archive. The challenge here is when the API changes or you have different data formats. Enriching the data is where you unify the data formats. You determine the level of detail you want to analyze.
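The "unify the data formats" step can be pictured as mapping each platform's raw items into one common schema. The raw field names below are invented for illustration; real platform APIs differ, and tools like GrimoireLab handle many more sources and details:

```python
def enrich(raw_item, source):
    """Map a raw item from one platform into a common schema.
    Field names here are hypothetical examples, not real API fields."""
    if source == "github":
        return {"platform": "github",
                "author": raw_item["user"]["login"],
                "created_at": raw_item["created_at"],
                "kind": "pull_request"}
    if source == "mailing_list":
        return {"platform": "mailing_list",
                "author": raw_item["From"],
                "created_at": raw_item["Date"],
                "kind": "email"}
    raise ValueError(f"unknown source: {source}")

gh = {"user": {"login": "ana"}, "created_at": "2022-05-01T10:00:00Z"}
ml = {"From": "ana@example.org", "Date": "2022-05-02T08:00:00Z"}
print(enrich(gh, "github")["kind"])        # pull_request
print(enrich(ml, "mailing_list")["kind"])  # email
```

Once everything shares one schema, the same queries and dashboards can work across all platforms the community uses.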
So for example, with the Git log, we can stay at the commit level, but we could also go down to the file level. So as we are enriching the data, we need to decide what we actually want to look at. And this is likely something we want to do with the raw data: get it into a different format where we can query it to answer the questions we have, and convert everything to our desired database structure. Another challenge is around managing identities: knowing who is who in the community and who has contributed. Because people use different usernames and different email addresses across different platforms. On GitHub, I might appear to be someone else than I am in a wiki. So we need to be able to assign contributions to the correct person and see their contributions across the platforms. And then finally, there is calculating metrics. Some things we can get straight from the data, like when an issue was opened and when it was closed, but we might want to know how long it was actually open. So we need to calculate those kinds of metrics, and maybe there are more complicated things, where we look at quality models or other things that just are not in the raw data but can be calculated. Now, finally, we need to make the data useful. So we need to think about who actually wants to use the data. How do they want to use it? What do they want to use it for? So in what format, in what presentation, will it help and serve them? What visualizations do we want? And finally, I think I already mentioned this: data by itself is not very insightful. We need to be able to tell a story about the community, and use the data to bring some objectivity into that story and to back up what we are saying. So how do we do that? These technical challenges, you can of course spend a lot of time solving by yourself, or you can use any of the open source software tools that have already solved many of these problems.
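Before looking at the tools, here is a toy version of the identity-management step just described: re-assigning per-alias contribution counts to canonical people. In GrimoireLab this job is done by a dedicated component, SortingHat; the alias mapping below is a hand-made example of what such a component maintains:

```python
def unify_identities(contributions, identity_map):
    """Merge contribution counts recorded under different aliases.
    contributions: alias (username or email) -> contribution count.
    identity_map: alias -> canonical person name.
    Aliases without a mapping are kept as their own identity."""
    merged = {}
    for alias, count in contributions.items():
        person = identity_map.get(alias, alias)
        merged[person] = merged.get(person, 0) + count
    return merged

# 'glink' on GitHub and 'georg@example.org' on the mailing list
# are the same person (example data, maintained by hand here).
identity_map = {"glink": "Georg Link", "georg@example.org": "Georg Link"}
contribs = {"glink": 12, "georg@example.org": 5, "ana": 3}
print(unify_identities(contribs, identity_map))
# {'Georg Link': 17, 'ana': 3}
```

Without this step, one active person can look like several occasional contributors, which skews almost every community metric discussed in this talk.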
The CHAOSS GrimoireLab project is a suite of tools that has solved a lot of these problems. With Cauldron.io, you can just go to that website, plug in your GitHub repository, and start getting metrics. We have CHAOSS Augur, which follows a different approach to GrimoireLab, but it's also really good. We have Apache Kibble if you're interested specifically in Apache projects. And then CNCF has DevStats. There are others out there for open source metrics, but these are the ones that I can recommend. Now, we've covered quite a bit of ground. We've talked about how open source has changed. We've talked about why we want metrics, how we can use metrics to tell stories about communities, and how we can support communities with that. And then we talked about the organizational and technical challenges. So Emilio, I pass it back to you to wrap up the presentation. Yes, so to make a final summary of the lessons we learned today: use metrics to identify where the community needs help, and track whether actions lead to changes. Track metrics early and establish a baseline. Go for the low-hanging fruit, the easy-to-get metrics, and get more sophisticated later. Present metrics in context and tell a story of the community. And be transparent with the communities about metrics: provide public dashboards and publish reports. So with this summary, we can close the presentation for today. If you want to talk more about metrics, if you have any questions, or if you want to stay in contact with us, you have all our channels for contacting us here. And that's it from our part today. Thank you, Georg, for your ideas and your explanations. We hope you enjoy more of the Open Source Summit Latin America, and see you around. Thank you.