I'm going to go right ahead, because you're all here, and anyone else can just wait. I'm Sean Goggins. I represent the CHAOSS project at the Linux Foundation, and I also work at the University of Missouri in computer science. How many people have heard of the CHAOSS project before? And how many of you have more than a very vague notion of what it does and what its aims are? Fewer? OK. So in this talk, I'm going to explain a little bit about the CHAOSS project, which puts the risk metrics we're talking about in a bit of context. Then I'm going to show you some of the risk metrics, elicit conversation from you about what those metrics are, and facilitate a discussion about your questions and concerns with regard to measuring risk in open source software. Compliance and risk metrics are an extension of CHAOSS; they're one of a number of working groups. We always like to point out that CHAOSS has the support not only of the Linux Foundation but of a number of other organizations; this is just a subset. There are a lot of upstream contributors who participate in CHAOSS because they want to understand the health and sustainability of the software projects they engage in. And risk comes into play from a number of different important perspectives. Oftentimes, if I'm the person purchasing a piece of software, or deciding that I'm going to be an upstream contributor, or somebody who uses the software but doesn't contribute back, I want to look into how sustainable that project is. CHAOSS has a mission; we all have mission statements. The CHAOSS mission is to establish implementation-agnostic metrics for measuring the things that connect to open source health and sustainability, and then to produce integrated open source software that analyzes those things. So today, I'm going to talk a little bit about the risk metrics themselves and also about some software we're using to measure them.
There's a fifth working group not listed here, the Common Metrics working group, led by Dawn Foster there in the back. The four other working groups inside of CHAOSS are Diversity and Inclusion; Growth-Maturity-Decline, which is also called Evolution now, so maybe I should update my slides; Risk, which is what we're going to talk about; and Value. And you can find us under these different URLs. Risk informs us about how much risk a project might pose in a community, but we divide risk among a number of different key stakeholders. From one perspective, we think about risk from a developer's point of view. As a developer assessing the risk of a project, what are the things I'm thinking about? I'm probably thinking: is this a project where, if I invest my time, I'm going to learn something useful? Will there be a community of people to help teach me things? And can I make a living, right? A contract lawyer, on the other hand, is thinking about licensing. Does this piece of software contain licenses that, if I include it in my product, are going to force, for example, my whole product to be GPL? So there's a concern about licensing that contract lawyers think about. And then there's also a concern about understanding what is in a piece of software. This is especially important in the safety-critical arena. Some of the folks who participate in the risk working group are in medical devices and Automotive Grade Linux projects, and they need to understand all of the things that are in a piece of software in these safety-critical cases. So essentially, the CHAOSS mission from a risk perspective is to evaluate risk as the likelihood of loss compared to the impact of loss. That's a general framework for how risk is assessed. And the impact of a loss and the likelihood of a loss are very low if we just hold a meeting.
They're very high in the case of the film WarGames. For how many people here is WarGames a relevant cultural reference? It's where Matthew Broderick accidentally nearly causes a nuclear war by playing a game on a computer. High risk: high likelihood of loss, high impact. Evel Knievel: high likelihood of loss, but low impact; he's one person. And then we have a bunch of open source projects where perhaps the impact could be great, but the likelihood is small. So how much risk matters, and whether or not risk matters, depends on where you fall along these dimensions of impact and likelihood. A trustworthy device is one specific example we use in the risk working group: a device containing hardware, software, and some logic. We want to understand: is it secure from a cybersecurity perspective? Does it provide a reasonable level of availability against intrusion and misuse? Is it suited to doing its intended functions? Are some basic security principles in fact adhered to by that product? Are there folks in here who work in safety-critical software systems? Okay, when you think about risk, are these the kinds of things that enter your mind? And when those things enter your mind, we think about quality of code, whether or not you're allowed to use it, whether it's safe when you use it, and whether it can be subverted in the future. There are a number of projects within the Linux Foundation that are looking at risk and sort of contribute to it. SPDX defines a software bill of materials format; FOSSology and DoSOCS are two tools for scanning software, and DoSOCS is now called Augur-SBOM. Zephyr is focused on safety and security. ELISA is a working group for enabling Linux in safety-critical applications. And obviously there's CII Best Practices. So that's kind of an overview of risk as a concept.
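The likelihood-versus-impact framing above can be illustrated with a toy scoring function. This is purely a sketch: CHAOSS does not define a single numeric risk formula, and the scenarios and scores below are invented for demonstration.

```python
# Toy illustration of the likelihood-vs-impact framing of risk.
# The numeric scores are made up; CHAOSS does not define risk as a
# single multiplicative score.

def risk_score(likelihood: float, impact: float) -> float:
    """Combine likelihood of loss (0-1) with impact of loss (0-10)."""
    return likelihood * impact

# Hypothetical scenarios from the talk, with invented scores.
scenarios = {
    "holding a meeting":       (0.9, 0.1),   # losses happen, tiny impact
    "WarGames (nuclear war)":  (0.01, 10.0), # unlikely, catastrophic
    "Evel Knievel jump":       (0.8, 1.0),   # likely loss, one person affected
    "widely used OSS library": (0.1, 8.0),   # small likelihood, large impact
}

for name, (likelihood, impact) in sorted(
        scenarios.items(), key=lambda kv: -risk_score(*kv[1])):
    print(f"{name:26s} risk={risk_score(likelihood, impact):.2f}")
```

The point of the sketch is only that the same combined score can arise from very different mixes of likelihood and impact, which is why the working group keeps the two dimensions separate.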
The way CHAOSS has broken risk down is into five key focus areas, and not every focus area applies in every domain: accurate identification, code quality, security, safety-critical use, and licensing. And we've named some of these things a little bit differently. If we think about business risk, our goal is to understand how active the community around a piece of software is, so that it can support that software, and we have two factors. One is elephant factor. Elephant factor, as I think many of you likely know, is about how many organizations are contributing to a piece of software; if one organization does most of the work, we have a higher level of risk, from a business perspective, of losing something when we go to support that piece of software. Committers is very interesting. The evolution working group focuses on contributors, and contributors is a broader umbrella that includes people who comment on issues or create issues but don't necessarily contribute code. In the risk working group, we identified, especially working with the safety-critical group, that the number of people actually committing code, distinct from the overall contributors, is important from a business perspective, because there's a need to understand how many people understand how this thing works. What is my risk? You might call this a bus factor, but it's a little bit more than a bus factor. It's trying to ascertain whether there is enough inertia and work occurring around this community, and enough people with knowledge of it, that I think it's going to become and remain a going concern. Our second focus area is code quality, and the one metric we've released formally there so far is test coverage, which is very important in the safety-critical software domain.
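The elephant factor just described can be sketched as the minimum number of organizations whose commits, taken together, exceed half of all commits. This is one common reading of the metric; the commit-to-organization data below is hypothetical, since affiliation data has to come from somewhere else.

```python
from collections import Counter

def elephant_factor(commit_orgs):
    """Minimum number of organizations that together account for more
    than 50% of commits. commit_orgs is one org name per commit."""
    counts = Counter(commit_orgs)
    total = sum(counts.values())
    covered, orgs = 0, 0
    for _, n in counts.most_common():  # largest contributors first
        covered += n
        orgs += 1
        if covered > total / 2:
            return orgs
    return orgs

# Hypothetical commit log mapped to organizations:
commits = ["BigCo"] * 60 + ["MidCo"] * 25 + ["Indie"] * 15
print(elephant_factor(commits))  # → 1: BigCo alone exceeds half the commits
```

A factor of 1 is the business-risk case the talk warns about: one organization carries the project.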
I worked for a pacemaker manufacturer for a number of years, and then around the industry for a number of years after that, and we had to cover every condition that could be imagined, right? We had to test every path and do everything. Depending on the language being used to develop the software, we can look at test coverage from two perspectives: one is subroutine coverage and the other is statement coverage, and we want a high percentage of a safety-critical system to be tested. Are there people here who work with safety-critical systems that have to have 100% test coverage at the subroutine or statement level, or what are some thresholds that are common in your industries? And you don't have to say your industry, your company, or anything; I'm looking for general answers, since this is being recorded. About 90% is about right. And is there a qualitative dimension to determining whether certain pieces of the code have to be more covered than others? Right, yeah. And something that's not covered in test coverage is maybe how your fail-safes are managed, right? Maybe you can't cover every subroutine or every statement. This is specifically how the CHAOSS metric will give you a number: we plug in the subroutines tested against the total subroutines, and the statements executed against the total statements, which we can use code coverage tools to ascertain. But if you're building a safety-critical system, there's probably another metric about the fail-safe. How do you evaluate the fail-safe on a system? And I'm curious, from a CHAOSS perspective, I'll just elicit: are there test coverage tools that will evaluate the fail-safe conditions in a piece of software? Or is it largely that you test the thing the software goes in, give it conditions it should not react to, and then just show that it actually shuts down?
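The two ratios the metric plugs in can be sketched very simply. The counts would come from a code-coverage tool (gcov, coverage.py, and the like); the numbers below are hypothetical.

```python
# Sketch of the two coverage ratios described above: subroutine
# coverage and statement coverage, each as covered / total.
# The counts here are invented; a real coverage tool supplies them.

def coverage_pct(covered: int, total: int) -> float:
    """Coverage as a percentage; 0.0 when there is nothing to cover."""
    return 100.0 * covered / total if total else 0.0

subroutines_tested, subroutines_total = 45, 50
statements_executed, statements_total = 930, 1000

print(f"subroutine coverage: "
      f"{coverage_pct(subroutines_tested, subroutines_total):.1f}%")
print(f"statement coverage:  "
      f"{coverage_pct(statements_executed, statements_total):.1f}%")
```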
Is that kind of how you do it? All right, so are there ways that that is simulated, or do you have to use the real devices that you put things in? Right. So ultimately you have to put it in a real physical system, but you might do some simulation before you do that. Any questions about test coverage as a useful focus area or as a useful metric for code quality? No? Okay. We also have licensing. Did you have a question? So this jumps off from our safety-critical code quality area to the focus area of licensing, and the goal there is really to understand intellectual property issues, which is a separate set of concerns for risk. We look specifically at the number of licenses and the coverage: how many files in this system actually declare the license at the file level, and what are the licenses that are declared? DoSOCS, or Augur-SBOM, gives us a set of statistics for a particular repository that says here's your total number of files and here's the number of files with a license declared, and that's your license coverage. That may or may not be sufficient depending on your IP concerns, but at least with the tools we're building, you can know what that is. Besides these kinds of license-counting metrics, are there other things that people with concerns about licensing would like to see addressed as a metric, or measured in some kind of automated way? Right, yeah, and I think that's something that tools can especially put into place. I don't know if it violates the agnosticism of CHAOSS to say that, or if it's just a factual statement, really. The actual CHAOSS project is agnostic: we try to just define the metrics and the things that you measure, and not assign value to them.
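The license-coverage statistic just described is the share of files that declare a license at the file level, plus the set of licenses found. The file names and licenses below are hypothetical; in practice this data comes out of a scanner like DoSOCS/Augur-SBOM.

```python
# Sketch of license coverage: declared files / total files, plus the
# distinct licenses found. The repository contents are invented.

def license_coverage(files):
    """files: mapping of file path -> declared license string, or None
    when no license is declared at the file level."""
    total = len(files)
    declared = sum(1 for lic in files.values() if lic)
    licenses = sorted({lic for lic in files.values() if lic})
    return declared, total, licenses

repo = {
    "src/main.c":   "MIT",
    "src/util.c":   "MIT",
    "lib/crypto.c": "GPL-2.0-only",
    "README.md":    None,  # no file-level license declaration
}
declared, total, licenses = license_coverage(repo)
print(f"license coverage: {declared}/{total} files; licenses: {licenses}")
```

Whether 3 out of 4 files is "enough" is exactly the value judgment the talk says CHAOSS leaves to the consumer of the metric.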
Then we have a set of tools, Augur and GrimoireLab being the main ones we're using to calculate these metrics, and we've put together a set of information inside the tool called Augur, which I'll talk about here a little bit, that sort of gets at risk. So I'm going to jump out of this presentation and into some Augur stuff. The first thing you see when you get into Augur is the working groups, the clusters of projects, and the repositories in them that have the biggest anomalies over a period of time. So if you have risk metrics you're concerned about, for example, if you want to know when a new license gets injected into a set of source code that you're tracking, you can set up a signal that will put this on your dashboard as an insight and actually send you a Slack message saying, hey, somebody put a GPL license in your proprietary code, and when you burn this to a router you'll lose a billion dollars. It may not be able to draw all of those conclusions, but you can be notified when that kind of thing happens in a dashboard-like scenario, and likewise when we get to talking about sustainability. So if I hit OpenSSL, I might want to see what this signal or insight is, and we can see that the number of commits dramatically increased at one point on OpenSSL and then started to tail off over time, and we can all probably guess why and when that might have happened. If you want to look for radical changes in a project, that's another signal, from a risk perspective, that can be helpful. And if you want to really understand elephant factor or bus factor, you can get into looking at the real pattern of who is contributing, right? I might have a statistic where I have a thousand contributors inside a project, but there could be maybe eight people who contribute 90% of the code.
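That "eight people write 90% of the code" check can be sketched as follows: count how many committers it takes, starting from the most prolific, to reach a given share of commits. The commit log here is invented.

```python
from collections import Counter

def committers_for_share(commits_by_author, share=0.9):
    """How many committers, taken from the top, account for at least
    `share` of all commits. commits_by_author is one author per commit."""
    counts = Counter(commits_by_author)
    total = sum(counts.values())
    covered, people = 0, 0
    for _, n in counts.most_common():
        covered += n
        people += 1
        if covered >= share * total:
            return people
    return people

# Hypothetical log: 1000 commits, heavily concentrated in a few hands.
log = (["alice"] * 400 + ["bob"] * 300 + ["carol"] * 200
       + ["dave"] * 50 + ["eve"] * 50)
print(committers_for_share(log))  # → 3: three people wrote 90% of commits
```

A project can thus report a thousand "contributors" while this number stays in the single digits, which is the bus-factor-style concern the risk working group focuses on.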
So we want to get a sense of who the top contributors are and what their relative level of contribution is from a business perspective. This tells us a great deal about when things are being committed and how committed the different committers are. This is an example of graphql-wg, which is a GraphQL project, and we can look at the top 10 authors and see that we have a very episodic set of commits, but at least five developers who are routinely committing over a long period, over the life of the project. So you at least have some sense that this is a going concern and that commits continue on this project. You can also compare repos with each other. One indication of the sustainability of a project, from a business-risk perspective, is whether or not people are opening issues, contributing to the community, and actually identifying problems in the code. And then you can compare that to some other project you're familiar with. Here I've compared two projects inside of GraphQL, and I'm looking at the issues opened and the issues closed in each. If I want to include it in a PowerPoint, I can download that information, and I can decide to maybe compare yet another project inside it and get that data put onto it, possibly. Maybe it's going slow today; I'll come back to that later. And then finally, if you hit this little risk button, there are some specific metrics that start to address risk. This is GraphQL again, and it shows you the number of forks. Forks are another indicator of business viability. The number of committers per week is another source of business-viability indication, and there's the license declared, which we broke, apparently, unless I refresh it and it happens to work now. And then finally, one of the things people have been asking for a lot: could I have a software bill of materials?
The aim is always, from a risk perspective, especially when I'm trying to understand from the licensing and code coverage perspective, to know what packages are included in this open source project that I'm using. We use DoSOCS, which is now called Augur-SBOM, and it actually generates a full report of literally every single file and the commit hash for that file, so that you can get a sense of what it is, and then you also get a sense of the different relationships of the commits. So this is the package, the supplier, a lot of information that may or may not be declared in an SPDX document. To the extent that you have a product or an open source project that wants to be in a safety-critical space, or in a license-compliance space where they know the people who are going to be working with that project upstream have those concerns, I think those projects are generally more likely to include this kind of information, because they know someone's going to evaluate it. Even if the project doesn't include that information, we can still see a listing of the files, and it's a pretty substantial list here, right? But those of us who are somewhat experienced can go look at each of these files and start to get a sense of the directories they're in and what kinds of files they might contain or sub-projects they might include. So this is more of a tool that lets you see a full scan of the repo and go through the whole thing on one page, using your own experience with open source software to scan it, as opposed to clicking on or cd-ing into a bunch of different directories. It essentially gives you a place where you can apply your knowledge as an open source software professional and scan the information. Are there any questions right now? Questions about risk? Are there dimensions of risk that you care about that I haven't talked about?
So if we go back to the focus areas of business risk, code quality, and licensing: are there concerns you have outside of those, things you think of as an element of risk in the work that you do? Are there things you would look for in a repository to try to identify that? My Mac is so smart that it wants to mirror displays if I go out of it, so, like, this screen here. So, we're working on technology to automatically identify organizations, but it's tricky, right? And identifying developer aliases is also a tricky business, though we have strategies we use to do it, and GrimoireLab has something called SortingHat, which seeks to do that as well. One of the things the risk group and the diversity and inclusion group inside of CHAOSS are looking at here is leveraging Hyperledger Indy as a single-sign-on technology, where we could possibly enable developers to give some of their information to us in a way that's trusted and that they control. It's encrypted, it's federated, there's no honeypot for anyone to go after. Those are ways we think we might be able to systematically enhance what we know about developers on any particular project. But as any of you who have tried to track this know, a lot of times it's a very manual effort; it actually takes a lot of human labor to keep up with who the contributing organizations are, and these statistics, like any, can certainly be gamed. This is behaving in a very non-deterministic way right now. Oh, it's a different desktop, maybe? I say mirror displays and it goes back. I know what I should do: take the web browser out of full screen. Was there a question I missed? Yeah, that'd be an interesting metric. It's sort of like help-site coverage, right. That's a really good idea, yeah.
So we keep track of the timestamp that something's committed at, but that doesn't give us the IP address it's committed from. And isn't that the asserted location, anyway? It's not really tracked. My location, literally, on GitHub is Earth. So I'm from Earth. But you're saying GitHub actually tracks the IP address where the commit comes from? Yeah, the only way you'd probably get that is through the IP, because, for example, thinking of countries we might be concerned about for different reasons, it's pretty commonplace in countries that have export restrictions with the United States for committers from those locations to use VPNs that let them pretend they're committing from somewhere else, right? So a strategy for that is one thing. I can think of some proxies that could evaluate risk. I could think about the frequency of changes to a particular file. If I know that certain files are critical, or might be more prone to be the ones that expose you to an OpenSSL-type bug or something like that, we could scan for things like that. I think there are security scanning tools that look for those bugs, and bounty systems. If I were monitoring a project, let's say I'd already adopted it... do you do this to monitor projects you're already participating in, using, or consuming upstream, or are you using it as an evaluative indicator when you're trying to decide what to do? If you don't mind me asking. So please don't provide that data, because you have enough work to do. Yeah, okay. Other risk concerns outside of the focus areas we're currently working in? So my last item is really finding out who's interested in working in these domains.
If anyone in the room is interested in helping us develop risk metrics as part of the CHAOSS project, or if you have a particular thing you would like to see scanned from a software repository or a contributor community's issue-tracking system that would help you assess risk if you knew it, we're keen to have you go to the chaoss.community website and get on our mailing list, think about participating and contributing, or maybe listen in on one of our monthly calls. Each of the working groups also has a call every other week; the risk working group has a conversation every other Monday. I think this coming Monday is our next one. Pretty sure, because we didn't talk yesterday. So there are times you can come and engage, either through the mailing list or even a GitHub issue. We're the CHAOSS organization on GitHub, and all of our working groups are identifiable by the prefix wg-; so wg-risk is the risk working group on GitHub. Since this is being recorded, I'm going to test my typing skills. So this is the risk working group. If you have a concern, you can send it to the mailing list, you can come to our calls, which are every other Monday at 1 p.m. Central Daylight Time, or you can simply submit an issue here. That will help us generate metrics, and tools that provide the values for those metrics, that are useful to you. Really, we want to do things that are productive and useful for the open source community; that's our principal aim. All right, I came back. Thank you. We've got five minutes. If you have any questions, I'm happy to answer them. And you're also welcome to leave, right? You don't have to stay for my sake; I will not feel bad if you depart now. So yeah, go ahead. Some library that has not been updated in a year, right? So that's risk to me. Yeah, yeah. There's a national database of publicly stated vulnerabilities, right?
But I believe the security community, and I'm not an expert in this, if anyone is, speak up, has a way of privately communicating these things before publicizing them. So... It's kind of different for every project. Right, and that's what I'm saying: there's no standard or uniform scoring for that kind of behavior that I've seen. I haven't. Jessica Wilkerson, my co-presenter who couldn't be here today, was deeply engaged in medical-device policymaking for safety-critical systems and has a good understanding of the national databases that exist; we'll put them on the risk page for Augur here shortly so that you can find them. But I think when it comes to security, we haven't had a long-term engagement with people who have visibility into the release of vulnerabilities prior to the public awareness of those vulnerabilities, and I think that's what you're talking about. Right. So it's not the coordinated disclosure, or prior to disclosure; it's the actual risk after this is public knowledge and attackers have access to the exploit. Sure. So, I have a question. There are two ways we can get at that. One is that there is a public database of known software vulnerabilities; at some point they become public, and we can scan any repository and look for that particular dependency and that particular vulnerability. And if we know when it was made available, we can also identify when a commit changed, because usually it's a library version, in most open source projects, that needs to be updated. So it's fairly easy to get that information. If we know what library needed to be updated, we can look at the distance between the public release of that vulnerability and the group fixing it. So we have the data, I would say, and that could be measured. That could be measured. We haven't put the vulnerability database in.
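The distance just described, between the public release of a vulnerability and the project fixing it, could be sketched as a simple date difference. The advisory identifier, dates, and commit below are entirely hypothetical.

```python
from datetime import date

# Sketch of the disclosure-to-fix lag: time between a vulnerability
# becoming public and the commit that bumps the affected dependency.
# The advisory and commit data here are invented for illustration.

def days_to_fix(disclosed: date, fix_committed: date) -> int:
    """Days elapsed from public disclosure to the fixing commit."""
    return (fix_committed - disclosed).days

advisory = {"id": "CVE-0000-00000", "disclosed": date(2019, 3, 1)}
fix_commit = {"sha": "abc123", "date": date(2019, 3, 19),
              "message": "build: bump libexample to 2.4.1"}

lag = days_to_fix(advisory["disclosed"], fix_commit["date"])
print(f"{advisory['id']} fixed in {lag} days")  # → fixed in 18 days
```

The hard part in practice is not this arithmetic but matching advisories to the dependency-bump commits, which is why the vulnerability database has to be wired into the tooling first.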
So that's one way it can be measured: use that vulnerability database directly and then scan any Git repo. Any repo on GitHub, as you've all likely noticed, has a vulnerability list right on it, and GitHub has an API that exposes that information. So we could create a GitHub call for things, but a lot of open source projects are not on GitHub. That's a great feature for GitHub, but in all likelihood it doesn't cover your whole open source universe. So that's something that I think makes a lot of sense, and I would love it if you would go to our working group webpage and create that question as an issue. Then, even if you're not able to participate in the calls or don't have the time to follow another mailing list in your life, we'll at least have your synopsis of it. I'll remember it and try to summarize it, but my brain is imperfect and everything leaks out of my bald head. So we're done. Thank you very much. I'll be around for a bit if any of you has any other questions, but thank you. Thank you.