Hi everybody, I'm Sean Goggins. I'm representing the Risk working group of the CHAOSS project here at the Linux Foundation. The title of the talk is "It's a Complex Web of Open Source Software Dependencies and Risks." I'm going to open up by explaining that we're a metrics group; I'll tell you a little bit about that in a minute. But when we went down the road of trying to understand software dependencies, we found ourselves here. Does anyone remember or recognize where this is? The La Brea Tar Pits. A massive, complex web of things that are interconnected. And we had to wade through a lot of information, because there are a lot of different organizations, and components of organizations, that have some issue or concern with dependency security and things like that.

So the questions that people who run open source program offices, or open source projects themselves, ask are: Is my project secure enough, safe enough? How do I measure the dependencies? And this has become a very significant concern recently, because my projects are increasingly dependent on other projects. And the biggest thing we looked at is: can I use something that's unsafe, or a less-than-secure component, and still have a secure result? In other words, how do I build a trustworthy machine without having trustworthy results from all the components? And as my application goes to being, in some cases, 90% third-party dependencies, what is it that I have control over? And then, how do I design all that so that a mistake isn't the end of the world? And then where CHAOSS comes in is measuring it.

So CHAOSS is a project that launched at the 2017 Open Source Summit, which was somewhere here on the West Coast, I believe. I don't remember which city, maybe San Francisco, maybe Seattle. LA, it was in LA. And so CHAOSS came along, and our role is to really define metrics for open source software health and sustainability, and provide tools that help open source program offices (OSPOs), individual projects, and ecosystems with ways to track the various components of sustainability within their projects. So questions like: How can we know if this open source project is going to be around in 10 years? What is the health of the other projects that this project depends on? Is there a diverse community? And our main aims are these boxes at the bottom: establishing implementation-agnostic metrics for measuring community activity, contributions, and health, and some integrated software for analyzing such things.

And although it sounds very simple, at the very beginning of the project we had a lot of discussion about what a commit is. How do you measure a commit? Does whitespace count? Do squashed commits count? How do we account for those things? So we don't offer what we would say is the right definition of any given metric, but we offer a consistent one. So if it's consistently not your definition, in a way that's OK, because it's still consistent.

We have a large group of organizations that work with us in a very open community. Our process is transparent. Our website, chaoss.community, welcomes you with open arms. And again, we have both metrics and software that are integrated into the CHAOSS project. This is an eye chart, which you won't be able to read, but it's just an array of the different metrics that we've deployed.
And they revolve around core working groups: Evolution; Common, which covers the metrics that cross all of the other working groups; Value; Diversity, Equity, and Inclusion; and Risk, which is what we're talking about today. And we also have some initiatives, including community reports, badging, and some software. So CHAOSS is a very large enterprise at this point.

And when we start to talk about the metrics that we have for risk: we have a lot of risk metrics, and dependency is one of them. I will get to that, I promise. Every metric, in some way, represents some square on this risk matrix that many of you have seen. In other words: what's the likelihood of something going wrong, and what's the effect on my project if it does go wrong? This is how we assess risk, just like I'm sure you do. And one of the things we do is combine metrics, because one metric doesn't necessarily tell a story. So if we want to understand something like all the second-time contributors you've got, that is, how well you're doing at retaining new contributors, we have reports that group metrics together and give you beautiful little charts, like this and like this.

But let's get to dependencies, because that's what we're here to talk about. And by the way, if you have a question, don't feel shy about shouting it out while I'm up here. I'm happy to just answer your questions as they occur.

So when we entered the dependency realm, this XKCD example is a really good one: all modern digital infrastructure, held up by some random project in Nebraska that somebody's been maintaining. Who has not seen this cartoon? Yeah, it's a very popular cartoon in open source. And when we waded into risk a little bit further, we realized that we weren't dealing with just one thing. When we're dealing with dependencies, there are a lot of different dimensions. There are licensing risks, safety-critical system risks, dependency risks, open source repositories, consumers, security, sustainability, compliance. When we look at risk, there's this whole universe. And then when we narrow it down to dependencies, you'll find that there are a lot of organizations, software, and efforts focused on this: OWASP, the OpenSSF, deps.dev, Libraries.io, the National Vulnerability Database, the open source census. So as a working group, we were overwhelmed with the amount of work that was already being done in the risk and dependency space. Some of it's overlapping, some of it's orthogonal, some of it's interconnected. But those connections were something that we took a great deal of time to try to figure out.

And so what I want to talk to you about today is how we're going about, within the CHAOSS project, figuring out the measurement of these dependency risks with open source software, and the way that we're doing it systematically, answering these questions: Is my project secure, safe? Can I measure the dependencies at all? How do I do that? Can I use these unsafe parts and still have a safe project? How do I design so that a mistake isn't the end of the world? How do I measure my dependencies so I can enable more rapid updates?

So we get to the dependency question. What are the indicators of a dependency risk? Every piece of software has a dependency. What constitutes that dependency posing some risk to my project? And how can we quantify them in a meaningful way? Metrics. And what are the results of those measurements? Does it matter if we look at them over time?
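To make that likelihood-and-impact framing concrete, here is a minimal sketch of scoring a risk matrix. The scales and labels are hypothetical illustrations for this talk, not CHAOSS definitions:

```python
# Minimal sketch of a likelihood x impact risk matrix (illustrative only;
# the scales and scores here are hypothetical, not CHAOSS definitions).

LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3}
IMPACT = {"minor": 1, "moderate": 2, "severe": 3}

def risk_score(likelihood: str, impact: str) -> int:
    """Score a risk as likelihood x impact, on a 1-9 scale here."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]

# A low-likelihood, high-impact dependency risk still scores high
# enough to warrant attention.
print(risk_score("rare", "severe"))    # 3
print(risk_score("likely", "severe"))  # 9
```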
In other words, is there a trajectory of our dependency risk? Does it go up and down? Is it something that we want to monitor over time? And then, what is the value of any particular dependency measurement?

So to try to put some concrete thing in your mind, let me pick one metric that we've developed formally. How many of you have heard of the libyear metric for dependencies before? Ever? Sophia in the back, who helped define the metric. Hi, Sophia. So in CHAOSS, to define metrics, we use this goal-question-metric methodology. And so what is the goal? In our case, it was understanding the scope of dependencies in open source projects and then identifying higher-risk dependencies. We can't inventory and analyze the thousands of dependencies that some projects have. But what we can do is find indicators, or heuristic metrics, that help us identify those that are higher risk. And so this is our risk focus area.

And so one question that this libyear metric answers is: what is the age of the project's dependencies compared to the current stable releases? And we had some discussion about stable release versus latest release, because as you know, those are not always the same thing. And so we chose the language "stable release." This is an eye chart; you can have these slides afterwards. But the question is, what is the age of the project's dependencies compared to the current release? So: the age of the dependencies that the project relies on, compared to those stable releases. And for the libyear metric, how do you come up with a number for age? It's cumulative across all the dependencies. There are other ways to do this. We could apply something like a mean, or possibly a median. So the math that you use depends on how you want to evaluate it and what makes sense inside of your organizational context. But libyear is an example of a dependency risk metric that gives you a general idea. If you're looking at 11,000 projects in your ecosystem, or in the OSPO that you're responsible for, it gives you a general idea: which ones have the oldest dependencies compared to the most current stable release?

I'm going to pause. I've thrown a lot at you. Does anybody have any questions at this point? Yeah? Let me answer that; it's a really important question. We haven't looked at data-oriented dependencies specifically. No. I have some opinions about that based on some other work I've done, and we can maybe talk about that after.

So, the age of a project's dependencies: this is one example from a website that calculates libyears for Python. It'll tell you the current version, the latest version, and how many libyears behind it is. So you can see it by dependency. And the objective of this specific metric is to help you identify the dependencies that have a higher probability of posing some risk or vulnerability. So the heuristic is: the older the dependency is, the more likely that specific dependency is to pose a risk to the project. So it's a place to start. If you have a lot of dependencies, you can look dependency by dependency within a project. So right now the scope of our thinking is just within a project. And you can see which ones are the most out of date, and do some assessment of whether or not those are ones that you want to make some effort to either update or contribute to, depending on your project's perspective.
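To illustrate the cumulative calculation just described, here is a minimal sketch of computing libyears. The dependency data is hypothetical; a real implementation would pull release dates from a package index such as PyPI:

```python
# Minimal sketch of the cumulative libyear calculation.
# The release dates below are hypothetical placeholders.
from datetime import date

# (name, release date of the version in use, release date of current stable)
DEPS = [
    ("requests", date(2021, 7, 13), date(2024, 5, 29)),
    ("numpy", date(2022, 12, 26), date(2024, 6, 16)),
]

def libyears(deps) -> float:
    """Cumulative libyear: sum, over all dependencies, of how many years
    the version in use lags behind the current stable release."""
    return sum((stable - used).days / 365.25 for _, used, stable in deps)

print(f"{libyears(DEPS):.1f} libyears behind")
```

Swapping `sum` for a mean or median gives the alternative aggregations mentioned above; which one makes sense depends on your organizational context.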
And when we look at dependencies, risks, and vulnerabilities together, right now we have certain things that are part of a critical infrastructure that are consumed and tested. We have security, licensing, production. All of these things involve people with tribal knowledge of the software, historical knowledge, domain expertise. Many different people are involved in making decisions about what dependencies to introduce and how to update them. There is a question of cost, and there's a question of project maintainability, test coverage, suitability for service, and then provenance, for example. So sometimes there are export restrictions; there are certain countries that we can't take things from.

And there are different kinds of dependencies as well. There are direct dependencies, and then the whole thing is ultimately transitive. A direct dependency is: I have my open source project, and these are the libraries I import directly. And then I have libraries that those libraries depend on, and then libraries that those libraries depend on. And ultimately, I can also have a library that depends on the same library that I depend on. So we have a matrix of very complicated dependencies. And so for libyear, what we do is, for each of these primary dependencies, first we calculate how old they are, and we produce this cumulative libyear metric, the age of all of them for a project. And you can either calculate libyear for the direct dependencies, or you can include the transitive dependencies. In the metric as we define it, we left that open to the implementation to decide. There's an alternative view of libyear that comes out of some OSPOs, where they want to understand the cumulative age of all dependencies between the current date and the date of the most recent release. So it's a slightly different metric. And when we add security vulnerabilities into it, then we can identify certain library dependencies as having some kind of security or vulnerability problem. And so that, separate from the libyear, might flag it.

Now, the view from the OSPO kind of gets to the question that you asked earlier, which is: OK, I have thousands of projects in my portfolio, and those projects have dependencies. So now I have this giant ecosystem where I have the same dependency across many different projects. And I have a limited amount of money that I can invest in hardening dependencies that I don't want to go away; I want to substantially secure those communities. And so from an OSPO perspective, the picture that a lot of OSPO managers we've talked to want to have is: OK, I've got 11,000 projects in my portfolio. I want to know, across all those projects, where are my greatest vulnerabilities? Where's my greatest risk? I can't think about every single one of those 11,000 projects. I have to understand, across the whole portfolio, where am I going to invest money in a project that I'm dependent on, in order to either bring it up to currency or to move my software to a more recent release?

And so what we've done as a working group, as we looked at that La Brea Tar Pit of dependencies and all of the people thinking about dependencies and vulnerabilities, is, thanks to the group of people working together in this group, we identified some things that we think are what we call the minimum viable metrics. And I'll credit Sophia Vargas back there for inventing the term "minimum viable metric."
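To make the direct-versus-transitive distinction concrete, here is a minimal sketch of walking a dependency graph. The graph is hypothetical; note that libA is both a direct dependency and a dependency of libB, the shared-dependency case just mentioned:

```python
# Minimal sketch: direct vs. transitive dependencies in a hypothetical graph.
GRAPH = {
    "my-project": ["libA", "libB"],
    "libA": ["libC"],
    "libB": ["libA", "libD"],  # libB also depends on libA (shared dependency)
    "libC": [],
    "libD": [],
}

def transitive_deps(graph: dict, root: str) -> set:
    """Walk the graph depth-first, visiting each dependency only once."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen

print(sorted(GRAPH["my-project"]))                   # direct: ['libA', 'libB']
print(sorted(transitive_deps(GRAPH, "my-project")))  # ['libA', 'libB', 'libC', 'libD']
```

A libyear implementation can then run its age calculation over either the direct list or the full transitive closure, which is exactly the choice the metric definition leaves open.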
And so, for dependencies, the ones that we think are going to add the most value to people who want to understand where the risk lies in their dependency chain start with just listing what my dependencies are. We've produced two metrics in this area so far, and the other one, Upstream Dependencies, does exactly that: just enumerate them. Give me a list of all the things that my projects depend on. And then, assessment of sustainability risk. Sustainability risk essentially will take that inventory and apply existing CHAOSS metrics from Evolution, Common, and Value into what we're starting to call a metrics model, and say: OK, of all the things I'm dependent on, based on the health and sustainability metric collection within CHAOSS, which projects have the most health and sustainability concerns? And then dependency range. This is going back to the ecosystem: how many times is a single dependency referenced in an ecosystem or an OSPO? Libyear, which we've developed so it can be a total or an average; that's a filter you can apply. And also enumerating known vulnerabilities for your project's upstream dependencies, which I think I actually have on here twice. And also enabling OpenSSF Scorecard, a tool that evaluates 10 different criteria; we've actually implemented that in some of our CHAOSS tools now. How many people have heard of OpenSSF Scorecard? Kate has heard of it? Anyone else? Sophia? So it's a scorecard for evaluating risk, and it has 10 characteristics, which I don't know off the top of my head, but perhaps somebody knows. Kate, do you know what they are? No, OK. But it's a useful assessment of different risk characteristics for a project that the OpenSSF put together.

And then finally, a matrix between vulnerabilities and dependencies. There's a vulnerabilities database that is available. And if we understand the list of our dependencies, we can cross-reference it with a vulnerabilities database. These databases are imperfect, and now more decentralized, though somewhat better funded than they have been historically. These minimum viable metrics, we think, help to give open source a standard way of thinking about dependency risk. And since it's a hard problem with a lot of different pieces to it and a lot of folks working on it, what we're trying to do with our work is create a coherent set of definitions for things that we can focus on. So instead of being overwhelmed, as we were as a group at first, by the number of different perspectives on this problem, what we're doing with these metric definitions is giving ourselves a place to focus. We've talked about and thought about all of the different work that's happening in the dependency space right now. And we've said: OK, if we measure these things, we're going to give most organizations, most projects, most ecosystems a much clearer view, if not a perfectly clear view, of what their dependency risk condition is right now.

Coming back to CHAOSS: so far we have defined two of about eight minimum viable metrics. And our history is, at first, in 2016 or '17, we didn't have common shared metrics. Every organization trying to measure something in open source software was doing it their own way. And at this point, we have over 50 defined metrics. When our next release comes out, it'll be over 75.
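As a sketch of that cross-referencing idea, here is a minimal example that checks a hypothetical dependency list against OSV.dev, one such public vulnerability database with a query API. This illustrates the approach rather than any particular CHAOSS tool:

```python
# Minimal sketch: cross-reference a dependency list with a public
# vulnerability database (OSV.dev used as one example).
import json
import urllib.request

# Hypothetical dependency inventory: (package name, pinned version).
DEPS = [("requests", "2.25.0"), ("jinja2", "2.11.2")]

def known_vulns(name: str, version: str, ecosystem: str = "PyPI") -> list:
    """Return the vulnerability IDs OSV records for this package version."""
    query = {"package": {"name": name, "ecosystem": ecosystem},
             "version": version}
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=json.dumps(query).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [v["id"] for v in body.get("vulns", [])]

for name, version in DEPS:
    print(name, version, known_vulns(name, version))
```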
And now, with things like risk metrics and dependency metrics, we're trying to look at health and sustainability metrics, making them widely available, and providing tools and projects that can look across ecosystems. And activity metrics with a repository focus can give us a really good idea about scale, project culture, project quality, process quality, and product quality. We can contextualize risk, identify licensing risk, and look at corporatization and access to resources as factors that affect dependencies and the introduction of them.

So now I want to come back and ask you these questions. These are the questions that we had as we sorted through this tar pit of available resources. Which of these questions are most salient to you? Are any of them salient to you? So, is my project secure enough? And how do I design so that a mistake is not the end of the world? Good? Yeah? Or three? So, how do I measure the dependencies, because my projects increasingly have them?

And yeah, so it comes back to the likelihood of occurrence and the effect, the impact, of the occurrence. So if a dependency allows for a large data breach, it might have had a low likelihood, but it had a high impact. So some of how you make these decisions is going to be related to the kind of systems that you're running. Safety-critical systems have a near-zero tolerance for any dependency risk, and it has to be measured not only at coding time but at runtime as well, which is an entirely separate discussion.

What are some ways that you think about dependencies and avoiding risks that come from them, aside from analyzing each individual dependency? Are there defensive coding strategies or release processes that you have in place that provide some protection, in addition to understanding the dependencies in depth? Yeah. One of the things that we're doing with our tools is providing a list of dependencies over time, so that you, as an OSPO manager, for example, would be aware that new dependencies were injected into your code, and you could decide if that was appropriate. Other organizations have started to implement dependency approval boards, where if you want to introduce a new dependency, you have to get it approved by a manager first. There's a recognition about the old programming practices in languages like Python (and to be fair, I think the Python versus C++ contrast is a fair one): it would be very common for a developer to add a library because it served some function that they needed, and they didn't want to write it themselves, or they didn't want to find out whether one of the libraries they were already using included a similar piece of functionality. It's just easier to throw in the one you know. And I think a lot of corporate practices and project practices are starting to limit that. And things like simply having metrics that enumerate your dependencies help you to understand if new ones are introduced.

Yeah. So, rating each one: this is where it becomes a full-circle opportunity for a project like CHAOSS, where we're working to help measure the health and sustainability of open source software projects. By knowing that a project has a strong community behind it and a good core set of developers, maintainers, and contributors, then choosing to import that library probably poses a lower risk than randomly grabbing one off of PyPI, right?
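As a minimal sketch of that dependency-list-over-time idea, here is one way to flag newly introduced dependencies by diffing two snapshots of a project's inventory. The snapshot contents are hypothetical:

```python
# Minimal sketch: flag dependencies newly introduced between two snapshots
# of a project's dependency list (snapshot contents are hypothetical).
def new_dependencies(previous: set, current: set) -> set:
    """Dependencies present now that were absent in the earlier snapshot."""
    return current - previous

last_month = {"requests", "numpy"}
this_month = {"requests", "numpy", "some-new-lib"}

for dep in sorted(new_dependencies(last_month, this_month)):
    print(f"New dependency introduced: {dep}")
```

This is the kind of signal that lets an OSPO manager, or a dependency approval board, notice an injection and decide whether it was appropriate.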
And so that's, you know, one of the things we're trying to accomplish with dependency metrics: to give you visibility into the health and sustainability of not just the project you're working on, but the things it's dependent on. And I think, to your point, this can be a proactive thing: I want to use something that's already hardened. Where do I find it? How do I assess how hardened it is, the stability, the health, and the sustainability of the community?

This is where I say thank you and ask if there are any other questions or discussions that people have. I mean, dependencies are a significant concern really across open source right now. And what the CHAOSS project's Risk working group is trying to do is, you know, give you some metrics that you can use and some tools that you can use to evaluate those dependencies and assess the safety and stability of your projects, so that if you're responsible for running, say, a gas pipeline, you can determine if you have a vulnerability that needs to be addressed.

Kate, yes, you can go to chaoss.community. Hopefully that's easy enough to remember. CHAOSS has two S's because we wanted to be just a little bit more chaotic than regular chaos. And if you go to chaoss.community, there's a participate tab, and we have working groups that meet every other week. We have a weekly community meeting on Tuesdays at 11 a.m. Central Daylight Time in the U.S. We have a lot of resources related to our metrics and tools available at our GitHub site, which is github.com slash chaoss, C-H-A-O-S-S.

We haven't done that. David Wheeler has worked closely with us on the development of these risk metrics, and he's very closely involved with the CVE database and security in general. I think he has a heuristic belief that there is a relationship, or he wouldn't be working with us, because he's a busy guy.

So, right now, the metrics for evaluating dependency risk didn't exist before we started to do this, at least in a formal and consistently defined manner. And when we looked, there were three GitHub repositories that implemented a version of the libyear metric, all a little bit differently. And we tried to develop a metric that would provide a fast heuristic indicator of where I should look first. So if I have a large collection of projects that I'm overseeing, libyear helps you to see where to look first. And you can slice the data at the dependency level, or at the project level, or at your ecosystem or OSPO level. And it's a place to start. It's a filter, so that you don't have to look directly at the dependency chain in 11,000 projects; you can start with the thousand that have the most risk. So it's helping with that focus when you have a large-scale set of responsibility.

I'll hang out here for as long as you like, but the rest of you are free to go, as I say to my classes.