Thank you. So up next, we have, oh dear, don't do that. We have a pretty exciting treat for us all. Stop that. We have a panel. Woo, panel, yeah. This next session, there's some top talent here, and we are here to chat about finding and securing stuff. Oh, I added all that to the document. That's pretty cool. So it's about finding and supporting critical open source projects. Our first panelist is Amir Montazery. He's the managing director of OSTIF, the Open Source Technology Improvement Fund, a nonprofit organization that provides tools, services, audits, and support for open source projects. So Amir, please find a stool of your liking. Next to Amir is going to be Caleb Brown. He's a software engineer working on Google's open source security team on finding critical projects and analyzing the behavior of open source packages. So Caleb, please come find a stool. And finally, we have Julia Ferraioli. She is a developer, researcher, open source maintainer, and podcaster. Yay, one of my favorite hobbies. She is active in the OpenSSF community, helping with the Securing Critical Projects Working Group and researching project criticality while advocating for solutions that work for open source maintainers. So this is our last presentation before lunch. No pressure, panel, to stay on time. And we have our slide appearing right now. I was going to hit some buttons. Oh, dear. No. Look at that. So panel, take it away.

Thank you, Jory. Thank you. Thank you, CRob and Jory and everybody for having us today. We're super excited to speak with you all about finding lib-Nebraska. Everyone has seen this amazing graphic before, and we're really, really focused on finding those projects and securing them. So I'm super excited to be here with my fellow panelists. And I guess we didn't really talk about how we would formally start, but really excited to be here. Julia?

Yeah, so I'm Julia. I am officially an open source technical leader at Cisco. That's my pretty awesome title. And I'm also the co-founder of Open Source Stories, a community-driven project to capture untold narratives in open source. And my pronouns are she/her. And I'll pass it off to Caleb.

G'day. I am Caleb, and you can tell by the accent that I'm not from this country. I'm with Google on the Google open source security team; there are a few of us here. And yeah, as mentioned before, I work on the criticality score project and the package analysis project, which is lots of fun. So I'm very interested in today's topic. My pronouns are he/him.

Awesome, thank you. And I'm Amir from OSTIF. As CRob mentioned, we focus on solving the problem of securing critical projects. That has been manifested through coordinated, managed security audits, which I guess is a good segue into why we're interested in the topic. So I can kind of veer into that and then I'd love to hear your perspectives. The reason I care about this problem is that, as mentioned earlier, our organization is working really hard to try and solve it. Leading research on vulnerabilities shows that, lots of times, to find those really deep-seated problems and vulnerabilities in software projects, you need to dig deep and go in to find the juicy bugs, so to speak.
So by doing that for the last seven years, and thankfully with OpenSSF and a lot of the foundations and larger organizations starting to care about this problem more and be much more actively involved in it, we're doing a lot of audits of open source projects. And in doing that, we're trying to find better ways to identify those projects that are just so important, that everyone depends on, but that might be overlooked by tooling. We were actually just having a good conversation outside about community size as a metric and how that can be a little bit misleading. So yeah, I'd like to think I'm very close to the problem, and I'm happy to share those experiences with everyone today. Caleb, next?

Okay, I'm interested in this project because I'd like to avoid another Log4Shell. I don't know how many people in the room were involved in incident management on that one. Fortunately I wasn't, but there were people on our team who were. So getting ahead of the next one of those would be really good, and hopefully it's not a zero-day. That's one interest. I'm also interested in helping people who consume open source understand their exposure and their risk, and hopefully motivate them toward investing in improving it, particularly for projects like Log4j where there's a small handful of developers working really hard. Being able to direct the resources that consumers have into those places would really improve the security of the entire ecosystem.

Well said. Thanks. So I have worked in open source programs offices at different companies, and I think my interest boils down to pain management, because I am not the person fixing the bugs. I'm the person managing the efforts to figure out who's using all of these open source packages and, worse, who's responsible for fixing them. So I really want to help people avoid the pain of having to drop their entire work, their entire life, to go fix their open source dependencies. I started doing research on this around 2018. I was actually at Google at the time, looking into where investment should go if something fell over, like what happened in 2014, 2015, a couple of times actually. How can we respond faster? How can we figure out which of these projects need extra help? And so I developed a framework to help identify those projects, which I think is released somewhere. Very cool.
And as a working group, the Securing Critical Projects Working Group, one of our main objectives has been coming up with a well fleshed-out list of the most critical projects out there. The intention is to, as Julia mentioned, help guide resources and put attention on projects that could probably use the help, because I've never come across an open source project that said, no, we don't need any help at all. So finding those projects and being able to support them proactively has been a big part of what we've been doing as a working group. But it has not come without its challenges, because lists are just inherently hard. You're never going to have a perfect list that appeals to everybody, based on how they consume open source. But we're under the impression that it doesn't have to be perfect; it never is going to be perfect. So I invite my fellow panelists to talk about some of the challenges with getting consensus on that, or even measuring it, and maybe lessons learned from previous experience too.

Yeah, let's do that. Do you want to talk about lessons learned there? Yeah, so one of the big challenges that we identified early on is the trouble of comparison, right? We've got a lot of different types of open source projects out there, and I'm pretty sure that by the time I finish speaking this sentence there will be five or ten more started, so that might be a floor, actually. But they're shaped very, very differently. They serve different functions. One of the starting points that I took was Nadia Eghbal's Roads and Bridges report from the Ford Foundation, where she categorized a bunch of different types of open source projects: databases, libraries, frameworks, et cetera. Unless you slice and dice open source projects, you're going to wind up with one list that is not representative, right? And we've found this with languages as well. Do you want to talk about the problem with languages?

Sure, yeah. I think it's a fantastic point that, again, there really is no absolute here. I like how Anne Bertucio had this on the first line: open source just inherently is critical. So I think one of the challenges is really just drawing a line in the sand and starting somewhere, because in the time it takes to, let's say, gain consensus on a project, there could very well be plenty of new projects being introduced, and vulnerabilities in these projects being exploited that aren't being found proactively. So it's about having something, and something actionable, to start moving forward from identifying projects to actually taking steps toward securing them. I personally am much more in the don't-let-perfect-be-the-enemy-of-good camp. Sure, if you survey, let's say, a thousand people on whether project X or Y is critical, you'll probably get a relatively high percentage saying yes. So really, we're just trying to move the needle forward. And as you said, yes, languages play a part, platforms too, even the functionality of the project. A project that may not be used by a lot of people but is a critical part of the Large Hadron Collider, I mean, is that critical? Well, if we don't want things to explode, I'd say that's more critical than not. Indeed, yes. Or power plants shutting down.
This is one of the challenges: you don't know where the software is being used, you don't know the data it is handling, and the value of the system that this software is being used within is really hard to quantify. So if you actually wanted to measure the real impact of what is truly critical, you'd have to be omniscient and able to see every application of a piece of software. Unfortunately we don't have that, so we have to come up with other ways of figuring it out.

And I think that's a good segue to one of the points made earlier in the day: we don't have the telemetry that we used to, and we don't have the trust. When it comes to securing infrastructure and identifying the projects most critical for infrastructure, however you define that, we have a lot of private companies involved, and they can't necessarily make their dependencies public, because that can be an attack vector. So figuring out how to collect that information in a neutral, responsible, and representative way is a big challenge, and it relies on a lot of self-reporting.

Yeah, I'm hoping that in the future things like SBOMs and GitBOMs and other ways of cataloging software and dependencies help in this process. But I think it's going to be an evolutionary process where, over time, as things like SBOMs are adopted and companies are, I guess, happier to reveal this sort of information, you'll get a better sense of how critical software is being used across the ecosystem. So one day, hopefully, we have the ability to see how things are being used, but there's a process and a time before that will be the case.

And a wish list: dependencies and dependency counts and software composition analysis are great, but I also want to see how much time is being spent in processes, in functions, when they're being run, because my favorite example of critical software is a Fortran package. Yes, bless, thank you. Thank you, Mark. A lot of processing time is spent in BLAS, and it's probably not on a lot of people's radar. Yeah.

But the good thing is that at least the needle is moving forward, despite this being a constantly moving target, which makes it even more challenging. You have research, like the Census II research, that did a lot of work on finding the most used repositories, and other research going on in the space is helping move the needle. But it is definitely a challenge, absolutely.

How do you do that at scale and keep it up to date as well? The Harvard census takes time to produce; it's a point-in-time view of things. With more open source projects being created every second, it's a changing landscape. My problem with that picture up there is that it's a point-in-time picture, and the lib-Nebraska guy can add another dependency to that tiny little thing there, and suddenly your whole picture changes. Even more skinny. That's right.

I think we had a question, yeah. Yes, that is a great question. Do you want to repeat the question? Oh, yes. What types of metrics are we looking at when we're measuring criticality? Caleb, do you want to talk about that a little? So I'm still getting up to speed on all the research and papers that are available. Some of the ones I've seen are based on dependencies. That works in ecosystems that have clear dependency data, like Python or Node, where you can see that.
It doesn't easily track across transitive dependencies, that sort of research. I've seen other research in the space around truck factor, trying to understand how many people have a cognitive understanding of a piece of code, using the data in Git or in your software repository to answer that question. Other research has looked at things like time to issue resolution. You can also take in other data: maybe the number of source lines of code, maybe which organizations are contributing and whether there is diversity in that space. So there are lots of metrics, and when we start to formulate some automation and scoring around this, we really want to look toward academic research in this space, to have some sensible reason for including a metric or a signal when calculating those scores. Does that help answer the question? Do you want to add anything?

I have so many opinions. Oh my goodness. So, academic research, yes. I think we need to be more involved in participating in and guiding academic research. We are seeing more and more research institutions having an OSPO, an open source programs office, that can help them understand open source software better. But they still need some guidance, because open source isn't necessarily built into their DNA the way that it is for many of us here. The problem I see with academic research is that it keeps falling into the same pit of assuming that GitHub is your source of truth, and that is very problematic for a bunch of different reasons. And there's also the idea that all metrics are good metrics. No, they're not. All data is good data. No, it's not. I actually need to finish a position paper on this subject that's sitting in my drafts folder. But we need to help guide them in identifying the metrics that are useful. Please don't use stars on GitHub. That's not a good metric. I'm sorry. Yeah.

And I was just going to add that as important as quantitative data is, especially for doing some of this at scale, qualitative data is important too. That's why one thing I've been trying to incorporate into our process is some type of community curation, where we can source those qualitative data points that might not be apparent from a tool or from looking at a GitHub repo. So as important as the quantitative metrics and the research are, sourcing from the folks actually maintaining and working on these projects is going to be an important data point too.
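To make the scoring idea concrete, here is a minimal Python sketch of the kind of weighted, log-scaled aggregation Caleb describes, in the spirit of the formula published by the OpenSSF criticality_score project. The signal names, values, weights, and thresholds below are invented for illustration; note that GitHub stars deliberately get a weight of zero, per Julia's advice.

```python
import math

# Each signal: (value S, weight alpha, threshold T). The values, weights,
# and thresholds here are invented for illustration only.
SIGNALS = {
    "dependents_count":        (21_000, 2.0, 500_000),
    "contributor_count":       (40,     2.0, 5_000),
    "commit_frequency":        (3.1,    1.0, 1_000),    # commits/week
    "issue_comment_frequency": (1.5,    1.0, 15),
    "github_star_count":       (80,     0.0, 100_000),  # weight 0: ignored on purpose
}

def criticality(signals):
    """Weighted, log-scaled aggregate in [0, 1], after the 'Quantifying
    Criticality' formula used by the OpenSSF criticality_score project."""
    total_weight = sum(alpha for _, alpha, _ in signals.values())
    score = 0.0
    for value, alpha, threshold in signals.values():
        # Log scaling keeps any single huge signal from dominating the score.
        score += alpha * math.log(1 + value) / math.log(1 + max(value, threshold))
    return score / total_weight

print(f"criticality = {criticality(SIGNALS):.3f}")
```

The log scaling matters here: raw dependent counts span several orders of magnitude, so without it one signal would swamp the rest.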
We have a question over here. Oh, awesome. So, I totally love all the different perspectives on the panel. One thing that we're pretty concerned about is that a lot of the attention has been on the modern application development side of things. We use the term critical infrastructure to mean IT, but in fact the things that are most relevant to human lives and safety tend to be the most under-measured, whether it's embedded systems or medical devices or the things that go boom or fizzle. And that's one of the weaknesses in Frank's study. Do you have thoughts about how we can engage those software communities and get better visibility?

Oh. So I noped out of getting a medical device implanted because they couldn't tell me about its security. And I'm like, I do not want some random person administering shocks to my spinal cord, thank you very much. So yes, I think we are overly focused on the interests of the companies represented here, right? We need to be more engaged with the medical community, with the people running power plants, the people that are running, you had another one that I can't remember. Things that go boom. Things that go boom, like data centers sometimes. Or military equipment, yeah. Yes, yeah. And I think we probably need help, like help figuring out who to talk to, because a lot of those people aren't here, right?

Oh, we have another question. Awesome. So, I'm not familiar with the group. I'm from OpenUK. Hi. We've been working quite a lot on trying to find better data and better metrics, and I personally don't like the way we measure economic value. I think we look at total cost of ownership, which is very out of date. We look at lines of code, we look at number of developers. The data isn't reliable to start with, but it's also a really old-school way of thinking about it. We're trying to shift more toward looking at investment, and at the value generated by open source in so many different ways, across things like cloud infrastructure. We're also looking beyond economic value. We think one of the biggest things that we have as an open source community is all the additional value we bring to society, and that's not something that can be measured in money. At our sustainability day in November, we will launch what we're calling societal value metrics. There'll be a v1, and we need lots of help to make them better and better. We've started with the Sustainable Development Goals, which give us 17 base points, and then we're going to refine that. So there will be something launched by November as a v1. Anybody who's interested in getting involved will be very welcome, and we will do more year on year to improve those.

Thank you, Amanda. In terms of signal quality, I think it's a combination of keep looking for better signals and keep finding them, and it's great to know that people are doing that and that there is investment in that space. But at the same time, keep working with the signals that we've got, to determine what we know so far to be the critical stuff, so that we can invest in that area. And let's improve the signal collection so that we can make sure we're not missing something that is actually really critical that we haven't seen, that's hiding down the pile of things. And the call for participation goes the other way too: bring your knowledge, bring what you're learning to the working group, because the impact of open source, and of securing open source, definitely spreads. It radiates, that's the word I was looking for.

You have a question over here? Oh, okay. Yeah, can you hear me? Yeah. We've got a couple of joys. Oh, what joy is going on? Thank you so much. So this resonates a lot with the work that I do. I come from academia, so I'd like to follow up with some questions around what we've been finding. In my lab, we're trying to do cross-ecosystem analysis, rather than just looking at, say, npm or Docker alone. And we find that it's really highly interconnected, and most of it really does boil down to the Linux distros, right? If you're going embedded, you're going to find a bunch of Yocto around.
If you're going embedded, you're probably also going to find some FreeRTOS and things of this nature. So my question is, why don't we approach those communities? I find it surprising sometimes that they're not very engaged with the Linux Foundation. And as a follow-up, which may be a little tangential, another finding that worried me a lot was that these ecosystems are actually unstable. If you sample the graph of dependencies on a Monday, you will get different critical packages than if you sample the graph on Friday. This means that it is not that easy to just say, hey, these ten packages are the ones that we really need to take care of. Rather, when do we need to incentivize, or how can we stabilize this graph, so that we can actually start working on it?

That's a great question, Santiago. Do you want to take a stab? So I'll take a stab at the first part of the question, I think, and correct me if I'm getting off topic, please, I tend to do that. So I think that there is research around cross-ecosystem analysis, right? There's Project OCEAN, Open Source Complex Ecosystems And Networks, coming out of the UVM Complex Systems Center, that is digging into this, and it's a cross-disciplinary approach. So there are initiatives out there. You do have to kind of work to find them. That's at least the first part of the question, I think. Do you want to take the rest?

Yeah, the fact that the graph is changing all the time is really hard, and it's part of why this is a wicked problem. It's part of why we use dependents as a kind of proxy for impact. My thinking about it has been to kind of ignore that the graph is changing, because for the things that are really critical, the number of people depending on them will be high enough that hopefully those things stay elevated. That doesn't work so well where those relationships are not explicit, but at least that's what I'm thinking at the moment. In terms of the future and how to make that better, I think being able to measure the dependent relationships will be really important. But I don't think we're ever going to get away from having to live with the graph changing all the time.

I think you make a great point as well. It definitely seems like, lots of times when we're trying to answer these questions, there are common denominators, projects that typically come up time and time again. At that point, I think it's safe to say that this is probably a pretty critical project; no matter how you frame the question, you keep getting those same projects. And that could be a great place to start: start with those where, if you were to sample a hundred people, ninety-nine of them would say, oh yeah, this is important, we should be focusing on this.
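Here is a toy sketch of the instability Santiago describes and the intuition Caleb offers, using entirely invented package names and dependent counts: rank packages by dependents in two snapshots of a dependency graph and measure how much the top-k "critical" set overlaps. The genuinely critical packages tend to stay at the top while the tail churns between snapshots.

```python
# Toy illustration of the Monday-vs-Friday problem: rank packages by
# dependent count in two graph snapshots and see how much the "top k"
# set moves. All package names and counts are invented.
def top_k(dependents, k):
    return set(sorted(dependents, key=dependents.get, reverse=True)[:k])

monday = {"openssl": 9800, "zlib": 9500, "libfoo": 4000,
          "libbar": 3900, "libbaz": 120}
friday = {"openssl": 9810, "zlib": 9490, "libfoo": 3100,
          "libbar": 4200, "libbaz": 3800}  # the tail churned between samples

k = 3
mon, fri = top_k(monday, k), top_k(friday, k)
jaccard = len(mon & fri) / len(mon | fri)  # 1.0 would mean a perfectly stable list
print("Monday top-3: ", sorted(mon))
print("Friday top-3: ", sorted(fri))
print(f"top-{k} overlap (Jaccard): {jaccard:.2f}")
```

In this made-up data the two heavyweight packages survive both samples while the third slot flips, which is exactly the pattern that makes a fixed "top ten" list hard to defend.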
But again, that's really hard when you have different needs, different uses for open source. There really is no absolute. There aren't a lot of absolutes in open source, or I guess in life in general, but especially in open source, because, I mean, we spent a good amount of time just talking about what critical even means. I don't think we've agreed on a definition. No, we certainly haven't. And it just goes to show that, obviously, it's very important to think about the problem, and that's something we're doing a lot, and it's nice to see it happening on a broader scale. But at the same time, action and experience teach us a lot of things, right? You can read about how to ride a bike for years and then get on one and not know what you're doing. So I think there needs to be a little of both: doing the research, but also a little experimentation, trying things to actually move the needle here. And lots of times those experiences produce results like, hey, we audited project A and saw that project B seems very important, so that could be something to look into. So, yeah, I'm not sure where I was going with that, but thank you so much.

One person's critical project is something someone else might not care about at all. So I think part of the discussion around what is critical is that everybody has a slightly different opinion about it. Is something that's been receiving investment in the security space actually critical? Is this thing that's got a corporate backer a critical project? So there are lots of questions around that. Lots of questions, yeah.

And I just advocate for a lot of nuance when we're thinking about this problem, right? Because in capturing dependency graphs and metrics and qualitative data as well, we're going to capture a lot of nuance that shouldn't be lost. The way that I think about criticality is by allowing a lot of freedom for people to decide which factors signal criticality to them. Leaving an open framework is very difficult. But if we go back to one of the challenges, which is the fact that we've got a lot of companies or governments or utilities relying on open source software, can we give them a way to discover their most critical projects without necessarily opening up their private dataset?
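One hedged reading of that last idea, sketched below: publish the criticality scores, and let each organization join them against its private dependency list locally, so the list itself never leaves the building. The file name, column names, threshold, and package names are all hypothetical placeholders.

```python
import csv

# A sketch of the "keep your dependency list private" idea: download a
# public criticality dataset and join it with private data locally.
def load_public_scores(path):
    with open(path, newline="") as f:
        return {row["package"]: float(row["criticality"])
                for row in csv.DictReader(f)}

def our_critical_deps(private_deps, public_scores, threshold=0.8):
    # The join happens entirely on our side; private_deps never leaves.
    hits = [(dep, public_scores[dep]) for dep in private_deps
            if public_scores.get(dep, 0.0) >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Usage, with invented names:
# scores = load_public_scores("public_criticality_scores.csv")
# for name, score in our_critical_deps({"openssl", "libfoo"}, scores):
#     print(name, score)
```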
Should we take maybe one or two more questions? I'm not sure how long we have. I think we're, five to ten, okay. All right, so we could take a couple of questions and then maybe adjourn just a little early and let folks get to lunch. This side has been remarkably quiet, and I'm not sure if that's just because I can't turn that way, but are there any questions from that side of the audience? Not limited to it, of course. Is that a hand at the back?

Hey, I just really wanted to return to this idea of criticality in terms of ecosystems, right? Because I think something that we've missed in this discussion so far is how different that looks when you slice it by language. That's really the differentiating factor here, right? Is it a chaos model, or is it top-down, where we've decided that we're incubating these projects as an ecosystem? Those are two very different approaches to deciding what we prioritize for security and what we grow with as a global community. It's something I'm reflecting on right now in my own role, because I look at this across different ecosystems, and I'm seeing it probably be the differentiator between which languages are still going to be used in ten years and which aren't. Because if your idea of criticality is changing on a week-to-week basis, that's not what I would consider a stable ecosystem. So I'd love to think a little more about what we can do at a foundational level, and at a federated level, to engage not just within a single ecosystem but across them, to make those look more congruent. That's a fine question.

One thing that has helped for us, and when I say us I mean on the audit side with OSTIF, is having an advisory council, people who are in the space, to talk about these things. Because you do have a lot of edge cases, a lot of emerging technologies, projects that could very well become the de facto standard in five or ten years. Recognizing that as soon as possible and taking steps toward securing those projects is a great approach, because you're able to be proactive and even help increase adoption of those projects by recognizing them as emerging projects that could become new standards. Lots of times it goes back to the curation piece, I think. We want to get the opinions and thoughts of the folks who are doing this every day, whether directly or indirectly, and curate that into something actionable. That would be extremely valuable. So my answer would be having strong curation processes and giving folks a voice to be able to say, have you considered this, or have you considered looking into this project further?

I think looking at ecosystems and trying to invest in understanding each one is really valuable. We focus a lot on the ones that have easy data to get at, and the ones that have had high-profile incidents. For example, Python and Node: it's very easy to measure their dependents, they're used a lot in web spaces, and so the attacks are frequent. They may not have the highest impact, though. It's harder in the IoT space, or in medical devices; a lot of that is embedded code, and we don't have a heap of visibility into that space. And the languages used there may not have as much memory safety, so there's greater exposure. So yes, there is a challenge in being able to do that, and over time we'll have to invest more in understanding how to get into those spaces, to be able to provide good answers and understand the criticality of the dependencies there.

I actually have a question for y'all. Who here has read the Census II? I really highly recommend reading that report, because it touches on some of the ecosystem factors that make this a hard problem. I won't call out any of the ecosystems here, but we do see that the development patterns, the social patterns, the communication patterns all play out differently for different languages. Understanding that, and correcting for it, is a hard problem, especially if you're trying to do cross-ecosystem analysis. The way it goes in my head is, I think in matrices. Okay, we've got a Python project that does this; here's this shape of project. And we can add on dimensions as we see fit, right? Being able to compare like to like, or like-ish to like-ish, is hard, but I think worth it.
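For what it's worth, here is one minimal sketch of that matrices idea, with invented projects, dimensions, and numbers: give every project a row of dimensions, bucket projects by their shape, and only rank within a bucket, so comparisons stay like-to-like.

```python
from collections import defaultdict
from dataclasses import dataclass

# One reading of the "matrices" idea: give every project a row of
# dimensions, then only rank projects against their row-mates.
# All projects and numbers below are invented.
@dataclass(frozen=True)
class Project:
    name: str
    language: str    # one slicing dimension
    category: str    # another: library, framework, database, ...
    dependents: int  # add more dimensions as we see fit

projects = [
    Project("numcrunch", "Python", "library", 87_000),
    Project("pyutil",    "Python", "library", 12_000),
    Project("javalog",   "Java",   "library", 51_000),
    Project("cppweb",    "C++",    "framework", 2_000),
]

# Compare like to like: bucket by (language, category), rank within buckets.
buckets = defaultdict(list)
for p in projects:
    buckets[(p.language, p.category)].append(p)

for key, group in sorted(buckets.items()):
    ranked = sorted(group, key=lambda p: p.dependents, reverse=True)
    print(key, [p.name for p in ranked])
```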
Wonderful. We could probably take one more question if someone has a burning one. Otherwise, we will be around most of the week, so if you'd like to talk about this, please let us know. I highly recommend joining, or participating in, the Securing Critical Projects Working Group to talk about this. And I just want to thank everybody. I have one question. All right: when is the next meeting of the Securing Critical Projects Working Group? Excellent question. Yes. And there is a Slack as well. Join the Slack. Yes, good point.

I think the last working group meeting was the Friday just been. Is that correct? It was just last Thursday, yeah. Sorry, sorry, different time zone. Oh, it's Friday, right. So we currently meet every other week at 11 a.m. US Central time. And we are working on potentially making every other session more APAC-friendly for different time zones; I've seen other working groups do this too. But also, great point: we have the Slack as well for folks who might not be able to join the meetings, so participation via Slack is more than welcome. And I think there's a GitHub repo as well. Yes, good point. I've been committing to it, and if anyone wants to talk to me about that project, please, please do. Commit yourself. Yes. Awesome. I think that is a great way to end our panel. Seriously, thank you all so much for having us. Thank you. Thank you.