like that, what's the coverage of these, what are some things that we should add to that list. And finally, the last one, as you can see, is on data. And really, I think there are two parts of this. One of it is, with vulnerability data, what kind of analysis or what kind of information can we mine out of it? But also, the other question is, what is the missing data that would be critical for us to be able to do more analysis, and what are the fields that are missing. Discussion point, we will share this with the entire group and, you know, we will follow up. Just want to get a sense of who's interested in what topics. So, let's start with process. Who's interested in talking about the process of the lifecycle, group of people that are talking about different things. Usually, how this works is we split up into two groups and talk about it. Shripak, can you help with the facilitation? I think we're going to really just open. Yeah, so how do we define, what do we define vulnerability data to be? Anything in a CVE, is that the vulnerability data, everything that comes with it, the CVSS score, the description, what package is impacted, what version is impacted, and how do you fix it? CWE is there, so how do you want to see that? What is the vulnerability data? Let's start with that. Generically, what do we think vulnerability data is? Yeah. Just because it kind of aligned with the presentation I gave earlier: with vulnerability data at its core, we have a lot of derived data points that everybody cares about, and they're all very important and can be used for all sorts of analysis, but what has happened, at least in the recent five or ten years, is there's been a lot of effort to start providing all of these derived data points and very little effort on actually providing any foundational data about how those derived data points came to be.
There's very little ability for downstream consumers to validate that any of the derived data points are actually accurate. Everything is starting to be put into these black box efforts, where you don't have to worry about how we got the data. You don't have to worry about how we got there. It's fine, everything's great. It's definitely not wrong unless you prove us wrong, but here's this other stuff that you should really use. That's useful in the short term, but in the long term, it makes it very difficult to do any larger scale analytics on vulnerability information. Can you? This might be it. Waters, so I agree that, you know, as a former academic I would love to roll around in data all day, but what are some of the sources of the bad data? Is it just because PSIRTs are doing their job and don't want to fill out all the forms and the dropdowns, or is it that there are actually systemic biases coming in from the data generation? So why don't we have all of the fields that you've asked for? Well, I can't answer all of that. What I would say is, exactly, right? No, the point I'm making is, the people who are producing the data say: my downstream users care about actions that they need to take in the next 36 hours, so you fancy pants people in the US government that care about data, you're just gonna have to either write your own damn data or pay someone to do it, because we've got jobs. And so I guess what I'm trying to figure out is, is there a bias in how we're collecting the data that fundamentally undermines our ability to have community-level gleanings, or is it just that we haven't built the right tools so that it's easy for people who are doing their jobs to put the data in? Right, so what I would say is there are obviously going to be biases, naturally, from every organization that's providing information.
Everybody's driven to provide data for different reasons, usually money, but there are lots of other reasons, reputation, et cetera. So there's definitely a driving force to kind of just skip the foundational data and jump straight to the results. I wouldn't say that's wrong from an immediacy perspective, but as far as being able to make sure that they aren't missing important data points or things like that, I would say that's because there's a relative void in an expected approach that's necessary to get even what's universally considered some sort of baseline. If that wasn't the case, vulnerability reports would look more similar than they do today. And right now it's kind of all over the place. Okay, so Yana finished her talk, do you have something to say? I don't know, we can get to it, okay. So Emily has a question. She asks, what are the fields we want that allow us to better analyze the information presented? Could you repeat that again? So what are the fields that would allow us to better analyze the information presented? What kinds of things do we want out of the data? Right, so there's obviously the minimum that we require in terms of figuring out exactly what is affected and what versions are affected, and this needs to be done in a consistent, scalable, machine-readable way, right? But I think, because of the nature of open source, there's actually a missed opportunity here. If we can tie vulnerability management closer to open source development workflows, say for example, if every vulnerability had the commit that fixed it, or even better, if we could have some sort of automation to perform automated bisections to get the range from when we first introduced the vulnerability to when we fixed the vulnerability. I think there's a lot of metadata here that we can automate on.
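The automated bisection idea above is, at its core, a binary search over an ordered version history, assuming we have a predicate (say, a reproducer test run against each version) that tells us whether a version is vulnerable. A minimal sketch; the function names and the toy version list are hypothetical, and a real tool would bisect commits (as `git bisect` does) rather than releases:

```python
def bisect_first_vulnerable(versions, is_vulnerable):
    """Binary search for the first vulnerable version.

    `versions` is ordered oldest-to-newest; `is_vulnerable` is a
    predicate (e.g. running a reproducer against that version).
    Assumes the history flips once from clean to vulnerable.
    Returns the index of the first vulnerable version, or None.
    """
    lo, hi = 0, len(versions) - 1
    if not is_vulnerable(versions[hi]):
        return None  # never became vulnerable in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if is_vulnerable(versions[mid]):
            hi = mid  # vulnerability already present at mid
        else:
            lo = mid + 1  # introduced somewhere after mid
    return lo


# Toy history: the vulnerability was introduced in 1.3.0.
history = ["1.0.0", "1.1.0", "1.2.0", "1.3.0", "1.4.0"]
first = bisect_first_vulnerable(history, lambda v: v >= "1.3.0")
print(history[first])  # -> 1.3.0
```

The payoff is exactly the metadata the speaker describes: an introduced-in / fixed-in range per vulnerability, produced mechanically instead of hand-entered.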
So one possible example is, if we just naively match vulnerabilities to something in the SBOM, we may get a lot of false positives, right? This is the whole point of VEX. But I think that if we have, say, the patch, given the open nature of open source, we could potentially automate the process of figuring out, for a given vulnerability, what code path needs to be hit to actually trigger it, and that kind of enables us to have some automation around automatically excluding a lot of the false positives that vulnerability scanning gives us, right? So I think there are a lot of things we can do, because we have open source code that we can rely on and do more automation on, to help give better results for open source. Yeah, I just have one thought, right? So in most cases, the CVE fix is not part of one commit, there can be multiple commits that go into fixing some vulnerability, over iterations. So do you, yeah? Something I wanna add on that: many reporters have no incentive to give us good data. That needs to change somehow. I don't know how to change that. So, a thought on what matters when it comes to the data: am I affected? How bad is it? What can I do to fix it? That's kind of what comes to mind. So making it easy to identify what components are affected, if for whatever reason there's an exception to whether or not you're affected, what kind of vulnerability it is, having that be easy to understand. I spoke with Kate in the GitHub security lab, or the GitHub security advisory database, about this, like adding financial incentives around good data, like paying people bounties for writing up good reports and stuff like that. Like, thank you, you did a nice job, you included all of this information, here's a little bit of an incentive to do this again in the future. Yeah, the other thing that I've run into around issues is people don't like the answer of, this CVE exists and there is no patched version.
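The naive SBOM matching and VEX suppression being described can be sketched as below. The record shapes are invented for illustration and are far simpler than real SPDX/CycloneDX components or OpenVEX statements, which use package URLs and version ranges rather than flat lists:

```python
def scan(sbom, advisories, vex_statements):
    """Naively match SBOM components to advisories by package name
    and affected version, then drop any finding that a VEX statement
    marks as not_affected. All record shapes are illustrative."""
    findings = []
    for comp in sbom:
        for adv in advisories:
            if comp["name"] == adv["package"] and comp["version"] in adv["affected"]:
                findings.append((adv["id"], comp["name"]))
    suppressed = {
        (v["vuln_id"], v["product"])
        for v in vex_statements
        if v["status"] == "not_affected"
    }
    return [f for f in findings if f not in suppressed]


sbom = [{"name": "libfoo", "version": "1.2"}, {"name": "libbar", "version": "2.0"}]
advisories = [
    {"id": "CVE-2024-0001", "package": "libfoo", "affected": ["1.1", "1.2"]},
    {"id": "CVE-2024-0002", "package": "libbar", "affected": ["2.0"]},
]
# VEX: the vulnerable code in libbar is never reached in this product.
vex = [{"vuln_id": "CVE-2024-0002", "product": "libbar", "status": "not_affected"}]
print(scan(sbom, advisories, vex))  # -> [('CVE-2024-0001', 'libfoo')]
```

The point the speaker makes is that today the `not_affected` statements are written by hand; with patch and code-path data they could be derived automatically.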
Like, the industry kind of comes up with fake versions to say this is patched in a certain thing when it's actually not. I had an example of a vulnerability in Google Guava, where, you know, they deprecated this method in a certain version. And everybody said, oh, that meant it was fixed in this version. And I'm like, no, they just deprecated the method. They didn't actually remove it or fix the vulnerability. Yeah. A quick add-on to the incentive part too. I wonder if some part of that is providing incentives, but the other side of that is just making the process easier, so that there's less disincentive to go through the process. If it's super easy, I just put in some information, just a dropdown, done, from a maintainer perspective that's pretty doable. But if it's like, I don't know what the heck any of this means, it's a long process, that's when you start getting really junk data, just throwing in whatever. Sorry, briefly here, a possibly fundamental issue I've come across is, sure, you sit down and pick your fields and create a format and create tasks for the things you want. And I bet you we've all come up with pretty similar things. But, and you know, CVE is guilty of this, right? So there's a description field. Well, what is the machine-readable version of, I don't know, local privilege escalation? Encoding this stuff in machine-readable form, that's a whole natural language problem that is completely not solved. Have you guys gotten anywhere on that? All right, I couldn't come to the talk, but there's a real issue with, like, we can write prose that humans can act on as human experts and sort of understand. And it's a complicated thing to encode in the first place, fundamentally difficult. Otherwise, I think we would have made some progress by now on that, no offense to anyone's efforts to do that.
Yeah, I think we went into a different topic. We are finding the gaps, right? But let's say we have all the data, right? What are the analyses? What are the new things we can do with this stuff that we are not doing today? We have CVSS of course, that gives us a ranking and allows us to do that. Let's say all the information that you are talking about, we have that, we've seen the gaps, what new analysis can we do on that? Well, I think when it comes to what we can do new to help address this, I think Mark kind of touched on it, it was making it machine-readable data rather than having it be prose that an expert can go through and understand. This has to be automated, machines have to be able to understand it. And I think that's probably the next step. So from my experience at least, it seems like there is a general lack of knowledge about this kind of information for the people that want to give it. Maintainers have, I'd say at least in my community, more support than maybe researchers or reporters do. A lot of them are very well-intentioned and they just can't find the way to do it. They don't know who to contact, they don't know what information to give or how to give it in a useful manner. So I think if we, more overarchingly, give more support to these groups, still more support to maintainers, but definitely more support to security researchers that are finding this information, making it easier for them to share it and give it in a way that is usable and actionable, that would be really, really helpful. That's something we're passionate about and working on in OpenSSF as well, trying to give just more support for researchers from the start, earlier in the process. So I think one of the things that I was thinking about is kind of looking at the patch and building up a call graph or something like that, to be able to detect which of the vulnerable functions am I calling.
Do we think that this information is useful, and I guess, is it encodable at all? So the question is, can we encode the call graph and that kind of stuff into the IDs, in a way that we can do that? So I think that data is encodable. The security value, I don't know. If you're saying you're affected by this function, what is your action there? Do you still just update the package, or is there something else? As far as I'm concerned, that's an open question. I guess where that could come in is, am I affected as opposed to, yeah, right? So if you use this function and that's the vulnerable function. Sorry, and the function data there would have to be very pertinent. But getting that pertinence is somewhat hard. Yeah, so I think getting this call graph information is definitely very challenging to do perfectly. But I think if we can just get rid of, say, 50% of all false positives, that's a very good result already, if we can come up with an algorithm to do it. Sounds like templates of expected content to me, that can be easily convertible, convertible to a machine-readable format, perhaps doing this on the command line with the user. I don't know whether there was a question or not. Sorry, I guess maybe there's a question. We're kind of talking about two things: there are varying levels of vulnerability information. There's what I would consider the most basic amount of vulnerability information that you can share: the description, what the issue is, what you can do about it, where it's fixed, what's vulnerable. On top of that is a lot of other extraneous but very useful information, like what function is affected, and a lot of very useful data that is harder to get. I love that, I want that information, but we can't even get the basic information always right. I think we need to fix that first, personally.
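The call-graph idea being debated reduces to a reachability check: given the application's call graph and the vulnerable function named in an advisory, keep the finding only if that function is reachable from an entry point. A minimal sketch; the graph and the function names are invented for illustration, and extracting a sound call graph from real code is the hard part the speakers acknowledge:

```python
from collections import deque

def reachable(call_graph, entry, target):
    """BFS over a caller -> callees adjacency dict to decide whether
    `target` (the vulnerable function) is reachable from `entry`."""
    seen, queue = {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        if fn == target:
            return True
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical app: it depends on libfoo but never calls the
# vulnerable libfoo.parse_untrusted, so a naive scan's finding
# could be downgraded or excluded.
graph = {
    "main": ["app.handler"],
    "app.handler": ["libfoo.encode"],
    "libfoo.encode": [],
    "libfoo.parse_untrusted": [],  # present in the package, never called
}
print(reachable(graph, "main", "libfoo.parse_untrusted"))  # -> False
print(reachable(graph, "main", "libfoo.encode"))  # -> True
```

Even an over-approximate graph only errs toward keeping findings, which matches the "get rid of 50% of false positives" goal stated above.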
And I say this as somebody who has a product that supports giving vulnerable function information, so please still give me that when you have it, but I would also personally greatly appreciate better descriptions. Like, let's get the basics right first. I personally have experience with, as an example, JetBrains. When they issue CVEs for the vulnerabilities in their stuff, they are like one line and that is it. They don't include very much information, and they link to a private issue tracker that they've never made public. So I end up having to do my own disclosures of their stuff, as a researcher, with their CVE. Annoying, but yeah, I don't know how to fix that, or how to encourage the upstream, the maintainer that's asking for the CVE, to give more information. If we've reached the horror story part of the day, many, if not the majority of ICS vendors actually put their advisories behind their customer paywalls, which is very, very bad. Again, that's, yeah. Yeah, I think we'll move to the next topic now. I think we partly covered that, but yeah, gaps in the vulnerability identifiers, right? Specifically, oh, okay. Yeah, I mean, we know the vulnerability identifiers, you can just say how we are doing it today. So, gaps in identifying the vulnerabilities. So, identifying the vulnerabilities: usually either the researcher or the receiver has to know that a CVE needs to get assigned, right? That's the pretty normal way that disclosure occurs. Disclosure that actually causes positive impacts for the downstream users usually has a CVE number assigned and included. One of the issues that has occurred is, there are people that find vulnerabilities who don't know about the CVE system or don't know about the disclosure system.
And so I've found vulnerabilities in JavaScript libraries, for example, where some guy was just finding a bunch of vulnerabilities and opening issues, and nobody was getting a CVE, and I'm like, that's a lot of remote code execution right there. Other examples that I have run into a lot historically: I have run into cases where both Google and GitHub have had vulnerabilities disclosed to them in their bug bounty programs, for example, that impacted open source software, and people talked about it at Black Hat and DEF CON and were like, oh yeah, I made $10,000 off this vulnerability disclosure. And you go back and look at the actual vulnerabilities in the software that was used in the exploit chain, and nobody, neither the maintainer nor GitHub or Google, actually took that vulnerability and turned it around and said, hey, maintainer, you had a vulnerability that was the reason we had a $10,000 bug bounty payout. I'm like, what? So would you characterize it as a lack of education on the part of whoever is reporting, that they don't know the vulnerability disclosure process? Well, the reporter knows how to disclose to the company that's vulnerable. Like, I've talked to the maintainers, but there's no follow-up. I think it's a passing-the-buck sort of thing. Like, neither side is taking responsibility for actually getting the vulnerability fixed in the upstream component that was part of the exploit chain. Yeah. I think we responded to this, but I think you're kind of talking about the life cycle of the vulnerability, which is the other side. Well, no, it's identifying it, in so far as it actually gets assigned a CVE so that disclosure actually occurs. Yeah.
The other point that I want to make is, somebody did a great talk, I don't remember who, about the idea of scraping GitHub commits and finding vulnerabilities hidden in the rough. Like, this looks like a CVE fix, but it might not be, oh, hey, that actually is one. People are doing good work around that. I think Snyk does that too, where they have a feed of commits that look like they might fix a vulnerability, and maybe those should get a CVE number assigned. So more people doing that and finding those actual vulnerabilities in the fix feed is useful. I thought about this from the perspective of GitHub, where there could be a banner when you create a pull request, saying, hey, the commit that you just pushed looks like it might fix a vulnerability. Do you want to open a GHSA? That's a really easy step towards encouraging maintainers to do that. Any other gaps you see in vulnerability identifiers? So one thing I found is, we don't have a vulnerability identifier for when a piece of software goes out of support or end of life. Do you have that, do you need that? I have talked about this. I've brought this to the CVE board, and they have said that they do not want to issue a CVE number for a piece of software purely on the basis that it is no longer maintained. Sure, okay. So no, yeah, sorry, I misunderstood something you just discussed earlier. So the fact that something is unmaintained is or is not a vulnerability, or it doesn't get a CVE, is one question. That's what you just said, right? So there could be a CVE that basically says, this package is not maintained, but it's not something that the CVE board wants to issue a CVE number for. Right, so knowing your software is no longer maintained, very important, no one has to argue with that, right?
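The commit-scraping idea above is essentially a classifier over commit data. A toy keyword heuristic shows the shape of it; the keyword list is invented for illustration, and the real systems mentioned use much richer signals, including the diff itself, changed files, and linked issues:

```python
import re

# Invented keyword patterns; real detectors use far more signal
# than the commit message alone.
FIX_HINTS = re.compile(
    r"\b(overflow|use[- ]after[- ]free|xss|sql injection|rce|"
    r"sanitiz\w+|cve-\d{4}-\d{4,})\b",
    re.IGNORECASE,
)

def looks_like_security_fix(commit_message):
    """Flag commits whose message suggests a (possibly silent) security fix."""
    return bool(FIX_HINTS.search(commit_message))


commits = [
    "Fix buffer overflow in header parser",
    "Bump version to 2.3.1",
    "Sanitize user-supplied filenames",
]
flagged = [m for m in commits if looks_like_security_fix(m)]
print(flagged)
```

A hook like the GitHub banner being proposed would run a detector of this sort at push time and prompt the maintainer to open an advisory.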
Knowing you're out of security support, that's an important security state to have in mind, and, okay, does everything get a CVE then, because everything will be out of date eventually? Yeah. I mean, here's the reason that I think CVE is the right thing: it's a tool that works to get people to update, right? It's a functional flag that already tells you this version is not safe in some way, and people would see it and be like, okay, now we need to take some action on that, right? It would feed into the existing pipeline that all these other tools already tie into. And there's a similar discussion around malware or malicious software or injected software. It's a slight abuse of the V in CVE, but if it's working and it gets people to pay attention, maybe the argument is in favor. I just wanted to say, and I misunderstood it at first: you can issue a CVE, so, you make software, I report to you, you're like, I don't support that software, that can still get a CVE and it gets flagged as no longer supported. Got it. The difference is, whether "this thing is just not maintained anymore, don't use it" in itself gets an ID, is the question you're asking. And the board said, no, we will not do that. I think that is an important thing that everyone I've talked to has flagged at some point or another. I don't think the solution is to shoehorn it into the vulnerability ecosystem, for a number of reasons. One of them is the point you've already heard, which is that vulnerability management is now so overloaded that you need a special, separate product to manage your vulnerability management tools. And there are going to be a lot of times where I'm going to make an informed decision to use something anyway.
What we need to do is help the maturing organizations realize what's out of date, and sort of help them prioritize that, and there's gonna be a lot of metadata on it. So I think this is one of the places where GSD, the Global Security Database, potentially steps in, creating a new type of identifier that's specifically for, say, the maintenance status of a package. So rather than trying to shoehorn it into the vulnerability space, have a new space that's specifically for maintenance status. And if you want to consume it, you can. If you don't care, ignore it. Yeah, I mean, the point is essentially to have some way to notify the user that they're using a particular version which is not supported. I mean, it's not the CVE, but some way to notify it. Just to point out, some databases already include this, the rest of the database, the number of advisories tied to the product, something to integrate. Yeah, I mean, we need it across all the ecosystems, not just one. A consistent way for people to look up and build automations around. So this is the question of what is deprecated. If you have a piece of stable software that hasn't had an update in a year, is that deprecated? Or does a maintainer have to opt into deprecation? This is hard, and so I don't have an answer. End of life and end of support are two very different things, and your comment on the open source world is incredibly valid as well. Use case: I found a vulnerability in a piece of Java software where the maintainers weren't responding at all, and eventually I had to get Snyk to issue a CVE for me. I mean, the maintainers are not there to say this is end of life, it's not maintained anymore, but it's very clear that I can't get in touch with any of the maintainers.
Part of our work through the OpenSSF is that we can encourage better project and maintainer behaviors, illustrating that it's a best practice to state what your project's life cycle is, how you will or will not respond based off of the age or state of the software. So we can do that through the best practices working groups, through the education side, and the new open source CERT efforts, to get this out in the minds of the maintainers, to help with part of the problem. It doesn't have an identifier, so it doesn't fix your identifier problem, but we can start to teach better behavior and try to encourage that in the specific community. Do you see any other gaps? Just like we say end of support and end of life, do you see any other characteristics of software that we need to identify? And you might never get that from a maintainer, but we could hold our commercial suppliers accountable for having some type of end of life statement. I made a passing joke, but it actually could work, which is, for projects, we have a project-based dead man's switch: at a certain point, if there's no live ping, you move it to some sort of other status. And actually we've had that over the last year with DNS: when your domain expires, it's a pretty good sign that your project is not maintained. And one quick addition to that is, maybe rather than explicitly saying this is when it is considered dead, this is when it's no longer maintained, you can surface the data, this is the last time we saw a response from this project, and then let the consumers of the data decide how long is too long. I don't know if that'd be better. So I wanna roll back a little bit. One gap that I see a lot is in severity rankings.
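The dead man's switch and "last time we saw a response" ideas both boil down to publishing a last-activity timestamp and letting consumers apply their own staleness threshold. A minimal sketch, with invented record shapes and an arbitrary one-year default:

```python
from datetime import date, timedelta

def maintenance_status(last_activity, today, stale_after_days=365):
    """Classify a project from its last observed activity (commit,
    release, or maintainer response). The one-year default threshold
    is arbitrary; per the discussion, consumers would pick their own."""
    if today - last_activity > timedelta(days=stale_after_days):
        return "possibly-unmaintained"
    return "active"


# Hypothetical observations; a real system would scrape repos,
# release feeds, and domain expiry (the DNS signal mentioned above).
projects = {
    "libfoo": date(2021, 3, 1),
    "libbar": date(2023, 9, 10),
}
today = date(2023, 10, 1)
for name, last_seen in projects.items():
    print(name, maintenance_status(last_seen, today))
```

Publishing only the raw `last_activity` date, and leaving the threshold to the consumer, matches the suggestion that the database should not itself declare a project dead.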
The CVSS system is not well suited for libraries, which get embedded in many places, and CVSS guidance says to assume a worst case scenario, which is hard when a library can be in a child's toy or a pacemaker. I would love to see a new severity ranking system designed for libraries first. Yeah, has anyone thought about what the new ranking or prioritization system could be? To be clear, I don't necessarily mean a new one, but even a parallel one, but something probably from MITRE. Yeah, so, CVSS is maintained by FIRST. There's a special interest group, a SIG, at FIRST that maintains it, and you do not have to be a member of FIRST to be part of the CVSS SIG, and some folks here, Christopher at least, I think, are still members of the SIG. Yes, and I actually think I would get in trouble if I didn't clarify this. The CVSS guidance, at least v3.1, does not say to use the worst case scenario. It actually explicitly says to score based off the most reasonably achievable impact. So it's been a point of debate in the SIG itself too. Is it worth having explicitly an unknown? No? Unknowns make everything worse. Again, not to be a downer too much. Library, microprocessor, these root components, there may not be a way to say it. So it may be that the way I use libpng, right, it's critical, and the way you use it is not. And the severity, prioritization, risk analysis may have to occur closer to the end user or the end product and service. And the library is like, there's a thing in the library, but what does that mean to you? Go a couple hops downstream. I'm not sure there's a way to do it at the library level. But I could be wrong, of course. It's always good to flag when someone else is tackling a similar problem.
One of the things that the SBOM community is discovering, and in fact the SBOM Everywhere work stream in OpenSSF is trying to tackle, is there is a difference between thinking about dependency graphs of a repo and dependency graphs of a build. And so I'm not sure they're any closer to the solution, but as we sort of think through it, that might be a good thing to track, which is repo versus build. And going back one step, one of the things that we were talking about is that it's very dependent on how you're using the library. Something that does try to address this, and I don't think is really used at all, is the environmental metrics within CVSS, which are exactly for that. But the issue is that's not visible to anybody other than yourself. Most organizations, or anybody that is using the environmental metrics, do that internally. They're not necessarily sharing that. Also, why would they, in some cases? So I think there is some support for that there. It's just not either widely used or visible in any way. All right, we are coming close on time. So I think the last thing I wanna get folks to do is, if you can go into the breakout session, there's a bunch of HackMDs, go to the one on the data analysis, since we talked about that the most, and put in your name, so that we can fill out the discussion and bring the right folks back in. Awesome. Yes. You should be able to, if you just hit the middle button on the top left. Yeah, yeah, yeah. So they can do the markdown. Yeah. Yeah. Yeah, so I think we're close for now. I guess if you wanna stick around and chat. Awesome. Thanks so much everyone. Thanks for your time.