Good morning, everyone. Thank you for coming to the session. Today I'll be talking about a project that we are working on with OpenSSF, specifically with the Alpha-Omega project inside OpenSSF. I'll cover what we are doing there, the outcomes of that project, and especially the insights coming out of it that can guide security practitioners on how to serve open-source maintainers without annoying them. There are two approaches I'll talk about, and I'll explain specifically what we are doing and how it connects to the overall theme of this conference.

A little background about myself. I'm Munawar Hafiz, the CEO of a Bay Area static analysis tool company called OpenRefactory. I have been working on automatic bug detection and fixing for the past 17 years. My PhD work at the University of Illinois at Urbana-Champaign was considered one of the pioneering works on automatic bug fixing. It asked the question: why stop at detecting bugs? Why can't we have tools that not only detect bugs but also fix them automatically? I've worked in academia, and I've worked in industry at a top bug detection company. Now we are continuing that line of work at OpenRefactory, our startup, which grew out of my academic research. This work is done by the OpenRefactory security team, and, as I mentioned, it has been supported by an award from the Alpha-Omega project, which is associated with the OpenSSF. Thanks a lot to them for supporting the work.

I probably don't need to show this slide; it's just a preamble, and everybody knows why we are here. Under the hood of all the software we use, it's all open source. Some recent statistics: 70 to 90 percent of the code in web and cloud applications comes from open-source software, and 91 percent of commercial applications contain outdated or abandoned open-source components. That's the problem; that's what this supply chain security conference is about. Everybody knows about the security failures out there; they are increasing and they are costly.

So let's get to the meat of it. The work going on around securing open-source software can be looked at from two perspectives. One is to fix the software itself: look at the root cause. It's the software that is vulnerable, so how can we fix the bugs in the software and work with the maintainers so that those vulnerabilities are not there in the first place? The second approach is from the point of view of downstream consumers: you are consuming a lot of these open-source components, so how do you keep track of your supply chain and make sure you are always using secure components? This work focuses on approach one, fighting the root cause: fix the vulnerabilities and bugs in the software itself. But there is so much software out there, how do you do that? What we are doing is an ambitious project: we are trying to fix bugs at scale. That is basically our mission statement.
At OpenRefactory, we analyze the source code of the top 10,000 Java, Python, and Go projects. We use our Intelligent Code Repair (ICR) tool, which is our proprietary tool, together with other static analysis tools that are available, and we report many different kinds of bugs to the maintainers and work with them to fix those bugs. That's essentially what we're doing; it's a pretty simple mission statement.

When you're talking about many software projects and working with the maintainers, there are again two different paths you can take. One is a depth-first approach: you detect and fix a single vulnerability across all the projects that can harbor that particular vulnerability. There is value in that; as we heard in talks yesterday and earlier today, many vulnerabilities are identified and fixed but the fixes are not adopted by all the projects, because there are forks and it's simply hard to keep up with all of them. The depth-first approach focuses on detecting and fixing that single vulnerability across all projects, to get rid of it once and for all. The second approach is more comprehensive: look at the different kinds of bugs that can occur across these projects and fix those. In this work we are following approach two, looking at different kinds of bugs in different kinds of projects and fixing them in the software.

But let's talk about the depth-first approach a little. There are two prior efforts that have grown into prominence. One was done by my good friend Jonathan Leitschuh. He essentially invented the idea of running these campaigns: pick one kind of bug, do a GitHub search to find all the projects that have it, then create pull requests across the board to all of those projects and see if you can fix the problem. Theoretically that looks very good. More recently, in 2022, researchers from Trellix reported over 61,000 GitHub patches in one go, to fix a vulnerability that had been around for 15 years: a Python tarfile vulnerability where a method is used in an improper manner. The vulnerability had been fixed, but it was still out there in at least 61,000 different projects, so they generated pull requests in an attempt to fix all of them.

That all sounds good, but bugs are nuanced. It's definitely one way of attacking the problem, but if you want to generate a single fix for all the occurrences, you run into two problems. The first is that you limit yourself to shallow bugs only, because the more difficult a bug gets, the harder it is to produce one fix that applies to all projects. Here is Jonathan Leitschuh's Zip Slip bug fix campaign, for example: there are 101 pull requests still open from something submitted two years ago, and about 91 have been closed, but that count also includes projects that are no longer active, so it's not as if all 91 were actually fixed. The point is, you have generated the pull requests, but how many of those pull requests are accepted?
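To make the Trellix example concrete: the Python tarfile issue they targeted (commonly tracked as CVE-2007-4559) is that extractall() will happily follow member names like "../../..." out of the destination directory. Below is a minimal sketch of the kind of guard such patches add; the helper name and the exact check are illustrative only, not the actual patch any campaign shipped.

```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract a tar archive, refusing any member that would escape `dest`."""
    dest = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest, member.name))
        # A member named "../../etc/cron.d/evil" resolves outside `dest`.
        if os.path.commonpath([dest, target]) != dest:
            raise ValueError(f"blocked path traversal attempt: {member.name}")
    tar.extractall(dest)

# Vulnerable pattern the campaign searched for:
#   with tarfile.open(archive_path) as tar:
#       tar.extractall(some_dir)        # trusts member names blindly
#
# Safer call:
#   with tarfile.open(archive_path) as tar:
#       safe_extract(tar, some_dir)
#
# On Python 3.12+, tarfile's built-in extraction filters,
# e.g. tar.extractall(some_dir, filter="data"), address this directly.
```

In practice, whether a project wants a helper like this, an inline check, or an upgraded library call differs from codebase to codebase, and that is exactly where a one-size-fits-all pull request starts to break down.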
One of the problems with that campaign, and with the Trellix approach as well, was that some of the generated fixes would break the product itself, and that annoys the maintainers a whole lot; it just creates chaos. That's the second point: if you don't limit yourself to shallow bugs and try to do something more aggressive, you overreach and make a lot of mistakes, in detecting the bugs, in deciding whether something is a true positive, and in creating a fix that is actually correct. That's a lot of work, and unfortunately doing it in a fully automated manner is not possible yet.

There's a term that has been in the parlance for some time: drive-by pull requests. A drive-by pull request means you are not a maintainer of the open-source project; you are a security practitioner who has somehow found a bug, you come to the project, report the bug, and move on. Linus Torvalds mentioned this morning that around 50 percent of pull requests in the Linux kernel come from contributors who create that one fix and move on. So it has been happening for ages, but drive-by pull requests are especially annoying for maintainers, because they are already under a lot of pressure, and now security practitioners show up with pull requests, some of which may work and some of which may not. It creates a lot of annoyance in the maintainer community.

For example, here are some reactions, also from the Zip Slip campaign I mentioned. One is: this is only in the test code, so it doesn't matter; don't bother me with this. Another is: this is probably coming from a bot, and I don't like getting things from a bot; if you act like a bot, if you sound like a bot, don't come to me. That's a very common concern. In this case, the pull request was actually filed under a generic title, and the maintainers themselves renamed it to what you see on the slide. So you can see there's a lot of annoyance.

Before the start of this project, we also did our own proof of concept. We looked at a small number of projects, about 50-something, and submitted bugs. We did it in a semi-automated manner: we used the tools, triaged manually, and used an automated way of submitting the reports, but each submission was triggered by a human being. We got a similar mix of love and hate, however you want to look at it, in the responses. From that, we did follow-up interviews, going to the maintainers who were not happy, explaining the context, and asking: what is the problem, and what would make you happy? How can we work with you better, so that you would appreciate and accept our work more?

Four key things came out of those conversations. First, shallow bugs: if it's a very simple bug, don't bother me with it; there's plenty of more important stuff to do. Second, suggested fixes that break the code or introduce other bugs; that's also very problematic.
Third, because it's a drive-by PR, the reporter is absent afterwards and doesn't want to follow the process or the norms of the project. Each project has its own norms, and they are very nuanced, so a reporter who refuses to follow them is a problem. The fourth and final one is that the reporter appears to be a bot. That is the single most triggering response we got; it showed up in our proof-of-concept study and in the previous campaigns as well.

So we planned our approach around addressing those four issues. What we are doing is triaging bug reports manually, from Intelligent Code Repair and from other static analysis tools, and concentrating only on high and medium severity bugs. We ignore all the low-severity findings the tools produce; people simply don't care about them, so we don't bother with those.

We adopt a semi-automated approach. There is some automation, but it is about creating the pull requests programmatically. It's not robotic in the sense that whenever the tool reports a bug we automatically forward it to the maintainers; we don't do that. A human drives the bug submission process, but submission takes a lot of time, so there is value in a triage portal that lets you manage the large number of findings coming from the tools. Bug reporting is done following each project's requirements: the security team members work with the maintainers and follow the project's norms, which is very important for reducing friction in the relationship. Bugs are triaged for correctness before being reported, and sometimes proof-of-concept exploit code is created to demonstrate the impact of the bug. Most importantly, bugs are reported from handles that belong to human beings, so the report doesn't look like it came from a bot, and the context is explained: we are doing this because of this overall project. That context is very, very important, so that it's clear to the maintainers that this is not just another drive-by PR campaign.

But the fundamental problem is still there: how do you scale to the number of projects out there? We said we are looking at the top 10,000 Java, Python, and Go projects; that's 30,000 projects in this work. If you assume something like a thousand findings per project from your tools, and many static analysis tools report a ton of findings, many of them false positives, you are looking at around 30 million reports. Who is going to triage that? How can we scale to that level? That's where our tool, ICR, helps. With ICR, or Intelligent Code Repair, we find bugs that other tools miss, and we do it with a dramatically low false positive rate.

Here's some data comparing it with other tools. The first project is from Red Hat; it's called Apicurio Registry. It's a Java project, about 150,000 lines of code. ICR found around 100 bugs in it, with about 10 false positives; most of the findings were true positives.
If you whittle that down to only the high and medium severity bugs, you're probably looking at about 10 bugs, which is manageable. Compare that with SonarCloud: it identifies about 1,200 findings, and even restricted to high-severity issues it still flags 300 to 400 of them, and when you look at them, many are false positives. That's where the precision of ICR comes in. Here's another project: these are bugs found in Django, obviously a Python project, comparing ICR with Bandit. Bandit, the open-source tool, actually doesn't find enough in this case. So you have to cater to both sides: find critical bugs that other tools miss, and at the same time do it with a very low false positive rate.

We have to focus on bugs that matter. We only file a report if the bug is critical, and if we can avert it, that's a major win. So we look at the more important bug categories: injection attacks, weak cryptography, sensitive data leakage, those kinds of areas, since the high-impact security bugs are mostly around injection. We also look at reliability issues like null dereferences and concurrency problems in the code.

For example, here's a cross-site scripting bug that was averted in one of the Red Hat projects. In this case we reported the bug privately through email, because that was the norm for that project. A response was being built from user input, leading to a reflected cross-site scripting attack, so the fix changed how the response was constructed. We didn't create that fix; we worked with the maintainers, reported it privately, and the maintainers came up with the fix themselves. Here's another bug we reported, which is actually still open: a data race in the Kubernetes code. Again, we did not create a fix in this case; it's a very complex bug, but we identified and reported it, and it is still open and being fixed at this point. Here's another one, an improper method call, which has already been merged: it's about using mktemp, which is an unsafe function, and switching to a safe alternative (I'll show a rough sketch of that class of fix a bit later). In that case we did create a pull request, because the fix was easy; it was also generated by ICR, and we got a pretty fast response. Either way, a real security issue was averted.

One of the things that comes up here is following the process. Different projects have their own ways of reporting vulnerabilities, so how do we report these vulnerabilities responsibly? This is a draft proposal, and obviously it's not final. On the final slide, which you'll have access to, there's a reference to the Git repo that contains it; I don't mean for you to read it off the slide. It's a draft proposal that looks at the different ways a vulnerability can be reported and which path to take under which circumstances. Some of the different ways are enumerated here.
There's GitHub private vulnerability reporting, which was recently introduced by GitHub and is now being heavily promoted. You can also open a pull request, but that discloses the issue openly, so that's not necessarily a good thing. You can create an issue, or file a GitHub advisory. You can email the maintainers; some projects have a protocol just for that. Others use separate portals to manage the bugs. But the point is that you work with the maintainers; that's the key to doing this.

Here's the semi-automated triage portal that we maintain. The part we automated is, for example, the boring part of creating a pull request: changing the commit message, changing the body, all the small steps you have to go through to submit a pull request. Automating those expedites our work a little. We only automate the subset of the protocol that can be automated easily, and our triage portal currently handles those aspects of sending out the bugs. Again, I don't want to read the slide.

Communication is the key. You cannot only report bugs; you have to convince people to fix them. For example, here's a communication chain. We identified an improper method call: Python's NotImplementedError was being used in a way that is not the right way of doing things. We reported it and got the usual response: is this an AI-generated bug or not? We explained our context to them again, even though it was already in the report; we wrote another explanation. Then they were happy with it, and it was fine. The original fix we proposed needed some changes, because they follow other norms in their project, so we worked with them on that, and eventually it got merged. That's the process: each bug has its own story, and you have to follow that process. It's tedious, manual and semi-automated work, but you are doing a lot of good by averting these issues in the first place.

Sometimes you do an explicit demonstration. I can't walk through the critical exploits here; many of the bugs we report, the cross-site scripting issues and the other more important ones, go through private channels. But here is a short proof of concept for a very simple problem. We reported a bug where a Python export list, instead of listing a name as a string, referenced the function itself, which causes a crash. It's very easy to demonstrate: we created a small, three-line demonstration of what happens (a reconstructed version is sketched below). It's not exploit code by any means, but it makes the problem vivid to the maintainer, and they accepted it, fixed it, and moved on. It's pretty straightforward in those cases. There are many more critical examples I could show, but I'm not showing them for confidentiality reasons. We have created exploits for deserialization, cross-site scripting, cross-site request forgery, log injection, and so on, shared them with the maintainers, and then worked with them to get those bugs fixed.
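The short demonstration I just described looked roughly like this. This is a reconstruction, not the actual project code; the module and function names are invented, but the shape of the bug, a function object placed in an export list where a string is required, is the same.

```python
# buggy_module.py -- hypothetical reconstruction of the reported pattern.
def helper():
    return 42

# Bug: __all__ must contain strings naming the public symbols.
# Listing the function object itself breaks wildcard imports.
__all__ = [helper]          # should be: __all__ = ["helper"]


# demo.py -- the three-line demonstration sent to the maintainer.
# Any consumer doing a wildcard import crashes immediately:
#
#   from buggy_module import *
#   # raises TypeError, because items in __all__ must be strings
```

A snippet this small is often all it takes: the maintainer can run it, see the failure, and merge the one-character fix.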
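And since I referred to it earlier: the improper-method-call fix that was merged quickly involved replacing an unsafe mktemp call. The snippet below is a generic sketch of that class of change, not the project's actual patch; the function names and surrounding code are made up for illustration.

```python
import os
import tempfile

# Unsafe pattern: tempfile.mktemp() only returns a file name. Between
# choosing the name and opening the file, another process can create or
# symlink that path, so the program may write somewhere it doesn't control.
def write_report_unsafe(data: str) -> str:
    path = tempfile.mktemp(suffix=".txt")
    with open(path, "w") as f:          # race window between mktemp() and open()
        f.write(data)
    return path

# Safer alternative: tempfile.mkstemp() atomically creates and opens the
# file with restrictive permissions and hands back the open descriptor.
def write_report_safe(data: str) -> str:
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(data)
    return path
```

Fixes of this mechanical shape are exactly the ones a tool can synthesize reliably, which is part of why that particular pull request got such a fast response.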
So what's the current status of the work? The results of all the scans we are doing are publicly available; we use a Google sheet, and the link is on the slide. Here's the general idea of the report and the key metrics we track: the number of project scans we have done; how many of the scanned projects had no bugs; how many bugs we reported in total; how many of those were security and reliability bugs, which are the more prominent ones; in how many cases we were able to automatically create a fix; how many of the reported bugs were accepted; and how many of the accepted bugs were security and reliability bugs. Those are the key metrics for this project.

In the first four months of this work, we have looked at over a thousand repositories. In about 900 of them we didn't find anything, so at least they have been scrubbed and are clean. That's also a very important message: if you are consuming those projects, you can just come and check. That's really the use case: you come here, look at the data (and it's not a portal, let's not overcomplicate it, it's just a Google sheet) and see: has this project been scanned? Has it been cleared? Okay, then I can go ahead and use it.

We identified and reported about 168 bugs in those four months. About 80 of them are security bugs. For about 140 of the 168 we were also able to synthesize a fix, generated by our tool, which expedited the process. In 22 cases we created exploits, as I mentioned, to make the communication better, and that helped with adoption. About 45 percent of the bugs we reported have been merged, and about 30 percent of the security bugs have been merged. Security bugs typically take longer to fix, so this is a rolling process; it's not that the rest have been rejected. Only nine of the reported bugs, about 5 percent, were not accepted, and that's fine: for example, we reported a log injection where the attack surface wasn't actually there, and in another case the affected code was no longer active. Those are things that somebody outside the project cannot always know. In those cases we communicate with the maintainers, figure out the best course of action, and move on.

So what are the key takeaways? Open-source maintainers need a lot of help from security practitioners, but security practitioners have to communicate with the maintainers in a meaningful way. Reporting and then not following up doesn't work; communication is the key. That's the most important thing we have found. With the work OpenRefactory has done, we have demonstrated a model of engagement that shows promise. I wouldn't say it's the way everybody should go, but it's promising: close to half of the bugs we report get accepted, which is pretty good. Again, the results are available at the link shown. I'd like to thank our sponsors, Alpha-Omega and the OpenSSF, for supporting this work, and I'm happy to take questions. Thank you. Yes, Kate. Right now we are not looking at C; we started with Java, Python, and Go.
We obviously want to expand, specifically to C/C++ and JavaScript, so that we cover the most important of the top five languages, but that's not in the scope of the project yet. It's definitely where we're going. Yes.

That's a good question. And I should have repeated the previous question, sorry about that: it was about whether we support C, and I answered that. This one is about false positives in security tools. The existing state of static analysis tools is, frankly, not great. These tools are known to operate at a 70 to 90 percent false positive rate, so as many as nine out of every ten findings a tool reports can be false positives. There's a lot to be done in the static analysis tool landscape in the first place, and I'm happy to talk about that, because it's something I've been working on and am very passionate about. For example, many of the bugs these tools find are in test code; it's very easy to filter out test code and not report bugs there. If a hundred of those 1,200 findings some tool reported are in test code, that's just a waste of time, and it would be easy for a tool to filter them out, except they don't. That's just where the state of the art is at this point. Yes.

So the question is, and correct me if I'm not phrasing it correctly: what can we do so that projects and maintainers buy in to this kind of reporting and consume it? That's definitely the vision of where we want to be. We're not there yet, but maybe in the next year or two. The whole point is trust: how do we create enough trust that the pull requests being generated aren't seen as just a company's PR campaign, or a drive-by from somebody who doesn't care? That empathy needs to be there; in the end, security is a human problem, and this work is about that. As the tools become more mature and some trust is instilled in the effort, we expect a better future where people can actually subscribe to this automatically. But we are not there yet. Yes, Adam.

The question was how we scope the set of projects, and whether we filter based on other supply chain results, like Dependabot. Right now we are using a list created by OpenSSF of the top 10,000 GitHub projects across the different languages; our good friend Caleb Brown from Google works on that. It's not dynamic yet, so we are working from a snapshot in time: the top 10,000 projects identified at a certain point, which we use as the starting point for the scans. For each bug we identify, we do a quick check, which could obviously be improved, for whether something similar has already been reported; for example, when we report a cross-site scripting bug, we look for cross-site scripting in open pull requests to see whether there's overlap. But in many cases we are actually finding new bugs, not found by any other tools, which is also very interesting. So the short answer is that we don't use Dependabot or other tools' data yet, but that could be something we use to improve the bug reporting. Yes. It's more of a comment, but let me rephrase it.
I think the point being made is: can we use the depth-first approach I mentioned for some of the bugs, and use bulk PRs in those cases to fix them? Yes, we can. Again, Jonathan Leitschuh, who did the initial work, is a very good friend of mine, and we have been working hand in hand on these particular issues, so that can definitely be done. The other point in the comment, sorry, that part escaped me, but we can definitely have a conversation about it after this. Thank you. Okay, thanks a lot for attending this talk. Thank you.