Hello, and welcome to Better, Faster, Stronger. If you are joining for Open Source Summit Japan, you're here to learn about how the global acceleration of open source development is changing ecosystems for good. I just took a role last week as Director of Open Source AI DevSecOps at EscherCloud, but most of what I'm presenting today comes from my developer relations work at Sonatype, focusing on the software supply chain and the security that we can put around it. So today we're gonna cover a couple of sections. I'm gonna go over very lightly what open source looks like by the numbers in 2022, how vulnerabilities are entering your supply chain with the top four attack types that we see, and how versioning and OSS herd behavior across the ecosystem can affect best practices. Then I'm really gonna go into detail about how we can use math to predict the least vulnerable open source packages, which I think is really, really fascinating. Then we're gonna talk about a couple of the best tools that you can use for good and secure open source ingestion, and a little bit more about community approaches to ecosystem hardening that we're taking over the next few years. Now, I care a lot about this because I've been in open source for over 10 years, but I've primarily worked on solving hard problems around pager duty, meaning I care a lot about chaos engineering, site reliability engineering and cybersecurity mostly, but I come from a space of working on really hard problems where human beings have to use technology to make decisions in real time. For example, my first job was working on Boeing cockpits, making sure the information was provided in real time to keep them in the sky. Now, I think that's a really nice metaphor to bring over to these pager duty activities as we're increasingly looking at monitoring and control. Now, I joined Sonatype because I think what they do is really fascinating.
So instead of taking the traditional approach of having your cybersecurity teams located just internally to organizations, really downstream, they have put an ecosystem of protection around the repositories that we pull our packages from. They were the original maintainers of Maven Central, where you'll pull most of your Java open source packages from, and have now put an immune system around PyPI, which is where you'll pull most of your open source Python packages. What this means is that we do security monitoring and vulnerability scanning at the repository level, removing the vulnerable packages before you can ever install them into your source code, saving hundreds of thousands of human hours of remediation. So if you're doing modern best practice for cloud engineering, you're gonna be pulling in open source packages and components, pulling them together into source code with some of your own internal logic. You'll be containerizing them, and then you will be doing best practices with infrastructure as code in order to allow these to scale. Now, a note here is that at every stage you can have vulnerabilities. They can come in from your open source projects, they can come from your own source code, and even containers themselves can be targets for vulnerabilities. So we have to pay attention to all levels. Now, 90% of modern application components are open source. This does not mean that nine out of 10 organizations lean on open source. It actually means that about 90% of the lines of code that they're utilizing come from open source projects. Another way to think of this is that nine out of the 10 lines of code that you use in your running application were developed and are still maintained by developers who are not on your payroll. So you have to develop a very different way of thinking about understanding, monitoring, communicating and updating those lines of code to make sure that they are not vulnerable.
And we've seen a 742% increase in software supply chain attacks, meaning that malicious actors are now going for the front of the supply chain rather than organizations themselves, because they can capture them just the same with injection, and 96% of the attacks that we're seeing inside of this domain are avoidable. So as we get to about the middle of this talk, I'm gonna tell you how we're able to predict with math when and how to avoid open source packages that are likely to have vulnerabilities, and which ones are likely to give you a good and stable ingestion over time. Now, if you'd like to look into the full open source supply chain report, there's a link at the end of this talk. It goes into some details across ecosystems. There's just a couple of quick notes here. We're still seeing an increase across almost all of the ecosystems, but we've seen a slight decrease in overall contribution from 2020 to 2022. Now, there are many factors for this, maybe this is post-pandemic development reality, but this is something that I would want us to pay attention to over time: is the ratio of contributions to an overall ecosystem correlated with the total contribution of vulnerabilities, or are they anti-correlated? Basically, what this means is: are attackers going after ecosystems that seem to be growing, or those that seem to be stagnating because they're easier to inject into? So with this 742% increase in supply chain attacks, what were the most common ones? Now, most of these are similar to what we saw in previous years, which just demonstrates that this is growing exponentially and probably will continue to. One of the most common ones is typosquatting. This is very simple. This is when you're trying to install something like matplotlib and you make a simple typo, say maybe two P's or two L's. You are then not installing the original package, which is still in itself secure.
You're installing a completely different set of code from the repository, in this case often a crypto mining application, for example. You must be very careful with these, and this is an argument against manually writing your Helm charts, et cetera. You really need to be careful with these. Now, dependency namespace confusion is also really interesting. This is when attackers find and determine your innersource namespaces, if they can, and publish an open source package which is identical to them. Now, if you're using something like Dependabot, this is gonna be a very easy way for them to use automation to inject malicious code and put it right on top of what your inner code should be. Namespace confusion is challenging, and it has some really serious effects at the community level or the ecosystem level. We saw an example in February of 2021 of an ethical hacker going and demonstrating this namespace confusion, but in their campaign for awareness they actually just created a level of awareness for perhaps the worst crowd, and they encouraged over 300 malicious copycat attacks over those namespaces in the following months, which were very, very challenging in the npm ecosystem and others. So it's something that, yes, we're developing an awareness of within the security world, but malicious actors are also developing an awareness of this being a great vector by which to inject. Now, malicious code injections are also a concern across ecosystems. We're finding better ways to use automation, and things like Sigstore should be in your best practices: make sure that you have a signature signing method to know who is contributing, who is maintaining, and absolutely who's pushing the merge button. So malicious injection is a very simple practice of using effectively what is called the halo effect.
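Staying on typosquatting for a second: the defensive check is mechanical enough to automate. Here's a minimal sketch using fuzzy name matching from the standard library; the allowlist and similarity cutoff are illustrative assumptions, not a real repository manager's policy.

```python
import difflib

# Hypothetical allowlist of packages your team has vetted; in practice this
# would come from an internal repository manager, not a hard-coded set.
KNOWN_GOOD = {"matplotlib", "numpy", "requests", "pandas"}

def check_for_typosquat(requested, cutoff=0.85):
    """Warn when a requested package name is suspiciously close to,
    but not exactly, a vetted package name."""
    if requested in KNOWN_GOOD:
        return None  # exact match: fine
    close = difflib.get_close_matches(requested, KNOWN_GOOD, n=1, cutoff=cutoff)
    if close:
        return f"'{requested}' looks like a typo of '{close[0]}', possible typosquat"
    return None

print(check_for_typosquat("matplotllib"))  # the two-L typo from the example
```

Now, back to that halo effect.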
Do you have a known maintainer or a known contributor who consistently gives good contributions, so that you are less likely to check the lines of code individually? They might then provide you something that looks like a great feature contribution but has just a couple of lines of code that are a backdoor, which they will be able to use later. This is something that, yes, you should address in your best practices, but we also, at the repository and storage level, need to come up with better solutions to ensure that not just a couple of open source projects are doing best practices around Sigstore, et cetera. We all need to be using these to reduce this kind of injection. This is a major, major problem. Just to talk about the scope and surface area of the impact: there was the December 2020 malicious injection into SolarWinds, which ended up having an impact on major infrastructure and systems that we use in the real world. This impacted the NSA, the US Department of the Treasury and NASA, all through a malicious injection into a single vendor's software supply chain. Lastly, a new one this year is protestware, where we've really seen an increase in, not outside malicious injection, but the true, real maintainers of a project intentionally putting malicious code into it so that it creates a denial of service or problems for the end users. This is typically because they have not been paid to maintain the lines of code which they are maintaining, or they have an issue with continuing maintenance and are looking for others to take that on. This is something which we don't have a real solution for. This is something that we may need to put an economic measure into place for, but you're probably going to see more of these happening in 2022 and 2023. Now, I said 96% of these vulnerable downloads are avoidable. I'll tell you a little bit about what we discovered in 2021, and now these 2022 predictive measurements for choosing the right packages.
So developer teams that are doing really well in security already have monitoring and measurement in place for their open source compliance risks. This means making sure that you are compliant with the OpenSSF Scorecard, et cetera, and that you're looking at which versions of your components you wanna be bringing in. We know that, on average, the least vulnerable version is a little bit behind, about two releases back from the very cutting edge. That makes a lot of common sense: it's been tested a little bit, but it's also very much in that space where it's being actively maintained. Now, those developer teams that follow these best practices also keep up with their version updates in step with the projects that are updating. This all makes sense. Just keep yourself close to the cutting edge across all of your dependencies and your open source projects' dependencies. So what does this mean? It means that in the last five years we've really moved to new security thinking and architectural thinking: not just the macro architectural question of whether all of these components fit together, but whether all of these components fit together as their versions update. Now, if you do not pay attention to these versioning updates, you can still run into issues. For example, with Log4j, Log4Shell and JNDI, there was a discrepancy between version updates where you could conceivably think that you had a green scan saying you have no known vulnerable versions, but it was the different versions working together at an architectural level that still gave you a vulnerable surface area. So you really do have to pay attention to these. Now, just as your developer teams can have best practices, you wanna be looking for these best practices on the development teams of the projects which you ingest. It actually ends up being very easy to do. So you wanna calculate the mean time to update.
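As a rough sketch of how that calculation works: for each dependency update a project ships, take the gap between when the dependency released and when the project adopted it, then average those gaps. The dates below are purely illustrative; real data would come from registry metadata such as Maven Central timestamps.

```python
from datetime import date
from statistics import mean

# Illustrative release history: each pair is (date a dependency published a
# new version, date this package shipped a release adopting it).
update_pairs = [
    (date(2022, 1, 3), date(2022, 1, 10)),   # adopted in 7 days
    (date(2022, 3, 1), date(2022, 3, 4)),    # adopted in 3 days
    (date(2022, 6, 15), date(2022, 7, 1)),   # adopted in 16 days
]

def mean_time_to_update(pairs):
    """Average number of days between a dependency's release and the
    downstream release that adopts it."""
    return mean((adopted - released).days for released, adopted in pairs)

print(f"MTTU: {mean_time_to_update(update_pairs):.1f} days")
```

A small MTTU across a project's history is the hygiene signal described above; a large or growing one suggests the project is falling behind its own dependency tree.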
This very basically means: how much time has passed between the updates of the dependencies that my open source packages pull in and the most recent update of the package itself that I ingest, for total dependency hygiene. If there's a very small mean time to update, meaning there are updates to their dependencies, they updated, and then you update, that is a best practice for the modern-day supply chain. And I think what's really beautiful and interesting about this is that there is a strong statistical correlation between the mean time to update of the projects that you ingest and your organization's mean time to remediation, which effectively means that most of the time we're spending in remediation is cleaning up problems from the open source code that we ingested. Again, remember that 90% open source in your application means nine out of 10 devs aren't on your paid team. This is an excellent way to use numbers to reduce your time spent in remediation, or, more importantly, to prevent the need for remediation at all. So before we get to the new stats, which I can show you today, we were using a couple of different ways to define what a good project is. You can use SourceRank, you can use the OpenSSF Criticality Score, which is still very good, and you can use the Sonatype MTTU, all in combination, to think about which projects are doing well. Mean time to update is also accelerating. This is really the premise of the title of this talk and why I need you to pay attention to this graph. We're seeing that not just in Maven Central but across every ecosystem, open source projects are updating more rapidly than ever before. This is in part because there's a larger contributor base, in part because there are more vulnerabilities, but it also means that you are going to have to move away from the more manual processes of choosing these components and versions, because of what's coming within the next year or two.
Our mean time to update for the packages that we ingest is gonna be shorter than perhaps a two-week sprint. Automation needs to be in place. So this year we've got some really interesting stats, and I'm just so excited to show you this. We did a survey of DevOps engineers. We looked at the download data directly from Maven Central. We looked at security scans from Nexus Lifecycle, and we looked at the open source security best practices from OpenSSF, and then we put these all into a predictive model and we got some really, really cool results. So when we're looking at the decision making by developer teams, we find that 64 to 65% are making imperfect or suboptimal decisions. These are pretty good choices, but they are going to lead to some level of refactor moving forward. Only about 31 to 33% of decisions are optimal, meaning they chose the right package and the right version, and they don't need to waste time in two weeks or six months remediating it. Unfortunately, 1% are still making risky decisions at that bleeding edge, or dead-end decisions where there's no way to move forward in their architecture, and 3% are choosing what are subjectively bad versions, and we can't quite tell you why. I think that these are stats that really need to be moved, and we need more developer teams working in that monitoring and measurement space in order to make those decisions good. What's also really fascinating is that there's a massive difference in the perception of security readiness between the developers that do it and the managers that oversee it. So 68% of developers are pretty confident that they're not using any vulnerable versions, and they state that they're pretty high on their remediation maturity as self-reported, but when you really break into this, I think the statistic is valuable for all of us to take home and reflect on.
Managers are 3.5 times more likely on average than their own developer teams to say that their remediation is fast. That's a conversation that we should all be having internally, because that should not be the case. We have to understand each other. I wanna show you this, which is the Log4j/Log4Shell download data, and it's showing us that even today, a year past the original announcement of this zero day, there are still about 30% vulnerable downloads. This could be people not paying attention, but this could also be developer teams that have put in some good practices and some good scanning, but just have these sitting in their JARs and uber-JARs where they cannot scan them. So this is just a note to say there's still a bit of a manual edge to making sure that, even where we put a lot of awareness and automation, we're still getting faulty downloads. And actually, they're not faulty: they're intentional downloads of vulnerable versions. At this juncture, it's the end users that need to take on the responsibility of making sure they have the right versions, which are available and have been available for a year. Now, we see this, and it's really critical to understand that security is not functioning with the level of maturity that I'd like us to have globally when we announce these zero-day events. When there's a zero-day event for a popular vulnerability, as we saw with the uptake for Spring4Shell or Log4Shell, we're really seeing what I call a security-by-Twitter-feed effect. The information travels through whatever our common platforms of communication are, and we need to ensure that more open source teams and more developer teams are going to the watering holes for security information, making sure that they have vulnerability scanning in their systems, that they're getting these updates and that they're aware of these zero days, so that they're not the laggards on this. So why is this really hard?
Even if I give you automation, a commonly imported open source package has 5.7 dependencies on average, meaning it ingests 5.7 dependencies. Now, the average Java project has 150 dependencies, with maybe an average of 10 releases a year each: that's over 1,500 dependency updates a year to consider, and that's a lot if you're trying to do feature engineering on top and not just security hardening. So there are better ways to predict the good projects to bring in, so that you can be focusing on new development and not remediation. Now, there are some pretty predictive metrics for this. We looked at the OpenSSF Criticality Score, the Security Scorecard, SourceRank, mean time to update and the general popularity of the projects. Now, there's a counterintuitive relationship between popularity and vulnerabilities: you'll tend to have more vulnerabilities surfaced on a popular package because there are more eyes on the code, not because it's particularly more vulnerable. So what we found most predictive of an open source package having a vulnerability were the following four things, and they do make sense. Number one, if they do not have a code review process in place. Number two, if they do not have vulnerability scanning in place over their binaries. Number three, if they pin their dependencies; that's going to make it very hard for you to do automated updating when you need to. And number four, not having branch protection. These are known best practices, but in combination they really serve to predict whether an open source package is secure enough for you to use. Now, we've put all of these together with this MTTU and we've developed the Sonatype safety rating. So if you are looking now on OSS Index, you're gonna be able to look at that safety rating and know that that package, that set of maintainers maintaining that code, are doing their job to make yours a little bit easier when you try to keep security in your own source code space.
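To make those four signals concrete, here's a sketch of combining them into a single risk check. The field names and the project dictionary are illustrative assumptions of mine, not Sonatype's actual model or API.

```python
# Combine the four predictive factors named above into a list of tripped
# risk signals. Field names are hypothetical, not a real API.
def risk_signals(project):
    checks = {
        "no code review process": not project.get("has_code_review", False),
        "no scannable binary": not project.get("has_scannable_binary", False),
        "pinned dependencies": project.get("pins_dependencies", False),
        "no branch protection": not project.get("has_branch_protection", False),
    }
    return [name for name, tripped in checks.items() if tripped]

# A project doing most, but not all, of the best practices:
example = {
    "has_code_review": True,
    "has_scannable_binary": False,
    "pins_dependencies": True,
    "has_branch_protection": True,
}
print(risk_signals(example))
```

An empty list would mean all four best practices are in place; each tripped signal is one of the factors the talk names as predictive of a future vulnerability.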
And it's not perfect, but it's very good. I can tell you that using this combination of factors gives us 86% accuracy in knowing whether or not, in the next quarter or so, you're going to see a vulnerability in an open source package. None of the other scores, not the Criticality Score, not SourceRank, had a statistical relationship to actually having vulnerabilities in place. That's really important to understand. The best-practice scores we had in place to date weren't actually predictive of vulnerabilities; this Sonatype rating is, and it's focused on vulnerability reduction. So if you are a developer, we gotta pay attention to these dependency decisions. Most of your vulnerabilities arise in transitive dependencies. That's what I've been speaking to when I've said that an open source project ingests packages: those are transitive dependencies that come through to your code. So favor packages with small dependency trees, those are easier to track and easier to scan, look for projects with that quick mean time to update, and just minimize the total number of dependencies. I'm not saying that we all go back to monoliths, but I am telling you that when we're thinking about transitive dependencies and accelerating mean time to update, if you have a massive architecture you're not going to be able to know or understand very well which of those is giving you the vulnerabilities, or be able to maintain them without losing your feature acceleration at the same time. And we see this really well by looking at the best and poorest performers among development teams. Those optimal high performers make intelligent and contextualized decisions, have automated workflows, and are structured and scalable. They live close to the edge but not totally on the edge. Those that live on the edge are at the cutting edge of trying to make things work for some experimental architectures, but again, you might still be exposed to a malware concern.
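Those transitive dependency trees are straightforward to enumerate once you have the graph. Here's a minimal breadth-first walk over a made-up dependency graph; the package names are purely illustrative.

```python
from collections import deque

# A made-up dependency graph: package name -> direct dependencies.
DEP_GRAPH = {
    "my-app": ["web-framework", "logging-lib"],
    "web-framework": ["http-core", "templating"],
    "logging-lib": ["http-core"],
    "http-core": [],
    "templating": [],
}

def transitive_dependencies(package, graph):
    """Breadth-first walk collecting every package reachable from `package`."""
    seen, queue = set(), deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen

print(sorted(transitive_dependencies("my-app", DEP_GRAPH)))
```

Even this toy graph shows the asymmetry: "my-app" declares two dependencies but actually carries four, and every one of those four is a potential vulnerability source you have to track.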
So one of the things that I really wanna highlight, and I think I have several times in this talk, is that I've seen so many issues in the last year with otherwise highly performant teams leaning too heavily on update automation and running into problems because of that. And just to quantify the real human hours spent in remediation that were unnecessary: if 96% of vulnerable ingestions are avoidable, these are the hours that you could be saving inside an individual corporation in a year. And I think if we just consider how fascinating it would be, a year or two from now, to have more hardened ecosystems, we should be able to have better open source contribution and better products built on top of it. Another little note that I found interesting from this year compares the stated satisfaction of developers to their supply chain maturity: digital transformation and remediation are the two most predictive factors for whether or not your developers are happy in the development space. This means spending more time building new and interesting things rather than fixing things that we may not have even broken, we just ingested them. And my last note is on mature versus immature developer teams and enterprises. You can see this so clearly across different zero days. When I say zero day, this means we've surfaced a critical vulnerability, somewhere between eight and 10 in severity score, and you see that their mean time to remediation is drastically, drastically different. So I'm not telling you these best practices in the hope that they might improve your performance; you can directly see that this is gonna improve your mean time to remediation today. So on the production side, our open source maintainers need to implement best practices: the Scorecard ones in particular are an easy win for having a less vulnerable open source package, and keep those dependencies up to date. On the enterprise side, this is where the hard work really still needs to be done.
Choose projects with a high safety rating, use tools to flag and fix vulnerable libraries, and get a realistic view of your organization's performance: there should not be a discrepancy between what managers and developers believe about your readiness for an attack. So, 95 to 96% of vulnerable downloads had a less vulnerable version available, yet 62% of consumers are still downloading those vulnerable versions. I'd like to recommend a piece of tooling that we've developed at Sonatype, which we're making free and openly available. This is called SBOM Doctor. The software bill of materials (SBOM) that we have been asked to produce, particularly out of the United States, requires us to provide what is essentially a nutrition label for all of the packages that we have ingested. That's great, and that's going to be an improvement, but we have gone in and provided an additional layer on top, and again, you can use this for free. It provides the visualization of those transitive dependencies that I said were so tricky to capture. It looks at a basic example application and shows you, in the lines on this diagram, what on the right-hand side you could improve in order to make the left-hand side, the application, go green. It makes you understand where your criticalities are in that dependency tree, what has been ingested and where, and exactly where your developer team should be spending their remediation time to have the maximum value in hardening your architecture. This is so valuable and so useful. This is how you have to be doing open source security in 2022, or you will be spending too much time in remediation to be able to keep moving.
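An SBOM is, at bottom, just structured data you can query yourself. As an illustration, here's a minimal walk over a CycloneDX-shaped component list; the JSON fragment is hand-written for this example, not output from SBOM Doctor.

```python
import json

# A minimal CycloneDX-shaped SBOM fragment: just enough structure to show
# walking the component list. Component names/versions are illustrative.
sbom_json = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"name": "log4j-core", "version": "2.14.1", "type": "library"},
    {"name": "jackson-databind", "version": "2.13.4", "type": "library"}
  ]
}
"""

def list_components(sbom_text):
    """Return (name, version) pairs for every component in the SBOM."""
    bom = json.loads(sbom_text)
    return [(c["name"], c["version"]) for c in bom.get("components", [])]

for name, version in list_components(sbom_json):
    print(f"{name}=={version}")
```

From a listing like this, cross-referencing each (name, version) pair against a vulnerability database is what turns the nutrition label into an actionable remediation plan.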
So if you want an SBOM Doctor demo, it's actually now available online on YouTube and on the All Day DevOps platform, which is free and open throughout the year. I'd really encourage you to check out Ilkka Turunen's talk, "Have a Plan in Place Before You Need It: Dealing with Supply Chain Issues the Right Way." It's a 30-minute talk going into detail on the development and utilization of SBOM Doctor and similar tools, all free and openly available, that you should be using to understand your architecture. Now, with that note, that's a look at the cutting edge of what I think are really good best practices for automation tooling around decision making for least vulnerable open source ingestion. There's still more that we can do at the community level. So, I am a cloud computing engineer by training. I have been working with the CNCF for a few years, and I was really interested in looking at that ecosystem, where we have several languages interacting and our scanning tooling is still developing, but you have a community of support around it. So I've been running both bug bashes and security slams to improve those projects' security stances. We know the typical open source contributor model: you have a project, a maintainer and contributors, and there's a conversation between them in issues and pull requests which allows you to create new features, new ideas or refactors. But I wanted to use that model a little bit differently and go in, work with the open source maintainers, and run bug bashes. So open source maintainers from the CNCF would sign up, we would run a vulnerability scan over their projects, and we would flag directly in the pull requests where in the lines of code there is a vulnerability, or even something as simple as typos; we provided a bunch of linters within this as well.
We basically went through and provided quality control and cybersecurity support with automated tooling intervention. That gave these maintainers a list, a backlog of potential vulnerabilities, that we could get through in a week or two. Then we reached out to their contributing communities, those who were already active, engaged and invested in those projects, and we worked with them to teach them how to fix the vulnerabilities in the code they care about. This was an incredible experience, and it means there is now security awareness and security communication in open source projects where some of the maintainers have never had any security training; we actively need to put these processes into place. Now, we did this utilizing Sonatype Lift, which you can use; it's also free and openly available. Again, this makes sure that you're not just going through a manual review process, or a scan process which gives you a list of things that you then have to go search for. It puts the vulnerability and security concern results directly into pull requests, so that you can address them from day one, knowing exactly where you need to take action. It saves the developers so much mental energy, I cannot recommend it enough. At this point in time we have worked with over 52 CNCF projects, between both the bug bashes and the security slam, to make them more security aware. I do want to talk briefly about another approach that we took. We've run two bug bashes over the course of 2021 and 2022, but in this last round, for KubeCon North America 2022, we ran a security slam, which is a very different approach. I wanted now to not just create security awareness for the projects and their communities, but to really make sure that the projects and the maintainers had a security awareness that we have not been able to give them before.
So we used CLOMonitor, which goes in and scrapes the CNCF repositories and gives you a score against best practices for documentation, licenses and security, for example. And we wanted to pay attention to getting as many of these security scores up to 100 as possible, because, as I just explained, one of the most predictive metrics in that Sonatype safety rating is whether or not a project performs well on the OpenSSF Scorecard. So getting dozens of projects secure in an ecosystem raises the security hardening of that entire ecosystem. And this is really the level at which I'd like to continue working for a few months with the CNCF. Now, for maintainers, we really wanted to gamify this, and we made it fun. The maintainers got rewards, recognition and financial support for their security implementations. They have a faster time to compliance because of this, and they became strongly security enabled. I am now able to contact maintainers directly and with coordinated disclosure in a way that I was not able to do six months ago, making sure that we address zero days in the CNCF community, if they happen, in a way that's safe and secure. Contributors were able to win awards and recognition for new and outstanding contributions. They were able to gain access to Linux Foundation cybersecurity training, and they found new projects, met new communities and provided essential support as we're trying to build an entire new layer of cybersecurity maintainers that can help across ecosystems. And Google, through Alpha-Omega, donated $50,000 USD to the CNCF diversity scholarship for every project which got to 100%. This is an incredible opportunity to just do some good while doing some security good. And I'm really grateful to all of the maintainers and all of the Linux Foundation support that we got around what is a community initiative addressing a really, really serious cybersecurity concern. And I think that for the CNCF specifically, this approach worked really well.
So in this talk, I wanted to give you a bit of a whirlwind tour of why there's been this explosion of open source contribution, and also an explosion of vulnerabilities because of it. There's no reason to lose hope. Open source is the most incredible space to be in if you care about human ingenuity. It's the place to be. But some of that human ingenuity goes towards malicious action, and we just have to make sure that we're creating ecosystems that are ready for that. We're all responsible for the open source projects that we rely on. So I will continue to do my part to harden both the ecosystems and the individual projects which I work closely with and which I rely upon. But I'd like us all to consider taking these community approaches, giving back, and making sure that our systems are secure because we've worked with and supported the people that made them that way. If you'd like some follow-up information, absolutely check it out; I really gave you just the top-line statistics from this year's State of the Software Supply Chain report from Sonatype. Pay extra attention to what I didn't cover in this section, especially if you're in the management layer: the establishment and expansion of software supply chain regulation and standards. It really looks into what SBOMs look like, or will look like, from a regulatory point of view for every country which has had an opinion on it at this time. That is worth looking at. And although you are now listening to this talk for OSS Japan, I do wanna highlight that I will have a talk coming out in a couple of months which looks at some of these topics, but over open hardware stacks, and a little bit more about what communication, community and security look like at that hardware-software interface. Thank you so much for coming to this talk. Two quick things to note.
If you have Java in your application, remember this and please go look: manually open your JARs and uber-JARs and see if you are somehow still downloading a vulnerable version. If you have an updated version of JNDI you probably are not vulnerable, but just go in and check, because these numbers should not still be this way a year onward, and there is nothing that we on the supply chain side can do to change it; it's your ingestion practices. Number two, play with some of the automated tooling out there, but if you are still in the process of choosing your best tooling for SBOMs, I recommend using SBOM Doctor. It's free and open source, and you're getting vulnerability and dependency information at the same time. You want this information, and I think that if you can give a report showing not just what you have, but what your best actions for the next remediations are, everyone's gonna be happier and you're gonna be spending a lot more time developing than in a headache. And that last talk that I really do want you to go check out, for a full demo of SBOM Doctor, is "Have a Plan in Place Before You Need It." You can find that on the All Day DevOps platform; it should have been released about a week ago, so it should be public for you. Thank you so, so much. I'm glad that we got through one more year of cybersecurity. As we walk into the next year, get in touch with me if you are serious about continuing to work at the ecosystem level to secure and harden open source for all of us. And as you reach out to those who you work with on GitHub today, consider whether you've thanked an open source maintainer today. You could even just open an issue and say thank you for all that you do. Thank you all for your time today. Reach out to me on LinkedIn at Salchimic if you have any questions and I will definitely be in touch. Thank you.