Thank you very much. So Frank Nagle is soon going to present some very interesting results from the research, but we thought it'd be useful to first give a little context, so I'm going to try to do that now. Hopefully most of you already know what open source software is, but some people, I think, still have misunderstandings. So quickly: we're talking about open source software, or free and open source software, which is software licensed to users with specific freedoms: run the program for any purpose; study and modify it; and freely redistribute either the original or a modified version. You can see the Open Source Initiative's Open Source Definition for more. There are a number of widely used open source licenses, listed here: MIT, Apache, BSD 3-Clause, LGPL, GPL. If you've got software, but it's not open source software, what do you call it? It's typically called either closed source or proprietary software. If you ever work with governments in particular, it's often very useful to understand that open source is a kind of commercial software: it's licensed to the general public. An important aspect of these open source licenses is that they enable worldwide collaborative development of software, and that has remarkable positive properties, or at least it enables a lot of positive results. Open source software is a critical part of today's software supply chain. Synopsys' recent study found that 98% of codebases in general, and of Android applications specifically, contained open source software. One study that they did found that 70% of the code in codebases was open source software, even if the entire application wasn't; and that's just one study from Synopsys; another one from Sonatype found 90%. So anywhere from 70% to 90%, depending on exactly the data sets you're analyzing, and the use is increasing. You can see from the graph on the right, the growth is just increasing and increasing.
In fact, even back in 2019, Sonatype noted that they'd seen double- and triple-digit growth, with no slowdown in sight. Well, that's great in terms of functionality, but what that has unfortunately yielded is that a lot of projects and organizations really don't know what's in their systems. They use components that then use components that then use components, and in the end there's some component in Nebraska that nobody knows about but that, in fact, your entire modern digital infrastructure depends on. So we're going to be talking specifically about security, and one myth I want to immediately explode is this word "always." I'm sure many of you realize that "always" is a rare situation. Is open source software always more secure? Is proprietary software always more secure? The answer really is neither. Now, it is true that open source has a potential security advantage. If you look at the basic security design principles that were identified back in the 70s, they're still valid today. One of them is called the open design principle; basically, it says the protection mechanism must not depend on attacker ignorance. Open source better fulfills this principle, so that gives it a potential advantage. And I've always been perplexed by people who say that peer review can't work, the so-called "many eyes" theory. This is perplexing to me because it's how the rest of society works. Academia, science, math, engineering all depend on peer review; that's how it works. So the notion that peer review is somehow useless for software doesn't make any sense. But it is true that merely being able to be peer reviewed is not the same as having actual review, or having reviewers who know what to look for. No software is perfect; vulnerabilities may be found even in well-run projects. That said, continuous, careful review is more likely to find vulnerabilities and fix them and get rid of them over time. So let's put some context here.
The Open Source Security Foundation (OpenSSF) is a cross-industry collaboration. It's part of the Linux Foundation. Its purpose is to improve the security of open source software by building an expert community. It was established in late 2020, and we switched over to a member-funded model in late 2021. It has a huge number of folks involved. Here are the premier members, quite a long list, and here are the general and associate members. This is not just one or two organizations getting together; this is a large number of folks across the software industry working together to improve security. I can't possibly cover all the things the OpenSSF is doing. There's no one magic answer; many things need to be done, and the OpenSSF has a huge number of projects. One of them, if you look on this eye chart here, is the Census II Harvard study, and that's what we're gonna be talking about today. We're looking to help identify critical projects, and the reason is that not all open source software projects are equal; a small subset are of special importance. Census II, which Harvard's gonna be talking about today, sits within the OpenSSF's Securing Critical Projects working group. That working group is then gonna take this data, and other data, to create an updated list of critical open source software projects; it already has a draft list. The Best Practices working group within the OpenSSF has already been distributing multi-factor authentication tokens to critical projects identified through an earlier draft, and we expect to do more of that with the updated list. The Alpha-Omega project has an Alpha side focused specifically on helping the most critical projects, and we're expecting to use the data you're gonna hear today to help us identify those. If you're interested in more: openssf.org. And with that, hopefully that gives you a little bit of context. So Frank, please take it away.

Great, thanks so much, David.
So as David alluded to, we're working with the support of the OpenSSF and the Linux Foundation, and we thank them very much for their work. Before we jump in, I do wanna give a mention: David talked about how open source has enabled the world to collaborate, and I think it's important to note there are actually tens of thousands of open source developers in Ukraine. Of course our hearts go out to them and our thoughts are with them during this tragic time. I also want to thank all the people who were involved in this report. You'll hear from a few of them today during the Q&A session, but this was a multi-year effort that involved a lot of people, a lot of stakeholders, and a lot of different support, and we greatly appreciate everyone who contributed. So, David kind of ended with thinking about open source vulnerabilities. Where a lot of these efforts started was back in 2014, with the OpenSSL Heartbleed vulnerability that some of you may remember. Certainly many at the time said this was a very bad bug and a very bad vulnerability, as it was, but it was unlikely to be the last one we'd see. And of course, as we've all probably heard, just a few months ago there was the Log4j vulnerability, Log4Shell. Jen Easterly, the head of CISA, said that this was the most serious vulnerability she'd seen in her decades-long career, which I think many would agree with. And even more than that, when we think about how all of this impacts individual companies, we have to keep in mind that even though open source is free to use and it's open and all these wonderful things we can build upon, when we think about security related to open source, individual companies can be held liable for the results of any breach related to these types of open source packages.
Just as an example, shortly after the Log4Shell vulnerability came out, the US FTC said that it would use its full authority against any companies that didn't patch the vulnerability and that let customer data be lost. And so this is something that, as David alluded to, open source is really in every aspect of our economy now, and it's becoming more and more obvious that we need to think about what we're working on, what we're building, and how we can secure it, because it affects us all. David mentioned a little bit about the Core Infrastructure Initiative (CII), which was the precursor to the Open Source Security Foundation, the OpenSSF. So, to give you a brief timeline: back in 2014, the Linux Foundation set up the CII as a multi-stakeholder effort to think about what to do about the OpenSSL vulnerability and to lay some of the groundwork for the work we're doing now. That included doing what was called at the time "the census project," but which we now refer to as Census I, because we're doing Census II. Census I was focused on examining the Linux kernel itself and the packages that were critical to it. David and a number of others ran that first census to identify which software packages were most critical; they had to pick one distribution, so they looked at Debian in particular to understand which packages were most critical to the kernel and its operation and security. Now, where they drew the line at the end of Census I was the focus on the Linux kernel itself, and obviously open source, even since then, over the past five or six years, has become infused within all software. So thinking about the open source that's deployed in production applications is where Census II is focused. The goals of Census II are really to think about reinforcing the open source infrastructure and guarding against systemic vulnerabilities by better understanding the following things.
Number one, and the most critical piece of what we're trying to do here, is to think about what open source is out there, how much of it is being used, and how different dependencies may be hidden behind the scenes in ways we wouldn't necessarily think about. We might say, oh, this piece of open source is something that lots of developers use, but actually there are pieces of open source it's built on that sit behind the scenes. So digging deep into these dependencies and the various pieces of open source usage, to better understand what's being used and its impact on the economy and innovation, was one of the primary things we're trying to do here. Related to that is thinking about a measure of impact. As David mentioned, the OpenSSF has a Securing Critical Projects working group that is trying to better understand: if we're going to offer support to various open source packages and maintainers, where should we start, and which projects are most critical? Our hope is to contribute heavily to that conversation. What we've done is not the be-all and end-all of identifying the most critical open source projects; however, certainly an important piece of criticality is how widely used a piece of software is, and our goal is to contribute to that conversation. Some of you may also be familiar with our prior efforts running a large survey of open source community members, and our hope is to use these two efforts in tandem to think about ways we can better help some of the open source contributors and maintainers that may be rather stretched thin, especially when we think about the super widely used open source projects.
And then finally, when we think about investment and where we go from here, the hope is that Census II and other OpenSSF projects and research can be used to help prioritize investments and resources to support the security and health of all open source. But we have to start somewhere, and so the hope is that we can start with some of the most widely used and most critical projects out there. In the context of what we're gonna be discussing today, the primary objective is to better understand which free and open source software packages, at the application library level, companies and industries rely on in their daily operations, whether they're baking them into software they then resell or into software they rely upon internally; how many instances of each different open source package there are; and what upstream and downstream dependencies result from this. We did this by analyzing datasets on private usage of open source provided by three software composition analysis, or SCA, companies: Snyk, Synopsys, and FOSSA. If you're not familiar with these companies, what they do is work with their clients to identify what open source is in their infrastructure, their products, and their software, to make sure they're not violating any licenses and to better secure these things when new vulnerabilities are found. Thankfully, these three companies worked with us to contribute data in an anonymized way, and we were able to bring all this data together to better understand, again, what open source is baked in at the application layer, going into other software that's being used for other purposes. And the great thing is that just by working with these three companies, we were able to get insight into the use of open source at thousands of companies.
So again, as I've mentioned, this is not the be-all and end-all of open source usage, but certainly we have a very wide and very deep insight into what open source is being used at this particular layer of the stack. As David mentioned, we issued a preliminary report what now seems a lifetime ago, because it was pre-pandemic, pre-war in Ukraine, pre-Biden presidency, all these different things that have changed since then. What hasn't changed, or has only become more important, is the usage of open source. Back in February 2020 we released a preliminary report that gave some notion of the top-10 lists; now we have a set of top-500 lists that I'll be talking about in more detail. This work really builds on that preliminary report, and I thank all the folks who were involved in the early days, because we couldn't be doing what we're doing now without their help. So what is the report that we're discussing, which was just released this morning? The primary piece of it is eight rank-ordered top-500 lists. The reason we have eight lists is that, as you all probably know, open source is super complicated, and the ecosystem varies a great deal in how we think about it. So we did a few different slices and dices of the data to give us better insights into different ways we can think about what the most widely used open source is. For example, we looked at particular versions of software packages versus thinking of things in a more version-agnostic way. We also looked at the JavaScript npm repository separately from the other repositories. This is because JavaScript encourages the use of smaller packages, and therefore people import more packages; just as a function of the programming language and the way it's designed, we see many more used packages there.
Therefore, if we didn't separate these lists out, JavaScript and npm would dominate the whole list in a way that isn't truly representative of what's actually being used out there. And lastly, I mentioned the hidden dependencies within open source. We did this in two cuts as well. One, we looked at what programmers are actually putting into their code directly themselves; these are the direct dependencies. Then we also looked at indirect plus direct dependencies. The indirect ones we can think about this way: if a programmer calls a particular library, that library builds on a whole bunch of other libraries, and those are the indirect dependencies we included in the broader count of open source packages. So, unfortunately, there's no single "this is the number one most used open source package," because we have these eight different lists. But we really think that looking at all these different slices and dices allows us to better understand how open source is being used, and which open source packages are the most widely used along these different dimensions. The analysis was based on over 500,000 observations of open source usage from 2020. The report, again, is not a definitive claim of which FOSS packages are most critical, but it does represent our best estimate of which packages are the most widely used by different applications, given the limits of our data. And our hope, and I'll talk about this a little at the end, is that this will be an ongoing, long-term, yearly or every-other-year project. We'd love for other folks to contribute as well, because the more people who contribute, the more we can see into the entire open source ecosystem, and the closer we can get to what the ground truth actually is. So, a few notes on the methods and how we did all of this.
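The direct-versus-indirect distinction Frank describes can be sketched with a tiny dependency-graph walk. All the package names below are hypothetical, and real resolvers handle version constraints and cycles far more carefully; this is only a minimal illustration of how indirect dependencies accumulate behind the scenes.

```python
def transitive_deps(pkg, graph, seen=None):
    """Collect all direct and indirect dependencies of `pkg` by
    walking the dependency graph depth-first."""
    if seen is None:
        seen = set()
    for dep in graph.get(pkg, []):
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, graph, seen)
    return seen

# Hypothetical graph: the app directly imports only two libraries,
# but each of those pulls in further packages behind the scenes.
graph = {
    "my-app": ["web-framework", "logging-facade"],
    "web-framework": ["http-parser", "templating"],
    "logging-facade": ["log-backend"],
    "templating": ["string-utils"],
}

direct = set(graph["my-app"])
all_deps = transitive_deps("my-app", graph)
indirect = all_deps - direct
# The hidden packages a survey of developers would likely miss:
print(sorted(indirect))
```

The point of the sketch is that the indirect set (four packages here) is larger than the direct set the programmer actually typed, which is exactly why the Census II lists count both cuts.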
So, as I mentioned, we end up with these top-500 lists. We got there by taking, from each of the companies that contributed data, the top 1,000 packages that showed up in their own analysis, and then splitting them along the dimensions I just mentioned. Then we calculated what's called a z-score. If you think way back to a stats class you may have had at some point, a z-score just tells you where in a distribution a particular observation occurs. If it's a real outlier on the high side, the z-score is gonna be very high; if it's something in the middle, the z-score will be relatively moderate; and it can be much lower than that. The reason we had to do this is that these three companies all have different sizes of customer bases. If, for example, one of them had 10,000 customers and another only had 1,000, then obviously if we just added the raw numbers together, the one with 10,000 customers would dominate, and that's not how we wanted to aggregate this. So instead we think about where in the distribution of an individual company's open source usage statistics a package showed up, and then we combine those z-scores to think about how high or low in the overall distribution any given package was. We then take what is potentially 3,000 different components, whittle that down, and chop it off at the top 500, for a variety of reasons. Our goal was to share as much information as possible while still respecting the privacy and integrity of the data from our data providers. That's how we ended up with these top-500 lists. I'm happy to chat more about that during the Q&A if there are questions.
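The z-score aggregation just described can be sketched in a few lines. The provider names, package names, and counts below are all made up, and the report's actual methodology is more involved; this only shows why normalizing within each provider's distribution keeps a large provider from dominating.

```python
import statistics

def z_scores(counts):
    """Map each package's raw usage count to a z-score: how many
    standard deviations it sits above or below that provider's mean."""
    values = list(counts.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return {pkg: (n - mean) / stdev for pkg, n in counts.items()}

# Two hypothetical providers with very different customer-base sizes;
# the raw counts are not comparable, but the z-scores are.
provider_a = {"lodash": 9000, "react": 6000, "left-pad": 1000}
provider_b = {"lodash": 900, "react": 500, "left-pad": 150}

# Sum each package's per-provider z-scores, then rank.
combined = {}
for scores in (z_scores(provider_a), z_scores(provider_b)):
    for pkg, z in scores.items():
        combined[pkg] = combined.get(pkg, 0.0) + z

ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Summing z-scores rather than raw counts means a package only ranks highly if it is an outlier within each provider's own distribution, regardless of how many customers that provider has.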
But what I wanted to show, and I'm not gonna sit here and walk you through all of the top-500 lists, that would take hours and hours, is just a teaser of the data, which is now available as of this morning along with the report. The last thing I'll mention, and you'll see this in a moment when I show the tables, is that we also included information on OpenSSF badging. David mentioned that one of the goals of the OpenSSF is to help educate and increase the security levels of lots of open source projects, and one of the ways they're doing that is through the badging program. It essentially says: here's a whole list of best practices you should be following; where do you fit on that? Are you, as was mentioned, using multi-factor authentication on key accounts? Are you doing security audits, and things like that? The OpenSSF badging, just to have this in the back of your mind, is on a scale up to 300. A score of 100 means essentially you're passing: you're doing all of the basic things. You can then get up to 300 if you're doing the more advanced things that are certainly highly recommended, but aren't necessarily the baseline everybody should be doing. As you'll notice once I move to the next slide, lots of these open source projects are actually not participating in the badging process. So certainly one thing we hope, if nothing else, comes out of this is more awareness about these types of efforts, which are really intended to help open source maintainers and contributors enhance the security of their own projects. Not everybody has to be a security expert; however, following these baseline best practices really goes a long way toward enhancing the overall security of open source. So, as I mentioned, we have a whole bunch of different cuts; I'll show you just the top 10 from each of these eight cuts.
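As a rough sketch of the tiered percentage just described: the OpenSSF Best Practices badge program defines 100 as passing, with higher tiers (silver and gold, by my understanding) up to 300 for the more advanced practices. This helper is illustrative only, not the official BadgeApp scoring code.

```python
def badge_tier(tiered_percentage):
    """Map an OpenSSF Best Practices tiered percentage (0-300) to a
    badge level. Illustrative sketch: the program defines 100 as
    passing, with more advanced tiers above it up to 300."""
    if tiered_percentage >= 300:
        return "gold"
    if tiered_percentage >= 200:
        return "silver"
    if tiered_percentage >= 100:
        return "passing"
    return "in progress"

# A few hypothetical project scores:
for score in (42, 100, 215, 300):
    print(score, badge_tier(score))
```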
Here we have the cuts looking at direct dependencies, the things that programmers put in their code directly, in a version-agnostic way, so there's no version information here. On the left are the npm lists, and on the right the non-npm lists. Interestingly, this non-npm list is moderately dominated by Maven, although some of the other non-npm cuts are dominated by Go and other programming languages as well. Again, the z-score helps us think about the relative rankings. If we look at the npm list for a second: not only is lodash number one, but it's nearly twice as widely used, along a variety of measures, as number two. So we not only have a rank-ordered list, we have some sense of the size of the jump from one project to the next. Obviously, again, we can see some of these packages are using the badging and are on their way, some higher than others in terms of their tiered percentage, but at the same time, lots of these projects are not using the badges. And obviously Log4j has been on everybody's mind. I will point out there's a number of letters similar to "log4j" in the number one package that showed up on the non-npm list: SLF4J is a logging API, a facade that can call multiple logging packages, including log4j. These are the types of things it's important to think about as we consider where we should be spending our security dollars and efforts, and how to prioritize the projects that are very widely used. Just another brief teaser: when we think about combining the indirect and direct dependencies, those hidden dependencies higher up the chain, obviously we see different lists here.
That's interesting because, again, when we think about the most widely used open source, developers, if you survey them, would often say one thing, but what's actually behind the scenes is quite different. We don't even see lodash, which was number one on the npm direct list, showing up in the top 10 here. It is in the top 100, I believe; it shows up a little further down. But at the same time this gives us a different kind of insight into what's going on with open source usage. Just one more cut, and then I'll stop throwing these eye charts up: versioned packages. Here we also included information about individual versions of packages, and this really allows us to better understand how old the software is that people are using. One of the more fascinating things that came through in this analysis is that a lot of people using log4j were actually using the 1.x series rather than the 2.x series, and the 2.x series was where the Log4Shell vulnerability was found. So interestingly enough, lots of people, at least in this data set, weren't necessarily susceptible to that Log4j issue, because they hadn't updated their software in a long time. I wouldn't recommend that as a defense mechanism: log4j version 1.x was last updated in 2015, and there have actually been a number of vulnerabilities discovered in it since then, so by no means were those people secure from everything. However, interestingly, they weren't necessarily affected by the most recent Log4j issue that was in the news over the past few months. And then lastly, this final list is just another cut, thinking about the indirect and direct dependencies at the versioned level. You can read the report for more insights and for all eight of the top-500 lists, at both the versioned and unversioned levels and with these other cuts.
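The 1.x-versus-2.x observation comes down to a version-range check: a given installed version either falls inside the vulnerable range or it doesn't. Here's a simplified sketch; real version schemes (with qualifiers like beta releases) are messier, and the exact affected range for Log4Shell is documented in the official advisories, so the range below is only approximate.

```python
def parse_version(v):
    """Parse a simple dotted version string into a comparable tuple,
    e.g. '2.14.1' -> (2, 14, 1)."""
    return tuple(int(part) for part in v.split("."))

def in_vulnerable_range(version, first_affected, first_fixed):
    """True if `version` falls in the half-open range
    [first_affected, first_fixed)."""
    return (parse_version(first_affected)
            <= parse_version(version)
            < parse_version(first_fixed))

# Log4Shell hit the 2.x series (roughly 2.0 up to the 2.15.0 fix);
# the legacy 1.x series fell outside that range, though it has its
# own unfixed vulnerabilities.
print(in_vulnerable_range("2.14.1", "2.0.0", "2.15.0"))  # True
print(in_vulnerable_range("1.2.17", "2.0.0", "2.15.0"))  # False
```

This is exactly why SBOM-style version tracking matters: without knowing which version is deployed, you can't even run this simple check.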
All right, so now pulling back to the higher level, we wanted to think a little about lessons learned and where we can go from here. At the highest level, we highlight the need for a standardized naming schema for software components. One of the things that was certainly tricky on the back end, merging all this data and bringing everything together, is that there are lots of packages named "debug," or lots of packages named XYZ. When we think about these efforts to better understand open source usage, there's no true naming scheme that everybody has to follow; different companies use different schemes, and different open source projects use different methods. So that was noticeably important at the higher level. Related to that, there are also a lot of complexities associated with package versions. In particular, we saw some versions reported by our data partners, for widely used pieces of software, that didn't actually exist in the public repositories for those packages. Our speculation is that what's happening here is internal forks: companies maintain internal versions of individual packages and have their own sort of versioning system. Depending on the license, and David ran through a number of the licenses before, that could be totally fine from a legality standpoint. But when we start thinking about software bills of materials and the ability to actually track what version of a piece of open source is in my code, that can make things a lot more complicated, because different version numbers can mean different things, and it makes it harder for companies to understand, am I vulnerable to whatever new vulnerability just came out? Third, we thought a lot about how many people are actually contributing to the different open source packages.
So, for example, in just one of the top-500 lists, looking at just the top 50 packages in that list, only 136 developers were responsible for more than 80% of the lines of code added in the last year. So when we think about the "many eyes" idea and lots of people working on open source, that may be true in aggregate, but on any given project we often see one or maybe two developers contributing the bulk of the code. That makes things more complicated when we think about security, because there's a lot that needs to be done, and those people are stretched pretty thin already. Fourth, we thought about the increasing importance of individual developer account security. We noticed that lots of these very popular packages are hosted under individual accounts on GitHub or other repositories, and that can be concerning because those don't have the same level of security as organizational accounts. David already alluded to the fact that the OpenSSF has been working with many of these projects to give them multi-factor authentication tokens and the ability to increase the baseline level of security on their accounts. And then finally, from the high level, there's the persistence of legacy software in open source, like the log4j issue I just mentioned. We saw that with lots of different packages: very old versions were still being relied upon heavily. This is a problem in all sorts of software, not just open source, but it's something that needs to be thought about in the context of open source moving forward as well. Our hope is that this report spurs a number of different types of action; I'll mention three. The first is data sharing. Again, we worked with private companies to get access to this great data.
We hope more are willing to join future efforts, because this is really where the best data and best insights about open source usage are coming from. We can get some tip-of-the-iceberg view from public repositories, but we really need this behind-the-scenes type of data to better understand what's being used in practice. Second, coordination. I talked about the naming and versioning concerns; as the world moves to rely more on software bills of materials, which is a fantastic thing on many dimensions, this naming and versioning issue will probably become a bigger deal. Thinking about how we're going to account for these different names and versions is going to be important as we come to rely on software bills of materials. And then finally, spurring action and thinking about investment. This is a big thing we've seen change over time, even since our preliminary report came out in 2020. There's been more funding for research into how we actually understand what's going on in the open source ecosystem, which is great. And more recently, especially coming out of the meeting at the White House related to Log4j and open source security in general, and the efforts of the OpenSSF, we're seeing more and more investment. David mentioned the Alpha-Omega project; that's the one I'm referencing here, where individual companies are sponsoring these types of projects to help fix the open source that they, and the world, rely on heavily. All right, with that, I'm going to turn things over to Jessica, because Jessica is going to give us some insights into how individual companies can take this type of data and use it to help secure their own ecosystems.

Right, thank you, Frank.
I'd like to thank the Linux Foundation for inviting me to join today's discussion, and begin by saying these are my own opinions, as someone who's led Linux and open source software engineering teams and been a part of the open source software ecosystem for the past 15 years. I'd also like to thank the team at Harvard who did this research. It means that every company trying to develop similar lists of open source packages and their software linkages and dependencies doesn't have to replicate this work on their own. This is a great example of how we can work collaboratively as a community, leveraging the strength of the ecosystem to create something once and use it many times across members of that ecosystem. So this research is definitely a step in the right direction. There is more work for each of us to do, but this is a really solid, data-based foundation that companies can build upon. Now, if your company or your company's products leverage open source software, you need to contribute to those communities and to the health of the ecosystem. This includes understanding the health of a project or community and the risks associated with using the code. Companies are gonna be at different points in their open source journey, but no matter where you're at on that continuum, security needs to be top of mind; it cannot be an afterthought. As David mentioned earlier in the discussion, no software is perfect, and even though open source has the opportunity to have more eyes looking at it, that does not mean it is inherently more secure. There is work that must be done to create secure software in open communities, just the same as there's work to secure proprietary software products. So, this research report develops awareness of the potential issues; now, companies who use this software need to develop their own action plans.
So, this list should be used by companies to develop a risk assessment of their open source software. Many companies are currently looking at the open source software they consume, and that their customers consume, and at how to mitigate the risks associated with using it. I'd like to share a few best practices that companies can use to help develop their plans on what to do next. Now, the first best practice is just what I began with, right? Companies should be contributing their time, their talent and their support to the open source projects and communities that they're directly benefiting from. The time of being a passive consumer of open source software has passed. Take the free puppy analogy: when I rescued my two dogs from my local animal shelter four years ago, it was during a clear-the-shelter event where adoptions were free. So, essentially, these were free puppies, but as any pet owner or parent knows, I've spent thousands of dollars on my dogs over the years since, because pet ownership is filled with great responsibilities as well as great things like companionship and joy. I fully accept that having my dogs comes with responsibilities like feeding them twice a day, grooming them every six weeks, taking them to the vet, and walking them daily. Extending that analogy to open source software: if you're experiencing the joy of open source software, you also need to take on the responsibility to improve the code base and remediate issues in open source if they arise. Now, the second best practice is knowing what you are shipping within your own products, with a software bill of materials. When a company is manufacturing a piece of hardware, for example a compute server, having an accurate bill of materials is essential. It ensures that all parts are available, and that the manufacturing process is not going to be interrupted because you have to locate a missing or out-of-stock part.
It includes the name of the part, who produces it, how many you need, what each part costs, and whether there's an alternate part that can be used in its place; it covers all the bases. Now, we would never design a hardware system without a bill of materials. Having one is a recipe for success. And we have the same need with software, right? Software companies should have a software bill of materials when building complex software systems. And let's be real, every software system is complex, with all the interdependencies we have built throughout our code bases and ecosystem. So, a software bill of materials provides that transparency into the components delivered within the software supply chain and ecosystem, and learning more about SBOMs and having one is a great place to start for organizations and companies that want to strengthen their software security. Now, this brings me to my third best practice, which is to enable your open source engineering teams with consistent and strong top-down focus as well as bottom-up enablement on the importance of secure software practices and policies. This includes a best practice that I've been directly involved with: having core teams of engineers and practitioners who specialize in security available to consult with your open source product teams, to make sure they're following the best practices and using the correct tooling, right? It all begins with awareness and education. It also includes seeking out training, as well as tools and expertise, to raise your company's awareness of what steps can be taken to mitigate risks. Now, as David mentioned earlier, everyone is free to participate in the OpenSSF community initiatives and can take advantage of free training and webinars on best practices, as well as access to tools to help assess and mitigate risks.
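To make the hardware analogy concrete, here is a rough sketch, not the panelists' own tooling, of what a single-component entry might look like in a CycloneDX-style JSON SBOM. The component, version, and the small subset of fields shown are illustrative assumptions; a real SBOM would carry many more fields (licenses, hashes, suppliers).

```python
import json

# Minimal, illustrative SBOM fragment loosely following the CycloneDX
# JSON shape: each shipped component is listed with its name, version,
# and a package URL identifier. The component here is hypothetical.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "components": [
        {
            "type": "library",
            "name": "left-pad",
            "version": "1.3.0",
            "purl": "pkg:npm/left-pad@1.3.0",
        }
    ],
}

print(json.dumps(sbom, indent=2))
```

The point of the exercise is the same as the hardware bill of materials: every part your product ships is named, versioned, and traceable to a producer.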
Now, the next step for us as members of companies that benefit from open source software is to make this a priority, right? These are not someone else's problems to solve. We need to accept, as a collective, that we're going to have to be a part of the solution. There are tools, there's education, and there's an ecosystem to help support us in solving these problems. We have to come together and make this a priority. It's really time for us to get to work. So I know we've allocated some time for Q&A, so I think we can go ahead and shift to that now. Great, thanks so much, Jessica. And I think it's so important to have this kind of corporate or company-based perspective, because we as academics tend to get super into the weeds and into the data and have lots of fun with that. But again, our hope with this whole project has been to give guidance to three buckets of folks. One is organizations like the OpenSSF, or even governments, that are trying to think about this at the high level. The second is companies themselves, thinking about how they use open source, how they can support it, and where they should go from here. And the third is the open source contributors and maintainers themselves, thinking about how they appear on these lists. And frankly, I'm sure it will actually be a surprise to some maintainers of packages that have ended up on these lists that their packages are that widely used, right? And these are exactly the types of things where, from the company perspective, I think it's very important to understand how companies can utilize this type of information. So thanks so much. So now we'll open up for Q&A. We have a couple of questions that have already been put in the Zoom Q&A function. Please feel free to keep those coming and we'll answer them as they come in. David, I think you were going to answer one of those first.
Yeah, so let me answer the first one I see, from Thomas Frick, who mentioned that the German government has developed something, sovereigntechfund.de/en for the English version, I assume, about the German government wanting to invest in open source security and wanting to connect with the OpenSSF. Delighted, absolutely. As we mentioned earlier, if you want to work with the OpenSSF to improve open source security, please join, and so on. But if you want to talk about, hey, either funding the OpenSSF or coordinating efforts in some way, probably the quick, easy way to start is just shoot an email to Brian Behlendorf and myself, David A. Wheeler. I'll post our emails in the chat; we're not hard to find. And we'd be delighted to work with you and coordinate. If you've got funding and you want to chat with us about how to fund some things, happy to do all those things. So yeah, please contact us. We'd love to chat with you. And thank you very much. Frank, do you want to take the next one? Yeah, I'll take it. Lawrence Hecht asked, can we agree that the naming scheme can be dealt with in the SBOM discussions? I think the answer is yes, but we have to make sure that it is, right? There are both the naming concerns we had, for example, that there are 15 different "debug" packages, and that sometimes within the same language there can be different packages with the same or very similar names, and then also the version issues that you mentioned as well. So I think the answer, Lawrence, to your question is yes, we can do that, but we have to make sure that we do, by agreeing as a community, for example, on how we should think about the canonical name of a package. Should it be a package manager plus the name, or should it be something like a URL, either on a repository like GitHub or GitLab or something like that, or a separate URL?
I think these are all reasonable options, but those are some of the things that the folks at NTIA and the Department of Commerce, and those now working on SBOMs and the language of the suggestions around them, are thinking about at this point. So my hope is that the answer is yes, but it has to be a decision made on purpose to address those issues. David, did you want to add anything? Yeah, I definitely agree that this is something that needs to be dealt with within the communities developing SBOM specifications and related things. I'm already on record saying that I think URLs have to be at least part of the solution, in part because, hey, everybody downloads from Maven? No, they do not. There are a huge number of, say, internal-to-organization repos. System packages are often modified and tweaked downstream, so version Foo 1.2.3 from, say, Red Hat Enterprise Linux is not necessarily the same as the source code of Foo 1.2.3 from its original developers. This is why naming is actually more complex than it seems at first blush, and we need ways that can handle widely distributed distribution systems. Thanks. All right, we have a question from Denton on how best to promote getting an open source coordinator position at all universities, and asking where they can collaborate on software security and university needs. And my answer to you, and I know there are some other questions in the chat that are somewhat similar, is never let a good crisis go to waste, right? Log4j has captured the attention of not just the kind of people who work in IT; it has permeated governments and business people, it's gotten that awareness. So I think what you should do is work with your administration and put together the business case for how important having this CSO-type function is as a part of a university community.
And really it's not something you can afford not to do, right? So again, my advice is to not let this crisis go to waste. Yeah, it sounds like they're talking about something like an open source program office, an OSPO, or something similar. If you just search for that, you'll find a whole bunch of material about setting those up. I think in an academic situation it's a little more complicated; it's probably more easily done in a more distributed, lower-level way instead of necessarily trying to do it at the university level. But really, I think the key there is, don't let a good crisis go to waste. Hey, there are some challenges, let's make things better, and focus on that, then tweak things based on your specific circumstance. And I'll add to that. I know that Johns Hopkins University has actually been doing a lot of work on building out an OSPO and things like that. I forget the name of the person in charge, but they've been publishing some blogs and some best practices for how universities can think about building an OSPO, so check that out as well. So there was a question about where we're going from here and what's next on the radar for this type of effort. Steven, I don't know if you want to maybe answer that one. Yeah, sure, happy to. So I think what we want to do is continue to build on these efforts and move into understanding OSS in the cloud. But also, we're an academic institution, so we like to run experiments. We have quite a few eager economists on staff who love to run experiments. So I think we're going to run some experiments on the value of open source, both to the developers, but also to industry as well. And then I think we're also going to look back at a report we released, I want to say in September 2020, on the community contributions to OSS.
And so I think we may circle back around there, but there's much on the horizon, I think, in terms of building on the data in this report. Great, thanks, Steven. And then, Yano, there was a question about, where did it go? Javier is saying it would be interesting to see the breakdown of this data by region, to think about contributions coming from different regions and whether there are specific geographic areas that may need more help than others. Sure, happy to answer. So for this question, just as Frank mentioned in the discussion before, we have some privacy concerns from the data provider standpoint. So at this time we're actually not able to provide that level of insight, but this is something we want to do in the future. With more data sharing, and potentially more negotiations to come, maybe this is something we can do. Great, thanks. Let's see, working our way through the questions. Shall I take the question from Benjamin Bukari? Yeah, great, that'd be great. All right, so: best practices for small to medium-sized businesses to best maintain their open source software dependency graphs, given they don't have the resources to track that information. I mean, if you're developing software, I think the first step is trying to use the tools that are already there, maximally automating. In particular, there are still developers who think that the way to add their dependencies is by copying and pasting some random version of the code. Don't do that; use a package manager, it's what they're for. Package managers exist to automatically manage your dependencies. And there are a lot of tools that, once you start using automated tooling, particularly package managers, can take that information and generate software bills of materials that work across different ecosystems.
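The advantage David describes, that a package manager records what it installs so tooling can enumerate it later, can be sketched in a few lines. This is an illustrative example, not a tool the panelists mention: it uses Python's standard `importlib.metadata` to inventory installed distributions and their declared requirements (npm, Maven, and other ecosystems expose the same information through their own metadata).

```python
from importlib.metadata import distributions

def inventory():
    # Walk every installed distribution the package manager knows
    # about and record its declared dependencies. Copy-pasted code
    # would be invisible here; packages installed via the package
    # manager are all accounted for.
    result = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        result[name] = dist.requires or []
    return result

for name, requires in sorted(inventory().items()):
    print(f"{name}: {len(requires)} declared dependencies")
```

An SBOM generator does essentially this walk, then serializes the result into a standard format that suppliers and customers can exchange.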
The next step would be starting to encourage your suppliers to provide software bills of materials. And right now, that's an effort that's ongoing. The US government is already pressing on that particular gas pedal, and I'm sure there are going to be others. There's already work; I mean, it's been a long time coming, but for example, SPDX only finally became an ISO standard last year. So it's one of those things where a lot of people have agreed it's worth doing, but things are only recently coming to fruition. But I think the things I just listed are a good step in that direction. Great. And then we're just kind of going numerically here. So, Kathy Geore asked a question about best practices for how to organize lists of open source internally, I think, so companies can more easily support being part of ongoing census efforts like this without so much manual effort to analyze the data. So I can speak for myself and Yano and Steven: yes, there was a lot of manual analyzing of the data on our part, but also on the part of the SCAs, to ensure privacy and that they weren't revealing anything sensitive about their customers when they were providing this, hence the geography concern that Yano mentioned. And so one of the things we're thinking about doing in the future is trying to do exactly that: making it easier for individual companies, which have insights into their own operations but not, like the SCAs, into many operations, to contribute data to these types of future census efforts. So unfortunately, the only real answer I have for you, Kathy, is stay tuned. But we are hoping for that in future versions of this, whether we run censuses based on surveys, more like the census the government runs, or censuses based on more technical means with the SCA partners and things like that.
Certainly, Kathy, feel free to reach out to me. I'll put my email in the chat for anybody who wants to reach out and is interested in contributing to future efforts around this. Yeah, if I could quickly add, the problem wasn't organizing the data, it was getting access to the data. So that's really the thing that we would love to have more folks be willing to share. Thanks. Yes, agreed. The next question is from Blake: why is there a focus on differentiating NPM versus non-NPM? We give a little more detail in the report on why we do that, but in particular, NPM packages tend to be much smaller and have fewer functions than other types of packages. And therefore, in an average JavaScript program, JavaScript being what NPM predominantly hosts, you end up importing a whole bunch of packages. And so, when we aggregate this all together, that ends up making it look like NPM is really dominating the entire ecosystem, but really it's just a function of the norms in the way the coding is done. It's not anything to do with back end or front end; it's more about the size of the packages and the number of functions. If we had combined everything together, NPM and JavaScript would have dominated everything because of that, and that would have hidden a lot of useful information coming from other types of packages and languages. Quick stat: about half of all NPM packages have zero or one functions. There we go. Excellent. It's quite a spectacular difference, and that's why these were separated. Excellent. Let's see, Aster. Hello, Aster. Good to hear from you. Thanks for your question: is there a coordinated effort by the OpenSSF to reach out to and engage with international stakeholders, such as other governments, but also currently less engaged industry verticals, et cetera? I can give a bit of an answer to that.
David, you may be able to give a bit more detail given your role. So I think the answer is yes, they're moving in that direction. Certainly the focus has been more on businesses at the moment as opposed to governments, but I think that's changing given where many governments are headed, and the EU, I think, is often a bit ahead of the US on this type of thing. So I'll let David add on to that, because I think he might have more. Yeah, the OpenSSF actually does have a public policy subcommittee, but it's been more about industry trying to answer questions from various governments. But yeah, the security of open source software is important for society writ large: industry, government, everybody. So this is an area where public-private partnerships are absolutely the right idea, and we'd love your help in making that happen. Absolutely. I will note that it's almost the top of the hour. There are a few more questions, and we're happy to stick around and answer them, to make sure everybody gets an answer if we can provide one. But the report went live today. I believe the link went out in the chat; please feel free to check it out. And again, for folks who are interested in joining either this census effort in particular or future census efforts, please feel free to reach out to us. And certainly if you're interested in joining the OpenSSF or the other efforts around the Linux Foundation, feel free to reach out to us as well. Because I think personally one of the things that's so fascinating about this whole open source ecosystem and the problems we're trying to face is that, by design, this is all very decentralized and distributed. And that can be a good thing from the support and productivity standpoint, but it actually can make these types of issues harder to solve.
And really it's only going to be by everyone coming together and working on these problems in aggregate, from nonprofits like the Linux Foundation, companies like IBM, academics, governments, all these types of folks, that we're really going to be able to fix this and ensure that open source continues to play, for decades to come, the vital role that it has been playing. All right, so thank you all very much. We'll stick around and answer a few more questions, but please feel free to log off. And thanks also to our panelists, because it's been great to have you all here. Let's see, what other questions? "David, do you know about the PURL standard?" Yes, all right. So, that's an effort to deal with package naming. I don't want to beat up on the PURL specification too much here, because if there's a shortcoming, the correct solution is to send an issue to the PURL specification folks and say, hey, you're missing something. That said, you asked the question, so let me answer: one particular weakness of the PURL specification is that it can't handle version ranges. And that's really important when you want to ask, I've got a vulnerability, what does it apply to? So I think that's an area where there's a weakness; it needs to be adjusted. Whether or not you consider it a weakness, I think it's important to understand that it's really focusing on built packages, not necessarily source. There are a number of projects where you really download and deal with the source code directly. That's not really what PURL is designed for, but it's a reality. But I don't want to beat up too much on the PURL spec. I think the correct answer for weaknesses in it is to go talk with the folks who are actually trying to do the work and try to give them a hand. Great. So Thomas Frick asked about, or mentioned, that the German government is interested in building an OSPO as well.
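For readers unfamiliar with the format being discussed: a purl names a package as `pkg:type/namespace/name@version`. The sketch below is a simplified, illustrative parser, not part of the PURL tooling itself; it ignores qualifiers and subpaths, and it makes David's point visible in code: a purl carries one exact version string, with no way to express a version range.

```python
def parse_purl(purl: str) -> dict:
    # Split a purl of the form pkg:type/namespace/name@version into
    # its parts. Simplified: no percent-decoding, qualifiers, or
    # subpath handling. Note that "version" is a single pinned
    # string; the format cannot express a range like ">=2.0,<2.17".
    if not purl.startswith("pkg:"):
        raise ValueError("not a purl")
    rest = purl[len("pkg:"):]
    if "@" in rest:
        rest, _, version = rest.rpartition("@")
    else:
        version = ""
    ptype, _, path = rest.partition("/")
    namespace, _, name = path.rpartition("/")
    return {
        "type": ptype,
        "namespace": namespace or None,
        "name": name,
        "version": version or None,
    }

print(parse_purl("pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1"))
```

The built-versus-source distinction David raises is also visible here: nothing in the identifier says whether `2.17.1` is the upstream source or a downstream rebuild with distro patches applied.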
And I think this is again something we're seeing more and more interest in from all these companies, universities, and governments. I will say that the recent report, I think it came out last summer, commissioned by the European Commission and done by OpenForum Europe, had a number of suggestions related to OSPOs at the government level. And so I would encourage you to check that out; I'll try to find a link while we're answering other questions. But there have been a number of suggestions of how government-level OSPOs can be formed. And I think there are two functions that, at that level, they'll start to play. One is just to organize the open source that the government relies on itself, and the other is to think about how the government can support industry activities and individual activities related to open source. And so I think that report from OpenForum Europe and the European Commission has a number of ways to think about how the government can sponsor OSPOs for its own internal purposes versus also sponsoring them for individual industries, to help with organizing industry-level open source usage. Yeah, and Sir Hott, I hope I'm saying the name correctly, had a question about how we encourage companies who are benefiting from open source to actively fund projects, and he had some ideas that he shared. So my experience is, I can't make someone do something, right? I can only change my own behavior; I can't change other people's behaviors. That's after years of therapy, right? So what companies have to realize is, what are the implications? There are financial implications of not using secure software. There are implications for how you are seen by your customers and by the media. No company wants to be on the front page of the newspaper because you had this breach or this security problem.
So again, kind of going back to it, you have to tend your own garden, right? Companies have to take ownership of this, and it all starts with awareness. And so this research, this report, starts that awareness. As companies become more aware of what they should do, and of the processes and tools and education they can use, I have full belief that companies will engage, right? But again, they have to come to that realization; it all starts with awareness. Yeah, if I could riff a little bit on what Jessica said, I agree with you that it all starts with awareness. I do think this report helps. I think another thing, interestingly enough, is SBOMs. I showed that xkcd cartoon of, you know, here's my amazing infrastructure, and here's the component it depends on that I don't know about. I think things like SBOMs are actually going to help projects get noticed. I think increasingly, users, particularly large end users with really critical needs, are starting to ask, what's in this software? They start taking a look. Yes, I'm glad, I want this functionality, but what's in there? Let me ask some questions; tell me about the risks. And very, very quickly they realize, oh, wait, this stuff's great, this one, no problems, and this one, what the heck? And so I think the short answer is that the work ongoing right now to provide visibility will help incentivize folks to improve things. Because different companies depend on different things, and different other organizations depend on different things, so they should care about what matters to them. We need to give them more visibility into what they're depending on. Agreed. Let's see. Our friend Jim Zemlin mentioned that the Linux Foundation is actually providing free training for government OSPOs through the TODO Group that was mentioned before.
I don't think that's happening yet, but it will likely be something on the very near horizon. Because indeed, governments are just like any large organization: they're heavily reliant on open source, but don't necessarily have experience thinking about how to manage this at the full-organization level. So thank you, Jim. Let's see. So there was a question from Holger Stridl about sources of error from the SCAs. And maybe, Yano, you can talk a little bit about that. Yeah, sure, definitely. So we tried to be as clear as possible in our report, which was just linked for you, and in the methods section we tried to be clear about the assumptions we made, what kind of data we had, and how we dealt with different situations. So feel free to check the report. But I would say that what we're doing is more like matching; the standard we're referring to is the libraries.io website. They've done a great job of cataloging software and components, so we tried to follow that standard, and it actually covered most of the packages we have in the raw data. But feel free to reach out if you have any more questions after reading the report. Great, thanks, Yano. There's one last question, about regulatory requirements for ISVs to disclose scores or badges for compliance. David, do you have anything on that? Sure. Well, unfortunately, the world of governments, and I presume this is where this is coming from, loves to live with acronyms, but we don't always have the same expansions. I'm assuming what they mean is independent software vendors, but I'll bet there are 300 different expansions of that that might be applicable.
But you know what, regardless of that, regulatory requirements to disclose scores or badges for compliance... frankly, on regulatory requirements, we're the wrong people to ask; that's a question for governments. Governments are the ones who establish regulatory requirements, so I can't speak for various governments around the world. I would be unsurprised if, long-term, there were some regulatory requirements, but I think there's always a risk, and I used to work with the US government quite a bit. The risk with anything that comes from governments is, of course, that they're trying to come up with criteria, but they're not directly in an industry. It's often hard for them to make the criteria accurate and correct and handle all cases, and it's hard to make adjustments. And so really, I think as much as possible, industry should try to develop scoring, badges, and so on themselves, because they have the same problems and are directly involved, so they can see them. Might governments do it? Absolutely. And governments have every right to do whatever it is, within their laws, that they choose to do. But I think where possible, industry should develop those ahead of time, so that governments don't have to. And then, of course, if it turns out that industry creates something that works out well, and there are some things that already exist, then governments can say, hey, that's good, let's keep doing more of that. I think that's often a better outcome. Great. Yeah, and I would tend to agree. There are types of things that some folks are talking about being baked in, and I would tend to agree that if industry can make that happen itself, that's probably more efficient, but that may come down the line at some point as we see governments get more heavily involved in thinking about the role of open source within their own economies.
Yeah, in fact, if I can jump in further: I've seen some of the proposed things and some recommendations, and already I'm seeing the assumption that all software developers use waterfall approaches; that we first write a 10,000-page contract with all the requirements, so we make sure you follow the 10,000-page requirement. And oh, we assume that all software developers are very large enterprises. And then they're shocked, shocked, when it turns out that their assumptions are just not true. It's not because government people are stupid, they're not, but they're working in a different sphere. So I think it's better if we try to make it so that governments either don't have to regulate, or their regulations can build on things that have already received widespread consensus from the folks who have to deal with this day in, day out. Great, agreed. All right, I think we managed to make it through all the questions. So thanks to those who stuck around, and thanks to our panelists. I know Jessica had to drop off a little bit early, but we really appreciate everyone attending. We hope you enjoy, or rather learn from, the report. Thanks all for taking the time. With that, we'll hand it back to Marisa to close us out. Wonderful, thank you so much, Frank, David, Jessica, Steven and Yano for your time today. And thank you again to everyone for joining us. Just a final quick reminder that this recording will be up on the Linux Foundation's YouTube page later today. So we hope you will join us for some future webinars. Have a wonderful day.