Hi, I'm Naveen Srinivasan. I work at Endor Labs; we are a supply chain security startup. I am also an open source contributor and maintainer. I contribute to a few open source supply chain security projects at the moment, and I am one of the maintainers of Scorecard, which this talk is about. Brian? I'm Brian Russell. I'm a product manager on Google's open source security team, and I work a lot with the Scorecard project. So today, we wanted to start with the problem that Scorecard is really setting out to help solve. If we look at this diagram, which comes courtesy of XKCD, what we see is a representation of what modern digital infrastructure looks like. If you peel back the veneer of almost anyone's end product, you start to see a real hodgepodge of dependencies of different shapes and sizes that an organization has taken on over the years, or taken on just by adding one dependency and getting the cascading set of dependencies behind it. And we don't have a lot of insight into exactly how strong or weak some of these dependencies are, or how well maintained they are. There are a lot of different concerns that could come out of this, but when it comes specifically to security, we don't have a lot of insight, and we'd like to have some. On top of knowing how big or small these dependencies are, we actually want to know: are these safe or not? Because even the smallest one is a concern to us. So imagine if we could start getting a little extra information about some of these dependencies. Imagine if we had numbers like these that gave us a sense that this dependency is in a really good state: its maintainers care a lot about security, and they are really trying their best to follow good principles and good development practices. Versus here are some projects that aren't doing that at all.
And then here are some projects that maybe are trying, but still have a little ways to go. Just having that insight starts opening up the world to us in terms of what decisions we want to make, what we should do with these dependencies, how we can take on different pieces. So enter OpenSSF Scorecard. We are seeking to add that extra information to all of these open source projects and dependencies that are being taken on. This is an example of what one looks like. Some of you are likely familiar with it and some of you might not be, so I'll spend a little time explaining what we're looking at. What we see here is an overall score, and we have a number of different checks that Scorecard looks at to determine whether or not a project is being maintained in a way that's considered secure. Without going through each and every check line by line, I wanted to take a couple of examples and look at what these really mean. If we look at the first one, we have a check for binary artifacts. That is, is this project storing binaries, and therefore obfuscated source code, in the project? Or can we see all of the source code in a particular project? With each check, if you look at the table on the project page, and I hope you all visit it after this talk, we go through each and every check that we provide in Scorecard. In there, you'll see what the check is, a little more information about it, a risk level in terms of how serious a risk it is if a project is not adhering to it, the amount of access that's required, which I'll get to in just a second in terms of how Scorecard gathers information that needs certain levels of access, and then any other notes that might be beneficial to know for those checks. So that brings me to: how is a Scorecard generated? It's not people manually filling out surveys and trying to score themselves. This is all done programmatically.
We're using GitHub's APIs to bring that data in and to score it, and we can do that in a couple of different ways. Right now we have a GitHub Action: if you're an individual project that is interested in maintaining a Scorecard and using it to help inform security decisions you're trying to make, you can run it as part of every other GitHub Actions check that you run. There's also a way to run this as a command line tool. You can run it locally; it will still end up calling those GitHub APIs to gather the data. So there are no manual assertions feeding into it; it's all done programmatically. Just to give folks a sense of where this project is today: it lives within the OpenSSF, and we've had over a thousand installs of just the GitHub Action. We're seeing an increased number of people watching the project, forking it, starring it. Our goal right now is to keep those trends positive and keep the momentum going that we've started to see. We have also had some mentions and some adoption at the organizational level. Eclipse has been using this as a tool to look at how different projects can make security improvements. Envoy did some piloting and has been giving feedback, working with the OpenSSF to improve Scorecard. And Sonatype included Scorecard in its annual State of the Software Supply Chain report, and I'll dig into what they said in just a second. Now, since this is a CNCF event, we thought we would mention how the CNCF is using Scorecard right now, and thanks to Chris for giving us a quick quote on how things are being used today. Scorecard is being used as a tool to look at different CNCF projects to help improve their security posture, and it's being used in at least one competition as an indicator of good security practices. So I encourage you to read Sonatype's annual State of the Software Supply Chain report.
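As a rough sketch of the GitHub Action route: the workflow below is a minimal example based on the `ossf/scorecard-action` README as I recall it. The action version and input names (`results_file`, `results_format`, `publish_results`) are my best recollection, so verify them against the current README before using this.

```yaml
name: Scorecard analysis
on:
  schedule:
    - cron: '30 1 * * 6'   # run weekly
  push:
    branches: [main]

permissions: read-all

jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # upload results to code scanning
      id-token: write          # needed to publish results
    steps:
      - uses: actions/checkout@v3
      - uses: ossf/scorecard-action@v2
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
```

With `publish_results: true`, the scores feed into the same public dataset the weekly scan populates, which is what keeps the API data fresh for Action users.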
In it, there's a specific section looking at what the good indicators are of a project's likelihood of being vulnerable over time. When Scorecard was included in this study, it was shown to have good potential for indicating whether a project following these best practices was likely to incur vulnerabilities over time. Sonatype went further and broke Scorecard down on a check-by-check basis, where you can see some correlation: certain checks provided better indicators than others. But again, I encourage you to take a look at the report; it goes into a lot more detail than we have time for today. We'll jump into a demo in just a second, but I wanted to give a sense of what's existing and what's new in terms of major Scorecard features. We originally launched, and then added a GitHub Action that does exactly what I mentioned before: it's checking project scores on a regular basis, as many times as you've set it up to trigger. We've also created a weekly scan, a cron job gathering scores for the top million projects. That was originally bootstrapped on some infrastructure that is undergoing improvements at the moment, so some of those scores are stale as of December, but we're working on launching an improvement that will get those updated. Any projects running the GitHub Action are also pushing their data into that same repo, so those are all up to date. So if you're using Scorecard as an Action, you can be assured that the API we're about to talk about reflects the most up-to-date data. Without further ado, I'll hand this over to Naveen to talk about what the API looks like. Thanks, Brian. Why did we build the Scorecard API? Like Brian mentioned, we've been running Scorecard for a million repositories for a little more than a year, and all of this data is available free for everyone using BigQuery.
And we spoke to a few customers, and not everybody wanted to use BigQuery. They wanted a standard HTTP endpoint which they could use to get this data. So that was the motivation to build the Scorecard API. It's a standard HTTP API with REST endpoints. You can hit the API at api.securityscorecards.dev. We also made sure that the API is predictable, like you can see here: because this is a CNCF conference, we typed in kubernetes/kubernetes. So you can take the tail end of the URL and replace it with whatever project you want. As long as the data is available, and like Brian mentioned, we have 1.2 million repositories that we scan, you should be able to get it. Second, and another critical thing: you don't need any tokens for this API. This is served as a public good instance for people to come and get this data. Another thing: we are not throttling any of the API requests, so please don't DDoS it. Like I said, this API has a GET and a POST, but I'm going to talk about GET at the moment. If you hit this API for whatever project, it'll give out a JSON document, similar to what Brian showed as scores in a table format, but as JSON, so anybody should be able to utilize it. We launched this API last fall. We are seeing about 85,000 to 100,000 requests every day, and we see this trend only going up, so that's a little bit of an update: people are adopting it. That goes on to the next one: who's using the API? To be honest, we don't know who's using the API. If any of you are using it, please come and update our README. One person came and asked us about the API rate limits, and from that we got to know that person is from a university in Germany trying to use the API. Okay, all of this is great, but how can this API be useful? What can it do?
What you're seeing right now is the code review data over a span of a year for the top 1,000 critical projects. How did we come up with the top 1,000 critical projects? We didn't come up with that. There's another open source initiative project called Criticality Score, and that project's primary purpose is to take, I could be wrong, 100,000 projects and give each a score between zero and one; the algorithm for that was based on Rob Pike's algorithm. And we use that data. I'm going to give a shout-out to them: the data is available free, so anybody can go pull it. So we used that data and used the Scorecard API to pull the data and show us how the top 1,000 critical projects are doing. Looks like most of them are doing well; there are a few places that need some help. Another check we looked at is binary artifacts: how many of the top 1,000 critical projects have binary artifacts? Looks like most of them don't; only a tiny fraction do. Why are we doing this? This is essentially to go back and help the critical projects and tell them how they can fix things, and there are people from within the community helping do this. And just as we can use it for open source projects, all of us can use it for our own enterprise solutions, any of the things that we're doing. This API is available for that. Enough with all the talk: show me the code. Okay, so I decided it's not nice if we don't show what the code is all about. So here's, like I mentioned, the piece of code. This is written in Go; I'm fond of Go, and Scorecard is written in Go. But you should be able to use any modern programming language to query this data.
So what you're seeing here is the meat of the code: it hits that API, and like I promised, it does not need any tokens or any of that. It goes and hits the API, gets the data, and unmarshals it into a struct. That's the code here. And if I go up to the top over here, I take the thousand critical projects and use some Go magic to concurrently hit the API, and it should be able to get the data. I'm not going to run this; I'm a little scared of the demo gods, it might not work, so I'm not going to run it at the moment. That said, the code is available in this repository. It's my personal repository, not an OpenSSF one or anything. Anybody should be able to clone and use this. Let's talk about the usefulness. Yeah, so we thought we would act out a scenario here in which Naveen is going to be a contributor to an open source project, and I am a maintainer, and I care quite a bit about security. So we're going to have a little chat. Brian, I want to submit a PR with a new dependency. So I think that's a great idea, but I'm really concerned about whether this dependency is healthy. Can you talk about that? Would a Scorecard be helpful, Brian? I'm not sure what that is. Have you checked out the Scorecard repo? Well, actually, now that I have, this is really great. In fact, I think I'm going to use this API for checking all my future dependencies. In fact, I think I'm going to go further: I'm going to use this in my CI/CD workflow as well. It's just so easy to use. Why doesn't everyone use this at this point? So these are the kinds of conversations we hope to be seeing around Scorecard. My demo is going to be equally unambitious: I am going to walk through a quick run-through of how easy it is to add OpenSSF Scorecard badges. Here's an example of what you can expect to see; it's a pretty standard looking badge on a GitHub README.
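The "Go magic" Naveen mentions is essentially a bounded worker pattern. A sketch of it might look like the following, where `fetchScore` is a placeholder standing in for the real HTTP call to api.securityscorecards.dev, so the concurrency pattern is the focus rather than the network plumbing.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchScore stands in for the real HTTP call to the Scorecard API;
// here it returns a canned value derived from the repo name so the
// example runs without network access.
func fetchScore(repo string) float64 {
	return float64(len(repo) % 10)
}

func main() {
	repos := []string{"github.com/a/a", "github.com/b/bb", "github.com/c/ccc"}

	type result struct {
		repo  string
		score float64
	}
	results := make(chan result, len(repos))

	// Bounded concurrency: at most 2 requests in flight at once,
	// so we do not hammer the public instance.
	sem := make(chan struct{}, 2)
	var wg sync.WaitGroup
	for _, r := range repos {
		wg.Add(1)
		go func(repo string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			results <- result{repo, fetchScore(repo)}
		}(r)
	}
	wg.Wait()
	close(results)

	// Collect and sort so the output is deterministic despite
	// goroutines finishing in arbitrary order.
	var lines []string
	for r := range results {
		lines = append(lines, fmt.Sprintf("%s: %.1f", r.repo, r.score))
	}
	sort.Strings(lines)
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

In real use you'd replace `fetchScore` with an `http.Get` against the projects endpoint and handle errors; the semaphore channel is what keeps a thousand-repo run polite to the public instance.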
When it comes to actually adding this in, to narrate what's happening here: you can just go to the OpenSSF Scorecard GitHub page. There's a specific section that talks about how to add badges, and in that section there's a snippet of code that you can grab. You copy it, go into the README of a project that you have, paste it in, and then tweak it a little: take the name of the project that you want the badge to represent and put it there. Once you have those changes made, it's a pretty standard change to the README; just commit it, announce that you're adding a Scorecard badge, and as soon as it's pushed, you should see the badge generated right away. The one caveat here is that you do need to have data behind that badge in order to populate it. So if you're already a large project and you're one of those 1.2 million that we're scanning on a regular basis, you can expect to see that badge populate itself. If you're not, or you just want to guarantee it, you can install that GitHub Action, and at that point the data is guaranteed to be in the repo we're pulling from, and it will populate that way. So we don't need to see that again. But Naveen, if you want to talk about how to get involved. Yeah, you should be able to scan the QR code; we have a bi-weekly meeting. Please come and hang out. We are looking for contributors, or if you're even just using the project, please come and let us know how we can improve. We are all active in Slack, and all the maintainers are pretty active on the issues, helping grow this project. So I think at this point, we are way ahead of schedule, so we have plenty of time for questions. If people want to just ask, we can do an ask-me-anything right now. Yes, so that's a fantastic question, and actually one that I would say is talked about a fair amount in the project itself.
Honestly, both of those cases I see as very viable. If you're in that case of trying to take on a new dependency, we think it's useful information to have so that you can make a more informed decision: is this really a dependency that I want to take on, or is this something that maybe needs a little more work, or raises a few follow-up questions, before you'd want to take it on? If you are the maintainer of a project, or you're looking to contribute to one, I think it's a great starting point to see what some of the key security changes are that could be made that would be really impactful for a project. We've seen that in a few different places too: anyone who's looking at improving how they're doing software supply chain security will generate a Scorecard as a starting point and then say, what do we need to do to bump up the score? Or celebrate that they're doing a really good job already and hopefully put up a badge. Great question. Getting a 10 on the whole score is hard; even Scorecard itself does not have a 10, so just giving a heads up. Olympic-style scoring. Any other questions? That's configurable; you don't have to publish anything. You can say please don't publish results, and it will not publish anything. Second thing: if we find issues, the moment your score is bad, we're not going to put it out in the open for everybody to come and attack you. We're going to write to your security dashboard so you, as a maintainer or administrator of the project, can go address them. So we take care of that; just wanted to add that. So yeah, it should be flexible and fit whatever kind of data you do or don't want to share. Great question. At this point, it's still something that we're working on.
We do want Scorecard to be flexible in that it can run not just on GitHub projects, but on any open source project or dependency that someone would either like to be making improvements on or be considering whether or not to pull into their own project. There are some things in the works; there's GitLab work going on. Yes. And to add to that, there are some checks that work anywhere. Take for example a Git repository on GitLab or somewhere else. You can run Scorecard, but it will not be able to run all the checks; it will only be able to run the checks that just need the repository contents. So it'll be able to say, hey, are you pinning your dependencies? Do you have this x, y, z? Do you have binary artifacts? Those work irrespective of whether the repo is on GitHub or GitLab or anywhere else. So there are some checks that will run. To answer your question: we should be able to run them, but not get comprehensive results like what you get on GitHub right now. Good question. We have all of this data available in our BigQuery, available free. So you should be able to pull this data and plot out, say, where a repository was: are they going up or coming down? Okay, to answer your question: we run weekly scans, scanning 1.2 million repositories with the amount of GitHub tokens we have; it takes us roughly four or five days to do that. So we ran, if I'm not wrong, 52 times last year. So you should be able to take that and plot it. Absolutely. Yeah, right. So on that front, we are working on a feature where, if you're going to add a new dependency, your pull request gets a Scorecard result on it, and you decide: of these 17 checks, I want four checks, I just want these four checks, and I don't want the score to go below eight, or below seven, whatever number you want to configure. So it's in the works; we are working on that.
So, to answer your question on that: so that you don't have to fly blind when you're taking on a new dependency. Yeah, and I think, obviously, the more granular we can get, the better off we'll be. We started bootstrapping the infrastructure to get that initial set of scores. I think, generally, not much changes day by day so much as week by week, but yeah, you can at least get that date range. Yeah, that's a great use case. Yes? So what you're asking is: hey, from the CVE, you want to go back and look at the score, okay? Right now, great feedback, but at the moment we don't have that. There are other projects working on trying to integrate Scorecard into that, but we can take that offline and talk about it. Yeah, essentially you'd need some third-party service to do that lookup right now; that's another open source project people are trying to build. Any other questions? Going once, going twice. All right, maybe an extended coffee break. Thank you.