Verifying the validity of crowdsourced results in open source software: why did we come up with this talk? Because someone reached out to us and asked, "Hey, how do we trust the Scorecard results?" I'm Naveen Srinivasan. I'm excited to be speaking at SupplyChainSecurityCon this year. I'm all about open source and supply chain security. I'm one of the maintainers of Scorecard and a contributor to Sigstore.

And I am Spencer Schrock, with the Google Open Source Security Team. I was very excited to be in Austin last year with everyone, and even more excited to be speaking this year.

Thanks, Spencer. For people who don't know, the OpenSSF Scorecard utilizes Sigstore for its remote attestations, and that's specifically what we'll cover in this talk.

Who knows who this is? It's Ken Thompson, if you don't know. Why are we talking about Ken Thompson today? I happened to meet him at SCALE, the Southern California Linux Expo, this year; it was phenomenal to see him there. He wrote a great paper called "Reflections on Trusting Trust." If you haven't read it, it's a great paper to read, and there are great YouTube videos that unpack it. It specifically asks: how do you trust the trust? Given that, how do we trust Scorecard GitHub Action results? Scorecard is a security scorer, but how do we trust its scores? That's what we're trying to unpack with this analogy.

Before we dive into any of this, I thought we should define trust. The dictionary definition: trust is the firm belief in the reliability, truth, ability, or strength of someone or something. Just making sure we're all on the same page.

Another thing this talk addresses is how Scorecard uses remote attestations specifically to scale. Why? Because we want Scorecard results for every open source project. Scorecard comes in different flavors right now: a client tool, a GitHub Action, an API, and a BigQuery dataset. On a weekly basis it scans about a million repositories and stores those results behind the API and in the BigQuery table. But we don't want to stop at a million repositories; we want to scale to every repository. The limits to scaling are hardware and GitHub API tokens, and that's why the Scorecard GitHub Action uses remote attestations to scale, which is what we're specifically going to talk about.

We are not Sigstore experts. There are a few folks at this conference who are, but we are not. I just want to set that context straight.

Before we dive deep into any of this, we thought we'd talk about the tools we're using. Sigstore is a set of tools for attestation mechanisms in open source projects. It has a great set of tools, but we're going to talk about three of them today, the three that the Scorecard GitHub Action uses.

The first tool is Cosign. Cosign aims to make signatures invisible infrastructure. One of the critical features the Scorecard GitHub Action relies on is keyless signing. Keyless signing is a phenomenal feature, and it builds on Fulcio and Rekor, the other two Sigstore tools. Fulcio is a free-to-use certificate authority for code signing: it issues code-signing certificates and uses OIDC as its standard protocol. Another critical thing Fulcio provides is short-lived certificates, which are valid for only ten minutes.
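To make the keyless flow concrete, here is a minimal sketch using the Cosign CLI. This is an illustration of the mechanism, not the exact command the Scorecard action runs; the file names are placeholders.

```sh
# Keyless signing of a results file: Cosign obtains an OIDC identity, asks Fulcio
# for a short-lived (ten-minute) certificate, signs, and logs the entry in Rekor.
cosign sign-blob results.json \
  --yes \
  --output-signature results.sig \
  --output-certificate results.pem

# Verification needs the expected identity instead of a long-lived public key.
cosign verify-blob results.json \
  --signature results.sig \
  --certificate results.pem \
  --certificate-identity-regexp 'https://github.com/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
```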
In Spencer's demo, he'll demonstrate that specifically, the ten-minute validity of those certificates.

The last tool in the Sigstore toolchain is Rekor. Rekor provides a secure, immutable ledger for metadata. It uses a Merkle tree to ensure the data hasn't been tampered with.

For folks who don't know what Scorecard is: it's a tool for generating security scores for open source projects. It helps ensure projects follow security best practices, and it uses Sigstore tools to sign its results. What you're seeing on the left side of the screen right now is an actual scorecard. Scorecard takes best practices and produces a score for any given open source project, with ten as the highest score and zero as the lowest. On the right side is a QR code that takes you to the web UI where you can look up scores.

Now that we're here at this conference, we thought it would be fitting to build this talk around an analogy: conference tickets. Issuing conference tickets and creating open source software have a common goal, which is ensuring authenticity and trust. We want both to be genuine products. In the conference ticket issuance process, we get tickets that carry unique barcodes or QR codes. That ensures the ticket hasn't been tampered with and that anybody entering the event holds a valid ticket. In the same way, project maintainers create open source software, and the Scorecard GitHub Action ensures the project has security scores signed with Sigstore tools to protect the integrity of the results. To summarize the analogy: the conference ticket and the Scorecard GitHub Action each establish trust and authenticity in their respective domains, and the Scorecard GitHub Action uses Sigstore tools to do it.

Now that we've covered the prerequisites, let's get into the details of how this works. The Scorecard GitHub Action uses OIDC as the protocol to authenticate against Fulcio, the certificate authority that Sigstore provides. The critical piece in this process is the "id-token: write" permission in the GitHub Action workflow, which is what allows it to use keyless signing to authenticate against Fulcio. The resulting X.509 certificate captures some critical metadata, which Spencer will demonstrate, and that metadata is what we use to validate the integrity of the results.

What you're seeing here is the entire attestation process. I'll explain it from the bottom of the diagram, from step one up to step eight. In step one, code gets committed to the repository; the Scorecard GitHub Action usually runs on merges to main or master. Step two is an opt-in feature: if the GitHub Action has opted in to publish results, then in step three it uses Cosign to publish those results to Rekor. Simultaneously, it pushes the results to the web app behind the Scorecard API. The web app in turn validates the entry against Rekor, and it also checks with the GitHub API for validity. That's the general overview; now Spencer is going to demonstrate each of these steps in detail.
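One detail worth making concrete before the demo: the "id-token: write" permission mentioned above is what lets a workflow obtain an OIDC token in the first place. As a sketch of what happens under the hood (Cosign performs this token request internally):

```sh
# Inside a job that grants `permissions: id-token: write`, the GitHub runner exposes
# these two environment variables. Requesting a token for the "sigstore" audience
# returns the JWT that Fulcio accepts as proof of the workflow's identity.
curl -s -H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=sigstore" | jq -r '.value'
```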
Thank you. Thank you, Naveen. When a repository opts in to publish its scores, that's what helps us scale. As Naveen mentioned, we're currently running Scorecard on over a million repositories every week, but there are processing constraints and token constraints, since we're hitting the GitHub API. If a repository wants its results to be part of our dataset, it can install the action and publish its results. But we don't want to accept just any old results, because we want our dataset to be valid; we don't want someone claiming they earned a perfect 11 out of 10.

The analogy for the next few steps is how repositories present their results and how we verify them. For the first step, when we push the results from the GitHub Action to the web app, the analogy I'll use is presenting a conference pass. Not at this particular conference, but at some conferences, you go up to the door, they scan your QR code, and it looks up a unique identifier, your conference registration in our case. That's what the web app does first. When you publish results to our dataset, there's a POST endpoint we send them to, and you can see in the diagram what we might be sending. The badge Scorecard presents is the results themselves. I'll break down what results look like in the demo, but in this example we're sending a score of 5.8; the branch the analysis was done on, which for a couple of Scorecard reasons is always the default branch; and finally, you'll notice, an access token.

Part of scaling, as I mentioned, is the GitHub rate limits, and the GITHUB_TOKEN included in GitHub Action workflows helps us with that quota. We send it over to our web app, but we do a couple of things to make sure we're not getting any permissions we shouldn't be. First, these tokens are short-lived: they're only valid while the workflow is running, and after it finishes, the token doesn't provide any access. Second, before we send any token to our endpoint, we check that it follows the format GitHub uses for workflow tokens. There are different prefixes for personal access tokens, for app tokens, and so on; the workflow token, for example, starts with "ghs_". We take these steps to ensure that any token we use for crowdsourcing purposes is used only for quota, and is not some long-term access to the project.

After we scan the QR code, we want to validate that it points to a valid conference pass. In the Sigstore version of the analogy, when a score is submitted for admission into our dataset, we look it up in Rekor's transparency log. Being a transparency log, it's immutable and append-only, so with a unique identifier we can go back to Rekor and say: give me the transparency log entry for this, I want to do some analysis on it. Instead of a QR code, a Rekor entry has a unique identifier (a UUID or a log index), and Rekor currently also supports lookup by content hash, a SHA-256 hash, which I'll show in the demo. It's important to note that Sigstore marks this search lookup as experimental, so there are no guarantees about its availability. Sigstore does offer SLAs for its other features, and there's no reason Scorecard can't include the unique index when sending results over. When we get a transparency log entry back from Rekor, we verify the inclusion proof; the transparency log is a Merkle tree, so it's a pretty standard inclusion proof. And then we also check the certificate. As Naveen mentioned, since we're using Cosign's keyless signing, we get a ten-minute certificate from Fulcio, so we make sure that when the results were signed and included in the transparency log, all the timestamps line up.
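One quick note on the token format check mentioned above: here is a sketch of what such a prefix check might look like. The accept/reject logic is illustrative, not Scorecard's actual code; only the prefixes themselves are documented GitHub behavior.

```sh
# GitHub token prefixes: "ghs_" marks a short-lived server-to-server token of the
# kind issued to workflow runs; "ghp_" marks a long-lived personal access token.
case "$TOKEN" in
  ghs_*) echo "short-lived workflow token: accept" ;;
  ghp_*) echo "personal access token: reject" >&2; exit 1 ;;
  *)     echo "unexpected token format: reject" >&2; exit 1 ;;
esac
```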
So I'll now jump into a demo to show some of the steps the web app performs. It's a little easier to follow as a command-line demo than walking anyone through the code; no one wants to see that. One of the things you can do when you run Scorecard on your repository is show off your score; it's a bit of gamification of best practices for repositories. The score is out of ten, so we want people to be able to show off their high scores. So when you see a badge on a README page, and the project is saying "I have a score of 9.5," some questions you might have are: Where does the score come from? Can I trust it? Has the score changed since it was published? I'm going to walk through how to start from a README badge and answer some of these questions.

I'll be going through a few commands and talking through them as I go. The gist is available on GitHub; scanning the QR code will take you there, or it's in the slides. I'll switch over to a prerecorded demo, so as not to tempt fate with the demo gods.

As Naveen mentioned, all of our results are available through our API, which is normally api.securityscorecards.dev. I'm hitting the staging server here just because we had some PRs to change the ordering of some of our results so that the content hash matches. This is what a Scorecard result looks like when you output it from the tool: a little information about when you ran it, which repository it ran on, and which commit it looked at; some information about the tool itself, such as which version of Scorecard you're using; and then an overall score. This one is a 9.4, for Scorecard's own repository. Then it breaks down into each of the 19 checks we run on repositories. I won't get into all of them, but you can see, for example, Binary-Artifacts: one best practice we try to enforce is that it's usually not good to check a binary into your repository, since it's hard to code review and hard to detect changes in.

Then I take the SHA-256 sum of the results as the unique identifier I'll use when looking up the transparency log, and I save it to an environment variable so I don't have to type it in manually. rekor-cli is the tool Rekor provides for interacting with the transparency log from the command line, and I search using that SHA as the unique identifier to find the transparency log entry. The UUID of the entry I care about comes back, and again I put it in an environment variable so I don't have to retype a long hex string. Then I use rekor-cli get (here's its usage), and ultimately I'm just passing in the UUID to fetch the entry.

Let's look at what that entry contains. It's a big blob of text: we see the information we looked it up with, there's the unique ID, there's our hash, and, more importantly, the signature. The signature itself is in the content field, and the certificate that holds a lot of the useful OIDC information is in the public key content.
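Reconstructed as commands, this first half of the demo looks roughly like the following. The rekor-cli flags are real, but treat the whole thing as a sketch: the repository queried here is the one from the demo, and the exact output shape varies by version.

```sh
# Fetch a published result from the Scorecard API and compute its content hash.
curl -s "https://api.securityscorecards.dev/projects/github.com/ossf/scorecard" \
  -o results.json
SHA="$(sha256sum results.json | awk '{print $1}')"

# Search the transparency log by hash (a lookup Sigstore marks as experimental),
# then fetch the matching entry by its UUID.
rekor-cli search --sha "$SHA"
UUID="<uuid-from-search-output>"   # placeholder: copy the UUID printed above
rekor-cli get --uuid "$UUID" --format json > entry.json
```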
The certificate is base64-encoded, and to analyze it with OpenSSL tools, I'm going to extract it. I'll quickly convert the entry to JSON so I can pull out the fields I care about. The next few commands are long JSON-parsing commands, so I'm just going to copy and paste them; essentially, I'm saving the signature into its own file and the certificate into its own file.

Then, using OpenSSL, we can print out the certificate, because we all like human-readable things. It looks like a normal X.509 certificate. We can see that the issuer is Sigstore; this came from Sigstore, after all. There's a ten-minute validity window: this demo was recorded last Friday, so we see May 5th, valid for ten minutes. There's a public key in here, which we'll use to verify the signature. And, most importantly, there are some X.509 v3 extensions that contain all of the OIDC information captured when this ran in a GitHub Action. We can see the URI and some other fields that I'll break down later, all in a nice human-readable format.

But how do we know the signature actually matches our file? I'll extract the public key from the certificate so we can double-check the signature. With the public key extracted, I run the digest command to verify: the signature came from the transparency log entry, the public key from the certificate, and the results from the API. We see "Verified OK," which signifies the signature is good. So this certificate corresponds to the results we pulled from the API, and everything checks out.

But that assumes some trust in the public key; we don't really know anything about this key yet. So I fetch the Fulcio root from Sigstore (the key is also on their GitHub; there are a bunch of different ways of getting it) and verify that the certificate came from Sigstore. It's important to note that I'm not going to check the time, because the certificate was only good for ten minutes, and more than ten minutes have passed since I recorded the demo. Assuming everything else about the certificate checks out except the time, we see that the results are indeed signed by a Fulcio-issued certificate.

In summary: we started at the API, which is where clicking the badge currently takes you (we're working on a more human-readable format as well). We fetched the transparency log entry with rekor-cli. We extracted the certificate and made sure the signature and the public key matched what we expected. And we were able to answer two of our original questions. Where did the score come from? From a specific GitHub workflow named in the certificate. Has the score changed? No, because the signature matched the result we fetched from the API.

But we still need to trust the original score. Provenance isn't enough on its own: even if we know where something came from, we still have to trust who produced it. That's the second part of what Scorecard does. A conference pass can be legitimate, and it can get you into the conference, but that doesn't mean it gets you into every event or talk. OpenSSF Day, for example: when you tried to get into that on Wednesday, they checked your badge to make sure you had signed up for it. That's the next step of the conference ticket analogy.
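Collected in one place, the verification half of the demo looks roughly like this. The jq paths are assumptions based on the entry fields shown in the demo, and the comments flag where the Fulcio chain may need extra care.

```sh
# Pull the signature and the signing certificate out of the Rekor entry.
jq -r '.Body.HashedRekordObj.signature.content' entry.json | base64 -d > results.sig
jq -r '.Body.HashedRekordObj.signature.publicKey.content' entry.json | base64 -d > cert.pem

# Inspect the certificate: issuer, ten-minute validity window, X.509 v3 extensions.
openssl x509 -in cert.pem -text -noout

# Verify the signature over the results using the public key from the certificate.
openssl x509 -in cert.pem -pubkey -noout > pubkey.pem
openssl dgst -sha256 -verify pubkey.pem -signature results.sig results.json

# Check that the certificate chains back to Fulcio, ignoring the long-expired
# validity window. Depending on how you fetch the trust roots, you may also need
# Fulcio's intermediate certificate for the chain to build.
curl -s https://fulcio.sigstore.dev/api/v1/rootCert -o fulcio_root.pem
openssl verify -no_check_time -CAfile fulcio_root.pem cert.pem
```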
So just because some results were signed, logged in Rekor, and came from a GitHub workflow, that's still not enough for us to say we want to use them in our dataset. For that, we look at the certificate from the transparency log, because it has a bunch of fields from GitHub that uniquely identify the workflow, and then we can reach out to GitHub and apply our own logic. For anyone who wants to crowdsource data, this is where the general "use Sigstore for provenance and attestation" part ends and the Scorecard-specific logic begins: checking that scores are published by the repository they analyze, and that they're produced by our official GitHub Action without any tampering.

So again, here's the certificate we saw during the demo. It comes from Sigstore and is valid for ten minutes, but most important are the X.509 v3 extensions, where Sigstore puts the interesting information. The URI field uniquely identifies the workflow that uploaded the score. The issuer URL corresponds to GitHub Actions. And there are a couple of other pieces of information: the kind of trigger that caused the workflow to run (we don't use that one), the SHA of the workflow that was used (one of the pieces we'll look at in the next few slides), the repository this came from, and so on. All of these fields are keyed with OID numbers, which I've annotated for your benefit. If you want to find out more, I've included a link at the bottom of the slide describing all of the information Sigstore records from an OIDC token.

The very first thing I said is that Scorecard wants results to be self-published, and this is really just a consistency check. When the Scorecard action runs, it looks at the repository running it, so if you're trying to upload a score for a repository other than the one running the workflow, that's a consistency mismatch. It's a brief sanity check that the repository being analyzed is the one uploading the score; here we can see that the name in the Scorecard results matches the name from the certificate.

The more important bit is that the scores are authentic. We don't want a repository just generating straight tens and uploading them; we want the results to come from our workflow. In the certificate snippet shown here, the URI field gives us a specific workflow file, .github/workflows/scorecard-analysis.yaml, which is the workflow that produced the results, and we have a SHA. With these two pieces of information, we can uniquely identify the workflow and fetch it from GitHub. We do this with the GitHub token that gets sent over, since any interaction with GitHub uses quota, and again, this is what helps us scale. Then we look at the workflow in question and do our validation.

There are two things we really look for: minimal permissions and allowed job steps. Minimal permissions goes back to the fact that a token is being sent to us, and in an effort to be good stewards, we're saying: do not send us a token with any write permissions; we only need it for API quota. If the top of your workflow grants write permissions to the token, we don't want it, and we'll reject it. We're trying to encourage you to set up your workflow with a read-only token so that we minimize any risk in sending a token over.
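A sketch of that lookup and check, under the assumption that the repository, workflow path, and commit SHA are read from the certificate's extensions. The GitHub contents API call is real; the grep checks at the end are simplified stand-ins for Scorecard's actual workflow parsing.

```sh
# Values below would come from the certificate's X.509 extensions.
REPO="ossf/scorecard"                                  # repository extension
WORKFLOW=".github/workflows/scorecard-analysis.yaml"   # from the URI extension
REF="<workflow-sha-from-cert>"                         # placeholder for the commit SHA

# Fetch the exact workflow file that produced the results (content is base64-encoded).
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/$REPO/contents/$WORKFLOW?ref=$REF" \
  | jq -r '.content' | base64 -d > workflow.yaml

# Simplified checks: reject write permissions, and list each step's action for the
# allowlist check described next.
grep -A5 '^permissions:' workflow.yaml
grep -E '^\s*uses:' workflow.yaml
```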
But more importantly, we look at the steps in the workflow, parsing it the same way we parse GitHub Actions workflows for Scorecard analysis, and we walk through the steps one by one. The very first step we see is our official Scorecard action, ossf/scorecard-action, pinned to some SHA or tag. That's our action, so we're happy to see it; we'll give it a green check mark. But if the next step in the workflow is some "malicious-scorecard-action," we probably shouldn't trust the results, because it could be sending any bogus JSON it wants. I like to call that a scarecard, because it's a fake scorecard. If someone's trying to send scores of 11 or 12 out of 10, that kind of thing, we want to reject it. Finally, there are a few additional steps we allow, because you can do other things with Scorecard's output: you can upload it to the code scanning dashboard, or keep it as a workflow artifact to look at later. The important thing is that the list of allowed job steps is minimal; there's a list of, I think, five that you can find in the README. Sometimes people set up the Scorecard action on their repository and get failures because they're trying to do other things. We know it can be a bit of a pain, but it helps inspire confidence in our results: if we can take a look at the workflow, verify that it's doing a minimal number of things, and trust every action in it, then we can trust the results it produces.

So, for a bit of a conclusion, I'll jump back a few slides to the overall picture. We started with a repository where the analysis happened. A user opted in to publishing those results to help us scale our dataset; they want to display a README badge, and they want to help out by generating scores and showing theirs off. That result was signed with Cosign, which used an ephemeral certificate from Fulcio. We're trying to make this as painless as possible, which is one reason we use keyless signing. The entry got included in Rekor's transparency log, so that in step six, when the results moved from the action to the web app, we were able to look up detailed information about where they came from, do our own analysis with the GitHub API, and conclude that these are valid results we can include in our dataset.

In conclusion, Scorecard and Sigstore combined make for more secure and trustworthy open source projects: Sigstore from a provenance and attestation point of view, and Scorecard from a development best-practices point of view. In thinking about this crowdsourced validation, the conference ticket analogy hopefully simplifies the sort of attestation and legitimacy checking we're doing with the Scorecard web app. Crowdsourced results require two steps: provenance and attestation, then validation. For anyone else looking to crowdsource results on GitHub, or any sort of generated data, I recommend Sigstore as an easy mechanism for it; keyless signing with OIDC makes it pretty easy. Your validation steps will depend on your use case, but you've now seen a demo of how Scorecard checks its workflow contents.

And it wouldn't be Open Source Summit without saying that you can contribute. Of course, PRs are welcome. We do our development on GitHub in the OSSF Scorecard repository; come make an issue, submit a PR.
There are a lot of different ways open source projects work, so Scorecard may not detect everything about how you're doing things. We're open to making it better; come help us do that. We're in the OpenSSF Slack under the security-scorecards channel; you can join by clicking on the link below. We also have meetings every two weeks, on Thursdays, at least in North America, and you can find out more on the public event calendar linked from the OpenSSF's Getting Involved page. And with that, we'll open it up for questions for either Naveen or me. There should be mics by the audience.

Hi, curious: for multi-repository projects, do you have any plans to scale Scorecard, maybe to the organization level for a GitHub organization? Simple question.

We don't have any specific plans, but is your question about installing it on your organization, which in turn covers every repository? Or do you want a single dashboard? I'm just trying to understand what you're specifically trying to achieve.

Say your project is comprised of ten repositories. In this model you'd have ten Scorecard scores, one for each of those. I guess I'm looking for an aggregate Scorecard score for the entire project, which would be ten repositories' worth of code.

At this moment, we don't have anything like that, per se. That said, there is another open source project that is trying to do this specifically, and they are in the process of contributing it into the OpenSSF. In the meantime, if you want, hit us up in Slack and we can point you to it.

OK, thank you.

I guess I'm cheating a little bit; this is less a question than an additional response. The challenge is that the different repos, even though they're all in the same overall organization, may have very different results. Some may be active, some may not; some may have all sorts of binaries and other things in them. So even though it may be conceptually one large meta-project, from a Scorecard point of view I think it's often very helpful to analyze it per repository, because they can be different.

Any other questions? Great. OK, thank you. Yeah, we'll be around.