Hello everyone, welcome to our talk on building SLSA 3 conformant attestors for artifacts generated on GitHub. My name is Asra Ali and I'm a software engineer on the Google open source security team, and I also work part-time on a team building open source FHE transpilers. This is my co-speaker Ian Lewis.

Yes, I'm Ian Lewis. I'm a developer advocate at Google Cloud working on containers and supply chain security, and I'm currently on loan to the Google open source security team working on SLSA tooling.

All right, thanks. So first we'll go over some background information on attestors and SLSA 3 conformance. Then I'll hand it over to Ian, who will discuss the challenges in building an attestor. After that I'll go over an example attestor that we've created that's based on containers, and then Ian will finish it off by diving into our templatized framework for attestors, which we're calling the delegated attestor.

All right, so many of you are already familiar with the software delivery pipeline and the complexity that exists at each of its stages. As an open source engineer, one can be both a developer of a software delivery pipeline and a consumer of other people's software delivery pipelines, even at the same time, by pulling in dependencies and so forth. Each software project has dependencies that are pulled in from other upstream developers and then used to create your own new software. But during any of these stages, from the source and the build and pulling in dependencies all the way to deploying your artifact, anything can go wrong, which may cause a software supply chain attack. So the threats and attack vectors here are marked at each of these stages, both in the links between, say, the source and the build, and in what can occur within one of the stages, like during the build itself.
And there have been countless examples of real attacks in the past couple of years in each one of these places. At the start, for example, bad code could be committed to a source repository, or the source control system itself could be compromised. The same thing could happen at the build, where bad dependencies could be injected, but also the build platform itself could be compromised. And likewise, throughout all of these, a compromise could exist both at the places where transformations are happening and in the links between the stages. With so many potential attack vectors at each step, it's really hard to trust that all of these, in aggregate, are actually happening with full integrity. So on the flip side, there are projects like the SLSA framework and others that are creating frameworks for securing this pipeline in a methodical way. At each of these steps, you could imagine information being used to validate that a transformation or an event actually occurred safely. But if we want to automate these validations with code, we need to start with some data. The data that I'm going to be talking about is attestations. An attestation represents a proof of an event, which makes explicit claims about how the event was performed, what the outputs were, and everything that went into the process of creating that output. Essentially it allows the user to trace a particular output, which we are going to call the subject, from the inputs through the build process that's specified in the environment.
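In practice, an attestation like this is often expressed as an in-toto statement carrying a SLSA provenance predicate. As a rough, illustrative sketch (the subject name and digest here are made up, and the predicate body is elided):

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {"name": "my-app", "digest": {"sha256": "a1b2c3…"}}
  ],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": "…how the build was configured and what went into it…",
    "runDetails": "…who performed the build and associated metadata…"
  }
}
```

The subject identifies the output, and the predicate carries the claims about how that output was produced.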
So in software, and in particular in the software delivery pipeline, the relevant attestations may be things like code scanning, code commit, source repository control scanning, build provenance, which describes the build steps, software component analysis, which may help determine what sorts of build dependencies are going in, and package repository attestations. And there are many, many more; as you saw, there were tons of places in the attacks where things could go wrong, and each of these could benefit from some data, an attestation, that describes what is actually occurring. So for example, one of these may be an SBOM attestation. What this information tells you is what is going on from the dependencies to the build step, and it helps someone who's consuming that software determine what sorts of dependencies went into production, and whether any of them were vulnerable, either to a CVE known at the time or to a CVE that may have arisen since. It can also help people who are involved in their own SDLC determine what sorts of dependencies are going into their own projects. So this is just one example, relevant to the build and dependency portion of the SDLC, of an attestation that you might use during your release. For example, maybe you want to block a release from being published until you check that there are no vulnerable dependencies. However, if we assume that we actually have some attestation data, for example SBOMs or maybe build attestations or source attestations, the next step is figuring out how to trust their contents before taking action in the software delivery pipeline. For example, you might want to allow publication to a package repository only if the build was actually created by an actor or a CI/CD provider that you expected.
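The release gate mentioned a moment ago, blocking publication while the SBOM shows vulnerable dependencies, could be sketched as a toy check like this. The SBOM layout and the vulnerable-package set below are invented for illustration; a real gate would consume a standard SBOM format and query a live vulnerability feed.

```python
# Toy release gate: block publication if the SBOM lists a component
# found in a known-vulnerable set. Both the SBOM layout and the
# vulnerable set below are illustrative, not a real format or feed.

KNOWN_VULNERABLE = {("log4j-core", "2.14.1"), ("event-stream", "3.3.6")}

def vulnerable_components(sbom):
    """Return the (name, version) pairs in the SBOM that are known vulnerable."""
    return [(c["name"], c["version"])
            for c in sbom.get("components", [])
            if (c["name"], c["version"]) in KNOWN_VULNERABLE]

def may_release(sbom):
    """Allow the release only if no listed component is known vulnerable."""
    return not vulnerable_components(sbom)

sbom = {"components": [
    {"name": "requests", "version": "2.31.0"},
    {"name": "log4j-core", "version": "2.14.1"},
]}
```

Note that a check like this is only meaningful if the SBOM itself can be trusted, which is exactly the verification problem discussed next.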
If we don't verify these attestations, malicious actors could inject malicious artifacts or tamper with the build process in the SDLC, and also provide a malicious attestation alongside them. So without some sort of trust or verification process, you would have no way of determining whether your attestations were actually useful or not. For the verification process we require two things. The first is integrity: the attacker could not rewrite or tamper with an original attestation. Essentially, once it's been created, there's no way to tamper with or rewrite its content without detection. The second set of properties, authenticity and non-forgeability, relies on the trust of our attestation provider. Authenticity ensures that we can identify the attestation creator, i.e., the entity that created the attestation, so that we can write a policy later in the pipeline that verifies the authorship of the attestation. With that verification of the author, we can determine non-forgeability, which is the property that the attestation content could not be influenced by users, even the ones operating the SDLC pipeline. So in this case, let's say I'm triggering a build: the attestation that's created about the build process could not be influenced by me, giving complete zero trust in the system. So think of the attestation provider as creating an attestation in a way that is uninterferable, transparent, and isolated, both from the build and from the users who are creating those builds themselves. So now I'm going to hand it off to Ian to talk about how to actually build an attestor in full.

Cool, great. So let's go to the next slide. In order to understand a little bit more about what an attestation is and what is actually happening, we need to understand a little bit about attestors and what attestors actually do.
And the idea for an attestor is that it attests to something that happened. So an attestor attests to the details of an event; in most cases this is some sort of software build or some sort of scan. As Asra mentioned, things like generating SBOMs can be attested to, and typically the event that we're attesting to is something that generates an artifact. So in this case, scan results or a binary or some sort of software package. Another really key aspect of an attestor is that it attests to that thing happening while being completely isolated from the actual event. So your software build itself, as Asra mentioned, can't modify the attestation or change anything that the attestor does. This means that essentially the attestor can't be impersonated. So this is something like a notary, or a witness to an event like a wedding, where you have an actual person, separate from the people going through the process, who attests to the fact that it actually happened and that the right people were there. Okay, next slide.

So what happens if the build platform itself doesn't really support attestation yet? The build platform can't attest to the builds happening; it's just providing a set of build pipeline primitives and things like that. One of the problems that we faced was: could we decouple the attestation logic from the build platform itself, and use essentially just the primitives that the build platform provides in order to do attestation of a build, separate from the build itself? So next slide. One of the things that we did was come up with a general framework for how to build attestors on top of an existing build platform.
So this would be your normal CI systems; in our case our main target so far has been GitHub, but this general concept could apply to other CI systems if they provide the right types of build primitives, and we'll talk a little bit more about what the actual requirements on those build primitives would be for us to be able to build attestors. So the idea is that we use the existing build primitives (in the case of GitHub Actions, things like jobs and build steps, the things that GitHub Actions provides to you in its API), and the attestor performs this build process and attests to the process, but the building and the signing are done so that they're isolated from each other, using aspects of the platform itself. So next slide. The idea is that your building of the artifact, and your generating and signing of provenance (the attestation itself), are done in separate VMs, in completely isolated jobs. This makes it so that the build can't actually interfere with the attestation step, but doesn't require that the build platform itself support attestation. So we can build that on top of the build platform but still have the isolation properties that we want. What we do in the build step is build in a secure VM and then pass the artifact digest from the results to our attestation step using the normal inputs and outputs. So this is another requirement: we need a secure, unmodifiable way of passing inputs and outputs between jobs. So next slide. After we've done the build, the attestation step will record the trusted system parameters, which are the parameters coming from the build platform and the environment, as well as the external parameters.
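Sketched as a GitHub Actions workflow (job names and commands are illustrative), the isolation looks roughly like this: the build and the attestation run as separate jobs on separate VMs, and the only thing that crosses between them is the artifact digest passed through job outputs:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      digest: ${{ steps.hash.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - run: make my-app   # untrusted, user-controlled build step
      - id: hash
        run: echo "digest=$(sha256sum my-app | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"

  attest:
    needs: build           # separate VM; the build job cannot reach into this one
    runs-on: ubuntu-latest
    permissions:
      id-token: write      # machine identity, used later for signing
    steps:
      - run: echo "generate and sign provenance for ${{ needs.build.outputs.digest }}"
```

The `attest` job here is only a placeholder; the point is that its steps are defined outside the user-controlled build job, so the build can influence nothing but the digest it reports.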
The external parameters are the inputs to your build step. We record things like the artifact digest and information about the overall environment and the build process in order to give you what we call provenance, which is a set of metadata that tells you how the build happened, what sort of inputs were used in the build process, and what the actual output was. Okay, next slide. Then, once we've generated that metadata as part of our attestation step, we're going to sign it using a trusted public key infrastructure (PKI). This tells us that the provenance is trustworthy, because we cryptographically signed it and we can verify that signature later. One of the things that we do, though, is that instead of having static keys that we've generated ourselves, where we have to manage storing the keys, we use the machine identity (essentially the OIDC provider on GitHub, or the OIDC provider provided by the build platform) in order to generate signing certificates that we then use to sign the provenance. This gives us a way of signing the provenance using certificates that can be short-lived, so we don't need to manage the storage and security of those keys quite as much. So next slide. Once we've done all of that signing and generating, we need to be able to verify it at the end, before we make any decisions about actually using the stuff that was generated by this build process. The verification process is essentially establishing trust: making sure that this metadata hasn't been altered or changed in any way.
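One slice of that verification, checking that the artifact you actually hold matches the digest attested in the statement's subject, can be sketched like this. The statement follows the in-toto shape described earlier; the signature and identity checks are deliberately elided.

```python
import hashlib

def artifact_matches_subject(artifact: bytes, statement: dict, name: str) -> bool:
    """True if the named subject in an in-toto-style statement carries the
    sha256 digest of `artifact`. Signature/identity checks are elided here."""
    digest = hashlib.sha256(artifact).hexdigest()
    for subject in statement.get("subject", []):
        if subject.get("name") == name:
            return subject.get("digest", {}).get("sha256") == digest
    return False

artifact = b"example build output"
statement = {"subject": [
    {"name": "my-app",
     "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}},
]}
```

On its own this check is meaningless; it only becomes useful once the statement's signature and the signer's identity have been verified first, as described next.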
So this process is basically: verify that the statement's subject is right, i.e., is the artifact that I'm going to use the actual artifact being described; then verify the signature on the attestation, to make sure that the signer has actually signed the provenance correctly; and then also verify the provenance's identity, i.e., the machine identity that we were using to generate the certs. And once you've established trust through the verification process, you can then use that metadata in things like policy engines in order to actually make a decision about how to use the package or the artifact. Next slide. So in summary, building an attestor is possible on top of build platforms without the build platform necessarily supporting it, but implementing one is pretty hard and requires some specialized security knowledge. Individual package developers shouldn't really need to be knowledgeable about these types of details; we really want to provide some sort of off-the-shelf solution that people can just add to their build pipeline, so that they can use it off the shelf without having this specialized knowledge. That's one of the main things that we want to accomplish with the projects that we've been working on. So now I'll turn it back to Asra, and she can talk about some of the actual projects that we've been developing to make use of this generalized framework.

Okay, thanks. Yes, I'll quickly describe a container-based attestor, which is one of the attestors that we recently created specifically to target a variety of use cases. So as Ian mentioned, the generate-provenance step in the attestor records information about the build process and the inputs for the build.
But because the attestor must perform the build, the gray build box sits inside the red attestor box in the diagram. The attestor must be configured to understand the semantics of the external parameters, be able to determine what the build process is in order to perform it, and then generate the information about it in the provenance in a separate job. As an example, if you want to see what that looks like, here's a snippet of the provenance that's produced. There's a record called the build definition, which is responsible for describing the inputs and the type of build. In this case this is a SLSA framework slsa-github-generator Go build, so it's very specific to the Go ecosystem: it takes a source URI with a specified digest, and a config file in order to configure a build. That specificity, and the inclusion of that information in the provenance, means we kind of have a problem. The problem is that in order to produce an attestation record like that, we need to have a Go-specific attestor (this whole entire red box is Go-specific), one that knows what external parameters to expect, i.e., that source and that config file, knows what to perform in the build, and then has the knowledge to do the attestation recording in a separate job. But that means we also need one for the Rust ecosystem, and one for maybe the Node ecosystem, and one for Scorecard attestations, and one for CVEs, and then we just get into this mess. So we came to a problem that we needed to solve: how generic can these build processes be while still being useful? In other words, we can't really maintain this many builders that have deep knowledge of the build process and are able to create that rich build definition in the provenance.
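The Go builder's build definition described above might look roughly like this. The values are illustrative, and the exact buildType URI and parameter names should be checked against the slsa-github-generator project:

```json
{
  "buildDefinition": {
    "buildType": "https://slsa-framework.github.io/slsa-github-generator/go@v1",
    "externalParameters": {
      "source": {"uri": "git+https://github.com/org/repo", "digest": {"sha1": "…"}},
      "config": ".slsa-goreleaser.yml"
    }
  }
}
```

Note how both the parameter names and their interpretation are Go-specific, which is exactly the scaling problem just described.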
But fortunately, there are existing solutions that provide at least a generic template, or platform, for what a build might look like across different ecosystems, with a common configuration, and that also happen to provide isolation and sometimes reproducibility. To this end we created one type of builder that allows for these different ecosystems and target types and is already in widespread use: one that uses container images. The use case here is that we could create a single common attestor built on container images, with a common configuration, that would allow anyone in a variety of different ecosystems to create their build. What this container-based attestor does is create an artifact (a binary, a scan, whatever it is) given a base image that packages the tooling required to create it. It also takes a build command that is needed to invoke that base image to perform the build, and a specified output path: does it expect to create a dist folder, or a scanner JSON, and so on. This means that we can allow any toolchain or ecosystem, so long as it's packaged inside a container image by the tool creator. What happens then in the provenance is that it identifies those inputs, meaning it identifies the base image that was used along with the execution command. Because of that, a consumer of that artifact can inspect the attestation record and potentially reproduce, or at least recreate, the process of that build. The idea behind this was that there are many examples of container tools that are used in supply chain security, for example Trivy, OSV-Scanner, or the Scorecard scanner, and also work being done by others to create build attestations based on this common type of build language.
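Invoking such a container-based attestor could look something like this from a user's workflow. The workflow path and input names below are a hypothetical sketch loosely modeled on the slsa-github-generator docker-based builder, not its exact interface:

```yaml
jobs:
  build:
    uses: slsa-framework/slsa-github-generator/.github/workflows/builder_docker-based_slsa3.yml@v2.0.0
    with:
      builder-image: "ghcr.io/org/my-builder"   # base image packaging the toolchain
      builder-digest: "sha256:…"                # pinned digest of that image
      config-path: ".slsa/build-config.toml"    # declares the command and output path
```

The same three inputs (image, command, output path) then reappear in the provenance as the external parameters, as described next.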
So taking a look at the resulting provenance, we see something similar to the Go one: we have a container-based build type, along with the source repository that was mounted in the container, the builder image which is used as the base image, and the command that was used to invoke the build. These are all specified within the external parameters, which are, again, the inputs to the build. So with all three of these pieces, after the user establishes trust (that piece of verification that Ian described before, determining that the provenance statement itself is actually trusted), you can now validate the contents inside that statement. Sometimes this is as simple as: is this the correct builder image that I expect? For example, if I had a Scorecard JSON, is this actually a Scorecard base image that was distributed by the OpenSSF? Or, if I'm using a container image that I packaged myself, is this base image the one that I packaged? So we do have this extra step of "is this base image trusted", and then you can optionally ask "is the source repository that was used to create the build trusted". And, if the build is reproducible, you could further validate that the build can be reproduced, by simply pulling the base image and running it with the specified command. Finally, I'm going to discuss a little bit about why this still isn't enough sometimes. What I described is a nice solution for a variety of ecosystems that can be used in a single attestor, but it does require that the person distributing the tool packages all their logic inside a container. So if, let's say, I really want to use My Cool Scanner, I have to request that My Cool Scanner be packaged inside a container image, and packaged in a way that I can run it against the source repository inside this attestor.
So now I'm going to hand it off to Ian to describe what happens if that's not enough, and what happens if that tool provider doesn't have enough flexibility with just a container-based attestor.

Awesome, yeah. So as Asra mentioned, one of the limitations of using a container-based attestor is that you have to have everything packaged in a container, and many ecosystems may not already have that done for them. So we came up with a separate way of solving this problem that we're calling the delegated attestor framework. This allows build tool authors and other folks to build what we're calling delegated attestors; that is, it allows tool authors and ecosystems to create their own attestors. So next slide. The main problem that we want to solve, as I mentioned, is that tool authors want to create attestors; we want tool authors to be able to create their own attestors for their own tooling. This means if you have something like, for example, GoReleaser in the Go ecosystem, or npm in the Node.js ecosystem (the different types of build tools used to package up software in order to release it and publish it), we want people to be able to use those tools and also be able to use the attestation logic that we've developed.
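The shape a delegated attestor takes can be sketched roughly like this: the tool author publishes a reusable workflow that hands their existing GitHub Action to the delegated attestor as a callback. All names and inputs below are hypothetical, just to show the structure:

```yaml
# Hypothetical sketch of a tool author's delegated attestor.
# The delegator workflow path and its input names are illustrative.
jobs:
  slsa-build:
    uses: slsa-framework/slsa-github-generator/.github/workflows/delegator_generic_slsa3.yml@v2.0.0
    with:
      slsa-callback-action: "my-org/my-release-action"   # the tool's existing action
      slsa-callback-inputs: '{"version": "v1.2.3"}'      # forwarded to that action
```

The delegator runs the callback action in an isolated job and generates and signs provenance for whatever it produces, so the tool author supplies only the build logic.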
So next slide. One of the ways that we've noticed folks do this, at least on GitHub, is that many tool authors have their own GitHub Action that they've already developed, which people can just slot into their pipelines or workflows in order to actually run and generate their releases. So the framework that we've developed will allow you to wrap an existing GitHub Action: you provide a GitHub Action callback, which we will then call to actually do the build. This can include your already-existing GitHub Action to actually perform the build, and then we can generate provenance based on that existing action. This means that you don't have to rewrite a lot of logic that you've already developed, and you're also using logic that's familiar to you, in this case GitHub Actions. This gives end users a better experience: they don't need to know the details about executing the tool itself, and the tool authors don't need to know all about attestation and how to do it securely. And the attestor itself can actually make use of things like GitHub's builder identity and provide all of the security benefits without anybody else having to know about it. Next slide. So this basically allows existing third-party tools to create attestors, and all of these folks don't need to understand SLSA. There's really low overhead to integrate, because most open source developers in particular are using GitHub Actions already, so they don't need to rewrite a bunch of stuff. The tool authors maintain authorship of their workflows, of their attestor, so when an end user uses the attestor, they're using the tool author's attestor. They don't have to create a new kind of relationship, or understand the SLSA framework or some other organization or group; they can consume something that
the tool authors have written. So in the case of, say, GoReleaser or npm, they're using the npm attestor or the GoReleaser attestor: they're already using that tool, and now they're just also using an attestor that's provided by the tool authors. Next slide. So one of the first examples, or implementations, that we've used this framework for is the Node.js, or npm, attestor that we've developed. This builds npm packages and provides non-forgeable provenance, and it uses npm scripts, which npm authors are already familiar with: things like npm run build, the types of scripts you define in the package.json for your package. This does all of the normal separated, isolated provenance generation that we described earlier, generates provenance for the package, and provides an additional step to actually publish the package itself. And what's really special is that npmjs.com, the npm public registry, is now actually supporting attaching provenance to your package as metadata, so you can actually see that associated with your package and distributed with your package in the registry. So next slide. You can see that on the registry itself if you look at the web page for your package, and from there you can check out things like links to the source code, links to the build script that was used to generate the package, and the public transparency log entries that describe that this event actually happened and provide an auditable piece of metadata that you can use during verification. So this provides a lot of extra trust in your package when you've developed and published it, as well as a way of actually verifying that strongly when you install the package. So next slide. Finally, we have a way of verifying these packages using the slsa-verifier project that
we've developed for verification. This allows you to verify things like the signatures, the builder IDs, the package names and versions, as well as the source code repository the package came from. And we're working on ways of better integrating these with the native tooling for different ecosystems. So next slide. Just wrapping up, we are looking to improve these different things, expand to different ecosystems, and make these easier for people to consume. So you can go to the next slide. If you're interested in learning more about supply chain security and some of the work that we're doing with the SLSA framework and the projects that we're developing, particularly the slsa-github-generator, you can check those out at these links. These are the projects that we're typically working on, and we're looking at expanding not just to Node.js or containers but also to other ecosystems like Python or Ruby and other language ecosystems. All right, next slide. So thanks for joining our talk, and let's build a safer open-source software ecosystem together. All right, thanks everyone.