OK, I think we'll get started. So this talk is on spicing up container image security with SLSA and GUAC. My name's Ian Lewis. I'm a developer advocate at Google Cloud. I focus mostly on container and supply chain security, and I'm currently on loan to the Google Open Source Security Team, or the GOSST team as it's colloquially known, working on SLSA tooling. So I said that this talk is about spicing up container image security, but it'll be pretty mild. No hot takes or anything like that. I want to start by level setting a little bit and giving some context for why the things I'm going to be talking about are important, why we're doing them in the first place. Hopefully this gives you an idea of how some of these things are actually solving problems in the real world, because it's really easy to get lost in the details. Essentially, software supply chain security has been a hot topic in the security space for the last couple of years, and this is basically because attackers have shifted their focus towards the software supply chain. Part of the root cause is that development and CI systems are essentially not treated like production, or are not given the same level of love in terms of security hardening as production systems are. There's been a culture where development and CI systems are essentially YOLO, and production is where we actually harden things. But that's not actually sufficient, because there are channels to get malicious software into production through these CI and development systems. Another aspect of why this is a good target for attackers is that there are lots of places where attackers can insert themselves. Apologies for the slide, it's a little bit hard to read, but essentially attackers can slot themselves in anywhere in the development life cycle.
So from the source code and the developer's development environment, through to the build systems, through to the package managers and distribution systems you use for deployment, as well as all the dependencies that go into the software that you're building. Today we'll primarily be focusing on the build aspect of this, so your CI systems and build systems, and having traceability from there into your production systems. But it's good to have a little bit of context for why we're talking about these things. Another overarching theme is that there's a lack of information about the software that we consume. As we develop things and consume software, we download it, we run it, but there's a lack of understanding about where that software came from, how it was built, who built it, and how to verify that it was actually built by the people we think built it. This comes out in a lot of ways. How do you know where your software came from? We can download artifacts, for example from a GitHub repo, but how do you know that an artifact was actually built from the source code in that repo? There are no real guarantees for that. And it's even worse in cases where the software is being distributed on a totally different platform, for example Docker Hub, or GCR, or other container registries. So, for example: is Ian Lewis on Docker Hub the same Ian Lewis as the Ian Lewis on GitHub? There's no real linkage there that you can verify. We won't actually solve that particular problem, but it's one of the root problems here. And is the actual image, say ianlewis/foo on Docker Hub, the same as, or built from, the source code in the ianlewis/foo repository on GitHub? How do you know that?
How can you verify that that's actually the case? Essentially, how do you know how the sausage is made? What goes into the food that you're eating? How do you maintain that kind of hygiene? That's a pretty difficult thing. So, for example, let's say we have a GitHub repository, the SLSA and GUAC demo repository here, and we want to run it as a container. First we'll take a look at the source code. We've found this thing out there that looks like it does what we want it to do. Looking at the source code, we say: OK, this is going to print out some version information. It'll print out the Git version, what tag it was built from, and things like that. That's cool, that's what I want. We can even look on Docker Hub and see that we found an image there that's built by a guy named Ian Lewis. It looks like it's the same name, it's got a link to the Git repository in the readme, and the tags seem to match up: we've got the same sorts of tags as we have on the releases over here. We can actually click on a tag and see some of the image layers. If we look at our Dockerfile, we can see that this is a two-stage build, so we can only see the second half of it, but we can see that these steps line up with the steps that are present in the Docker image. This pretty much looks good. It looks like something that's linked to this repository and was built from this repository. But if we actually go and try to run it, with just a docker run command, it's running code that's completely different from the code we just looked at. And this is because we have no linkage between the source code repository and the actual image that we're downloading and running. For example, the Ian Lewis on Docker Hub could be a completely different person, if we don't verify that.
Also, the names don't necessarily match up; somebody could just make up their own account on Docker Hub. Or the account credentials could have been leaked: you could have a personal access token that was exfiltrated or compromised, and that allowed an attacker to upload an image. So this is a fairly pervasive issue: you're not able to trace back to the source code. How do we solve this problem? One of the ways we solve it is by creating something called a software attestation. In plain English, when we say something is an attestation, or we attest to something, we're asserting that it is true. If I say "my name is Ian Lewis," I'm attesting to the fact that my name is Ian Lewis. And then ideally, I would have something that somebody else could use to verify that I actually am Ian Lewis, that the statement I made is correct, like my passport, or a statement that I've signed with my own signature. In the case of software, what we're going to do is use some metadata, which we'll call provenance, that describes how and where the software was built, and we're going to combine that with cryptographic signing and identity in order to create a software attestation. The identity is going to be either a person or some kind of workload, what people call a machine identity; essentially, it's an abstract identity. The identity signs this metadata to assert that it's true, and then that attestation can be verified by the consumer when they're actually consuming the software. One of the big things is that people throw these terms around a lot.
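To make the shape of that flow concrete, here's a minimal sketch in Python. It's purely illustrative: a real attestation uses asymmetric signatures tied to a verifiable identity (for example, an X.509 certificate issued via Sigstore), while here a shared HMAC key stands in for the signer's identity material just to show the sign-then-verify pattern.

```python
import hashlib
import hmac
import json

# Placeholder for the signer's identity material. In a real system this
# would be a private key bound to an identity, not a shared secret.
SIGNER_KEY = b"build-system-identity-key"

def attest(metadata: dict) -> dict:
    """The identity signs the metadata, producing an attestation."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    signature = hmac.new(SIGNER_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": metadata, "signature": signature}

def verify(attestation: dict) -> bool:
    """The consumer checks the signature over the claimed metadata."""
    payload = json.dumps(attestation["payload"], sort_keys=True).encode()
    expected = hmac.new(SIGNER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["signature"])

att = attest({"name": "Ian Lewis"})
assert verify(att)             # untampered attestation verifies
att["payload"]["name"] = "Mallory"
assert not verify(att)         # any tampering breaks verification
```

The key property is the last line: once the metadata is signed, nobody can alter the claims without the verification failing.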
But I like to think of an attestation as something that you can verify in a strong way. It needs to be signed by somebody, and it needs to make sense. If you don't have an identity that says "this is the thing that signed it," then who's attesting to what? That becomes an issue if you're just providing metadata without signing it, or if you're signing stuff but have no understanding of what the identity was or where it came from. Building on that topic, we have the idea of software attestations, but in the context of building and deploying software, who's actually attesting to what? Basically, what happens is that when you run a build, the build system attests to the fact that it ran that build. The build system has some sort of machine identity, and that identity is used to make the attestation, which says: I ran this build, I used this source code, I got this artifact out. So when you consume the artifact, you can verify that this is the same artifact that is described by the attestation, and that it was actually built by that build system, at a particular time, using particular source code. And we make sure that we cryptographically sign this so that we can verify it in a very strong way; we don't have to do human-level signature comparisons or anything like that. It's a very objective rather than subjective process. Another issue here is that, since the build system itself is attesting to running the build, we need to make sure that the build itself isn't actually doing the attestation.
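The core consumer-side check described above, that the artifact you downloaded is the artifact the attestation describes, comes down to comparing digests. A hedged sketch, with made-up names and a toy attestation structure:

```python
import hashlib

# Illustrative only: the build system records the digest of the artifact
# it produced, and the consumer recomputes the digest of what they
# actually downloaded and compares the two.
def make_build_attestation(source_repo: str, source_commit: str, artifact: bytes) -> dict:
    return {
        "builder": "example-build-system",
        "source": {"repo": source_repo, "commit": source_commit},
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
    }

def artifact_matches(attestation: dict, downloaded: bytes) -> bool:
    return attestation["artifact_sha256"] == hashlib.sha256(downloaded).hexdigest()

artifact = b"\x7fELF...binary contents..."
att = make_build_attestation("github.com/example/repo", "abc123", artifact)
assert artifact_matches(att, artifact)                  # same bytes: match
assert not artifact_matches(att, b"something else")     # swapped artifact: no match
```

Because the digest is inside the signed payload, an attacker can't swap in a different artifact without either breaking the digest comparison or breaking the signature.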
The build system needs to have some level of separation or isolation from the build itself, because you effectively get a software equivalent of a conflict of interest if the build itself is attesting to the fact that it was run; it could make up whatever it wants at that point. So the build system needs to be trustworthy. Since the build itself could be running any kind of software, we need to make sure that the build system that we trust is actually the thing making the attestation. And then we're going to combine these different ideas together with Sigstore. You've heard a lot about Sigstore during the conference, I'm sure, so I'm not going to go into too much detail, but effectively Sigstore is a set of tools that we're going to use for keyless signing. What we do here, instead of using a key that is long-lived, that we have to manage and store somewhere and maintain access to, is use quote-unquote keyless signing. That allows us to take an identity, the machine identity from the build system, and use that identity with a certificate authority, in this case Fulcio, which is the Sigstore CA, to mint a new certificate that is short-lived and is only going to be used by the build system one time in order to sign our provenance. So that basically allows us to take the key, sign the provenance, and then throw away the private key; we don't need it anymore because we're only going to use it one time. And then the certificate containing the public key gets uploaded to another Sigstore service called Rekor, which creates a log of when the certificate was created and allows us to retrieve it later when we need to verify.
So effectively, we don't have to keep the private key around. The public key goes into a log service that we can use to pull it down and verify, so we don't actually need to store keys anywhere in our build system; we can essentially be keyless in that sense. There have been a lot of talks on this, but one talk by a colleague on the GOSST team, Hayden, was really good, so if you're interested in a much deeper dive into the Sigstore ecosystem and its services, I recommend you look at the recording of his talk. OK, so now that we've got some of the concepts out of the way, let's talk a little more concretely about how we're going to implement these ideas. One application of these ideas is Supply-chain Levels for Software Artifacts: the SLSA framework. The SLSA framework defines a set of levels with an increasing set of security requirements. Right now it has levels one through four defined, which are progressively more hardened levels of security for a build system. This makes it incrementally adoptable: each level has a set of requirements, and those get more stringent as you go up the levels. It also sets some common terminology, so when we talk about provenance, we know what we're talking about; when we talk about signing and keyless, we know what we're talking about. I guess keyless isn't necessarily defined by SLSA, but it defines a bunch of common terminology that can be used for build systems and for supply chain security. As part of the framework, it also defines a provenance format, which is essentially a JSON format that you can use for metadata that describes the build and how the build occurred.
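For a sense of what that JSON looks like, here's a minimal sketch of a provenance document: an in-toto statement with a SLSA predicate. The field names approximate the SLSA v0.2 provenance format, and the digests, image names, and URIs are placeholders I've made up, so treat the spec at slsa.dev as the authoritative schema.

```python
import hashlib
import json

# Placeholder digest for illustration (sha256 of empty input).
image_digest = hashlib.sha256(b"").hexdigest()

provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    # The subject: the artifact this provenance describes.
    "subject": [
        {"name": "index.docker.io/example/app", "digest": {"sha256": image_digest}},
    ],
    "predicateType": "https://slsa.dev/provenance/v0.2",
    # The predicate: how the subject was built.
    "predicate": {
        "builder": {"id": "https://github.com/slsa-framework/slsa-github-generator/..."},
        "buildType": "https://github.com/slsa-framework/slsa-github-generator/container@v1",
        "invocation": {
            "configSource": {
                "uri": "git+https://github.com/example/app@refs/tags/v1.0.0",
                "entryPoint": ".github/workflows/release.yml",
            },
        },
    },
}

print(json.dumps(provenance, indent=2))
```

The important split is subject versus predicate: the subject pins down exactly which artifact we're talking about (by digest), and the predicate records the claims about how it was built.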
OK, so those were a lot of abstract concepts and frameworks. Now let's get down to actual implementations of some of these ideas. What we've done is start a project called the SLSA GitHub Generator project, which utilizes concepts like keyless signing and SLSA provenance, along with some of the features of GitHub Actions, in order to achieve isolation between the build system and the build itself. We have a number of different workflows that we've implemented using GitHub Actions reusable workflows, and the reason we did that is because it gives us a level of isolation from the actual build part of the system. When you build something, you can separate the build process and the generation and signing of provenance into separate build jobs that are effectively isolated from each other; they run in different VMs. So as part of this we have a number of reusable workflows. One is the language-agnostic generic generator: a workflow that you can use to generate provenance mostly for file-based artifacts, so things like binaries, things like SBOMs, that type of thing. We have another workflow called the container generator, which we have just GA'd, so we're saying that it's stable. This one is used for container artifacts, partly to make it easier to integrate with all of the other Sigstore tooling around containers, as well as to support things like uploading the attestations to the container registry alongside the containers themselves, so that you can easily get the provenance. We also have a Go builder for Go projects, which allows you to build and generate provenance in one step. And we have a project called slsa-verifier, which is used to verify the provenance that's generated by these builders.
The slsa-verifier project is essentially a command line tool which will verify provenance for trusted builders that we have identified as being well built. Some of those are the ones that we've built ourselves and have fully threat-modeled, so the workflows that we built, as well as Google Cloud Build, and we're looking to support other CI systems as they gain ways of generating provenance safely. This allows us to verify that the source code is the source code we expect, that the artifact was built from the tags we expect, with the builder we expect, et cetera. So let's take a look at what that looks like. Let me blow this up a little bit. One thing we can do with containers, once we've uploaded them and generated provenance, is inspect the provenance and see what it looks like using some of the Sigstore-based tools. So let's look at the provenance and the attestations for the container that I just ran, the malicious one. We can use the cosign tree command to check the container registry and look for attestations, and we notice that the malicious container actually has an attestation itself. It's a malicious container, but it has an attestation; the problem only shows up when we look at what the attestation contains. In this command we use cosign download attestation to download the attestation for this image, and then we look at the internal payload, the JSON for the provenance that was generated. So this is the actual JSON that was generated, and we can look at things like the subject. This is what's called an in-toto statement.
An in-toto statement defines something like a subject-predicate structure, where the subject is the thing we're talking about, in this case the container image, with its digest here. Then we can look further down inside the part called the predicate. We can see the builder that was used, as well as information about what source code was used and which workflows were used in GitHub. And we can see here, if we look at the source code it actually came from, that this is not the source code repository we expected; it's a different repository. So we can use slsa-verifier to verify this. We'll say: verify this particular image, and we expect that the source code came from my SLSA and GUAC demo repository at this particular version. If we run that (this is duplicated a little bit), we can see that when it tries to verify, the generated artifact does not have the expected source: the expected source is the SLSA and GUAC demo repository, but we got the Fulda repository. In this case the verification failed because we were expecting a different repository than the one that was actually present in the provenance. If you want to see what it looks like to verify one that actually does verify: effectively this is going to check the signatures on the provenance, and check the fields in the provenance for the source code and the tags, in order to verify those are correct. If they are, then the verification passes, and we can go on to use the container as we normally would.
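The source-repository check the verifier performs at this point can be sketched like this. The field path approximates the SLSA v0.2 format, and the repository names are just illustrative; the real slsa-verifier also verifies the Sigstore signature and certificate identity before trusting any of these fields.

```python
# Sketch of the check: pull the source URI the build claims it used out
# of the provenance, and compare it to the source we expect.
def verify_source(provenance: dict, expected_repo: str) -> None:
    uri = provenance["predicate"]["invocation"]["configSource"]["uri"]
    # uri looks like "git+https://github.com/owner/repo@refs/tags/v1.0.0"
    repo = uri.removeprefix("git+https://").split("@")[0]
    if repo != expected_repo:
        raise ValueError(f"expected source {expected_repo}, got {repo}")

good = {"predicate": {"invocation": {"configSource": {
    "uri": "git+https://github.com/ianlewis/salsa-guac-demo@refs/tags/v1.0.0"}}}}
verify_source(good, "github.com/ianlewis/salsa-guac-demo")  # passes silently

bad = {"predicate": {"invocation": {"configSource": {
    "uri": "git+https://github.com/attacker/other-repo@refs/tags/v1.0.0"}}}}
try:
    verify_source(bad, "github.com/ianlewis/salsa-guac-demo")
    raise AssertionError("should have failed")
except ValueError:
    pass  # verification fails: wrong source repository
```

This mirrors the demo: the malicious image has a perfectly valid, signed attestation, but the claimed source repository doesn't match the one we expect, so verification fails.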
It's also worth noting that by using the Sigstore tooling and SLSA provenance, we're able to integrate really well with a lot of other tooling that supports Sigstore and SLSA provenance. Tools like Kyverno and the Sigstore policy-controller are two tools used in Kubernetes to verify provenance before you actually deploy containers into Kubernetes. These are used as what's called an admission controller, so that at admission time, when you try to create a pod, it will verify the provenance for that image against the policy you've defined inside the cluster. In our SLSA GitHub repo, we have a number of examples of how to use those, and example policies that you can use to verify containers. Effectively, the admission controller verifies the provenance and rejects the pod outright if the verification doesn't pass. These can be used with the SLSA GitHub Generator because our workflows use the same formats that are accepted by Kyverno and the Sigstore policy-controller. OK, so now we've gone over how you can generate provenance and how you can verify it before you actually run your containers. But we still haven't totally solved the problem of getting information about the artifacts we're consuming: not just how an artifact was built, but what's inside it. Do these things have attestations or don't they? Which artifacts do we have? How do they relate to each other? So one of the tools we're also developing is a tool called GUAC, which is a graph-based tool used to ingest metadata and information about artifacts, and then allow querying, understanding, and visualization of the relationships between those artifacts. This is really useful in a number of contexts.
Essentially, you have artifacts and metadata that are nodes, and then you have relationships between those nodes as they're linked. So in the graph, the nodes are artifacts and metadata, and the edges are the relationships between them. GUAC can apply to a lot of different areas of discoverability and auditing, across the whole life cycle of a vulnerability: from being reactive (how are you affected by something that actually happened?), to implementing safeguards into your system, like adding policies based on this information, and then also on the proactive side, trying to understand the wider security implications of different artifacts, like which artifacts have higher risk profiles and need more attention, or need to be prioritized in terms of improving their supply chain security. I'll also say that one of the folks on the GOSST team working on GUAC did a keynote earlier, talking a little bit more about it, so I definitely recommend checking out the recording of that as well. Effectively, GUAC is a system that will ingest artifacts and give you a graph representation of those artifacts, which you can use to visualize and understand them. Right now it's a little bit early in its development life cycle, so it's mostly just ingestion, and then you do queries on the database directly. In this case I'm using Neo4j to directly query and visualize the database, but in the future there will be much more targeted APIs for developing policies and things like that. But we can see here a set of metadata that I've ingested for the images I was using earlier.
So we have the image that I built that is good, and then we have the image down here that is the malicious container. Say, for example, in a reactive context, we figure out that this malicious container is in there: how did it get in there, and what happened? We can see that this container was built by the same builder as our good container, and that it has a software attestation; this orange node is an attestation node. So it was actually built with the same builder, and it generated an attestation in order to try to fool us, but it was built from a different repository. That's one key point here: even though we could verify the signature, and the image had an attestation, we still need to check the repository information in the provenance metadata itself to make sure it's what we expected. We can also see a bit about the relationships here and how this artifact was built. These red nodes over here are information that was ingested from an SBOM: this was built from a container image, and we used the Syft tool to generate an SBOM, which tells us the different Golang packages. These aren't quite as relevant in this reactive use case, but we can see that the two containers have at least one common dependency. So there you can see an overall set of relationships between the different types of metadata about our artifacts. We've just used GUAC in a more reactive context, but in reality what we want to be doing is, quote-unquote, shifting left. I'll take a second or two to let you check off your bingo cards, but really what we want to be doing is using this in a much more proactive context: looking at the metadata in a much more proactive way and getting a better understanding of the overall situation of our supply chain and the artifacts we're building and using.
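The reactive query from the demo — find artifacts whose attestation points at a repository we don't trust — can be modeled with a toy graph. GUAC's actual data model and query APIs are much richer; the node and edge names here are invented purely for illustration.

```python
# Toy version of the graph: artifacts and metadata are nodes, and the
# relationships between them are edges.
nodes = {
    "img:good": {"type": "artifact", "name": "example/app:v1.0.0"},
    "img:evil": {"type": "artifact", "name": "example/app:haxxed"},
    "att:good": {"type": "attestation", "source": "github.com/example/app"},
    "att:evil": {"type": "attestation", "source": "github.com/attacker/other"},
    "pkg:dep":  {"type": "package", "name": "golang.org/x/sys"},
}
edges = [
    ("img:good", "attested_by", "att:good"),
    ("img:evil", "attested_by", "att:evil"),
    ("img:good", "depends_on", "pkg:dep"),   # common dependency from the SBOM
    ("img:evil", "depends_on", "pkg:dep"),
]

def neighbors(node: str, relation: str) -> list[str]:
    return [dst for (src, rel, dst) in edges if src == node and rel == relation]

# Reactive query: which images have an attestation from an unexpected repo?
TRUSTED_REPO = "github.com/example/app"
suspect = [n for n, d in nodes.items()
           if d["type"] == "artifact"
           and any(nodes[a]["source"] != TRUSTED_REPO
                   for a in neighbors(n, "attested_by"))]
assert suspect == ["img:evil"]
```

Even this toy version captures the demo's key finding: both images are attested and share a dependency, but only one of them traces back to the trusted source repository.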
So with that I'll end, and I think I have just a couple of minutes for questions if there are any. We have a blog post that we released today on the GA of the container workflow. If you want to find out more about SLSA, you can check it out at slsa.dev. Our GitHub Generator and slsa-verifier repos are on GitHub under the slsa-framework organization, and GUAC is under the guacsec organization on GitHub. I'm Ian Lewis on GitHub and Ian Lewis on Twitter, so if you're so inclined, you can hit me up later as well. Thank you all. I have about three minutes for questions, so maybe one or two if there are any. Yes. Yeah, so the question was: there's a lot of GitHub on the slides; is there an equivalent project for GitLab? We are working with GitLab to implement some of these things. Right now some of the issues are around isolation. We're kind of lucky on GitHub in that we're able to create these generators using features of GitHub Actions that allow us to isolate the provenance generation from the actual build, but with GitLab there's not really a way to do that quite yet, so we're working with them to be able to generate the provenance safely and then do essentially the same thing as we're doing on GitHub. So yes, we are thinking about that and working with them on it. Any other questions? Yes. Right, so the question is: as a user of a CI system, or a smaller CI system, is this something that we have to rely on the CI system itself to implement, or is this something that we can build ourselves?
This is really something that ideally the build system itself would implement. In the case of GitHub, we're using a feature of the system in order to build it ourselves, but we're also working with other build tools, like Buildkite, to be able to implement these things. A lot of that is still a work in progress on other CI systems, particularly with regard to the isolation that we mentioned for GitLab, and also around the identity portion: being able to have an identity provider, an OIDC provider, for providers like Buildkite and GitLab, in order to provide the identity piece that allows us to do keyless signing. One more: the red shirt here. OK, so the question was: does slsa-verifier walk a tree of dependencies in order to verify the dependencies as well? In our case, what we're doing is just verifying the provenance that was generated for that image, rather than doing any sort of dependency walking. That's something we could explore in the future, if the artifact itself has a way of defining its dependencies, or if we have a way of figuring out what they are. Right now, the only way we could do that is if we were explicitly provided an SBOM, or if we tried to generate one ourselves, but we don't necessarily want to trust the artifact itself to provide us with those dependencies; we'd like to be able to get those out of band. So that's an area we want to think about and explore. Well, thanks everybody for coming. I'll be sticking around, so if you have other questions and want to talk to me in person, I'll be available. Thanks a lot.