So, a quick introduction about myself: I'm Parth, co-founder of a supply chain security company called Kusari. Before that I was in defense contracting, and I'm currently a maintainer on various open source projects. One of them is GUAC, which we'll talk about today; it's a Graph for Understanding Artifact Composition. There's also FRSCA, a Factory for Repeatable Secure Creation of Artifacts (it's a mouthful), plus in-toto attestations and in-toto-golang. I'll hand it off to my co-speaker. Hey, I'm Dejan. I'm a software engineer with Red Hat, and for the past year or so I've also been working on software supply chain security topics. Previously I was active in other kinds of communities, from messaging to edge computing. So let's get started. You might have seen this slide in this room multiple times, so we'll do a quick background on what software supply chain security is. There are many different aspects to it: there are source threats, build threats, and open source dependency threats. These could come from a compromised source repo, from the build level where the build process may be compromised, or from the package level where the package repository could be compromised, and before an artifact reaches a consumer, it may again be intercepted. So there's a lot going on in the software supply chain security space. In this particular demo, we're going to talk about how we make sure that what we're pulling in is secure, and how we make all these policy decisions automated. We'll use tools like GUAC and Trustification and go forward from there. So, a little more background. You may have heard the word SBOM: Software Bill of Materials.
There are also a lot of vulnerabilities out there. We all know this, but it can get overwhelming: there are so many vulnerabilities, and there's a lot of noise. How do you know exactly what to focus on? There's another term called VEX, the Vulnerability Exploitability eXchange. It allows the publishers of projects or artifacts to provide information such as: are there mitigations, or is the vulnerable portion of the code not being called, or is the affected feature disabled? In other words, you're not actually affected by the vulnerability. There are multiple statuses: affected, not affected, fixed, or under investigation. And all of this can change. If you're under investigation one day, the next day you may turn out to be affected; or you may turn out to be not affected because more investigation showed that, from the call stack, you never reach the vulnerable code, or the feature is disabled, so you're no longer vulnerable. From this, we can filter out the noise and get to the two critical vulnerabilities that you actually care about. But as with the rest of this space, there's a lot of information out there: SBOMs, the VEX documents we talked about earlier, CVE data, build attestations like SLSA. All this information is coming in, and it's all over the place. There's nothing putting the pieces together.
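The triage idea described above can be sketched in a few lines. This is a minimal, illustrative sketch: the data shapes and function names are made up for this example and are not a real VEX format, but the four status values match the VEX statuses just mentioned.

```python
# The four VEX status values described above.
VEX_STATUSES = {"affected", "not_affected", "fixed", "under_investigation"}

def filter_actionable(vulns, vex_statements):
    """Keep only the vulnerabilities that still demand attention.

    vulns: list of CVE ids found by scanning.
    vex_statements: dict mapping CVE id -> publisher's VEX status.
    A vulnerability with no statement is treated as needing review.
    """
    actionable = []
    for cve in vulns:
        status = vex_statements.get(cve, "under_investigation")
        assert status in VEX_STATUSES
        # "not_affected" (vulnerable code not reached / feature disabled)
        # and "fixed" drop out of the triage list; the rest stay.
        if status in ("affected", "under_investigation"):
            actionable.append(cve)
    return actionable
```

With a scan result of three CVEs and VEX statements marking two of them as `not_affected` and `fixed`, only the remaining one survives filtering, which is exactly the noise reduction described above.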
So what we're trying to provide, with GUAC and other tools such as Trustification, is: can we take all this information and combine the puzzle pieces, so that it's much easier for people to understand what is happening? Can we use this data to make proactive decisions about whether I should be using this software, or running it in production? Can we use it to make automated decisions? There are multiple layers here; again, you may have seen this slide. At the bottom is the trust foundation: your signatures, your identities, and so forth. The layer above that is all the attestations: SBOMs, SLSA attestations (your build provenance), VEX, vulnerability data; all of that lives there. Like I was saying before, all this data is very spread out and scattered. How do you make sense of it? How do you put it all together and use it for policy or for other insights? That's where GUAC comes into play. GUAC aggregates all this information so you can use it for automated policy decisions, for example with OPA Gatekeeper, as we'll show today. So, a quick background on what GUAC is. As you can see from this diagram, GUAC is internally a graph database, and it takes inputs from various sources. You can ingest SBOMs, and you can ingest information like Scorecard and in-toto attestations, so SLSA information and so forth. It's also a living graph: it tries to find whatever additional information it can. It pulls in threat information from OSV, and it can pull in more information about your specific dependencies and projects via deps.dev.
All this information gets combined, and you can use it for policy checks, for patch planning, or to understand what's critical in your infrastructure. It's a very pluggable system: these are the integrations we started out with, and if there are more pieces you need to understand your environment, you can always add more. So, at Red Hat we also started a project called Trustification, and used GUAC for ingesting and making a graph representation of all the relations, as Parth says, between SBOMs, the packages within the SBOMs, vulnerabilities, VEX files, and everything else. But we wanted to provide a couple more services on top of it. We wanted a bit more metadata about all these files, not just relations, and to make things full-text searchable, so that you can, for example, search for "Log4Shell" and get the appropriate CVEs. We wanted a way to ingest VEX files from CSAF-compliant repositories, so that you can always ingest the diffs of new files arriving in a repository, and to store everything in S3-compatible storage so that all the files can later be downloaded. Now GUAC also provides blob storage, so that's doable from GUAC itself, but when we started, that's something we did on our own. And as you can see at the top and the bottom, we want to provide a good UI so people can search and get all the information about SBOMs, about packages, about vulnerabilities affecting different SBOMs and packages, but also an API so we can integrate this data with further systems, for example a VS Code extension or CI/CD tools.
Now that we have a basic picture of what GUAC and Trustification are, the question is: how do we use this information in Kubernetes land? For that, we created a little demo. Let's start from first principles. First of all, we have OPA, the Open Policy Agent: a general-purpose policy engine that lets us separate our services from defining and executing policy decisions. In a normal situation, you define your policy in Rego, the language defined by OPA, and OPA has some data to work on. Your service queries OPA with some input data, usually a JSON-formatted document; OPA executes the policy against that input and the data it has, makes a decision, and returns it to the service. Another sub-project of OPA is called Gatekeeper, which lets us apply these kinds of policies to resources inside a Kubernetes cluster. Gatekeeper defines an admission webhook that is executed each time we try to deploy a certain resource into the cluster. That webhook queries OPA with the resource in question and figures out whether we are allowed to deploy it, based on the policy we defined. So how does it all work together? The last missing part is the external data provider. As we said, OPA needs some kind of data to make a policy decision; that data can be defined within OPA, or OPA can use an external data provider to call an external service. For this demo, we implemented a GUAC data provider that calls GUAC over the REST API and fetches the data. So the whole flow is: we try to deploy a Deployment or a Pod.
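The service-to-OPA round trip just described can be sketched as follows. This is a toy sketch, not real OPA: the policy path in the URL is a made-up example, and a local Python function stands in for the OPA server evaluating a Rego policy. The only real detail is the request shape, since OPA's Data API expects the query document wrapped under an `"input"` key.

```python
import json

# Hypothetical policy path; with a real OPA server the service would
# POST this body to http://localhost:8181/v1/data/demo/allow.
OPA_URL = "http://localhost:8181/v1/data/demo/allow"

def build_opa_request(resource):
    # OPA's Data API expects the query document under the "input" key.
    return json.dumps({"input": resource})

def toy_allow_policy(request_body):
    """Stand-in for OPA evaluating a Rego policy.

    Assumed toy rule: allow only images pulled from a trusted registry.
    """
    resource = json.loads(request_body)["input"]
    allowed = resource.get("image", "").startswith("registry.example.com/")
    # OPA returns its decision under a "result" key.
    return {"result": allowed}
```

The point of the sketch is the separation of concerns the talk describes: the service only builds an input document and reads back a decision; what "allowed" means lives entirely in the policy.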
It goes through Gatekeeper to OPA to our data provider. We extract the digest of the container we're trying to deploy, query GUAC with that digest, and try to figure out: do we know anything about this container? Is there any SLSA attestation attached to it? Are there any vulnerabilities we know of? And if any of that security metadata is not okay, we deny the deployment in the cluster. This is roughly how things look. There are two things in Gatekeeper to be aware of. First is the template, where we provide the Rego spec of our policy. As you can see here, we take the image from the input object we get from the resource via the admission webhook, and we call the external provider, in this case GUAC, to tell us more about this image. If we get anything back in the response, we say: okay, this image will be blocked due to a GUAC policy violation. Then, based on the template, we create the actual constraint and say: apply this template to the namespace "test", and apply it to all the Deployments we try to deploy into that namespace. Down below, you can see how it looks if we try to deploy an image with some vulnerabilities found, as Parth will demonstrate now. Okay, so let's do a demonstration of this. You can see my Kubernetes cluster; I have kind running locally. You can see the top two things up there (if that's a little too small, I can make it bigger): the GraphQL server and the REST API piece. Those are the two GUAC components that are running. Below that, you can see Gatekeeper is running. So we'll start from there.
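The decision the provider makes in that flow can be sketched as a pure function. This is an illustrative sketch only: the field names (`vulnerabilities`, `slsa`, `certify_bad`) are assumptions standing in for whatever the real GUAC REST responses contain, and the GUAC lookup is stubbed with a dict instead of an HTTP call.

```python
def extract_digest(image_ref):
    """'registry/app@sha256:abc...' -> 'sha256:abc...', or None if by tag."""
    _, sep, digest = image_ref.partition("@")
    return digest if sep else None

def admission_decision(image_ref, guac_lookup):
    """Allow/deny a workload based on what GUAC knows about its digest.

    guac_lookup: dict of digest -> metadata, standing in for GUAC's REST API.
    Returns (allowed, reason), mirroring the checks described in the talk:
    known SBOM, no vulnerabilities, SLSA provenance present, no CertifyBad.
    """
    digest = extract_digest(image_ref)
    if digest is None:
        return (False, "image not pinned by digest")
    meta = guac_lookup.get(digest)
    if meta is None:
        return (False, "no SBOM known for this digest")
    if meta.get("vulnerabilities"):
        return (False, "blocked due to GUAC policy violation: vulnerabilities found")
    if not meta.get("slsa"):
        return (False, "no SLSA provenance for this digest")
    if meta.get("certify_bad"):
        return (False, "dependency certified bad by a maintainer")
    return (True, "all checks passed")
```

In the real demo this logic sits behind Gatekeeper's external-data call; the sketch just makes the order of checks explicit.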
In the meantime, for demo purposes, I already ingested the SBOM data, some SLSA, and vulnerability data, and all of that gets updated automatically as part of the graph database. As new vulnerabilities come out (it's a living graph, like I was saying before), it automatically updates the package information and creates what's known as a CertifyVuln node attached to that package. So now I know that package is vulnerable, and based on that, I can find out exactly what depends on it. Before I go forward, I want to show what this looks like. All this information is exposed via a GraphQL API, alongside the REST API. Starting off, you can see there's already a bunch of data in here. For example, this specific version of console does have a vulnerability, and you can see the vulnerability ID specified here. If I keep scrolling down, more and more packages appear; you can see that, in this case, this Debian package has another vulnerability. This shows the decomposition of information: it's not just about the SBOM, it's about taking the information from the SBOM and decomposing it into individual components, packages, and relationships. This lets us understand our view of the world: what does your software environment look like? You're not scanning in the normal sense anymore, because now you understand how all the packages relate to each other. So if one package gets a vulnerability, you automatically know: my project depends on that package, so it contains a critical vulnerability, without me even scanning it again. In the other pieces here, you can see that I have ingested the SBOM.
This shows the specific demo image that I created. You can see where it came from, the namespace; this is a CycloneDX SBOM. And then, where is the actual location? If I ever want to go back to the actual SBOM itself, I can go retrieve it. It also shows me exactly which dependencies came from that specific SBOM, so you can see the included dependencies and all the relationships derived from it. SLSA is the build provenance. Again, for my specific subject, which has this specific digest: where did it come from, who built it, what the build type was, where it was built, and more information about the builder and so forth. Finally, the last thing I want to show: a lot of times there are zero days, or cases where, before a vulnerability is published, a maintainer comes out a couple of weeks beforehand saying, hey, there's a bug or a vulnerability in a package you're using, for example curl. You've been warned, but the vulnerability databases haven't been updated yet. So how do you respond to this? Can you be proactive about it? For that, there's a concept called CertifyBad in GUAC. In this case, as an example, I'm marking alpine-baselayout as bad, as reported by the maintainer, and I want it to be blocked by OPA. So it's not just about vulnerabilities: if there are company policies or anything else that needs to be blocked, you can set that up, along with licenses. If there are specific licenses that shouldn't be used, you can use that as a gating mechanism for your policy. So, going back here: we have Gatekeeper running, we have GUAC running. The next thing we need is the GUAC provider, the connection between GUAC and Gatekeeper.
That's what allows Gatekeeper to query GUAC via the REST API, pull the information about a specific image, and then make a policy decision based on it. So I'm going to do a quick helm install here. I'm going to cancel this out and install the GUAC provider. And if I do... you can see that it's now running right here. It's a very small image; basically, it's an interface between GUAC and Gatekeeper. The next piece I want to show: I'm going to tail the logs of my GUAC provider, so I can see exactly what's causing a policy violation and what's causing something to be blocked. Next, I'm going to run a couple of examples. The first example is the vulnerable image I showed, and... what happened? It got created. Of course... oh, I did not apply the constraint, that's right. Let me delete this. That's the constraint file Dejan talked about. But this is a good demonstration of how things work when the provider is not wired up. So the next thing I need to do is apply the template file as well as the constraint file that Dejan just showed: the Rego policy, plus the rule that any Deployment running in the test namespace has to be checked against the policy. Now I can run my vulnerable image. Oops... run this, and we should see... over here... and of course something goes wrong; the demo gods always strike. But you can see here that it did evaluate the policy: GUAC came back and said there are specific vulnerabilities in place that block this from running. One of the things GUAC also does behind the scenes is validate whether there's a VEX, that Vulnerability Exploitability eXchange I was talking about earlier.
If there is a VEX associated with the project, and some of these vulnerabilities are mitigated (marked not affected), then the deployment will automatically be allowed to run, because it's no longer affected. So... I don't understand why it did that. I'm going to kill the provider and bring it back up again; bear with me for a second. All right, I'm going to reinstall the provider. The policy should already be in place, so it should be good. And the provider is running again. Perfect. I'll go back up, grab this, check the logs again, and then we can run more examples. The next one I want to run is the one we certified bad. It doesn't contain any vulnerabilities, but right away it got denied: that specific image got blocked, and we can go check the logs. In this case, we did that CertifyBad: if you remember, in the GraphQL we certified a specific package, alpine-baselayout, as bad, because, for example, the maintainer came out and said there's a critical vulnerability coming, but the vulnerability databases haven't been updated yet. Another example: let's say, for compliance reasons, you need SBOMs for everything you're deploying. In this case, the deployment is blocked because no SBOM is found for that particular image. Similarly, as a company policy, you may require SLSA for everything; then we get blocked, and we can come back here and see: no SLSA found for this particular Alpine image with this digest. And then, lastly, I want to show a good example that passes all the policies. We deploy this good one.
You can see it got created, and we can go back to our logs and see: yes, it got verified, because it passed all the checks we wanted. No vulnerabilities, an SBOM, a SLSA attestation, and no CertifyBad. And the vulnerability certification isn't just about direct dependencies; any transitive dependencies may have vulnerabilities too, and it catches all of that. Similarly for CertifyBad: it's not just whether there's a direct dependency on alpine-baselayout, for example, but also whether there's a transitive dependency. If there is, I want to block my image from running. So it's doing a transitive scan through the entire system. Back to you, Dejan. Yeah, cool. So you just saw where we are today, and I think where the whole ecosystem is. I was at a couple of sessions today talking about attestations and SBOMs and all of supply chain security, and I'm very glad that the conclusions are the same as ours at Red Hat over the last year or so. We're still super early in the process. Most people are not even producing SBOMs, and if they are, they're producing them only for compliance reasons. All the tools used to actually analyze and store SBOMs are in the early stages of development, like the projects we showed you today. You can also see that there's a lot of inconsistency in the data being produced, even by companies that should know better; I heard that at the panel at 2 o'clock today. A lot of SBOMs are not even spec-compliant, not to mention the semantic validity of the data inside them. Another big problem we've experienced is that people still can't agree on identifiers for components. purls are used in GUAC and very widely in the open source world, which is cool.
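The transitive check Parth describes amounts to a graph traversal: start from the image's direct dependencies and flag anything reachable that carries a CertifyVuln or CertifyBad node. This is an illustrative sketch; the graph shape and names are made up, and in GUAC this would be answered by querying the graph database rather than walking an in-memory dict.

```python
def find_blocked(roots, depends_on, flagged):
    """Return every flagged package reachable from the direct dependencies.

    roots: the image's direct dependencies.
    depends_on: package -> list of its dependencies (the dependency graph).
    flagged: set of packages with a CertifyVuln/CertifyBad-style marking.
    """
    seen, hits, stack = set(), set(), list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:  # dependency graphs can share nodes; visit once
            continue
        seen.add(pkg)
        if pkg in flagged:
            hits.add(pkg)
        stack.extend(depends_on.get(pkg, []))
    return hits
```

If `find_blocked` returns a non-empty set, the policy denies the deployment, even when the flagged package is several hops away from anything the image depends on directly.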
But there are still CPEs, there's OmniBOR, there are plain hashes, and there are internal company standards that people use, and that makes things very hard to correlate. If you want to say: this is the product, this is the package, this is the vulnerability, how do we tie that all together in a graph we can actually query later and show the data? Being here on Friday at 5 o'clock means you're interested in the topic, which is good. What I want to say is that it's really, really early days and there's a lot of work to be done, so please join all these initiatives. In my perspective, all the initiatives we're working on should eventually become invisible to developers, just as we want to make Kubernetes invisible to developers. It should be integrated into all our CI/CD tools, our IDEs, and things like that; it should be something we don't think much about, part of the infrastructure, second nature. And as Parth said, dependency management and dependency breaches are just one part of it; there's a lot more, and OpenSSF and CNCF TAG Security are, I think, very good places to start if you're interested in the whole topic and all the other projects related to software supply chain security. Yeah, and GUAC actually just became an incubating project within the OpenSSF. The demo we showed today is also out there on GitHub. It's still a work in progress; we want to clean it up a bit, but we definitely want to provide it as another demo of how to take GUAC data, all this information, SBOMs and SLSA and VEX, and make it usable. I think the question asked regularly is: I'm generating SBOMs, what do I do with them? And the answer is: use tools like GUAC and Trustification. Because it's part of the OpenSSF, we have a monthly call.
So if you're interested in joining the community and working on this problem (like Dejan was saying, there are a lot of challenges to be solved), we have a monthly community call, and we're on the OpenSSF Slack in the GUAC channel. The QR code will lead you to the guac.sh website, where we have documentation and demos. And if there are new features or new use cases you'd like to see or work on, we're happy to chat and work together. Thank you. Are there any questions we can answer? Please use the mic for the streams. Thanks. Hi, thank you for this presentation. It's Friday afternoon, so I'll probably need to watch it again to understand everything. But my question is: you're using OPA Gatekeeper. If we switched OPA Gatekeeper for Kyverno, would this still make sense? What would we need to do for GUAC to work with Kyverno instead of OPA Gatekeeper? Yeah, so we chose OPA as a starting point, but GUAC provides a REST API, so this could also be integrated with Kyverno; it's not admission-controller-specific. We started with OPA, but we do want to work with Kyverno and any other tools you'd like to use. Because of the REST API, you get the information back as you need it, in whatever format Kyverno needs, and you can use that as a decision-maker. The integration just needs to be implemented; there's nothing fundamental preventing you from doing that. Thank you very much. Cool, thanks everyone for coming. We hope to see you in the upstream communities. Thank you so much.