Okay, so this talk is on eliminating the unknowns using GUAC. Before I start, once again, this is a QR code for my book, Securing the Software Supply Chain, if anybody's interested in that. I know for some folks who were in the last talk this is going to be repeating myself, but I'm the co-founder and CTO of Kusari; we're a software supply chain security company. I'm a co-author of Securing the Software Supply Chain from Manning Publications. I'm an OpenSSF Technical Advisory Council and SLSA Steering Committee member. I'm also a member of the CNCF Technical Advisory Group; I'm a security lead for that. And I'm a co-creator and maintainer of GUAC, which is currently an OpenSSF incubating project.

So let me start off by asking a question. Do you trust this cat with your business? All right, so there are potentially some problems here, right? Does he have fleas? Is he mean? Is he sick? Is he likely to commit fraud? He has a bow tie, seems trustworthy, but how do I know? And what it's really about here is, where did he come from? Yeah, I can just switch to the hand mic. All right, yeah, much better.

So what is this really about? Where did this cat come from? What information do we have about him? And, just as important, what information do we not have about him? What do we not know that we want to know? We need more information, right? So, what do we know about him? Well, from his medical history, he had just had surgery, he was given his rabies shot, he was given preventive flea medication. And when I adopted this cat earlier this year, we were told, based on his behavioral stuff, that he doesn't mind baths, he likes treats, he is timid and he likes ribbon toys. The stuff that's in red turned out not to be true.
And so now the problem gets more complicated, right? What happens when there are more cats? How do you keep track of this information and how do you track how they interact? More cats equals more problems. We actually have four cats; the fourth one's not in that picture. As you can imagine, it's not just about knowing about one: if one's sick, it gets the others sick, and if one cat fights another cat, it causes issues for the whole household.

And so now let's talk about software supply chain security. Software supply chain security is like herding cats. When we look at all this stuff and at the interactions between these things, it's all about the data. This image comes from slsa.dev. When you think about the software supply chain, it's about protecting the production and consumption of software. If you're producing software, you're part of somebody else's supply chain. And when you're consuming software, you want to make sure that you're consuming it securely. So a developer, whether it's a good actor or a bad actor, can push potentially vulnerable or even malicious code to a source repo. A build can pull that malicious source, and the build can also pull in potentially malicious dependencies. The build can pull stuff in from a repository that you didn't think you were pulling from. And then that same build can potentially do something malicious itself; think about a lot of the attacks that have been happening most recently.
Or you look at the SolarWinds attack, and there was also recently a similar sort of attack where the build itself got compromised, so you were building software supposedly using your own systems, that software was being signed with the organization's keys, and it was malicious software. That then gets pushed to the package repository, and then as a consumer you can download that malicious package. You could also potentially download an unknown package coming from a place that you weren't aware of, all that good stuff.

And the real problem here is that this happens recursively. As they say, it's turtles all the way down. The dependencies that you're pulling in also have that same set of issues, and that just keeps going on and on, all the way to the hardware. The way we need to start looking at this is that we need data. If you have data, and you have accurate data, you can begin to start thinking about the problem.

And so, the same way that more cats equals more problems, more software equals more problems. A lot of folks in the past few days have been talking about how the number of packages that a single piece of software depends on continues to increase year after year, and it leads to a situation where our software supply chains are complicated. This is an image I generated from GUAC early on, and this is actually Kubernetes. Over here is essentially the Go ecosystem, over here is some stuff that depends on that, and then eventually it blows up: on the left-hand side is a SLSA attestation and all the files that were put into that SLSA attestation. Once again, this is just a visualization; this is not how you would actually use it.
And so why are our software supply chains complicated here? Now, I know we said we need the data, but one of the problems is we often have too much data. There's so much noise. We have SLSA, we have SBOMs, we have OSV, we have VEX documents, we have various other ITE-6 documents, which are in-toto attestations. We have attestations about all sorts of stuff. We have vulnerability scans, we have certification checks, we have all sorts of different documents being invented every day that we need to have so that we can understand our software supply chain.

And then we also, once again, have too little data. The data quality is often poor, leading to a lot of the documents being essentially noise as opposed to being really effective. And also, very importantly, we don't know what we don't know; we don't know what we're missing. If something's not there, it doesn't mean it's not causing you problems.

When we come down to it, given that software supply chains are about things depending on other things, you need to understand the relationships between the data. You want to analyze the connections within the supply chain across the various ecosystems. You want to be able to map that to your threat intelligence, to pull in data from resources like OSV. You want to be able to understand whether the vulnerabilities you're pulling in are exploitable, through stuff like VEX. And then you also need to reconcile the identifiers across different documents. For folks who are in the US, there is a request for comment from CISA on a document they released regarding software naming.
And how we name software is actually quite complicated, because the Python name for a package might be different than the Debian name for that same package, and how they interact can be quite complicated. And there's so much more that I don't have enough time to get into.

So what is GUAC and how does GUAC help? GUAC is a knowledge graph of software metadata to answer security and supply chain questions. If we take a step back, where does GUAC fit into the solution to software supply chain security questions?

Down at the bottom, you really need to have a trust foundation. You need to know who is in your software supply chain. You want to understand what identities are doing what. So this is stuff like, are you signing with Sigstore? Are you using GPG keys? What are you doing to establish who the people are who are doing what in your software supply chain?

The next step up is software attestations. That's where you take those identities and you associate them with a set of claims. What is your build system claiming? What are the developers claiming about the software they wrote? What is the package repository claiming about the packages it has, and so on? So this is stuff like, are you signing your SBOMs? Are you associating your SBOMs with an in-toto policy through something like SBOMit? Or do you have SLSA provenance? For folks who aren't super familiar, SLSA is a build framework that tries to establish provenance around software: was this code built in a particular way and packaged up, and do we have confidence that that is true?
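To make "associate identities with a set of claims" a bit more concrete, here is a minimal sketch of the unsigned part of an attestation, loosely following the shape of an in-toto Statement: a subject (the artifact, named by its digest), a predicate type, and the claim itself. The predicate type URL and the claim fields are placeholders made up for illustration, and signing is left out entirely; in practice the statement would be wrapped in a signed envelope so the claim is tied to an identity.

```python
import hashlib


def make_statement(artifact_name: str, artifact_bytes: bytes,
                   predicate_type: str, predicate: dict) -> dict:
    """Build an unsigned, in-toto-style statement binding a claim to an artifact digest."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": digest}}],
        "predicateType": predicate_type,  # says what kind of claim this is
        "predicate": predicate,           # the claim itself
    }


def refers_to(statement: dict, artifact_bytes: bytes) -> bool:
    """True if the statement makes a claim about exactly these bytes."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement["subject"]
    )


# Hypothetical claim: the predicate type and fields below are illustrative only.
statement = make_statement(
    "app.tar.gz",
    b"release bytes",
    "https://example.com/build-claim/v1",
    {"builder": "ci", "reviewed": True},
)
print(refers_to(statement, b"release bytes"))   # the claim covers this artifact
print(refers_to(statement, b"tampered bytes"))  # different bytes, different digest
```

The point of naming the subject by digest is that two documents from different tools can be recognized as talking about the same artifact without agreeing on a name for it.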
We want to know, with VEX, who is claiming that this thing isn't exploitable, and all sorts of other stuff like security scorecards and all that good stuff. Then you want to be able to take all of that data, which is lots and lots of different documents about lots and lots of different pieces of software in your environment, aggregate all of it, synthesize it, and then run queries against it. You want to be able to understand the relationships between the data. You want to be able to understand: are these claims actually important? Do I trust the people who are making those claims? That sort of thing.

And then finally, you want to be able to actually act on those queries. The idea here is that if this party claims there is a vulnerability in my software, it doesn't matter where, whether it's a dependency or a dependency of a dependency, I want to know and I want to be able to block that.

So what sorts of supply chain security questions does GUAC help answer? There are the proactive ones, which is how do I prevent large-scale supply chain compromises? This is from an XKCD comic. One of the big things here is that there's often something like a left-pad in your ecosystem, something that you don't even really think about. It's a super small piece of code, but it's critical to your environment. Which projects in here are potentially single-maintainer, where your organization doesn't know that it has a critical dependency on something that has a single maintainer? You want to be able to take preventive actions as well. Do I have an SBOM for this piece of software? Is there a known vulnerability in a piece of software I'm about to pull in? That sort of thing.
And then when stuff does go wrong, because it will go wrong, and CVEs are discovered, vulnerabilities are discovered, code is discovered after the fact to be malicious, you want to know how am I affected, what pieces of my ecosystem are affected by a thing, and what to do about it.

So this is a high-level architecture of ingestion into GUAC. I realize some of this is actually a little out of date. Inside of GUAC, we pull in all sorts of data from OSV, from deps.dev, SLSA, SBOMs, et cetera. We are able to pull in from various data sources like Google Cloud, Amazon S3, local file storage, all that good stuff. We're also able to pull in some vulnerability information from various vendors, pull it all into a graph, and then allow you to query that graph via GraphQL. Though I just want to point out Neo4j is not supported anymore.

Right, so now let's take a deeper dive here. You have a bunch of software, you have a bunch of different SBOMs in a bunch of different formats, and first you send them over to GUAC. Once you send all of that into GUAC, GUAC can start establishing the connections between that software. If you have references between different software, whether it's hashes or purls or CPEs and so on, it can determine that these two documents are making claims about the same piece of software and begin to turn that into a graph. This helps you understand: where are my SBOMs, what attestations do I have? And you begin to understand what is known and also, just as importantly, what is unknown. What do you not know about your software supply chain?
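The linking step described above can be sketched in a few lines. This is not GUAC's implementation, just a toy version of the idea under simplifying assumptions: each document lists the identifiers it makes claims about, documents sharing an identifier are treated as being about the same software, and anything missing an expected document type shows up as a gap. The `Document` type, the document kinds, and the example identifiers are all made up for illustration, and matching here is exact-string only, whereas reconciling a hash with a purl or CPE for the same package is the genuinely hard part.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    kind: str            # e.g. "sbom", "slsa", "vex" (illustrative labels)
    subjects: frozenset  # identifiers the document makes claims about


def index_by_identifier(docs):
    """Group documents by each identifier (purl, hash, ...) they mention."""
    index = defaultdict(list)
    for doc in docs:
        for ident in doc.subjects:
            index[ident].append(doc)
    return index


def missing_kinds(index, required_kinds):
    """For each identifier, report which required document kinds are absent."""
    gaps = {}
    for ident, docs in index.items():
        missing = set(required_kinds) - {d.kind for d in docs}
        if missing:
            gaps[ident] = missing
    return gaps


# Illustrative data only; these identifiers are examples, not real scan output.
docs = [
    Document("doc-1", "sbom", frozenset({"pkg:deb/debian/curl@7.74.0"})),
    Document("doc-2", "slsa", frozenset({"pkg:deb/debian/curl@7.74.0"})),
    Document("doc-3", "sbom", frozenset({"pkg:pypi/requests@2.31.0"})),
]
index = index_by_identifier(docs)
print(missing_kinds(index, {"sbom", "slsa"}))
```

Here the curl package is covered by both an SBOM and provenance, while the second package has an SBOM but no provenance, which is exactly the kind of "what do I not know" answer the graph is meant to surface.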
Then we're also able to determine where there are gaps, and we begin to pull in additional information from open source data sources like OpenSSF Scorecard and deps.dev, and then we also pull in threat intelligence data from open source vulnerability databases like OSV. All of this flows back, because now that you have these insights, you're able to start doing policy checks, you're able to start doing patch planning, and, very importantly, you're able to identify the critical pieces in your software supply chain. What are the critical pieces of your infrastructure? Where are those critical dependencies in my software supply chain?

One of the key things here is you really want to get that data into the hands of the people who need it, when they need it. This is an example of one of the POCs we built, which is a VS Code plugin for GUAC. It allows VS Code to run those GraphQL queries against GUAC, which then gets you that nice little red squiggly when it determines that something is against policy, has been certified bad, or has a known critical vulnerability.

Too often, these software supply chain problems are brought up as a gatekeeping measure as opposed to a proactive measure. For example, I don't know if anybody's dealt with this before, but you're in front of a change approval board or change review board, and right at the end you learn that there is some reason why you can't deploy something to production. And then when you ask the security team, hey, why is this getting blocked? They say, oh, we sent your manager's manager's manager a report last month about a problem with the software. And you're like, well, why did you not tell me a month ago? Why wasn't I aware?
You want to make sure that that data gets into the hands of the people who need it when they need it. That means get it in your IDE, get it inside your source repository, get it inside your build system, get it inside your packaging system, and get it inside the tools that people use to ingest software as well.

So now, this is the state of how things are today. More importantly than anything else, GUAC is an API. Similar to how Kubernetes is an API for how to run workloads, GUAC is a GraphQL API for querying information about your software supply chain. As we were talking about before, at the bottom you have stuff like SPDX, CycloneDX, SLSA, VEX documents, all that sort of stuff, getting ingested into GUAC. You have stuff being pulled in from other external data sources, like deps.dev, OSV, and OpenSSF Scorecard. All of that gets collected, then parsed, ingested and assembled into a database. On top of that database we have a GraphQL API, which you can interact with via a CLI tool or the GraphQL online client, and we also have an experimental visualizer that we've been working on that I'll show off in a little bit. And because it's all GraphQL, you can just write your own plugins as well.

So what documents are supported today? We have SBOMs, so SPDX and CycloneDX. We have VEX support: we support OpenVEX, which is part of OpenSSF, but we also support CycloneDX VEX and CSAF VEX. We support SLSA, we support OpenSSF Scorecard, and we support license information, so whether the license is GPL or MIT and so on. We are also building out some pretty cool ways to query that information to determine things like, do I have a license mismatch?
Is this thing claiming to be MIT, but it depends on something that has a much more restrictive license, which would then cause issues? We are also able to pull in arbitrary in-toto attestations, and we plan to support others, so if folks have additional documents they want in GUAC, feel free to bring those up as well.

As far as collectors go, we have OCI registries, so containers; we can pull in SLSA, SBOMs, et cetera, from a container registry. We can pull in from S3 buckets, we can pull in from Google Cloud Storage, we have file collectors and GitHub releases, and we can pull files directly from Git. If there are other places folks want us to pull from, we can do that as well. Pull requests are welcome.

So now I'm going to show a little bit of a demo here, and bear with me, because I realize I'm holding a microphone. Oh no, no, I got it. Let me increase the size here. So, I've already ingested the documents into GUAC for this demo. How many folks are familiar with the curl vulnerability from a couple of months ago? The one that everybody was super worried about for a while? So what did we know about that? Well, the maintainer of curl gave us a little bit of a heads up: there's a big vulnerability coming down the line, and it affects a few years' worth of versions. And so now let's say you're thinking, okay, I have a heads up now. I know that there's a problem. Where does that live in my ecosystem?

Just as a reminder, I've already ingested a bunch of SLSA, SBOMs, et cetera. I can run GUAC here to do patch planning. I know it's a little hard to read, but pretty much all this is doing is looking for curl, and I want to do a search depth of two. And I can run it.
It essentially just runs a set of GraphQL queries that determine where curl lives in my environment, and not just that: it also looks at it from a depth perspective, so it looks at what else depends on that. Here you can see there are a couple of different versions of curl in my environment that are part of the Debian ecosystem, for various versions of Debian. In addition to that, we have information about different packages that rely on those versions of curl. So it's not just, oh, I have curl in my ecosystem. It's, well, what's relying on curl here? Here we can see that Haskell relies on curl, Rails relies on curl, Nginx, and so on. There's some information here about a Python version that relies on curl. So now I have this information.

If I want to take a closer look, I can open up the visualizer. That's not the right one. Let me just go back here and refresh. So here is an example of a visualization of a particular library that we have, and it depends on curl. And once again, this is what happens when you have a bunch of back-end developers try to build a front end. Since this is all open source and it's an incubating project under the OpenSSF, if there are any front-end engineers who want to help out, welcome to the community.

But anyway, we have a version of curl. In this case it's 7.74, and we can see here that this Python package over here, as well as this Nginx package over here, with these hashes, both rely on the same version of curl for Debian, 7.74. And so now we can go in and begin to remediate that.
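The patch-planning search in the demo is, at its core, a bounded walk up the "depends on" edges: start from the affected package and collect everything that relies on it, up to a chosen depth. The sketch below shows that idea on a plain in-memory edge list. It is not GUAC's query code or CLI, and the package names and edges are invented for illustration.

```python
from collections import defaultdict, deque


def build_reverse_index(depends_on):
    """depends_on: iterable of (dependent, dependency) pairs."""
    reverse = defaultdict(set)
    for dependent, dependency in depends_on:
        reverse[dependency].add(dependent)
    return reverse


def affected_by(reverse, start, max_depth):
    """Breadth-first search up the reverse edges; returns {package: depth}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand past the requested search depth
        for parent in sorted(reverse.get(node, ())):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append(parent)
    del seen[start]
    return seen


# Hypothetical dependency edges, for illustration only.
edges = [
    ("python-image", "curl"),
    ("nginx-image", "curl"),
    ("web-app", "nginx-image"),
    ("platform", "web-app"),
]
reverse = build_reverse_index(edges)
print(affected_by(reverse, "curl", max_depth=2))
```

With a search depth of two, the two images that use curl directly and the application built on one of them are reported, while `platform`, three hops away, is left out; raising the depth widens the blast radius you see.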
We know what needs to be patched and that sort of thing. And then similarly, if I go over here, this is the GraphQL playground, where I can run that same query and see where curl lives in my environment. In this case this is a different version, where I have it not just under Debian but in Alpine as well, and I can look at that, and I can run all sorts of other queries too.

To go back to the visualizer real quick, I can also continue to explore using the visualizer to find other packages that are involved here, but once again, it's a little crowded. So I recommend, at least for now, using the CLI for the majority of the queries, but you can also do other things here, like a further search depth.

So those are the primary ways to interact with GUAC. There are all sorts of other features within GUAC as well, and all of this is in the documentation. We also have the ability to certify, which allows you to say, hey, even though this thing doesn't have a known vulnerability, I'm going to say it's bad, because it doesn't match some policy. You can also certify that something is good. You can say, I know it has a vulnerability, but based on our investigation we determined it not to be an issue, so we're going to say it's good for our environment. And there are all sorts of other things you can do. You can, once again, build new plugins or integrations for the API and all that good stuff.

All right, let me just go back here. Finally, let's talk a little bit about the roadmap. One of the things that's coming up is we want to associate identity and trust with various GUAC artifacts.
So this is stuff like actually recording information about the signatures, so that we can look at who signed these SBOMs, and do we trust the people who signed those SBOMs or signed those SLSA statements and so on, and then also associate them so that you can always go back and say, hey, yesterday I trusted this party to sign SLSA attestations, but today I don't. We are also looking to work with the community to understand their use cases and build out more advanced queries. We're also looking at some additional integrations via the GraphQL interface, and we have an experimental REST API that you can use to register complicated queries so you don't have to run them each time. We're looking to integrate with various policy engines. One of the other big ones is, once again, UI and UX. So front-end engineers, please come join the community.

And so I know I'm holding folks from the booth crawl and you might have fallen asleep here, but that's it for me. Once again, for folks who are interested, the book is Securing the Software Supply Chain from Manning. That's up there. And I'm going to open it up for questions. Yes, Justin.

So, I like the curl example. It's a really interesting dependency because in practice, a lot of the ways this gets used by different packages and package managers isn't as a library, but as part of a pre-install, post-install, whatever, to the point where a lot of times it doesn't get listed as a dependency. So if that's a problem I'm worried about, what tooling is there? I don't think GUAC directly addresses that omission of information. Is there tooling around some other thing that does that? And how does that integrate with GUAC?

Sure. So there are a couple of different things that can be used. I'll save the best stuff for last.
So I'm going to start off with things like scanners that do software composition analysis. They're great as a double check. They're not usually the best first line of defense, even though folks often use them that way. They're useful to determine, potentially, hey, this thing seems to have curl in it, because we scanned that image. Similar tools that I think are also pretty useful are things like osquery, which can essentially scan a server to say, what packages are there? What packages are potentially installed?

Once again, a lot of these things don't prevent malicious actors from obfuscating particular packages. You have situations where somebody changes a byte in curl and it still does the same thing, but all of a sudden it doesn't look like curl is on there anymore, because the systems that do the scanning often just do a hash lookup, or they look at the package database to see whether this thing declares that curl is in there. And if it's not declared, then the scanner often says it's not there, which is often not the case.

So that's where the more preventive stuff is more important. This is stuff like, is the thing being built correctly in the first place, and am I tracking everything that's being built? That's where things like SLSA come in, but also in-toto policies help to make sure that it's being built correctly in the first place, so that I can be pretty convinced that the thing shouldn't have been able to get injected.
And that's also where, and I know I'm lining this up for the person who maintains SBOMit, I think there are things out there like SBOMit where you're generating an SBOM, but you're actually doing all the right things to make sure that you're tracking everything along the way, so that something should not have been able to sneak into that thing without being caught in the SBOM. So I think those sorts of things are probably the most common ways to do that. Other questions? Oh yeah.

So when you use GUAC, do you need the dependency field of the SBOM?

So generally, yes. We can collect anything that's in the SBOM, but a lot of the information is going to be super important when you have X depends on Y, when you have the depends-on field. There are a lot of SBOMs that just have a collection of information, and we can still do stuff with that, but the more information in the SBOM, the better results you'll get from GUAC, because you'll be able to actually see that X depends on Y, as opposed to just X and Y are in the same location, or that X depends on Y depends on Z. One of the things we see in a lot of SBOMs is that, as opposed to X depends on Y depends on Z, you'll often just see X somehow depends on Y and Z, and that's it. What we've begun to do is correlate that data with data in deps.dev, for example, which looks not just at SBOMs but at how the actual package dependency resolution happens. From that we can infer that even though this thing says X depends on Y and Z, really the resolution according to deps.dev is X depends on Y depends on Z.
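Here is a small sketch of that correlation idea: take the flat "X depends on Y and Z" edges from an SBOM, compare them with edges from a resolver-style data source, and demote any edge that the resolver data explains as transitive. This is a simplified heuristic written for illustration, not how GUAC or deps.dev actually does it; the function names are made up, and the resolver edges are passed in as plain pairs rather than fetched from any real API. It assumes the resolver graph is acyclic, and it cannot tell apart the case where X depends on Z both directly and through Y.

```python
from collections import defaultdict


def reachable(graph, start):
    """All nodes reachable from start by following edges in graph."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def refine_edges(flat_edges, resolved_edges):
    """Drop flat SBOM edges that the resolver data explains as transitive."""
    resolved = defaultdict(set)
    for a, b in resolved_edges:
        resolved[a].add(b)
    flat = defaultdict(set)
    for a, b in flat_edges:
        flat[a].add(b)

    refined = set()
    for pkg, deps in flat.items():
        for dep in deps:
            # Is dep already reached through one of pkg's other listed dependencies?
            explained = any(
                other != dep and dep in reachable(resolved, other)
                for other in deps
            )
            if not explained:
                refined.add((pkg, dep))

    # Keep the resolver's edges between packages the SBOM actually mentions.
    known = {name for edge in flat_edges for name in edge}
    refined |= {(a, b) for a, b in resolved_edges if a in known and b in known}
    return refined


flat = {("X", "Y"), ("X", "Z")}   # what a flat SBOM claims
resolver = {("Y", "Z")}           # stand-in for dependency-resolution data
print(sorted(refine_edges(flat, resolver)))
```

With the example data, the flat claim "X depends on Y and Z" becomes the chain X depends on Y, Y depends on Z. With no resolver data at all, the function returns the flat edges unchanged, which matches the point above: a flat SBOM is still usable, just less informative.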
Yeah, and I think that sort of thing is really complicated, and it's one of the things that we're looking to help solve as time goes on. But I think really the way to solve that sort of problem is to start baking the generation of SBOMs into the dependency resolution in the first place. Let's say you're using Rust and Cargo. As Cargo builds software, why not have it, as it requires these new dependencies, begin to say: well, I know that X depends on Y, I recorded that. Oh, I just downloaded Y. Well, Y now says it depends on Z. Great, I've now recorded that Y depends on Z. That's as opposed to a lot of the SBOMs that just say, I saw X, Y and Z in the same location, and so I'm just going to say they're all the same thing. Any other questions?

Cool. If nothing else, thanks again for coming to the GUAC talk. Once again, we're an OpenSSF project, so join the OpenSSF Slack. If folks want more information on GUAC, feel free to take one of these little postcards here. We have monthly community meetings, and we're also on the OpenSSF Slack. The postcard has information about the docs and how to get started. Once again, it's an open source project, so you can just start using it. And if folks are interested after the fact about the book, come over here; I also have some stickers.