So, again, we're going to tell you about SBOMs, actually, superpowers, and more specifically: how do we make SBOMs better with more SBOMs? Of course. We love recursion. Just a quick thing to note before we get started: a lot of this work was also contributed by Marco Dekas, who is an intern at Google. Cool. So, if you've been on Twitter for the past few years, you've probably already heard of SBOMs, software bills of materials. But if you haven't, a software bill of materials is basically, you can imagine, an ingredient list for your software applications. It tells you what's in your software, the way an ingredient list tells you what's in your food. And this is really being driven, especially in the US, by the EO, the executive order from the Biden administration, as well as the OMB memo that was recently released, saying: here's the deadline, we need federal agency vendors to produce SBOMs. But SBOMs aren't just compliance for compliance's sake. There are actual use cases for SBOMs, such as vulnerability component analysis (figuring out, you know, do you have that Log4j, or now the Text4Shell vulnerability, in the software that you create or that's provided by a vendor), and also other use cases such as licensing. So, let's double down on this food analogy, because every SBOM is an ingredient list. The first question is: how do we get this ingredient list? So, let's take this scenario. I go to this really nice restaurant and order this bowl of ramen. And in this hypothetical scenario, I'm allergic to peanuts. So what I want to ask myself is: I'm really hungry, but I need to know whether there's anything I'm allergic to in this. One way I can do that is by visual inspection. I can take a look at the bowl and go: I don't see any visible peanuts. I don't see any peanuts. Yeah, I think you're good. Looks good, right?
And then, because I'm super careful, I let Chris have a bite. And he's like, no peanuts, yeah, we're good, come taste it, we're fine, should be good. Right? Then I eat the bowl of ramen, obviously. However, the second option here is asking the chef. I can go back to the kitchen and say: what did you actually put in this? I'm allergic to peanuts. In this case, the chef comes back and says: yeah, you know, this is our secret ingredient. We actually boil the broth with peanuts so it's more fragrant. Obviously, I'm a security practitioner, so I'm a little more paranoid, and therefore I go with option two. But how does this all tie back to software? Option one, when we talk about software, means observing and analyzing something that we have. This can be a bundled application, a container, a binary. We use software composition analysis: we look at binary symbols, package metadata, heuristics, and fingerprinting. So that's option one. And then we have option two, which is like asking the chef. What does that mean in the context of software? It usually means I'm going to talk to the vendors, go to the folks that are building my software, and ask them what's in this. But the tricky thing is that modern software carries a lot of complexity. It's not one chef you're asking; it's a whole network of different chefs. And these chefs can be in different countries, in different organizations, and may not want to tell you what the secret ingredient is, and so on. This added complexity makes option two, asking the chef, very difficult for SBOM generation. And so what happens now is that a lot of folks today are still generating SBOMs via option one, via heuristics, via analysis. And there's a good reason for this: it's very usable, and it provides a baseline of security that we can work with.
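To make the vulnerability use case from a moment ago concrete, here is a minimal sketch of what "checking the ingredient list" can look like in practice: searching an SPDX-style SBOM document for a known-bad package. The SBOM fragment and package versions here are made up for illustration, not taken from the demo.

```python
import json

# A minimal, hypothetical SPDX-style SBOM fragment (illustration only).
sbom = json.loads("""
{
  "spdxVersion": "SPDX-2.3",
  "packages": [
    {"name": "log4j-core", "versionInfo": "2.14.1"},
    {"name": "commons-text", "versionInfo": "1.9"}
  ]
}
""")

def find_package(doc, name):
    """Return the versions of every package in the SBOM matching `name`."""
    return [p["versionInfo"] for p in doc.get("packages", []) if p["name"] == name]

# A Log4Shell-era version turns up in this made-up document.
print(find_package(sbom, "log4j-core"))
```

In real tooling the lookup is done against a vulnerability database rather than a hardcoded name, but the principle is the same: once you have the ingredient list, the question "do I have that Log4j?" becomes a query.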
And this ties back to the age-old question of balancing usability and security. In this case, what we want to do in this talk is show you how we can maintain usability while at the same time turning up the knob on security. Nice. So, my name is Chris Phillips. I also worked a little bit on this. And what we're trying to drive home about software composition analysis is that it should no longer fall on the consumers themselves. There's this idea that if we shift left, and all the developers scan the things that are on their computers, and they go, oh, the binary is fine, great, no vulnerabilities, then it gets shipped out to production. But what actually happens is that if you have a static asset sitting in a container or on your machine, you have so much less context than you had at build time, as the producer. The producer can get manifests from the package database, go through all the other metadata of the build system, like the Jenkins pipeline, and even query and look at every single bit and part of what went into building that binary before it gets shipped out. So the analogy here, at the top, is Superman as the producer. And by the time it gets to my laptop, I'm just staring at a rock trying to decode it. Compilation is obviously lossy in itself, and it's always a losing battle if I'm trying to do that as an individual developer. Here's a good example. We ran an open source package scanning tool against a single file, a Go binary, and we found 121 packages in that binary. But then when we stepped up a level and scanned a directory as our context, we found 140 packages. So there were 19 or so packages at the directory scan level that might have gone into the build process. So imagine you have an intern, and we're looking at the dependency list for the Go module, and someone goes: yeah, is this everything that's been introduced in the pull request?
I just scanned what was downloaded, so that seems good. And then you get the LGTM, you get to go home, everything is deployed to production, you're fine. But there's always an issue. There's more. If we actually look at the build logs from when we scanned the directory, we find: oh, there was a Go module accessed at this commit SHA, and also this commit SHA in a temporary directory, and there was one at this other commit SHA. Did we scan those? Did we do any analysis against any of those? It's in production, isn't it? And you have to show up on Monday and realize you're in a lot of trouble, because you didn't go deeper into the graph of analyzing how it was being produced. So what we're going to show in this demo, with the tools we put together, is that we want to make metadata discoverable as producers of software rather than consumers, instead of putting the onus on consumers to generate their own proof of things being vulnerable or secure. So, producers and people who publish software, get ready. Awesome. Just to set the stage for the example and the demo we're going through: what we have is a build of a Rust binary. We have our developer, our chef, compiling something we call a vulnerable-demo binary, which shouldn't be vulnerable, right? And we do the thing that most of us did once upon a time while learning Docker, which is create a Dockerfile FROM alpine or something, curl in a binary, and ship it. It's in production. And what we see, this is the state of today: if we take a tool like Syft and scan this container image, it's going to tell us a few things. It's going to say: cool, you're using Alpine, here's the list of Alpine packages that you're using. And by the way, I found this thing called vulnerable-demo. I have no clue what it is, but yeah, it should be fine, right? So the question we're trying to ask here is: what if I could go to the source and ask, what's in this thing?
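The binary-scan versus directory-scan gap described above can be illustrated with a simple set difference: the directory (build-time) view contains packages that never survive into the compiled artifact. The package names below are placeholders, not the actual 121/140 lists from the talk.

```python
# Hypothetical results from the talk's scenario: scanning the compiled Go
# binary sees fewer packages than scanning the source directory it was
# built from (names are illustrative placeholders).
binary_scan = {"golang.org/x/crypto", "github.com/spf13/cobra"}
directory_scan = binary_scan | {"github.com/some/buildtool", "golang.org/x/tools"}

# Packages only visible with build-time context; invisible to the
# consumer staring at the shipped binary.
missed = directory_scan - binary_scan
print(sorted(missed))
```

In the real example that difference was 19 packages; any of them could have carried a vulnerability that the binary-only scan would never surface.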
And so what we have here, alluding to what Chris mentioned about having producers share the metadata: now we have the chef. On top of creating and compiling the binary, they also run Syft to generate the SBOM. And then we add this step of attesting the SBOM and putting it in a discoverable store. In this case, for the demo, we're using Sigstore, and more specifically the Rekor transparency log. So we're taking the SBOM, attesting it, putting it in the store. And now what Syft does is, when it sees the binary, it says: okay, let me go check if there's any metadata available. It's going to take the hash of the binary, take that SHA-256, and ask Rekor: is there any metadata, any SBOM, that I can use? In this case, because there is one, Rekor returns the SBOM. And the SBOM that Syft now produces includes and encapsulates that additional metadata. Right, so we can put all these nice fancy graphs on the screen, but everyone comes because they want to see the code of how we actually get to this final step. So we've got the trusty old GIFs, and everyone will get to see, step by step, how we would do this as a producer. The first part we talked about in that graph: build the artifact itself, right? Just a nightly build release, and we get our binary. Tale as old as time: you can use make, you can use go build, anything. The next bit is we go in and say: okay, I'm going to take this new thing that I built, and I'm going to double check and see, yeah, thumbs up, no vulnerabilities. Right now, we're the consumer. We're going through this whole process and saying: my code is not vulnerable, I'm going to ship this right now. We're good, right? And then finally, we're going to generate an image.
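The lookup key in the flow above is the binary's SHA-256. A minimal sketch of that fingerprinting step (just standard-library hashing, not Syft's or Rekor's actual code) looks like this:

```python
import hashlib
import os
import tempfile

def artifact_digest(path):
    """Stream a file and return its hex SHA-256: the content-addressable
    key used to ask a transparency log for attached metadata."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: fingerprint a small stand-in for the compiled binary.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name
digest = artifact_digest(path)
os.remove(path)
print(digest)
```

Whoever holds the same bytes computes the same digest, which is what makes the "ask Rekor by hash" step work without any coordination between producer and consumer.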
And if you're following along with the graph that Brandon put together, this is where we are: we build and we package that binary in the image. We scan that as well. And then this gets uploaded to JFrog or Docker Hub, and everyone's extremely happy, because we're at zero vulnerabilities. We can report that, and we get a nice little pat on the back. We have solved supply chain security. That's incorrect though, right? So we're going to go a bit deeper. Let's try this with more fidelity. We're going to do the directory scan and actually get the full contents that go into this Rust binary we're building. We're going to get the SHA-256 of our binary, so we go into that release and fingerprint it. And then we're also going to fingerprint our SBOM. So if you think about it: this artifact that I have right here and this SBOM, these two fingerprints are linked together. What that means is that when we go and check the Rekor transparency log, if the pointer into whatever blob store holds that extra metadata document you're uploading, that SBOM, that vulnerability report, if that SHA does not match the fingerprint of the thing you're attesting to, bail out. You cannot verifiably use it for any reason. Someone is trying to man-in-the-middle you, and there's no way for you to understand how or why that's correct data. Here is the code, and I made sure this one didn't play for a little bit so you can watch it go all the way through. There's a binary called sbom-attest that Brandon wrote, and what it does is interact with Sigstore. We build the predicate type, which in this case is a custom Google SBOM predicate. That's how we know, when we go out to pull down the extra metadata, that it's the type we want to decode and scan for vulnerabilities. We include the subject, the binary itself.
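That "bail out if the digests don't match" rule can be sketched in a few lines. The attestation shape below follows the general in-toto idea of a subject list with digests, but the field values and the helper name are illustrative, not a real Sigstore verification API.

```python
# Sketch of the bail-out check described above: the attested subject's
# digest must match the fingerprint of the artifact in hand, or the
# metadata is rejected as untrustworthy. (Illustrative, not Sigstore code.)
def metadata_trustworthy(artifact_sha256, attestation):
    subjects = attestation.get("subject", [])
    return any(s.get("digest", {}).get("sha256") == artifact_sha256
               for s in subjects)

attestation = {"subject": [{"name": "vulnerable-demo",
                            "digest": {"sha256": "abc123"}}]}
print(metadata_trustworthy("abc123", attestation))   # digests match: usable
print(metadata_trustworthy("def456", attestation))   # mismatch: bail out
```

Real verification also checks the signature and the log inclusion proof; the digest comparison is just the binding between the artifact and its metadata.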
So that's us committing to the transparency log that this file, this exact fingerprint, exists for this metadata. And then we upload it and sign it with whatever keyless method we want. In this case, I'm using my OIDC provider, GitHub. But you could imagine all kinds of trust policies this way, where, say, the golden-goose keys of your company do the signing, so you only trust things in Sigstore that were signed by those keys, not just a random developer, Chris Phillips at GitHub. The levels of trust and policy you can build on this are nested, and you have that trust policy problem. And if you're following along in the diagram again, you are here. So now we're uploading this additional metadata. And now we go back and ask: did this work? Do we have a better image? With this fingerprint uploaded to Sigstore, when we go to catalog and actually look at everything in this image, we'll be greeted with, in this case, something human readable, but you can also state it as an annotation, machine readable, a UID. And it says: hey, I fingerprinted that file that previously I had no idea about, and some person that's in your config, that you trust, says thumbs up, that thing has additional metadata. They generated it as a producer, put it up there, and uploaded it for you for vulnerability analysis. So we pull that down. We use our new superpowers: we go up to Rekor with this command, we use the UID, we find the attestation, we pull the predicate, and then we scan it right here. And we can say that, because the producer has been responsible, because the producer has uploaded a document that's more complete, based on all of the metadata and all the information they had in the build system, we can alert the consumer that there might be vulnerabilities, that there might be something in what was otherwise just a black rock of a compiled file.
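The statement being signed and committed above has a well-known overall shape: a subject (name plus digest) bound to a typed predicate carrying the SBOM. Here is a sketch of assembling one; the predicate type URI is made up, since the talk uses a custom predicate rather than the standard one.

```python
import json

# Sketch of an in-toto-style statement like the one the talk describes:
# a subject (name + digest) bound to a custom SBOM predicate. The
# predicateType URI below is a placeholder, not the real one.
def make_statement(name, sha256, sbom):
    return {
        "_type": "https://in-toto.io/Statement/v0.1",
        "subject": [{"name": name, "digest": {"sha256": sha256}}],
        "predicateType": "https://example.com/custom-sbom/v0.1",
        "predicate": sbom,
    }

stmt = make_statement("vulnerable-demo", "abc123", {"packages": []})
print(json.dumps(stmt, indent=2))
```

The consumer side inverts this: match the subject digest against the artifact, check the predicate type is one you know how to decode, then hand the predicate to the scanner.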
And if you think about this: imagine you had built a system where something hadn't been fingerprinted, and you're just blindly scanning, no vulnerabilities forever, and then all of a sudden you start making these correct attestations to the thing the producer is putting out there. All of a sudden, because it's in Rekor and you say you trust it, your vulnerability reports can start growing, where you can see the new metadata being populated by the producers that are building correct SBOMs. This is in contrast to, obviously, the very sad and very localized example we did earlier, where we as a developer were just trying our best to analyze the thing. And this is basically running the exact same command with different results. Yeah. And everyone loves memes. They said there's no way you can see those vulnerabilities in that compiled file. You're right. But I can trust the person uploading the data, who knows what's in there from their build system with all the additional details, and then get the vulnerabilities. And the world is your oyster, too. You can pull license data from that as well. Any kind of compliance check that you can assert data on from those build systems, you can upload and run through an extra tool. There are obviously caveats; with great power comes great responsibility. So, Brandon, please. Yeah, now we're mixing in Spider-Man too. So, one thing we're using here for the discovery of the metadata is Rekor's experimental attestation search function. We are building another project called GUAC, which we'll be talking about tomorrow, that really focuses on cataloging all this metadata so that it's searchable, so you can search it by that content-addressable hash. We're having a talk about this tomorrow, 11 a.m., in the same track, the security track. So if you're interested, find out more about that there.
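The trust-policy idea that keeps coming up ("only trust things signed by those keys, not just a random developer") can be sketched as a mapping from artifact patterns to accepted signing identities. Everything here, the policy shape, the identities, the patterns, is a made-up illustration of the concept, not a real Sigstore policy format.

```python
import fnmatch

# Toy trust policy: which signing identities we accept attestations from,
# per artifact-name pattern. Entirely illustrative names and patterns.
TRUST_POLICY = {
    "registry.example.com/vendor/*": {"release-key@vendor.example"},
    "*": {"security-team@example.com"},
}

def trusted(artifact, signer):
    """Accept the attestation only if some matching pattern lists the signer."""
    for pattern, signers in TRUST_POLICY.items():
        if fnmatch.fnmatch(artifact, pattern) and signer in signers:
            return True
    return False

print(trusted("registry.example.com/vendor/app", "release-key@vendor.example"))
print(trusted("registry.example.com/vendor/app", "random-dev@example.com"))
```

Real policy engines layer on certificate identities, issuers, and key rotation, but the core decision is this lookup: for this artifact, is this signer on the list?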
Another quick note: the attestation format we're using is a little bit custom, not exactly the same as the in-toto SPDX one. This is to keep it small enough that Rekor will store it as an attestation. And finally, as Chris pointed out, there's a lot you can do with trust policy: that list of people I trust for certain artifacts, or organizations like Google, Microsoft, or Intel that I trust for the binaries they produce. So there are a couple of open issues still in discussion. And I believe this is all upstream in main, so you can play with it today if you just run Syft on your containers. Cool. Oh yeah, and the last part I did want to mention is that we want to start thinking about, and getting ideas from the community about, how to parse multiple entries from third and first parties, and how to do that kind of decisioning about which one you take first when you're evaluating. So maybe the producer themselves is always favored, or maybe you know that, let's say, Red Hat or Google are doing independent analysis of things, and you want to take theirs instead of the producer's, because you've had some run-ins, or because vulnerabilities have slipped through before, so you lift theirs in rather than the first-party producer's. Yeah, so in conclusion, this is kind of a shout-out: everyone should be producing software metadata, attesting it, and making it discoverable. We can work at this problem from both angles, both by asking for more metadata and from the producer side, making this basically transparent and seamless, maintaining the usability of SBOM generation while getting better SBOMs. Check out our tools, and check out all the awesome open source tools that exist that produce these documents as well. It's awesome.
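The first-party versus third-party decisioning question raised above is essentially a precedence problem: given SBOM entries for the same artifact from several parties, which do you take? One simple way to sketch it, with entirely hypothetical provider names, is a configured preference order:

```python
# Toy sketch of the decisioning question above: several parties have
# published SBOM entries for the same artifact; pick by a configured
# preference order. Provider names are made up for illustration.
PREFERENCE = ["independent-auditor", "first-party", "community-scan"]

def pick_sbom(entries):
    """entries: provider name -> SBOM document; return the preferred pair."""
    for provider in PREFERENCE:
        if provider in entries:
            return provider, entries[provider]
    return None, None

entries = {"first-party": {"packages": ["a"]},
           "community-scan": {"packages": ["a", "b"]}}
print(pick_sbom(entries)[0])  # "first-party" outranks "community-scan" here
```

More sophisticated schemes might merge entries or weight them per ecosystem rather than picking a single winner; the open question in the talk is exactly which of these behaviors the tooling should default to.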
There's just so much going on in this ecosystem right now, and the more we get everything working together, rather than as granular, insular, single projects, the better it will be for both consumers of software and, hopefully, producers in that regard. So thank you. Okay, we have time for questions, and I see one hand already. Hey, this is really interesting stuff. I have a question: so you have a bunch of problems with the permission model and things like this, and I'm curious why you use the attestation format in in-toto, because the link formats in in-toto, actually the layout formats, handle all those permission issues for you. I'm curious, why not just use in-toto links? So, we are using that. I think it's about the expressibility of the locator of the metadata. We had this restriction where Rekor doesn't store attestations above, I can't remember, a couple of kilobytes or something. So we do need a way to store that. We are using the in-toto attestation; we just have a custom predicate for it. Okay, but because there are different formats for attestations versus the link format, did you look at the link format also, or not? It's okay, we can take this offline. I'm not trying to nail you to the wall here or anything. Hi, maybe I didn't really get it: what kind of format do you use for your response? Oh yeah, that's a great question. In this demo, we used the SPDX JSON format, but essentially you could use basically anything, because you basically say: here's the additional metadata, I can go download it, you can take a look at it. This does lead to another thing that we hinted at over here but didn't talk about explicitly, which is SBOM composability. Within individual ecosystems, it is generally pretty simple, as we saw with the Golang one.
You can have a good idea of what everything looks like within your own ecosystem, but when you pull from other ecosystems, like the Rust and containers one here, you need to put them together. So in this case, the metadata we gather can be any kind of SBOM, any format, whether it's CycloneDX or SPDX; that is an aspect of composability. For example, in SPDX, what we do is use the DESCRIBES relationship to say that this binary is described by this particular SBOM, which is in a separate document. So there's a little bit of nuance in there, but in general, any format, any type of metadata should be able to be handled, as long as you know what you're expecting. Thanks. Any more questions? Okay, so there are a couple of ways to go about generating something like an SBOM, and one of them is scanning. Another way is to pre-populate it somehow; basically, you can do an initial scan and then have that be the set of rules for the things that are supposed to come in. And I've heard a lot of discussion about how scanning as a whole has a bunch of problems, especially with malicious actors. If there's a malicious actor anywhere in your system, then scanning is just not going to save you. I'm wondering if you can comment on that, and on where your work might go to move to a model that's secure against malicious actors, not just against oddities of scanning tools not picking things up. Do you want to take this? Yeah, it depends on what the malicious actor is doing.
So, in the case where they're modifying your APK database, right, you're looking at the different entries, you take the sum of that, you're using those digests and fingerprints as much as you can, and you're diffing the different SBOM outputs. Hopefully the ecosystem evolves so that we can see those diffs between documents and say: oh, the fingerprint here changed from APK to APK, those are not the same, bomb out, we need to investigate why this wasn't the same for our quote-unquote reproducible build. That's the main use case I've looked at, as far as a malicious actor changing data on the system to misrepresent that some version is or isn't there. Or say your scanning tool isn't actually reading the file: it's pulling a bunch of package management info, a metadata.json, without asserting that the thing actually exists there. You can go a step further and compare that metadata, that package.json, that package lock, to the files; read the file itself. Does it match up exactly with what exists there? If that's been changed, bomb out, assert and show the user that something different has happened at this level. And you can take that localized example even further up. If we're talking about uploading these SBOMs to some blob store that can be discovered via Sigstore, then when you pull that down, and the company has said, this is reproducible, this is exactly the same, and you diff it against a previous version that you've pulled, then you say: hey, you said this was reproducible, you said this is exactly the same, so why am I seeing a git diff of a red minus and a green plus here where it shouldn't exist? As for other malicious-actor concepts of what they could do, we can talk more offline, because it's awesome to think in that space. Yeah, and also to add on to that, right?
I think SBOMs concentrate on figuring out what's in the software and the process of materializing that. There are other aspects that we want to tie in. So, having the provenance of the SBOM, but also having that of the SLSA attestations, and figuring out whether there's a bad actor or a bad identity, which documents are affected, and which SBOM generation processes that would affect, very much like a regular supply chain attack. And I think that's what we're trying to solve with the GUAC project as well, right? Now that we have the SBOM information, the SBOM folks can worry about their problem, and we'll solve other aspects of this with SLSA, with developer insights, with tracking, you know, was this a malicious developer, was this a malicious commit, propagating that through the transitive graph and then doing analysis on it. Right, wave at me in the back if you have more questions. One more shout-out for the GUAC talk coming up as well. Okay, if there are no more questions, please give a big round of applause to our speakers. Thank you.