Hello, Supply Chain Security Conference, Los Angeles. My name is Eva, and I'm joining you today from rainy Seattle. I'm going to give you a brief overview of the breadth of current development efforts in the open source software supply chain landscape, to help anyone who is new to the space find their way. And then I'll spend a little time talking about one of the many tools that folks are building right now that I think will be most helpful.

My writing teacher once said, you start writing a book by writing the ending. So I'm going to kind of cheat now, and tell you how I imagine this journey ending before we start. What if there was an easy, efficient way, at launch time, to enforce a policy that could check each executable object's complete build tree against a then-current list of known vulnerable source files? Wouldn't that be awesome? Wouldn't that solve a lot of the challenges we collectively have? This might seem like fiction, like I'm just dreaming, but it's good to know what I want to head towards, and I think it's possible. So now that we've imagined that end, let's work backwards. What are some obstacles? What monsters might lurk in this landscape? And what friends can we make along the journey?

And that's how I began a few months ago. I asked a lot of questions. I met with a lot of folks. I listened to stories from different communities. And with each person I met, one of the questions I asked was, do you have a map? Not only did nobody have a map, everyone said they wished they had one. And I do love surprises. So while I thought, when I joined this effort at Microsoft, that I could find a project to contribute to pretty easily, it turned out the most helpful thing I could do was hold space for everyone else to come share their work, and start a map. And that's how this talk started.

So I made a list of projects that I knew, I reached out to folks, and I met with our community as that list grew and grew. Today, I'm aware of 12 open source foundations doing work in the space, a whole lot of work on reproducible builds, several different Linux distros and compilers, and another 42 open source projects working specifically on software supply chain security. And I began to think of these kind of like villages: different projects grouped based on what part of the ecosystem they're in. Lots of groups are busy trading ideas, and some groups don't quite use the same terminology yet for the same things.

Even though I don't have time to go into everything, I do want to share with you a quick overview of the five categories of tools, a way of grouping these different projects together. We have standards; we have SBOM generation tools; we have scanning tools, to review things that have already been built; we have identity or attestation tools; and we have access control or policy tools.

And across all of my conversations with people and different groups and different foundations, there are two concepts that have really anchored my exploration. These concepts function like a Rosetta stone, letting me translate terminology between different communities that aren't yet using the same words when they mean the same things. Those terms are claims and policies. My understanding of these terms comes from ongoing work in both the Confidential Computing Consortium, which I've been a part of for almost two years, and the IETF remote attestation working group.
As a little side note, I really think attestation, remote attestation rooted in hardware, is going to be key to the future of the secure distributed computing we're all trying to build. Now, a claim is a statement about the construction, composition, validation, or behavior of an entity that affects its trustworthiness. And a verifier accepts evidence and then performs an appraisal, comparing the claims found in the evidence with some reference values or some policies. These definitions are summarized from the IETF draft specification for remote attestation.

Now, with these three concepts in mind, claim, verifier, and policy, let's look at the landscape of open source projects again from another perspective. How might one build a whole solution out of all of these different projects? Every solution, I believe, will need to integrate components that provide certain common functionality, roughly across those same five categories of tools. And as I've talked with different groups, this picture formed in my mind. So let's think of this like a map and take a look at it.

An SBOM is a type of artifact, kind of in the middle of the map here. Specifically, it is a claim. It is metadata about an object, like a Docker image or a Debian package. An SBOM is a claim generated by some tool or process, like in-toto, that says: here's information about this thing that you might care about. That claim needs to be stored and conveyed so it can reach you, possibly independently of the conveyance of the object it relates to. Some claims might be stored in a container registry. Others might be stored on GitHub, or in an S3 bucket, or on a blockchain. And while that is all storage, not all storage systems are equal. Some storage systems might provide a stronger claim about their own security or integrity properties.

And you may or may not choose to trust any given claim. For example, you could rely on an identity system that's already widely in use, like DNS and SSL, to verify the identity of a claimant before trusting a claim or before using a storage provider. You might even ask an auditor to verify a claim and its associated object, evaluating their trustworthiness and making yet another claim that refers to the first claim. An automated decision-making process that establishes criteria like who you trust, what evidence you require, and how to verify all of that: that is the domain of policy.

All of this is meant to facilitate transitive trust across decentralized, often chaotic-seeming software supply chains, open source and closed source alike. And while today's widely adopted SBOM standards do tell me a lot about a package, I don't think they say enough about its dependency tree. Recent drafts and ongoing revisions of standards such as SPDX are adding support for describing deep dependency chains, but many open source projects don't yet produce an SBOM, or don't include dependency information in the SBOMs they do produce, and those standards aren't quite done yet. And asking smaller projects to do all this extra work puts a lot of burden on a lot of open source maintainers without directly benefiting them.

Well, what do we do, right? We're now armed with a map and a translation guide. So let's revisit my initial thesis: do we have enough tools to build an efficient way to enforce launch-time policy against a build tree for every incoming binary? Here's roughly the shape of the check I'm imagining.
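To make that a little more concrete, here is a minimal sketch, in Python, of that kind of launch-time appraisal. To be clear, everything in it is hypothetical: the claim format, the vulnerable-hash list, and the loader hook are stand-ins I made up for illustration, not any existing tool's API.

```python
# Hypothetical launch-time policy check. The "evidence" is a claim
# about an executable: the content hashes of every input, transitively,
# that went into building it. The reference values are hashes of
# source files currently known to be vulnerable.

# Stand-in reference values -- imagine a regularly updated feed.
KNOWN_VULNERABLE_HASHES = {
    "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",  # e.g. a bad log4j file
}

def appraise(build_tree_hashes: set[str]) -> bool:
    """Verifier: compare claims (the build tree) against reference
    values. Policy: allow launch only if no input anywhere in the
    tree is on the known-vulnerable list."""
    return build_tree_hashes.isdisjoint(KNOWN_VULNERABLE_HASHES)

# A hypothetical loader hook would then do something like:
#
#   tree = read_build_tree(executable)   # every input, transitively
#   if not appraise(tree):
#       refuse_to_launch(executable)
```

Because the check is just set membership over hashes, it stays fast no matter how deep the build tree goes. The hard part is getting a complete, accurate build tree in the first place, and that's what the rest of this talk is about.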
And even if every open source project did ship with an accurate SBOM with dependency references, to build this I'd need to find all of those, bring them all together, digest them, build a database, clean out any errors in it, and then perform that runtime policy evaluation. What might such a process look like? Today, in most cases, the build info and the signing and the SBOM generation take place after compilation, whether the output is an executable or a package or a Docker image. The build process is usually separate from the SBOM generation process, and that's additional complexity and work for project maintainers. Many SBOM tools don't capture the full build tree, and a scanning tool that you run later might only look for package names and version strings without really scanning the content, so if there was a downstream change, it might not get noticed. So while I think it's possible to build such a large database and gather all the data, it'll take a lot of work, and it won't be as efficient or as accurate as if the build tree were captured in the first step.

Now, remember how I said at the beginning that I kind of cheated and told you the end of the story first? I want to talk a little bit about a project that I found along this journey called GitBOM. This approach learns from many of the excellent qualities of Git: it's decentralized, it's a Merkle tree, it's compact, it builds a lot of powerful functionality from just a few simple primitives, and Git is already a de facto standard, so we don't need to go through a lengthy standards development process.

Even if we all use Git every day, I don't want to assume that everybody knows how Git works under the hood, so I'm going to give a short recap of the relevant functionality. Git is really a database that's masquerading as a version control system, cleverly using your file system as a storage engine. There are three types of objects stored by Git that matter here: blobs, trees, and commits. Every object is stored as a separate file on disk in the same format: a fixed-size header describing the type and the size of the object, followed by the raw content. These files are all named deterministically by the SHA-1 hash of the file, header included, so they're effectively guaranteed to be unique, globally even, right? That means if I have the same content, anywhere it is stored, it has the same referent, the same Git ref.

So we get deduplication across trees and branches. We even get deduplication on a centralized service like GitHub. And you and I could reference the same file that's stored on my hard drive and your hard drive by the same name, even if the local path is different, because it has the same Git ref. Now, we can splash on some metadata to capture those differences, like the different path on my computer, or the timestamp, or a copyright, or the provenance of a file if I built it. But because Git has separated the metadata from the actual file content, we still know what's changed and what hasn't. So from Git's perspective, every artifact is just a blob, just a byte array, and we can build a tree out of those byte arrays, including trees themselves, since a tree is just a text file stored as a blob.

As I mentioned earlier, SBOMs today typically combine metadata with the artifact tree. If I have, say, an SPDX SBOM describing a project, and it references several other projects that are dependencies, those SBOM documents are referencing each other: we have the artifact ID and all the metadata about it in documents that are forming a tree like this. But Git separated them for good reason, and with GitBOM, we're proposing to do the same thing now. One reason, among several, is that we want an efficient way to scan the whole tree for known vulnerabilities, so we want the tree to be small and easy to scan. And a tree can contain references to any kind of file. Input files could be source files or header files or object files or other libraries or other container images. Remember, every artifact is just a blob, and we can build trees out of these in our build process; it's what we do today.

With GitBOM, we'd like to use the Git ref as the actual artifact ID, since we already have them stored in Git, and then represent the dependency relationships in a GitBOM doc, much like Git's tree object type. These docs can themselves nest, so one step in a build process can reference preceding steps and their build trees. Here's a little sketch of how those two pieces, the artifact IDs and the doc, fit together.
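This is a toy Python version of both halves: Git's blob-naming scheme, which gives us the artifact ID, and a GitBOM doc built out of those IDs. The hashing is exactly what Git does; the doc format here is simplified for illustration, so check the GitBOM project for the actual specification.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Git's object naming: SHA-1 over a small header ("blob",
    the content length in bytes, a NUL) plus the raw content.
    The same bytes always get the same ID, wherever they live."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def gitbom_doc(input_ids: list[str]) -> bytes:
    """Toy GitBOM doc: one line per build input, sorted so the doc is
    deterministic. (The real format is richer; for example, an input
    that has its own GitBOM doc carries a reference to that doc.)"""
    return b"".join(b"blob %s\n" % i.encode() for i in sorted(input_ids))

# A build step hashes each input it actually consumed...
inputs = [
    git_blob_id(b'#include "log.h"\nint main(void) { return 0; }\n'),
    git_blob_id(b"void log_msg(const char *msg);\n"),
]

# ...and emits a doc. The doc is itself just a blob with its own
# Git ref, which is what a later build step, or a binary's header,
# can embed as a single small reference.
doc = gitbom_doc(inputs)
print(git_blob_id(doc))
```

And because the doc is just another blob, nesting falls out for free: a deep dependency tree is just docs whose entries point at other artifacts and other docs.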
We can still use existing SBOM standards to capture metadata and reference either an artifact or a build step by its GitBOM document. And with all of this put together, if we integrate the generation of GitBOMs into the compilers themselves, then we get a flow that looks more like this.

What are the last couple of pieces we need to get there, to have GitBOM documents automatically generated? We would need to do three more things. First, generate the GitBOM directly in the compiler or the build step, not after the fact; the actual compiler or linker has authoritative knowledge, so it's not going to be as prone to errors as log scraping. Second, embed the Git ref of the GitBOM in the actual build output, say in the ELF header or the container manifest. Third, write the output directly to a .bom directory, much like the .git directory, so that subsequent build steps can just read off the disk about preceding build steps. And this gives us enough information to correlate CVEs based on file or package hashes later on.

One of the biggest upsides to this approach, I believe, is that it doesn't require any additional work from most open source developers. If these changes were added to compilers or container build processes, then most open source projects would get it for free: they just run their existing build tools or compilation process, and there's additional metadata generated that can be used downstream. We also get tamper-resistant build artifacts. And scanning a build tree of 20-byte hashes, no matter how deep that tree is, is relatively fast.

That's the end of the story for today, folks. Stay tuned. Hope to see you around. Hope you're having a great conference. You can find me online on Twitter or elsewhere. And if you want to get involved in GitBOM, we've got a temporary channel set up on the OpenSSF Slack for now. Just DM me on Twitter and I'll point you to where we're working. Thanks so much. Bye.