Hello, everyone. Thank you so much for having me here today at Open Source Summit Europe. I'm Nell Shamrell-Harrington. I'm a Principal Engineer at Microsoft in the Azure Office of the CTO. I'm also a board member of the Rust Foundation and a long-time open source contributor. If you'd like to reach out to me at any time, I'm @nellshamrell on Twitter, and it's wonderful to be here with you today.

I'd like to start this talk off with a few questions. Do you build software? Go ahead and raise your hands, whether you are remote or in person. All right. For those of you with your hands up, do you build a lot of software? Even though I'm pre-recording this talk, I'm guessing that a lot of hands just went up, both in person and in our remote audience. Nearly all of us, if not all of us, build a lot of software.

Let's now consider a scenario. What would you do if one morning, after you poured your coffee (or, if you prefer, your tea; in my case, I like to pour both a cup of coffee and a cup of tea), you sat down at your computer and received a new CVE alert? I know the first question I would ask as soon as I saw this: am I vulnerable? Is my company or organization vulnerable to this CVE? Where would you start to answer this question?

I would likely start by pouring another cup of coffee and another cup of tea, and then begin gathering evidence. That evidence is meant to answer key questions. What dependencies am I using in my software? What parts of those dependencies am I using? Sometimes one part of a piece of software is vulnerable to a CVE while other parts are not. What transitive dependencies (the dependencies of my dependencies, and so on) am I using at all levels of my software builds? It's often not just the first level of dependencies that I need to worry about; it's all the levels. And where am I using all these dependencies? If I build a lot of software, I need to isolate which pieces I need to update and which pieces I do not.
A solution that helps you gather this evidence is GitBOM. But before we go into what GitBOM is, it's useful to cover what GitBOM is not. GitBOM is not itself an SBOM. It is designed to complement SBOM formats such as SPDX and CycloneDX. In fact, GitBOM is included in SPDX 2.3, which at the time I'm recording this was just released. Additionally, GitBOM is not Git or any version control system. It is inspired by Git but is not dependent on it or tied to it.

Now that we've covered what GitBOM is not, let's transition into what GitBOM actually is. GitBOM is first and foremost an open source project and community. This community is creating a minimalistic scheme for build tools that provides a programmatically generated record of every artifact used to construct a piece of software, at every level of the build chain. And we're talking all levels of dependencies here, not just one or two; it follows the dependency chain all the way down.

Let's take a visual look at how GitBOM works. When a build tool uses the GitBOM scheme, it creates an artifact dependency graph, tracking every input or generated artifact throughout the build process. For each artifact, it hashes that artifact and uses the hash to create a unique Git object ID, which we call a GitOid, for that artifact. It then writes that GitOid to a GitBOM document. When the build is nearly complete, it calculates a GitOid for the GitBOM document itself, and embeds that GitOid into the final artifact or artifacts that the build tool produces.

That's the big overall view. Let's zoom in on each step in this workflow, starting with the build tool itself. When I say build tool, I mean any tool that reads one or more input artifacts and writes one or more derived artifacts as output.
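To make that overall flow concrete, here is a toy sketch of the workflow in Python. This is illustrative only: the `gitoid` and `build` helper names are mine, and the one-GitOid-per-line document layout is an assumption for demonstration, not the actual GitBOM document format defined by the spec.

```python
import hashlib

def gitoid(content: bytes) -> str:
    """Git-style blob ID, here computed with SHA-256.
    (GitBOM computes a SHA-1 GitOid in parallel as well.)"""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha256(header + content).hexdigest()

def build(input_artifacts: list[bytes]) -> tuple[str, str]:
    """Hash every input artifact, write the GitOids into a GitBOM-style
    document, then compute a GitOid for the document itself; that final
    GitOid is what would be embedded in the built artifact."""
    oids = sorted(gitoid(a) for a in input_artifacts)
    document = "".join(f"blob {oid}\n" for oid in oids)
    return document, gitoid(document.encode())

doc, doc_oid = build([b"int main() {}", b"#include <stdio.h>"])
print(doc)
print("embed in output artifact:", doc_oid)
```

Because the identifiers are derived purely from the input bytes, two builds with equivalent inputs produce the same document and the same embedded GitOid.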
Some examples of build tools include a compiler that takes source code files and produces a compiled object file; a linker that takes object files and produces an executable; or a dynamic linker that takes an executable and a system library and produces a running executable. It can also be something like the Python bytecode compiler, which takes a .py source file as input and produces a compiled bytecode file as output, or the Java compiler, which takes a .java file and produces a Java class file. And these are only a few of many different types of build tools. What they all have in common is the ability to take input artifacts and generate derived output artifacts.

When I say artifact in this context, I mean any software object of interest. This includes source code files, object files, executables, container images, and more. There are so many different types of artifacts that can be created, transformed, and consumed by software. What they all have in common is that they are all arrays of bytes. And in GitBOM's view, two artifacts are equivalent if and only if those arrays of bytes are equivalent. Additionally, every artifact should be identified in a way that is canonical, unique, and immutable. We'll talk more about what that artifact identification looks like a little later in this talk.

Going back to our workflow, the build tool generates an artifact dependency graph throughout the build process. The artifact dependency graph of an artifact is the recursive directed acyclic graph of all input artifacts that are transformed by a build tool into an output artifact or artifacts. This includes direct input artifacts, things like source code files, and, recursively, the inputs to each of those input artifacts, all the way down the graph.

Let's look at some examples of artifact dependency graphs. In the case of C, .c and .h files are compiled into .o object files.
Those object files are then linked into an executable. In the case of Java, .java source code files are compiled into .class files, which are then combined into a running application.

Something we like to refer to in the GitBOM project is the artifact singularity: an artifact should have precisely one artifact dependency graph, and all equivalent artifacts should have the same artifact dependency graph.

As the build tool generates this graph, it computes a GitOid, or Git object ID, for each artifact. A GitOid is how GitBOM identifies an artifact. It is canonical, which means independent parties presented with equivalent artifacts will derive the same identity. A GitOid is also unique, which means non-equivalent artifacts have distinct identities. And a GitOid is immutable, which means an identified artifact cannot be modified without also changing its identity.

The word GitOid stands for Git object ID. GitBOM follows nearly the same process as Git uses to generate object IDs, but GitBOM adds to it. We're about to go more in depth, but now is a good time for a reminder: GitBOM is inspired by Git, but GitBOM is not Git, and GitBOM is not dependent on Git.

With that said, let's first consider how Git creates an object ID. As Scott Chacon and Ben Straub put it in their book Pro Git, Git is, at its core, a simple key-value data store. You as the developer insert content into a Git repo, and Git hands you back a unique key that you can use to retrieve that content. When you want your content back, you provide that unique key to Git, and Git hands you back your content. This key is the Git object ID. To compute it, Git builds a header that includes the object type, such as blob or tree, followed by the content's byte size and a null byte. This header is then concatenated with the content, such as the contents of a file. Git then calculates the SHA-1 checksum of the header plus the content, and the result is the object ID.
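A minimal sketch of that process in Python (this mirrors how Git hashes a blob, as described in Pro Git; it is not code from Git or GitBOM):

```python
import hashlib

def git_object_id(content: bytes, object_type: str = "blob") -> str:
    """Git object ID: SHA-1 of a header ("<type> <byte size>" plus a
    null byte) concatenated with the content itself."""
    header = f"{object_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Matches `git hash-object` for the same bytes; for example, the empty
# blob hashes to Git's well-known e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
print(git_object_id(b""))
```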
And if you are about to jump out of your seats from hearing SHA-1: yes, Git does now use a hardened version of SHA-1, and yes, it is still controversial. We will address that in just a bit.

After Git calculates this ID, it persists the content to the file system using a .git directory, which contains an objects directory. Within that directory, it creates a subfolder named after the first two characters of the generated hash, and then writes to a file named after the remaining characters of that hash.

All right, about the use of SHA-1. As many of you likely know, there are security flaws in SHA-1, notably the SHAttered attack. Git version 2.13.0 and later use a hardened SHA-1 implementation by default. This hardened version is not vulnerable to the SHAttered attack, but there are still other potential vulnerabilities that may be discovered in the future. There is an effort to migrate Git to SHA-256 rather than SHA-1, but it is slow going.

So GitBOM is inspired by Git, and in early designs GitBOM also used the hardened SHA-1, but we received a very strong negative response to that. This is understandable: GitBOM is intended to let you know exactly what goes into your software, and that is not something we want to be vulnerable. So rather than using only one hashing algorithm for all GitOids, GitBOM creates parallel trees of GitOids. For every artifact, GitBOM currently calculates one GitOid using SHA-1 and one GitOid using SHA-256. This is something that can change in the future as hashing algorithms, and our understanding of them, continue to evolve. In the future, we could easily add additional algorithms or stop using certain algorithms as appropriate.

A GitOid is very similar to what Git uses to identify objects, but we add a couple of things to it.
A GitOid consists of the word gitoid, followed by the object type, such as blob, followed by the hash algorithm (SHA-1 or SHA-256 at the moment), followed by the hash value calculated using that algorithm. Here's an example of a GitOid using a SHA-256 hash: we have the word gitoid, followed by the object type, which in this case is blob, followed by the hashing algorithm used to create the hash value, followed by the hash value itself.

GitBOM then persists this GitOid to the file system in a .bom directory. Within that .bom directory there is an objects directory, as with Git, and then a subdirectory named after the first two characters of the hash, followed by a file named after the rest of the characters of the hash.

After generating each GitOid, the build tool writes them to a GitBOM document, or, to be more accurate, two GitBOM documents. The SHA-1 GitOid goes into the SHA-1 GitBOM document, and the SHA-256 GitOid goes into the SHA-256 GitBOM document. This continues for each artifact throughout the build process. When all the GitOids have been calculated and written to their respective documents, the build tool generates another GitOid for each of the two documents themselves, and then embeds those GitOids into the final artifact or artifacts.

As for how that GitOid is embedded, we are still figuring that out, but we do have some ideas, and it varies based on the type of artifact. For ELF files, we are considering embedding the GitOids into an ELF section called .bom. For things like tar or gzip files, into an archive entry also named .bom. For Java class files, into an annotation named @bom in the class file. For container images, we could see embedding the GitOids as an annotation in the image manifest. And for generated source code, we could embed the GitOid into the file itself as a comment. It is important to note that none of these are set in stone yet.
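Putting those pieces together, here is a sketch of the GitOid string and the .bom storage path. The helper names are mine; only the colon-separated gitoid layout and the two-character directory split come from the scheme described above.

```python
import hashlib

def gitoid_uri(content: bytes, algorithm: str = "sha256") -> str:
    """Format a GitOid: the word gitoid, the object type, the hash
    algorithm, and the hash value, separated by colons."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    digest = hashlib.new(algorithm, header + content).hexdigest()
    return f"gitoid:blob:{algorithm}:{digest}"

def storage_path(hash_value: str) -> str:
    """Persist under .bom/objects/, split as Git does: a directory named
    for the first two hash characters, a file named for the rest."""
    return f".bom/objects/{hash_value[:2]}/{hash_value[2:]}"

# Parallel trees: one GitOid per algorithm for the same artifact.
artifact = b"example artifact bytes"
for algo in ("sha1", "sha256"):
    uri = gitoid_uri(artifact, algo)
    print(uri)
    print(storage_path(uri.rsplit(":", 1)[-1]))
```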
There will be many more discussions and experiments to come. Our vision for GitBOM is that if you are using it in your build tools and you hear about a new vulnerability and have to answer the question, am I vulnerable?, you will have additional evidence to determine whether you are vulnerable anywhere in your build chain, at any level of your build chain. This means that rather than it taking all of the coffee and all of the tea to answer the question, it might only take one cup of coffee or one cup of tea before you know the answer and can act accordingly.

If this sounds good to you and you're wondering where you can get GitBOM: the project is still in its very early days, but we have an active community from across the open source ecosystem, and we are creating this project through open discussion and consensus. So far, we have drafted parts of the spec on GitHub, with much more to come. Please feel free to check it out. We have submitted gitoid as a URI scheme to IANA, where it has been provisionally accepted; you can check it out at that URL. And as mentioned at the beginning of this talk, GitOid has been added to SPDX version 2.3. GitBOM is a complement to SPDX, and it is now a part of it.

We have also been creating prototype implementations of GitBOM, including integrating GitBOM into LLVM, implementing GitBOM in Go, implementing it in Rust (that's the one I've been working on), and implementing it in Bash. There will be many more prototypes and much more experimentation to come.

We would love for you to join us in this project. We have a website where you can find more information at gitbom.dev, including the white paper, which lays out the idea of GitBOM. You can also find information about our weekly community meetings; we would love for you to join us there in our Zoom meetings. You will also find links to our Slack channel; the GitBOM Slack channel is in the OpenSSF Slack instance.
And you will find links to our GitHub org. We would love to have you join us on this journey. Thank you. Again, I'm Nell Shamrell-Harrington. Feel free to reach out to me here at the conference or on Twitter at @nellshamrell. It is so good to be here. Thank you again.