Hello. Welcome. I think it's time to start. We're a couple of minutes past. Whatever, we'll catch up. We'll go fast. Hi, I'm Ava. I do stuff at Microsoft. I've been doing open source for a minute or two. This is Ed. I'm Ed. I get into random trouble throughout the industry. We're going to skip intros. Most of you might know who we are. We're pretty easy to find online. We're assuming a little bit of our background, but you're really here to learn about this project, which you may have heard of as GitBOM. We've re-branded: the community all agreed, had a lovely voting discussion, and chose this name and logo, OmniBOR, which stands for a universal bill of receipts. So, quick backstory on this. I have tried for all of my 20-odd years in tech to say I don't do security. And then a couple of years ago, well, supply chain security became a hot topic. I'm like, okay, fine, fine. I'll go look at the tools that people are using for this supply chain security business and approach it with a bit of a beginner's mind. I'd never really thought about an SBOM or software composition analysis tools. And I went and looked at the field of what's out there, and I'm like, wow, this is a tangled mess. We can't even agree on what the term attestation means, or what format to sign packages with, or what format to describe the dependencies in an SBOM in. Gosh, this is a tangled mess. Wouldn't it be nice if it was simpler? It's really frustrating. At the bottom of security, we all want to know: am I safe? Am I safe to download this piece of software from somewhere on the internet and run it in my home or in my company? When I'm buying a piece of software from a supplier, or a piece of hardware, a server to stick in a rack in my data center, am I safe? The software in that machine that's baked into the firmware, baked into the baseboard management controller: does it contain Log4Shell? I want to know that. And it's kind of hard to get that answer. So I feel kind of unsafe, because I don't know how to know what's in the box. But hey, SBOMs are kind of a thing and we're all talking about them now, thanks to an executive order a couple years ago, which really changed the tenor and the meaning of this phrase. I'm a little grumpy, because SBOMs used to mean, or used to be used for, license compliance. And now they're all about security. And the NTIA minimum elements definition of an SBOM doesn't even mention license anymore, because security is important. We need to know what's in the thing. See what I mean? It's all still tangled and confusing. When you have an SBOM, maybe you've downloaded a package from GitHub or from a foundation, or gotten something from your supplier and they've actually given you an SBOM. If they haven't, you should ask for one. You might wonder, how was that SBOM generated? And can you trust that SBOM? Are you sure it's accurate? Is it complete? I mean, if it's signed, clearly that means it's trustworthy. No. If you verify the signature, all that tells you is it wasn't tampered with in transit. It still doesn't tell you that it's complete, or that it's accurate, or that the person who wrote it isn't lying to you. You don't know any of those things. You don't know that it's gonna match what someone else says. And you might have to do things like munge formatting. Maybe a CVE is published that says this version of this piece of software is vulnerable, and they use dashes in the software package name and your SBOM contains underscores, and it doesn't match when you compare them. Whoops.
Signatures don't solve all these problems. But before we go further into this talk, let's take a big step back and ask ourselves, what is trust? We're all asking, is this software trustworthy? Should I trust it? But trust is not a property inherent in any system or piece of software. It is an assessment of something based on experience. Trust is a declaration made by an observer, nay, an attestation. It is not a property of the thing observed. And trust always has these three properties. It is time dependent. I might trust that software today and tomorrow learn about a vulnerability in it. It is asymmetrical. I might trust you driving a car with me as a passenger. You don't need to trust me for me to get in the car while you're driving. Right, because I'm not driving it. And trust is contextual. You probably shouldn't trust me to drive that car after three drinks. So what can we trust? Well, if you're scanning the inputs of a build process, maybe you've decided to download all the source for all your dependencies and rebuild it all yourself, because that makes it trustworthy. It doesn't necessarily explain the output though, right? If you're looking at someone else's build system and they say, here's what I put in, maybe they used some compiler flags that they didn't tell you about, and that changes the output in some interesting ways. Who here hasn't seen debug symbols accidentally left in a production build? Yeah. Well, if you're only down to the binary and you do some deep scanning on it, you can often learn some things about its inputs. But it's not really that accurate, right? If I give you a pie and you have, say, some allergies and you ask, what's in this pie? Does it contain gluten? And I tell you, no, it doesn't. Are you really gonna trust me with your life if you have a serious allergy? You could look at the pie, you could maybe do some poking and prodding, and, well, that dough doesn't feel like a regular dough, but post hoc scanning isn't great either. Well, build tools inherently transform their inputs, whether it's a compiler or a linker or Docker build, right? Part of what we use build tools for is that transformation, and I would posit that the only thing we really can trust is the build tool itself, because only the build tool really knows what transformation it performed and how it performed it. So what is a build tool? How many people in the room have actually built something? Excellent, good. So this is not gonna be new to you. A build tool fundamentally is anything that takes a set of inputs, transforms them, and produces an output. And the world is littered with examples. We're probably all familiar with compilers. A C compiler will take a .c file and a bunch of .h files and it'll output an object file. Java will take in, well, javac will take in a .java file and output a .class file, sometimes more than one class file, but let's not get into anonymous inner classes. And what may not be clear to people: how many of you knew that Python compiles? Okay, much smarter than average audience. The Python interpreter actually compiles things to Python bytecode and writes them out as .pyc files so that it doesn't have to recompile when it runs again. And then as we move further up the stack, in some languages you get linkers. The classic example of this is the C linker, which will take a bunch of object files that are the results of compiling individual C files and their headers, and link them together into an executable. Now you've got something you can actually run.
And then you get to runtimes. So how many people actually know, I'm sure you all know, about shared objects? So even if I give you an executable and I actually succeed in telling you precisely what's in that executable, you still have no idea whether you're safe when running it, because at runtime, the dynamic loader takes that executable and, unless it's statically linked, it'll take a set of shared objects and dynamically link them into the running executable. But even more so, there are entire languages that are predominantly dynamically loaded. When you are running anything in Java, every single class is being dynamically loaded by the class loader. It actually doesn't matter what class file was present when I built my class file, as long as the class file that I loaded is interface compatible. And so runtimes become very crucial. And of course, the Python runtime pulls in .pyc files. Any folks play with JavaScript in anger? Things like Node.js are intrinsically dynamically loading JavaScript all the time. And then you get into interesting parts of the world with build tools, because there's a lot of places in the ecosystem where we use code generators. Any Go people in the room? How many of you have run go generate? Right, so oftentimes, and this is true across many languages, Java has many code generators, lots of languages have code generators. There are even things that generate C code or C++ code that get used commonly in practice. And so a code generator takes some input. Maybe it's another piece of source code that it's transforming. Maybe it's something in a domain specific language. It consumes that input and outputs a new source code file that goes on to be consumed by another build tool in the process. And this is a really fascinating one that most people don't think of as a build tool. If you look inside most packaging systems, Debian files, RPM files, what you will discover is they contain an ostensibly pristine copy of the source code, and then a collection of patches that get applied at build time. And many people, as a pattern for embedded work, will likewise do this. Partially because we have tooling like Yocto that encourages it, they will get pristine source code and they will apply patches to it. Which means that you take a source code file and a patch file as inputs and you output a source code file. So we look at all these inputs and outputs across all the myriad of languages that we'd like to be able to understand. And the world is truly polyglot. If you use Python, odds are you don't realize some of the modules you're using are building C in the background. If you use Java, you probably don't realize that your JVM is actually built out of C++, and C if you're unfortunate. And if you use Docker, you probably don't even know how many languages are in that image. God help us all. And so if we're gonna try and reason about the world in a common way, and we're not gonna have the Tower of Babel where we've got a million different ways that we reason about it, you have to ask what the commonalities are. And the commonality is that all of these things, all of these software artifacts, are arrays of bytes. Good? And so the next question you might have is, how am I going to identify an artifact, right? What is the identifier for it? And that naturally leads to a question. How do I know when two artifacts are the same?
If I have foo.c and bar.c, are we gonna decide they're not the same because they have different filenames? Filenames are pretty ephemeral. But a reliable way to compare artifacts is to say, are these two source code files byte for byte identical? Are these two executables byte for byte identical? And we'll come back towards the end as to why this particular choice of identity is relevant here, rather than, say, the origin, where the file came from. It's a different way of determining identity. Identity, provenance, and location can be very tricky when you conflate them. But you'd like there to be some unique artifact identifier you can use for equivalent artifacts. So if you look at artifacts, this is a non-exhaustive list of the kinds of things that we are talking about when we talk about artifacts: source code files, object files, shared objects, classes, jars, .pyc files, any executable you care to name, deb and RPM packages, Docker images. The list goes on and on and on and on, because we live in a very complicated world. And so when we look at this, we're trying to figure out how we want to identify artifacts. It makes sense to ask, because it helps us evaluate our choices, what characteristics we want this identifier to have. And we maintain that there are three. You would like it to be canonical, which means that any two people in the world who pick up the same set of bytes independently, without communicating with each other, arrive at the same identifier. We'd like it to be unique, because we don't want to have an identifier that points to multiple non-equivalent artifacts. And we'd like it to be immutable, because if we change an artifact, we don't want to have the same ID referencing it. So if I were to use file names, and I have foo.c and I edit foo.c, we would like those to not have the same identifier. So I definitely can't use file names as identifiers, which brings us around to non-solutions. So we talked about file names. And in addition to foo.c or bar.c being different, it really shouldn't make any difference if I build something in my home directory versus someone else building it in theirs. When you look at things like URLs or purls, these are locators. They tell you where to find something. Where you find something is not its identity. Location and identity are different. It even says location in the name. Indeed. So what about the minimum elements for an SBOM? Like I said at the beginning, not sufficient to identify something uniquely. Do you know if those two kernel builds from Microsoft or Cisco are exactly the same bytes or not? You have no idea, given the minimum elements of an SBOM. And in fact, you don't actually really even know what they are. Because when you look at the kernel source code for a particular kernel version, there are about 50,000 source code files in there. And depending on what knobs you tweak, you typically use a single digit number of thousands of them. And so if I tell you there's a CVE in kernel version 5.17.3, do you know if you have it? Do you know if it's in this one or that one? You really have no idea. Yeah, so this makes me sad. But it's already solved, it turns out. Yeah, so how to identify things that are a stream of bytes has been solved really, really well. And I love not reinventing things. Yes, how many of you have committed something this month? Okay, congratulations, you're already on board. So Git solves this really well. It computes an object ID for the contents of every file that it stores in the repository.
It takes the contents of the file and it outputs a 20-byte hash. And what may be less known is that Git's more of a Merkle tree masquerading as a source code management system. It's an object store in a Merkle tree. We're not gonna call it a blockchain. Please don't. It's a Merkle tree. We don't need VCS3. So every leaf node is labeled with a cryptographic hash of a data block. And every non-leaf node, every directory, et cetera, has those hashes down the line, so that if I give you the head of it as a commit, you can tell whether I'm lying about anything inside it. And so we think that Git object IDs, which are easy to compute even outside of a Git repository, are a suitable identifier for all software artifacts. In particular because they are already being used to index much of what we do. Much of the software world, yeah. The next slide, yours or mine? I think mine. Okay. So, generalizing, right? If I've got a build tool like a compiler, it takes .c and .h files and produces a .o file, and we can generalize that to think about it in terms of a set of input artifacts that get built with the build tool into an output artifact, just tying us back to the generalization. And we can describe that relationship with an input manifest, where the input manifest is simply a list of records, one per line: a prefix, blob, and then the Git object ID of that input artifact, in lexical order. Which, unsurprisingly, is the same file format as Git uses in its own Merkle tree on disk. Yes, and we would like this to be computed by the build tool, because only the build tool really knows what the inputs are for a particular output that it wrote. So we can identify any input manifest by just computing the Git object ID of it. And we refer to that as the input manifest identifier. And we're still working on the pronunciation of the acronym, IMID, imm-id. Yeah, we coined this acronym this morning, so we're still workshopping it. And we would like to embed this into the output artifacts. So the goal is to have the build tools, who know what it is, compute the input manifest, compute the identifier of the input manifest, and embed it in the output artifact. And pretty much all output artifacts have a place to put it. When you compile to an object file or an executable or a shared object, they're ELF files; you can have an ELF section that you insert the identifier into. In class files, you have annotations that you can use. In Docker containers, you have annotations you can use. Pretty much all up and down the line; even in source code, you can embed a comment line that contains the identifier. So what if an input has an input manifest ID already embedded in it, right? Say I actually patched a file, so there's an input manifest ID in a comment in the source code file that I'm compiling. Well, in that case, we slightly augment the input manifest by adding a bom marker and the input manifest ID to the single-line record that we have for that input. And again, maintaining lexical order so that everyone is always going to compute input manifests the same way. And in this way, you can link things across different ecosystems and different languages without knowing how something else was built. So these input manifests, taken together, allow you to construct and describe what we call an artifact dependency graph. The artifact dependency graph, or ADG, is just a way of looking at the inputs and the resulting outputs, and the outputs that are used as inputs, all the way up the tree.
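To make those mechanics concrete, here is a minimal Python sketch of what a build tool would compute, assuming the record layout just described; the exact spelling of the bom continuation, the helper names, and the sample file contents are illustrative assumptions, not the official OmniBOR tooling.

```python
# Minimal sketch, not the official OmniBOR implementation.
import hashlib

def gitoid(data: bytes, algo: str = "sha1") -> str:
    # A Git blob object ID: hash of "blob <size>\0" followed by the contents.
    header = f"blob {len(data)}\0".encode()
    return hashlib.new(algo, header + data).hexdigest()

def input_manifest(inputs, algo: str = "sha1") -> bytes:
    # inputs: list of (contents, manifest ID already embedded in that input, or None).
    # One record per input: "blob <gitoid>", extended with "bom <manifest-id>"
    # when the input itself carries a manifest ID, kept in lexical order so any
    # two tools produce byte-identical manifests for identical inputs.
    records = []
    for data, embedded_id in inputs:
        record = f"blob {gitoid(data, algo)}"
        if embedded_id:
            record += f" bom {embedded_id}"
        records.append(record)
    return "".join(r + "\n" for r in sorted(records)).encode()

# The manifest's own identifier (the input manifest ID) is just its gitoid,
# and both hash flavors can be computed in parallel.
foo_c = b"int main(void) { return 0; }\n"   # stand-in for foo.c
bar_h = b"#define ANSWER 42\n"              # stand-in for bar.h
manifest = input_manifest([(foo_c, None), (bar_h, None)])
print(gitoid(manifest))            # SHA-1 flavor
print(gitoid(manifest, "sha256"))  # SHA-256 flavor
```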
And in this example, you can see where input manifest one captures information about both artifacts two and three, but also information about the input manifest IDs on down the line. This forms a Merkle tree, which means that you have tamper resistance in the system. So to de-generalize and just to sort of bring you back to the concrete, we're talking about our artifact dependency graph. An example here might be a bunch of .c and .h files that are rolled up into, or built into, object files by a compiler. The compiler can then pick up the object files, see the embedded input manifest IDs, compute the input manifest for the executable, and embed the input manifest ID into the executable at the linking step, and so forth. Really simple examples, just to whet your appetite. This would be the example of taking a C executable and dynamically linking with a shared object at runtime for a running executable. Or in the case of Java, Java files become class files, and class files are loaded into the running executable at runtime. So as a whole, OmniBOR is a minimalistic scheme for build tools to do these things: to build a compact input manifest that is composable into artifact dependency graphs up and down the chain. And by compact here, I mean it's about one one-thousandth the size of the SBOM for the same artifacts. So a full Linux kernel build SBOM is close to a gig. The OmniBOR graph is about a meg. Something in that neighborhood, yeah. Yeah, way more compact, more portable, easier to search. To embed an identifier for that entire manifest directly into the generated artifact; and this is orthogonal to signing. If you are also signing artifacts, remember, signing shows you that it was not tampered with in transit; this, assuming you trust the build tool, tells you what in fact is in it, all the way up and down its dependency graph. And to do this in a language-heterogeneous environment: it should work from C up through Python, up through Debian builds, up through Docker images, with zero developer effort. And this is the key that got me to really push, to put my effort behind this project. My goal here is to enable this to happen transparently for open source projects, with zero effort from volunteer maintainers across the ecosystem. So think of it this way: there are a lot of people running around turning to open source maintainers who are already stressed and saying, we need an SBOM. And of course, even if the open source maintainer is not of the grumpy variety, the very first question is gonna be, how exactly? And why? Like, you're not paying me, I'm doing this in my spare time, generating this SBOM for you does me no good, why am I gonna do it? And with what money? I don't have a CI system, I don't have a budget, I built this at home. And the goal here is for the answer to the how to be: well, did you build it? Yeah, it's just automatically there. That's what I want. Automatic for the people. Yeah, so there's a lot of cool stuff under construction in our community. We have a whole bunch of proof of concepts. These QR codes are just URLs to our Git repos. There's also a bunch of recordings up on YouTube from our weekly meetings where folks have been demoing things like the LLVM Clang and lld integration, which instruments an entire Linux kernel build and can then be cross-referenced against known CVEs. Same thing for GCC and binutils; we have demos in Go and Rust. Bomsh is a really cool one that actually uses, I think it's ptrace. It can use ptrace to instrument from the outside.
One of the interesting things there is it can operate in a non-embedded mode and capture, for an existing Debian build, for a package you already have, if that package build is reproducible, the artifact dependency graph, and it matches the Debian package you already have. Other things that are sort of interesting are minutiae. Some of the things we discovered along the way: in the kernel you occasionally need to keep track of assembler. We already have support for assembler. So we've got broad support, either here or coming up, for a wide variety of languages in the community. A couple that I forgot to add here: we actually have some support coming up in Java and Python as well, that will actually be able to do things like answer questions like, what is the artifact dependency graph of my running JVM? And this can be crucial for something like Log4Shell, where the root of that CVE is in a single JNDI mumble-mumble .java file. But just because you have that jar doesn't mean you loaded that class. And if I've got 2,000 places where I think I've got Log4Shell and I've got 50 where I know I'm loading the class, which one should I remediate first? So my hope with this is to be able to correlate CVEs, when they're announced, to the files that cause them, like this Log4j version blah, blah, blah, or openssl.h version or hash blah, blah, blah. Deep, deep in dependency graphs that are far deeper than we are required to disclose in SBOMs, whether it's for a product or an open source project, and to enable that scanning to be done very efficiently, very effortlessly, by response teams with much higher signal to noise. One of the big complaints I hear from digital forensics and incident response teams is, gosh, we are drowning in SBOMs when there is a known issue. I get so many hits when I search my list of SBOMs, and most of them are false matches. How many took high school chemistry? So the question is, what's the molarity of your system? How many moles of CVEs do you have? I believe that at a macro level, we as an industry, as a community all doing open source and security, need to start thinking of vulnerabilities by their molarity and treating systems in very large bulk. Not looking for which one system is vulnerable today, but out of the millions of systems in my infrastructure, which subsets are affected by this specific thing? I need to pull those up to visibility with accuracy. So high signal to noise ratio is what I'm going for with this project. So how do I get from, I've got instrumented build tooling, it's generating OmniBOR graphs, how do I get from that to having an SBOM? Because I also do still need SBOMs for federal compliance. Well, hmm, that was a weird cut. Metadata, how are you gonna fill that one? Well, I mean, effectively, it comes down to understanding what an SBOM really is. And this was an observation by a colleague of mine, that an SBOM is a format for organizing metadata that describes the makeup of the software artifacts. It doesn't really point to the software artifacts per se. It tells you their name, tells you their license. Maybe who you got it from. Yeah, who to reach out to if you have trouble. And all this metadata is really important, but it's a description. It's sort of like telling you how to get to my house by saying, okay, so go down the road until you see the big tree, turn left, there'll be a McDonald's at some point on your right, go three streets past that, turn right.
And all that's actually super useful if you're really trying to get to my house, but it's not my address. Or if you're a legal team who needs to know where a certain artifact came from during a lawsuit, which is what SBOMs were originally about: licenses, not so much security. So with this, what we're doing is separating the metadata from the actual graph of the artifact itself. So effectively the SBOM itself ends up being metadata. So if I have an artifact, I can extract the input manifest ID. I can go look up its artifact dependency graph and know exactly what's in it. From that artifact dependency graph, I can go to a store of metadata. Some of that metadata may be relevant to SBOMs, things like what component name and version the leaf source code came from. Some of it may be security related. Some of it may be performance related. Some of it may be compliance related. Whatever kinds of information you want to know about what's in your source code can be stored as a mapping between these artifact IDs and input manifests that describe the graph, and the metadata that you need. And from there, you can then take that metadata and, for example, generate an SBOM. What about CVEs? Since my use case is getting better response when those happen. Well, if all of that is your metadata, and you also have a graph, and you know that this CVE comes from this specific artifact, well, you can scan really efficiently, find that, surface up that data, and then go from the CVE to the SBOM. Given an artifact with its input manifest ID, you can look it up in this ADG store. You can get the graph, look it up in the metadata store, and go, oh, that's also affected by a CVE. I think this was, did you switch it? Possibly. So effectively, it gives you the ability to know at fine granularity what you have. If you didn't build that file in, so let's say there's a bug in coreutils. How many people know that coreutils contains like 15 things? Right, so if I tell you there's a CVE in coreutils and you're only shipping one of the things in coreutils, you've got another false positive to add to the molarity list of the CVEs that you're going through, that we're all drowning in. Or I can know exactly whether that vulnerable file is in there. I could also go up the graph and note that even though this source code file is the source of the CVE, up the graph there's a patch that is known to contain the fix. And therefore we know that we have this fixed, that we're not actually vulnerable in spite of looking like we might be. So we all really want to know, am I safe from a CVE, or am I safe from a known vulnerable package? There's a slide I was looking for. Given that CVE, the next step will be annotating CVEs with the files that we know caused them. We can often do that by bisecting where they came into a project. And we actually already have tooling in the project that will go do this for you and spit out that mapping. And then given that CVE and that mapping, you can then look up in a metadata store and scan all the known artifact dependency graphs you've ingested into your systems and go, which of my artifacts that I'm using right now are correlated? Find a list of vulnerable artifacts that you are running, and then go look closer. There may be some reason why they might not be affected. Maybe they're not exposed to the network in a way that makes that CVE relevant. Who knows? But you have a really high correlation here that your response teams can then respond to efficiently.
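As a rough sketch of that triage flow, assume you keep two simple local stores: an ADG store keyed by input manifest ID, and a mapping from CVE IDs to the gitoids of the files known to cause them. The store shapes and function names below are hypothetical, just to illustrate the walk; they are not part of any published spec.

```python
# Hypothetical sketch of CVE triage over artifact dependency graphs.
# adg_store: manifest ID -> list of (input gitoid, embedded manifest ID or None)
# cve_to_gitoids: CVE ID -> set of gitoids of source files known to cause it

def gitoids_in_graph(manifest_id: str, adg_store: dict) -> set:
    """Collect every input gitoid reachable from this manifest, following
    embedded manifest IDs through the whole dependency graph."""
    seen, stack, found = set(), [manifest_id], set()
    while stack:
        mid = stack.pop()
        if mid in seen or mid not in adg_store:
            continue
        seen.add(mid)
        for input_gitoid, embedded_mid in adg_store[mid]:
            found.add(input_gitoid)
            if embedded_mid:
                stack.append(embedded_mid)
    return found

def artifacts_affected_by(cve_id: str, my_artifacts: dict,
                          adg_store: dict, cve_to_gitoids: dict) -> list:
    """my_artifacts: artifact name -> manifest ID extracted from that artifact.
    Returns the artifacts whose graphs contain a file implicated in the CVE;
    these are the high-correlation hits worth a closer look."""
    bad = cve_to_gitoids.get(cve_id, set())
    return [name for name, mid in my_artifacts.items()
            if gitoids_in_graph(mid, adg_store) & bad]
```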
So we'd love more folks getting involved and helping out in all kinds of ways, including more POCs for more languages. We'd love more help in different build ecosystems than we've been touching so far. I think we have a few minutes for questions. You? Yep, you. So I have a few questions. Let's say I have the artifact ID. Is there some open database where I can query it and see what it corresponds to? Or is the assumption that every company will have an internal database? And how can I convert the artifact ID to... There is not... This mic is much louder than that mic. There is not yet a public instance or open database of this, but I'm sure some interested company or companies or foundations might run one in the future. Okay, and... So really quickly, we have done some work on this. We have some work that hasn't yet quite made it out to the Git repo, that someone in the community did, where they indexed the entire Ubuntu repository in order to grab the mappings between Git object IDs for source code files and component name and version kinds of stuff. And it turns out that that's remarkably small, like on the order of 400 meg of JSON, like something you could load into memory. And so I expect that we will see the emergence of these public databases, but I also expect that you'll see internal databases develop as well, because you also end up with the same indexing behavior for the proprietary source code that you built into things, even though no one outside your company knows what that identifier means. And even if you are a provider of commercial closed-source software, you could publish the artifact dependency graphs publicly without disclosing any of your secret sauce. If at some point in the future a vulnerability were found in a component you're using, say, again, I'll use OpenSSL as an example, well, it's really not that big a deal for your customers to then map that hash from the CVE to the dependency graph that you gave them under NDA with your product and go, oh, hang on, I should call my supplier up because it looks like I need a patch. Okay, I have a few more questions. Other questions? I thought I saw a hand, but no. Okay. So what does adoption and tooling look like today? You mentioned a lot of use cases, like detecting CVEs, stuff like that. Do you already have the tooling for those, and how widespread is its use today? Like, if we start using it in our company today, can we just start replacing the existing tools, or is it still gonna take some time? So much of the code that we have right now is proof of concept code. There have been talks given to the LLVM community about this, and they were received quite well. The goal is of course to upstream this, so it simply pops into your favorite compiler. So the question becomes not one of what do we do to adopt it? It's, oh, it's here. And the goal really is to get upstream for most of these. Now, this is not always gonna be the answer for all the problems. We mentioned some cases with reproducing Debian builds that already exist, because there's a lot of brownfield in the world. But that is the direction that we're going as a community. A lot of what we're doing is prep work to move upstream. And so really the help, the call to action here, is if you're involved in any of these language or compiler or build tool communities, we'd love your help adding support for this artifact generation to those build tools, so that more companies can just get the benefit, and more developers can just get the benefit from it, without any effort on their part.
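To picture the mapping database mentioned in the first answer above, here is a small hypothetical sketch of how a gitoid-to-component index could be loaded and rolled up into name and version records, roughly the raw material for an SBOM; the JSON layout and function names are assumptions for illustration, not a published format.

```python
# Hypothetical gitoid -> component metadata index, e.g. built by indexing a
# distro's source packages. The layout is illustrative only.
import json

def load_index(path: str) -> dict:
    # e.g. {"<gitoid>": {"name": "zlib", "version": "1.2.13", "license": "Zlib"}, ...}
    with open(path) as f:
        return json.load(f)

def components_for(gitoids: set, index: dict) -> set:
    """Roll a set of gitoids (say, from an artifact dependency graph) up into
    the distinct (name, version) pairs known to the index."""
    found = set()
    for g in gitoids:
        meta = index.get(g)
        if meta:
            found.add((meta["name"], meta["version"]))
    return found
```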
Thank you. What kind of hash, hash sum, are you using? Like SHA-1? We support the same format as Git. So SHA-1, with SHA-256 optionally flipped in there. And you can have both in the same artifacts. Yeah, so for those who aren't familiar, Git object IDs are just a header that gives you the type of thing it is, a blob, and the size of it, and a null character, and then the contents. And then Git will compute either a SHA-1, or there's support now for SHA-256, which is very, very narrowly deployed. We actually support both in parallel so that we're future proof, and so that anyone who would like to whine at us about shattered.io can basically be told that, yes, we also have SHA-256 things in play. Yeah, because my concern was that the SHA-1 for Git works because that SHA is local to the repository, but in your case, that will be the global identifier across many, many repositories. It's not that it works because it's local to the repository, not when you're talking about blobs. No, because there are already known collisions between different repositories with the same hash. So if you're going to use that SHA, it may already map to different things; there's a collision across repositories. Let's talk afterwards. I would love to hear more about the collisions. The last article I saw that GitHub published indicated they had seen no collisions in the wild. I was reading a blog post that there was a collision. I can look it up. I would very much like to know if that's true, but as of about a year ago, when GitHub was publishing on this, they were basically saying that in their total store, they have never seen a collision in the wild. Okay. The blog post that I saw, which might or might not be the one you're referencing, did point out that it is possible to artificially craft collisions, but in the wild, in most file formats, they don't happen. I think we are, are we at time? Do we have time for one more question? I have one more, if... Okay, no one else has come in the door, so I'll take one more question. Yeah, I will go. So in the diagram you showed for the input manifest, there is a C file and a header file, they go into the build tool, and the .o file is the output. But the output is also influenced by the build tool itself and the environment of that build tool. So I guess you can capture the ID of the build tool itself, but how do you capture the environment? So, you're fun. This is an ongoing question. This is a very astute question and an ongoing discussion in our spec design community, to figure out the best way to do that. Okay, so it's still, yeah, it's still open. So one thing that's been discussed is, the build tool itself is literally the authority on what the inputs were. There has been some discussion of having a document that allows the build tool to identify itself, this is my name, and optionally its own input manifest ID, if it has one, as a node that it puts in the tree. So .c files, .h files, and the tool descriptor file. And then also to capture what it considers to be its non-ephemeral configuration information for that build step. And by non-ephemeral, I mean there are certain things that really don't matter. So the -I flag in a C compiler, for example, is relatively immaterial: whether the header files are in my home directory versus your home directory, we don't want that perturbing the build. But questions like what macros are set are very, very important.
And that may also be something a build tool could choose to capture in such a node. But for the purpose of reproducibility of builds, if the build tool's own input manifest is encoded in the output, and you and I use different build tools that would otherwise produce the same outputs, we've now broken reproducibility. I can no longer verify your build. So this is a complicated edge. And like I said, we're still discussing how best to do this, because there seem to be trade-offs in both directions. Thank you. An edge with no small amount of philosophy involved.