Yes, I'm here. Sorry, just getting off mute and getting the video set up. Hey everyone, thank you, David, and thanks, everyone, for joining this morning. I'll get these slides up so you can all see them. There was a question in the chat about availability of the slides: yes, my slides should be available for download on the event website. Bear with me just one moment... and there we go. All right, as David said, I'm going to be talking about generating software bills of materials for embedded devices and IoT at build time. David, I really liked the supply chain integrity map slide you showed, with the different stages of the software development process. In that map, this fits into, I think, the third section: the build and verification stage. This talk is about having an ingredients list for the software you're building, and specifically about generating that ingredients list at build time, so that you have it available to distribute along with your software right when you build it.
So the specific context we're going to be talking about is a project called Zephyr. Zephyr is a Linux Foundation project that we support. It's a real-time operating system for embedded and resource-constrained devices, written mostly in C, and its build infrastructure sits on top of a couple of existing tools. It relies heavily on CMake to manage builds and to generate its Makefiles or other build files, and it uses a tool the Zephyr community has built called West, a meta-tool for running builds as well as other actions, such as taking the generated binary and deploying it onto devices. At a high level, the build process involves compiling the source files of the Zephyr operating system itself, compiling the source files of your own application that runs on top of that operating system, statically linking them together into one binary blob, and then deploying that binary blob onto a device. It's that process, the fact that instead of a full distribution of software we're generating a single binary blob, that makes a software bill of materials particularly helpful, because it's that much harder to go back after the fact and figure out what's in that binary. We'll talk about that more as we go on.
I'm not going to spend a lot of time on this, but to help orient folks who aren't familiar with Zephyr, here's a very high-level view of the build process, which should look familiar to anyone who's done similar development. On the upper left-hand side of the slide are the source files that make up the Zephyr operating system itself: a variety of C source files and the headers they include. On the upper right-hand side is the application you're building to run on top of Zephyr, again with sources and included headers; some of those headers come from the application you've written, some come from Zephyr, and some come from the Zephyr SDK if you're using it. Everything at the bottom is what gets generated as build products: the libraries that get built and then linked together into that final binary blob. We'll come back to this slide later when I get to some of the specifics. Basically, the upper half holds the different sources, which can be split into logical groups, and the bottom half holds the build products and results of the whole process, ending in that binary blob. I see a number of questions coming through in the chat. Jillian, please jump in if there are any I should answer as I go; otherwise there should be time for Q&A at the end. All right, here's another view of the build process, which we'll also come back to. Because Zephyr uses CMake, CMake defines a variety of targets that represent the different stages in the build process. Some of those are utility stages.
Others take actual sources and build them into library files or other build products that then go on to the next stage, again ending in that final binary. The goal for all of this, as David was saying earlier, is to shine some sunlight into what's in the binary you get at the end of the day: to be able to tell easily, after the fact, what pieces and which specific source files went into that binary blob. As a Zephyr developer does development, they're of course changing the sources. Because Zephyr is open source, they might be modifying the operating system files themselves; they'll certainly be modifying their own application; and they'll also be pulling in new versions of the Zephyr sources, and so on. So the source files they build from are constantly changing, and being able to tell six months, two years, or five years later which specific versions of which source files went into a particular binary blob is challenging. The fact that it's a single binary also makes post-build analysis challenging. For other kinds of software, systems that aren't built for embedded, resource-constrained devices, you might not have a single binary blob.
You might have something more like a file system, or a system built on top of an ecosystem like npm or PyPI, where there's a setup for declaring and versioning dependencies. Here, what you're really interested in is the specific source files that go into a single binary at the end, so post-build analysis can be challenging. One thought is that you certainly could, and probably should, keep the source files around for anything you build, and certainly for anything you distribute to customers or end users; it makes sense to keep an archive of those specific source files. The problem with saying "we'll archive the sources, and that takes care of everything" is twofold. First, you'd need to trust that those sources hadn't been modified after the build, that is, that the ones you archived are the exact versions you used to build that binary. Second, sources are hard to interchange or review. Even if you distribute your sources to customers, it's still harder to quickly tell whether this is the vulnerable version of the sources by reviewing the sources themselves than by reviewing a hash that's in metadata. So the goal here is to create a software bill of materials, a record of metadata about the build process, for Zephyr applications at build time, and we want that SBOM to express metadata about file hashes, about licenses, and about the relationships between the sources, the intermediate build artifacts, and the final binary. As for my own side: I'm an attorney with the LF, and I also do a decent amount of development on the side, but my interest is particularly in the licensing side. How do we track what the open source licenses were for the source files?
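To make the point about hashes concrete, here is a minimal Python sketch of checking an archived source file against digests recorded at build time. The helper names `file_digests` and `matches_recorded` are hypothetical, and recording both SHA-1 and SHA-256 is an assumption for illustration, not necessarily what the Zephyr tooling records.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Sketch only: helper names are hypothetical, not taken from the Zephyr tooling.


def file_digests(path: Path) -> dict[str, str]:
    """Compute SHA-1 and SHA-256 digests of a file, reading it in 64 KiB chunks."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return {"SHA1": sha1.hexdigest(), "SHA256": sha256.hexdigest()}


def matches_recorded(path: Path, recorded: dict[str, str]) -> bool:
    """True only if every digest recorded at build time still matches the file on disk."""
    current = file_digests(path)
    return all(current.get(alg) == value for alg, value in recorded.items())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.c"
        src.write_text("int main(void) { return 0; }\n")
        recorded = file_digests(src)  # what the SBOM would store at build time
        print(matches_recorded(src, recorded))  # True: archive is untouched
        src.write_text("int main(void) { return 1; }\n")
        print(matches_recorded(src, recorded))  # False: archive was modified
```

The check is only as trustworthy as the recorded digests, which is why generating them at build time, rather than from the archive later, matters.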
And how did those licenses flow through to the binaries? As part of building this and gathering that data, we're also gathering other details: the hashes, the relationships, and so on. That's going to be of particular interest on the security side of things, shining transparency onto which versions were shared. So the specific goals here are, first, to do this at build time, not after the fact. Second, to make it fully automated, so that it's as painless as possible for a Zephyr application developer: as simple as possible for them to generate an SBOM without having to use third-party tools or take any significant additional steps in their build process. Third, we want to try to do it without leveraging external knowledge sources. There are a variety of tools out there, both proprietary and open source, that can generate SBOMs and pull in more information about what you're building. Those are fantastic tools and they should be used, but part of my goal here was to keep this as limited as possible, so that it happens entirely within the Zephyr build process itself. And finally, as I touched on before, we don't want to rewrite existing build systems. We just leverage what's already there, as an additional extension someone can use if they want to, without having to muck up their own build process to do it. I touched on this last point already: because of the nature of the Zephyr project, we're really not focused on third-party dependencies such as JavaScript or Python modules. It's focused on the Zephyr source code itself and the application someone builds on top of that RTOS. The format we used for the SBOM is SPDX, which David touched on
earlier. For those of you who aren't familiar with it, SPDX is a language for expressing software composition metadata, including components, licenses, and so on. It's currently in the ISO publication process, and I think it's moving toward full publication as an ISO standard. The simple way to do this, the starting point or "version zero" approach, is to just run the build, then scan all of the source files and all of the build products that get generated, computing their hashes and scanning them for license information where there is any, and assemble that data into a tag-value SPDX document. You can do this pretty easily by leveraging other existing tools from SPDX, and I've got a link here to an example of doing so. So this is the simpler, naive approach, and it's a good one: if someone wants to use one of these external tools, they can pretty easily add it to their build process. It does have drawbacks, though. One is that it's an additional step to take after the build, using separate tooling outside the build process. The other, which I've got some more detail on here, is that an external tool likely won't know much about the specifics of a project's build process, so it may take additional effort to distinguish between sources and build artifacts. It also might not tell you much about which sources are built into which binaries, or what the different stages of the build process are.
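As a rough illustration of that "version zero" approach, the Python sketch below walks a directory after a build, hashes every file, collects license IDs from SPDX-License-Identifier lines, and emits a single tag-value document. It is not the SPDX tooling linked from the slide: the `naive_sbom` and `license_ids` names, the example.com namespace, the creator string, and the SPDX-2.2 version line are all assumptions, and the output has not been run through an SPDX validator.

```python
from __future__ import annotations

import hashlib
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical post-build scanner; not the SPDX tool linked from the slide.
TAG = "SPDX-License-Identifier:"
OPERATORS = {"AND", "OR", "WITH"}


def license_ids(data: bytes) -> list[str]:
    """Collect license IDs named on SPDX-License-Identifier lines.

    Operators and parentheses are dropped; WITH exceptions are not treated specially.
    """
    ids: set[str] = set()
    for line in data.decode("utf-8", errors="replace").splitlines():
        _, found, rest = line.partition(TAG)
        if found:
            for token in ("*/", "(", ")"):
                rest = rest.replace(token, " ")
            ids.update(tok for tok in rest.split() if tok not in OPERATORS)
    return sorted(ids) or ["NOASSERTION"]


def naive_sbom(root: Path, name: str) -> str:
    """Hash and license-scan every file under root; emit one tag-value document."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    files = sorted(p for p in root.rglob("*") if p.is_file())
    file_ids = [f"SPDXRef-File-{i}" for i in range(len(files))]
    lines = [
        "SPDXVersion: SPDX-2.2",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {name}",
        # Placeholder namespace; a real tool would use a URI it controls.
        f"DocumentNamespace: https://example.com/spdxdocs/{name}-{uuid.uuid4()}",
        "Creator: Tool: naive-scan-sketch",
        f"Created: {created}",
    ]
    lines += [f"Relationship: SPDXRef-DOCUMENT DESCRIBES {sid}" for sid in file_ids]
    lines.append("")
    for sid, path in zip(file_ids, files):
        data = path.read_bytes()
        lines += [
            f"FileName: ./{path.relative_to(root).as_posix()}",
            f"SPDXID: {sid}",
            f"FileChecksum: SHA1: {hashlib.sha1(data).hexdigest()}",
            "LicenseConcluded: NOASSERTION",
            *(f"LicenseInfoInFile: {lic}" for lic in license_ids(data)),
            "FileCopyrightText: NOASSERTION",
            "",
        ]
    return "\n".join(lines)
```

Calling `print(naive_sbom(Path("build"), "my-app"))` on a hypothetical build directory prints the document. Every file gets identical treatment: nothing here tells sources apart from build artifacts, which is exactly the limitation of the naive approach.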
It's more just a snapshot of everything that came out at the end. So this can be a good starting point, but we wanted to go a step further and provide more detailed knowledge about the build process and the relationships between its stages. What we decided to do was leverage metadata that CMake provides. Because Zephyr already uses CMake as part of its build infrastructure, we looked at how to take the metadata CMake already gives us, reformulate it, and express it as SPDX in an SBOM format. CMake has a really useful feature called its file-based API. Essentially, if before running CMake you create an empty file at a well-defined location under the API's v1 query directory path, and then run CMake as usual, then in addition to generating all of the build files it normally produces, CMake also outputs JSON metadata about what it's doing and about what the different build sources and build artifacts are going to be. It gives data about the different targets, which are the build stages, and then provides other JSON files (I realize this might be hard to see on the slide) describing which sources are being used.
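A minimal Python sketch of that file-API flow, assuming a single-configuration generator such as Ninja: create the empty shared query file `.cmake/api/v1/query/codemodel-v2` in the build directory before CMake configures it, then, after CMake runs, follow the reply index to the codemodel and the per-target JSON files. The paths and JSON keys (`reply`, `jsonFile`, `configurations`, `targets`, `sources`, `artifacts`, `dependencies`) follow CMake's file API documentation for codemodel version 2; the helper names are hypothetical, and this is not necessarily how west spdx reads the data.

```python
from __future__ import annotations

import json
from pathlib import Path

# Sketch of CMake's file-based API; helper names are hypothetical.


def api_dir(build_dir: Path) -> Path:
    return build_dir / ".cmake" / "api" / "v1"


def request_codemodel(build_dir: Path) -> Path:
    """Create the empty shared query file asking CMake for a codemodel-v2 reply.

    This must exist before CMake configures the build directory.
    """
    query = api_dir(build_dir) / "query" / "codemodel-v2"
    query.parent.mkdir(parents=True, exist_ok=True)
    query.touch()
    return query


def load_targets(build_dir: Path) -> list[dict]:
    """After CMake runs, follow reply index -> codemodel -> per-target JSON objects."""
    reply = api_dir(build_dir) / "reply"
    # Stale index files may remain; the current one has the largest name.
    index_file = max(reply.glob("index-*.json"), key=lambda p: p.name)
    index = json.loads(index_file.read_text())
    codemodel_file = index["reply"]["codemodel-v2"]["jsonFile"]
    codemodel = json.loads((reply / codemodel_file).read_text())
    # Assumes a single-configuration generator, so only one configuration.
    config = codemodel["configurations"][0]
    return [json.loads((reply / t["jsonFile"]).read_text()) for t in config["targets"]]


def summarize(target: dict) -> dict:
    """Keep the fields an SBOM generator needs; sources/artifacts may be absent."""
    return {
        "name": target["name"],
        "type": target.get("type"),
        "sources": [s["path"] for s in target.get("sources", [])],
        "artifacts": [a["path"] for a in target.get("artifacts", [])],
        "depends_on": [d["id"] for d in target.get("dependencies", [])],
    }
```

In use, `request_codemodel(build)` runs once before configuring, the build proceeds normally, and `[summarize(t) for t in load_targets(build)]` then yields, per target, the sources going in and the artifacts coming out, which is the raw material for the relationships in the SBOM.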
Which command-line arguments will be used to compile those sources? Which libraries are the sources being built into? CMake gives you all of that as JSON, for free, if you configure it to use this API. What we did for Zephyr, and for the West tool the Zephyr community has built, was essentially to add an extension on top. Steps two and three here are what anybody would do when using West to build Zephyr products: they make a call to West (I think it's west build), which runs CMake; CMake generates the build files; and those build files are then used to generate the Zephyr artifacts, just as usual. What we added is an extension called west spdx, which enables CMake to generate those JSON files and then, after the build is done, processes them and does some further analysis to generate the SPDX files. In the interest of time I won't go too far into the details, but essentially the west spdx process parses the JSON metadata files we get from CMake and looks at which sources and which build artifacts CMake says it's building. It analyzes them, and it can optionally do some dry runs of compiling with those particular command-line flags to get more data about which header files get included. Then it processes everything into SPDX data. It generates an SPDX ID for each package that's built, for each source file, and for each binary file being processed. Along the way, it scans those files for SPDX-License-Identifier statements. Because the Zephyr community has added those to most of its source files, we get information about the licensing of those files essentially for free. At the end, it creates SPDX documents to express that metadata. So, going back to this slide real
quick: remember, this is where we broke out the different source files into logical components (the Zephyr operating system, the Zephyr SDK if you're using it, and your own application) and then, at the bottom, the build artifacts. Each of those is essentially a separate block, and what we do is generate an SPDX document for each one, then use the functionality built into the SPDX language to link them together. Here's an example of that. SPDX does this through what it calls relationships. In the Zephyr SPDX document, which contains the data about the Zephyr sources, for each source file (the file itself is shown on the right) we create an SPDX metadata record (shown in the middle), give it an SPDX ID, and include it in that Zephyr SPDX document. Then, at the bottom, in the build artifacts SPDX document, where we want to indicate that a particular build artifact is generated from one or more source files, we can link out by reference and create a relationship expressing that it is generated from that source file. I'm not going to spend time on the SPDX specifics here, but if you want to look at the slides, they show the details of the SPDX tag-value format: how, in the build artifacts SPDX document, to make a reference to the external document for the Zephyr sources, define an ID for it, and then write relationships that reference that externally defined document and the metadata within it. Ultimately, what this gives us is automatic generation of SBOMs in SPDX tag-value format, expressing the metadata we talked about earlier: which source files are used to build which binaries, and the hashes for those source files and for the
binaries, so that we know exactly which source files were used, plus information about the licenses for those source files, which we get from the SPDX-License-Identifier statements the Zephyr community has added. As for areas for improvement: part of what I'm hoping to do is align this more closely with the NTIA's recommended SBOM fields, to make sure we cover all of those appropriately. We're also looking to understand and cover a broader set of the community's build cases. What we've done here really focuses on the default process of using West and the Zephyr SDK, and assumes those are in place, which they will be for many developers; we'd like to see whether we can also cover use cases where someone isn't using just those built-in tools. And we're looking to extend the licensing-related functionality, gathering other information like copyright notices. As for takeaways: this has been focused heavily on Zephyr specifically, and on IoT and embedded contexts generally, but I think what we learned is that SBOMs really can be generated as part of the build system for any sort of software development. Generating SPDX SBOMs at build time can, ideally, take minimal effort from the developers using that platform, who can then make them available to downstream consumers of a product. It can be done by leveraging existing build infrastructure. Particularly for free and open source projects, I'd encourage building this into the project's own build infrastructure. Doing it at the project level enables all the downstream repackagers, application developers, and so on, essentially for free or at very little cost, to be
able to generate SPDX SBOMs for their own deployments, products, services, and other development on top of the project. That improves the entire ecosystem, because it makes it easy to generate this metadata, archive it, and distribute it in a way end users can ultimately consume. Particularly for embedded devices, I think build time really is the right time to gather this metadata: the moment you're actually building the binary is the point at which you know the most about what's going into it. Generating and archiving that metadata gives both you and your downstream users and consumers knowledge that would be extremely hard to retrofit and gather after the fact. One lesson from this is that, in SPDX, the mechanism for linking multiple documents and the relationship syntax are both expressive and flexible, so you can express a lot of different facts about the build process. And finally, I'd encourage projects, whether open source or proprietary, and anyone looking to integrate a process like this into their own build infrastructure, to start small and improve over time. Start with the simplest possible approach to creating an SBOM as part of your build system, then add to it: get more expressive about relationships between different elements, add more information about dependencies or other elements, and grow it over time. Starting small, with something that just generates a basic SBOM, is enough to begin with. And with that, I guess we've got a bit of time for Q&A.
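As a sketch of how compact that cross-document linking is in tag-value form, the Python below sanitizes paths into unique SPDX IDs (`SPDXRef-` followed by letters, digits, `.` and `-`), emits an `ExternalDocumentRef` line that gives the sources document a local alias, its namespace, and a SHA1 checksum of that document, and emits `GENERATED_FROM` relationships whose targets use the `DocumentRef-<alias>:SPDXRef-<id>` form. The function names, the `File-` naming scheme, the `zephyr` alias, and the example.com namespace are hypothetical; this is not the west spdx code.

```python
from __future__ import annotations

import hashlib
import re
from pathlib import Path

# Sketch only: function names and the ID naming scheme are hypothetical.
# SPDX IDs are "SPDXRef-" followed only by letters, digits, '.' and '-'.
_INVALID = re.compile(r"[^A-Za-z0-9.-]+")


def spdx_id(kind: str, name: str, used: set[str]) -> str:
    """Sanitize a name into a unique SPDX ID, recording it in `used`."""
    base = "SPDXRef-" + _INVALID.sub("-", f"{kind}-{name}").strip("-")
    candidate, n = base, 1
    while candidate in used:  # different paths can sanitize to the same ID
        n += 1
        candidate = f"{base}-{n}"
    used.add(candidate)
    return candidate


def external_ref_line(alias: str, namespace: str, doc_path: Path) -> str:
    """Give another SPDX document a local alias; its SHA1 pins the exact document."""
    sha1 = hashlib.sha1(doc_path.read_bytes()).hexdigest()
    return f"ExternalDocumentRef: DocumentRef-{alias} {namespace} SHA1: {sha1}"


def generated_from(artifact_id: str, alias: str, source_ids: list[str]) -> list[str]:
    """One GENERATED_FROM relationship per source defined in the external document."""
    return [
        f"Relationship: {artifact_id} GENERATED_FROM DocumentRef-{alias}:{sid}"
        for sid in source_ids
    ]
```

A build-artifacts document would place the `external_ref_line(...)` output in its header and then list `generated_from(...)` lines for each binary, so a reader can resolve every source ID against the separately published sources document.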
So, David, feel free to jump in with questions, and I'll also take a look at the chat to see what questions we've gotten. Yeah, take a look at the chat in particular. We've got a question that's been hanging for a little bit: does the build SPDX capture the SBOM for the tooling and build environment? In the general case you were describing it obviously could, but the question is whether it does in your particular case. Yeah, it's a good question. We're at the early stages of doing that, so it captures a little bit. Let me go back here. For instance, if someone is using the Zephyr SDK, there's an optional flag that will capture information about headers being included from the SDK. At this point it doesn't do more than that: it doesn't record which particular version of the SDK you're using, even though the SDK also encapsulates the various tools, such as the compiler and linker, used for your particular build. My hope is to extend it to capture that information too, and I think that could be added relatively easily, particularly where someone is using the Zephyr SDK, by recording which version of the tools they used. But at the moment, no; we're focused primarily on the sources and the headers being included. Yeah, keep going through the chat; I think we've got some other interesting questions. Yeah, so let's see.
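Recording the toolchain is described above as a possible future extension, not something the tool does today. A minimal Python sketch of what it might look like, under stated assumptions: ask the tool for `--version` (which GCC, Clang, and binutils accept), keep the first line as a crude version string, and emit an SPDX package stanza linked to the built binary with `BUILD_TOOL_OF`. The helper names and the `SPDXRef-Tool-` naming scheme are hypothetical.

```python
from __future__ import annotations

import subprocess

# Hypothetical extension sketch; not part of west spdx.


def tool_version(tool: str) -> str:
    """First line of `<tool> --version`, or NOASSERTION if it prints nothing."""
    out = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return out.splitlines()[0].strip() if out.strip() else "NOASSERTION"


def tool_package(tool: str, version_line: str, built_id: str) -> list[str]:
    """Tag-value package stanza for a build tool, linked to what it built."""
    safe = "".join(c if (c.isascii() and c.isalnum()) or c in ".-" else "-" for c in tool)
    spdx_id = f"SPDXRef-Tool-{safe}"
    return [
        f"PackageName: {tool}",
        f"SPDXID: {spdx_id}",
        f"PackageVersion: {version_line}",  # crude: the whole first --version line
        "PackageDownloadLocation: NOASSERTION",
        "FilesAnalyzed: false",
        "PackageLicenseConcluded: NOASSERTION",
        "PackageLicenseDeclared: NOASSERTION",
        "PackageCopyrightText: NOASSERTION",
        f"Relationship: {spdx_id} BUILD_TOOL_OF {built_id}",
    ]
```

For example, `tool_package("gcc", tool_version("gcc"), "SPDXRef-File-zephyr.elf")` would describe the compiler used for a hypothetical `zephyr.elf`. Parsing a cleaner version number out of that line is left open, since `--version` output formats differ between tools.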
Here's a question about how the mechanism copes when external libraries are included in build systems. That's something I haven't focused on as much here for Zephyr, but in many contexts it's going to be highly relevant to people's build infrastructure, particularly when you pull in third-party dependencies, whether from external libraries or from a third-party package ecosystem such as npm or PyPI, whatever your particular ecosystem is. Pulling in that metadata is going to be highly relevant, and I think what we'll hear, probably from Thomas in the next session and about some of the other tools later today, will focus more on that situation, where you're pulling in information about the third-party libraries and dependencies that get included. I see there's a question about including headers as an optional step, and how important it is to track header files in the SBOM. I think that depends on context and on the particular developer, whether they see it as important. From my view, it's important to include if you can, because it's more information; it's more that you know about the build. A particular application developer might decide they want a smaller SBOM and don't want to cover that sort of information. That's a trade-off they'll make, but having the functionality there as an option means it's available if someone chooses to use it. Okay, any other... let's see. Another question just came in. David, feel free to cut me off when we're getting close on time; I know we've got a couple more minutes, so I'll carry on. So let's see.
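For the optional header tracking, one possible technique (not necessarily the one west spdx uses) is to re-run the compiler with a target's compile flags plus the GCC/Clang `-M` option, which prints a Make-style rule listing every file the source pulls in; `-MM` would omit headers from system directories. The Python sketch below only builds that command and parses its output; the helper names are hypothetical, and escaped spaces in paths are not handled.

```python
from __future__ import annotations

# Sketch only: helper names are hypothetical.


def dependency_command(compiler: str, flags: list[str], source: str) -> list[str]:
    """Compiler invocation that prints a Make rule of the source's dependencies."""
    # -M makes GCC/Clang preprocess only and emit the dependency rule to stdout.
    return [compiler, *flags, "-M", source]


def header_deps(make_rule: str) -> list[str]:
    """Parse `cc -M` output into an ordered, de-duplicated list of included files.

    Escaped spaces in paths are not handled in this sketch.
    """
    joined = make_rule.replace("\\\n", " ")  # fold backslash-continued lines
    deps: list[str] = []
    for line in joined.splitlines():
        _, sep, rhs = line.partition(": ")
        if not sep:
            continue  # blank lines, or phony "header.h:" rules from -MP
        for dep in rhs.split():
            if dep not in deps:
                deps.append(dep)
    # The first prerequisite of the object rule is the source file itself.
    return deps[1:]
```

The rule text would come from something like `subprocess.run(dependency_command("gcc", flags, "src/main.c"), capture_output=True, text=True, check=True).stdout`, with `flags` taken from the CMake compile-command fragments. Making this optional keeps the default SBOM smaller, which matches the trade-off described above.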
There's one that just came in about the architecture and the linking flow: "When interacting with product teams, they don't maintain these high-level or low-level architecture diagrams. How do you deal with such scenarios, or convince different teams that it's important to emphasize this for binary analysis?" I think here I'd take a step up and go back to the slide David showed originally in his presentation, about the different stages of development, going from the individual developers to the build and release teams to the other stages and teams within an organization that are involved in building software. There are a couple of pieces to that. One is that I've been thinking and talking about this process largely from the perspective of an organization that develops a product or service and then deploys it to end users. But even within an organization, the "end user" can be the next step in the chain, such as going from the product developers to the build and release team. So if this is part of the build system, able to generate these files and produce SBOMs, the next step in the chain can also consume those SBOMs and make use of them. As for convincing product teams that this is important, I think that ties into the broader message of what we're talking about today: the importance of understanding the ingredients of your software, and of having the tooling be smart enough to make decisions about them, so that the process can be more and more automated. If I may jump in, Steve: automate, automate, automate. If it requires people to take special steps every time, that's a lost cause and, frankly, rather insane.
I think if it's part of the build process, they build and things happen. And I noticed that one of the last words Steve mentioned was "automate." Yeah, and just echoing that, that was really the goal of this: to make it as low-touch for Zephyr developers as possible. It can be as simple as calling west spdx at the same time you run west build, and nothing more; there are no external tools to deploy or call separately. There's a question in here about SPDX being focused on RTOSes and binaries: does it support the container image supply chain? I'll say that if this presentation is the only thing, or the first thing, you've seen about SPDX, you might get the impression that it's focused on RTOSes, and it really isn't. It's a broader language for expressing metadata about software composition generally, and it can cover a variety of use cases. A lot of folks are looking at it for containers, and I think we'll be hearing more about that later today. So yes, it's much broader than embedded and real-time operating systems; there are a lot of use cases it can cover that aren't focused on that at all. All right, I think we're a few minutes ahead, so, David, unless you have any other questions, maybe let's give the time back and stay a bit ahead. Yeah, I think that would be wise. So, Steve, thank