So, welcome back everyone. Although I said before the break we'd start a little bit early, we decided to go ahead and start on time so that those joining us at the published time wouldn't miss out. All right, so with that, I would like to introduce Joshua Watt. He's going to talk about the software supply chain with the Yocto Project. Please take it away.

All right, hopefully you can all see that. There's a black spot in the middle. Did it move? The right-hand side moved, but there's a square in the middle that we can't see through. Okay, sorry. Whoops. Did it go away? It went away. Everything is great. Thank you so much. Sorry about that, I didn't get a chance to try it out beforehand.

All right, so my name is Joshua Watt, and today I'm going to talk to you about the software supply chain with the Yocto Project. A little bit about myself: I've been working at Garmin since about 2009, and at least in my department we've been using OpenEmbedded and the Yocto Project since 2016. I'm a member of the OpenEmbedded technical steering committee, and if you'd like to contact me, there are a whole bunch of ways you can do that.

So if you're unfamiliar with either the Yocto Project or OpenEmbedded, they're two organizations that work together to help you build your own Linux distributions. That's the usual way to describe them: primarily, but not exclusively, focused on embedded projects. OpenEmbedded is a community-driven project that provides a lot of the core technologies, such as OpenEmbedded-Core and the build system, which is called BitBake. The Yocto Project is a Linux Foundation project that provides a lot of resources to ensure the project keeps running smoothly: they run a lot of the QA tests, do some of the release scheduling, provide documentation, and things like that.

So why is the software supply chain important? I think we all have a pretty good idea of why we care about this. At the end of the day, we're using these big binary blobs, or distributing them to our customers, and they can be opaque. We want to know what's actually in this big binary blob that we're using or shipping: what software is in there, where did that software come from, and what version of that software is in this thing? It's also very important that we know whether we're complying with all of the software licenses that might be at play. We also really want to know if it's been tampered with; it's important to know that the software hasn't been changed, either maliciously or unintentionally, in a way that would cause it to misbehave or do something it's not supposed to do. And we want to make sure we're not vulnerable to exploits, that we're not exposing ourselves or our customers to risk.

So I'm going to address a whole bunch of ways that we handle the software supply chain within the projects. I'm going to talk about the general OpenEmbedded build flow that's driven by BitBake; software bill of materials (SBOM) support, which we're working very hard to get added to the project very soon; reproducible builds; CVE checking; and finally something called buildtools.
So yeah, we'll start with the OpenEmbedded build flow. Basically the point of building with OpenEmbedded is to build what we call target images from source code. The way this generally works is you've got a whole pile of source code, you feed it into this magic tool called BitBake, and it spits out a target image. And for most users of the project, this is how they interact with it: they provide source code and some metadata, feed it into this tool, and out comes a target image.

But when we say target image, that's not exclusively the thing you would flash to an SD card and run on something like a Raspberry Pi, although we do support probably thousands of embedded devices where you can do that type of thing. It could also be firmware for a microcontroller; I believe there's support for building Zephyr project images and things like that. It could be something you could boot on a desktop PC if you installed it on a hard drive. There are also several virtual machine image formats that are supported, so you can boot images into QEMU to try things out or just run them in an emulated environment. You can even produce OCI-compliant container images, if you want to do that type of thing, and run those in Docker or Podman or whatever container runtime you're interested in.

And it could be something that you wouldn't traditionally think of as an image at all. The project supports three different packaging formats: ipk, deb, and RPM. So your target image could actually be a package feed for all of the packages you've built, and you could point a device at that package feed and update software or install new packages.

Then there are also a whole bunch of other things the project can build. There are SDKs, which allow you to compile software against a given image you've previously produced. There's the extensible SDK, which lets you do even more on top of the regular SDK, like extend images and things like that. And then there's the buildtools tarball, which I'll talk a lot more about later.

All right, so I'm going to give you an overview of a highly simplified build flow that gets across the general way stuff is built within the project. What we start with is our host tools, over here on the left. That's the minimum set of tools required by the project in order to compile stuff on your local host: your local GCC, your local binutils, maybe a couple of other utilities, and Python, to run BitBake itself. And then up here we've got the source code that we want to compile, and recipe files that have metadata about that source code: how to download it, how to compile it, how to package it, and so on.

So what BitBake is going to do is take those host tools and use them to process some source code and some metadata and produce what we call native tools and cross-compilers. These are tools designed to run on your host system but that might target something else, whatever your target image might be.
So these would run on, for example, your desktop PC, but might target Arm or something like that. The important thing here is that we're producing a lot of the native tools that we'll need later in the build, so that they don't have to be part of your host tools. We don't require a specific version of GCC to be installed on your host system, at least for the cross-compiler, because we build it ourselves, so we can control exactly what version it is.

Then we take those native tools and cross-compilers and process some more source code and recipe metadata, and that produces target packages. These are things designed to run on whatever your target is, whether that's QEMU or your container host or whatever. And finally, we have yet more recipe metadata that describes how images are created. This pulls in all of the target packages you previously created and produces whatever your image is: your container image, the SD card image you want to flash onto your Raspberry Pi, or just the index file for your package feeds. So that's how the general build flow works.

Now, the way that BitBake tracks the dependencies between all these things is through a sophisticated method of hashing. The way this works is that all of the things that feed into a given piece of metadata are hashed together into a single hash that represents that particular metadata. So for example, the hash of the source code, say a SHA-1 of the source code, feeds into the recipe metadata, and then all of the recipe metadata itself gets hashed together to produce a single hash. Likewise, for a target recipe, its source code gets hashed in, but it also pulls in the hashes of any upstream recipes it depends on to produce its own hash. So this creates a chain of hashing, where each recipe depends on the hash of the previous recipe.

Just as an example, if the source code for a recipe changes, that triggers all of the downstream things to get rebuilt, because all of those hashes change. The native tool or cross-compiler that recipe describes changes because the recipe's hash changed; the target recipe gets a new hash, which rebuilds the target packages; and that causes the final image to be rebuilt.

And because we have this rather sophisticated method of hashing and tracking all of this stuff, we get pretty good traceability just from the way the build system works, the way that BitBake operates, because we can take a target image and trace back through all of these hashes to the actual recipe metadata that generated it, and even back to the source code that we built as part of that recipe. So even before we start talking about SBOMs, we're already baking in this concept of tracing things back to where they came from, within the build system itself.
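To make that hash chaining concrete, here's a small illustrative Python sketch of the idea. This is not BitBake's actual implementation (real task hashes fold in many more inputs, and the function here is made up for illustration); it just shows how a change anywhere upstream cascades downstream:

    import hashlib

    def task_hash(metadata, source_hash, dep_hashes):
        # Illustrative only: fold a recipe's metadata, its source hash,
        # and the hashes of everything it depends on into one value.
        h = hashlib.sha1()
        h.update(metadata.encode())
        h.update(source_hash.encode())
        for dep in sorted(dep_hashes):
            h.update(dep.encode())
        return h.hexdigest()

    # A tiny three-stage chain: native tool -> target package -> image.
    native = task_hash("gcc-cross recipe", "sha1-of-gcc-source", [])
    target = task_hash("curl recipe", "sha1-of-curl-source", [native])
    image = task_hash("image recipe", "", [target])

    # Change curl's source and every downstream hash changes, which is
    # exactly what forces the rebuild cascade described above.
    target2 = task_hash("curl recipe", "a-different-sha1", [native])
    image2 = task_hash("image recipe", "", [target2])
    assert target2 != target and image2 != image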
All right, so moving on to the software bill of materials. So what is an SBOM? We've been talking a lot about this today, and I really like this diagram. It has an application here at the end, and we basically want to know what all the things were that fed into it, what we know about them, what we don't know about them, and how they're related to each other. The reason I like this diagram in particular is that it reminds me a lot of our build flow diagram. And so I think it should be apparent that we already have a lot of the information that needs to go into an SBOM, because we have all this recipe metadata; we know a lot about the recipes. So for the most part, it's just a matter of taking the metadata we have and putting it into a format that can be consumed.

For example, we already know the versions of the source code and where we downloaded it from, because as part of the build process we're downloading it from some URL. We already have all the declared licenses, because we track that as part of our metadata for some license auditing support we have. We already know the build-time dependencies; we had to know them in order to build the software in the first place. We know the runtime dependencies; those have to be correctly enumerated in the metadata in order for the target image to be correctly assembled so that everything works together at runtime. We do CVE tracking, which I'll talk about later, so we already know which CVEs there are, which ones have been patched, and so on. We know all the source files, because we downloaded them, we know what files are in each package, and a whole bunch of other stuff that we just track in the metadata.

So our general strategy has been to create an SPDX document every time we process recipe metadata. When we process the native tools, we generate an SPDX document for them; when we process a target package, we generate an SPDX document for it; and the same for the target image. When we're all done, we take all of these SPDX documents we've created and combine them into one big tarball archive that describes this one image. That's the SPDX archive that describes what's in the image, and you can distribute it with the image or do whatever you'd like with it after that.

Because we have all these SPDX documents, we need to define the relationships between them, and we already know all of these relationships because of the way things are structured within the build. The thing we start with over here is an image index JSON file. This isn't an SPDX document itself; we just use it to describe what's in the archive, so that you don't have to go through and parse every single SPDX document to find something. It indexes all the other SPDX files that are in there.

Then we have an SPDX file for every package that was installed on the target image. The package SPDX describes the files that end up on the root filesystem, and it says that it CONTAINS all of those files, with an SPDX relationship. It also has an SPDX relationship to the recipe that created it, because multiple packages can be generated from a single recipe. And then something else we do is actually parse the ELF debug data and generate relationships between the packages and the other recipes that provide their debug sources.
And I'll show you an example of what that looks like in just a second. Then finally, we've got the recipe SPDX, which describes the source code and how it was built when we processed the recipe. From those, we can put BUILD_DEPENDENCY_OF relationships on all of the build-time dependencies of the recipe SPDX.

All right, I'll show you a really quick example here; hopefully you can all still see this. This is just the index JSON file. It's pretty basic: it just lists the document namespace, the file name, and the SHA-1 for everything in the archive. This is basically so that you don't have to parse through every single file to pull that information out, because we found that was kind of annoying. So we've got, for example, the base-files package that got installed on the target image, and, I will try to find it, there are a lot of them, all the kernel modules and all that. And then down here we have the recipes. These are the SPDX documents that describe recipes: there's curl, and there's db-native. It's important to note that we're not only including the SPDX documents for the packages that actually got installed on the target and the recipes that built those packages, but also the native recipes that provided the tools we used to build. So we get a fairly rich supply chain by including those native recipes as well.

So we can look at the package SPDX document; sorry, it's a lot of data, I know. We've got our external document references up here. Here are all the files that were included in the package; these are the files that actually get installed on the root filesystem. We've got the declared license in there, and the version information. And then here are the relationships. This is basically saying that this attr package was GENERATED_FROM the recipe that's provided in the recipe SPDX file. It also CONTAINS all of the files here. And then finally, this right here is something we pulled out of the debug information: the debug info for this attr program referenced getopt_core.h, so we add in a relationship saying that this attr program was generated from the glibc recipe, which provides that source. So we can actually track all of those things too, and there are a lot of them in there.

And then we'll look at the recipe SPDX file really quickly. It looks pretty similar. It's got a lot more external references, because it's pulling in a lot more things. Here are all the source code files that got pulled into the recipe. And down here, sorry, I know this is a lot of scrolling, there's the actual package itself. You can see we're providing the CPE identifier for it, if you want to cross-reference which CVEs apply. There's the homepage and the declared license again. Then there are all the CONTAINS relationships for the source files. And down here we have BUILD_DEPENDENCY_OF: when we built this particular recipe, we depended on bison-native, the native Bison tool that we built ourselves, so we add that in here as a BUILD_DEPENDENCY_OF relationship.
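As a rough sketch of how you might consume that archive programmatically, here's a short Python example. The archive name, the index key names, and the member paths here are assumptions based on the description above, not a reference for the exact format:

    import json
    import tarfile

    # Hypothetical archive name; the real name depends on your image.
    with tarfile.open("core-image-minimal.spdx.tar.gz") as tar:
        # index.json lists the namespace, file name, and SHA-1 of every
        # SPDX document in the archive, so you can find things without
        # parsing each document.
        index = json.load(tar.extractfile("index.json"))
        for doc in index["documents"]:  # assumed key names
            print(doc["filename"], doc["sha1"])

        # Each document carries relationships such as GENERATED_FROM,
        # CONTAINS, and BUILD_DEPENDENCY_OF that tie packages back to
        # recipes, files, and build-time dependencies.
        spdx = json.load(tar.extractfile("attr.spdx.json"))  # hypothetical member
        for rel in spdx.get("relationships", []):
            print(rel["spdxElementId"], rel["relationshipType"],
                  rel["relatedSpdxElement"])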
So we have a very rich set of relationships that we can describe with our SBOM support, and it's designed to be very easy and automatic. You basically just add one line to your project configuration (the class is enabled with INHERIT += "create-spdx" in your local.conf) and it'll spit out SPDX documents. We do have a few knobs to control the amount of data generated and how long it takes to generate, because it is a ton of data. For example, the target image I produced was for QEMU and was about 40 megabytes, and the compressed SPDX archive I produced was also about 40 megabytes. So we're generating about as much SPDX data as actual image data.

And we're not necessarily trying to be all things to all people. A lot of people want to run Fossology and things like that, but we're hopefully providing enough data that someone could easily come in and write a tool that runs all of this through Fossology and does additional license scanning. We're trying to generate as much of the data we have as we can, so that people can do whatever they want with it.

There are a lot of things we'd like to do in the future. We'd really like to add runtime dependencies, but we're having a little trouble figuring out how to integrate that data sanely. We really want support for pulling in SPDX and SBOM documents from upstream source code, so that if the source code itself provides an SPDX document, we can pull it into our archive and correctly link against it. We'd also like to do some level of source code scanning so we can get the concluded licenses, and there are a ton more SPDX fields we could add support for that we just haven't gotten around to yet.

All right, so moving on from that, I'm going to talk about reproducible builds. Why do we need reproducible builds? There are a lot of reasons, and I highly recommend you visit reproducible-builds.org to check some of them out; it's a really great website for this. There are basically two reasons we want reproducible builds from a supply chain perspective. The primary one is to resist attack: if builds are reproducible, it's a lot easier to see which binaries need more scrutiny when you're trying to figure out whether something has changed. The other is the compiler trust issue, which basically requires reproducible builds to figure out whether you can trust your compiler.

The project in general is actually interested in reproducible builds for another reason, which is that there's no inherent guarantee that compiling some source code with some recipe metadata produces the exact same output every time. The thing we use to determine dependencies between things is the hash of the metadata, not the hash of whatever that task output. So a given metadata hash doesn't automatically correspond to one specific binary, but that's what we want, because that's how we're using the hashes in the dependency chain. So we want reproducible builds for this reason too: they give us a lot more trust and reliability in our dependency tracking system. And it really closes the loop, because if the builds are reproducible, then you can trace a target image back to the actual binaries that got produced by these recipes, in addition to the recipe metadata and the source code.

So what are we doing about reproducibility? Well, we're doing a lot of testing.
So the Yocto Project Autobuilder regularly tests for regressions, and we actually have a little website that publishes this. You can see this was run 818, so that was today, and it found 12 packages that didn't build reproducibly. Normally that's zero, but we caught it at a good time. So they're testing some 33,000 packages for reproducibility, which is a lot of testing. That's actually about 11,000 different target packages tested across three different package formats, ipk, deb, and RPM, and that's how you get that 33,000 number.

And they test this across multiple build hosts, which isn't something I've seen done a whole lot. They're making sure that if you take the same source code and metadata and compile it on Fedora, you're going to get the exact same output as if you compiled it on Debian. And that's really cool, because it makes sure that nothing about your build host is leaking into your final target image or target packages.

The other thing we do that makes it really easy to debug reproducibility is automatic diffoscope output. diffoscope is this amazing tool that can diff anything. If we detect something that's not reproducible, we generate this HTML output and publish it, so people can go in and very easily see why the thing isn't reproducible. And I have a little example of that right here. This is a package we found that wasn't building reproducibly: the elfutils ptest package, this .deb file. We found that it wasn't the same between two builds, so we ran it through diffoscope, and you can see it actually knows how to drill down inside the .deb file: it says the data.tar.gz differed by a couple of bytes. It even knows how to drill into that data.tar.gz, and it says, oh, this makefile was different. And if we scroll down a little more, we can see that a HAVE_ZSTD setting was "no" here and "yes" here, and that was the difference in that particular file. So it makes it really easy to debug when things aren't building reproducibly. And diffoscope is really powerful; it can even go into things like ELF headers and show you all of that.

And it's really easy to extend this into your own build. We've set it up so that you can basically make a Python file with three lines and test whatever you want yourself for reproducibility, because even though we're testing 11,000 different target packages, that may not be the superset of things you care about. So you can add those three lines to a Python file and run your own reproducibility tests on whatever it is you happen to care about.
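Roughly, that Python file looks like the sketch below. The reproducibility checks live in an oe-selftest test case that you can subclass; the module path and attribute name here are from memory and may differ between releases, so treat this as illustrative:

    # Illustrative; exact module path and attribute names vary by release.
    from oeqa.selftest.cases.reproducible import ReproducibleTests

    class MyReproducibleTests(ReproducibleTests):
        # Build and compare only the targets you actually care about.
        images = ['my-custom-image']

You'd then run it through oe-selftest, and it compares a cached baseline build against a fresh build, producing diffoscope output for anything that doesn't match.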
All right, so CVE checking. The Yocto Project is doing weekly CVE checks across all the active branches: the master branch, the stable branch, and the long-term support branch. And they have used coffee as a bribe to get people to fix CVEs, so that was kind of fun. And here's the graph; they track these metrics over time. This green line here is our LTS branch, blue is the master branch, then there's our previous stable branch, and the red line is the current stable branch. So you can see they're tracking these CVE metrics and trying to make sure CVEs get fixed in a timely manner.

All right, last of all, buildtools. buildtools is one of the things the project can output, and basically the point of buildtools is to provide all of the host tools that you need to build. So it can slot in here and replace all of those host tools with things that have actually been built by the project itself, or that you've built yourself on your own system. And this is really powerful, because it means that everything I've talked about so far, reproducibility, SBOMs, CVE tracking, all of those things, now applies to your host tools as well. So for all the people who have been asking, can you track the SBOM of your host tools? Yes, you can, because you can build your own and use them to build your whole project.

Because of this, we have a very strong supply chain: you can track a target image back through not only all of the recipe metadata and source code used to build it, which covers most of the native tools that were actually used in the build, but also those few remaining host tools that may have been difficult to track previously. So if you had some sort of air-gapped golden system, you could use it to build a buildtools tarball, sign it, and distribute it to your developers; they could use those tools, and you'd be able to trace your supply chain all the way back to that single golden builder that built your buildtools. So hopefully that's cool.

All right, questions. I've got to go back to Zoom, so it might disappear here for a second. Did I close Zoom? Did I close Zoom? I hear you. Okay, okay. You're good, you're good. Okay, I freaked out there for a minute, sorry.

Yeah, you've got a couple of questions here in the Q&A. There's a question from Shuah Khan: how does build reproducibility work with configuration options and selection of features? Right, yes. So you're only going to get the same output if you have the same features enabled. It's really configurable, so if you change a PACKAGECONFIG option, obviously you're going to get different output. But we try really hard to capture all of the configuration options in the metadata, so that for a given configuration the output is always the same. So that's how that works.

Okay, we had a question, and although we had one answer, I'd love to hear your take on it. It's really two questions: where are you tracking the metadata, in this case the SPDX archive? And also, how are you ensuring that the SPDX archive isn't compromised? Thomas D. Bergen gave a shot at it, but I'd love to hear your take too.

Right, so, sorry, the first question was about the... How are you tracking the metadata archive? The metadata is actually just files: it's recipe files that we use, that's how we describe the metadata. They're just text files; I don't quite know how to describe the format. But yeah, you would potentially want to archive that; I think there are even tools that let you archive all of it up, and you could sign that as a tarball. And the same with the SPDX: we don't have it yet, but you could sign that tarball, or sign individual SPDX files, if you wanted to make sure they haven't changed. And, you know, SPDX is really big on hashing: the external document references are all hashed. So that's good, because it means that if a document you're depending on has changed, you'll know immediately.
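To sketch what that buys a consumer: an SPDX external document reference records a checksum of the document it points at, so you can recompute and compare. The JSON key names below follow the SPDX 2.x format as I remember it, so treat them as assumptions:

    import hashlib
    import json

    def verify_external_refs(spdx_path, resolve):
        # `resolve` is a caller-supplied function (hypothetical) mapping a
        # referenced document's namespace URI to a local file path.
        with open(spdx_path) as f:
            doc = json.load(f)
        for ref in doc.get("externalDocumentRefs", []):
            recorded = ref["checksum"]["checksumValue"]  # assumed key names
            with open(resolve(ref["spdxDocument"]), "rb") as dep:
                actual = hashlib.sha1(dep.read()).hexdigest()
            if actual != recorded:
                raise ValueError(f"{ref['externalDocumentId']} changed")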
It's a little annoying when you're trying to generate them, though, because you think, oh, I want to go in and change something about an SPDX document later, and no, you can't do that without changing the hash. So yeah, that's where we're running into a little trouble with the runtime dependencies.

Okay. Well, thank you very, very much. Oh, let's see here, one last one: how do you maintain the baseline in the reproducible build context? I'm not even sure exactly what they mean by that, but maybe you can take a shot at it. Yep, so that gets fairly complex. When I talked about how, when a hash changes, it cascades and causes all of these rebuilds, there's actually a whole bunch of sophisticated caching mechanisms to lessen the blow of that happening. And that's basically how the baseline is captured by the Autobuilder: effectively, every time we build something, we're tarring it up into a tarball and storing it, so that if we come across that same hash later, we can just extract it and use it instead of actually building the thing. And that kind of encapsulates the baseline. So for the most part, the reproducible build test is comparing all of those cached tarballs against a fresh build, if that makes sense.

I think we could keep talking, but sadly we've got another presentation. I say sadly, but I'm going to have to correct that, because I'm looking forward to it too. So, Joshua, thank you so very, very much.