information. I really appreciate it. So, Adolfo, if you don't mind, we next have Adolfo Garcia talking about leveraging Kubernetes SPDX tools to create your SBOM. Please take it away.

Hi, thank you. Let me start my screen share here. Good day, everyone, and welcome to this talk, where we are going to showcase some of the work that we've been doing to generate the bill of materials for Kubernetes. A little bit about myself: my name is Adolfo Garcia. I am a technical lead with Kubernetes SIG Release, working on release engineering. I work at a small company in Mexico City called Uservers. I am the dad of two girls, I like to tour the world on my bike, and I'm here because I recently led a Kubernetes SBOM effort which is still a work in progress, but we already have some results to show you.

First of all, I would like to talk a little bit about SIG Release and what we do. SIG Release is the special interest group that takes care of cutting the new Kubernetes releases. We are in charge of maintaining the Kubernetes release process, which builds the artifacts, pushes the images, publishes the repositories, and so on. We also support the release team that gets each new Kubernetes release out the door, and we manage the changes that are backported to the supported branches of Kubernetes.

I would like to give a brief overview of what's in a Kubernetes release, so that you can understand the problem that made us set out to build a bill of materials for it. Kubernetes is a large open source project, one of the largest in fact, so we support many architectures and platforms, and we produce artifacts for most of them. For example, if we take the container images that we produce in a given release, we build them for all of these architectures, and if you mix and match those you can see that we produce a large number of images and artifacts with each release.
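To get a feel for the mix-and-match multiplication described above, here is a small sketch that enumerates image/architecture combinations. The image names, architecture list, and registry are illustrative examples only, not the actual Kubernetes release manifest:

```shell
# Illustrative only: show how the artifact count multiplies when each
# image is built for every supported architecture. These names are
# examples, not the real release contents.
images="kube-apiserver kube-controller-manager kube-scheduler kube-proxy conformance"
arches="amd64 386 arm arm64 ppc64le s390x"
count=0
for img in $images; do
  for arch in $arches; do
    echo "registry.example.io/${img}-${arch}"   # hypothetical registry
    count=$((count + 1))
  done
done
echo "total: ${count} image variants"
```

With just five example images and six architectures, that is already thirty image variants per release, before counting binaries, tarballs, and packages.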
Each month we also cut the patch releases for the supported branches; typically that's three patch releases, so currently we have 1.20, 1.19, and 1.18. If you take into account that we also produce binaries and other things, the necessity to keep track of all of these artifacts has become an issue. So, some facts: we handle artifacts for three operating systems, we have six architectures, and as I said before we produce 30 images, plus 53 naked binaries, which are the utilities that you can download and run directly, like kubectl and kubeadm. We also produce a set of tarball bundles which contain some of those artifacts and binaries, and a tarball with all of the source code. We also produce packages and documentation like the release notes.

Part of the charter for our SIG is to provide guidance and tooling to facilitate the production of automated releases; this is taken directly from the charter. So what we've done lately with the code that powers the Kubernetes release process is to start producing small, general-purpose tools that users can incorporate into their own projects. Some examples: the release notes tool, which generates a release notes document from the information found in GitHub PRs; the GCB builder, a tool to trigger builds in Cloud Build and publish releases; and a utility to update your GitHub releases pages, with support for updating assets and templating what gets published there. And as of late, derived from the effort that we're leading here around the bill of materials, we produced a little utility called bom which helps you publish a bill of materials.
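As a rough sketch of how a utility like bom can be driven, both directly against a repository and with a declarative definition file. The flags and the YAML schema shown here are assumptions for illustration, not the tool's documented interface:

```shell
# Sketch only: the flags and the config schema below are illustrative
# assumptions about the bom tool's interface, not its documented API.

# A declarative SBOM definition for a more complex build; the artifact
# types and the image reference are hypothetical examples.
cat > sbom-config.yaml <<'EOF'
name: my-project
artifacts:
  - type: directory
    source: .
  - type: image
    source: ghcr.io/example/my-image:latest
EOF

# Run the tool only if it is installed on this machine.
if command -v bom >/dev/null 2>&1; then
  # Simplest form: scan the current repository's source code.
  bom generate --output sbom.spdx .
  # Declarative form, suited to CI/CD pipelines.
  bom generate --config sbom-config.yaml
fi
```

The declarative form is what makes the tool practical once a build produces dozens of artifacts, as described later in the talk.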
So, coming back to the bill of materials. I lifted this definition of a software bill of materials straight from the ntia.gov website: a software bill of materials is a record, right? So what do I think of when I think of a bill of materials? I mostly think of an inventory, an inventory of things that I might find in my software release. It allows me to verify the provenance of the artifacts, the builds, the modules, and the dependencies, to check that the artifacts are complete, and so on.

But I would like to give another focus to the release. We are SIG Release, not SIG SBOM, right? So I would like to shift the focus a little bit to what a software release entails. When I think of a software release, I think of it as putting an extra layer of meaning on top of the artifacts that get produced. In a software release you generally group the artifacts in the way that makes the most sense to the people consuming them downstream. In the previous slide we had an inventory of everything that's packaged in my release, but the software release itself presents even the same artifacts to the consumer in different ways. For example, if I think about this picture of a little store or market, I can find tomatoes being sold on their own, ready to consume, but I may also find the same tomatoes elsewhere in the store in, I don't know, maybe a salad or a sauce. I think of releases in the same way: I can find the same binaries inside the tarballs that we publish, or inside the RPM packages. It's this layer of meaning that we focused on, so we set out to build a tool that can accurately describe the artifacts that we produce with each release.

So let me give you an example of what I'm talking about. When we are building a bill of materials for one of our releases,
it makes sense to build one that will be most useful for the consumer that takes it. If you think of building a bill of materials, you may start by adding your source code to it. After that you may also list your container images, then you add binaries to it, and things start getting crammed in there. Then you have packages, then you have to consider your dependencies, and then the indirect dependencies, and all of a sudden you find lots of things in there. (That is an emoji of my pets and an actual kitchen sink thrown into the SBOM, if you're wondering.)

So what we tried to do is find a way to organize this better, so that we produce bills of materials that are separate and better designed for whatever will eventually consume them. For example, it makes sense to make your source code available in its own bill of materials for a process that specializes in analyzing source. When you're dealing with images, it makes sense to have a leaner bill of materials that may be consumed by something that specializes in images; for example, I may at some point build an admission controller for Kubernetes that scans the bill of materials and only allows images that are complete. Same thing for binaries and RPMs: at some point you could envision a tool that reads a bill of materials and analyzes the contents before allowing an RPM in. That's just an example, nothing planned.

So what we set out to do is organize this in a better way, and the first step was to start leveraging the actual features of SPDX to build the bill of materials. The first thing you would want to do is have your source code separated into its own package. Then, as I was saying before, it makes sense to have your images available with their own bill of materials,
which you can separate out and then link, using the SPDX features, back to your source code bill of materials. And then if you think about your images: what is an image, in the end? Most of the time an image is a set of layers, and you can think of those as packages by themselves. Then most of the time you have an image index that points to them, so you can also add a package for that and have the images marked as variants. In the end you can also split each image's SPDX bill of materials into a separate document by itself and have them all linked using the SPDX features.

As I was saying before, all of the code that we produce for the Kubernetes release process we have been trying to spin off as separate tools for other projects to use, and currently this one is named bom. We are deciding whether to rename the tool, and maybe even split it into its own repository.

This is the way it works now. Currently the easiest way to run it is to cd into your repository and simply run bom generate, like the example here, and it will go and scan your source code and package it. You can add other artifacts to your bill of materials after it is produced by specifying them like this. Now, as we said in the earlier example, when you have a large number of artifacts produced by your build, the command line interface quickly starts to become cumbersome, so what we built into the tool is the capacity to do this with a declarative SBOM definition. Basically, you write a YAML definition file for your build, detailing everything that is going to go in there, and simply run bom generate with the configuration file. This way you can define a more complex bill of materials, you can do it in your CD pipeline, and you can have lots of sources to analyze in there.

With that, I would like to give a little demo of our tool running. To demonstrate how this demo is going to work: what I'll
demo now is going to build this. We have a little project on GitHub, and using Tekton the pipeline is going to pull my source code, build an image, and push it back to GitHub Packages. Then, using that built artifact, it's going to generate an SPDX bill of materials describing my source code, that file, and the image. Then, using cosign, it's going to sign the image and upload the signature and the bill of materials back to the GitHub repository.

So let me switch my screen to my terminal here. This is my project; it's a simple project, something I've been showing around, a little Go-based project. An important thing is that our tool currently has support for analyzing the Go dependencies in projects, but we do not support other languages; Kubernetes is a Go-based project, so it only supports that for now. What I'm going to do is run the Tekton pipeline, which is going to start the process. If I go into the logs here: right now the pipeline is running, it is cloning the repository, and now it's starting to build my project. While it builds, I can show you the actual Tekton files later. So the image is built, it is now being pushed back to GitHub, and now Tekton moves to the next task: this is the SBOM generation running.

The first thing you'll notice here is that it downloads all the licensing information from the SPDX website. This information is used to train a classifier, which is then used to find the licensing information inside your source code. Using 474 licenses, it correctly detected that my project is using Apache 2.0. Then it will read your .gitignore file and apply that to the scan of your source code directory, and then it will start analyzing the dependencies: from seven direct dependencies it calculated a list of 12 dependencies in the end. Then it will actually go into
each of the Go modules, scan the licensing information in them, and determine the license found in each of the dependencies. It will reuse your local Go path, so it doesn't have to download anything that it doesn't need to. After that, it will process the image reference. Keep in mind that this is an image that is already published in the repository, so the tool will actually go and read the registry, pull down the layers, and build the bill of materials describing everything it finds in there. Here it's determining the actual layers; these are the layers of the new image.

An important thing to note here is that we have an option to perform a deeper analysis of the images themselves. Right now, since the analyze-layers option is set to false, it will not go into the actual contents of the tar files: each of the layer tar files will be listed, but their contents will not. So I can inventory the layer tar files and have the actual checksums verified, but we are not going deeper than that. If turned on, this flag will go inside the layers and perform some analysis of the things it finds in there; for example, it can auto-detect the distroless base images that we use in Kubernetes.

In the end it builds my bill of materials, it records the relationships between my source code and the tar files, and it writes everything to a file. Then the last step uses cosign to sign the image and upload both the SBOM and the signature. If I take a look at my repo here and the results that I get, I find the beta tag that I pushed; here's the bill of materials and the container image signature. I can download the SBOM using cosign to check the results.

One thing I would like to show you is the SBOM definition file, which you have here. This is a simple example: it lists my name, the name of the
project, and the artifacts I want to analyze. If I look at the resulting output, I can see my bill of materials here, and some of the information I specified in the definition file is here. This is the readme file: if you take a look at the definition file, I added the readme file as a separate entity, and it's listed here outside the packages. There's a package describing the source code here, and then if I go down, here start the dependencies. For example, here's a Go module, and the tool went down and read the licensing information in it, so the license text is included here. This was a requirement from the Kubernetes steering committee, that we include all of the licensing information in the bill of materials; probably this will become an option further down the road. And here is the description of the images: the images it pulled from the repository are listed here. And yeah, that's basically it.

I can switch back to my presentation and show you the actual Tekton pipeline that ran: as I was saying, it first built the image, then it built the bill of materials, which is what we saw happening there, and finally it signed the image and uploaded everything.

OK, back to the presentation. A few words about the future direction of this tool. Currently we still need to build the analyzers for RPM and deb packages, which are a work in progress. I would like to add a more expressive definition format to the tool, so you can express more complex relationships declaratively. I would also like to integrate our tool with the official SPDX libraries from the Linux Foundation, so we can get dependency support for more languages. Currently our tool only outputs the tag-value format of SPDX, so I'd like to also publish to the other formats. Then I would like to add some validation to
the tool, at least so that it can go back and inspect whether what it interpreted is correctly expressed and matches the current state of the artifacts. I would also like to add some SPDX visualization. This, I think, is an important thing to have, because as you saw when I was trying to explain the bill of materials earlier, it is difficult for a human mind to interpret what is in there. A machine can read and interpret the relationships and artifacts found in there, but I would like to build into our tool something that helps a human make sense of what's contained in there, what's contained in what, and what the relationships are. And finally, when things start settling down, I would like to add signing capabilities to our tool.

And with that, that's it. The presentation should be available soon, and it has links to the tool. You can reach me at puerco almost everywhere; feel free to reach out and ask any questions.

Yeah, so we've got a couple of questions in the chat. You can tackle as many as you can in the next nine minutes; I want to try to end five minutes early if we can. Sure. OK, so let me go through the questions here, starting from the earlier ones.

Are you planning to link CVE reports to that SPDX visualization? Not currently. The CVE reports are published by Kubernetes in the release notes, but we don't have plans of linking those, at least not currently. If we hear a good use case for that, we can build it.

It would be lovely to compare your SPDX against the... yes, of course, you can always scan for that.

How do you plan to handle OSS licenses not on the SPDX list, or files containing OSS license snippets? That's a good question. Currently our test cases for license scanning have been pretty much constrained to the usual licenses that we see. In a semi-scientific test I did, we have around 97 percent accuracy in the classifier, so
currently it's pretty accurate for our use case, but if you introduce another license it may return unexpected results.

Is go download a prerequisite for collecting licensing information? Yes. It won't necessarily download things, because they may already be there after building the project, but it does need to download the modules and read the information from them.

A link to the bom tool? Yes, it's in the slides, which should be up already.

Is the bom tool designed to be used outside of Kubernetes? Yes. We've spent a good deal of time building things into the release process, and the objective is that all of this code doesn't stay constrained to the Kubernetes release process, so other people can benefit from it.

All right, I think those are the questions I have here. Oh, there are new ones. There are quite a few distributions of Kubernetes; is the tooling that you've developed also compatible with downstream projects? Yes. The tools that I just showcased are used to analyze the artifacts that we produce, but downstream, if someone has another distribution of Kubernetes, they can take these tools, apply them, and analyze their own artifacts just the same way that we do.

And yeah, I think those are the current questions. Just a final note: I'd like to thank all of the community around the SPDX efforts going on in and around this. A lot of your advice and comments have been incorporated into this effort, and it's still going on, so if you have suggestions, comments, or whatever, we are more than happy to hear them.

Excellent. OK, well, thank you very very much, Adolfo, for the presentation, and thank you everyone here for all the great questions.