Let's move on to Thomas Steenbergen. Thomas, please take it away.
Thank you, Dave. Let me start the screen share. That's the one. All right, let's see if this all works. If everything works, you should see my presentation. Yes? Awesome.
I'm going to show a little bit of how we generate an SBOM for your code in CI/CD using the OSS Review Toolkit. A little bit about myself: I am the head of open source for HERE Technologies, but I'm also active in the several open source projects listed here, as we're trying to build a complete open source toolchain to do FOSS reviews in CI/CD. There is the tooling that we're working on. I'm the lead for the SPDX Defects group. I'm also in the OpenChain Tooling group, where we work on the processes, and in ClearlyDefined, where we work on the data. And finally, I'm one of the co-founders and organizers of the TODO Group Europe, where we work together as open source offices. All of that is really needed to be able to produce quality SBOMs.
So I'll start with where I come from. The easiest way to describe my situation is OpenChain. OpenChain is the ISO standard for your open source compliance process, and it basically comes down to two things: know your obligations and satisfy your obligations. There's a bit of a focus on licensing, but OpenChain is now expanding to cover security vulnerabilities as well. The OpenChain material lists what you need to collect, and this already overlaps with what we need in an SBOM. It's a long list of things. So this is where we started: okay, that all makes sense, we can implement it, and this list also matches what we have to send to our customers. Let's see how we get that information. Then we started comparing software composition analysis tools, and we had these.
After a little bit of research, we came down to five questions that we mostly compared tools on. The number one question is: which software components are included? We found a huge variety between tools. The main reason is that build tools are meant to build code; they're not designed to produce an SBOM. The SBOM is kind of an afterthought: developers are happy if their code builds, and producing an SBOM was, most of the time, not a design goal of the build tool from the get-go. As a result, most software composition analysis tools on the market take a best-effort approach to figuring out what is actually in a particular software project. That makes the resulting software bills of materials very different, and they differ depending on the method used. There's dynamic analysis, where you actually run the build tool, and there's static analysis, where you try to mimic the build tool or try to parse lock files. All of these produce different results.
The second thing we looked at: if a tool reports a particular finding, whether it's a vulnerability finding or a license finding, can it show where it got it from? You don't always get this. A lot of tools, for instance, tell you that a license applies in the SBOM but don't show where they got it from. For us this was crucial, because we need to be able to verify the licenses that our tooling reports.
The next thing we looked at, which was really a big thing for us: in my organization we have a particular way of thinking about our open source policy and how we make risk decisions. We wanted to implement that in our tooling; we wanted to automate, automate, automate. And fairly early on we discovered that this is really difficult with a lot of the existing tools.
Because they mostly offered us something like allow/deny lists, and we wanted to do way more than that. At first people thought, oh, maybe our thinking is just different. But as I'm part of all of these open source communities, where I interact with organizations across the world, we figured out that our way of thinking is actually kind of standard. It's just that a lot of tools don't allow this level of automation yet.
The next thing that was very important for us: HERE is not a small organization. We wanted to run everything and produce SBOMs in CI/CD, but it had to be done at scale, because we have hundreds of teams. It needs to be done at speed, because we have developers writing code 24/7. At the same time, it needs to be cost efficient: it cannot be fast and work at scale but come with a cost that runs into the tens of millions. And we also wanted to be able to dictate the level of compliance. For internal code maybe we accept a lower level of compliance, but if we ship software to cars, we want to be able to set the highest level of compliance.
And finally, we looked at: can we edit an SBOM? People might ask why. Well, no tool is 100% correct. Open source is free, but that also means you don't get any guarantees with it. From our own studies and experience, about 20 to 30% of the open source that we see flying around has some kind of issue. It may be a metadata issue, it may be a licensing issue. A very common issue, for instance, is that the code repository has moved. The metadata says, "my source code is there", but in reality the source code is somewhere else. And then you need to be able to fix that up.
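To illustrate why an allow/deny list falls short of the policy automation described above, here is a minimal Python sketch of a rule whose outcome depends on how and where a package is used. The license groupings, the `Usage` fields, the product names and the outcomes are all assumptions made up for this example; they are not HERE's actual policy and not the rule format of any particular tool.

```python
from dataclasses import dataclass

# Hypothetical license groupings, for illustration only.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}


@dataclass(frozen=True)
class Usage:
    """One package as it is used in one product (hypothetical model)."""
    package: str
    license_id: str    # SPDX license identifier
    distributed: bool  # is the package shipped to customers?
    product: str       # e.g. "internal-tool" or "automotive"


def evaluate(usage: Usage) -> str:
    """Return "ok", "review" or "deny" for one package usage."""
    if usage.license_id in PERMISSIVE:
        return "ok"
    if usage.license_id in COPYLEFT:
        if not usage.distributed:
            # Internal-only use: lower level of compliance is acceptable.
            return "ok"
        # Shipped software: strictest handling for the automotive case.
        return "deny" if usage.product == "automotive" else "review"
    # Unknown license: a human has to look at it.
    return "review"
```

The same license yields three different answers depending on context, which a flat allow/deny list keyed only on the license name cannot express.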
I added this slide because this is really something I wanted to make clear: a lot of tools are now capable of generating an SBOM. But as many of them are best effort, their output may not be usable for automating your open source policy, your organizational policies. That's also not much of a surprise. SBOMs are not new to me, but for most people in the world they have only been a topic since last year. You need a lot more information to produce a high-quality SBOM that you can really automate your policies on. It's not just about generating it. That's step one. Step two is making sure that you have a high-quality one, where the data is actually accurate and actually traceable.
So again, some of the tooling challenges that we saw: missing data, missing sources. Ways-of-working issues are also very, very fun, where the build tool recommends a particular way of working and the developers do something completely different. I already mentioned that build tools and dependency tools are not really designed for FOSS review or SBOMs. Then you have a large number of build tools: in my organization we use about 30 different build tools and dependency tools. And when you put all of that together, you have to handle a massive amount of scan results, because we do license scanning for everything so that we can include licensing in our SBOMs. Then you need to process all of that. So this required something different.
Luckily, solving challenges is what we engineers do. So what did we do? We started building our own tooling and we open sourced it. It's now a Linux Foundation project. And our approach is slightly different in how we open sourced it: we are building it with the community, for the community.
So it's mostly a group of open source offices that wanted better tooling and decided to work together to produce it. The tool mostly uses data directly from the build tool, so we always use build tool information first before we fall back to static analysis. We have multiple methods to fix the SBOM. Our policy rules, which generate and verify whether an SBOM is correct, are written as code, so you can write whatever you want to do. The tooling was designed as a pipeline of components that you can really use in your organization: there are so many different ways that people build code, so you need to be flexible in how you build a pipeline to produce your SBOM. And we built it on top of open standards, like SPDX, because we don't want tool lock-in. My successor in 20 years must still be able to use the SBOMs that I produce.
So this is how it looks end to end to produce an SBOM. The first step is that we ask our developers to indicate what code in the code repository is not included in what is released. That allows you to create a complete SBOM for the code repository and an SBOM for just the release artifacts. The developers indicate this in a YAML file, which allows our automation to process it. In the next step, the tool builds up a dependency tree; we support, I think, 20 different build tools and dependency managers now. Once we have a dependency tree, the tool downloads the source code. We do this for license scanning, but also because certain licenses require you to give the source code to any consumer. It's also a little bit of business continuity, because things on the internet have a tendency to disappear. Then we scan it with a standard copyright and license scanner. We use ScanCode for that.
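The first step above, marking what lives in the repository but is not part of the release, can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the glob patterns, the file paths and the function names are hypothetical, and the real tool reads such markings from a YAML file in the repository rather than from a Python list.

```python
from fnmatch import fnmatch

# Hypothetical path patterns a developer might mark as "not released".
EXCLUDED_PATHS = ["test/*", "docs/*", "examples/*"]


def is_excluded(path: str, patterns=EXCLUDED_PATHS) -> bool:
    """True if the path matches any pattern marked as not released."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def split_by_release(definition_files):
    """Split package definition files into (released, repository_only)."""
    released, repository_only = [], []
    for path in definition_files:
        target = repository_only if is_excluded(path) else released
        target.append(path)
    return released, repository_only


# Made-up definition files found in a repository.
released, repository_only = split_by_release(
    ["package.json", "test/fixtures/pom.xml", "docs/requirements.txt"]
)
```

Everything in both lists feeds the SBOM for the whole repository, while only `released` feeds the SBOM for the release artifacts.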
Then we pass it to our evaluator, which checks all of the rules and makes sure that everything is okay. The whole goal of these checks is that, at the end, our reporter produces an accurate SBOM where all of the license information and security vulnerability information is correct.
How can you integrate this into a CI/CD pipeline? This is how we are doing it. We have the scanner running in the CI/CD pipeline continuously. It produces a scan report that outlines what needs to be fixed, and the users can then get help via a ticketing system for support. On the other side, you cannot review every scan report if you run things at scale. So we have an audit process where we check, at scale, the reports and SBOMs that were generated, and if something is wrong we create an audit ticket to say, hey, something is wrong here. Once that ticket is created, it shows up in the scan report to nudge the developers: the SBOM that you produce might not be accurate, you need to take a step.
The way our rules work, which is shown at the bottom, is to decide whether it is okay or not okay to use a particular package. The tool may be called an open source toolkit, but you can actually review both closed source and open source with it. This is the formula: we look at the code context of how something is used, the license context, how it is used in the product (so we can produce different SBOMs depending on what type of product we are producing and the level of detail), and finally the security context.
Now I would like to show you the thing hands-on, which is what I think most people will be interested in. I took a project called mime-types on GitHub, and I forked it internally to my personal GitLab.
Why did I do this? Well, that allows me to add a so-called .gitlab-ci.yml file to it, which lets you define a CI pipeline for a particular project. I'm just showing this at a high level. For people not familiar with GitLab, this is roughly what a GitLab pipeline looks like: first it sets up the different dependencies, and then in the test stage it runs separate tests. One of the tests that we've added is the ORT scan, as we call it, and this executes ORT to produce the SBOM.
Besides running it inside a project's pipeline, ORT also supports being run directly. We use both. This is the same pipeline, but here I just executed the pipeline directly, to give you an idea. Why do we do this? Well, we have lots of cases where people ask us, hey, can I take this open source component out there and add it to my project, and will it all work? So we do a pre-check. The way we have set it up allows us to just type in any source code location, scan it, produce an SBOM, and then verify whether the thing is okay, yes or no.
This is the ORT scan step in a little more detail. It runs, and it produces so-called job artifacts. If I click on the job artifacts, it shows me the results. This is one of the results that is generated. In this case, I have two SPDX project files here, one in JSON and one in YAML. It also produces some other reports. One of the things we do for integration with GitLab is that we produce GitLab's native format, so it will show up in this compliance tab here.
And it will show you the report. Again, an SBOM is usually in a text format that is really difficult to understand, so we have a standalone web app, a single-page application, on top of it that allows you to navigate all of the data in the SBOM a little more easily. You can see exactly that in this mime-types project there are 355 unique dependencies. You can browse it in a kind of table view where you can filter the packages and see the declared license, and you can also see it as a browsable tree.
What we also do, and that's particular to our tooling: you see these things are grayed out. This is because the developer has marked what a dependency is used for. Here we use SPDX. SPDX has a thing called relationships, where you can indicate relationships between packages. This one is grayed out, and if I hover over it, it says it's a dev dependency, meaning that this dependency is only used for development. This is how we can produce an SBOM for all of it, or just for what we call the release. If I only want the things that are released, I just look at that, and you see that only two packages are actually shipped.
What we can also do is look a little deeper. We have all of the scan results, so if it says MIT, we can directly see where that MIT comes from, because we have the actual scan results with it. And finally, I just wanted to show you one of the SPDX files that we produce. You can see this is the same data in JSON, with all of the information that we capture. So this is, in a nutshell, how we produce an SBOM using, in this case, GitLab. You can also do it for GitHub; it's pretty similar. That's about it. I would like to take some questions. Let's see if there are any. I don't know how I'm looking on time.
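The release-only view from the demo can be approximated with SPDX relationships. The sketch below assumes an SPDX 2.x JSON document in which development-only packages are linked with the standard `DEV_DEPENDENCY_OF`, `TEST_DEPENDENCY_OF` or `BUILD_DEPENDENCY_OF` relationship types. Whether the tool shown in the talk encodes the information exactly this way is an assumption, the sample document is made up for illustration, and transitive dependencies of dev dependencies are ignored to keep the example short.

```python
# SPDX 2.x relationship types that mark a package as not shipped.
DEV_ONLY_TYPES = {
    "DEV_DEPENDENCY_OF",
    "TEST_DEPENDENCY_OF",
    "BUILD_DEPENDENCY_OF",
}


def released_packages(spdx_doc: dict) -> list:
    """Names of packages that are not marked as dev/test/build dependencies."""
    dev_ids = {
        rel["spdxElementId"]
        for rel in spdx_doc.get("relationships", [])
        if rel["relationshipType"] in DEV_ONLY_TYPES
    }
    return [
        pkg["name"]
        for pkg in spdx_doc.get("packages", [])
        if pkg["SPDXID"] not in dev_ids
    ]


# Made-up, heavily trimmed SPDX 2.x JSON content for illustration.
sample_doc = {
    "packages": [
        {"SPDXID": "SPDXRef-Package-mime-types", "name": "mime-types"},
        {"SPDXID": "SPDXRef-Package-mime-db", "name": "mime-db"},
        {"SPDXID": "SPDXRef-Package-eslint", "name": "eslint"},
        {"SPDXID": "SPDXRef-Package-mocha", "name": "mocha"},
    ],
    "relationships": [
        {"spdxElementId": "SPDXRef-Package-mime-db",
         "relationshipType": "DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-Package-mime-types"},
        {"spdxElementId": "SPDXRef-Package-eslint",
         "relationshipType": "DEV_DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-Package-mime-types"},
        {"spdxElementId": "SPDXRef-Package-mocha",
         "relationshipType": "DEV_DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-Package-mime-types"},
    ],
}

print(released_packages(sample_doc))  # ['mime-types', 'mime-db']
```

Keeping the dev-only packages in the document and filtering on relationships means one SBOM can serve both the full-repository view and the release view.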
Can you run it directly in Jenkins pipelines? Yes, we actually have a Jenkinsfile available that you can use to run it. It's all a Docker image and you can run it wherever you want, whether it's Jenkins, GitLab, GitHub, or Bitbucket, you name it. It's relatively straightforward. How am I looking on time?
I think your time is looking good. I should probably forward a question from the last talk. What about build-time tooling information, in terms of the tools being used for the build?
We capture this. It's not in the SPDX file, but we do capture it in our own result. For instance, for this project we capture which version of npm was used to produce the SBOM, because the SBOMs can actually differ depending on which version of the package manager you have installed. So we try to capture all of the information that was used, so that you can recreate exactly the same SBOM.
You've got a couple more questions in the chat, if you could take a stab at them.
SBOM and security are big domains; how are you connecting the dots? SPDX originally came from the licensing side, so working on license compliance. But as I said, the working group that I'm leading, the Defects group, is working on adding security vulnerabilities to it, on GitHub. What we're now trying to do is make it a complete software bill of materials. SPDX 3.0 is going to be a kind of profile-based spec. We have a base spec where you can model a basic SBOM, so package data. On top of it you can optionally add licensing data and optionally add security data, and there is talk about a full provenance profile. So you can really pick and choose what you want. It's much more flexible than the current 2.x version.
So what we're now trying to do is capture all the information of interest for both licensing and security in one standard.
Editing the generated SPDX file? Yes, that's the question from before, so I should explain. We don't directly edit the generated SPDX file. Instead, during the generation of the SPDX file, we have a mechanism called curations, which allows you to fix things beforehand. One of the issues we saw is that you don't want to fix the SBOM for just one team; if you find a particular issue, you want to fix it for all teams. So centrally, in the tooling, the curations mechanism allows you to patch the package metadata, and then all of the generated SBOMs will be fixed. In our reports, we show which curations have been applied. In our case, the curations are maintained in a separate Git repository. I can show you what that looks like, because I have one here. We maintain all of the patches in a configuration repository. It's here. To give you an idea of what these curations look like: here, for instance, we had to find out the version of a software package, so we patch this in the configuration files of the tool. These configuration files are version controlled, and the hash and location of that repository are included in the report. They are not yet in the SBOMs; that is something we can fix with SPDX 3.0, because then we get better fields to capture all of this provenance information. In the tool itself we already have all of this information. SPDX 3.0 is still a work in progress, and we're working on getting all of these fields defined, but once SPDX 3.0 is ready we can just add all of the information there.
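The curation idea described above, patching package metadata centrally before the SBOM is generated and keeping a record of what was patched, can be sketched as follows. The `Package` and `Curation` classes, their fields, the package identifier and the URLs are all hypothetical and chosen for this example; the tool's real curation files and data model will differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Package:
    """Metadata as reported by the package manager (hypothetical model)."""
    id: str
    declared_license: str = ""
    source_url: str = ""


@dataclass
class Curation:
    """A central, version-controlled correction for one package."""
    package_id: str
    comment: str   # why the correction is needed
    changes: dict  # field name -> corrected value


def apply_curations(packages, curations):
    """Return (curated packages, curations that were actually applied)."""
    by_id = {}
    for curation in curations:
        by_id.setdefault(curation.package_id, []).append(curation)

    curated, applied = [], []
    for pkg in packages:
        for curation in by_id.get(pkg.id, []):
            # replace() builds a new Package; the original stays untouched.
            pkg = replace(pkg, **curation.changes)
            applied.append(curation)
        curated.append(pkg)
    return curated, applied


# Made-up example: the metadata points at a repository that has moved.
packages = [Package("NPM::example-lib:1.2.3",
                    source_url="https://old.example.com/example-lib.git")]
curations = [Curation("NPM::example-lib:1.2.3",
                      comment="Repository moved to a new host.",
                      changes={"source_url": "https://new.example.com/example-lib.git"})]

curated, applied = apply_curations(packages, curations)
```

Because the corrections live outside any single project, every team that depends on the same package gets the fix, and the `applied` list is what a report can show as the curations that were used.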
Are the rules publicly available? Yes, example rules are available, and I'm working on publishing most of our policy rules so that people can exactly reproduce what we're doing. Generally, from my open source office, we try to open source everything that we do. The only exception is how we think about licensing, but for that we will have other examples from our buddies at ScanCode. So you will be able to fully reproduce how we do things as, hopefully, an almost turnkey solution. Did I miss any other questions?
Component supplier ID info: SPDX includes all the NTIA minimum elements, so I'm not sure what the confusion is.
Let's let somebody else pick up that question. Thomas, why don't you go on to the next one.
Does ORT cache results? Yes. What ORT currently does is cache scan results: once we scan a dependency, it's only scanned once, and then we reuse that. Caching between CI jobs is actually a very complicated thing. If you know how the build tools really work, you'll be surprised that, for some of them, even if you don't change a single line of your code, the dependency tree may change between two runs. Why is that? Because you largely depend on the open source community, and the open source community doesn't stand still. So unless you lock down all of your dependencies, it might change. And even with lock files, where you have things locked down, some package managers treat a lock file as just an input, and depending on the situation they might choose a different dependency.
Does it support Helm charts? No, it's not yet on the list. You can see on our GitHub page that we have a list published, and we're working continuously to add more things. Let me see. Yeah, this is the complete list. As I said, we're primarily focusing first on the source code level, so that we're able to build a source-code-level SBOM.
That can then be included in the artifacts, and then you can handle deployment and all the other stuff. But again, lots of tooling was missing a good way to produce an SBOM at the source code level. That's step one. The next step is, for instance, looking at: okay, you produce, say, a JAR and then you package it in an RPM, so how do we process that?
Thomas, I think we've got a clarification of the question: SPDX has it, but it wasn't clear that your tool was capturing that data.
Ah, we misunderstood the question. Yes, SPDX has a package supplier field, but it depends on whether the package manager has it or not. Some don't have a package supplier field, so it's mostly the copyright information that's there. We capture the source code locations, the binary artifacts, basically anything that's available from the package manager. We capture it, and then we map it to the SBOM.
So the real challenge is that you, and I think many others, need to push back and say: hey, wait a minute, we don't have the data, therefore we can't capture it. Please add this information so that we can capture it.
Yes, but also, in the traditional sense, a package supplier makes sense for a closed source package. For open source packages it's the community most of the time, so there is no legal entity. It's usually just the name of the project, and the name of the project we have anyway in the package identifier. Again, for closed source software capturing the package supplier makes sense, but most of the package metadata files don't really have a field for that at all. What we can do is generate one from the other information in the package metadata. But yes, we have it already.
All right. So, Thomas, if you don't mind.
I really like being five minutes ahead.
I tried to keep your five minutes intact. I actually gave you two extra minutes.
Oh, wow. Okay, thank you so very, very much, Thomas, for all your time and all this information. Really appreciate it.