All right, let's get started. Hey everyone — everyone over there, and all the way down there. My name is Sophie Wigmore. I'm a maintainer on the Paketo Buildpacks project and also a member of technical staff at VMware. And I'm Frankie Gallina-Jones, also a maintainer of Paketo Buildpacks and an engineer at VMware. So Paketo Buildpacks is an open-source project and part of the growing Cloud Native Buildpacks community, and Paketo Buildpacks are essentially a tool for containerizing your applications in any language, as an alternative to Dockerfiles. Today, though, we're actually here to talk to you about SBOMs — software bills of materials — because there's a lot of interest around them, and people seem to think they really matter. Over the past year we've been implementing SBOM-related features in our buildpacks and trying to answer the question: now what? Hopefully, if you're here today, you've heard of an SBOM before, but if not, we have a quote from the US Cybersecurity and Infrastructure Security Agency, CISA, that sums up what we're talking about: "A software bill of materials (SBOM) has emerged as a key building block in software security and software supply chain risk management. An SBOM is a nested inventory, a list of ingredients that make up software components." As you can see, we have a picture of nutrition facts down here, and this is a comparison we're going to come back to over and over throughout this talk, because it's a pretty strong example of having a full manifest of information right alongside the contents of the thing you're interested in. So this big US government agency has bothered defining SBOMs, and SBOM-themed content is all over blogs and news sites. But why is there so much buzz? The short answer here is that they're essential for supply chain risk management.
In recent years, it's become increasingly clear that components inside of software can expose that software, and the orgs that make it, to risk. One clear risk is a vulnerable or malicious component making its way into your supply chain and causing you problems. There's also the risk of litigation or legal action if you're using components that aren't properly licensed, or if you're not properly attributing those components. These risks are serious — you wouldn't want them landing on your team or your organization. So you might think that at this point SBOMs are ubiquitous and adoption is happening across the board. But if we look at the numbers, we see that's actually not true. In fact, the 2021 Anchore Software Supply Chain Report found that only 25% of respondents said they produce an SBOM for the containerized apps they build. And while 2022 has in a way been the year of the SBOM, adoption is still nowhere near 100% — but interest is high. So why is rollout so slow? We at Paketo Buildpacks think there's a pretty good reason for that. Right now, software makers are trying to figure out how to keep delivering the software they're great at delivering, and also deliver software bills of materials, which is a completely separate thing with its own set of nuances. It's a hard problem to solve. In other words, engineers like you and me are asking: now what? About a year ago, Paketo Buildpacks started to see growing demand for SBOMs from our users — people who are containerizing their source code using buildpacks — and we wanted to provide automatic SBOM generation as a feature. So we were left asking ourselves: now, what would it actually mean to produce something useful for our stakeholders, and our users' stakeholders? The questions that remain to be answered here are: how can I actually adopt this for my project?
How can we meet the expectations of our consumers when it comes to SBOMs? And how can we make sure we're meeting best practices in the industry? In our exploration of this topic at Paketo, we found it easiest to understand the "how" by breaking it down into five smaller sub-problems: why, what, who, where, and when. The "why" is already pretty well understood — there's tons of research and plenty of articles on this subject already. So throughout the course of this talk, we're going to break down the remaining questions and try to provide some examples of how we've answered them in our implementation at Paketo Buildpacks. The first question on our list is: what constitutes a useful SBOM? "SBOM" is kind of a catch-all term once you start looking into it, encompassing a number of different formats, schemas, data fields, and use cases. So what constitutes useful will definitely differ depending on who you ask. A super important clarification to make when we're talking about what's in an SBOM is that it's not just an enumeration of vulnerable packages — it's a full list of ingredients. Identifying vulnerable components is one use of SBOMs, but there are many others, and these ideas are often conflated. If you're not careful about what you actually include in your SBOM, vulnerability scanners might not even be able to identify your packages, because there are so many different ways to identify them, and other stakeholders with different use cases might not be able to get value from your SBOMs at all. To put a finer point on this ambiguity, we're going to show a concrete example. Side by side, we've got two documents that can both reasonably be called a CycloneDX software bill of materials for an extremely simple Node application. If you've never heard of CycloneDX, for now just know that it's a spec for writing SBOMs, and both of these documents implement that spec.
We'll talk a bit more about CDX later. The first thing to notice about these two documents is that the one on the left is XML and the one on the right is JSON. Both are obviously machine-readable formats, but your downstream consumers may not be equally able to consume both of them, so it's important to get clear about what they're expecting. Secondly, if we look a little closer, we notice that the one on the left implements version 1.4 of the CDX spec and the one on the right implements version 1.3. As the SemVer versioning indicates, 1.4 is theoretically backward compatible, but in practice downstream consumers may not be equally able to consume both — so again, getting clear about this distinction will help you in the long run. Lastly, and possibly most glaringly, we notice that on the left there's very minimal package information for the remapping package shown in the SBOM, whereas on the right we have much richer package information. The left shows the minimum required data fields for representing a package like this in the CDX spec — you might notice it doesn't even contain the version of the package in question, which is a vital piece of information for just about anything. On the right, we not only have a version, but also a couple of ways of uniquely identifying the package, plus some extras like the name of the file from which that dependency was installed, which is useful for tracking down how something vulnerable or out of compliance made its way into your app. All of that is to say: it's really easy to not be talking about the same thing as your stakeholders when it comes to SBOMs. Figuring out the answer to "what constitutes a useful SBOM" can vary very much depending on who you ask. That leads us perfectly into our next question: who wants SBOMs?
This phase involves discovering who your stakeholders are and understanding their use cases. It might seem like a pretty simple and obvious task, but spending time here is really important and will save you a lot of time in the long run, as we've learned. This is a rapidly changing ecosystem, with new standards coming out constantly and tons of use cases attracting plenty of interest, so understanding the full scope of your stakeholders will make the rest of your SBOM adoption process much smoother. The whole question of who wants SBOMs is meant to inform our earlier question of what that SBOM should actually look like, which is why it's very important to be aware of the different nuances of the options in this space. There are a few top-level categories where you'll need to make choices: schema, data values, and format. From the visual above, you can see there are plenty of choices to be made depending on your stakeholders, each with different strengths and weaknesses depending on your use case. A few examples: if your stakeholders are interested in license compliance and auditing use cases for their SBOMs, you might choose SPDX as your SBOM schema, because it's ideal for complex license information. You'll still want to align with them about the format they expect it in, because depending on how they plan to consume the SBOM, it will matter whether it's JSON, XML, or another format. Another major use case we already talked about is vulnerability scanning. In that case, you might want to use CycloneDX, because it has flexible support for a number of different data fields, especially relating to package identification: package URLs ("purls"), CPEs, and SWID tags are all different ways of identifying your packages, which can then be used for vulnerability identification.
Finally, a third use case we'll talk about is people who just want to use SBOMs to look at dependency drift in their applications. In this case, they might not even care what schema their SBOM is provided in — as long as it's human readable, maybe, and as long as it includes versions or checksum information for chain-of-custody concerns, that might be all they need. Note that conversion between these different schemas can be lossy at this point, depending on how you generate your SBOMs, so you definitely need to nail down your needs ahead of time so you can provide the richest information possible. Answering these questions in Paketo Buildpacks was pretty challenging, because we have a very broad array of downstream consumers. For that reason, we support three different schemas — SPDX, Syft, and CycloneDX — and we cast kind of a wider net in what our stakeholders might need. We know that vulnerability scanning is a major use case for our customers, so package URLs and CPEs are certainly present in our SBOMs. Another major requirement we had was stable schema versions: our downstream consumers need to be able to reliably consume documents without fear of incompatibilities from new schema bumps, so we support older schema versions as well as newer ones as they come out. Another primary concern we focused on was all of the different languages we support in the Paketo Buildpacks project. You might recall that Paketo Buildpacks containerize applications in a number of different languages. Because of that, we had to consider an SBOM solution that could glean information from a number of different package types — NuGet packages, Maven packages, Node modules, and many more. All of these requirements from our stakeholders informed our final solution.
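To make the package-identification point concrete, here is a minimal sketch of the kind of component entry we're describing, with a purl and a CPE for a Flask install. The field names follow the CycloneDX JSON component style, but the package values themselves are illustrative, not taken from the demo:

```python
import json

# One CycloneDX-style component entry (illustrative values).
# "purl" and "cpe" are the two identifiers that vulnerability
# scanners typically match against.
component = {
    "type": "library",
    "name": "flask",
    "version": "2.1.2",
    "purl": "pkg:pypi/flask@2.1.2",
    "cpe": "cpe:2.3:a:palletsprojects:flask:2.1.2:*:*:*:*:*:*:*",
    "licenses": [{"license": {"id": "BSD-3-Clause"}}],
}

print(json.dumps(component, indent=2))
```

An entry with only `name` and `type` would still be spec-valid in some schemas, but without a version, purl, or CPE, a scanner has very little to match on — which is exactly the gap between the two documents in the earlier comparison.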
For buildpacks specifically, we ended up using Anchore's SBOM generation tool, Syft, as a library within the buildpacks code to generate the SBOMs. We opted for this approach because, one, Syft's maintainers are more expert than we are at generating SBOMs, and two, Syft generates SBOMs in all of the formats we were interested in and has language support for all the languages we support in the project. Once you understand what an SBOM looks like for your users, we recommend finding software whose maintainers are the experts and letting it generate your SBOMs as well. SBOM generation is pretty difficult, and there are plenty of amazing open-source tools out there to do it for you — for us, that was Syft. To put this concretely, we're going to show a quick demo of how we generate SBOMs for our stakeholders alongside containerizing an application using Paketo Buildpacks. I'm going to show a couple of quick demo videos; we'll see how this goes, because the screens are quite far away. Here we have a very simple React app that we're going to containerize with buildpacks, and we're going to use the pack CLI — an upstream tool from the Cloud Native Buildpacks project — to orchestrate our buildpacks. We're specifying the image name, frontend-nginx; we're specifying a few buildpacks that we want to use; and, most importantly, at the bottom we're specifying the SBOM output directory where we want the pack CLI to dump all of our SBOM documents when we're done. Now we let the build run. We've sped it up a little for the purposes of getting this demo done on time, but we're installing different things. Here you can see we're using the npm-install buildpack from Paketo, and we're generating an SBOM for all of the node modules we install with that buildpack.
Additionally, we're installing other things during the build, like NGINX, configuring it, and generating a related SBOM. We've really glossed over the build here; if people have questions about the build side of things, we can answer them during the Q&A. At this point we have our image, and we can just run it to make sure the build succeeded and we're seeing what we'd expect. We can run it locally, hit the endpoint, and see that our app is running. Yay — that's the magic of buildpacks right there. But more importantly, in this case we can look at the SBOM output directory we specified. Here you can see that it's a nested directory structure containing all of the SBOM documents generated during the build. There are three different formats, like we said — CycloneDX, SPDX, and Syft-schema SBOMs — and they're arranged by the buildpack that generated them. Each of the individual processes that contributes something to your final application image has an SBOM directory here. Now we can take a quick look at a single SBOM entry inside and see what it contains. This is the npm-install SBOM, and it contains a bunch of JSON entries for the node modules we installed. Then we can take a slightly closer look at a single entry within that SBOM — here we have one for, I think, the remapping package; I can't quite see — and as you can see, it contains a whole bunch of different metadata for the node modules installed during the build. In a couple of minutes we'll take a closer look and compare it against an SBOM from another language. Great, thanks Sophie. As we mentioned before, Paketo supports a lot of languages, so we're going to show you a similar example but with a completely different language ecosystem.
Here we're going to show you a simple Python app that uses pip for package management to build a Flask server and serve the Paketo logo. So, as before — you made this look easy, Sophie — okay. Here's our simple Python app. We're going to use the pack CLI once again to run a build, this time with the Python buildpack, and again notice we're using that SBOM output directory option that'll put the SBOMs on our local file system. We kick off the build and the buildpacks start doing their thing, which in this case includes running the CPython buildpack that installs the Python runtime and generates an SBOM for that runtime; generating an SBOM for pip, which also needs to be present in the build container; and then running pip install and generating an SBOM for the packages installed during that process as well. Then we wrap up the build and we've got a containerized app. Again, we can run it just to confirm it's actually doing what we said it would. There you go — Paketo Buildpacks. Great. Let's take a closer look at that SBOM output directory. Once again you'll see the nested directory structure, where each directory corresponds to one of the steps in the build process, so you've got a nice separation of concerns. If we take a closer look at just one of those — the one for the pip packages — we can see several of the packages, including Gunicorn, and then, taking a closer look at just one artifact from the Syft-formatted SBOM, we see the Flask package with a bunch of relevant metadata. So let's take a closer look at that metadata. Here on the left you can see a zoomed-in view of just one dependency from the Python app, and on the right, that remapping package again from the React app. Off the bat, we can see that these documents look very similar.
So if you're trying to manage SBOMs across your organization and you work with a bunch of different languages, we've already got potentially mergeable and comparable SBOMs happening here. We can also notice that, as we mentioned before, we have CPEs and purls — two unique package identifiers that are quite relevant to our stakeholders for vulnerability identification. We've also got license information for this React package; that's something Syft supports right now. It's not quite there yet for pip packages, but that's an extension point for Syft. And lastly, we've got a little extra that's not part of some of the specs: the package manager or package ecosystem the package came from, and the file where the package is enumerated in the source code. As these examples hopefully show, we try to generate useful SBOMs for all the different languages we support in the project. So at this point we've got SBOMs — yay, mission accomplished. But unfortunately, even though it feels like we've done the bulk of the work, we still have two pretty important concerns left to address. The first of these is: where should the SBOMs be stored once they're generated? A key stepping stone in SBOM consumption and adoption is convenience of access, making the "where" very important. Creating quality SBOMs is great, but it's not that helpful if they're really difficult to access. Going back to our earlier example of ingredients and nutrition facts: it's pretty easy to just turn a can around and get the information you need, and much more difficult to get that information if it's separated and located somewhere else. So a key part of figuring out where to store your SBOMs is coming into alignment with where consumers can reasonably expect to find their SBOM documents.
Persisting SBOM documents directly alongside the image they were generated for is generally good practice, ensuring you can retrieve them easily with the corresponding artifact. In the example we showed just now, we used buildpacks with the pack CLI to output an OCI image to your local Docker daemon, and we dumped all of the buildpack-generated SBOM documents directly onto your local file system. At this point you would normally publish your OCI images to a registry, and you'll need to do something similar with your SBOM content. Ideally you'll store these together, or if they're separate, you'll have a way to associate them. This question of how to store your SBOMs is a little out of scope for the buildpacks, because we're in the business of producing images, but we still wanted to bring up the question of where to store them, because it's an important part of the SBOM adoption process. Great — so surely now we're done, right? We've generated SBOMs, and we've come to an agreement with our stakeholders about what's in them and where they can find them. But of course, if you've been playing along, you notice there's one question left unanswered, and it's a nuanced but important one to consider: when should SBOMs be generated? In order to understand why we're even bothering to ask, it's useful to motivate why this matters — in other words, what could change depending on when you generate an SBOM? To quote from Anchore's 2022 white paper, "The Software Bill of Materials and Its Role in Cybersecurity": "The use of SBOMs for containerized applications provides a unique opportunity to watch for SBOM drift — unexpected changes in the contents of a software application, which can indicate potential tampering, new versions, or changes in dependencies. Generating an SBOM creates a snapshot of the components of your container at a specific time during the development process."
"By generating an SBOM for each build and at each step in the development process, you can look for differences over time. Some of those differences might be expected, but any changes should be investigated to determine if they introduce new risk." With that said, it comes into view that when we ask "when," we're really talking about a bunch of whens — ideally multiple snapshots throughout development. So when might be a good time to take a snapshot? Perhaps most obviously, you can scan your OCI image to generate an SBOM when it's pushed into your registry, ready to run in prod, let's say. That seems pretty reasonable — it's asking "what's in this?" right before you take the first bite. But for a hardened production image, there might not be very much in the image worth providing an SBOM for. As many of us probably know, it's a best practice to include as few things as possible in your running app image in prod: fewer packages and extraneous things in the image means fewer attack vectors. You can achieve these minimal images with a well-written multi-stage Dockerfile, or you can use Paketo Buildpacks, which handle it automatically — but bear in mind that when you scan a minimal image to generate an SBOM, you may get back a minimal amount of information. A component that could still impact the security or compliance posture of your application may have already fled the scene of the crime. With that said, it's also probably reasonable to scan your source code — before you compile things down, before you clean things up, you can scan source code and identify packages from things like a package.json or a requirements.txt. But there's still a crucial step in the development process that isn't covered: the container build itself. To see why this matters, let's return briefly to the food analogy. Suppose I have a serious peanut allergy. I'm definitely going to care if there are peanuts in the food I'm eating, right?
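The drift-checking idea in that quote can be sketched in a few lines of Python: diff the package sets from two SBOM snapshots taken at different times. The SBOM layout here mimics Syft's JSON `artifacts` list, and the two snapshots are made-up examples, not real build output:

```python
def package_set(sbom):
    """Extract (name, version) pairs from a Syft-style SBOM dict."""
    return {(a["name"], a["version"]) for a in sbom["artifacts"]}

def drift(old_sbom, new_sbom):
    """Return the packages added and removed between two snapshots."""
    old, new = package_set(old_sbom), package_set(new_sbom)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Two hypothetical snapshots of the same app, taken one build apart.
monday = {"artifacts": [{"name": "flask", "version": "2.1.2"},
                        {"name": "gunicorn", "version": "20.1.0"}]}
friday = {"artifacts": [{"name": "flask", "version": "2.1.2"},
                        {"name": "gunicorn", "version": "20.1.0"},
                        {"name": "left-pad", "version": "1.3.0"}]}

# A package nobody remembers adding shows up in the "added" list —
# exactly the kind of unexpected change worth investigating.
print(drift(monday, friday))
```

This only works if you actually have a snapshot per build step, which is why the "when" question matters: one snapshot at the end gives you nothing to diff against.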
That's the OCI image. I'm also probably going to care if there are peanuts among the raw materials going into the thing I'm eating — the source code. But if my allergy is severe enough, I'll also certainly care whether the facility where my food was made processes peanuts, because a contaminant can be introduced in the preparation process itself. The same holds true when you're containerizing applications. A lot can happen in the time between taking source code and turning it into a final app image — if you've ever written a Dockerfile of any complexity, you know this to be true. A compiler or transpiler could be used in your Dockerfile that's not checked in to your source code and not in your final app image; how will you know if that's exposing you to a vulnerability? Alternatively, a bad actor could inject source code or dependencies into your build container, contaminating source code that you thought was otherwise safe to use or fully accounted for. So now Sophie's going to show us a concrete example, using our React app from earlier, that demonstrates how SBOMs generated during a build can reveal vulnerabilities you would otherwise miss by only scanning the OCI image at the end. Cool, yeah. We're going to revisit the final app image we generated in the first case study for our React application, and here we're just going to run the Syft CLI on that final app image — so we're not using any of the SBOMs generated during the buildpack build. We run the Syft CLI on the final app image and let it go. It's found a few packages, and we see that it's output a document to our file system. We can take a look — looks like a JSON file, great. And we can use a clever little jq command to figure out how many npm packages are in that SBOM: there are none.
We can repeat the same experiment for Debian packages, for example, and we see that tons are found. That kind of makes sense: during the build we transpiled our JavaScript code into static assets, which no longer need access to node modules, so the buildpacks cleaned them up. We have a minimal final application image, and because of that, there's no way an SBOM generator could pick up any node modules — they're simply not on the final application image. We can run the same little experiment on one of the buildpack-generated SBOMs. No real surprises here: we look at the npm-install SBOM again, run the same commands, and see that there are tons of npm packages — node modules — found in that SBOM, and when we look for Debian packages, we unsurprisingly see there are none. That makes sense, right? We have an SBOM dedicated just to the npm packages that get installed during your build, and thanks to that separation of concerns, that SBOM only contains those packages.
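The counting trick from the demo can be reproduced without jq. This is a sketch of the equivalent of `jq '[.artifacts[] | select(.type == "npm")] | length'` in Python, run against a Syft-style JSON document; the sample document is hand-written to stand in for a scan of the minimal final image, not real demo output:

```python
import json

def count_by_type(sbom_json, pkg_type):
    """Count artifacts of a given package type in a Syft-style SBOM."""
    sbom = json.loads(sbom_json)
    return sum(1 for a in sbom["artifacts"] if a["type"] == pkg_type)

# Hand-written stand-in for a Syft JSON SBOM of a minimal final image:
# only OS-level Debian packages remain after the build cleans up the
# node modules used to transpile the static assets.
sbom_json = json.dumps({"artifacts": [
    {"name": "libc6", "version": "2.31-13", "type": "deb"},
    {"name": "nginx", "version": "1.21.6", "type": "deb"},
]})

print(count_by_type(sbom_json, "npm"))  # 0
print(count_by_type(sbom_json, "deb"))  # 2
```

The zero npm count is the whole point of the experiment: scanning only the final image tells you nothing about the hundreds of node modules that were present during the build.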
Just to take it all the way home, we can also use the SBOMs for some of the use cases we mentioned earlier. Here we have a screenshot of using the Grype vulnerability scanner from Anchore on the npm-install SBOM we had. You can see that it found a bunch of npm packages with a bunch of CVEs in them, including some rated critical and high. If you had just scanned your final application image, you might not have gotten that information, so that's why it's pretty important to be collecting this information as you go along. And just to tie it back to all of the SBOMs we generate: you can see here that we have SBOM entries for Node, which we installed with the node-engine buildpack during the build, as well as NGINX, which we installed with the nginx buildpack. Neither of these would necessarily show up in the source code of your application, so it's pretty important that the buildpacks generate an SBOM for every single thing installed during your build — there's no way for absolutely anything to make it into your image without an automatically generated SBOM for it. Yeah, so as that example just showed, it's key to generate SBOMs at each step in the development process, including the non-trivial container build step. As our demo showed, Paketo Buildpacks is one way to help achieve SBOM coverage at that crucial step, giving you coverage across every step in the development process. So, to summarize: when you came into this talk, you probably had some sense that SBOMs are important — you chose to show up — and over the course of the talk you've seen the key considerations that we at Paketo Buildpacks bore in mind when figuring out how to generate useful SBOMs for our stakeholders at every stage of the development process, including during container builds. You can use these same five considerations as a guide when you're determining the
appropriate SBOM strategy for your product or organization. Or, if the examples we've shown resonate with you, you can use the open-source Paketo Buildpacks to help meet your SBOM needs during the container build phase of the software supply chain. So, with that — thank you for your time, and we'll take any questions. Yeah, Ryan can bring you the mic. How long did this take to implement — Syft and all that, in the end? I mean, we went through a variety of iterations, because perhaps unsurprisingly, early on we were right at the bleeding edge of doing something about SBOMs, so the non-linear path of arriving at a solution has taken over a year, I would say. Yeah — one of the first things we did was try to generate our own SBOMs without relying on an outside tool, and we quickly found that was a great way to take on a huge amount of debt in an area where we were not trying to be experts. So it's been a long journey. Next question: this is awesome, so I was wondering, for actioning on this data — you mentioned potentially combining all the SBOMs created throughout every step of the process. Have you thought about a way to combine that data and make one massive SBOM that you can use? Is that even practical? Any thoughts on that? Great, so the question was around merging the SBOM documents that come out of the build. Do you want to take this one? Yeah, it's definitely a valid question. There's a distinction between the buildpack implementation and the platforms that use it — for example, we used the pack CLI here, but you could use kpack or something like that — and I think merging capabilities would be something that should occur more on the platform side. But that's honestly a good consideration; we could see if there's a flag we could introduce to merge them together. I know that CycloneDX specifically is really
interested in mergeability and composability, so I know there are open-source tools maintained by the CDX maintainers for merging, and I think Anchore is working on that for Syft as well. Yeah, just to emphasize that: CycloneDX already allows you to merge SBOMs — we do something similar in our pipelines — so you can just point the CycloneDX tooling at a set of SBOMs and it'll create one massive one for you. The question I've got is around: have you started looking at in-toto for doing build provenance? Build provenance? Yeah — so with buildpacks, clearly I'm not the most familiar, but you're obviously going through a series of stages. Have you looked at something like in-toto to produce a provenance statement covering all of those stages in an attestable format? Is that something you've looked at? Yeah, so the question here is about provenance of the components that the buildpacks are installing, and of the buildpacks themselves. Right now we're not doing anything with in-toto attestations. That could be something we look at, I suppose, depending on whether it would provide value. One thing to bear in mind is that some of the dependencies that Paketo Buildpacks install — for example, the Python runtime — we're actually compiling from source, so that's coming from a trusted vendor; in that case, perhaps it would make sense for us to provide that attestation, since we're controlling it. Is that the kind of thing you're talking about here? Yeah? Cool. Awesome — thank you, everyone!