Hello and welcome to my talk, "Back to the Drawing Board: Building Containers with SBOMs." My name is Nisha Kumar and I am a senior open source engineer at VMware.

So what is an SBOM, anyway? An SBOM is a software bill of materials: a list of the components that took part in the creation of your software distribution. It's similar to the list of ingredients you would find on the back of a product. My co-conspirator on SBOMs, Allan Friedman, often uses the Twinkies example: you wouldn't know that Twinkies are not vegetarian until you looked at the ingredients. I prefer the example of the coconut milk conditioner. After looking at the ingredients, you realize that when the supplier said "coconut milk," they meant coconut and milk. See the hydrolyzed milk protein there? The concept is the same for software: each ingredient is a software component.

Now, what a component means is still the subject of debate. Many folks think it's just the runtime requirements of your application, while others, me included, think it should be all of the inputs to a build system. For this talk, we will focus on packages installed with a package manager.

SBOMs are an easy artifact to miss because you don't need one until you need one. When you are troubleshooting a build, debugging an error at runtime, or figuring out whether your deployments contain any vulnerable packages, SBOMs can help. Knowing what you build and what you ship helps you make better decisions about your software supply chain.

You may be asking: why create SBOMs during container builds when you can do it after the build using the various solutions out there? To answer that, we need to look back at how we got here. In the beginning, there were basic container build operations. We called commands like docker pull, docker commit, and docker run to create a container step by step. You could put those commands in a shell script and, voila, you had automated your container builds.
Except we got very creative with those shell scripts: doing special configurations, installing things from the internet via wget or curl, copying things from the host system into the container and back out again. As a result, the scripts got very complicated. To address this, Docker created a concept called the Dockerfile. It's a list of instructions that get translated into Docker commands. You provide this Dockerfile to the command docker build, and you have now automated your container builds easily.

Now, containers were also getting very large. There are two reasons for this. The first is that folks create containers like they would provision a VM or install things on their desktop. Think of what you do to install a particular application on a desktop: you either use a package manager or you straight up download a package from the internet; you make configuration changes; you add or delete files and do whatever you can to make the application run properly. The other reason is that container builds create their file systems using file system snapshots. Using file system snapshots is actually really cool, as it allows containers to be reused: a container that has just a minimal root file system can be used by a Python developer and a Go developer alike. The downside is that rather than storing one tarball, the container stores a series of tarballs.

Folks started to get around this issue by copying just the runtime artifacts out of the container used to build them, and into a container that contains only the runtime dependencies of the application. In response to this behavior, Docker introduced the multi-stage Dockerfile. A multi-stage Dockerfile automates moving artifacts from container to container until they finally land in the final deployed container with just the runtime dependencies. One of the byproducts of automating this operation was deleting the intermediate container images.
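As a concrete illustration (not from the talk's demo), a multi-stage Dockerfile of the kind just described might look like the sketch below: one stage has the full build toolchain, and a second stage receives only the runtime artifact. The image names and paths here are made up.

```dockerfile
# Build stage: full toolchain, only needed at build time
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Final stage: only the runtime artifact lands here; the
# intermediate "build" image is deleted once the build is done
FROM debian:bullseye-slim
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Note that everything installed in the build stage, compilers, transitive build dependencies, and all, disappears from the final image, which is exactly the metadata loss discussed next.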
All you had left at the end was the final shippable container, and this is what we have today. Further innovations to reduce the size of the deployment container were then introduced. You may have heard of DockerSlim, a tool that removes all of the files in the container that the app does not use at runtime. So now we have a container build process where we build a container like a desktop, then use multi-stage builds to take just the runtime artifacts and put them in a container with an OS that has just the runtime dependencies, and then we further run this through a slimming tool to extract just the files required by the app at runtime. These methods are great for reducing the size of the container and the attack surface of a running container. But what do we lose in the process? We've created an invisible software supply chain. We've made the process so efficient that we neglected to keep track of what exactly gets done during the container build. What did we install, exactly? Did we pick up a vulnerable transitive dependency? Can we recreate our build process? Did we check whether we got our inputs from trusted sources?

So in this talk, we focus specifically on knowing what you built, and that means creating SBOMs for container images. To know what's in the container, our current solution is to use file scanners at every stage of the CI/CD pipeline: we scan the starting container image, we scan the final container image before we ship it, and if there are any intermediate containers, we scan those too. We check the results at each step and escalate when the scanner detects an issue. This works fine. Some file scanners will even generate an SBOM in one of the NTIA-recommended formats required by the US government. But let's examine what we actually get from our file scanners. To do this, let's take a look at how file scanners work. In general, a file scanner looks for patterns and makes inferences about what is in a file based on some rules and heuristics.
Typically, a file scanner reads a file, looks up known patterns in a public database, checks whether those patterns exist in the file, and then generates a report of what it finds. The file scanners you pay for may use a proprietary database, and may have a front-facing API and a client that queries that API and produces some fancy reporting charts so the decision makers can feel good about their decisions.

I have a saying: file scanning is only as good as the metadata. File scanners look for certain files like package manifests, shared objects, lock files, license files, etc. They also look for information within a file, such as copyright text or an SPDX license identifier, or they can look at metadata that exists on the file system, such as commit messages if it is a git repository. When this information is not available or cannot be assessed, the file scanners don't report anything. Depending on the UX, this could come across as either the scanner saying, "Success, we didn't find anything bad," or, "Success, but there was incomplete data, so we don't know what to tell you." Both are problematic indicators: the success indicator gives you a false sense of security, and the incomplete-data indicator gives you an excuse to not do anything about the scan results. We have seen all of these innovations used to reduce the size of the container image, but the consequence is the loss of the very metadata that existing scanners rely on.

Full disclosure: I co-maintain an open source project called Tern. Tern inspects container images for software components that may be installed in them. Here are some vanity statistics you may be interested in, but I can assure you Tern is a legit open source project used by companies like Siemens, Philips, Microsoft, and others. Now, Tern is not a file scanner, because it doesn't analyze file contents, at least not in its default operation.
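To make the mechanics concrete, here is a minimal sketch of the pattern-matching approach described above. The "database" here is a hypothetical inline dict standing in for the public or proprietary databases real scanners query; the file names and license strings are illustrative, not taken from any particular product.

```python
import re

# Hypothetical stand-in for a scanner's pattern database:
# each entry maps a finding to a regex that suggests it.
PATTERN_DB = {
    "SPDX license identifier": re.compile(r"SPDX-License-Identifier:\s*\S+"),
    "copyright notice": re.compile(r"Copyright \(c\) \d{4}"),
    "pip manifest entry": re.compile(r"^[\w.-]+==[\w.]+$", re.MULTILINE),
}

def scan_text(text: str) -> dict:
    """Report which known patterns appear in a file's contents.

    Like the scanners described in the talk: if no pattern matches,
    the report is simply empty -- "success" with nothing found.
    """
    findings = {}
    for label, pattern in PATTERN_DB.items():
        match = pattern.search(text)
        if match:
            findings[label] = match.group(0)
    return findings

sample = "# SPDX-License-Identifier: BSD-2-Clause\nrequests==2.25.1\n"
print(scan_text(sample))
```

The failure mode the talk describes falls straight out of this design: hand `scan_text` a file with no recognizable metadata and it returns an empty report, indistinguishable from "nothing bad here."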
Tern does have something called extensions, where it will use a file scanner you give it and point it at an extracted container image. But in its default operation, Tern uses the package manager available on the container's file system, or certain package manifests that exist on the container's file system, and uses those to inventory the packages that may be installed. Because it relies on the package manager and package manifests, it suffers from the same metadata problem that file scanners suffer from: if there's no package manager and no manifest, there is no information. In general, if there are no clues as to how the files got onto the container file system in the first place, Tern will fail.

So how do we solve this problem of missing metadata so that we can figure out what's happening in our container builds? How many of you jumped straight to thinking about how eBPF could solve this? Well, you don't have to get that fancy. The thing is, we don't need to inspect the Docker build logs, or watch the kernel, or do nmap scans. Container builders already know some portion of what they are installing. So if container builders can create SBOMs for the pieces they are installing, and reuse SBOMs that other suppliers have created, whether those suppliers are container builders, package creators, or OS vendors, then they can include them in the container distribution ecosystem. That means you can shrink the container image all you want and you will still retain the metadata about the containers used during the build process. Pretty cool, huh?

How do we accomplish this? We have to deconstruct the container build process. Right now it is not instrumented, so you can't really stick inventorying into the way container builds happen today with Dockerfiles. So here I'm going to show you how we can integrate transparency into the container supply chain.
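Tern's default mode can be sketched roughly like this: instead of scanning file contents, read the package manager's own records. Below is a simplified sketch that parses a dpkg-style status file (the format Debian's package manager keeps at /var/lib/dpkg/status). The sample data is made up, and the real Tern does considerably more: multiple package manager formats, licenses, and layer attribution.

```python
def inventory_dpkg_status(status_text: str) -> list:
    """Inventory installed packages from dpkg status file contents.

    Each package stanza is separated by a blank line; we pull out the
    Package and Version fields. If the status file doesn't exist at all
    (no package manager, no manifest), there is simply no information.
    """
    packages = []
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ": " in line and not line.startswith(" "):
                key, _, value = line.partition(": ")
                fields[key] = value
        if "Package" in fields:
            packages.append({"name": fields["Package"],
                             "version": fields.get("Version", "unknown")})
    return packages

# Made-up sample resembling /var/lib/dpkg/status
sample_status = """Package: base-files
Status: install ok installed
Version: 10.3+deb10u9

Package: coreutils
Status: install ok installed
Version: 8.30-3
"""
print(inventory_dpkg_status(sample_status))
```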
We will use Buildah for building the container, Tern for generating the SBOMs, the ORAS CLI for pushing the SBOMs to a registry, and Sigstore's Cosign for signing all the things we created. All right, with that, let's get to the demo.

So here I have a Vagrant box that I have already provisioned. The provisioning also uses debootstrap to create a Debian file system. Let's take a look. That looks like a Debian file system, and that gets tarred up. By the way, all this code is publicly available; I will share the link later. So here we have the Debian tarball, which is just a tar of this Debian directory. Now it's time to spin up a local registry that we can use to push our artifacts. We'll use Podman for this: it simply pulls a registry image and spins it up. Let's do that, and we're done. Let's check it: it's running.

All right, let's build our first container. We're going to use Buildah to create a container from scratch; "scratch" basically means nothing. We mount that container. Here I am using buildah unshare, which unshares the mount namespace, meaning you can mount without root privileges. Then we add the Debian tarball to the container; that will be our rootfs. We commit that container as an image. Then we run Tern to generate an SBOM, which we're calling the Debian SBOM, in the SPDX JSON format. We use Buildah to push the image to our local registry, ORAS to push the SBOM to the local registry, and we do some cleanup at the end. All right, let's give it a whirl. So we've copied the tarball in, this commits the image, Tern generates the SBOM, and we push all our documents to the registry.

Let's take a look at this Debian SBOM that Tern generated. Okay, that's a lot of data. Let's use jq to filter it: I am looking for packages, and within those, the package name, and I'll give it the Debian SBOM file.
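The jq filter in the demo is, roughly, something like `jq '.packages[].name'` over the SBOM file. The same extraction in Python looks like this; the inline document below is a tiny made-up fragment shaped like an SPDX JSON SBOM, not Tern's actual output.

```python
import json

# Tiny made-up fragment shaped like an SPDX JSON document;
# a real SBOM generated by Tern has many more fields.
sbom_json = """
{
  "spdxVersion": "SPDX-2.2",
  "name": "debian-sbom",
  "packages": [
    {"name": "base-files", "versionInfo": "10.3+deb10u9"},
    {"name": "coreutils", "versionInfo": "8.30-3"}
  ]
}
"""

sbom = json.loads(sbom_json)
# Equivalent of piping the SBOM through jq '.packages[].name'
names = [pkg["name"] for pkg in sbom.get("packages", [])]
print(names)
```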
Okay, so those are all the packages that came with that Debian tarball. Let's sign those images. First, let's look at our images. We'll sign our Debian image using Cosign. I generated the keys separately, and I will sign the image with this tag. All right, we now have a signature. Can we sign our SBOMs? We sure can. There we go. So we've signed our initial image and we've signed our SBOM.

Okay, cool. Now let's see how we can build a container on top of this Debian container. For that, we'll look at the script. The script uses buildah from with the Debian container we just built. We mount the container as usual. We're going to install Python 3, just as a demo. We commit that container image. Then we download our Debian SBOM from the registry and provide it to Tern as context, and Tern generates an SBOM for the Python part of the image. We push the image to the local registry, then push the SBOMs for the Python image, which include the Debian SBOM and the newly generated Python SBOM, and we do some cleanup. Before we run that script, let's remove the local Debian SBOM file and see what happens.

Okay, let's see what happens now. So we pulled the image. We're installing Python 3; that's going to take a little while. We've committed an image. Now Tern is inventorying the image, and we're done. As you can see, we've downloaded the Debian SBOM from the registry; that's what we call an OCI artifact. And we now have a Python SBOM. Let's take a look at that Python SBOM. We'll use jq again, looking for packages and their names. Forgot to give it the file. There we go. As you can see, we installed Python 3, but we got a whole bunch of transitive dependencies along with it. So Tern is able to fill in the gaps of what probably got installed along with what the user thought they were installing. And we can sign those images too.
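Filling in those gaps works, conceptually, by comparing the new container's full package inventory against the base image's SBOM: anything present now that the Debian SBOM didn't list must have arrived with the Python 3 install. A minimal sketch of that set difference, with made-up package lists:

```python
def layer_additions(base_packages: set, new_inventory: set) -> set:
    """Packages present after the build that the base SBOM didn't list.

    This is the idea behind handing Tern the Debian SBOM as context:
    the difference is attributed to the new layer, including transitive
    dependencies the user never asked for by name.
    """
    return new_inventory - base_packages

base = {"base-files", "coreutils", "libc6"}
after_install = base | {"python3", "python3-minimal", "libpython3-stdlib",
                        "libexpat1", "readline-common"}

added = layer_additions(base, after_install)
print(sorted(added))
```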
We can sign the Debian image and the Python image, but we're not going to do that here. What's nice about this method is that you can also use a hand-written SBOM. For example, this file is one that I made by hand describing the Golang distribution. If you do not have a tool that can generate an SBOM, you can still take an SBOM generated by a third party and use ORAS to push it, along with the SBOMs that Tern generates, to the OCI registry. Okay, that's the end of the demo. Let's go back to our presentation. Yay, that worked.

Okay, this workflow is already itchy to me because it has some gaps, even though it's a start. You may have noticed in the demo that we created a lot of repositories and tags. I had trouble keeping them all straight in my head, for sure. The reason we had to create so many repositories and tags is that container registries are not made to support supplemental artifacts that need to reference a container image. There isn't anything in the OCI distribution spec that says we can organize container images and their related artifacts in a specific hierarchy so we can find them. At this time, they need to be managed outside the OCI ecosystem. This is where a new proposal called OCI artifact reference types comes into play. With OCI artifacts, you can create references from the SBOMs to the container image, such that the registry can serve that information to clients. You can also move containers from one registry to another without keeping track of all the associated repos and tags. And that's pretty cool. This image shows what happens when you use references when creating container image artifacts such as signatures and SBOMs, what happens when you create a derived image, and how the artifacts get transferred.
As you can see in the first image, you have the image manifest with a tag pointing to the image, and you have an artifact manifest that contains an SBOM and a reference to that image. Here you have a signature artifact whose reference is also connected to that image. What happens when you build a new image? Well, you get a new manifest with a new hash, and a new tag points to that image. You can also update the artifact manifest with additional SBOMs and additional signatures, and create references to the new image. What that means is that if you move this image from the local registry to some other public registry or an internal registry, the supplemental artifacts, in this case the SBOM artifact and the signature artifact, move along with the image. So the kind of UX you should expect is that rather than creating a new repository called debian-sbom, as we did in the demo, we create a reference to the container image debian:10 inside the new SBOM artifact, and that is how we will reference it from now on.

Okay, so where can you get involved? You can get involved with OCI artifact references: there is a working group right now in the Open Container Initiative that deals with the OCI artifact structure, references within it, and how those can be implemented. The proposal requires updates to the distribution spec and the image spec, and it could really use help from people providing feedback. You can also get involved with Tern. Tern has community meetings every other week, so check it out and get involved; we are looking for feedback, contributions, and any other help you can provide. Also, get involved with The Update Framework.
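To show the shape of the idea, here is a sketch of an artifact manifest carrying an SBOM with a reference back to the image it describes. The field names (`artifactType`, `blobs`, `subject`) follow the reference-types proposal as it stood at the time of this talk; the digests, sizes, and media types below are made up, and the final spec may differ.

```python
import json

# Made-up digests; in practice these come from the registry.
image_digest = "sha256:" + "a" * 64
sbom_blob_digest = "sha256:" + "b" * 64

# Sketch of an artifact manifest per the OCI reference-types proposal.
# The "subject" field is what ties the SBOM artifact to the image, so
# the registry can serve the link to clients, and the artifact travels
# with the image when it is copied between registries.
artifact_manifest = {
    "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
    "artifactType": "application/spdx+json",
    "blobs": [
        {
            "mediaType": "application/spdx+json",
            "digest": sbom_blob_digest,
            "size": 4096,
        }
    ],
    "subject": {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": image_digest,
        "size": 1024,
    },
}
print(json.dumps(artifact_manifest, indent=2))
```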
The TUF community is working on integrity around software build and release, and the container ecosystem could definitely use some help over there. Then check out implementers of the TUF specification: that's the Sigstore folks, who maintain the repository with the open source project Cosign that I used to sign everything, both the container image and the SBOMs. So check those out.

This demo is at a very simple link: github.com/vmware-samples/containers-with-sboms. Check it out. I don't expect any contributions to it, but if you find an idea or some issue with it, definitely file an issue. Ask me anything you want on Twitter; I'm available at NishaKMR. That's not nishakm; that's my GitHub username. I work on these Slack channels: CNCF, OCI, and the Sigstore channel. My handle in all of those places is nishakm. I also hang out on IRC in the #tern channel on irc.libera.chat. With that, I'd like to say thank you. Make sure to tweet using the hashtag #OSSummit and my Twitter handle, NishaKMR. I appreciate you listening, and thanks for watching.