Hello, everyone. Welcome to our KubeCon EU talk on unraveling the magic behind Buildpacks. I'm Sambhav Kothari. I'm an ML platform engineer at Bloomberg, and I'm also a TOC member in the Buildpacks project. Joining me today is Natalie Arellano. She is the team lead for the implementation team, which maintains the lifecycle, the heart of the Buildpacks project; that's what makes the magic possible. So what's on our agenda for today? We're starting with the basics: what container images actually are, with a brief overview of the OCI spec. We'll then talk about how container builds typically work. We'll take a look at two case studies, one with Docker build and one using basic shell scripts without any root, to demonstrate that container images can really be created from first principles with basic utilities. And then Natalie will take over and talk about how Buildpacks and the lifecycle actually apply this in practice and make complex applications possible through a really simple API. So let's get started. What exactly are container images? Well, like everything else in the Unix world, they're just files. More specifically, structured files, along with some configs described as JSON blobs. To put that into perspective, throughout the talk we'll be using this example application. It's a fairly simple Python web application with a requirements.txt that describes the dependencies for the application. And I also have here with me a simple Dockerfile, which we'll be using for our first case study. We're using a Dockerfile because I assume that's what most people will be familiar with, and it also helps us understand various concepts. So let's take a deeper look into this Dockerfile.
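The Dockerfile being described might look roughly like this; the exact base image tag and entry point are assumptions, since the slide isn't reproduced here:

```dockerfile
# Base image that ships the Python interpreter (tag is illustrative)
FROM python:3.11-slim

# All subsequent instructions run relative to /app
WORKDIR /app

# Copy only requirements.txt first, so the pip install layer stays
# cached as long as the dependencies don't change
COPY requirements.txt .
RUN pip install -r requirements.txt

# Now copy the rest of the application source
COPY . .

# Launch the Flask app
ENTRYPOINT ["python", "app.py"]
```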
It's fairly typical. It starts off with a base image that contains the Python interpreter, sets the working directory for the rest of the commands to /app, explicitly copies out just the requirements.txt file, and then installs the dependencies using pip before copying the rest of the source code. If you're familiar with Dockerfiles and how caching works, you'll know exactly why we're doing this. And then finally we're setting the launch entry point so that Python actually spins up a Flask server and serves our web page. Cool. So if we try to build all of this, we'll end up with something that looks like this. You might be wondering what this is. It's actually the cleaned-up output from docker inspect, a large JSON block. What we'll be focusing on are these two things right here: first, the config blob, and next, the layers object. So what exactly is the config blob? It's a set of key-value pairs that describe the runtime container environment. You'll see things here such as the user that's going to be used at runtime, the base environment that needs to be set, the entry point for your container, and the working directory. Now, if you look closely, you can map the actual Dockerfile instructions to some of these keys in the config blob. Typically, when you're using these instructions in Dockerfiles, that's exactly what's happening: they're just getting mapped to various keys in the config blob. Similarly, if you take a look at the layers blob, you'll see a bunch of checksums. So what exactly are these layers? They're just compressed tarballs containing the filesystem bundle that will form your container rootfs. All of these blobs you see here are content-addressable. What that means is that they can be located, and they're named, according to their content: they're named based on the checksum of their entire tarball.
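Pieced together, a trimmed sketch of such a config blob might look like the following; the values and digest placeholders are illustrative, not the exact ones from the demo:

```json
{
  "config": {
    "User": "root",
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
    "WorkingDir": "/app",
    "Entrypoint": ["python", "app.py"]
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:<checksum of a base image layer>",
      "sha256:<checksum of the WORKDIR layer>",
      "sha256:<checksum of the requirements.txt layer>"
    ]
  }
}
```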
If you were to pretty-print this whole set of layers, we'd see that, again, we can map the instructions in the Dockerfile that create layers back to the layers we ended up seeing in the layers blob. So the first five layers that you see at the top come from the FROM instruction. Then we have the WORKDIR instruction, COPY, RUN, and so on. To put that into perspective, here's what's actually happening in the container image, layer by layer, if you use a tool like dive to inspect it. The WORKDIR instruction actually created an empty folder called app. Our COPY requirements.txt instruction created a new layer with just the requirements.txt file. With something like a RUN instruction, something special happens: Docker will take a snapshot before and after, compare the changes in the filesystem, and any changes are then put in that specific layer. This is also why you'll often see RUN instructions with lots of shell commands combined together, because they all end up in one layer. Although not that flexible, this simple assumption has really worked out well for Dockerfiles. So now that we know what layer and config objects are, let's take a deeper dive into how Docker actually uses them and how Docker build works. Before that, I want to talk about a more primitive Docker command that we might not use as often to build container images, which is docker commit. What docker commit allows us to do is take a running container and create a container image out of the state of that container. And this is fairly powerful. For example, let's start with a simple BusyBox image. I'm going to make some filesystem changes to that container. Keep in mind, this is a running container; this is not an image. And then I'm going to use docker commit to actually convert the state of the container into an image.
And just to make sure that this actually worked and the image persisted my changes, I'm going to use docker run again to check that the demo file is there. And this is the core principle behind Docker build. When you write a Dockerfile like this and run it with docker build, what Docker is actually doing behind the scenes, in a very oversimplified manner, is creating various containers, running the equivalent instruction inside each container, using docker commit to capture the changes, and creating intermediate container images until you get to the final output image that you want. So if you've run docker build, you might have often seen messages like "Running in" or "Removing intermediate container" followed by some hex ID. When you're running this, what's actually happening is Docker is spinning up multiple containers, running these instructions, calling commit on them, and creating intermediate images that it then uses to run the next step. That's how Docker build essentially works. But just to recap: container images are just a few filesystem layers as tarballs and a set of JSON config files. And Docker build isn't necessarily the only way to construct these things. So in order to really drive the point home, we're going to create a simple BusyBox image using nothing but basic shell utilities, with no Docker daemon, no root, nothing. Let's get started. OK, I just have a handy assistant to help me type so that I don't make mistakes, but it's all running live. We have a simple function that downloads BusyBox from the internet and creates a few symlinks so that we can actually use these commands in the output container image we'll be creating. It's nothing special: just BusyBox from the internet and symlinks to these commands. Then the first thing we need to do to create the container image is follow the OCI image layout specification.
And that requires us, when using it in folder mode, to create certain directories with certain files that are laid out in a content-addressable fashion. So the first thing we need to do is create the directory that will contain our entire OCI image in the OCI image layout format. Then we'll actually create the layer. In this case, I'll be tarring up this entire bin folder, which will be the only layer in my output image. And then, just to make sure that everything's working correctly, I'm checking what the tar file actually contains. And we've got it: this is going to be the final filesystem that we see in the container image. We don't have a lot of layers; we just have one layer with this entire set of files. And then, as I mentioned earlier, these layers and everything in the OCI world need to be content-addressable. So what I'm doing is calculating the checksums and sizes for these layers so that we can actually name them accordingly. Just to make sure, here's our original checksum. And then what we'll be doing is making a blobs directory, where we'll be storing this output layer, and renaming it so that it fits the content-addressable structure that OCI expects. So now it's ready and it's named properly. Let's next check that the name actually matches the digest that we calculated, and everything looks fine. Then we'll be creating the config blob, which will describe the runtime properties of the container. So let's do that. Again, the things we need here are the operating system and the architecture, which we expect the final container image and the binaries within it to work on. Then the base runtime environment: in this case, we're just setting the PATH variable to include the /bin directory so that all the executables we've just put in the image are available to us when we're trying out shell commands. And I've also set the working directory to the root directory.
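The whole flow of this walkthrough, tarring the layer, content-addressing the blobs, writing the config, plus the manifest, index, and layout marker that come up next, can be sketched end to end with nothing but coreutils. Paths and file contents here are illustrative, and the size fields a real manifest also carries are omitted for brevity:

```shell
#!/bin/sh
set -e

# 1. An OCI layout directory with a blobs/sha256 subdirectory
mkdir -p oci-image/blobs/sha256

# 2. A toy rootfs: one bin directory, which becomes our only layer
mkdir -p rootfs/bin
printf '#!/bin/sh\necho hello\n' > rootfs/bin/hello
tar -C rootfs -cf layer.tar bin

# 3. Content-address the layer: its blob name is the checksum of its bytes
layer_digest=$(sha256sum layer.tar | cut -d' ' -f1)
mv layer.tar "oci-image/blobs/sha256/${layer_digest}"

# 4. The config blob: platform, runtime env, and the layer diff IDs
cat > config.json <<EOF
{"os":"linux","architecture":"amd64",
 "config":{"Env":["PATH=/bin"],"WorkingDir":"/"},
 "rootfs":{"type":"layers","diff_ids":["sha256:${layer_digest}"]}}
EOF
config_digest=$(sha256sum config.json | cut -d' ' -f1)
mv config.json "oci-image/blobs/sha256/${config_digest}"

# 5. The manifest points at the config and layer blobs by digest
cat > manifest.json <<EOF
{"schemaVersion":2,
 "config":{"mediaType":"application/vnd.oci.image.config.v1+json",
           "digest":"sha256:${config_digest}"},
 "layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar",
            "digest":"sha256:${layer_digest}"}]}
EOF
manifest_digest=$(sha256sum manifest.json | cut -d' ' -f1)
mv manifest.json "oci-image/blobs/sha256/${manifest_digest}"

# 6. The index points at one manifest per target platform
cat > oci-image/index.json <<EOF
{"schemaVersion":2,
 "manifests":[{"mediaType":"application/vnd.oci.image.manifest.v1+json",
               "digest":"sha256:${manifest_digest}",
               "platform":{"os":"linux","architecture":"amd64"}}]}
EOF

# 7. The marker file that makes this a valid OCI layout directory
printf '{"imageLayoutVersion":"1.0.0"}' > oci-image/oci-layout
```

From here, something like crane would be able to push the oci-image directory to a registry; the point is that every blob's name verifiably matches its content.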
Finally, I've also included the rootfs object, with the layer checksums I referred to earlier, to make sure that we're capturing this layer correctly in the config blob. Right now, these are just uncompressed layer checksums that we've put here. Next up, the config blob itself needs to be content-addressable, so we're running the same commands again to calculate the checksum and size, and moving the config blob to its appropriate location in the blobs directory. Next, there are a couple of other concepts in the OCI spec that are not directly exposed in Docker. The first one is the manifest, which captures the content-addressable location of the config and the compressed or uncompressed digests of the layers. In this case, I want to make things easier for myself, so I'm using uncompressed layers. But typically, when you're pushing to a registry, these would be compressed, and the digests you see here would be different from the ones in your config. Then I'm going to do the same thing with the manifest: make it content-addressable. And one last important file that we have is the index. Typically, when you're pulling something like ubuntu:bionic or ubuntu:focal on different architectures or different operating systems, Docker pulls the right version of that image. The way it knows to pull the right version is something known as the OCI index, which contains pointers to the different manifests targeting different platforms and architectures. In this case, I'm just building an image for my laptop, and I'm not doing a cross-architecture build, so that's all I need to specify. And then finally, we need one extra piece of information to make it a proper OCI layout directory: the oci-layout marker file. And that's pretty much it. We should have a working image. I'm going to push it to a local registry using a nifty little tool called crane.
It just allows you to take an image in the OCI layout format and push it to a registry. I have a registry running locally, and it looks like everything worked fine. And just to make sure that this is actually working, let's try and run it. I'm going to use docker run to pull the latest version of the image that I just pushed and try to exec into a shell. And it pulled the right thing. Looks like everything is working: ls shows we have a bin directory with the appropriate things. If we check the environment, it's set properly. If I check the user, it's set to UID 0, because we didn't specify one. And everything seems to be working fine. So this really goes to show that container images are nothing but tarballs and JSON files, and you can construct them without root or any other magical tool like a Docker daemon. And this is really powerful. This is exactly what Buildpacks uses to create container images without root and without the need for a Docker daemon. So now we have Natalie, who will be explaining the most important bits of this talk: how Cloud Native Buildpacks utilize these first principles to build container images. Over to you. Thank you. So we are now going to take a closer look at Cloud Native Buildpacks. Before we can talk about a Cloud Native Buildpacks build from beginning to end, we should probably cover what a buildpack actually is. Very simply put, buildpacks are software that know how to analyze source code and determine the best way to build it. They're actually a collection of two binaries, one called detect and one called build. The detect binary serves to look at application source code and determine if that buildpack is actually relevant for the application. So you might look for a particular file that indicates that the application is written in the particular language that buildpack handles.
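As a sketch of that idea, here's a toy bin/detect for a hypothetical Python buildpack. Per the CNB spec, detect runs with the application source as its working directory and exits 0 to opt in or 100 to pass; the directory names here are invented for illustration:

```shell
#!/bin/sh
set -e
mkdir -p app demo-buildpack/bin

# bin/detect: opt in only if this looks like a Python app;
# exit 100 tells the lifecycle this buildpack does not apply
cat > demo-buildpack/bin/detect <<'EOF'
#!/bin/sh
[ -f requirements.txt ] || exit 100
EOF
chmod +x demo-buildpack/bin/detect

# Without requirements.txt, detection fails (exit 100)
( cd app && ../demo-buildpack/bin/detect ) && first=0 || first=$?

# With requirements.txt, detection passes (exit 0)
touch app/requirements.txt
( cd app && ../demo-buildpack/bin/detect ) && second=0 || second=$?

echo "before=${first} after=${second}"   # prints: before=100 after=0
```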
The second binary is called build, and that is what actually does the work to turn source code into a runnable application. Multiple buildpacks can work together, so you could have a collection of buildpacks that each provide one piece of the total steps that would be necessary to create something that's runnable. And this allows great flexibility and interoperability between buildpacks, which we'll see. Looking a little more carefully at what buildpacks actually do, and tying back to what we saw with the Dockerfile example: buildpacks can pull in dependencies, they can run compilation if needed, they can define processes to run when the application is started, and they can configure the environment, among other things. They can also generate a software bill of materials, or SBOM, to describe any dependencies that they've added. When buildpacks are doing all of these things, they must follow the CNB specification, which is not loading. There is actually a nice graphic. OK, there it is. So the CNB specification indicates what buildpacks are allowed to do. We've actually placed some pretty strict limitations on the parts of the filesystem that buildpacks are able to write to. As Sam alluded to, buildpacks run entirely unprivileged, so that's a limitation. And this specification, while limiting, allows for very powerful capabilities, which we'll see. Now we're going to give a brief tour of a buildpack's view of the world. This is what a buildpack might see in the filesystem when the build is running. We're assuming that the detect phase has already happened, so the buildpack has opted into the build and is now performing its piece of the puzzle. We have three directories of interest. The first, most importantly, is the workspace, which contains the application source code. There is also a layers directory that is designated for that buildpack; this is some buildpack ID that's running right now.
And it has its own child of the layers directory where it can make changes. Finally, we have a platform directory where platforms can provide their own specific configuration, and buildpacks might know to look for things there. So let's say a buildpack is providing an application dependency. It would provide that in a subdirectory of its layers directory. So this is some-layer, which contains some dependency. And it has also provided a configuration file, which is in TOML format. Our project uses TOML across the board to provide configuration at different points in the process. This some-layer.toml is an instruction for the lifecycle about how the layer directory should be handled, and we'll get into more examples of different ways that the lifecycle can handle these layers. What goes into a layer directory is sort of open-ended; buildpacks can do whatever they want. But there are some conventions that buildpacks can follow that will make things easier if they, for example, want to provide dependencies that will be available on the path or in the environment for subsequent buildpacks. We'll give an example later of how these directories are special. And there are other things that buildpacks can do, like set environment variables. They can also provide multiple layers; they're not limited to just one. They can provide build-time configuration and runtime configuration. We mentioned the process to start at runtime: a somewhat unique feature of Buildpacks is that we allow multiple process types to be defined on an image. It can be convenient to have a single image with two or more different processes that can be started, for example a web and a worker process in the same image. Buildpacks allow you to do that. And in the interest of time, and of making this a comprehensible talk, we've really just shown a subset of all the things buildpacks are capable of doing.
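To make that pattern concrete, here's a toy bin/build that creates a layer and its TOML. The buildpack ID, layer name, and dependency are invented, and the [types] table reflects newer Buildpack API versions, so treat it as a sketch rather than the exact format from the slides:

```shell
#!/bin/sh
set -e

# The lifecycle invokes bin/build with the buildpack's layers directory
# as an argument; we fake that directory here
layers_dir=layers/demo-buildpack
mkdir -p "${layers_dir}/some-layer/bin"

# Provide a "dependency" inside the layer's bin/ directory, one of the
# special subdirectories the lifecycle knows how to expose
printf '#!/bin/sh\necho some-dep\n' > "${layers_dir}/some-layer/bin/some-dep"
chmod +x "${layers_dir}/some-layer/bin/some-dep"

# some-layer.toml tells the lifecycle how to treat some-layer/
cat > "${layers_dir}/some-layer.toml" <<'EOF'
[types]
launch = true  # include this layer in the final application image
build = true   # expose it to subsequent buildpacks at build time
cache = true   # keep it around to speed up the next build
EOF
```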
In particular, there's a lot of flexibility to customize the environment for the process at runtime, and some more advanced capabilities that you don't actually have to know about if you don't want to. Suffice to say that there's more available. And last but not least, every buildpack follows this pattern: they are all able to make changes in their own layers directory, adding layers as children of that directory, or make changes in the workspace directory, which contains the source code. And that's really it. That's all buildpacks are allowed to do, which, again, is limiting, but for a very important reason. So now that we've given a buildpack's view of the world, we want to step back a little bit and give a view of the world from a platform perspective. As we mentioned, buildpacks are executables, so they need an environment to run in. And the best way to provide that environment is via an OCI image that we call a builder. It contains a base image, the build image, which has all of the OS-level dependencies that are required by the buildpacks in that builder. It has one or more buildpacks, and a binary called the lifecycle, which, as Sam mentioned, is sort of the heart of the CNB project. As a platform operator, you can decide which buildpacks are supported by your platform by choosing which buildpacks go into your builder. To perform a build, you would provide source code and then execute the lifecycle, and the output would be your application image. So, just to describe what's in the image: we have a base image, which you'll notice is the run image, not necessarily the same as the build image. For security reasons, we want to minimize the attack surface and provide a more minimal image containing only the dependencies needed at runtime in our final image. Then we have the application dependencies, dependencies A, B, and C.
These are layers within the image that directly correspond to those subdirectories of the buildpack layers directory. And then finally, the app as another layer on top. Looking a little more closely at what the lifecycle actually does at a high level: it is preparing the environment for buildpacks and then running the buildpacks themselves. I should say, the lifecycle actually executes in a series of distinct phases that each have their own responsibility. The first phase is analyze, where, let's say we're exporting to an OCI registry, we look in the registry and see if there's any image already there that we built previously. We do that so that, by knowing what's in the registry, we can avoid re-uploading things that didn't change. Then comes the detect phase, where we run a series of buildpack groups to determine which group is actually needed to build the application. This can be useful if you're a platform that supports a wide variety of language families: you might not want to have a separate builder for each one. You can just put all of the buildpacks that you support into one builder, and the buildpacks, by analyzing the source code, will know which group is actually needed for the build. Then we have restore, which is just copying dependencies that were cached in a previous build back into the build container so that they're available. Next is the build phase, where each buildpack's build executable is invoked to actually do the work that's necessary to make a runnable application. And finally the export phase, where we create OCI layers by making tarballs from those directories that we talked about, calculating their checksums, and creating configs, just like Sam showed. So we're taking advantage of these first principles to actually make an image without needing to use Docker. We just showed five phases; it's a lot for an introduction.
But just to show that, as an end user, you don't have to think about all of that. You don't have to think about the lifecycle; you don't have to think about the different phases. There are many ways to perform a Buildpacks build, but one way that we provide as a project is a CLI tool called pack, really intended for local development, that you can use to build applications and have them be exported either into a Docker daemon or into an OCI registry. So here you can see that we provide my-app-image, which is the desired tag of the image, we give a path to the application source code, and we provide the builder that we want to use. But this can be made even simpler, because we can infer that the application directory is the current working directory, and you can set your default builder. As a project, we provide the specification and tooling to run Buildpacks builds, but we don't actually provide buildpacks. So there's an ecosystem of buildpack providers, with Google, Heroku, Salesforce, and the Paketo project being among those providing production-ready buildpacks that you can choose from. You can also write your own buildpacks. That's the whole point of having a specification: you can know exactly how to create your own buildpack executables to follow the logic that we're showing. So let's show a demo. This, because we have bad internet, is a canned demo. But it's just to show that simple invocation of the pack command. Here we're using the Paketo builder, but again, you can use any. You can see from the output, oops, I scrolled up too far. So you can see from the output those lifecycle phases that I mentioned. Here we have analysis. We have detection; in this case, five buildpacks opted into the build. Nothing to restore. Now we have the build step, where each buildpack is doing its thing. Here's export, where we're creating layers from those directories and adding them to the image. And finally, we have the image being saved. So let me go back.
This is just going to show in a little more detail what actually happened with that build. We're focusing on the buildpacks that actually provided the Python dependencies. Here we have three buildpacks, each doing a separate thing. The CPython buildpack installs CPython. The pip buildpack installs pip. Then the pip-install buildpack actually runs pip to populate the pip cache and install packages. And the reason it's helpful to break it up into these steps is that, again, thinking about having a minimal runtime image, we only want to include the things that are necessary for the application to run. In this case, we need CPython and the installed packages in the final image, but we don't want our package manager there. So the buildpacks have designated those layers as being exportable. However, CPython and pip are necessary for pip install to run, so we can designate those layers as being accessible to subsequent buildpacks through the environment, and we'll show how that works in just a second. Finally, all of the layers could be useful in a subsequent build to make the next build faster, so the buildpacks designate them as cacheable. Then, to show it in a little bit more detail, this is the view of the filesystem before any buildpacks have executed. We have the workspace directory containing our app.py and requirements.txt. We have the layers directory, which is empty because no buildpacks have run. And then we have the environment, with the PATH variable that's inherited from the build-time base image. After the CPython buildpack has run, we have a cpython layer and a cpython.toml. This TOML file will be instructing the lifecycle: this is a cacheable layer, this is an exportable layer, and this layer should be exposed to subsequent buildpacks. This is getting back to those special subdirectories that I mentioned.
So anything in the lib directory is going to be added to the LD_LIBRARY_PATH and LIBRARY_PATH variables, and similarly with include and bin. This is all done automatically by the lifecycle, so the buildpack doesn't have to do anything special; it just puts stuff in there. The buildpack can also configure the environment. In this case, the contents of the PYTHONPATH.override file would be the path to the cpython layer directory. Similarly with pip: we have a pip layer, and we update the environment, because this, again, is exposed to subsequent buildpacks via those special directories. Then we have the pip-install buildpack. After it runs, it has created a layer for the pip cache and another layer for the installed packages. So just to recap: we have three buildpacks that created four layers. Two layers are exported in the final image. Two layers are made available to subsequent buildpacks through the environment. And all four layers are cached. At this point, we didn't have to do any snapshotting. If we go back to the Dockerfile example, creating four layers with a Dockerfile might involve creating four separate RUN instructions, which would mean four separate containers and four separate instances of running a snapshot to determine the filesystem diff. We didn't have to do that in this case, because of the buildpack specification, with the lifecycle knowing exactly where the buildpacks are writing their changes. We alluded to the software bill of materials, so this is just showing how buildpacks, because they know exactly what they installed, can provide standardized, structured SBOM files to describe what was added. So here we have, for the three layers, CycloneDX, SPDX, and Syft JSON files describing the dependencies that were added. And this is typical of an image config that you would see in a Buildpacks-produced image. You'll notice that it's very, very similar to what Sam showed earlier. So just to reiterate: images that are built with Buildpacks really are OCI images.
They can be run with any tool that you could use to run an image that you built with Docker. They can be run with Docker; they can be run with Kubernetes. So, just to recap: what did we see? Buildpacks are just software. They analyze source code and determine the best way to build it. They follow a specification that really limits what they're able to do, and because of that, there is a logical mapping of the dependencies that they install to the layers in the application image that is produced, which can be really useful. The lifecycle is a binary that orchestrates the build. It sets everything up for buildpacks, invokes their executables at the right time, and creates the final application image using the OCI concepts that Sam illustrated. And finally, just to linger on the limitations of the buildpack API: because we know that buildpacks only make changes in very specific locations, there is a strict separation of the base image and buildpack-provided layers. This allows us to do something very powerful, which we've demonstrated in other talks, but not this one. Just to show you what we mean: imagine we have an application image with a security vulnerability, but that vulnerability is just in the runtime base image. Because we know there's no intermingling of the run image and the buildpack-provided layers, we can simply swap out the vulnerable base image for a newly patched run image. And this is something that can be done just by manipulating a pointer in the registry. It doesn't require a rebuild, and it doesn't require running new containers; it's just that metadata manipulation. And because of that, it can allow you to update many applications very quickly and easily, on the order of thousands of applications in just a few minutes. So there are advantages. We think there are many advantages to doing things this way.
First and foremost, it concentrates the understanding that is required to build a secure, efficient, optimized container image in buildpack authors rather than application developers. As an application developer, you just run pack build my-image, or the equivalent, and you don't have to think about all of these things, like minimizing the number of layers or logically grouping dependencies. This means that dependency updates are also very easy and straightforward. If you have a vulnerable dependency, you can just bump the version of the buildpack that you're using in your pipeline and rerun things, and you will get an image that you know is fully up to date. As we mentioned, the layers within the image are logically mapped to the dependencies that they provide, which is useful for understanding what's in the image, as well as for only replacing the bits that actually need to change when doing a rebuild, making things faster. You can get more accurate SBOMs, because buildpacks, again, know exactly what they put in the image, so you can get an SBOM that tells you more than something that just came from a scanner. And finally, rebase allows for fast patching of OS-level vulnerabilities at scale. There are limitations to this model, by design. We mentioned that buildpacks can't write changes just anywhere; they have to write in the workspace and the layers directory. And this is, of course, limiting for some of the things that you might want to do. Again, by design, they run completely unprivileged, so you can't do something like install a system package with a buildpack. However, our project is currently looking at ways that we can allow for these kinds of extensions in a safe way, where you would use a buildpack-like mechanism to extend your build-time or runtime base image before buildpacks do their thing, giving a more complete way of doing certain things without having to completely remake your base images from scratch. So that is our talk.
We'd love to hear from Buildpacks users. Or if you're just thinking about Buildpacks, or if you have questions about Buildpacks, please reach out. We're available everywhere; there are icons that will show up in a second. That's all. Thank you.