Hello everyone, my name is Gotti Delorme and I'm a software engineer at Apple. Today we are going to talk about how to run cache-efficient builds in Kubernetes using BuildKit. First we'll discuss our goals, then we'll go a bit deeper into BuildKit itself and the features it offers, and we'll finish by discussing how to schedule BuildKit jobs efficiently in Kubernetes.

Our main goal here is to deprecate build systems' bare-metal infrastructure. Many organizations have already made the transition from bare metal to Kubernetes for application deployment, but a lot of them still rely on bare metal for build systems, CI systems, or other kinds of automation. As part of this move, we want to make sure we keep running secure, fast, and cache-efficient builds; by fast, I mean at least as fast as on bare metal. We are also looking to provide a way for developers to run CI builds locally, in the same environment. And we want to maintain a single way to run builds: a single build system that is not specific to a stack, not specific to Gradle, Go, or Rust. It should work with all of them.

So, BuildKit is a builder toolkit created and maintained by the Moby community. It has also been part of Docker since 2018, but it's disabled by default. Compared to classic Docker builds, it supports many more features: concurrent multi-stage builds, persistent cache mounts, build-time secrets, outputs other than Docker images, frontends other than Dockerfile, and rootless and daemonless operation, which is super convenient if you want to run BuildKit in a non-privileged container.

If you want to try BuildKit, you can use it directly from Docker: to enable it, just set the DOCKER_BUILDKIT environment variable to 1, or rely on the docker buildx command. Or, if you don't want any Docker dependency at all, you can use buildctl directly to run builds against a buildkitd daemon.

So let's talk about multi-stage builds. On the left side of this slide, you can see a very simple Dockerfile containing three different stages. Notice that the last stage depends on the first and second ones, but the second stage doesn't depend on the first. When you run a classic Docker build, it executes each stage sequentially, as you can see on the right side: first stage one, then stage two, then stage three. When you enable BuildKit and look at the build log, you can see it runs stages in a different order: in the log, we see step number four, then step number seven, then step number five. What's actually happening here is that BuildKit builds a graph representation of your Dockerfile and makes sure to run stages that don't depend on each other concurrently. In this example, since the second stage doesn't depend on the first one, both can run in parallel, and the third stage runs last. It also means that if you make a code change in the first stage, you don't have to rebuild the second one, because it doesn't depend on it; with classic Docker, you have to rebuild the second stage whenever you change the first one, because it executes the build sequentially. So BuildKit offers better layer caching than Docker.
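To make that concrete, here is a minimal three-stage Dockerfile shaped like the one on the slide; the image names and commands are my own illustration, not the exact slide content. The third stage depends on the first two, but the second is independent of the first, so BuildKit can run the first two concurrently.

```dockerfile
# Stage 0: compile the binary
FROM golang:1.21 AS build
WORKDIR /src
COPY main.go .
RUN go build -o /app/hello .

# Stage 1: independent of stage 0, so BuildKit runs it in parallel with stage 0
FROM alpine AS certs
RUN apk add --no-cache ca-certificates

# Stage 2: depends on both stages above, so it runs last
FROM scratch
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/hello /hello
ENTRYPOINT ["/hello"]
```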
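And as a quick recap of the three ways to invoke BuildKit mentioned earlier, the commands look roughly like this (the image name is a placeholder):

```sh
# 1. Enable BuildKit for the classic docker build
DOCKER_BUILDKIT=1 docker build -t myapp .

# 2. Or use the buildx command, which always goes through BuildKit
docker buildx build -t myapp .

# 3. Or skip Docker entirely and talk to a buildkitd daemon with buildctl
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=myapp
```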
Another cool feature of BuildKit is cache mounts. Cache mounts are super useful when you want to persist caches from compilers or package managers. For example, when you add, delete, or upgrade a dependency, you don't want all of your dependencies to be re-downloaded just because the cache layer was invalidated. In this example, I'm showing how to use a cache mount with bundler. The first run takes the same amount of time with or without the cache; but when we then add, upgrade, or remove a dependency, it's way faster with the cache, because it doesn't have to re-download all the dependencies. It only downloads the new or changed ones.

Another cool feature is build-time secrets. With BuildKit, you can share secrets with a build without leaking them into the produced Docker image or its metadata. You can pass secrets from environment variables or from files, and you can expose a secret to just a specific instruction in your Dockerfile. In this example, I'm mounting a secret client key to be used by one command as part of my build.

BuildKit also supports other types of mounts. Bind mounts are useful when you need to mount directories from your build context, from a different stage in the build, or from a different Docker image; for example, when you want to mount some data or a binary without persisting it into your produced image. You can also use SSH mounts, which are pretty useful if you need to run SSH commands within your build, for example a git clone using an SSH URL. And you have tmpfs mounts, useful when you need scratch space during the build for data you don't care about, so you don't pollute your resulting image with logs or similar files.

BuildKit also supports outputs other than Docker images. You can obviously produce Docker images with BuildKit, but you can also use what they call the local output. Local outputs are a way to export a specific set of files from your build environment to the host where you run the build; that's the output we will use to produce other kinds of artifacts, like binaries. You can also produce tarballs with BuildKit. It supports three different tarball types: tar, which is similar to the local output but archived, and the docker and oci types, which produce Docker-image or OCI-image tarballs.

So let's see how we can build a simple binary using BuildKit and a docker build command. On the left side of the slide, you can see a simple Dockerfile that runs a go build command and then copies the binary into a scratch image. Because we use a scratch image, we know the image is empty except for the binary we just copied over. When we run the docker build command and specify the local output type, we can see at the end of the log that instead of producing a Docker image, it copies files from the build environment to the current directory. And when we run the binary we just copied over, it works as expected and we get the hello world. That's one way to produce artifacts other than Docker images using BuildKit.

BuildKit also supports exporting layer caches. This is pretty useful when you have different BuildKit instances running on different hosts and you want to share caches across them. Just note that it only exports layer caches; it does not export cache mounts. Also, depending on the size of your cache and on your Docker registry, it might be slow to export and import caches, so you will have to decide for yourself whether it has a good or a bad impact on your build.
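Here is a sketch of the bundler cache-mount pattern described above. I'm assuming the official ruby image, where bundler keeps downloaded gems under /usr/local/bundle/cache; adjust the target for your package manager.

```dockerfile
# syntax=docker/dockerfile:1
FROM ruby:3.2
WORKDIR /app
COPY Gemfile Gemfile.lock ./
# The cache mount persists across builds even when this layer is invalidated,
# so changing one dependency only downloads the new or changed gems.
RUN --mount=type=cache,target=/usr/local/bundle/cache bundle install
COPY . .
```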
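The build-secret pattern looks like this: the secret is exposed at /run/secrets/&lt;id&gt; to that single RUN instruction only and never ends up in a layer. The URL and secret id here are my own examples, not the slide's.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine
RUN apk add --no-cache curl
# The key is only visible during this instruction; it is not stored in the
# image or its metadata.
RUN --mount=type=secret,id=client_key \
    curl --key /run/secrets/client_key -o /artifact https://internal.example.com/artifact
```

You would then pass the secret at build time with something like `docker build --secret id=client_key,src=./client.key .`.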
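The scratch-plus-local-output trick works roughly like this (a sketch in the spirit of the slide):

```dockerfile
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN go build -o /hello .

# scratch is empty, so the final image contains only the binary
FROM scratch
COPY --from=build /hello /hello
```

```sh
# Instead of producing an image, export the files to the current directory
docker build --output type=local,dest=. .
./hello   # prints hello world
```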
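And exporting and importing layer caches through a registry looks like this with buildctl; the registry ref is a placeholder, and again this covers layer caches only, not cache mounts.

```sh
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=image,name=registry.example.com/app:latest,push=true \
  --export-cache type=registry,ref=registry.example.com/app:buildcache \
  --import-cache type=registry,ref=registry.example.com/app:buildcache
```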
You can also use custom frontends with BuildKit. A frontend is a component that converts a build definition into LLB. LLB stands for low-level builder; it's the graph language used by BuildKit. Of course, if you build a Dockerfile, you don't have to provide a custom frontend, because BuildKit has built-in support for it. The way it works is basically this: you have your build definition, for example a Dockerfile, which is consumed by a custom frontend; that frontend produces LLB; and the LLB is consumed by the BuildKit daemon, which runs your build and produces the output.

Frontends are usually distributed as Docker images and specified at the top of your build definition using a special syntax directive containing the image name. In this example, because it's KubeCon and I'm sure we all love writing YAML here, I'm showing how we could replace a classic Dockerfile with a YAML syntax. At the top of the build definition, you can see I provide a custom frontend, so BuildKit knows how to convert this YAML into LLB and can use it to build. Because of this, I can actually build this YAML directly with a docker build command if I have BuildKit enabled, which is pretty powerful. Frontends are typically created using the official BuildKit Go library, but the Rust community has created a crate for writing frontends in Rust as well.

BuildKit also supports running as a non-root user and in non-privileged containers, which is what we are going to use in Kubernetes. Just note that you have to disable AppArmor and seccomp. In Kubernetes this might not be a big deal, because they are usually disabled by default, but depending on your cluster configuration you might have to disable them explicitly. You can do that by adding an annotation and an extra field to the security context in your pod specification.

You can also run BuildKit daemonless. The trick here is to start buildkitd and buildctl within the same container: you start the daemon first, wait for it to be ready, and then start your build with a buildctl command.
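To give an idea of the frontend mechanism: for Dockerfiles the directive is e.g. `# syntax=docker/dockerfile:1`, and a custom frontend is selected the same way. The YAML build definition below is purely hypothetical, just to illustrate the shape of the idea; the frontend image and schema are invented.

```yaml
# syntax=ghcr.io/example/yaml-frontend:v1   (hypothetical custom frontend image)
stages:
  - name: build
    image: golang:1.21
    commands:
      - go build -o /hello .
```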
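For rootless BuildKit, the AppArmor annotation and the seccomp field look like this; this mirrors the rootless examples in the moby/buildkit repository, though your container names and buildkitd flags may differ.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: buildkit
  annotations:
    # Disable AppArmor for the "buildkit" container
    container.apparmor.security.beta.kubernetes.io/buildkit: unconfined
spec:
  containers:
    - name: buildkit
      image: moby/buildkit:rootless
      args: ["--oci-worker-no-process-sandbox"]
      securityContext:
        # Disable seccomp and run as the unprivileged user from the image
        seccompProfile:
          type: Unconfined
        runAsUser: 1000
        runAsGroup: 1000
```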
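And a daemonless sketch: the moby/buildkit image ships a buildctl-daemonless.sh helper that does this for you, but the manual version is roughly:

```sh
# Start the rootless daemon in the background
rootlesskit buildkitd --addr unix:///run/user/1000/buildkit/buildkitd.sock &

# Wait for the daemon to be ready
export BUILDKIT_HOST=unix:///run/user/1000/buildkit/buildkitd.sock
until buildctl debug workers >/dev/null 2>&1; do sleep 1; done

# Run the build in the same container
buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=.
```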
So now that we know more about BuildKit, you probably wonder: how can I run BuildKit jobs in Kubernetes? Should I use a Deployment, a StatefulSet, or maybe a Job? Which one is more secure? How do I leverage caches? All of those questions are totally valid. To answer them, for each resource type we are going to list the pros and cons, and then we will decide which one works best for us.

Let's start with Deployments. With a Deployment, you need to manage the daemon lifecycle separately, because before scheduling your build you need the daemon alive in your Kubernetes cluster; you will most likely have a buildkitd Deployment with multiple replicas, so multiple pods. Because of how Deployments work, it's very hard to know which pod your build will land on, so you are unlikely to hit a local cache. Also, because the daemon is decoupled from the actual build submission, you cannot really specify how many resources a specific build requires: you will most likely over-request, or not request enough resources ahead of time, which leads to poor resource utilization in your cluster.

With StatefulSets, we are in a better place for caches, because pod names are consistent, so we can rely on something like consistent hashing to hit local caches reliably. Consistent hashing is a way to make sure the same builds always land on the same pod within the StatefulSet. But it still suffers from the same daemon-lifecycle limitation as Deployments; you still need to manage the StatefulSet separately from the actual builds. And you still cannot declare how many resources each build needs, so it has the same resource utilization limitation as a Deployment.

With Jobs, it's actually better, because we don't need to manage the daemon lifecycle separately: we start the daemon within the same pod, in the same container as the build. Because of this, we can also specify exactly how many resources the build needs. And there are ways to leverage local caches with Jobs too; it requires a bit of BuildKit knowledge, so we'll talk more about it in the next slides. As a quick recap, here is a table listing the supported features per resource type; you can see Jobs are probably the best fit for us.

So let's discuss how to leverage caches with Jobs. The first thing we want to do is find a way to identify similar builds, builds from the same project, because those are the builds that should share the same caches. What we are looking for here is basically our cache key. Usually the cache key is something like a project name or a Git repository, maybe the Git repository plus the context path: basically, any key that makes sense for you.

Once you have the key, you need to know how to mount the corresponding cache into the job. If you use rootless BuildKit, and you should use rootless BuildKit, you have to mount the cache at /home/user/.local/share/buildkit. Also, you will probably notice better performance with hostPath volumes than with distributed or remote volume storage. You might also want to use local volumes, but because they don't support dynamic provisioning, they might not work well for you depending on your setup. In this example, I'm showing how to mount a hostPath volume at the BuildKit rootless path, using the cache key we decided on in the previous slide.

Then we want to make sure the build is scheduled on the node where the cache lives. To do that, we rely on the default Kubernetes scheduler and use labels to identify pods and matching nodes. We use preferred node affinity to always attempt to schedule a build where a cache is available; but we don't want to block a build from being scheduled if the node containing the cache is, for example, full of other jobs and doesn't have enough resources to handle the new build. In that case, the build is simply scheduled to a different node. We also want to prevent cache collisions, and for that we rely on required pod anti-affinity, to make sure two pods relying on the same cache, so with the same cache key, are never scheduled on the same node at the same time. Here is a slide showing how to define the node affinity and pod anti-affinity; you can use it as a reference later if you want to try this out.
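First, the hostPath mount described a couple of paragraphs up could look like the following; the directory layout under /var/lib, keyed by our cache key, is my own convention.

```yaml
spec:
  volumes:
    - name: buildkit-cache
      hostPath:
        # One directory per cache key on the node's disk
        path: /var/lib/buildkit-caches/my-project
        type: DirectoryOrCreate
  containers:
    - name: buildkit
      image: moby/buildkit:rootless
      volumeMounts:
        # Rootless BuildKit keeps its state (including layer caches) here
        - name: buildkit-cache
          mountPath: /home/user/.local/share/buildkit
```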
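And here is a sketch of the two affinity rules, assuming the controller labels nodes holding a cache with something like cache.example.com/&lt;key&gt;=true and labels build pods with a cache-key label; both label schemes are illustrative.

```yaml
affinity:
  # Prefer, but don't require, a node that already holds this cache
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: cache.example.com/my-project
              operator: In
              values: ["true"]
  # Never run two builds with the same cache key on one node at the same time
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
            - key: cache-key
              operator: In
              values: ["my-project"]
```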
Okay, so we just saw how to build our job specification to make sure we can leverage the Kubernetes scheduler to schedule our builds and mount the corresponding caches. But it's pretty manual, and we probably don't want to maintain complex job definitions for all of our builds. So we are looking for a way to automate all this logic for us, so we can focus on the actual job specification. One solution is to rely on the Kubernetes operator pattern.

If you are not familiar with the Kubernetes operator pattern, it is composed of two components. First, a CRD, a custom resource definition, which is a way for you to register your own custom resource type in a Kubernetes cluster. Second, a custom controller, whose role is to watch and update your custom resources, make sure all the Kubernetes resources required by each custom resource are created and kept up to date, and reflect their status changes back into the custom resource's status.

In our case, our custom resource might look like this. Basically, it's very similar to a Kubernetes Job, except it comes with built-in support for caches. In this example, I created a CachedJob resource; its spec is like a Kubernetes Job spec, except it has a caches key. That key holds a list of caches to mount into the job, and each cache is identified by a key and has a mountPath telling the controller where to mount it.
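Based on that description, the resource could look something like this; the API group, kind, and field names are hypothetical reconstructions, not the actual CRD from the talk.

```yaml
apiVersion: builds.example.com/v1alpha1
kind: CachedJob
metadata:
  name: my-project-build
spec:
  # The addition over a plain Job spec: which caches to mount, by key
  caches:
    - key: my-project
      mountPath: /home/user/.local/share/buildkit
  # Everything below is a regular Kubernetes Job pod template
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: moby/buildkit:rootless
          command: ["buildctl-daemonless.sh", "build",
                    "--frontend", "dockerfile.v0",
                    "--local", "context=.",
                    "--local", "dockerfile=."]
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
```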
So let's see how it works in practice. A custom resource enters the cluster. The custom controller creates the corresponding Kubernetes Job and adds the proper label keys. The pod is created, and we delegate scheduling to the Kubernetes scheduler. Because none of the nodes has any cache yet, the scheduler places the pod on one of them; in this case, the first node.

Now a second custom resource enters the cluster, still with the same cache key. The controller creates the corresponding Job and pod and makes sure they have the proper labels. Because a job with the same cache key is running on the first node, and we rely on pod anti-affinity, we know the Kubernetes scheduler will not place this new job on the first node, because that would cause a cache conflict. So it schedules the new job on the second node.

Then a third custom resource enters the cluster. The controller creates the corresponding Job with the required labels. We can also notice that the first job, on the first node, has just finished, and that the controller labeled the first node with the cache it now holds: the cache built by the previous job. Because the new custom resource requires the same cache as that previous job, it carries the same label, and because we use preferred node affinity, we know the Kubernetes scheduler will place the pod on the first node, so the new pod can leverage the cache built by the previous job.

Now yet another custom resource enters the cluster, but this time with a different cache key. The controller creates the corresponding Job with the expected labels and delegates scheduling to the Kubernetes scheduler. Because no node has a cache matching this key, it doesn't really matter where this pod is scheduled. At that point, we have two builds, so two jobs, with the same cache key running on the first and second nodes, and one job with a different cache key running on the second node. When all of them are finished, the controller makes sure the nodes are correctly labeled; you can see the second node has two labels, because it has two caches on its disk.

So you probably wonder how to clean up caches at that point. One solution might be to rely on DaemonSets, so that one pod is scheduled on each node. The role of that pod would be to watch the age of the caches and clean up old ones, or to watch the available disk space to make sure old caches are cleaned up to make room for new ones.

So what did we achieve here? We saw how to run secure, rootless, daemonless, fast, cache-efficient builds by relying on BuildKit's cache mounts and layer caches. We also saw how to run the same BuildKit builds in Kubernetes and locally, using docker build or buildctl. And we saw how to use BuildKit to produce Docker images, but also any other kind of artifact, like a Go binary. So it seems we have a pretty good way to deprecate build systems' bare-metal infrastructure and move them to Kubernetes. That's pretty cool. And Apple is looking for Kubernetes engineers, so please don't hesitate to reach out, and go to jobs.apple.com to see all our open positions. Thanks everyone for your attention, and see you later.