So hey everyone, thank you so much for coming. I'm David, and I'm here to talk about containers. I'm not alone in that; we're at ContainerCon. But on the screen in front of you is a view you may not have seen: a map from around a month ago of container ships crossing the world. Containers literally are the backbone of our physical economy, so building containers, as part of that, is a good place to be. How big do you think that industry is? $8.7 billion, according to a report from, I think, two years ago. So does anybody want to guess how many physical containers are used in a year? Just throw out a number. 177 million. Not quite what was guessed, but close. They carry around 1.6 billion tons of cargo across quite a lot of ocean. As you all know, we're here to talk about a different type of container, but we'll come back to the actual physical containers a bit later in the talk. So I'm going to talk about OCI, about Docker containers, and specifically how we at AppsFlyer used buildpacks to completely overhaul how we build our container images. First, let me introduce myself and the company I work for. As I mentioned, I'm David. I'm a software engineer in the platform group at AppsFlyer, where I work on our build and deployment systems. I'm also a maintainer of the Buildpacks project, and I have previous experience at VMware, Google, and Goldman Sachs. AppsFlyer, which not everybody may have heard of, is a ten-year-old Israeli startup. It's raised around $300 million in funding, and it currently serves roughly 3,000 customers, with a whopping 65% of all mobile apps using it. So what do we do? We provide tools for marketers, helping them track their installs and their events, and understand how effective their ads are.
What that means, practically, is that if you click on most ads on your phone, the click is somehow going to get routed through our system so that a specific ad campaign gets credited with a specific result. One example you may have seen: in the most recent Super Bowl, there was a Coinbase ad, the one with the floating QR code, that used AppsFlyer under the hood. Just to give you a sense of our R&D department: we're at this point around 400 engineers. We have 850 microservices, with thousands of instances running at any particular time. We handle 2.5 million events per second, and we have literally petabytes of data flowing through our pipeline. So we have some stuff happening. In this presentation, we're going to cover five main areas. Number one, I want to describe some of the pain points we were experiencing in our pipelines and our build processes. Number two, I want to talk about why we thought buildpacks could really help us with that. Number three, once we had picked buildpacks, how we practically incorporated them into our system. Number four, I want to talk a bit about our migration process, specifically something we call test-driven migration. And number five, I want to share some learnings from the process with you. So let's get started. Number one: can I get a raise of hands from people who attended Mattias' talk yesterday on building container images? Thank you, all right, we got some. Beautiful. That's very helpful, because I think it was a good introduction, and you'll at least know the answer to the first slide: if I'm a developer and I want to create an image, what is the simplest way to do it? A Dockerfile, great. That's really great when you have one service. But what happens when things start to multiply? You adopt microservices, and all of a sudden you're faced with four. Then something happens, and all of a sudden you have 16.
And then they just keep on multiplying. So what do you do then? If you're an enterprise, things get a lot more complicated, right? When you're trying to organize a fleet of 30, of 300, of 3,000 services, it becomes much harder to manage how you build your containers at scale. There are a bunch of basic options. You can ignore the problem and just have a basic Dockerfile in each project, which may not be super optimal. You can generate Dockerfiles. If you're really cool, maybe you use things like Nix to create container images. You can use Jib or ko, which build Java and Go applications respectively. And then you can also use buildpacks. As I'm sure people will tell me later, there are many other options. I'm going to discuss why we went all in on buildpacks, but to understand why, and to walk through our thought process, I want to describe a bit more about our build process. Our basic developer workflow was like this: if you were an engineer, you would code locally and push your change to GitLab. That would trigger a Jenkins job that first built the compiled artifact and then ran a really long Ruby script that generated a Dockerfile for that project, ran a build with that Dockerfile, and pushed the resulting images to our container registry. Once an image is in the registry, you can use it in production, you can run standalone instances, and you can run it in tests. So what were the pain points? There were really four. Number one, the workflow was very fragile to maintain, because a core part of it was that really long script, written years ago by someone who had since left the company, that created the Dockerfile, built it, and pushed it. That script was fragile, it was long, and it was something we didn't want to touch, because we didn't want to bring down production.
If there's an important update to get out, we don't want to cause any issues. Number two, the system as it stood required user input: engineers needed to tell it how to build their project, and only once they had created their compiled artifact could we create the final resulting image. That was yet another thing for engineers to do, and at the enterprise level it also presents a compliance issue, because you want to make sure that all of your images and all of your code are ultimately being compiled in the exact same, agreed-upon way. Number three, the script itself was flaky and would fail with inconsistent errors. That was annoying for the engineers, and it was annoying for us: we didn't want to keep getting the help requests saying "this failed, what do I do now?", where the answer was always "run it again". Number four, the system, including how users configured their build steps and how they built the image, was very Jenkins-centric, and that made it very hard for us to even think about moving to a different system. That mattered because we're now on GitLab, and people really wanted to use GitLab CI to unlock a lot more potential in their CI processes, but we didn't have a good, designated process to show them for building their image there. It was clear to us that we needed a change: a system that would solve those four pain points and set us up for success in the long term. So we decided our solution should directly address those points with three main principles. Number one, it needed to be maintainable: stable, tested, monitored. We didn't want any issues.
Number two, it needed to be platform agnostic, meaning we didn't want to shoehorn ourselves into Jenkins again. Number three, it needed to be extensible. People always want to tinker with things, but it should have really sane defaults: ideally, developers should just run one command, without having to think about anything, and it takes care of everything for them. The question is, how can we actually achieve that? For us, it was clear that buildpacks could help, so we went all in on buildpacks. Now, why? Before we really get to why, you need to understand what buildpacks are. They were touched on yesterday, and people may be familiar with them, but broadly, buildpacks are a technology for moving your code from source to an OCI image without a Dockerfile. Great, but how does that work? It builds on the OCI specification, which defines an image as basically a set of layer directories together with a manifest file that links those directories together and tells the runtime where to look. What that means is that different tools can each create different directories, and as long as they're stitched together, you end up with a valid OCI image. So the Cloud Native Buildpacks project defines a specification for a buildpack, which is basically just two executables. One binary, detect, checks whether the buildpack is needed; if it is, build does its job. So, say you have a Node buildpack whose job is to detect that it's needed by the presence of a package.json, and if so, download and install Node onto a layer. Then you can have an npm buildpack that runs after it, or a Yarn buildpack that looks for a yarn.lock file, and if it sees one, uses those tools to download the dependencies into a different layer.
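To make that concrete, here's a toy version of those two executables. This is a sketch only: the file layout, exit code, and layer metadata are simplified relative to the real Cloud Native Buildpacks spec, and all the paths are invented.

```shell
# A toy buildpack: detect succeeds only if the app has a package.json,
# and build "installs" into a layer directory. Everything is stubbed so
# the sketch can run anywhere; real buildpacks follow the CNB spec.
mkdir -p toy-buildpack/bin app layers

# bin/detect: exit zero if this buildpack applies to the app
cat > toy-buildpack/bin/detect <<'EOF'
#!/usr/bin/env bash
[ -f package.json ] || exit 100   # non-zero means "not needed"
exit 0
EOF

# bin/build: do the work; here it just receives the layers directory
# (simplified: the real lifecycle passes more arguments than this)
cat > toy-buildpack/bin/build <<'EOF'
#!/usr/bin/env bash
layers_dir="$1"
mkdir -p "$layers_dir/node"
echo "launch = true" > "$layers_dir/node.toml"   # include layer at runtime
echo "node would be downloaded into $layers_dir/node"
EOF
chmod +x toy-buildpack/bin/detect toy-buildpack/bin/build

# Simulate the lifecycle from inside the app directory
touch app/package.json
(cd app && ../toy-buildpack/bin/detect) && echo "detect passed"
(cd app && ../toy-buildpack/bin/build ../layers)
```

Stacked after this, an npm or Yarn buildpack would write dependencies into its own layer, and the lifecycle stitches all the layers into the final image.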
And as long as it's all stitched together, everything is great. The way you run buildpacks is with what's called a platform. In our case, we used the pack CLI, which is the most common platform provided by the project, and which is also available as a Go library, as I'll get to later. Our time here is limited and this isn't really the focus of the talk, but there are plenty of other presentations on buildpacks, and you can also feel free to talk to me later about more of the architecture. Given all of that, how can buildpacks actually help? Out of the box, they really help us with two things. Number one, they're much more maintainable than our previous system, because we can outsource a key part of it to something that's not ours: something well tested and maintained by a vibrant open source community. Number two, they're also platform agnostic, because we're not tying ourselves to any specific CI platform. As long as you have some Docker daemon or container runtime present, you're able to make things happen. Great, so we had at least two of the main principles settled. But how could we actually incorporate that entire system? The way we did it was to create a CLI for our users that wrapped the entire build process, including executing the pack CLI, and that would function wherever they needed it: locally, in GitLab CI, and on Jenkins. And the resulting image would be valid to push to any of our downstream processes. This was basically the command a user had to run: images build, together with the name of a service.
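Conceptually, the wrapper turns that one command into a fully configured pack invocation. Here's a hypothetical sketch: the real CLI is a Go program, and the registry, builder name, and flags below are invented; we only echo the command rather than run it, since executing pack needs a Docker daemon.

```shell
# Hypothetical: what `images build <service>` might expand to under the hood.
images_build() {
  local service="$1"
  local builder="registry.example.com/builders/base"   # assumed internal builder
  # Echo rather than execute, since running pack requires a Docker daemon.
  echo "pack build registry.example.com/${service}:latest" \
       "--builder ${builder} --path ."
}

images_build checkout-service
```

The point is the UX: the engineer supplies only the service name, and everything else is resolved for them.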
With that simple UX, we hid the complexity from the users, because we did the hard work of configuring the inputs for buildpacks, adding monitoring and tracing, and adding a lot more visibility into the process, so that we could see how everything was working. The CLI itself, the images CLI, was distributed using an internal CLI toolchain we've developed; you can find more about it on our blog. To see it in action, on the next slide you're going to see a demo. It's very short: you'll just see me run the build command and then the resulting action. So here I call the command, and you can see it's pulling some images, it's initializing tracing, the buildpacks start running, and now it's running a Clojure compile. It created the jar, now it's exporting and finishing up the image, and, which is interesting, at the very end, you may have missed it, there was also a Docker build. So there are three main parts of the build process. Number one, we collect the requirements. Number two, we run a pack build. And number three, the interesting one, we run a final Docker build to complete the image. Let's dive into each of these parts a bit more. The first step is collecting build information. We collect different pieces of information, like what language the service uses, because that affects what options we pass to the buildpacks, what run image to use, and what start command to inject. These requirements are collected from flags, from local files present in the repository, and from our KV store, Consul in this case. Step number two is to run a pack build.
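Going back to step one for a second, the precedence between those sources can be sketched like this. Purely illustrative: the file name, key, and default below are invented, and a plain echo stands in for the Consul lookup.

```shell
# Resolve one build input (the service's language) from three sources:
# an explicit flag wins, then a file in the repository, then a default
# standing in for the KV-store lookup.
resolve_language() {
  local flag="$1"
  if [ -n "$flag" ]; then echo "$flag"; return; fi
  if [ -f .build-config ]; then
    local from_file
    from_file=$(grep '^language=' .build-config | cut -d= -f2)
    if [ -n "$from_file" ]; then echo "$from_file"; return; fi
  fi
  echo "jvm"   # stand-in for the Consul default
}

echo "language=clojure" > .build-config
resolve_language ""     # repo file wins when no flag is given
resolve_language "go"   # an explicit flag overrides everything
```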
On the screen in front of you is pretty much what we call: we initialize the pack client, collect whatever options we need, run it, and if there are any issues, we can diagnose them separately for our users. The final step is to run a final Docker build. Here we run commands that we either can't run natively within buildpacks, because of, say, sandboxing in the process, or that we just haven't migrated yet. For instance, Clojure services want to adjust their JVM security settings, which live somewhere the buildpacks can't currently adjust. Some services want to download some jar files, or initialize databases. And to be clear, most if not all of these could be done via buildpacks, but that depends on what priorities your teams can give it, and at this point we wanted to focus on first gaining confidence in the buildpacks process before we moved everything onto that workflow. Adding that final Docker build allowed us to think of our migration as a scale: moving from being totally not cloud native, generating a Dockerfile and doing everything as separate processes, to gradually moving further and further up the scale. Obviously, we hope eventually to do everything within the pack build, within buildpacks, and move all the way up, but even halfway up the scale, we're still pretty happy. What we're left with is a system that fulfills a lot more of our requirements, because it's much more maintainable, as I mentioned; it's platform agnostic, as we already talked about; and the nice thing is, it's also extensible, but with really sane defaults. It allows our engineers to do a really non-trivial thing, building their container in the most optimal way, in a very trivial manner.
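To make that final Docker build step concrete, here's a sketch of how such a step might look. The image names and the copied file are invented; the point is that it's a thin Dockerfile layered on top of whatever pack produced.

```shell
# Generate a small follow-up Dockerfile on top of the pack-built image.
pack_image="registry.example.com/checkout-service:pack"

cat > Dockerfile.final <<EOF
FROM ${pack_image}
# Steps not (yet) migrated into buildpacks, e.g. JVM security settings:
COPY java.security /opt/java/conf/security/java.security
EOF

cat Dockerfile.final
# The real flow would then run something like:
#   docker build -f Dockerfile.final -t registry.example.com/checkout-service:latest .
```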
To this end, users either just use the CLI locally, or they use GitLab CI templates we provide that do it all; all they really need to do is import the CI file. So overall it ends up being a very simple UX for our users. Now I want to talk for a few minutes about our migration process. No one wants to break production; that's everybody's horror story. And while we wanted to make sure we had minimal outages, we also wanted to maximize our developers' confidence in the system. How can we do that? As engineers, we know there's really only one way to do that effectively, and that is with tests. Ultimately, you want to test your system in order to know that everything is working appropriately. In our case, we needed to test that the images we created would match, with a really tiny degree of difference, the images running in production. How can we do that? We used a tool called container-diff, released by Google, which is also available as a Go library. We used it within another tool to verify that the files contained within the two images were equal and that the images were very similar in size. This ended up being really helpful, because we could see when specific files weren't moved to the correct location, and in some of our initial testing we could see that the jar file we created was very different; looking at that difference showed us changes we needed to make in our workflow. Keep in mind that one of the key promises of buildpacks is that they help you reduce image sizes, and that is a long-term goal for our system.
But at that point, that wasn't our focus, because we just wanted to recreate essentially the same image, with all of the bloat included, using our new build system, in order to gain R&D's confidence. Everybody wants to know that their container is working; once we have that, we can worry about optimizing things later down the line. So to test this at scale, we created a tool, which we ran using GitLab CI, that iterated in parallel through all of the services of a language, built each one using the new build system, ran container-diff against the image running in production to make sure they matched, and output the results to Google Sheets so we could easily analyze them. We could drill down by teams and by services and see how our migration was going. That was really helpful, because it allowed us to approach different team leaders and to group failures by the errors that kept happening again and again, so we could prioritize and knock out the most common errors first. After iterating on that again and again before rolling it out, we were ready. And all of that led to the number of production failures we had after we rolled out: zero. It was totally seamless. We had no outages, we had no complaints from R&D; no one even really knew that it happened. Since then we've had over 25,000 builds, with metrics, traces, and logs sent to our observability providers, so we have very effective monitors and we know that our system is production worthy. And with that, we're up to learnings. As you can imagine, there were bumps along the way, and that's partly why we're giving this talk: I think there isn't a lot of material available right now on how to actually do a large-scale migration to building with buildpacks.
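Before the learnings, one aside to make the verification step concrete. The real check ran container-diff between the candidate image and the one in production; since that needs the actual images, the JSON below is a trimmed, hand-written stand-in for its output, and the image names are placeholders.

```shell
new_image="registry.example.com/checkout-service:candidate"
old_image="registry.example.com/checkout-service:production"

# The real invocation (not run here; it needs the images to exist):
#   container-diff diff "daemon://$new_image" "remote://$old_image" \
#     --type=file --type=size --json > diff.json

# Hand-written stand-in for a "no differences" result:
cat > diff.json <<'EOF'
[{"DiffType":"File","Diff":{"Adds":[],"Dels":[],"Mods":[]}}]
EOF

# Pass only when no files were added, deleted, or modified
if grep -q '"Adds":\[\],"Dels":\[\],"Mods":\[\]' diff.json; then
  echo "images match"
else
  echo "images differ"
fi
```

In the real tool, a result like this per service is what got written out to the Google Sheet.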
There are some things we think could be useful for you to know. Number one: democratize information. This is pretty obvious in retrospect, but make sure that everybody understands and is involved in the process. I came onto this project already being a maintainer of the Buildpacks project, so the team was very happy to just hand all of it to me. At one level that's nice, but ultimately, when we held a team workshop and shared the knowledge about buildpacks, what they meant, and what we could do with them, we sped up a lot, because then the whole team was working together. When it's just one or two people working by themselves, that's a lot harder than when you can share all that information across the team. Number two: build your own. People very often use the publicly available buildpacks, and we initially defaulted to that as well. But then we were forced to make workarounds for assumptions those buildpacks made that didn't necessarily match our setup. For example, one of the buildpack families discussed yesterday is the Paketo buildpacks. All of their Java-family buildpacks compile the jar but then unzip it, for whatever reasons they have. That ended up being really hard for us to deal with, because all of our run commands, which we were injecting and didn't want to change, were calling a jar file. So we initially had to jump through a whole bunch of hoops to try to recreate that jar file. Everything simplified a lot when we asked: why don't we just write our own? What we ended up doing is using one buildpack to install the appropriate JVM, and then we just run a lein build, since in our case we use Leiningen to compile Clojure apps. Once we adopted that approach, we were able to move a lot faster, because we realized that you can plug and play a lot more with buildpacks.
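As a sketch of how small "build your own" can be: once a JVM buildpack has run earlier in the order, the Clojure step is essentially a shell script around Leiningen. We stub the lein binary here so the sketch runs anywhere; in reality it would be the real tool, and the jar name is invented.

```shell
# Stub lein so this is self-contained (stand-in for real Leiningen).
mkdir -p stub-bin target
cat > stub-bin/lein <<'EOF'
#!/usr/bin/env bash
touch target/app-standalone.jar   # pretend to build an uberjar
EOF
chmod +x stub-bin/lein
export PATH="$PWD/stub-bin:$PATH"

# The heart of our hypothetical Clojure buildpack's build step:
lein uberjar
ls target/
```

Crucially, this one keeps the jar as a jar, so the existing run commands keep working unchanged.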
Number three: focus on one buildpack, or one language, at a time. Initially we tried to solve all issues holistically, and no one really wants to deal with that, because teams want to know, before you migrate them, that every issue is solved, and that's a lot harder when you're trying to fix everything at the same time. Once we focused on dealing with Java, then Go, then Node, we were able to solve all of the issues with one specific language, migrate those images, and then move on to the next family of languages. That focus really helped us give our engineers the confidence they needed, and made sure they wouldn't have complaints. And finally, make sure you're designing a system that works for you. We used what we call the scale methodology, with that Docker build at the end, together with the tests, and that was really helpful for us. Initially we thought we'd just have all of our engineers verify that the images being created fit their assumptions, but no one wanted to do that; we got really hard pushback from our engineers. Once we realized that, we understood we needed to make sure everything was being tested and that we were giving ourselves a full 100% assurance. And once we understood what our engineers' requirements were, we were able to design a system that gave them that assurance and made sure nothing would break. So, in conclusion, today I've tried to show you how we overhauled our process for creating new containers, using buildpacks together with some other open source tooling to ensure a totally seamless transition. As I mentioned, we found tremendous success in wrapping all the components together within one tool our engineers could use, and since then we've had thousands of builds with buildpacks, with our internal customers really happy about it.
And as I went through, we took away the following lessons. Number one, try to share the burden among everybody in R&D. Number two, build your own buildpacks when you face issues; it's good experience, and all they need to be is very simple shell scripts. Number three, focus on one language at a time instead of trying to solve everything holistically. And number four, define a system, a migration process, that works for you; by using a scheme like ours, with the tests and the timeline, you can move more and more of your workloads onto buildpacks, and then, by minimizing that final Docker build, you can make it all the way to the container heaven you're hoping for. To tie the whole talk together: we focus a lot on what we transport in containers, and sometimes less on how we actually construct those containers. When you put real engineering focus on that process, you can succeed a lot more in simplifying and streamlining it and making it as smooth for users as possible. So with that, that's all I have for today. It's been a pleasure talking to you. I'm happy to address any questions you may have on this, on AppsFlyer, or on buildpacks. If anybody has questions, I think they want you to come up to the mic, or you can maybe just shout. [Question about open-sourcing the CLI] So, because a lot of it uses internal visibility and logging libraries, we weren't able to open source it, unfortunately. But it's really not that complicated: it's basically a Go program that uses the Cobra library to make a simple CLI, and it does some things there to look at local files. Would you mind repeating the question? So, to repeat the question: he wanted to know whether, when we create a new service, we need to make any changes for that.
So no, we actually made it part of the process by which we create new services internally, which now defaults to using this workflow, so there are no additional steps to be done as long as they're within Jenkins. Within GitLab CI, they would need to include a template that we provide, but that's two lines in your GitLab CI YAML file and a few variables to assign, nothing more. [Question about security scanning] We do that afterwards; I think that's orchestrated more by security, as part of the pipelines they add on at the last step. [Question about dependencies] Not so much; in general, pretty much all of our internal dependencies are cached, and we have an Artifactory instance, so people anyway have to think consciously before they upload anything. Because of that there's somewhat less risk; obviously code scanning is still required, but that's why it doesn't need to be earlier in the process. Ideally, I'm sure we're going to shift it left and make it happen earlier at some point. Also, if people could use the mic standing up here, that would be best for people online. [Question] With Docker, I can write a specific Dockerfile for each app and add things. With buildpacks, can I create a file, like a Dockerfile, to run different operations per app? Yes, there's a concept called inline buildpacks that currently exists in the open source project, where you can define a one-off buildpack that lives in a specific file called the project.toml file. In it, you write what is essentially a simple shell script, and we're exposing that to our users as well, so that if they want to extend the build process, they can do it by adjusting this file. If there's nothing else, then thank you so much for coming, and I'm happy to talk to anybody privately if you need.