Hey, Mitch Coleman from Dim Sum is going to talk to us about the old and simple way to create Ubuntu and Debian container images, a retro approach perhaps. I don't know about retro, but I do hear a lot of people talking about building their own containers, but they don't talk about how they actually get to the point where they start. So that's what I want to talk to you about, because I want to say it's simple. The old ways do still work, and it should be easy for people to actually use them to get a bit more comfort and a little bit more control over their environment. You might say, okay, what am I talking about when I say a base image? A base image is when somebody says, hey, just fire up a virtual machine, I want to run Ubuntu in it. You'll often have a menu you pick the Ubuntu from. Where does that Ubuntu come from? Whose Ubuntu are you installing? Who compiled it? Who configured it? Who customized it for your own requirements? Or if you had your own custom program that you wanted to include in it, who put that in there? The answer is it can be you, and there are simple tools to allow you to do this. Now, I'm not going to talk about single-binary image systems, which are a different way of building containers. I'm talking about building an Ubuntu template or a Debian template, so you start with a full operating system. And once you know how to build them yourself, you will know where your operating systems come from, because you've built them. Now, you don't have to use them, but if you can replicate what you end up with in the cloud, then you can run tests locally to prove that what you expect to happen in the cloud is what's happening locally. And it gives you another lever for debugging things, especially if what you're doing in the cloud is some kind of fully automated environment, where you don't necessarily know, or have access to run, the tools for debugging in the full production environment.
So you might want to build your own cloud images locally and run tests on them, and then throw them away and use whatever is provided by your cloud provider, because you don't really want to just grab them from somebody around the back of the pub; they might be doing something bad with them. So if you can prove that that's not the case, you end up being able to repeatedly and reliably build your infrastructure. And once you're building your infrastructure yourself and have the capacity to do that, it gives you other options, like the ability to coordinate between all your production environments and your test environments and your random throwaway experiments on some guy's laptop. They can all use the same build process. They can all use something that's verifiably the same thing. So you have fewer moving parts and fewer problems with change in that. If you were building a base image for a small embedded system or a different architecture, sometimes you might be starting with a system that doesn't actually exist yet, so it's not like you can take an existing base image from somewhere else; you have to be able to build it from scratch. So that's another reason why you might want to build them: because they don't exist. Now, Debian and Ubuntu are both based on the same .deb package system, and all of this applies to other distributions that are based on the same core .deb package system. Most of them have the same kind of build process, but I'm only going to focus on Debian and Ubuntu here. There are lots of different tools included in the repositories for doing this, so many that I don't suggest you look at all of them. There is one basic one, though: debootstrap is the tool that's actually used under the hood.
If you look at how the Debian and Ubuntu installers work, if you dig low enough down and pull away enough of the fancy front-end GUI that they've put onto the installer these days, underneath it's still running this debootstrap tool. So you can use the same tool from the command line yourself. The other tool that I'm looking at is multistrap, and I use that myself in preference to debootstrap, because while debootstrap is the basic minimum function that all the installers need to use, it is very basic. Multistrap is slightly more compatible with the way that things work normally in an installed system, whereas debootstrap does a whole lot of hacky things to make it work in any environment, always. Multistrap does an actual install of the packages using the apt tools, so it works in a slightly better way, and it supports a couple more features as a result. And as you can see on the screen there, there are at least a dozen different package installers in the package repositories already, so you can go searching for these and find other ones; here's just a selection of ones that I have tried in the past. But we're going to look at debootstrap. Once you've installed debootstrap, it's really simple to use. It just takes these three command line parameters. You probably want to give it this hard-coded one as well, of what architecture we're going to install. But the rest of it is: what distribution am I installing, the suite? What target directory am I installing it into, so that's my output directory? And what URL am I using to install from? These are all the standard code names that you'd expect from all the normal Ubuntu and Debian releases, and the mirror archive can be something you've got locally if you want to speed things up, or just some archive that you find on the internet that you trust. There is actually a chain of trust in this, so there are ways of proving that you're downloading the right packages.
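The invocation described above might look like this; the suite, target directory, and mirror URL here are example values, not the ones from the speaker's slide:

```shell
# A minimal debootstrap run, sketched with assumed example values.
SUITE=jammy                               # the distribution code name (the "suite")
TARGET=./rootfs                           # the output directory to install into
MIRROR=http://archive.ubuntu.com/ubuntu   # a mirror you trust (or a local one)

# debootstrap needs root privileges and network access, so this sketch
# only runs it if the tool is actually installed:
if command -v debootstrap >/dev/null 2>&1; then
  sudo debootstrap --arch=amd64 "$SUITE" "$TARGET" "$MIRROR"
fi
```

Any of the usual code names (bookworm, jammy, and so on) work as the suite.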
But most of the time, you just want to run it. It is that simple: if I run that, I'll get a container, a starting point for being able to build my template. If I was to use multistrap, what I get with multistrap is the ability to have multiple different package repositories. So if I had my own internal packages that I've built for my application, I could add that to the list of repositories to install from, so I'd have both the Debian repository and my internal package repository, or a couple of other repositories not built into Debian, and I could add together all the packages I might need for my application. That's the main advantage I get with multistrap. But the problem is it does need a config file, because it's a little bit more complicated. It doesn't have a huge config file; the config file just has a section for each repository that you're trying to install from, and extra sections for global things, like what packages I want to install. To prove my chain of trust, I can install various security packages beforehand, and key files: I define a key file for each repository, so I can actually have signed repositories and then prove where my packages come from. So I know that I'm not accidentally downloading something crazy from the internet. Given that that's my point here, that I want to have faith that I am building something that's reproducible and I know where it comes from, it is important to realize that you have got specially named packages in both Debian and Ubuntu that contain the archive keyring public keys. You can see in my example file up there I've got the two package names that you install to get these keys, which means that I'll have all the chain of trust required. Once I've actually got my config file for multistrap, it is again a simple process of running two commands. Unfortunately, it's two commands now, and there are technical reasons for that that don't actually matter most of the time.
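A multistrap config along the lines described might look like this; the section names, URLs, and package list are assumptions for illustration, while debian-archive-keyring is the real Debian archive-key package (Ubuntu's equivalent is ubuntu-keyring):

```ini
# Hypothetical multistrap.conf sketch; the Internal repository and its
# package are made-up examples.
[General]
arch=amd64
directory=./rootfs
cleanup=true
bootstrap=Debian Internal
aptsources=Debian

[Debian]
source=http://deb.debian.org/debian
suite=bookworm
keyring=debian-archive-keyring
packages=debian-archive-keyring openssh-server

[Internal]
source=http://packages.example.com/internal
suite=stable
packages=my-application
```

Each repository gets its own section, and the keyring line is what ties the signed repository into the chain of trust.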
But it is a flexible advantage. If you're running a different architecture, or if you want to have partially installed systems so that you can reconfigure them, there are two different phases: the install phase and the configure phase. Most of the time, you just run one after the other. Sometimes you don't need to run the configure phase, but it doesn't hurt to run it, so you just run one after the other unless you know that you don't need to. Given that I've looked at the two tools, I want to say why I would choose one over the other. You've heard some of the reasons already. One of them is the default install: you know that debootstrap will work, because it's how a lot of people install their distribution. And multistrap's config file means it's harder to run from a command line, so debootstrap's good in that way. But sometimes you want to have the multiple repositories, as I've talked about, so debootstrap's not so good there. And then the other feature here is that because multistrap is installing its packages the normal way, or closer to the normal way, there are certain scripts that it automatically runs as part of the post-install of your packages that may be important for your environment, so it gives a cleaner setup. That's often more important with the multiple-architecture things: if you're building a base image for a different machine, if you're cross-compiling, or cross-installing I suppose, then you want to use multistrap. Now the images, the directories that you create with those two tools, are immediately useful. They create a subdirectory that looks like the root of a file system, the root of a newly installed computer. So if you start with that, you can use it straight away. You can actually jump into it and run commands as if it's a full install.
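The two phases mentioned above might run like this; the config file name and target directory are example values, and the exact second command is an assumption on my part (for foreign architectures the configure step is usually deferred):

```shell
# Hedged sketch of a two-phase multistrap build.
CONF=multistrap.conf   # example config file name
ROOTFS=./rootfs        # must match the directory named in the config

# Both steps need root, so only attempt them if multistrap is installed:
if command -v multistrap >/dev/null 2>&1; then
  sudo multistrap -f "$CONF"                  # install phase: fetch and unpack packages
  sudo chroot "$ROOTFS" dpkg --configure -a   # configure phase: run the package setup scripts
fi
```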
And this is useful if you want an isolated sandbox for compiling something, or if you just want to do a test to find out what happens when you install a package: what files are installed, how does it configure things? You've got a template; you can just jump straight into it and use it. This is not a container. The chroot that I'm running in the command line example up there does not containerize things in a very secure way these days, but it is enough of an isolated environment that you can run your own tests, if you want to do experiments of your own. So it's useful for some things, but it's not a complete template. If you do run some commands that you might expect to work normally, they'll often give you crazy error messages, because the environment's not completely set up. It doesn't look like a booted system; it doesn't have all of the parts it needs. And that works differently in multistrap and debootstrap: they've both got some little bits of things that aren't quite working. They will do some things, and other things will just completely fail, because it's lacking an install step at some point. So the improvements that I tend to make, to turn the outputs of these tools into something that I do want to use, I separate into a couple of different phases. To build my template, I'll have a fix-up step to fix the obvious problems that stop things from working. I'll have a customize step where I install the config that I think I want for my template. And then, generally, because I'm building a template to deploy lots of times or to give to lots of people, I'll have a minimize step where I say, well, what's the extra space being used for in this system, and delete as much extra space as I can to make the template as small as possible before deploying it. And then when I actually go to deploy it, to instantiate a real virtual machine from the template, there's another step as well. Oh, and I'll mention here, there are some links in the slides.
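Jumping into the tree for a quick experiment, as described, might look like this; ./rootfs is an example path, and remember this is filesystem isolation only, not a secure container:

```shell
# Hedged sketch: run a command inside the freshly built root tree.
ROOTFS=./rootfs
if [ -x "$ROOTFS/bin/sh" ]; then                      # only if a built tree is actually there
  sudo chroot "$ROOTFS" /bin/sh -c 'dpkg -l | head'   # e.g. list some of the installed packages
fi
```

Some commands will fail inside the chroot because it isn't a booted system; that's exactly the gap the fix-up phase addresses.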
I'll put the URL for the slides up at the end. There are some links in the slides; I've actually got example scripts for all of the examples I've put on the screen. So if I wanted to fix up the image, there are three basic things I need to do, well, four I guess, from that list. Two of the things are removing identifiable information. There are a whole bunch of random numbers that are used to generate the SSH keys and the machine ID, and they're not going to be useful in a template, because each instance of your template is going to be a whole new virtual machine. So we delete the private key material, and we delete the random number that is the machine ID. And then there's the host name itself: when you build with the debootstrap tool, it'll often give it a host name that is a copy of the host that's running the installer, which is not very useful either, so we remove that. In a modern system, if you're using a system that actually boots with a systemd environment, it'll automatically generate a host name if it doesn't have one, so we can remove it, or we can put one in during the instantiation phase later. And then the final thing that I fix, just for all default builds: the reason why in the earlier slide I couldn't do an apt install was down to one missing file, the resolver config, so I fix that. These are just three basic things. As you install more and more packages, as you build your environment, you might find you need to do more fix-ups. But with these three basic things fixed up, what you end up with is an environment that doesn't immediately break when it's booted. To customize it, obviously you want your automation environment to be able to log into your built image once it's actually instantiated, so you'll need to set some passwords. Normally I wouldn't set a specific hard-coded password; I'd configure a list of accounts that have the ability to do sudo on it, which means a couple of other bits of automation and a couple more files in there to configure sudo.
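The fix-ups described above might be scripted like this; ./rootfs is an example path and the nameserver address is an arbitrary placeholder:

```shell
# Hedged sketch of the fix-up phase: strip identifying material and
# restore DNS in the new root tree.
ROOTFS=./rootfs
mkdir -p "$ROOTFS/etc"                        # only so the sketch runs standalone

rm -f "$ROOTFS"/etc/ssh/ssh_host_*            # delete the private SSH host key material
rm -f "$ROOTFS/etc/machine-id"                # delete the random machine ID
rm -f "$ROOTFS/etc/hostname"                  # drop the copied build-host name
echo "nameserver 192.0.2.1" > "$ROOTFS/etc/resolv.conf"   # placeholder resolver so apt works
```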
But if you just want to do tests, you give it a password so you can log into it afterwards. And you configure the networking so that it'll work once it's actually booted into a container, because the default image won't have DHCP clients. Again, I've used a modern system there with the systemd install. And I'm overwriting the resolver config again, because this is the customize step: I now know that I have a specific resolver that I can use. So sometimes I'll fix things multiple times. When I go on to minimize, there are a whole lot of things in your basic image that aren't going to be used, things like the documentation files or even the man pages. If you're not logging in to your infrastructure, if you're using a fully automated environment, if you're dynamically scaling the number of virtual machines you've got running in some kind of cloud system, you're not logging into each of your virtual machines, so you don't need to have the documentation installed there. That's eight to 20 meg worth of space that I can just delete. Same with the localization files. If your application environment has a user-facing translation layer, you may require the locales information there, but most of the time your application will have its own translation layer. So this is the operating system's set of translated files; there's another 20-odd meg worth of space there that you can just delete, and this is in a minimal system. And then there are a couple of other files that you can find. If you go searching, you'll find that there's a bunch of things that are large, and you can ask for each one: can I get rid of it? But the basics are to realize that you can make it smaller, and there are ways of saying, it's a 200 meg image now, but I can make it a 100 meg image. And if I'm deploying a thousand of them, that adds up pretty quickly.
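A minimize step along those lines might look like this; the paths are the standard Debian/Ubuntu locations, and ./rootfs is an example target:

```shell
# Hedged sketch of the minimize phase: delete data an unattended image
# never reads.
ROOTFS=./rootfs
mkdir -p "$ROOTFS/usr/share/doc" "$ROOTFS/usr/share/man" \
         "$ROOTFS/usr/share/locale"   # only so the sketch runs standalone

rm -rf "$ROOTFS/usr/share/doc"        # package documentation (roughly 8-20 MB)
rm -rf "$ROOTFS/usr/share/man"        # man pages: nobody logs in to read them
rm -rf "$ROOTFS/usr/share/locale"     # OS-level translations (20-odd MB more)

du -sh "$ROOTFS"                      # see what the template costs on disk now
```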
Then if I'm going to instantiate a copy of this template, normally, if I've got a container orchestration system or some kind of cloud automation, there will be a process for doing that that's done for me. If I'm building it all myself, or more to the point, if I'm trying to understand how the system works, which is what I'm trying to get you all to do, to realize that it is actually simple under the hood, then you might want to know how to do it manually. So there are a couple of things. You're instantiating this host as a specific thing, so we give it a host name, we give it a couple of entries in the hosts file to make that host name work, and we do a reconfiguration of the SSH keys. We deleted them earlier when we did the fix-ups, to remove the private key material, but now we need to instantiate them, so we need to create them again: we just jump in and tell OpenSSH to rebuild its SSH keys. And once I've done those four steps, I now have a machine that I can really easily boot on my local system. Again, I'm going to use a modern system; I've installed the systemd-nspawn tool. It's the simplest answer. It's not always the right answer, and I'm not trying to say you should use systemd, but this is the simplest answer to get started with. If you don't want to install an entire infrastructure or an entire cloud automation system, systemd-nspawn can be told: here, boot this subdirectory up as if it is a containerized machine, and it will. And in this example that I've got on the screen there, I've actually even said: boot up this subdirectory and throw away all the changes I made. When I shut down, when I exit from the virtual machine, which you can do by telling it to power off inside the machine, or from outside the machine using the machinectl command, when I power it off, it'll actually throw away any changes I made during that boot of the machine. So it's a good way of, again, testing things locally.
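The four instantiation steps, plus the throwaway boot, might be sketched like this; ./rootfs and the host name web01 are example values:

```shell
# Hedged sketch: instantiate the template as a specific host, then boot
# it ephemerally with systemd-nspawn.
ROOTFS=./rootfs
NAME=web01
mkdir -p "$ROOTFS/etc"                                   # only so the sketch runs standalone

echo "$NAME" > "$ROOTFS/etc/hostname"                    # 1. give the instance its name
printf '127.0.0.1 localhost\n127.0.1.1 %s\n' "$NAME" \
  >> "$ROOTFS/etc/hosts"                                 # 2. make the name resolve locally

# 3. regenerate the SSH host keys deleted during fix-up, and
# 4. boot it, throwing away all changes on shutdown (-x / --ephemeral).
# Both need root and a real root tree, so they're guarded:
if command -v systemd-nspawn >/dev/null 2>&1 && [ -x "$ROOTFS/bin/sh" ]; then
  sudo chroot "$ROOTFS" ssh-keygen -A
  sudo systemd-nspawn -D "$ROOTFS" -b -x
fi
```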
You fire up something that looks like it's a full virtual machine, you log into it, and then you throw away the changes after you've done your experiment. Now, to move on from there, if I was going to build my own system, I wouldn't be using systemd-nspawn; it's a little bit new, I guess. But what I could use is something like libvirt. You may have encountered libvirt before. It is a system for managing virtual machines. It has network transparency, so you can have people on their desktops with management tools log into a cluster of virtual machine hosts. And one of the features that isn't talked about a lot is that it does actually have LXC compatibility. So you can take the thin virtual machine disk image that we've got here and boot it up as an LXC image in libvirt. Again, it requires a config file. I've abbreviated the config file here because it's an XML file, so it's quite long. And with two simple commands, you can define it from the XML and boot it up, and it will boot up that subdirectory as a full system image. And you can give that to other people to manage and control using the normal libvirt and virt-manager tools. As for other things I could do with this subdirectory that I've built: I've built a template now in my subdirectory, and I could take that subdirectory and use tools like guestfish to package it up. It's just a normal Unix subdirectory at this point. Guestfish is a tool that allows you to create file systems and partitions in a disk image for a full, thick virtualization environment like VMware or QEMU, or for uploading to Amazon. So guestfish is a framework that allows you to do that. It's got bindings for Python and other languages, and a shell scripting environment, so you can actually automate that as well. And the end result of that would be a qcow file or a VMDK or some other output virtualization disk image. And then you have to install a bootloader, make it a UEFI-bootable system; there are steps that I'm not going to go into for that.
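An abbreviated libvirt LXC domain definition like the one on the slide might look something like this; the domain name, memory size, and path are example values:

```xml
<!-- Hedged sketch of a libvirt LXC domain; values are illustrative. -->
<domain type='lxc'>
  <name>template-test</name>
  <memory unit='MiB'>512</memory>
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <devices>
    <filesystem type='mount'>
      <source dir='/path/to/rootfs'/>
      <target dir='/'/>
    </filesystem>
    <console type='pty'/>
  </devices>
</domain>
```

The two commands would then be along the lines of `virsh -c lxc:/// define domain.xml` followed by `virsh -c lxc:/// start template-test`, again with example file and domain names.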
And there are various things you can do to make it a little bit better for QEMU. And then the end result of that is you've got a full workflow. At one end you've got a config file that gives you all the packages that I might want to install in my environment, including all my custom tools and some default config for my custom tools. And I can build a library of my templates that have my configuration in them, using an automated pipeline that uploads the final end result to a system image repository and boots it up for me and does tests. This is just the framework to allow you to understand how to do the core of the building. So, any questions? Let's try once again with the throwable mic. Any questions on... has anyone done this before? Actually built with debootstrap and those sorts of things? Oh dear, the wrong crowd. Questions anyway? Okay, thank you very much. Thank you.