All right, good morning, everyone. Can everyone hear me? Yeah, sounds like the mic's working. I'll try to talk loud, and hopefully everyone in the back can hear me. So the presentation today is a brief look at different strategies and ways to optimize VM images for use with KVM and QEMU as your hypervisor. I think you all know who I am; you probably read it in the info about this talk. Basically, I work for a company called MetaCloud, and we manage clouds for large companies. A few quick notes about this presentation. From my perspective, there are literally hundreds of different combinations of tools, disk formats, container formats, and software versions you can use, and a ton of different ways to run images on a ton of different hypervisors. We're only talking about KVM and QEMU here today, because we've got a little less than 40 minutes to get it all in, which is really not a lot of time. So we make a couple of assumptions in this talk, and they're up there: we use particular versions, OpenStack Grizzly, particular versions of QEMU and libvirt, and in general we're talking about fairly modern versions of Ubuntu and RHEL as our guest VMs. Like I said, there are so many different ways to do this that the original version of this talk had about 80 slides, and I realized I don't have four hours, so I had to cut it down to 20. So this is just one way of doing it, and it's the way we now recommend for all of our clients, based on several years of working with OpenStack, KVM, and the underlying technologies. I'm going to talk a little bit about that and the trade-offs there. At a high level, I'm going to talk about what this notion of a disk format and a container format is, and again, there are tons of different options there.
We're going to focus on raw and qcow2 as our disk formats, and then we're going to talk about AMI, which is a container format and kind of a disk format and a bunch of other stuff. Then we're going to step through, at a very high level, what happens when you launch an instance in Nova using OpenStack Grizzly: what does it do with your image, and what does that mean? Then we're going to delve into prepping the OS inside the image, because that's the other half. We are focusing on Linux. Sorry, no Windows stuff here; there's a lot of information about Windows out there, but again, we have to concentrate on something. Finally, there's a putting-it-all-together section to show you all the recommendations as one thing. So, tools for manipulating disk images. There are a number of great tools out there and different ways you can manipulate these images. The tools we use every day at MetaCloud for working with client images are the ones up here. From the QEMU project itself, we use the qemu-img command and qemu-nbd. qemu-img really is your Swiss Army knife. It can read and understand a number of different formats, and it can convert files between, I think, 12 to 15 different disk formats: VMDK from VMware, all types of different stuff. It can also resize a number of different disk formats, so it's a great Swiss Army knife for working with images. qemu-nbd is a great tool for mounting qcow2 disk files directly. It will open them up, look at the partition structure, and build block devices so that you can treat them as block devices and manipulate them. There's also another great project out there called libguestfs. If you haven't heard of it or seen it, I highly recommend you check it out. It's a wonderful project and it provides a ton of tools. The one we primarily use is guestmount, which is another way of mounting file systems.
You can mount qcow2, you can mount raw; it actually supports mounting a number of different formats as well, but it does require FUSE. It uses a FUSE mount to do that. And finally, virt-filesystems, which is another Swiss Army knife, if you will (I didn't want to use the same term twice). It's really great at examining the partition structure inside raw images; that's what we primarily use it for, but it supports a number of different formats again. OK, so formats. I keep mentioning disk formats and that sort of stuff, so what am I talking about here? Well, in OpenStack and in Glance, there are two types of formats that come into play: your disk format and your container format. Disk formats are what I think most of us are familiar with. A disk format is a file format for storing basically partition and block information. There are a lot of those; VMDK is the popular one from VMware. Raw and qcow2 are the two most popular for KVM, and those are the two we're going to focus on here. There's also this notion of a container format, which is a less talked about and less well understood thing. What a container format is, is a way of saying: I can take a disk file, which is some block info, and I can associate some metadata with that block file, and that tells me something when I launch the VM. It gives me some extra pointers of things to do. So a container file typically contains a disk-formatted file as well. One of the more popular ones is AMI, and we'll be talking a little bit about AMI. So the raw disk format: what is it? Well, it's exactly what it sounds like. It's a direct representation of the disk. Imagine if you took your hard drive and you just dd'd it to a file; that's a raw. It's a direct copy of everything that's present on the physical drive.
It includes your partitioning structure, your boot sector if you have one, all your data blocks, everything. Raw disks can be sparse: if you're storing a raw disk file on a file system that supports sparse files, it will be sparse, a.k.a. the underlying file system will not actually write out all those zeros. It's probably one of the most widely supported formats out there because it is so basic; it's just a disk in a file, and it can basically be treated as a block device. It's fairly easy to manipulate a raw on a Linux system: you can use things like file and fdisk to look at the structure, find your offsets, and just directly mount it. The qcow2 disk format is the other popular one we hear a lot about. This is the QEMU copy-on-write version 2 format, developed as part of the QEMU project. It supports a number of features. One of the great ones is that it can do pre-allocation as well as on-demand allocation of blocks. What do I mean by that? Well, raw, as we talked about previously, is just an exact copy; every block is there. Now, if you're storing it on a sparse file system, we don't copy every block, but logically every block is there. A qcow2 can be made to do that too: you can create one with pre-allocation. But you can also request on-demand allocation, and what that means is that when you create the file, you say, I want this file to support 100 gigs, but we don't need to write or create 100 gigs' worth of blocks; we just say this file can grow to that maximum size. That provides a bit of a performance improvement in that we don't have to pre-allocate, at least at launch time. Now, obviously, when we're actually running and writing to that disk, there's a little bit of overhead, because whereas with raw we already have a block available and just put the data in it, here we have to create the block, update the disk structure, and add it. So it's kind of a trade-off.
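To make the allocation story concrete, here's a small sketch. The raw part needs only coreutils; the qcow2 part assumes qemu-utils is installed and skips itself quietly if not. All the file names are made up for illustration.

```shell
# A sparse "raw disk": full apparent size, almost no allocated blocks.
truncate -s 1G disk.raw
ls -l disk.raw   # reports the full 1 GiB apparent size
du -h disk.raw   # reports roughly 0 actually written to disk

# The qcow2 equivalents (skip quietly if qemu-img isn't available).
if command -v qemu-img >/dev/null 2>&1; then
  qemu-img create -f qcow2 ondemand.qcow2 100G   # on-demand: grows as the guest writes
  qemu-img create -f qcow2 -o preallocation=metadata prealloc.qcow2 100G
  qemu-img info ondemand.qcow2   # virtual size 100G, tiny actual size on disk
fi
```

The on-demand file starts out tiny no matter how large its virtual size, which is exactly the launch-time win described above.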
There are also a number of features inherent to the qcow2 format itself. This is different from what Nova or libvirt supports; this is built into the disk format. You can have read-only backing files, which is a pretty cool feature, actually. What it means is I can take a raw file, as an example, and then create a qcow2 based on that raw, and say this qcow2 uses this raw file as a read-only backing file. When I create it, that qcow2 contains no data; it's basically just a pointer to that raw. If I were to load that qcow2, it would see everything that's in that raw, present it, and read from it. Now, as I start writing, I'm not writing back to the raw; I'm writing into the qcow2. That's where the copy-on-write part comes into play. qcow2 also supports snapshots. You can do those both internally in the file as well as with external files, and there's this notion of chaining snapshots and chaining backing files. It also supports compression using zlib, which is an option you can request, as well as encryption; it supports AES for the encryption, and I think there might be one other. Like I said, those are all built into the file format itself. Okay, so differences. I kind of talked about it, but let's spell it out: raw versus qcow2. Raw has very, very little overhead. It's an exact copy of what's on your hard disk, and as a result it has a performance advantage: all the blocks are pre-allocated, it's just a straight copy, and there's no overhead of an actual file format layer. qcow2, on the other hand, is designed for virtualization and is actively developed. A lot of folks at Red Hat (I think there might be a few in the room) actively work on it, as well as a wide community. So it's built with the notion that it's going to be used for VMs, it's going to be used in clouds. What are the features we need for that?
What are the things it needs to do? Also, as we pointed out, qcow2 supports snapshots, and I have a little star next to that. What do I mean? Well, as a disk format, raw doesn't support snapshots; there's no notion of a snapshot in the format. There is with qcow2. However, with Grizzly, functionality was added for this notion of live snapshotting. So if you're using libvirt and KVM and you have a raw-based file, you can use live snapshotting. Now, that was designed and implemented to improve snapshotting with qcow2s, but in theory it should also work with raw files. I say in theory because, as far as I know, no one's actually used that feature in Grizzly with a raw file yet, but it should work. Continuing on the differences: qcow2s with a read-only backing file have a couple of advantages, like I talked about. First off, faster instance launching. Why? Well, I have a backing file; all I have to do is create the qcow2 file and point it at the backing file, and it's available. I don't have to copy the backing file, as I would if I were launching it as a raw. Additionally, the size, or the resize, can be represented virtually. What do I mean by that? Well, a raw, like I said, has every block in there, so if I want to resize a raw, I actually have to allocate all those blocks. With a qcow2, if you're using on-demand allocation, you just change the size: this is now going to be 100 gigs. There's a qemu-img command to do that. You just change the size; you don't have to allocate a bunch of additional blocks, so that's faster. There's also an increase in storage efficiency from using a qcow2: you don't have to duplicate the base images. As we'll see later on with what Nova does and the way it actually launches a VM, if you're using raw, you have to copy all that data for each VM that's based on that raw. With a qcow2, you don't.
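Here's a sketch of that backing-file relationship (again assuming qemu-utils, skipping quietly if it's absent; base.raw is just a stand-in for a real image):

```shell
# Build a qcow2 overlay on top of a read-only raw backing file.
if command -v qemu-img >/dev/null 2>&1; then
  truncate -s 64M base.raw   # stand-in for a real raw image
  qemu-img create -f qcow2 -o backing_file=base.raw,backing_fmt=raw overlay.qcow2
  qemu-img info overlay.qcow2   # reports base.raw as the backing file
  # Guest reads fall through to base.raw; guest writes land in overlay.qcow2.
fi
```

The overlay is created in a fraction of a second regardless of how big the backing image is, which is where the launch-time advantage comes from.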
So one copy is stored, and then you can have 100 VMs all launched with qcow2s based on that same file. It allows you to do basically thin provisioning of your storage, or a much greater level of thin provisioning than you might normally be able to do. So, AMI. This is the container format I talked about, and it has a bit of an identity crisis. What do I mean by identity crisis? Well, AMI can be used to describe a container, which is what I talked about. It can also be used to describe a file, a disk format. And an AMI also contains a raw. So it's a bunch of things all rolled into one. What is it really? Well, AMI is actually three distinct files that are used together. The first file is called the AMI, the Amazon Machine Image. It's actually a raw file, just a disk file in raw format, but it contains no partition info, just a file system. This is a bit of an interesting concept: you don't technically need a partition to put a file system on a disk. You could just take the raw device, /dev/sda as an example, run mkfs on it, and put a file system on there. You can also add partitioning information, add a boot sector, add GRUB, do all those things if you want to. For an AMI, it's a raw disk with no partitioning info, just a file system. The other two files are an AKI and an ARI. AKI is an Amazon Kernel Image; it's a vmlinuz. ARI is an Amazon Ramdisk Image; it's an initrd. There's nothing special about these files; they don't have to be modified to support AMI. They're just a kernel and an initrd. The AMI file is booted using its associated AKI and ARI, so you don't need a kernel or an initrd inside the AMI. In fact, it's not used; you use the AKI and ARI that are associated with it. So, launching an instance. How do we actually launch an instance in Nova? We start with a user selecting an image and a flavor. There's a bunch of other information they can specify, we know this.
What's important for this conversation is the image and the flavor. That request is sent to the scheduler; nova-scheduler picks a node and says, hey, compute, go do this: launch a VM for me with the following information. nova-compute then moves through a series of tasks. The first thing it has to do is make sure that the image is available to it. Images, as they're downloaded from Glance, are stored in instances_dir/_base. Think of this as a local cache of images on each compute node. This is done because it speeds up instance launching: if the same image is used hundreds and hundreds of times, the node doesn't have to re-download it from Glance every time. So it's going to check there and say, I've been told to use this image; do I already have a copy of it? If it doesn't, it makes a call to Glance and says, hey, Glance, I need this image, and downloads it. Then it takes a look at the image it was given, and if that image is not in the raw format, it converts it to raw. This is something that's really important, and it was actually a bit shocking to me the first time I read the code. It doesn't matter what format you store it in Glance as: you can store it as a qcow2, you can store it in a number of different formats; it's converted to raw by nova-compute, and the local cache copy is stored as a raw, again in instances_dir/_base. Then it says, great, I have the image, I need to create the disk file. The disk file is instances_dir/<instance_uuid>/disk; that's what we think of as the root disk for our VM. Now, the next steps vary depending on the format you're using. If you're using qcow2, which is actually the default in Nova, then when you launch a VM with the default configuration, it will make the root disk file a qcow2 that has a read-only backing file of the raw image
it just downloaded and/or converted from Glance. It'll be a dynamically allocated file, as in it will not pre-allocate all the blocks; it's basically just a pointer, with the backing file being the previously downloaded image. Then it sets the disk size. Again, this is an interesting feature of the way Nova works. In a flavor, you can specify a size for the disk: 10 gigs, 20 gigs, 30 gigs. You can also specify zero, meaning no size. That causes slightly different behavior in Nova. If a size is set, when that qcow2 image is created, Nova sets it to whatever size you specified in the flavor. If, on the other hand, no size is set, it doesn't specify a size other than the size of the backing file. What this actually means is that if you create a really, really small image with no free space and then fail to set a flavor size, you'll boot your VM up and have no space. It also means that if you have a really large image, let's say 100 gigs, and you set a flavor size of 50 gigs, then when you try to launch the instance, it will actually fail to launch. Nova will detect: you have 100 gigs here and you're trying to create something that's 50 gigs; those don't match, there's no way you can fit 100 gigs in 50. It'll fail with an error that indicates, hey, the flavor size is too small; you can't fit that image in here. If it's a raw, it's a little bit different. Like I said, qcow2 is the default, so in order to launch with raw, you need to set a flag: use_cow_images=false. If you just set that flag, it'll create the disk files as raw by default. There's an additional set of flags you have to pass to libvirt if you want to use more exotic formats, but the default, if you just tell Nova not to use a qcow image, is raw. In that case, what happens is a copy of that backing image is made.
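For reference, that switch lives in nova.conf on the compute nodes; a minimal Grizzly-era fragment looks like this (the default behavior, qcow2 overlays, needs no flag at all):

```ini
# /etc/nova/nova.conf on the compute nodes
# Create instance root disks as full raw copies instead of qcow2 overlays:
use_cow_images = false
```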
So that raw file we downloaded from Glance and stored, we make a complete copy of, and then we resize it. Again, this is different from qcow2: since our qcow2s were dynamically allocated, we could just set the size, but here we actually have to perform a resize operation. It's resized to the size of the disk specified in the flavor, or it's just left at the original size. All the same rules apply: if it's too small, it'll fail. What's interesting to note about this resize, with raw and with qcow2, is that all we're doing is resizing the maximum size of the disk file. What that's analogous to is this: if I had a 10 gig hard drive in my computer and I ran out of space, and I went and bought a 20 gig hard drive, dd'd all the data off the 10 gig drive onto the 20 gig drive, then booted from the 20 gig drive, what would happen? All I did was dd it, so I'd have a full 10 gig partition and then 10 gigs of unallocated blocks, because they're not part of any partition and not part of any file system yet. All Nova is handling here is resizing the underlying disk file and what its maximum size could be. It doesn't change anything inside the operating system with the partitions or file systems. That comes later, via another tool. That other tool is cloud-init. So the next portion of this presentation is about how you prep the OS. At this point, you've got a VM that's running. Nova boots it; it does the rest of the stuff Nova needs to do: get networks, get IPs, define the instance in libvirt, all that sort of stuff. That all happens, and the VM gets booted. What happens next? Well, if you've prepped your image properly, you can actually accomplish a couple of things, through functionality either built into Nova or provided by third-party tools. So inside the OS, there are a couple of things that we always prep.
One is cloud-init, and that's what handles a lot of the file system resizing for us. Authentication: Nova is just going to boot that image, so how do you log into it? How do you authenticate to it? How do you use it? Networking: there are a couple of actually interesting things around networking. By default, the image boots, it's got an IP address assigned, and how does it get the network info? It needs to DHCP it; at least that's the general assumption. And finally, if you've got volumes that you want to attach, how does that work? Well, you need the hotplug functionality in the Linux kernel, and we'll talk about that as well. So, cloud-init, probably the most interesting part of this. cloud-init is a project, a tool, a great tool, developed by Canonical, the folks behind Ubuntu. It's designed to help solve a number of these "what do I do now that the instance is up?" questions. How do I log in? How do I set its hostname? How do I set its network info? It actually does a lot of stuff, way more than we use it for, and more than most people use it for, for that matter. You can read all about it. The primary thing we use it for is leveraging Nova's metadata API, which is an EC2-compatible metadata API. The metadata API is a concept that was originally created by Amazon for EC2 and re-implemented in Nova. It's actually a very, very powerful feature if you've never played with it. It allows you, as part of launching an instance, to set arbitrary data that is then available inside that instance by making a call to a specific URL. That URL is always the same; it's a link-local address. So it allows you to dynamically seed data, if you will, into your instance, and if you have the right tools configured to look at that metadata, they can take action based on the data they find. The other thing cloud-init does for us here is provide the functionality, like I said, for doing the file system resizing.
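For anyone who hasn't played with the metadata service, here's a quick sketch of what it looks like from inside a guest. The address is fixed and link-local, so outside an instance these calls just time out and fall through to the fallback message; the paths are the standard EC2-style ones.

```shell
# Query Nova's EC2-compatible metadata service from inside a guest.
MD=http://169.254.169.254/latest/meta-data
curl -s --max-time 2 "$MD/instance-id" > instance-id.txt \
  || echo "(no metadata service reachable here)" > instance-id.txt
cat instance-id.txt
# The SSH key you seeded at launch shows up here too:
curl -s --max-time 2 "$MD/public-keys/0/openssh-key" || true
```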
So cloud-init can grow the size of a file system, but of course it can only grow a file system to the maximum size of the partition. Since an AMI has no partition table, that means it can grow it to the maximum size of the underlying disk file that we resized. Ubuntu also provides an add-on tool called cloud-initramfs-growroot. This is something they've integrated with their initrds, and what it allows is repartitioning of the underlying disk structure. So if your underlying disk actually has a partition structure, you know, you did the classic "I'm putting /tmp here, I'm putting / here, I'm putting /boot over here," it can handle that. Now, as far as I know, that's only available for Ubuntu; Red Hat does not have it. Unfortunately, the Red Hat initrds are structured a little differently. That may have changed recently, but as far as I know, there's no equivalent for anything other than Ubuntu as a guest OS. How do you install cloud-init on Ubuntu and RHEL? It's part of the default Ubuntu repositories, so you can just go ahead and install it. For RHEL and CentOS, it's not part of the core OS, but EPEL has it. If you're not familiar with EPEL (Extra Packages for Enterprise Linux), it's a set of community-maintained packages made available for RHEL and CentOS. If you configure those yum repos, you can install cloud-init from there. Or, of course, you can download it and roll it yourself if you want. What's the configuration? Well, we actually use a very, very basic configuration, because this goes along with the way we do networking and the way we recommend our clients do authentication. Basically, there are three elements. The first is "user: cloud." All that does is say: when you run and look at the EC2 metadata, if you find SSH keys and you're going to drop them into a user's directory, drop those SSH keys into the cloud user's directory. I should back up and actually talk about that, I suppose.
In Horizon, you can create SSH keys, and when you launch an instance, you can specify an SSH key. You can also do that through the Nova API if you want. What cloud-init does is read that data and say, oh, I'm going to download that key and install it into the cloud user's home directory. Next, disable_root: make sure the root user is disabled from SSH access. Why? Well, what's the password? And also, why are you SSHing in as root? Then preserve_hostname: false. This is because cloud-init can reconfigure the hostname. You can set a hostname as part of launching an instance, and if you have cloud-init and you say preserve_hostname: false, it will reconfigure the hostname for that instance to be the one you set, so it comes up already having your hostname. The final thing, which is important only for Ubuntu: by default, when you install cloud-init, it can look at a number of different data sources, and only one of those is the metadata service, the EC2 data source. That's not enabled by default in a default install, so you want to go through and enable it, because it's the only metadata service we have available at boot time inside the image. So now, authentication models. This is getting way more into recommendations; again, there are a lot of different ways to do this. What we recommend, and what we assume, is that we need to provide a set of images that are going to be brought up and booted into a known state, whereby the client's configuration management system can take over and turn that instance into something meaningful for them, whether that's a web server, a database server, an application server, whatever it ends up being.
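Put together, the cloud-init side of that is only a few lines. A sketch of the relevant fragment of /etc/cloud/cloud.cfg (key names per the cloud-init of that era; check your version's documentation for the exact spelling):

```yaml
# /etc/cloud/cloud.cfg (relevant fragment only)
user: cloud                # metadata SSH keys land in this account
disable_root: true         # no root logins over SSH
preserve_hostname: false   # take the hostname Nova hands us at launch
datasource_list: [ Ec2 ]   # the EC2-compatible metadata service is all we have
```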
So a lot of these recommendations assume that this is not the end; this is only the beginning, the basic seed point. After this point, your standard process, whatever that is (Puppet, Chef, Salt, someone doing it by hand following a guide), is going to run. This is just the initial starting point from which you can configure the instance to be something meaningful for you. So what do we do when we prep an image? Well, we disable root login via SSH. cloud-init also does this; we just make sure. We also configure it so that anyone who's a member of the allowed sudo group can sudo without a password, so that we don't have to have a preset password for the cloud user and then share it out. Because if there's a preset password and you share it with everyone, it's basically like having no password; what's the point? For Ubuntu, that's the sudo group; in RHEL, the standard is to use the wheel group. The other thing we do is create that cloud user and add it to the sudo or wheel group, so the user is available and has basically full root access. The expectation is that this is how someone's going to log in. They seeded their key, cloud-init installed it, so they can SSH in as the cloud user, become root, run their configuration management system, and do what they need. So we add the user and then we lock the password, a.k.a. there's no usable password hash. It's kind of like saying there is no password; it doesn't exist. If you tried to log in on the console, you couldn't; the account is locked, if you will. Finally, we set up the root account and allow it to log in only on the console, without a password. There's the gasp; I figured when I said there's no root password, someone was going to say: what are you talking about? Why do you do this?
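The passwordless-sudo piece is one line of sudoers; a sketch (the group name is wheel on RHEL, sudo on Ubuntu; drop it in via visudo or as a file under /etc/sudoers.d/):

```
%wheel ALL=(ALL) NOPASSWD: ALL
```

Pair that with creating the cloud user in that group and locking its password (no usable hash), as described above.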
Well, all we've done is make sure that root can log in on the console, and we've in fact disabled its ability to log in everywhere else. We don't install an SSH key for it; we don't allow it to log in via SSH. It can only log in on the console. And again, we're talking about an OpenStack instance. Isn't that insecure? It could be, but let's think about this. It's an instance; everyone in the company is going to be launching it, so everyone would have to have its root password. How secure is that? It's not. Also, you can only log in on the console. Well, think about it: if you have console access, or the ability to get a console through OpenStack, what else can you do? You can snapshot that VM. You can boot it up someplace else and attach to it if you want to. You can destroy that VM. You basically have full control; you own the hardware, in essence. Now, with the newer Keystone V3 API, as support for it starts coming into the tools, there's much more granular access control, and you could say something like: this user can look at the console, but can't do anything else. As that evolves, we'll need to change some of these recommendations. But for now, with Folsom and Grizzly, it's fairly safe and secure, because otherwise either everyone has the password or they basically own the hardware anyway. So, networking. That was how we set up the authentication stuff; now the networking piece. When you launch an instance, nova-network or Quantum picks a unique MAC address for you and assigns it to the instance. The OS itself can sometimes be configured in a way where it's holding onto its old MAC address; it doesn't want to use the new one and wants to set its own MAC address. In a cloud, of course, this can be really bad. If you have an instance with a hard-coded MAC address and you spawn 500 of them, you're going to have a layer 2 mess on your hands.
So you need to make sure the OS is prepped in a way that it's not holding on to any old information about its MAC address. There are a couple of different things to do. First off, there are udev and its persistent net rules. You want to go ahead and remove the existing net rules and also disable the rules generator. We don't really need udev to rewrite the net rules files for us, because in most cases there's only one interface; it's not going to change, and it's got a fixed MAC address. We don't need udev trying to figure out whether we've added more interfaces and reordering things for us. In RHEL, you also want to make sure that your ifcfg files don't have an HWADDR specified, which can be set in there. So you want to go through and make sure these are all cleaned up, so that the OS actually uses the MAC address of the virtual interface that's configured for it. The next step is DHCP. In Nova, the expected way to configure the network is for the instance to come up and DHCP all of its info. Additionally, we prefer that instances keep DHCPing. That way, as cloud administrators, if we need to change information about the network, we can do so dynamically and let the VMs pick it up by DHCPing the new info. So we need to make sure that instances, when they come up, are configured to be persistent with their DHCP, if you will, to keep trying. What you don't want is an instance to come up and boot and say: I'm going to try to get an address; I failed because there was some network interruption; I'm giving up. At that point, you've basically got a VM that will never get on the network until an administrator logs in on the console and runs some commands. That's not really useful. Again, the goal here is to automate as much as possible.
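As a sketch of that MAC cleanup, assuming the guest image is mounted somewhere (here a scratch directory stands in for the mount point, so you can dry-run this anywhere; the rule file names are the usual ones, but check your distro):

```shell
# $ROOT stands in for the mounted guest image root.
ROOT=./guest-root
mkdir -p "$ROOT/etc/udev/rules.d" "$ROOT/etc/sysconfig/network-scripts"
printf 'DEVICE=eth0\nHWADDR=52:54:00:12:34:56\nBOOTPROTO=dhcp\n' \
  > "$ROOT/etc/sysconfig/network-scripts/ifcfg-eth0"

# 1) Drop any recorded MAC-to-interface mappings.
rm -f "$ROOT"/etc/udev/rules.d/70-persistent-net.rules
# 2) Stop udev from generating new ones on the next boot.
ln -sf /dev/null "$ROOT/etc/udev/rules.d/75-persistent-net-generator.rules"
# 3) RHEL: strip the hard-coded MAC from the interface config.
sed -i '/^HWADDR=/d' "$ROOT/etc/sysconfig/network-scripts/ifcfg-eth0"
```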
The goal is to boot the instance up into a state where, if you've done it right, something like Chef or Puppet automatically takes over and configures the node for you. So we want it to keep trying. That's also true for renewing a lease: we don't want it to try to renew the lease, fail because of some transient network issue, and then say, well, I'm giving up, I'm unconfiguring my interface, and hey, admin, come reboot me or manually fix me. It turns out RHEL, CentOS, and Ubuntu all, by default, want to try once and then give up, which is not very cloud friendly, so you need to change that. On RHEL and CentOS, it's pretty easy: in your ifcfg file, you just say PERSISTENT_DHCLIENT=yes, and RHEL does the right thing. With Ubuntu, it's a little fuzzier. ifup is, in general, the way networking is brought up in Ubuntu, and when you've got your /etc/network/interfaces file configured for DHCP, ifup sees that and runs through a set of steps. One of the first things it does is say: if /sbin/dhclient3 exists, I'm going to use that, and here's a hard-coded set of flags I will pass to it. One of those flags is -1, which for dhclient basically means try once and give up. That's not ideal. So what we actually do is just remove dhclient3 from our Ubuntu images. We've also played around with the dhclient config file: certain combinations of the timeout and retry values seem to cause the client to persist, but we've had variable luck with that depending on the version of Ubuntu. And we know for a fact that if we remove dhclient3, ifup doesn't run it with the -1 flag; it falls back to dhclient. In case anyone's ever looked in Ubuntu, dhclient3 is the same thing as dhclient; it's just a symlink.
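On the RHEL/CentOS side, that's one extra line in the interface config; a sketch (file name assumes a single eth0):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
PERSISTENT_DHCLIENT=yes
```

On the Ubuntu side, the fix described above is simply removing /sbin/dhclient3 from the image.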
So removing the symlink, for all intents and purposes, doesn't really change much, other than ifup says, oh, I'm gonna use a different set of arguments. I'm not gonna give up. Hotplug support. So this is about Cinder volumes. If you have a volume and it's attached at the time you boot the image, great. Your kernel hopefully is configured with the right drivers, sees the block device, configures it, you mount it, and you're off to the races. But you don't wanna have to reboot an instance every time you add a volume. That's one of the cool things about Cinder: I can dynamically add and remove volumes. But your OS has to support that. So for Linux, that basically means you have to have hotplug support available in the OS. A lot of kernels come with this either statically compiled or already available. If the one you're using doesn't, it's pretty simple to enable it. Basically, you need two modules to be enabled: the acpiphp module and the pci_hotplug module. If you make those available, once your OS is booted and you attach a volume, if you look at your kernel log, what you'll see is it'll say, oh, look, I've detected a new SCSI device, configuring it, making it available, great. Once you've unmounted it and you do the detach through Cinder, then you'll see it say, oh, hey, this disappeared, I'm cleaning it up, and all that sort of stuff, and it works fairly reliably. Putting it all together. So I've been up here talking for almost 40 minutes now about different things. What does it all mean? Well, this is kind of the one sheet that says, this is what we do. One, we use AMIs. We store all our images in Glance as AMIs with associated AKIs and ARIs. Why do we do that? We do that because an AMI's raw disk format has no partition structure. That means that cloud-init has an easy time resizing the file system to be the maximum size of whatever you've set your disk to.
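The hotplug setup mentioned a moment ago can be sketched like this (module names as given in the talk; on many distro kernels this support is already built in, in which case modprobe has nothing to do):

```shell
# Load the ACPI PCI hotplug driver and the generic PCI hotplug core
modprobe acpiphp
modprobe pci_hotplug
# Make them load at boot (Debian/Ubuntu convention; on RHEL you'd use
# /etc/sysconfig/modules/*.modules instead)
printf 'acpiphp\npci_hotplug\n' >> /etc/modules
# After attaching a Cinder volume, the new SCSI device shows up in the kernel log
dmesg | tail
```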
So this means we can actually store something like a 100 meg AMI, because that's all the data that we need, and then the flavor can say 100 gigs and it's simple. Nova sets the QCOW2 file size to be 100 gigs, and then cloud-init says, all right, I'm growing that file system out to the maximum size that's available. That allows us to store very little data in the image, only what's needed, but support a wide range of actual disk sizes. We use raw-backed, dynamically allocated QCOW2 instance disks. Why? Raw does perform better, yes. Depending upon your case, that could be as little as 3%. So why do we do this? Well, for one, it's faster instance launching. We don't have to copy the raw disk file every time we launch an instance. In a lot of cases in a cloud, you don't actually have tons and tons of different images. You have a lot of VMs, but you start from a fairly small, well-curated set of images. Over time, fairly quickly in fact, all of those images get cached on your hypervisors, which means if you're using a raw-backed QCOW2, all you have to do is create a QCOW2 file that says, I'm backed by that, make it dynamically allocated so I'm not spending time allocating blocks, and you launch. That vastly improves the performance of launch time. It also increases storage efficiency, as I talked about earlier, because you don't have to preallocate those blocks, and since you're based on the same backing file, you don't make multiple copies. You actually achieve a higher level of savings on consolidating your storage. Like I said, snapshots. Until recently, there was basically no good support for snapshots with raw. QCOW2 has it natively, has a better snapshot infrastructure, and is better tested. cloud-init, we use cloud-init. I spent a decent amount of time talking about it. Pretty obvious what the wins there are. It allows us to resize the file systems. It'll even restructure your partition layout if you want. And then a properly prepared OS. I talked about those things.
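Roughly what Nova does on the hypervisor at launch, sketched by hand (the path, IMAGE_ID, and the 100-gig size are illustrative; newer qemu-img versions also want the backing format spelled out with -o backing_fmt=raw):

```shell
# The raw base image, cached once per hypervisor after the first launch
BASE=/var/lib/nova/instances/_base/IMAGE_ID
# Create a thin, dynamically allocated overlay sized to the flavor's disk;
# no blocks are copied, so this is nearly instant regardless of image size
qemu-img create -f qcow2 -o backing_file=$BASE disk 100G
# The overlay records its backing file and starts out only a few hundred KB
qemu-img info disk
```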
What's our authentication model and why? How do you prep some of the networking stuff and why? And then hotplug: how do you make it so that your Cinder volumes are dynamic? Finally, more information. So, the OpenStack conference is gonna be posting a video of this talk as well as these slides at some point on their page. I have it already directly available. There's a tiny URL for you guys. I guess it's not so tiny. There's my contact information. My email's there, slightly obfuscated since this is gonna be posted publicly. I'd prefer to not get spammed by automated bots. You can also catch me on freenode. I'm cburgess. I am an active contributor to the Nova project, so I hang out in a lot of the OpenStack channels. So, just say, hey, cburgess, blah, blah, blah, and chances are I'll respond. You can catch me on Twitter if you want. And during the conference, I'll actually be available at the MetaCloud booth, which I've put up there because I didn't know where it was until just recently. The show floor is just right out there in the B exhibit hall. Come in there and look for the booth that says MetaCloud on it. And these are the times I will be there. If you come up to me during these times and you ask me a question about this presentation or images in general, we can talk until we're blue in the face and geek out as much as we want about this topic. I won't try and pitch anything about MetaCloud or anything like that. So, we have about five minutes, because I'm thinking I'm supposed to end at 12:40. So, I think we got a microphone. I can take some questions or... All right, I see your hand first, so. So, okay, so the question, to repeat it both for the video and so everyone hears, is how do you handle high priority OS updates? And then, do you update all the VMs? Do you update the backing files, that sort of stuff? Good question. There's a lot of options there.
You could try and do something where, if you're using a QCOW2, you pivot onto a new root. There's this notion of being able to take a QCOW2, chain backing files, and recompress them. That is something you'd have to do by hand. There's no support for that in Nova at all. This notion of pivoting onto a new root or chain, well, Nova can support chained backing files, but there's no API call to do that. So, because of that, because it'd be a manual process, our recommendation is to do one of two things. One, if your instances are truly ephemeral, if you've really orchestrated your application and your use of the cloud right and all your instances are ephemeral, build a new image and roll through and move to that. The other one is use your configuration management system. Hopefully, if you're running a cloud, especially at scale, you've got a good configuration management system, and you can use it to push out your security updates or your regular updates. But yes, to answer your question, at the underlying level, you could do that. What you have to be careful with is whether the instance is running and keeping a consistent disk state, but since there's no support in the APIs for that, we tend to shy away from it. Because again, our goal, with what we look at, is how can you build clouds with thousands of VMs in them and orchestrate all this through the APIs? You with the cap. Yes, so the question is, I mentioned that raws were much faster. I said they have a performance gain. I didn't actually say how much, for a reason. And don't you get a benefit from having multiple instances backed by the same backing file? So yes to all of those things. So let me talk a little bit about that. Debating the performance wins of raw versus QCOW2, I could have stood up here for 40 minutes and done an entire talk on that. There is a performance gain with raw. As I've said, it's just a flat disk layout.
So in that regard, you have no overhead of QCOW2 having to dynamically allocate blocks and that sort of stuff. How much of an overhead? Well, that kind of depends on your workload and what you're doing, right? How fast is it really? We've actually seen a single VM do 8,000 IOPS to a QCOW2-backed file and do a little over 220 megabits per second to that file. So you can actually make QCOW2s run really fast if you set it up right. You also asked, don't you get a performance gain from having multiple instances backed by the same raw? Yes, you do. So we're talking about KVM and QEMU here. So that means we're running Linux on our physical host. Well, if you understand the way Linux file system caching works, what happens is any available memory, aka memory that isn't being used by a userland process or the kernel for some specific task, the kernel views as a pool it can use to cache data. And one of the things it predominantly puts in there is the file system cache. So if you have a hundred VMs running on a single physical machine and they're all backed by a single physical file, that single physical file is basically kept by your kernel in your file system cache, assuming you have enough memory. So you get a performance benefit there. You've cached, in essence, your entire file system. Now, over time, those instances will start to diverge, and so it becomes more difficult to do that. And if your performance really starts to suffer, you need to look at, as I talked about earlier, either manually pivoting onto a new root you can compress, or simply rolling a new image that contains all the updates you've made, rolling through your cloud, and changing onto those. So, any more questions? Over there, yes, you, sir. Mm-hmm, yes. Yes, so there are certain conditions where you'd have to regenerate those rules.
Once the instance is booted, if it regenerates them, in theory it's gonna regenerate with the hardware address that it's got, and hopefully that doesn't change. Yes, if you snapshot, you gotta re-prep it and stuff. So yeah, you either need to make sure it doesn't get removed, or, I think one of the files I mentioned in there is the file that the generator uses to regenerate the MAC addresses, or that portion. So if you remove that file, you can let udev rerun, because there's other things it can do for you, but it won't change the hardware stuff. Yes, a package update will re-add that. That is true, I didn't cover that. So yeah, you have to make sure and be careful when you're using a snapshot to try and promote to an image. What we recommend people do is, if you're taking a snapshot and then you're gonna turn that snapshot into an image, you wanna run through it before you promote that into a public image and make sure that it's been prepped properly and you haven't lost any of your prepping. And of course, if that box had been configured for some specific purpose, do you have a bunch of users you don't need anymore? A bunch of other applications that are not gonna start that you don't need? So, yeah. That'll do the resizing of the partitioning? Right, yeah, it's not done automatically at first boot in the... okay, that's good to know though. Okay, well, that's good to know. I actually did not know there was a version in Fedora. Cool. I think we have time for maybe one or two more questions. I think I saw someone, yeah. Oh, great, so there's a dracut module for it now. Yeah, yeah, some of this is based on when we started working on it like two years ago, so. Okay, so there you go. So apparently, when I talked about how only Ubuntu supports the repartitioning, it sounds like there's now a dracut module. cloud-utils, I think, is what you both mentioned.
So look for cloud-utils. Sounds like it's part of Fedora, but maybe not in EPEL or in the RHEL repos yet, but it's pretty easy to move a Fedora package over. So I think we have time for one more question. Is there... down there? Yeah, so parameterizing the username we use, or parameterizing additional user data? Yes. So cloud-init itself, when it's pulling data from the metadata service, of which there's several, it's fairly well documented. You can actually set a number of keys and values for user data that will change the way cloud-init behaves and what it'll do. The cloud-init URL I put up there has the documentation for that. I think there's actually like 20 or 30 different metadata keys it will understand and do something different and unique with. So yes, that does work. You can actually really get advanced with cloud-init and the metadata service and integrate those together, and then have it bootstrap something like your Chef or your Puppet and really get this all the way through in provisioning. And that's actually what we encourage our clients to do, right? This is the first step. This is just getting you an OS that's running, and now you use the metadata. You know, integrate it with your Chef or your Puppet install and go all the way through. So, okay, thank you, we're out of time.
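As a small illustration of that last point, hypothetical user data using a couple of documented cloud-config keys (the user name, key, and URL below are made up):

```yaml
#cloud-config
# Create a login user instead of relying on the distro default account
users:
  - name: opsuser
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2E... ops@example.com
    sudo: ALL=(ALL) NOPASSWD:ALL
# Run a first-boot bootstrap step, e.g. kicking off a Chef or Puppet install
runcmd:
  - [sh, -c, "curl -s http://config.example.com/bootstrap.sh | sh"]
```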