Morning everyone. So this talk is going to be, as you all know, about libvirt, Nova, and KVM and how they all interact. I think it goes without saying that when most of us think about OpenStack, we're thinking about the open approach to OpenStack, the approach whereby you're using open source components to deploy the entirety of your solution. That's becoming a more robust offering as the OpenStack platform spreads out and more specialized hardware becomes available, but by and large what we're talking about is the solution where you can take everything from open source and ultimately deploy it. With that, we see most commonly that KVM ends up being the hypervisor of choice, and I'd argue it's probably the most feature rich. So we're going to talk about how some of the functionality works behind the scenes within Nova as it deals with libvirt and KVM. A little bit about me. Why am I, as the CTO of a company, up here talking about libvirt and the internals of the way the code functions? Well, prior to our company growing, this was really my core area of focus: to understand how this interaction all worked. What we found out pretty much the hard way, as we were going through our R&D process for building our product, is that we had to really learn and understand how the hypervisor was functioning, how a given compute node was functioning. Each of the different operations within libvirt has a series of complex interactions on the hypervisor which can play out a number of different ways depending on how the system is configured. So one thing I'll preface this entire talk with is that we're by and large talking about a default configuration, very similar, for example, to what you would see if you were to stand up a DevStack.
Obviously, we're talking about a multi-node deployment, but we're talking about KVM with QCOW2, basically just using a standard file system, and so on. I'll call out the areas in some sections of the presentation where we're making that assumption, but trying to cover all the different code paths would have been a talk that took at least the entire day. I myself have made some contributions to Nova. I can't say I've been contributing heavily; it's been at least one commit cycle since I've been able to get anything in. But prior to that, the focus was hardening the operations that are there: basically going through the process of understanding what the failure scenarios are and what we could do to address them and make them more resilient. One of the more major features I contributed was live snapshots, and I'll talk later about what that is. Some of you might already be using it today, and some of you might find an opportunity to start using it once you see how it works behind the scenes and understand what it takes to deploy it. MetaCloud has a unique perspective, and the reason for that is we're in the unique position of actually deploying clouds for our clients. What does that allow us to do? We have had production environments in place running OpenStack, running Nova, for upwards of two years. One of the primary differences there is that in a lot of cases you hear: okay, I've stood up my cloud, let's say it's Havana based or Folsom based or Essex based. A lot of times the migration or upgrade strategy you've heard for that is: well, build a new one over here and migrate everyone over to it. We don't have that luxury, and that's frankly not how we operate. When you bring up a production cloud, the intent is to keep it running for the duration of that cloud.
Basically, it is production until it's not, so it's expected to see upgrades all the way through its life cycle. So what's different about that? Well, if you think about Nova and the way the development process is done, ultimately what you're factoring for is data migrations. You're not necessarily thinking about things like: the version of QEMU that I'm deploying has changed, the version of libvirt I'm deploying has changed. All of these changes have implications for the life cycle of the environment that need to be considered. We're actually working with upstream to get some of our discoveries and adaptations from this process into upstream OpenStack. Right now it's in our distribution, but a lot of it is coming. Our team is based on a group that has a large infrastructure background. Again, that gives us a unique view: lots of clouds, lots of different deployment structures, and really, what's most interesting as it pertains to our conversation today, the types of problems that we find and eliminate. So hopefully we can talk through some of the problems we have found and save you from the same things. Let's talk about some of the fundamentals here. The first component is QEMU. Now, a lot of people still say KVM, and KVM, the kernel virtual machine, is correct, but what used to be the case is that the QEMU project and the qemu-kvm project had diverged; one was a fork of the other. As of QEMU 1.3, those have reconverged into a single project. So the way to really think about this is that it is QEMU with hardware acceleration. You've got your VT extensions or the AMD equivalent, and at that point QEMU, or libvirt via QEMU, will make the determination that you can be hardware accelerated and will bring everything up accordingly. Nothing about OpenStack, or about Nova and the compute driver in question, really interacts directly with QEMU.
All its interactions are abstracted through libvirt, and libvirt does a great job of this. It actually spares us from having to solve a lot of problems we would otherwise have to solve. If you were to look at a system that's running a number of KVM or QEMU instances, what you would see is basically a bunch of independent processes, qemu-system-x86_64, where each of those processes represents a hypervisor. What libvirt is doing is interpreting the configuration for a particular instance and launching QEMU with the command line arguments to match that configuration. QEMU also inherently provides a monitor interface. The important thing about the monitor interface is that it is the API into QEMU. libvirt is using it, but it is not something a user should consume themselves when they're running libvirt; libvirt is intended to handle all of that. In fact, if you were to interact directly with the monitor behind libvirt's back, libvirt would flag the domain as tainted at the first point you did that. The real implication there is support; aside from that, it doesn't really have an effect. So here I have a table which shows the different versions of QEMU that were provided by the Ubuntu project, as part of its cloud archive, in conjunction with each particular release. For example, with Grizzly it was QEMU 1.0, and now with Icehouse it's QEMU 2.0. I encourage you to go take a look at the changelog between those two to get an idea of how rapidly the QEMU project is moving and the sort of functionality that's becoming available, because it also provides some insight on the direction the Nova driver can go and what capabilities it can provide. libvirt, as I mentioned, handles all the interactions with QEMU.
Everything that libvirt does is defined in XML. Any configuration, any instance, is defined in XML, and that XML is ultimately translated by libvirt to launch a QEMU process, which libvirt refers to as a domain once it's up and running. There's a lot that you can do with libvirt for each of your individual VMs using the command line tool virsh, and virsh is going to be available on any one of your systems running libvirt. I put the XML reference up here because I encourage you to go look at it. Basically, if you run virsh dumpxml on one of your domains, a domain being one of your instances, you can get a basic idea of what Nova is doing when it creates one of the instances, all the device representation and so on. We could consume an entire talk just going through some of the common options in the XML definition, but Nova is handling that XML for you. There's a config generator built into the compute driver, and it's all available in the reference for you to see what's there. Now, again, here's the really important one. With Grizzly we had libvirt 1.0.2, and now with Icehouse we're talking about libvirt 1.2.2. There are some pretty big differences as we're going through this. Those version numbers actually look wrong to me, but we'll talk about what the implications are either way. So let's talk about some of the integration. Ultimately, there are a couple of parts of the code that you should care about. Again, we could spend a lot of time on how we land in the driver and get to the point where we're executing our operations, but let's talk about what the interfaces are. So basically assume we've already gotten to the compute node. The reason I'm calling out the sections you see there in the compute manager is that there are a number of calls which start with, for example, you running a nova boot on the command line.
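To make the XML reference concrete, here is a minimal, hand-written domain definition in the same shape virsh dumpxml produces, parsed with Python's standard library. The name, sizes, and paths are illustrative stand-ins, not output from a real Nova instance:

```python
import xml.etree.ElementTree as ET

# A stripped-down libvirt domain definition, similar in shape to what
# `virsh dumpxml <domain>` returns for a Nova-managed instance.
# Names and paths here are illustrative, not from a real deployment.
DOMAIN_XML = """
<domain type='kvm'>
  <name>instance-0000002a</name>
  <memory unit='KiB'>2097152</memory>
  <vcpu>1</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/nova/instances/UUID/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

root = ET.fromstring(DOMAIN_XML)
name = root.findtext("name")
memory_kib = int(root.findtext("memory"))
disk_source = root.find(".//disk/source").get("file")

print(name, memory_kib // 1024, "MiB", disk_source)
```

Walking a real dump the same way is a quick method of checking what device model, disk format, and paths Nova actually configured for a given instance.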
Well, what API call does that actually translate to, and what internal Nova API ends up being called as a result? To really follow and trace some of that code, a lot of times you have to go back through the compute API. The compute API I'm referencing is the internal API into the compute driver. From there, you're going to call the compute manager, the compute manager is going to call the driver, and then there's a set of utility functions and classes available within the libvirt directory. Now, why do I talk about any of this? The reality is that in a lot of cases you will need to read the code. This project is evolving so rapidly, there's so much functionality becoming available, and the processes and flows are changing so frequently, that it's an incredibly difficult proposition to keep the documentation in sync. The doc team does a great job, but oftentimes it's just moving faster than people can keep up with. So if more contributors are needed anywhere, as they are for any project, docs are one place, mainly to help keep up with this flow. Get comfortable reading the parts of the code that you are using, because it's something you're going to need to do pretty often. You're going to be asked: what happens when I have this configuration and I combine it with this operation? Sometimes the only answer is going to be to take a look at the code, and over time as you look at it, you'll find it's written in such a way that it can be read and understood without getting too far into the weeds with the rest of the code base. So let's start talking about some of the operations and what's happening behind the scenes when you run them. As I mentioned, a lot of these don't necessarily map one to one. What I mean by that is you're calling boot via the API, and ultimately what ends up being called via the driver is spawn.
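The layering described above, an API name on the outside and a different driver method on the inside, can be sketched with stub classes. Everything here is a toy stand-in that mimics Nova's structure, not Nova's actual code:

```python
# Illustrative sketch of the call layering described in the talk:
# CLI "nova boot" -> compute API -> compute manager -> libvirt driver.spawn().
# Class and method names are stand-ins, not real Nova classes.

class LibvirtDriver:
    def spawn(self, instance):
        # In Nova this is where images are fetched, disks built, XML
        # generated, and the domain defined and started.
        return f"spawned {instance}"

class ComputeManager:
    def __init__(self, driver):
        self.driver = driver

    def run_instance(self, instance):
        # The manager owns orchestration; the driver owns hypervisor work.
        return self.driver.spawn(instance)

class ComputeAPI:
    def __init__(self, manager):
        self.manager = manager

    def boot(self, instance):
        # "boot" at the API edge becomes "spawn" at the driver.
        return self.manager.run_instance(instance)

api = ComputeAPI(ComputeManager(LibvirtDriver()))
print(api.boot("instance-0000002a"))
```

Tracing a real operation through nova/compute/api.py, nova/compute/manager.py, and nova/virt/libvirt follows the same shape.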
So let's talk about how that happens. You go through the API, you go through the scheduler, you go through the compute manager, and you get to the libvirt driver. The libvirt driver is going to handle all the interactions necessary to bring that instance up and get it running. One of the things it'll do, for example, is make the request to say: give me my network information. And again, this is where we get into: do I have Neutron? Am I running nova-network? By and large, just assume there's something there that's going to answer, that's going to give you back the network information you need, and then the driver utilizes the local networking drivers to bring up the necessary interfaces. One of the common misconceptions there is: well, I thought my IPs and all of my networking were ultimately going to be handled by the scheduler. No, the compute node says: okay, I'm at the point where I need networking information, let me go ask the network service. So the first thing that happens is we're going to create our disk files, built from images fetched from Glance. My colleague did a talk two summits ago about the variety of different image formats. It's still up and you can take a look at it; it covers all the supported image types you can use and the implications of using them. But what happens by default out of the box is that your images are converted from whatever format they're in to raw format and placed into your base directory. So if you go look at your instances directory, you'll have a directory in there called _base, and there will be a number of files placed there, images that have been fetched and downloaded from Glance. Think of that as an image cache.
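As a rough sketch of how that _base cache can work: Nova keys cached images by a hash of the Glance image ID, so every instance booted from the same image shares one cached file. The exact scheme below (SHA-1 of the ID, plus a size suffix for resized copies) is an assumption modeled on Nova's image cache, not a verbatim copy of it:

```python
import hashlib
import os

INSTANCES_DIR = "/var/lib/nova/instances"  # a typical default; illustrative

def base_image_path(image_id, virtual_size_gb=None):
    """Return the _base cache path for a Glance image.

    The cache file name is a digest of the image ID, so instances booted
    from the same image share one backing file. Resized variants get a
    size suffix so different flavors don't collide.
    """
    digest = hashlib.sha1(image_id.encode("utf-8")).hexdigest()
    name = digest if virtual_size_gb is None else f"{digest}_{virtual_size_gb}"
    return os.path.join(INSTANCES_DIR, "_base", name)

print(base_image_path("3e3cbb2e-7bf1-4b1f-9f0e-0a1b2c3d4e5f"))
print(base_image_path("3e3cbb2e-7bf1-4b1f-9f0e-0a1b2c3d4e5f", 20))
```

This is also why two different flavors booted from the same image can show two similarly named files in _base: same digest, different size suffix.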
Now, anything that's there really is being used by an instance, and there's a process within Nova that'll go and clean up anything that's unused, but that's largely your image cache. The next thing that happens is you create your instance directory; again, if you look at the Nova tree you'll see a bunch of UUIDs in it. And then you create your disk files. Now, there are three types of disks that can be created: disk, disk.local, which basically maps to whether you have an ephemeral value set on your flavor type, and disk.swap. Really, I can't emphasize this enough: you shouldn't be using swap in an environment like this. Memory ends up being so cheap these days that you're better off just allocating enough memory, because really, what are you doing by using swap? You're moving the problem. You've moved it from memory to disk, and now you're potentially starving the other instances in the environment. Think of an environment where everyone was using swap; how would that look? It'd probably be a pretty miserable experience for everyone. There is one special case that needs to be considered here, and you'll see this even in a default OpenStack deployment or a default Nova deployment, and that's when you've got a flavor that has a root disk size of zero. What does zero actually mean? It has special cases throughout the code base. Really, what it means is: I'm not going to touch the original disk file. I download an image from Glance; if that image was defined to be two gigs, four gigs, whatever it is, zero means I'm not going to modify the virtual size of that disk. For ephemeral and swap it means something different: it means don't create them at all. So with swap, you should hard code that to zero. The next thing that happens is we generate the XML file, and that's generated for you based on all the information about the instance, the location of the disks, and so on.
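A sketch of what that disk trio looks like at the qemu-img level. The command lines are assembled but never executed; the file names match the convention just described, while the helper itself is illustrative:

```python
import os

def disk_commands(instance_dir, base_image, root_gb, ephemeral_gb, swap_mb):
    """Build (but don't run) qemu-img commands for an instance's disks.

    root_gb == 0 is the special case from the talk: leave the image's
    virtual size alone. ephemeral_gb/swap_mb of 0 mean: don't create
    those disks at all.
    """
    cmds = []
    root = os.path.join(instance_dir, "disk")
    cmd = ["qemu-img", "create", "-f", "qcow2",
           "-o", f"backing_file={base_image}", root]
    if root_gb > 0:                 # 0 => keep the backing file's size
        cmd.append(f"{root_gb}G")
    cmds.append(cmd)
    if ephemeral_gb > 0:
        cmds.append(["qemu-img", "create", "-f", "qcow2",
                     os.path.join(instance_dir, "disk.local"),
                     f"{ephemeral_gb}G"])
    if swap_mb > 0:                 # the talk's advice: keep this at 0
        cmds.append(["qemu-img", "create", "-f", "qcow2",
                     os.path.join(instance_dir, "disk.swap"),
                     f"{swap_mb}M"])
    return cmds

for c in disk_commands("/var/lib/nova/instances/UUID",
                       "/var/lib/nova/instances/_base/abc123", 20, 10, 0):
    print(" ".join(c))
```

With a root size of zero, only the first command is emitted and no resize argument is appended, which is the "don't touch the original disk" behavior described above.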
You'll notice that Nova places this XML file in the instance directory. That file is actually never used; it's there for your use as administrators. So if for whatever reason you need to perform an operation that requires you to redefine that domain, or anything else at an administrative level, that file is free. You can go delete it; it'll actually get recreated. But the point here is that Nova does not use it at all. It's generated as needed, held in memory, and then passed to libvirt directly. The next thing that happens is that we establish volume connections. Now, this is an area where you will have completely different code paths depending on the type of volume back end, if you are using volumes. For example, in the case of iSCSI, this step means: go and set up all my targets. It doesn't necessarily mean doing anything with the targets, but having all the connections in place so that the devices are ready and can be used. Then we build the network stack. There are a number of things that need to happen on a KVM node for networking to come up. You need your bridges, in a lot of cases you need your VLANs, and you need your iptables rules. All these things need to be built, and in a lot of cases they are built and rebuilt each time. The reason we're going into so much depth about the spawn process is that we're going to come back and reference a lot of these operations; we're going to redo them many times. So now that we've got our XML, we're actually going to define the domain. Defining it means we're basically telling libvirt: okay, this is a persistent instance, this is its configuration, and you're going to be managing it on this particular hypervisor. The equivalent call, if you were to do this via virsh, is a virsh define.
So for example, that XML file that I told you Nova never uses: if you decided you wanted to define the same instance on another hypervisor (and there are a lot of dependencies that would need to be satisfied), you could take that XML file and use it to define the instance elsewhere. And then we go through and actually do the start, because we've got everything we need to run the instance. Now, a failure during a spawn should really be considered something that should not happen. Something went horribly wrong: there's either a misconfiguration or a lack of resources; it could be a variety of things. But there's no way to retry spawning an instance. The best thing to do is delete the VM and try again. And really, the next thing you should do, or maybe the first thing, is understand why it failed: go and understand what exceptions were triggered, what failed as part of the process, resolve that, and retry the spawn. It might not happen again directly; at that point you might have a problem that's specific to one hypervisor out of many. One of the more common failures we see is: I tried to launch an instance and it failed due to scheduling. It can fail due to scheduling because the user was too constrained in what they were requesting, maybe requesting a particular aggregate or passing one of the scheduler hints that are available to be more specific about where an instance is going to be placed, or because you just ran out of some resource: vCPUs, memory, and so on. But that will manifest and turn up really quickly and be really clear in the logs when you go to look at it. Reboot: this is probably one of the more common operations you're going to use over the life cycle of a VM. Now, there are two key types here, and they behave entirely differently.
So the first type is going to be a soft reboot, and what we're talking about is basically passing an ACPI event through libvirt into QEMU. It's the equivalent of hitting Control-Alt-Delete in Linux and having it do the right thing, go into runlevel 6, and ultimately come back up. You're asking the operating system to handle that event. Hard reboot, on the flip side, is a much more destructive operation, let's call it a much more thorough operation, one that involves the hypervisor and all its supporting components more directly. This is one of the areas we put quite a lot of effort into. It was probably two cycles ago that we went through it, and the goal was to make hard reboot the sledge-o-matic that fixes all your issues: whatever's wrong with the instance, assume nothing about the state of the hypervisor and make sure everything is back up and running. So for example, an instance fails due to some operation, maybe a snapshot failed; to get out of that state, in a lot of cases you can reissue a hard reboot to correct those problems. Now, around the Grizzly cycle, I believe, functionality was added to the API for something that previously required a database change. You used to have to go in and say: okay, if I want to do a hard reboot, which would fix my problem, I need to make a database change to allow me to do it, because otherwise you'd be restricted at the API layer. The API will say: no, I'm not going to reboot that for you, that instance is snapshotting. Well, when you know full well that the instance has not been snapshotting for a whole day, chances are that snapshot failed, you've confirmed it, and you want to recover the instance now. So what you can do is reset the state to active.
Now, you need to be careful when you're doing this, because what you're telling Nova at this point is: I don't care what state you think this instance is in, represent it as active, at which point you can perform a variety of operations on top of it, such as a hard reboot. The truth is, hard reboot can handle a lot more states than are currently allowed in Nova. Nova restricts every single operation to say: if I'm in one particular state, don't allow this operation. You can look at the full list in the code to get an idea of what's allowed and what's not, but basically hard reboot can handle a lot of these states, and using reset state can get you to a point where you can do a hard reboot. So a combination of two quick CLI calls can get you out of a lot of jams. So how does it work? The first thing it's going to do, and this is the destructive part, is go and kill the domain. It's the equivalent of doing a virsh destroy, and virsh destroy is another way of basically identifying the process and doing a kill -9. In fact, if you want to do a kill -9 yourself, it's honestly not that much different. So assume you're at the point where there's no way to gracefully recover this instance. You're going to kill it and bring it back up. Well, maybe something happened and we lost our volume connections; let's reestablish all of those. Maybe it's the libvirt XML: someone went in there and made some changes and made them persistent. You can make those changes manually by doing a virsh edit on the domain on the hypervisor, and that's a whole other conversation diving into more libvirt internals. But let's just say someone went in and did that, and they made a change that didn't take well, or libvirt accepted it and now you've got a configuration represented there that is not working. Well, we'll regenerate it based on data coming out of the database. We'll re-download the backing files.
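The API-layer gate just described, where operations are refused based on the instance's recorded state (hence the need for reset state), can be sketched like this. The state names match Nova's vocabulary, but the table and helpers are a toy illustration:

```python
# Illustrative sketch of Nova-style state gating. Real Nova keeps a
# per-operation map of allowed vm_states/task_states; this is a toy version.

ALLOWED_VM_STATES = {
    "reboot": {"active", "stopped"},   # assumption: a simplified subset
    "snapshot": {"active"},
}

class InstanceInvalidState(Exception):
    pass

def check_instance_state(instance, operation):
    """Refuse the operation unless the recorded state permits it."""
    if instance["vm_state"] not in ALLOWED_VM_STATES[operation]:
        raise InstanceInvalidState(
            f"{operation} not allowed in vm_state={instance['vm_state']}")

def reset_state(instance):
    """The admin escape hatch: force the recorded state back to active."""
    instance["vm_state"] = "active"
    instance["task_state"] = None

inst = {"vm_state": "error", "task_state": "image_snapshot"}
try:
    check_instance_state(inst, "reboot")
except InstanceInvalidState as e:
    print(e)
reset_state(inst)                     # nova reset-state --active <uuid>
check_instance_state(inst, "reboot")  # now permitted; hard reboot can run
print("reboot allowed")
```

The key point survives the simplification: the gate operates on what the database says, not on what the hypervisor is actually doing, which is exactly why reset state plus hard reboot is such an effective recovery combination.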
Let's just say someone went in and mistakenly deleted some backing files. Chances are you've toasted that VM anyway, because it was up and running and dependent on that backing file, and now you've done something to it. But we'll validate that the backing files are all available and repopulate them: re-download them from Glance, convert them to raw, and so on, and bring them back. And then we'll replug the VIFs. So again: make sure all the bridges are there, make sure all the VLANs are there, and reapply the iptables rules. You can see that a lot of what's happening here is what happens during a spawn. It's all the portions that end up being idempotent operations, or operations that we've gone through and made idempotent, taking the opportunity to call anything that's safe while assuming nothing about the state of the instance. Suspend. This one, you know, I have mixed feelings about. For a long time, suspend was really the only option of its kind via Horizon; it wasn't until not too long ago that power off and power on made their way into Horizon. Because of that, suspend is something a lot of users learned to rely on, and there are a lot of dependencies there that people don't really think about. As for the calls on the back end: a lot of these are associations I'm drawing. Obviously, the driver is not going and calling virsh anything; it's using the libvirt Python bindings. But just for the sake of the conversation, I've been referencing virsh. So the name suspend is really misleading. It is not suspend in the way we all think about it. When you suspend your laptop, what's actually happening? Everything's being turned off except the memory, which stays powered so you can preserve its state. Well, this is a lot more similar to a hibernate. What you're doing is taking the contents of memory and writing them out to disk.
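The idempotency the talk emphasizes can be sketched simply: only create what's missing, so the same plug routine is safe during spawn and again during hard reboot. The bridge/VLAN model here is a toy stand-in, not Nova's actual VIF driver:

```python
# Toy sketch of idempotent network plumbing: hard reboot can call the same
# routine as spawn because each step checks before it creates.

def plug_vif(existing_devices, bridge, vlan=None):
    """Return the commands needed to (re)create networking for one VIF.

    `existing_devices` simulates what's already on the host; anything
    already present is skipped, which is what makes the call idempotent.
    The command strings are illustrative (VLAN id hardcoded for brevity).
    """
    cmds = []
    if vlan is not None and vlan not in existing_devices:
        cmds.append(f"ip link add link eth0 name {vlan} type vlan id 100")
        existing_devices.add(vlan)
    if bridge not in existing_devices:
        cmds.append(f"brctl addbr {bridge}")
        existing_devices.add(bridge)
    return cmds

host = {"br100"}                           # bridge already exists
print(plug_vif(host, "br100"))             # nothing to do: []
print(plug_vif(host, "br200", "vlan100"))  # creates both devices
print(plug_vif(host, "br200", "vlan100"))  # second call: []
```

Because a second call produces no commands, running the whole sequence again (as hard reboot does) costs nothing and repairs anything missing.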
And there's a managed save file, from libvirt's perspective. The managed save file contains a lot of things: it contains the libvirt XML, it contains the BIOS blobs that were loaded with the VM, and all of the instance's memory. All of the instance's memory is really the key part here. You've got a VM that has 8, 16, 32 gigs of memory; that all has to be written somewhere. So you have to think about the fact that just by virtue of issuing a suspend, your users now have the ability to write lots of data somewhere, and you have to account for that. What's frequently not done is the accounting part. In fact, Nova's not tracking that disk utilization, so it's an additional cost that needs to be factored in if you are going to allow this functionality. So let's talk through what actually occurs when you do a suspend. You're issuing that managed save. libvirt is bringing down that VM: it's saving the memory state and bringing it down. What's happening there is that the QEMU process is actually being stopped. Once the saved memory state is written to disk, the process is stopped completely. So what can happen here? And we've actually seen this. We've had situations where the process was stopped, maybe the instance was suspended for months, right? And in that period of time there was a QEMU upgrade, and our QEMU upgrade testing covered just about everything else. Or we've now spanned several QEMU upgrades and we had only accounted for one. Well, now we go to load that managed save file and QEMU is not able to load it. By all accounts, that is a bug, and one that should be reported to QEMU. As far as I understand, that should not occur. But it does, and there are a number of different versions that are known to not be compatible with one another. And this actually impacts not only suspend but live migration.
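The accounting gap is easy to model: a suspend can write roughly the instance's RAM size to the instances directory, and Nova doesn't charge it against anything. A back-of-the-envelope helper, purely illustrative:

```python
def suspend_disk_cost_gb(instances):
    """Estimate extra disk a host needs if every instance were suspended.

    Each managed save file is roughly the instance's RAM (plus small
    overhead for XML and device state, ignored here). Nova does not
    track this usage, so it has to be budgeted out of band.
    """
    return sum(i["memory_gb"] for i in instances)

fleet = [
    {"name": "web-1", "memory_gb": 8},
    {"name": "db-1", "memory_gb": 32},
    {"name": "worker-1", "memory_gb": 16},
]
print(f"worst-case managed-save usage: {suspend_disk_cost_gb(fleet)} GiB")
# -> worst-case managed-save usage: 56 GiB
```

If you expose suspend to users, this worst case is effectively an unmetered disk quota equal to the RAM you've sold.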
Basically, anything that's interacting that deeply with the memory state of the VM is making assumptions about where specific things are in memory, what the CPU registers are, and so on. And then, once you're ready, let's just say we've gotten through all this and we've got the same QEMU version; for the most part, it works. You go through and do a virsh start, and that will ultimately resume the instance. That's what the Nova driver is doing on the back end: it's just issuing a start. libvirt says: okay, I've got a managed save file here, I'm going to use it to load the instance, and you're off and running. Live migration. There are, again, two types of live migrations, with completely different code paths. I don't know how many people actually use block migration, or even know it exists, but it's cool functionality. I think a lot of the hypervisors now have this either as an additional licensed feature or built in and included, but you now have the ability, and have had for a while, to move your root disks along with your instance. Generally, in legacy or more classic deployments, you needed NAS to be able to do live migrations, because both the source and the destination needed to have the disks available to them, and all you'd basically be doing is synchronizing the memory state. Now we've gone a step further: we can move the disks across as well, which is great, because one of the areas that's very difficult to scale is a centralized NAS deployment.
It's hard to plan ahead for that, and there are a number of reasons why it's a problem you'd have to deal with by going that route. Generally it is not the most scalable solution, but it is one a lot of folks prefer, because ultimately they trust their NAS solutions and there's a certain level of assurance you're getting from a NAS platform like that. Now, I put SAN up here for completeness, but SAN alone isn't the complete story: you're also going to need a clustered file system on top of it if you go that way. Block migration, of course, has no special storage requirements; we ship the disks along. The important part I wanted to note here is that live migration is actually a really sensitive operation, as it pertains to what we were talking about earlier with suspend: the versions of QEMU matter, because we're dealing with memory synchronization. Now, a lot of the heavy lifting during the live migration process is handled by libvirt. There's very little that Nova's doing to facilitate the live migration itself; what it's doing is the setup on both sides, on the source and on the destination. We'll talk through what that is; we've got a couple of slides on live migration. So what's happening during a live migration? First, regardless of whether or not you pass the block migration flag, we're going to make sure that option is correct. If you pass block and it's shared storage, we're going to detect that. If you're assuming it's shared storage, in other words you didn't pass block migration, we're going to make sure it is in fact shared storage. We do that check in both directions, forwards and backwards, to make sure you've got it correct and aren't able to stomp all over your data.
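That shared-storage check can be sketched with the classic trick of dropping a uniquely named file on one side and looking for it on the other; Nova uses a test file along these lines, though the helper below is a simplified, locally runnable stand-in where both "hosts" are just directories:

```python
import os
import tempfile
import uuid

def looks_like_shared_storage(dir_a, dir_b):
    """Check whether two instance directories share the same backing storage.

    Drop a uniquely named file in one directory and see if it appears in
    the other. In a real deployment dir_a and dir_b would be the instances
    path as seen from the source and destination hosts.
    """
    token = f"shared-check-{uuid.uuid4().hex}"
    probe = os.path.join(dir_a, token)
    open(probe, "w").close()
    try:
        return os.path.exists(os.path.join(dir_b, token))
    finally:
        os.unlink(probe)

shared = tempfile.mkdtemp()
local_a, local_b = tempfile.mkdtemp(), tempfile.mkdtemp()
print(looks_like_shared_storage(shared, shared))    # same backing dir
print(looks_like_shared_storage(local_a, local_b))  # separate dirs
```

Running it in both directions is what protects you from a half-mounted NAS: one side seeing the other's files is not enough.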
On the destination side, to make sure we're able to sync the memory state and synchronize all the disks, we're going to create all the volume connections. If it's a block migration, we're also going to go through and create the instance dir, because of course the instance dir is not there. So we need to create things and download the right images from Glance, including the RAM disk and kernel if you're talking about an AMI, and then generate some empty disks for libvirt to fill. libvirt isn't actually going to create the disks; it expects the disks to be there, and it will stream the data into them. Then, on the source, we issue the live migration itself. The live migration does a lot that we could, again, spend a lot of time talking about, but libvirt's handling this for us, so we'll assume it's going to do its job. It even goes so far as to compare the CPU options you've defined on the source side and the destination side to make sure they're compatible, and it will come back and tell you: I'm not allowing this, it's a scenario that's not supported, and block you from doing it. Then, upon the instance successfully migrating to the destination, we have the XML regenerated and, again, dumped into the instance directory. Resize and migrate. Pretty often, when you're doing an operation like a resize, if you're looking at Horizon or the CLI, what you'll see is that the status says resize-migrate. Why is that? We don't actually differentiate within Nova between these two operations. They are very similar in their function and in fact use the same code paths. The difference between migrate and live migrate (they have nothing to do with one another) is that live migrate is managed entirely by libvirt, while migrate doesn't use libvirt at all.
It's basically intended to be run cold. One of the caveats of having migrate, or resize, even function within the environment is that you have to have SSH key pairs deployed across the environment, and generally what this means is passphraseless SSH keys. There are ways to actually secure this so you're doing better than passphraseless SSH keys, but generally the way folks have gone about it is they'll secure a certain network segment and have all of their cross-hypervisor traffic run over that secure segment. So yes, if you compromise that secure segment you're compromised, but there are lots of other services there to think about anyway. The security implications aside, you do need SSH keys deployed. They'll need to be deployed, for example, for the nova user; assume they're shared key pairs and that you've got them deployed to all your hypervisors. Otherwise the first thing that's gonna happen is the migrate will immediately fail. Now again, resize and migrate being the same code path, the truth is they work pretty much the same way. When you do a resize, what's gonna happen is we need to determine: can I even fit on this host? If I'm an eight gig memory VM and I'm trying to go to a 16 gig memory VM, or the same thing with disk or a variety of other criteria, am I gonna fit on the hypervisor I'm on? Well, we're gonna ask the scheduler regardless, and the host I'm already on is only one of the available targets if allow_resize_to_same_host is true; the default is actually false. So in all cases with the default configuration you will be rescheduled onto a different hypervisor, and beyond that, resize is gonna constrain you from shrinking down.
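For reference, this is the historical nova.conf option at its default; the comment is my gloss, not Nova's:

```ini
[DEFAULT]
# With the default of false, a resize always reschedules the instance
# onto a *different* hypervisor, so the SSH key pairs for the nova user
# must already be deployed across all compute nodes.
allow_resize_to_same_host = false
```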
It's not safe to shrink down a disk: you could have all sorts of data and partition tables, and basically there's a lot that would need to happen within the disk for that to be safe, so we actually guard against it within Nova by just not allowing it in the first place. So, the workflow. This is an area we've talked about among the Nova developers for quite some time. I count myself among the Nova developers, but I haven't contributed code for a while, so hopefully this is one area where I can actually contribute again. You can see by the fact that we're using SSH to copy files around, and the fact that it doesn't fit into the current migration framework, that this is sort of an outlier. It needs to be refactored. The whole Nova group is still discussing ideas for how to approach this, and we've got several of them, but the point here is that no one thinks this is really an acceptable approach long term; it works, but it's something that needs to be refactored. So a couple of things to know about resizing or migrating: you're gonna do an ungraceful shutdown of the instance. So if you haven't shut down your applications, if you haven't, you know, quiesced your file systems, it's like going and hitting the power button; plan accordingly for that. And then here's how it all works. We're gonna move the current directory out of the way: you have your instance directory with its UUID, and we move that to a _resize directory, while the resized instance actually gets built in a temp directory. We're gonna go and reconstruct the new resized image within this temp directory using qemu-img and so on. But the first step for that is extracting the disk and making it flat. In a default configuration, all instances have backing files; you're basically deploying QCOW2 with raw backing files.
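The directory shuffle described above can be sketched like this, against a scratch directory rather than the real instances path (paths and names here are illustrative, not Nova's exact code):

```shell
#!/bin/sh
# Sketch of the on-disk shuffle during a resize/migrate, using a scratch
# directory instead of the real /var/lib/nova/instances.
INSTANCES=$(mktemp -d)
UUID=demo-instance-uuid   # stand-in for the real instance UUID

# The running instance's directory, as Nova laid it out.
mkdir -p "$INSTANCES/$UUID"
touch "$INSTANCES/$UUID/disk"

# Step 1: move the current directory out of the way...
mv "$INSTANCES/$UUID" "$INSTANCES/${UUID}_resize"

# Step 2: ...then rebuild the resized disk in a temp directory, flattening
# away the backing file with qemu-img (shown, not run here):
#   qemu-img convert -O qcow2 "$INSTANCES/${UUID}_resize/disk" "$TMP/disk"

ls "$INSTANCES"   # -> demo-instance-uuid_resize
```

Keeping the old directory around under the `_resize` name is what makes a later resize-revert possible.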
Well, as part of the resize or migrate process, what you're gonna do is flatten that file. So if you've got a base image that's 50 gigs and you were relying on copy-on-write, using a backing file to keep things small, well, immediately at the point where you do your first resize or first migrate, that instance no longer has a backing file. It's gonna be the full size of the base image plus whatever your delta is. So, something to keep in mind. In almost all cases, and this becomes more the case the larger your environment is, you're going to be moving that file around. You just extracted this image, it no longer has a backing file, so it's quite large. Now we're gonna take that really large file we just created and SCP it. Needless to say, that is extremely, extremely slow. SCP is probably one of the worst file transfer protocols: it doesn't do dynamic window scaling, among a number of other things, and you've got encryption overhead and so on. So it is going to take some time, and with instances these days having in some cases hundreds of gigabytes of root disk, it could take hours. Snapshots. And I think I'm gonna have to speed up a bit here. There are two main approaches to snapshots: live snapshots and cold snapshots. Live snapshots were actually introduced with Grizzly. Prior to Grizzly, all snapshots would ultimately cause some amount of downtime: you'd be suspending the instance for the duration of the time it took to copy the data out of the disk file. Well, live snapshots solve that problem. But quickly running through what a cold snapshot is doing: the first thing that's gonna happen is we need to get the instance to a point where libvirt sees it as shut down, and suspended, that is, managed save, is one of those states.
So we're gonna take whatever state it's in, if it's paused, whatever it ends up being, such that that process is ultimately not running. Once it's shut down, we'll convert it using qemu-img to the same format the image was launched from when the instance was created. So if I was a QCOW2, my snapshot coming out of that instance is also gonna be a QCOW2. As soon as that copy is complete, the instance will be brought back up; the upload into Glance actually occurs in the background, and your VM is, in theory, back up and running. Now, live snapshots work a lot differently. There are two requirements for being able to use live snapshots, and a lot of this will automatically be satisfied by the distributions OpenStack is being deployed on today. The important one here is QEMU 1.3, but you also need libvirt 1.0. I'll note that the QEMU version check is not always correct, and the reason for that is the QEMU version a particular instance happens to be running is whatever was installed at the time the instance was launched. Just to clarify: if I launched an instance when the installed QEMU version was 1.0, I might have since upgraded the package to 1.2. Well, not everything's gonna be 1.2 until that instance restarts. Guess what libvirt's gonna report? Well, it's gonna do the right thing: it reports what's installed on the hypervisor. There isn't a call today to ask libvirt what version a particular instance is running. So for a live snapshot, what we're gonna do is keep everything running, establish a mirror of the disk, and ultimately extract it back and upload it into Glance. So this is the direction you really wanna go if you're doing snapshots.
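The version gating can be sketched as a simple tuple comparison, in the spirit of Nova's minimum-version checks; keep in mind, as noted above, that this tells you what's installed on the hypervisor, not what a long-running instance was started with.

```python
# Minimum versions for the live-snapshot path, per the talk: QEMU 1.3
# for the block-mirror machinery, libvirt 1.0 to drive it. If either
# check fails, Nova falls back to a cold (downtime-incurring) snapshot.
MIN_QEMU = (1, 3, 0)
MIN_LIBVIRT = (1, 0, 0)

def parse_version(text: str) -> tuple:
    """Turn '1.4.0' into (1, 4, 0) for lexicographic comparison."""
    return tuple(int(part) for part in text.split("."))

def can_live_snapshot(qemu: str, libvirt: str) -> bool:
    # Caveat: these are the *installed* versions the hypervisor reports;
    # an instance launched before an upgrade still runs the old QEMU.
    return parse_version(qemu) >= MIN_QEMU and parse_version(libvirt) >= MIN_LIBVIRT

print(can_live_snapshot("1.4.0", "1.0.2"))  # True
print(can_live_snapshot("1.2.0", "1.0.2"))  # False: QEMU too old
```

That caveat in the middle comment is exactly the gap described above: there's no libvirt call to ask which QEMU version a particular running instance was started under.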
Your users are generally not expecting a disruption here, and they'll have one if you're not using live snapshots. And then I've just got a couple of final notes and tips. The big one, again, is that you should read the code; the most important thing you could be doing if you're running OpenStack is reading and understanding the code. Turn on debug logging, of course, and make sure you're configuring everything that OpenStack and Nova depend on, not just Nova itself. For example, where am I putting my managed save files? It's pretty important: if you run out of space there, your suspends will always fail. So clearly this is a topic I could spend several sessions covering, so I'm happy to come back and do it again, get into more depth on this, and talk about some of the other operations. So thank you everyone.