Okay, so we'll go ahead and get started. Welcome everyone to part two of Under the Hood with libvirt, Nova and KVM. The last time I presented was in Atlanta and I hope you had a chance to view the YouTube video. If not, I have links all throughout, all via tinyurl, if you want to pull these up. I'm also going to post the slides once the presentation is complete, so don't worry too much about taking pictures of any of this. It'll all be up on the OpenStack site and I'm sure the YouTube video will be up as well. Part one's slides are all there too. I'm going to do my best to cover as much as I can, but as it turns out this is a pretty deep topic, so chances are we'll need a part three. If there's interest in that, I'll submit for the next summit and we'll continue diving into this topic. So, a quick review. At the last summit we went through the high-level flow of the various operations that one would perform via the compute driver, going into a decent amount of depth. We covered spawn, reboot (hard and soft), suspend, live migration, and a couple of other things, but didn't really get into suspend at the level that I'd like to. I wanted to do that today but we're not going to have time for it, so maybe the next talk. The one point that I noted and kept emphasizing last time was how critical it is to actually read the code. I'm going to emphasize that again today, but the important part of today is explaining how you go about reading the code, because no matter how much we cover this topic in 35- or 40-minute sessions, it's not going to be as valuable as you understanding how and where to get the information you need, especially since it's changing quite rapidly. A lot of the assumptions today are based on the Icehouse code base.
A lot of that will continue to hold true for Juno, but as I'll keep emphasizing, it's important to review this on a regular basis to ensure that your assumptions continue to hold true. So, understanding the code. I'm going to walk through some of the Nova code and libvirt code to explain how you follow the code through OpenStack and through libvirt. So: read the code. I can't emphasize this enough. As much as the documentation team stays on top of everything that's happening within Nova and OpenStack, there's such a matrix of configuration options that it's basically impossible to test all variants of it. The gate continues to expand, and the variants that are tested continue to expand, but that's generally encompassing new hypervisors and new drivers rather than additional configuration options, where a change in configuration options can dramatically alter the code paths and flow through the libvirt driver, as it does for other sections of Nova. So before enabling or disabling a feature, or changing one of the various options that are available as configuration, make sure you're going through and reading the code. The methods and operations are also refactored pretty frequently. At the last talk I covered how resizes work and how many times data is copied back and forth; that's already changed in the Juno cycle and become a lot more efficient. Sometimes that's noted in the release notes, but the only way to really know is to go through and look at the code. And the number of options that exist just for the libvirt driver, if you go look at the libvirt subsection of the config file, is so vast that it can completely change how robust and resilient the deployment ends up being. So really the only way to understand what's going on there is to follow the code.
So, first some basic architecture and how we flow through all of this code. At the top here we're looking at nova-compute. We're only focusing on the detail from the context of a particular hypervisor in this case; going up much further would be a longer talk. nova-compute has two major components, or classes, that we'll be talking about today: the manager and the driver itself. Those call into libvirt, which in turn talks to QEMU, which manages what are called domains. Domains, as most of you are aware, are the terminology used for an individual instance, and there will be multiple domains on a given hypervisor. So first let's talk, from a code standpoint, about the libvirt Python API. Why do we care about it? Well, it's important to understand how the calls into the libvirt API are made so that you can really read the driver code, and by and large this holds throughout the code. You can see I have links up; I just checked one of them and unfortunately the libvirt site isn't rendering it right now, but normally it will, and that's the API reference. The important thing to note when you're looking at that reference is that it's the C library reference. If you just drop the vir prefix from the front and lowercase the next letter, and I'll show examples of this, the Python API is exactly the same as the C API. So going through that reference can give you a good idea of what the capabilities are. Any sample code I'm showing here is really simple: samples that, if you happen to have a DevStack running instances, can be executed to return some subset of the output being produced by the code. So the first is a connection object.
To do anything with libvirt, to get started at all, we need a connection object. You can see the code for doing this is incredibly simple. The URI at this point just points at the local system; there are a couple of different ways to point at it, for example a TCP session or a TLS session, but this is the simplest example by far. You create the connection, and once you've got the connection object you can start making libvirt calls that execute against it. In this case we're simply calling getVersion, and the response you get back is the version of QEMU that happens to be deployed on the system. The next type of object, and the more interesting one that we'll be showing more operations against, is a domain object. This represents an object associated with a single instance. Again, as we go through the Nova code this becomes a lot more relevant, because the calls made by the Nova libvirt driver are all made on domain objects or connection objects that are pre-established for you as part of instantiating the class. This is a pretty simple code sample as well: we create a connection object, call listDomainsID, which returns a list, and iterate over it, printing the name of each domain as well as its UUID. Again, if you've got a DevStack, I encourage you to run this later; you can safely run it on any system that's running libvirt and QEMU. So now that we've got that, we can start talking about how the integration occurs from the perspective of Nova, and start looking at some of the calls to understand how we get into libvirt.
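The two slide samples being described look roughly like this; a sketch, assuming the libvirt-python bindings and a local libvirtd (the iteration helper is factored out so it works against any connection-like object):

```python
# Rough sketch of the two slide samples, assuming the libvirt-python
# bindings and a local libvirtd; qemu+tcp:// or qemu+tls:// URIs would
# reach a remote hypervisor instead of the local one.
try:
    # C API names map almost 1:1, e.g. virConnectGetVersion -> conn.getVersion()
    import libvirt
except ImportError:
    libvirt = None  # bindings not installed; the live calls below are skipped


def describe_domains(conn):
    """Return 'id name uuid' summaries for the running domains on a connection."""
    lines = []
    for dom_id in conn.listDomainsID():       # IDs of running domains
        dom = conn.lookupByID(dom_id)         # ID -> domain object
        lines.append("%d %s %s" % (dom_id, dom.name(), dom.UUIDString()))
    return lines


if __name__ == "__main__" and libvirt is not None:
    try:
        conn = libvirt.open("qemu:///system")  # the simplest, local-system URI
        print(conn.getVersion())               # hypervisor version on this system
        for line in describe_domains(conn):
            print(line)
    except libvirt.libvirtError as exc:
        print("no local libvirtd reachable: %s" % exc)
```

On a DevStack node this prints one line per running instance, named `instance-xxxxxxxx` as Nova defines them.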
So again, the two important files we're talking about here, and if you traverse into the Nova tree it can be pretty daunting because the code base is a couple of hundred thousand lines at this point when you count the tests, are the manager and the driver itself. The driver methods are where we'll focus the most today, and the first example we'll start with, only because it's really simple, is pause. So first, how do we get into the driver? If you go look at the manager code and trace it down to the pause operation, and this code is pulled literally from upstream Icehouse, so you can browse the GitHub repository and navigate to this exact section, the part that's highlighted there is driver.pause. You'd have to scroll up a little toward the top of the file in that code to see how the driver object is instantiated, but for the sake of this discussion the driver object is instantiated for you and it's going to be the libvirt driver. You can see it's really simple: all the manager does is call .pause, invoking the pause method on the driver object and passing the instance into it. That's it. From this point on, everything outside of this isn't dealing with the driver itself. Now, what does the driver piece look like? This is where we start to see how simple a lot of this is, and libvirt really is what makes it simple for us, because it's doing all the heavy lifting to talk to the QEMU monitor and actually execute these operations. So, pause. The first thing that happens is a call to a wrapper that returns the domain object: lookup by name. If you go look at that code, it's a simple wrapper that does exactly that; it calls libvirt's lookupByName with the name of the instance, which returns the domain object.
So now you've got a domain object, and all you're calling is suspend on it, and you're done. You've got pause and unpause there; they're both really simple. Again, I only put this up as an example because it's simple, not because I feel pause is a particularly useful operation. A couple of additional locations used by the libvirt driver: there's a whole subdirectory for everything libvirt-related, and I'd say the files I've listed up there, config.py, imagebackend.py, volume.py and utils.py, are probably the ones you'll touch most often. config.py is responsible for generating the objects, classes and subclasses that are then converted to XML and passed into libvirt. Everything is done via XML, and I'll talk about that in a second; config.py is what's responsible for that. imagebackend.py is intended to abstract the disk-based and image-based operations on the backend. As it turns out, and I'll show an example of this, it doesn't completely abstract all the operations. Today there's a raw interface there, a Qcow2 interface, an RBD interface and an LVM interface. Ideally nothing that touches the disk would go through anything aside from the image backend class, but that doesn't actually hold true. As you go through many of the calls, you'll see actual operations on disk which are not going through the image backend, and that poses all kinds of problems. So there's work being done there to improve on this and abstract it further; in fact it's going in a different direction, but that's for another talk. At a high level, it'll eventually go through libvirt and storage pools, so even that level of abstraction won't happen in Nova; it'll be handed off to libvirt. I'm sure that warrants a talk all on its own.
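Stripped of Nova's plumbing, the pause/unpause path just walked through boils down to something like this; a sketch, not the actual driver code (the real wrapper also handles errors and retries the libvirt connection):

```python
# Sketch of what Nova's pause/unpause reduce to. `conn` is the libvirt
# connection the driver already holds; `instance_name` is the libvirt
# domain name (instance-xxxxxxxx), not the user-visible display name.
def pause(conn, instance_name):
    dom = conn.lookupByName(instance_name)  # the lookup-by-name wrapper
    dom.suspend()                           # libvirt "suspend" == Nova "pause"


def unpause(conn, instance_name):
    conn.lookupByName(instance_name).resume()
```

Worth noting the terminology mismatch: Nova's pause maps onto libvirt's suspend/resume, while Nova's suspend is a different libvirt operation (a managed save), which is part of why the naming gets confusing.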
Each of the volume interfaces for Cinder has a corresponding client driver present in volume.py. And utils.py is all the functions that are used over and over by the libvirt driver throughout the code base. So, domains. libvirt domains, as I talked about briefly, are simply instances. If you go onto a hypervisor and do a virsh list, what you get back is the list of instances running on that particular hypervisor. A little more about domains. First, they're all defined via XML, and you can find the XML reference; I've put it up here as well. Everything that happens from the perspective of libvirt, everything it cares about, is done via XML, and all of the communication Nova does with libvirt is via XML. From there, libvirt takes the complicated part and talks to the QEMU monitor over QMP, a separate protocol that libvirt opens connections into and handles for you. So if you look at the domain XML, in a lot of ways you can work backwards to what's being done and defined by Nova. I'll pull up an example in a second, but the XML defines the configuration of that instance, including any volumes and any changes that have been made during the runtime of that instance by Nova. There are a couple of different types of domains: transient and persistent. The distinction is basically whether that domain stays defined and present after the instance is destroyed, and destroyed here is a soft destroy; it just means it's no longer running. A transient domain goes away after a destroy is executed. Everything done by Nova is done persistently, and the XML is continuously regenerated, so if for whatever reason you went in and edited libvirt's XML for a running domain, it'll get blown away the next time it's regenerated. That's done for a number of reasons.
One is to deal with any potential issues that may have come up, maybe something got corrupted; that's unlikely. More commonly, someone went in and changed something by hand, and this discourages any amount of that. A hard reboot will always regenerate the XML, which matters most when you've made a change in how the compute driver operates and the change generates a different set of XML: doing a hard reboot is the only way to apply that change to the instance. Then there's an important set of logs that's frequently overlooked: the libvirt logs. Generally Nova will get a response back and report to its own logs whatever error libvirt happened to report as part of a failure, if there was one. But sometimes you do need to go look at the libvirt log itself; this is the default location on Ubuntu, and it shouldn't be hard to find on RHEL. If you look in there, there's a log named after the instance for each and every domain that's ever run on that hypervisor. That's the other key: things are not necessarily cleaned up, but you have all the data there, and if you're troubleshooting, that's the other place to look to see what QEMU responded with. Sometimes it might tell you there was an issue with the arguments passed in, or there was some contention; whatever it is, it gets reported there. A quick reference on virsh as well. virsh, in contrast to the Python API, is the command-line tool used for doing anything natively on the hypervisor; anyone who's done much with libvirt has worked with virsh. I won't go through each of these, but these are a couple of really simple operations you can run. A lot of people don't know about domname, which is really important, and the reason is that everything we do from a Nova standpoint these days is in UUIDs.
As soon as you get on the hypervisor and do a virsh list, you don't see UUIDs; you see domain names. And they're not domain names based on what the user put in; they're programmatically defined. So how do you know what you're dealing with? You can do a virsh domname on the UUID, get the domain name back, and then start operating on the correct instance. So: libvirt takes all the XML you've defined as part of the domain and creates a command line, a set of arguments, to call QEMU with, and I'll show you that in just a second. If you look at the QEMU command line and the possibilities for what can be passed, there's quite a lot there; libvirt deals with all of that. Of course, after libvirt has executed QEMU, there's no way to alter the command line, so the rest of the operations that modify the running QEMU process are done via QMP, the QEMU machine protocol. libvirt has connections open to each domain's QEMU monitor socket and executes those commands as needed. Now, the XML that's fed into libvirt by Nova isn't what you'll see if you do a dumpxml, and that's another interesting part of what libvirt handles for us. Some of the values that are presented and need to be utilized when calling QEMU are incremented, for example bus IDs. There are a couple of different places where that occurs, and you can see that once you take the XML generated by Nova and put it into libvirt, what you end up with is what represents an active domain. It's the same when you start a domain and then shut it down: the XML you see is different, and you can pretty easily compare an active and an inactive domain to see what parts of the domain libvirt is actually taking care of interpolating for you. So, here's an example of some of the XML; it's just a part of it.
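Both of those inspection tricks, resolving the UUID Nova gives you to a libvirt domain name and comparing active against inactive XML, are available from the Python bindings as well; a sketch, assuming libvirt-python:

```python
try:
    import libvirt
except ImportError:
    libvirt = None

# Real value of VIR_DOMAIN_XML_INACTIVE, used as a fallback when the
# bindings are absent so the sketch stays self-contained.
XML_INACTIVE = getattr(libvirt, "VIR_DOMAIN_XML_INACTIVE", 2)


def domain_name_for_uuid(conn, uuid):
    """virsh domname equivalent: Nova UUID -> libvirt domain name."""
    return conn.lookupByUUIDString(uuid).name()


def active_vs_inactive_xml(dom):
    """Fetch both XML views; diffing them shows what libvirt interpolates
    (incremented bus IDs and so on) for a running domain."""
    return dom.XMLDesc(0), dom.XMLDesc(XML_INACTIVE)
```

Diffing the two strings returned by the second helper is the programmatic version of comparing dumpxml output before and after a domain starts.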
It's pretty easy to go look at one running on a DevStack instance or a production environment just by doing a dumpxml, and that's what we've got here. This is from a QEMU 1.5 instance, and we've just got a few values here defining the CPU mode and some of the basics like the UUID. This is generally a much larger file, but it gives you an idea. And then, if you were talking to QEMU directly, this is the sort of command line you would be coming up with. The reason I make this point is I frequently hear the question: why is libvirt in the mix here? Why don't we just talk to QEMU? This is why. These arguments also change from version to version, which is another thing people don't realize libvirt is handling. If you've got QEMU 1.2, 1.5 and 2.0, the best-practice arguments between those versions change, and libvirt knows how to deal with that; it'll pass the correct things according to the version it knows it's talking to. All the XML for each of the instances running across your entire cloud is available in the instances directory: there's a libvirt.xml file there for each instance. You can go take a look at it; you can even delete it if you want, because it has no effect on the runtime at all. Nova does everything it needs to in memory and passes it into libvirt during the definition. But you can play around with it, and if you need to administratively define that instance on a different hypervisor, that's how you'd go about doing it. Now, some important libvirt configuration we discovered around concurrency. We were having libvirt operations fail when doing things like a massive evacuation with a single target, and it turned out to be a concurrency issue. These are the values we're using successfully today; again, the slides will be posted, so no one has to worry about writing anything down.
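The knobs in question live in libvirtd.conf. The key names below are the real libvirt daemon settings, but the values are placeholders to show the shape, not the ones from the slides; tune them to your own hypervisor density:

```ini
# /etc/libvirt/libvirtd.conf -- illustrative values only, not the
# presenter's; raise these when many concurrent operations hit one host.
max_clients = 50          # concurrent client connections allowed
min_workers = 10          # worker threads kept ready
max_workers = 50          # cap on worker threads
max_client_requests = 25  # concurrent requests per client connection
```

Restart libvirtd after changing these for them to take effect.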
These are the values we're using today, and I figured it would be helpful to show everyone and save some pain: increase libvirt's level of concurrency so it can deal with hypervisors that are running up to 50 or 60 instances and have to be mass evacuated, or evacuated onto. This will allow you to get the level of concurrency you need. Migration. This is an area that gets us into the cattle-versus-pets debate. A lot of folks in the space will say you shouldn't be doing live migrations, you shouldn't worry about any of that, and if you need it you should go to VMware. The reality is that business needs don't match our views on cloud. The vast majority of applications out there still end up being special snowflakes; they're not instances that can be programmatically regenerated or reproduced rapidly enough, or that users care as little about as we'd all like. So we have always felt that live migration is incredibly important, mainly from an operational standpoint. If we need to roll a kernel upgrade out across the hypervisors, that's us as operators needing to reboot those hypervisors; it isn't some natural failure that occurs. Without the ability to do a live migration, we wouldn't feel comfortable taking down instances for our own maintenance. Whereas if hardware fails, you can just say: that's cloud, deal with it. So, to give ourselves the flexibility we need, we've always focused on making sure live migrations work, and there are important considerations in doing that. Let's talk about some of those. First, the three different types of migration available to you in Nova. The first is migrate, and migrate is entirely cold: as in, I don't care what state your instance is in, we're going to shut that instance down and copy it around.
I'll show you the code for this in just a second. It isn't the most elegant operation; it's probably the crudest part of the libvirt driver, but that's what it's intended to do. The second type is a live migration. Live migrations, in contrast to a standard migrate, are handled almost entirely by libvirt: rather than doing anything crude from outside of libvirt and moving files around, libvirt handles all of it. And then block migration is pretty similar to live migration; many of you are familiar with this. It additionally moves the disks around, which of course makes it a riskier operation: with just the runtime state, worst case you fail, something blows up, and you've lost the runtime state, but here you're also moving the disks around, so there's a lot more risk of losing the actual data on disk. The risk is small, but it is there. As I mentioned, migrate operates on an inactive domain. libvirt really couldn't do anything during a migrate even if it wanted to, because libvirt is only intended to operate on active domains, where the monitor is available to it: a monitor socket becomes available the moment a QEMU instance is launched, and once the QEMU instance is down, that socket is gone, so libvirt can't do anything. And frequently you'll see that when a migrate or a resize is running, Horizon will say resize/migrate. The reason for that is there's really no differentiation in terms of the code path; they are exactly the same code path. Let's take a look at that. I mentioned this is not elegant at all, and you can see that by virtue of the sheer number of move operations and the set of operations going directly against the disk here. And why is that bad?
Well, the moment you're using something like RBD, which is the Ceph backend, none of this holds true. This entire method breaks horribly the moment you've got an RBD backend. So there's been a lot of work, I think as part of the Nova team, talking about refactoring and potentially eliminating migrate for quite some time, in large part because of issues like this: it's not using the image backend, and it's highly brittle. Just looking at this at a high level and seeing all those shell-outs can give you a pretty good sense of where it stands. The code paths being different doesn't help either. You've got migrate and live migrate; let's say an operation is improved in live migrate, some guard is put in place to make sure you've got data integrity before and after the operation is executed. Well, that's not necessarily going to be the case for both migration forms. There's a shared-storage check that exists on live migrate, for example, and a completely separate one for the standard migrate function. That's just one example of why having the two different operations is problematic. The other, bigger one is that admins frequently get confused. I think one of the biggest areas of confusion in general, when we're talking about migrations with folks who are just becoming familiar with Nova, is: why are there two? Why is there a migrate and a live migrate? Live migrate should probably just fall back to doing the right thing. That isn't the case today, and now you'll see why. Live migrate, and this is key, is not live by default, and that's by configuration; there are a number of reasons why it's not live by default. So if you go deploy Nova today without changing anything, let's just say you get the basic configuration in place to get your environment up and running.
The moment you do a live migrate, you're actually pausing the instances. I'll show in an upcoming slide what configuration is necessary to make this actually live, and some of the considerations that are important here. The first is that there were substantial improvements in live migration throughput in QEMU 1.4 and 1.5, and there have continued to be improvements, but those two releases were pretty heavily focused on live migration performance: one area was the amount of time the cutover period takes for synchronizing memory state, and another was the total bandwidth that was available and could be used between hypervisors. So at this point the recommendation is to use at least QEMU 1.5, and that's what you're going to get anyway if you're running RHEL 7, or at least Havana out of the Ubuntu Cloud Archive. And in contrast to what we saw for the regular cold migrate, Nova offloads basically the entire live migration function to libvirt. There are a dozen different calls being made through the QEMU monitor that we do not have to worry about whatsoever, because libvirt handles all of that for us. So, some of the migration code. Even though this is live, and you'd think it would be a lot more complicated because of all the memory-state synchronization, look how much simpler this code is. If you get to the core of it, the only part that really matters is dom.migrateToURI; that's it. Everything else is just setting up to do it. That's the actual libvirt call being made there: passing a target host, OR-ing together the flags, and passing in a couple of additional configuration values. So again, in contrast: much, much cleaner. Now, the configuration values I was talking about a little earlier.
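The core call reduces to roughly the following. This is a sketch, not Nova's actual code: the flag names are real libvirt constants, and the URI template mirrors Nova's default live_migration_uri of qemu+tcp://%s/system:

```python
try:
    import libvirt
except ImportError:
    libvirt = None

# Real libvirt flag values, used as fallbacks when the bindings are absent.
VIR_MIGRATE_LIVE = getattr(libvirt, "VIR_MIGRATE_LIVE", 1)
VIR_MIGRATE_PEER2PEER = getattr(libvirt, "VIR_MIGRATE_PEER2PEER", 2)
VIR_MIGRATE_UNDEFINE_SOURCE = getattr(libvirt, "VIR_MIGRATE_UNDEFINE_SOURCE", 16)


def live_migrate(dom, dest_host, live=True, bandwidth=0):
    """One call does the work: libvirt drives QMP on both ends for us."""
    flags = VIR_MIGRATE_UNDEFINE_SOURCE | VIR_MIGRATE_PEER2PEER
    if live:
        # Without this flag the guest is paused for the duration of the copy.
        flags |= VIR_MIGRATE_LIVE
    return dom.migrateToURI("qemu+tcp://%s/system" % dest_host,
                            flags, None, bandwidth)
```

Everything else in Nova's method is preparation; the migrateToURI call is where libvirt takes over.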
VIR_MIGRATE_LIVE: if you add that value to both of the two configuration options there, each of which defines the flags passed respectively into the migration operations, that will turn your migrations live, from previously being in the pause mode that ships by default. We also recommend changing the CPU flags. When you first stand up Nova, it leans toward giving you the best possible performance for your VMs, which is great, except that if migrations are a key part of what you want to do, you have to have a baseline for which CPU flags are being passed. Where this really comes into play: you've got a cloud environment, you launch it with, say, 100 hypervisors, then you add 50 more. Maybe the 50 more are a slightly newer generation of CPU and expose a couple of new flags. If you haven't guarded against that by exposing only a subset of those flags to the VMs, you can no longer migrate from the new hardware back to the old. We're using the cpu64-rhel6 model, which is available on Ubuntu; it isn't a Red Hat-ism at all. There are implications of doing that to be careful about, but by and large, if migrations are important to you, it has to be considered. Max downtime. You won't see a config value for this today in the libvirt driver, or in Nova, but it is incredibly important. It represents the window of time that QEMU will allow for cutting over an instance from the source to the destination. By default it's basically 30 milliseconds, which means that if the cutover cannot occur within a 30-millisecond window, QEMU will keep synchronizing data until it thinks it can do the cutover in 30 milliseconds. The problem is that sometimes you can never get into that 30-millisecond window: you've got a JVM, or something that's rapidly churning memory.
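Pulling the flag and CPU-model settings just described together, the relevant nova.conf section looks roughly like this. These are Icehouse-era option names under [libvirt]; verify them against your own release before copying:

```ini
# nova.conf, [libvirt] section -- Icehouse-era option names.
[libvirt]
# Add VIR_MIGRATE_LIVE to both flag lists to make migrations actually live:
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC,VIR_MIGRATE_LIVE

# Pin a baseline CPU model so newer hypervisors stay migration-compatible
# with older ones:
cpu_mode = custom
cpu_model = cpu64-rhel6
```

New compute nodes pick these up on restart; existing instances need a hard reboot before the regenerated XML reflects the CPU model change.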
That's why you get live migrations that hang, and that's actually a large part of why live migrations are not enabled by default. So we've submitted patches into libvirt to allow this, because that's one of the prerequisites: there was a restriction on top of it that was preventing max downtime from being set. That's in a version of libvirt that's coming, and we'll submit the patches upstream for the rest as well. For the time being, the easiest way to deal with this is actually to patch QEMU and change the default max downtime, even to 10x the default value; it'll make a massive difference in how rapidly migrations complete. Even with a one-second max downtime, it's basically not noticeable if you're pinging the VM when the transition occurs. So, now some general operational tips. The most brittle operations you'll find in Nova are always going to be anything that is long-running and synchronous. Again: migrations, and suspend is another one that's long-running and synchronous. Let me define what synchronous means here. Nova does quite a lot of threading, and by default now the libvirt driver is designed to use a threaded connection into the libvirt daemon itself, but numerous operations within an individual thread will block until they return. The problem is that if the compute process dies in the middle of one of those, Nova has no idea what happened. So anywhere that can occur, say you're in the middle of a live migration and you shut down compute on one side or the other, then bring compute back up, it's going to be up to you to clean that up. Nova has no idea where it left off to be able to pick it up again. It's a known issue, something folks have been working to resolve, but it's not an easy problem to solve.
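For reference, libvirt exposes the max-downtime knob described above as virDomainMigrateSetMaxDowntime, i.e. migrateSetMaxDowntime() on the domain in Python, and it has to be called while the migration is in flight. A sketch; the 300 ms figure is my own illustrative choice, not a recommendation from the talk:

```python
def relax_max_downtime(dom, downtime_ms=300):
    """Widen the allowed cutover window so a memory-churning guest converges.

    Must be called on the domain while its migration is running; the QEMU
    default is on the order of tens of milliseconds (30 ms in the versions
    discussed here).
    """
    dom.migrateSetMaxDowntime(downtime_ms, 0)  # second argument is flags
    return downtime_ms
```

Calling this from a second thread, once the migration has started, is the kind of coordination the patches being discussed were meant to unblock.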
So basically, anything that takes a long time increases the window of exposure and makes things more prone to failure. Graceful stops have been put in, but they still don't deal with all of these cases, so it'll be an area of continued improvement. Really the only way to deal with this operationally today is to assess via the logs what operation was being done. Of course you can automate this: determine whether any operations are running by querying the API before you shut down a hypervisor; that's probably the easiest way to do it, but that's where things are today. Recovering from these situations is somewhat challenging because, again, Nova doesn't know where it left off, so it's going to be up to you in a lot of these cases to go in and figure out what state the hypervisor was in at the point where the failure occurred. So look at the logs for exceptions. It might be stating the obvious, but you can generally discern exactly what happened just by reading the exception, and now that you've got some idea of how to read the code, it might take some time, but you'll become familiar with tracing back through the code to understand exactly where things fell over. The combination of resetting state to active and a hard reboot is pretty powerful. Once you feel confident about the state, say you know the migration died on the destination and you've cleaned things up appropriately, you can reset state and do a hard reboot, and things should generally get back to working again. And you might run into cases where you need to go in and brute-force things. kill -9, and actually virsh destroy, which is what Nova ends up doing, is pretty similar to kill -9. So if you end up needing to go that far, so be it.
It's going to happen sometimes, and there's nothing really wrong with it aside from leaving an instance running in a state that you don't trust anyway, so you might as well kill it. So yeah, live migrations are one of those operations that can get stuck, and the max downtime alteration makes a big difference here. Again, that capability is already in libvirt and has been there, but it was restricted to only run while the instance was migrating; it gets complicated, but you couldn't then invoke it because it was a blocking operation. That's been sorted out, so it'll now come into libvirt. Basically, if you have a live migration that gets stuck, the best thing to do is to make a backup of both sides of it, especially if it's a block migration, and kill one of them. We usually kill the destination side. You can generally tell which side you should kill, but killing the destination is generally the safest bet: if the live migration has not completed, Nova has not yet had an opportunity to do any cleanup, so everything should still be present on the source and destination. But again, it's important to emphasize that you should do some investigation and understand the state before you start making brute-force moves, because you could lose data if you get this wrong. So that's all we had time for today. Hopefully we can dive in some more in a future talk, but I'm happy to take some questions, as long as everyone can hear you. Hey, I would like to ask if there's any work being done on the post-copy QEMU live migration approach. The which approach, sorry? The post-copy live migration approach. I'm not familiar with it. Well, there is something, because the current live migration works with pre-copy memory, and that's why it gets stuck.
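The reset-state plus hard-reboot recovery described a moment ago can be sketched as the following command sequence. The instance UUID is hypothetical, and these should only be run after you've verified which side of the migration survived and cleaned up the other:

```python
def recovery_argv(instance_uuid):
    # The usual CLI sequence once you trust the instance's on-disk state:
    # force Nova's view of the instance back to ACTIVE, then rebuild the
    # domain from scratch with a hard reboot.
    return [
        ["nova", "reset-state", "--active", instance_uuid],
        ["nova", "reboot", "--hard", instance_uuid],
    ]
```

The hard reboot is what does the real work: it tears the domain down (effectively a virsh destroy) and regenerates it from Nova's view of the instance, which clears most residual weirdness from a half-finished operation.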
So chances are, yeah, it's really one of the areas that's continuously being evaluated; even in the Juno code base there's a new API call being used from libvirt into QEMU. I don't know if the new approaches are being used. What version of QEMU was that introduced in? I think this is not merged yet; it's just a new feature on GitHub, and the main point is that it shares the memory state between the source and destination hosts. I see. Yeah, so I would say it's a pretty safe bet that once functionality makes it into QEMU and then makes its way into libvirt, if there's anything libvirt needs to do to take advantage of it, Nova will take advantage of it. Folks are pretty aggressive about introducing that functionality, and there's generally a version check put around it. So okay, say it comes into QEMU 2.5 in the future; there'll be a check that asks, is this at least version 2.5, and if it's a separate call, take advantage of it. Thanks. Any other questions? Yep. I'm not sure if this problem is relevant, but you mentioned Nova not knowing about the compute node or instance that has failed, and having to clean up manually. A similar situation I had was that a compute node was effectively out, but Nova was still trying to schedule instances on it. How do you clean that up? So that's a little deeper than just compute, because it goes back to why that compute instance is reporting as healthy to the scheduler, and why the scheduler still sees it as a valid target. That one's a little deeper. We have seen that as well, and it can be a number of different issues. Chet Burgess, who's sitting in the front here, has probably spent the most time from our group investigating that sort of thing, but basically there are numerous threads connecting back from the compute driver into rabbit.
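The version gating mentioned a moment ago follows a simple pattern; here is a minimal sketch of it, where the 2.5 threshold is the hypothetical example from the talk, not a real QEMU feature boundary:

```python
def parse_version(text):
    # "2.5.1" -> (2, 5, 1); tuple comparison orders versions correctly,
    # including cases like 2.10 being newer than 2.5.
    return tuple(int(part) for part in text.split("."))

def feature_available(current_version, minimum="2.5"):
    # Nova-style gate: use the optional capability only when the
    # hypervisor stack is new enough, otherwise fall back quietly.
    return parse_version(current_version) >= parse_version(minimum)
```

This is why reading the code matters: the same deployment can take entirely different paths through the driver depending on which libvirt and QEMU versions these checks detect.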
Some of those are, for example, a thread listening on the receive queue, while other threads are listening on or connected to reply queues. So, and Chet, you can correct me if parts of this are wrong, but the point is that you might have one or more of those threads that have become wedged, disconnected, or subscribed to queues that are no longer valid. There are a number of different reasons, but by and large those sorts of issues generally come down to a rabbit connectivity issue, where the hypervisor or compute node is reporting in and saying, I'm good, when it's not actually good; something's wrong with one or more of the connections into rabbit. Yeah, we're probably actually out of time, but I'll take one more question if there is one. (Question inaudible.) I couldn't tell you; we've been focused so heavily on libvirt, so I have no idea what the state is in VMware, XenServer, or Hyper-V. Thanks everyone. All right.