Hello, right, so we've got started now. My name is Paul Murray; I'm the lead for the live migration subteam in the Nova project. Over here is Andrea; we're both from Hewlett Packard Enterprise. This is Pavel; he's from Intel. We've all been working on live migration over the last cycle. What we're going to talk about today is mainly based around libvirt with QEMU/KVM as the hypervisor, but some of it is generally applicable, including some of the APIs that were put in.

If I just get straight in, you'll see where we're going with this. Live migration is used to move workloads around servers in the data center. The reason for doing this could be host maintenance, because you want to take a host down without impacting the people that are using it. You might want to do rolling updates if you've got security patches, or you're doing something else that requires the operating system to be rebooted, but you don't want to take down your VMs. Or it could be done for power optimization: moving everything to one end of the data center so you can cool that end and not the rest, that kind of thing. You can get a lot of savings out of that; we've done it in other data center scenarios, but OpenStack needs some more work to get that going.

So those are some reasons. How do you actually move things around? There are three types of migration in OpenStack. First of all, there is non-live migration, also called cold migration: it stops the VM on the source node, transfers the VM to the destination, and then resumes it there. We won't cover this in our presentation because, well, it is not live. The second one is live migration, which is about moving a VM to the destination while it keeps operating. And the third one is block live migration, which is optional. To use either kind of live migration you use the nova live-migration CLI command: you provide the server you would like to migrate and, optionally, a host, and when you provide a host the scheduling phase is skipped. The difference between the two is that live migration transfers only the memory state, while block live migration transfers both the memory state and the disk state to the destination.
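To make that concrete, here's a quick sketch of the CLI calls just described. The instance and host names are made up for illustration:

```shell
# Live migration: memory state only; the scheduler picks the destination.
nova live-migration my-instance

# Live migration to an explicit destination; the scheduling phase is skipped.
nova live-migration my-instance compute-02

# Block live migration: memory state plus disk state, for nodes without
# shared storage (pre-Mitaka style, with the explicit flag).
nova live-migration --block-migrate my-instance
```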
There are four properties we expect of live migration. First of all, it needs to be live: the VM keeps operating while it is being transferred to the destination. It needs to be consistent: the state of the VM on the destination host will be consistent with, and identical to, the state that was on the source host. It needs to be transparent: if you work on a virtual machine that is being live migrated you will not notice it at all, and the applications running on the virtual machine do not need to be cloud aware. And there should be minimal service disruption: live migration should not cause any significant performance degradation; we just need to pause the VM at the end of the live migration for a couple of milliseconds, to transfer the remaining state to the destination.

Okay, so what happens when we actually do this? Back at the Vancouver summit we had a couple of talks, one from Intel and one from HP, that were about operators' experience with live migration. In our case we were still running the HP public cloud at the time, so the experience described there was partly based on Icehouse. The point was: it works 85% of the time, and that's clearly not good enough to actually use it in anger in a production environment. Some reasons why it might not work: the hypervisor doesn't know what type of disks it has, so it doesn't know whether to block migrate them or whether they're shared; migration may fail, typically due to bugs in QEMU or something like that; and migration may never end, because if the VM is writing to its memory quicker than you can copy the memory across the network, it's never going to get there. There are also issues to do with network traffic: burdening the network and disrupting other services that are running on the control plane. And there are new resource types coming along that are not tied in with the migration process, so certain types of configuration can't be migrated.

So, as a result, operators typically didn't want to use it. They would rather go through the pain of interacting with the users, saying "we're going to take your VMs down so that we can do some maintenance", and schedule maintenance windows. Actually just stopping machines, rebooting them and bringing them back up again is a relatively quick way of doing things.

So what we wanted to do was improve the situation. We started a subteam to work on live migration in the Nova project, and in the Mitaka cycle we made it a priority in the project to get some work done there. We took a few simple goals based on the operator experience that had been put to us. Make it simple: some of the parameters and config options are a little bit complicated, so we wanted to cut those down and make everything work a bit better. Make it work: pick up new versions of libvirt, QEMU and KVM, which have bug fixes; fix some things in OpenStack; do better CI. And make it manageable, because one of the problems we had was that once you started a migration there was nothing you could do except wait for it to end, unless you were willing to go in and just hack at the machine.

So, make it simple. Part of the effort was to try to implement a well-known principle: keep it simple. Live migration is already a complicated topic, from many points of view, so we decided to move some of the logic into Nova and let Nova try to do some things for us, instead of asking the operator to write the right configurations.
We focused on two main things: we removed some input parameters, particularly block_migration and disk_over_commit, and we tidied up some configuration options; we deprecated the live_migration_flag and block_migration_flag config options in favor of a new option called live_migration_tunnelled.

As regards the input parameters: as I said, we removed block_migration. This flag was, at least before Mitaka, a way to tell Nova how the storage is configured. If the instance was not on shared storage and you wanted to live migrate that instance, you had to pass this parameter as true, and then Nova knew how to migrate it. It's easy to get wrong, and it's information we can't assume that all operators have, because deployments can be complicated. Because of that, we decided to move this logic into Nova, specifically into each driver. The libvirt driver has a new method to determine whether the instance is on shared storage, so it can understand how the storage is configured and set the value correctly for us; the XenAPI driver uses the concept of host aggregates to set this value; and for Hyper-V nothing was done in Mitaka, because this flag was not supported there before Mitaka.

The second thing was to remove disk_over_commit. The main reason is that this input parameter was used only by libvirt with QEMU/KVM, and we don't want to expose driver-specific configuration at the API layer.

There's a warning here: if you are doing a rolling upgrade, you can't use the new behaviour until you are sure that all your compute nodes are running the same version of the code. That means if you need to live migrate in a mixed environment while you are carrying out the rolling upgrade, you need to specify an API version before 2.25.

On the configuration options: as I said, the two old options are going to disappear in Newton because they are deprecated now. We deprecated them in Mitaka, and we have this new option, which is a three-value option. By default it is set to true, which means do a tunnelled live migration. Force means use the native QEMU-to-QEMU method for the migration. None is the long-term solution: the logic is not implemented yet, but the idea is that Nova will set all the parameters correctly by discovering the hypervisor capabilities. So, for example, if Nova is running on a version of QEMU where TLS is enabled, it will try to migrate using QEMU's TLS support; if not, it should fall back and use tunnelled mode.
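To make the upgrade caveat concrete, here is a hedged sketch of pinning the microversion from the CLI during a rolling upgrade (the instance name is made up, and the exact client flag behaviour may vary by novaclient release):

```shell
# Mixed Liberty/Mitaka computes: stay below microversion 2.25 and state
# the storage layout yourself, as before.
nova --os-compute-api-version 2.24 live-migration --block-migrate my-instance

# All computes on Mitaka: from microversion 2.25 on, Nova works out
# block migration and disk over-commit by itself.
nova --os-compute-api-version 2.25 live-migration my-instance
```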
Now on to the make-it-work part. One of the things that didn't work properly was scheduling. When doing a live migration, operators would typically direct where to schedule the VM, and the reason for that was that the properties passed in for scheduling when you originally boot a VM were not all kept. So if you wanted to do a live migration, you wouldn't have all those properties available to do the scheduling. Now that's all been fixed, so you can schedule a live migration yourself or get Nova to schedule it for you, and it will know what it's doing.

Prior to Mitaka there was a problem with block live migration when the VM had additional volumes attached: such a block live migration resulted in volumes being copied onto themselves, which often resulted in a corrupted file system. Therefore we had blocked all such block live migrations. To get rid of this, we excluded volumes from the list of devices that are copied to the destination, so this is not a problem anymore. The point here is that it requires a pretty new version of libvirt, which is 1.2.17. In the case of Ubuntu it is 1.2.16, but officially we support 1.2.17, so if you need to use it on Ubuntu you can change the check manually in the code and it should work.

However, this didn't fix a problem with a config drive attached. The type of config drive that we support by default in Nova is ISO 9660, which is a CD-ROM, a read-only device. Because of a bug in libvirt, libvirt pre-creates such a device on the destination host prior to migration: it creates the device as a read-only one, then tries to write to it, and ends up with an exception, so the live migration process is interrupted. Therefore we are still blocking such migrations when you have a local config drive as a CD-ROM. If you really need to use live migration with a config drive attached, you can try the vfat format, which works in every scenario; for CD-ROM you need shared storage configured for your instances for this to work.

Prior to Mitaka there was also a problem with memory oversubscription. I mentioned that you can trigger a live migration and provide a destination host, so that the scheduling phase is skipped and the live migration is forced onto that particular compute node. The problem was that the RAM allocation ratio was part of the Nova scheduler configuration, so when the scheduler was skipped, nova-conductor couldn't calculate the correct amount of memory on the destination. To solve this, we moved the RAM allocation ratio to the compute node, so that both nova-conductor and nova-scheduler can read it and calculate the available memory on the destination in the same way. There's a sketch of both of these settings just below.

There's also page modification logging (PML), which is not part of OpenStack itself; it was merged into KVM during the Mitaka cycle. In the normal case, when a VM is being live migrated and the VM wants to write something to memory, the hypervisor needs to pause the VM thread and track where the VM writes, so that it knows which pages are being dirtied, and then unpause the VM thread again. With PML there's no need to pause the VM anymore, because the CPU takes care of this, so the performance of the VM improves by up to 8%. But it requires a pretty new kernel, 4.0, and the fourth generation of Intel Xeon processors.
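Going back to the config drive and memory oversubscription fixes: both surface as plain nova.conf settings on the compute nodes. A minimal sketch, with made-up values, shown as comments because the right values depend on your deployment:

```shell
# Illustrative nova.conf excerpt for each compute node:
#
#   [DEFAULT]
#   # Writable vfat config drive instead of the read-only ISO 9660
#   # CD-ROM, avoiding the libvirt pre-creation bug during migration.
#   config_drive_format = vfat
#
#   # Since Mitaka this is read on the compute node, so nova-conductor
#   # and nova-scheduler calculate available memory the same way.
#   ram_allocation_ratio = 1.5
```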
Okay, let's make it manageable. Every presentation at a cloud summit has a slide about the pets and cattle metaphor, so we decided to put one in. This is because we had some new operations to make live migration easier to manage, but you could argue that's not necessarily why we should take care of instances; it's just cloud, right? As the metaphor explains, you can see your server in two different ways. You can treat your server like a pet, let's say a cat: you name it carefully, and if something goes wrong with it you try your best to recover it. The second view is to treat your server like an animal in a herd, let's say a cow: you probably name your server something like 7172, and if something goes wrong, sorry, you just kill it. The outcome of this theory is that if you treat your server like a cat, you are probably still in the old way of thinking about IT; if you treat it like a cow, you are a cloudy person.

Actually, we want to add one more thing to this metaphor. We think it's not just about having a cat or having a cow: what you actually need to take care of is a completely different animal, called a cowcat. (I love cowcats too.) What we want to say here is that it really depends on the situation: sometimes you can just kill a customer's instance, and in other cases you need to do your best to recover it.

So what we wanted to do in Mitaka was add more tools to the operator's arsenal, to let operators make the right decision at the right time according to the different cases they are facing. We decided to add three main operations, and we'll go through each of them. With progress details, you can check how your live migration is going. You have a way to force a live migration to complete, with a small impact on the instance. Or otherwise, you can abort the live migration job.

Progress details first: these tell you what's happening to your migration as it goes along. Prior to Mitaka, the only option you had was nova migration-list, which gave you a list of every single migration that had ever happened in the entire system; you could scan through that and try to pick out the one that was relevant to you, but it really only told you that it existed. So we've made migrations a sub-resource of a server: you can now do a nova server-migration-list or a nova server-migration-show to get details about the migrations related to a particular server. In particular, we've added details about what's going on, so you can see counts of the amount of memory and the amount of disk that has been processed, and how much is left to go.
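For example, checking on a migration from the CLI looks roughly like this (the server name and migration ID are made up):

```shell
# List only the migrations belonging to one server, instead of every
# migration in the whole cloud (nova migration-list).
nova server-migration-list my-instance

# Show the live counters for one migration, including the memory and
# disk totals, the amounts processed, and the amounts remaining.
nova server-migration-show my-instance 42
```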
The amounts processed and the amounts still to go don't necessarily add up. As I was saying at the beginning, you could be writing to the memory faster than you're transferring it, which means you could transfer a lot more memory than there actually is, because you keep repeating the copying. So this kind of information gives you an idea of whether you're converging on finishing, or whether you're stuck in a state where things are not going to progress. You could also discover there's some kind of problem: if these numbers are not changing, then something has just stopped and hung for some reason. Okay, so then you'd be able to do something about it.

Last February one of my colleagues went to an OpenStack operators meetup in Manchester, UK. When he came back I was very excited, asking: "So, Paul, what's the most requested feature the operators want? What was the hot topic?" And he reported back to me that people just want a way to kill their migration. That stung a little from our point of view, but we need to face reality: live migration sometimes doesn't work, sometimes it doesn't work as we expect, and sometimes it's too slow. And so there are many use cases where you want to kill a live migration. For example, as an operator you start a live migration and you spot a problem on your target node, so you know for sure it's not going to finish and you want to stop it. Or you start a live migration but you picked the wrong instance: you are migrating your database server, it's taking long, and you want to stop it.

Before Mitaka, the only way to do that was to SSH into the compute node and run a virsh command to abort the live migration job. Now you can do it using the API, and the command is quite easy. Using the Python nova client, you pass the server ID and the migration ID; this will abort the running live migration job and trigger a rollback, and your migration object will be marked as cancelled. This works because it has been implemented for the libvirt driver with the QEMU/KVM hypervisor. If the live migration finishes just as you ask to abort it, the abort action will fail, but probably at that point you've already achieved what you were looking for. And because of the rollback, it's unlikely that abort will work with post-copy live migration; we'll cover that topic later.

Since Mitaka you can also force a live migration to complete. Basically, if you see that your live migration is stuck, or the progress is slow, or it is more important for you to finish this particular live migration than to keep the VM downtime as low as possible, you can use force-complete. It works by pausing the VM on the source host so that the VM does not dirty memory anymore, and the live migration can therefore end in a finite time. You don't have to worry about ending up with a paused VM on the destination host, because the hypervisor will take care of unpausing the VM automatically at the end of the process. Like abort, it works only when libvirt is used as the driver and QEMU/KVM as the hypervisor.
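Concretely, both operations are a single CLI call (the IDs are made up, and remember they only work with the libvirt driver and QEMU/KVM):

```shell
# Abort the running live migration job; it is rolled back and the
# migration object is marked as cancelled.
nova live-migration-abort my-instance 42

# Pause the VM on the source so it stops dirtying memory; the migration
# then finishes in finite time and the VM is unpaused automatically.
nova live-migration-force-complete my-instance 42
```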
Another interesting change introduced in Mitaka was the ability to route the live migration traffic differently. By default, the live migration traffic uses the IP address of the target hostname, so often this means it's using the management network, maybe the same network used by RabbitMQ and all that stuff. So that network may already be overloaded, or, seen the other way around, live migration is just putting more load on the management network. Now we can specify a new configuration parameter called live_migration_inbound_addr.

What does this do? We've put up an example of an OpenStack installation with a fairly complicated network configuration to separate traffic: you can see there's a network for the standard VM traffic, one for external API traffic, one for traffic between the APIs, and so on; but there was no way to specify a different route for the live migration traffic. By adding this parameter, what we achieve is this: we specify a separate network where the live migration traffic will go between the source and destination nodes. The only things you need to do are ask your network administrator to set up the network for you, and then specify on each compute node which IP address you want to be used for live migration. When the live migration starts, the source node sends an object to the target node; the target node reads this new parameter, puts its IP address in the object, and sends the object back to the source node. At that point, when the live migration actually starts to move traffic between the two nodes, the source node knows to use that specific IP address, and you have achieved traffic separation. It is a backward compatible change: Nova is clever enough to use the value of this parameter if it is set, and otherwise it will just use the IP address of the target node as usual.
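As a sketch, with invented addresses: suppose a compute node's interface on the dedicated migration network is 10.20.0.11. The per-node setting lives in the libvirt section of nova.conf:

```shell
# Illustrative nova.conf excerpt on one compute node:
#
#   [libvirt]
#   live_migration_inbound_addr = 10.20.0.11
#
# Each compute node declares its own address on the migration network;
# if the option is unset, Nova falls back to the target host's IP.
```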
So let's talk about the future of live migration. Andrea already mentioned that we are working on post-copy live migration. The biggest problem in live migration is dirty pages: memory is way faster than the network, so we might end up with a never-ending live migration. The model we support right now is the pre-copy one: the VM continues to operate on the source node, and if it needs to write something to memory, it writes it on the source node, so such pages need to be retransferred, because they have been dirtied. Post-copy is a bit different, because in the middle of the process we switch execution to the destination, so if the VM needs to write something to memory, it writes it on the destination host, and there are no more dirty pages. However, there might still be remaining memory to be transferred to the destination, so if the VM needs to read from memory, performance might be impacted. Also, because the VM state is spread between two hosts, source and destination, in case of a failure we would need to reboot the VM on the destination host; rollback might be impossible in this case. The plus is that there are no dirty pages, so live migration ends in a finite amount of time while the VM keeps operating.

We are also working on checking the destination of a migration. I already mentioned that you can trigger a live migration to a particular host, and if you do, the scheduling phase is skipped, so you might, let's say, break an affinity or anti-affinity policy. This feature will provide a new flag in the API, called, for example, "check". You will still be able to provide the destination host, but Nova will check this host anyway by passing it to the scheduler, so that if it would break an affinity or anti-affinity policy, or the VM would land in another server group, the scheduler will refuse such a live migration.

Okay, so I think we've gone through most or all of the things that we did during this cycle, and this is just a list of them. We went through the config options, the API options, the scheduling, the versioned requests; there's a long list of stuff here. We've got some things that we're working on for the future, and we mentioned two in particular, but there is actually a longer list that we're going through during the design sessions at the moment, and later on today, and we'll be coming up with a prioritized list of the next things to work on. This is an ongoing process, and we'll be looking for people to give feedback on what they think needs attention next, so that we can do something about it.

This is a slide that Pavel felt obliged to put into the presentation. And some credits: one thing I wanted to emphasise here is that we've got Hewlett Packard Enterprise and Intel represented on the stage today, but actually there was a lot of interest in helping with Nova live migration. We had at least 20 people working on this during the cycle, and quite a lot of help from Red Hat and Mirantis and IBM and a few others. So I don't want to give you the impression that we were the only people in this; there were more than us. If you want to find out anything more about the Nova live migration team, that's the link to our meeting page for the IRC meetings, and from that page there are links to other things like the etherpads we use and the specs that we've been writing up. Okay, so I think we're done; time for questions, and I should warn you, I can't actually see any of you.

So, I have a question. We have a number of VNFs that are using either huge pages or SR-IOV. What are your thoughts on those two circumstances?

At the beginning of the presentation I mentioned that as these new features come in, especially things that refer to something physical, they haven't necessarily been built into live migration. So huge pages: that's not going to work with live migration at the moment. SR-IOV: there are plans for that, but again it's not there. And there are a few problems with anything that refers to NUMA nodes, NUMA topologies: the way that allocation of resources is handled during the live migration process isn't working at the moment.
So that's an area where we're doing some active work. The main problem is that when you do the scheduling for the remote host, the scheduler gets an idea of how many pages are available, how many cores are available and so on, and says: yeah, okay, there's enough space over there. Then it sets the migration off, and when it lands at the other end, it discovers that it hasn't actually got the right combination of nodes and pages, you know, cores and pages, or it can't do the packing right.

You mean NUMA restrictions? Yeah, that's it. Supposing you said you wanted to put a vCPU in the same NUMA node as a 1 GB page: when it gets to the other end, it knows there are enough pages and enough cores, but they're not in the right NUMA node, that kind of thing.

So does that mean that if we have a guest that's NUMA aligned, and we live migrate it, there's no guarantee that it's NUMA aligned when it comes out the other side? Right, at the moment, in Mitaka, it's hit and miss. To fix that, we're changing the way that claiming of resources is done on the far end, so we can know that we actually have those specific resources reserved for us when we get there. So that's in progress.

Would that come in as a patch, or, sorry, when would we see that? Is it going to be part of Newton? That's being discussed at the moment. I think a good answer to that one is that that area of the Nova project is a little bit under-resourced, so if there are people that actually want that done, then put a bit of attention on it and we'll help you get it through.

Thank you. The question I had is about resubmitting live migration jobs to the scheduler. We have a heterogeneous compute environment where we have multiple processor generations under a single flavor, like mixing Sandy Bridge and Haswell. Will it take that into account, or is it going to, you know, blow up if it tries to live migrate something from Haswell back to Sandy Bridge, given that the Haswell instructions aren't masked down?

You can do that with your scheduling. As I said, in the past operators would tend to (it is what we did) target a particular host for a migration. Typically, if we were doing host maintenance, that would be the scenario: we would take all the VMs and put them on another host that was exactly the same, because we knew that they would fit and had the right requirements. What you're talking about there, the processor type, I think is actually available in the information that the host reports to the scheduler. So if you have the capability filtering set up, it should handle that, yes.

Sorry, I think in one of your early slides you said live migration is not backwards compatible. Did I miss that? I want to understand what that means. It's not live migration itself; it was that specific parameter. Yeah, so it's not really about live migration not being backwards compatible, but about the new API that is in Mitaka. In Mitaka you don't need to provide the block migrate flag; Nova calculates everything itself. So if you have a mixed environment, where part of your compute nodes are Mitaka and part of your compute nodes are Liberty, you need to tell Nova to please use the old API, because the new API will not work on Liberty, right?
Okay, I guess it's less of an issue then, but API backwards compatibility is a big deal for a lot of users.

So the issue here is that what we're doing is bringing into Nova the job of working out what happens to do the migration, and that requires both ends to be involved. This is the part that doesn't work if you have a mixed environment. So what we're saying is: when you get onto the Mitaka API, if you're running at the latest version of the API (I think the client normally pins to the lower version, but if you set it so it runs at the latest version), then if the two nodes that are involved are on Mitaka, that's going to be okay. If you're allowing API requests to come in during your upgrade and you've got a mix of Liberty and Mitaka nodes, then the newest version of the API might allow an operation that isn't going to work between those two nodes. So that's what we mean by it not being backwards compatible: the new Mitaka node can't interoperate with the old Liberty node for this.

One more thing: once you've got your environment set up with everything on the latest Mitaka versions, you can actually still pass the block migrate flag, and if Nova gets this value it will use it. But if you don't pass it in Mitaka, all the logic is inside the code. So it's really only during the rolling upgrade that this is not compatible. Okay, thank you.

So, two related questions. Number one: is it possible to block live migrate a VM that's using raw storage instead of qcow2? I would say that's a good question; that is a detail that I'm trying to fish out of the back of my mind. I remember having a discussion about it, and I would say there's no reason why not, but we're actually doing some work around image backing, so I would have to ask you to come back and ask again. Okay, I'll test it. Just not prepared to commit on that one right now.

And then the other one, not strictly related to VM live migration: you have Cinder volume migration, where you can migrate a Cinder volume from one backend to another. Right now it's not possible to do that with Cinder volumes attached to a VM, because libvirt needs to do some dance to make that happen. Is that something that's going to be possible to support somehow? Sure, it could be possible, but it's not something we've addressed right now. Having said that, at the moment Nova and Cinder are negotiating over changing the API they use to interact, and there are things like multi-attach volumes, that kind of stuff that goes on; there's a bit of a mess in that area at the moment, so that requirement can be brought up as part of working out the new APIs. So the answer is: it's API negotiation, and with that sorted out, then... All right. Thank you. It could be. Okay. Thank you.