Okay, can you hear me? Okay, I think you can. Hello everybody. First of all, thank you for staying so long for the last talk of the summit; I really appreciate it. I expected that I would know everybody by face, and I don't, so that's good. My name is Jakub Libosvar, I work at Red Hat on OVN and Neutron. I've been involved in Neutron for a while, and today I want to present to you a migration from ML2/OVS to ML2/OVN. When I talk about ML2/OVS (I'll say that a lot throughout the presentation), just to make it clear, it means the Neutron ML2 plug-in with the Open vSwitch mechanism driver. And when I mention ML2/OVN, it means the same, but with the OVN mechanism driver instead of the Open vSwitch one.

So, a little bit of a recap: what is ML2/OVS? Just out of curiosity, who recognizes this diagram? Who's familiar with the architecture of ML2/OVS, hands up? Okay, that's good, thank you. Who's familiar with the similar architecture with OVN? Okay, perfect, thank you. So, just as a recap for those who didn't raise their hand: ML2/OVS is distributed and uses RPC for communication. The brain, the source of truth, is the Neutron server on the right-hand side, and you can have multiple agents in your environment: the Neutron OVS agent, which usually takes care of layer 2; the Neutron L3 agent, which can be on a different node, although it usually sits on the same node as OVS because you need L2; the DHCP agent, which can be on a different node; and the metadata agent. All of these nodes scale out, so you can have multiple of them across your cluster, and these services in ML2/OVS are each responsible for a particular thing: L3 connectivity, L2 connectivity, metadata and DHCP. That's why it's so distributed. Typically you have a lot of compute nodes in such an environment.
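On the Neutron side, the difference between the two setups boils down mostly to which ML2 mechanism driver is loaded. A simplified fragment for illustration; real deployments list more drivers and options:

```ini
# ml2_conf.ini (fragment)
[ml2]
# ML2/OVS: the Open vSwitch mechanism driver
mechanism_drivers = openvswitch
# ML2/OVN: the same ML2 plug-in, but with the OVN mechanism driver instead:
# mechanism_drivers = ovn
```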
So what's the difference between ML2/OVS and OVN? Probably this: OVN is simpler. Everything is implemented in ovn-controller, all those capabilities that were in the Neutron L3 agent, the OVS agent and so on, so you can cross those out of your architecture. What is new in this architecture is OVN central; on the right-hand side, in the top corner, you can see there are some databases. So this is a second database in Neutron: Neutron still has its own database, which remains the source of truth, and it populates the OVN database with the data; OVN then serves as the SDN backend. If we take that into account (and this is very important when you plan to migrate), you need to think about where you want to place your services, because it doesn't map one-to-one, as you can see on this slide.

How will OVN look after the migration is done? Or, if you do a greenfield deployment, it probably looks something like this. You have a simplified architecture: the controller node where the Neutron server is stays the same; the compute nodes just have ovn-controller and the Neutron OVN metadata agent; you have the databases, which are new (this is OVN central); and instead of the L3 node, you have the gateway node. So this is the difference. Now, what do you need to do when you plan to migrate, when you want to switch from OVS to OVN?
So first of all, this is based on TripleO, and it uses TripleO to deploy the OVN services. It's important, very important, and I want to highlight it again: it's very important to have those TripleO configuration files correct. The other things you need to consider are about architecture: you need to think about your workloads, how you want to place them, what the load on those particular nodes will be, and how many of them you want.

Another difference between OVS and OVN is that OVN uses a different tunneling technology: Geneve, as opposed to the VXLAN used in ML2/OVS, and the Geneve header is eight bytes larger than VXLAN's. That technically means your MTU is lower on the guests, on the instances and on the ports. So when you are switching from VXLAN to Geneve, you need to consider the MTU: it either needs to be lowered, or (and this is the better approach) you accommodate it in your fabric. If you are not at the limit of your physical fabric, it's better to just increase the fabric MTU, and then you don't need to care about MTU in OpenStack at all.

Another thing: in OVN everything is distributed by default, including routing, and if you want to migrate from an OVS deployment that doesn't have distributed routing, you can still migrate to OVN. You just need to set things up correctly, like bridge mappings, because now your compute nodes will need access to the external network. Another difference is that the metadata agent in OVS is placed on one node, so typically you run multiple metadata agent workers, like 12 or 20, so they can serve all the requests coming in. With OVN, each metadata agent is placed on the compute node and serves metadata locally, so when you're setting your metadata agent workers with OVN, you typically just set it to one, as opposed to OVS where you have multiple. And then you also
need to check your parity gaps, because we are catching up with OVS in terms of features and OVN doesn't yet support everything that OVS does.

So now I want to talk a little bit about what the migration actually is for your cloud. I'm thinking about when I was a kid and I had Lego and played with it. It usually came with some sort of manual that told me how to build up the Lego, so I think of the manual as something like TripleO: that's what builds you the whole thing. Who can guess what this is? Hands up. So this is how I imagine OpenStack, and each brick is a service or some sort of layer. If I look at some of the bricks, I can tell that the red brick could be the ML2 plug-in, the yellow brick could be the Neutron API, and white, green and blue would be mechanism drivers. So if we go back to the original picture, we can tell: this is Neutron, this is the Neutron API and ML2 plug-in, and OVS and SR-IOV. We'll focus only on these bricks, only on Neutron. When you're migrating your environment, the only thing that you can change in the process is the green brick, that's the OVS. You don't, or you cannot, change any other service; every service needs to be configured the same as it was with ML2/OVS. So the final picture will look like this: you just take out the green brick and you put in the blue brick with a happy giraffe, and that's OVN. Oh, I skipped some slides, so I'm going back.

So, the procedure of the migration: you have some inputs into the procedure.
One of the inputs is your ML2/OVS production environment, which was deployed with some TripleO configuration files: Heat templates, roles and such. You can have composable roles in OpenStack, and everything is very customized, so there is no procedure that would just take your cloud and, without any configuration, put it on the OVN backend. So it's important to have this configuration for OVS (that's on the left-hand side), and you take this configuration and modify it to have the OVN bits. This is very tricky, so it's a good approach to try it out on a staging deployment and validate that your TripleO configuration files are correct and that this is how you want your cloud to look. So when you take that green paper, which is your TripleO configuration, and you deploy it on ML2/OVN, you want to validate that this is what you want; you can iterate in case you don't like it. Then you take these TripleO files and you can put them, along with your production environment, into the migration procedure. Then, boom, migration happens, and you have an ML2/OVN production deployment. But again, it's very important to have those configuration files correct, and how you want your cloud to look must be aligned with them. If you don't do that, and there is a mistake or something is misconfigured, the migration could fail, and that's a big problem.

Okay, now let's talk a little bit about the software itself, what the migration actually is. The migration is just a bash script that wraps around Ansible: Ansible roles and playbooks.
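The wrapper and its subcommands live in the Neutron tree (tools/ovn_migration). As a sketch, the overall flow looks roughly like this; the subcommand names are taken from the upstream docs as I remember them, so treat the exact spellings as an assumption:

```shell
# Rough shape of the migration workflow. Each subcommand runs an Ansible
# playbook; start-migration eventually calls back into a TripleO deploy.
ovn_migration.sh generate-inventory   # build the Ansible inventory from TripleO
ovn_migration.sh setup-mtu-t1         # shorten DHCP T1 so guests renew often
# ...wait for the guests to pick up the short renewal time (up to 24 hours)...
ovn_migration.sh reduce-mtu           # lower Neutron network MTUs for Geneve
ovn_migration.sh start-migration      # backup, redeploy with OVN, sync, switch
```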
This is the body of the migration. Each step in the migration is composed of a role, and each command that you issue to the OVN migration bash script actually calls a playbook. So there are different procedures, or steps, during the migration that you can issue using this tool, and it will help you move to OVN. The Ansible playbook itself then runs, does some stuff that I'm going to talk about later, and at some point it will call TripleO to deploy; it will really be like a TripleO deploy command. Just out of curiosity, who's familiar with TripleO? I talk about TripleO a lot, but... okay, so maybe I should tell you what TripleO is. TripleO is basically an upstream community project that's used to deploy OpenStack, and at Red Hat it's productized as something called Director. I'm happy that only a few people raised their hands because, yeah, it's very complex and very complicated. So during this procedure, during this Ansible run, there is another call to TripleO deploy, and this TripleO deploy calls Ansible again, so there are lots of layers inside the migration procedure, and in the end it's a very complex procedure.

So I'm going to talk a little bit more about the steps themselves. You can divide them into three groups. One is pre-migration.
This is something that you can do well in advance of migrating the cloud. What it does is, first, you need to get an Ansible inventory for those playbooks; it uses an inventory that can be generated by TripleO, so that's really simple. Then there is the MTU part I talked about: when you don't have the opportunity to increase the MTU on your fabric and you want to have your tunneling on Geneve, there are some tools, some steps, that will help you decrease the MTU for the instances. How it works is that it configures the T1 parameter for the DHCP agent. T1 in DHCP means how soon you should ask for a renewal after you get a lease in your guest. If you know the DHCP agent, it uses dnsmasq, so what it actually does is configure the DHCP agent to some value, 30 seconds, and the DHCP agent then configures the dnsmasq processes to tell the VMs, when they get a DHCP lease, to ask for a new lease every 30 seconds. That way you can minimize the disruption for your VMs on east-west traffic: when they talk to each other, they need to have the same MTU.
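Concretely, this T1 tweak is a DHCP agent configuration change that the pre-migration step applies for you. A sketch of the effect, assuming the `dhcp_renewal_time` option name (check your release's configuration reference):

```ini
# dhcp_agent.ini (fragment): have dnsmasq hand out a 30-second T1, so guests
# re-request their lease (and the advertised MTU with it) every 30 seconds.
[DEFAULT]
dhcp_renewal_time = 30
```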
So you minimize that to some reasonable downtime, the amount of time during which they will be communicating with different MTUs. Once that's done, you actually need to wait until the guests pick up the new value from the DHCP agent, and then you can go ahead and reduce the MTU on your networks in Neutron. That can all be done as a pre-migration step. Then it prepares the OVN images so we can deploy the services everywhere, and optionally it can create some pre-migration resources. It can also validate automatically whether the migration was successful and everything works; but in production you probably already have your own workloads, so you don't want to throw in more and test those, and you can test later with your own workloads.

The next step would be the migration itself, with just a quick backup first. It stops the ML2/OVS resources, like the agents, so the control plane goes down. The data plane still stays up: things like the external processes, dnsmasq and keepalived for L3 HA, keep running, so your VMs can still talk. Then it calls TripleO to configure OVN: we deploy the OVN services and it switches Neutron to use the OVN mechanism driver. That means Neutron is already on OVN, but the data plane is still on OVS. OVS uses br-int, and OVN will initially use a different bridge, some fake bridge, just to not mess with the data plane. Then it switches the bridge, so ovn-controller starts using br-int, and that's basically it: you're on OVN now. Then come some post-migration cleanups: it deletes the sidecar containers and external processes that were used by ML2/OVS, and it can validate some other resources.

So, I have prepared a demo of how it actually looks when you're migrating. It's on a website, but I have it here; I'm connected to a lab at Red Hat. I know it's small, I realized that too late. Can you read it? It's not perfect.
I know, I apologize for that. So let's... yeah, I cannot enlarge it, because later it uses tmux and it's pre-recorded. So if I increased the font (well, thank you, it's a good point), it would mess up the tmux layout. First of all, I'm going to show the VXLAN network. The important thing you can notice is that the MTU is set to 1450; if you cannot read it, it is 1450. Then I have just a single server running in this small environment; it's on that VXLAN network and it has a floating IP. This is ML2/OVS with DVR. You can see the ML2/OVS agents are running everywhere, and we can talk to the VM.

So the first thing you need to do is install some bits that contain those playbooks and the bash script I talked about earlier. Then we create a working directory and copy over those playbooks. This command is the one that takes your TripleO configuration files from your stage. So let's say the stage is actually somewhere else: I have validated my TripleO configuration files there, and I'm just taking them back to my production OVS environment, so I'll copy them to something like an entry point. The only thing that needs to be done, compared to a greenfield OVN deployment, is to add an extra file here that will be used later by the migration procedure to override some TripleO values; it needs to be at the end of the parameters that you're passing to TripleO deploy. And we generate the inventory. So this is how the inventory looks. It's pretty simple.
It just contains the OVN DBs; it's based on how your cloud looks. The OVN DBs will be placed on controller-0, the OVN controllers will be everywhere the OVS agent was, and it detected where the DHCP agent is. In our case it's on controller-0, but if it's somewhere else, it's going to stick with that as well.

Okay, there's a little mistake. So now I'm going to connect to the instance, and to the compute node hosting that instance, and we're going to sniff the DHCP traffic and observe how the MTU changes inside the guest. This is the tap device of the instance on the left-hand side, on the node, and on the right-hand side there is the VM itself. So now we're going to change the DHCP agent configuration as I mentioned before (oh, this is ugly): it sets the T1 parameter for the agent, and then it fires up the reduce-MTU step, which reduces the MTU on the network itself. Now, this is the stage where we would need to wait up to 24 hours for the guests to pick up the new values. That's why there are so many mistakes: I went ahead and restarted the networking inside the guest so it starts picking up new DHCP offers again. And it didn't go well, so I needed to reboot it, but then it came up. It's not showing in there, but it still has the old MTU; the DHCP agent, though, after the reduce-MTU step, is now configured to provide the new MTU value, which will be seen here. Here we can see the same network, VXLAN, pre-migration: when this command was shown at the beginning, there was 1450, and now it's 1442, so it's 8 bytes less. In the bottom left corner it will show that the DHCP agent offers the new MTU, 1442, and at the bottom you can see that eth0 within the guest (this line) picked up the new value.
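The 1450 and 1442 in the demo fall straight out of the encapsulation overhead mentioned earlier. A quick check of the arithmetic, assuming a 1500-byte physical MTU and IPv4 tunnel endpoints (the extra option bytes Geneve carries for OVN are my assumption here):

```python
# MTU arithmetic behind the 1450 -> 1442 change seen in the demo.
PHYSICAL_MTU = 1500

# Per tunneled packet: the inner Ethernet header travels inside the tunnel,
# plus outer IPv4 and UDP headers, plus the tunnel protocol's own header.
INNER_ETH, OUTER_IPV4, OUTER_UDP = 14, 20, 8
VXLAN_HEADER = 8        # fixed-size VXLAN header
GENEVE_HEADER = 8 + 8   # Geneve base header + 8 bytes of options (assumed)

vxlan_mtu = PHYSICAL_MTU - (INNER_ETH + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER)
geneve_mtu = PHYSICAL_MTU - (INNER_ETH + OUTER_IPV4 + OUTER_UDP + GENEVE_HEADER)

print(vxlan_mtu)   # 1450, the pre-migration VXLAN network MTU
print(geneve_mtu)  # 1442, eight bytes less, as offered after reduce-mtu
```

If you can raise the fabric MTU by those extra bytes instead, the tenant-side MTU never has to change at all.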
So now it's on 1442. The next step is to create the OVN images. This is a container prepare file; the only thing that you need to change here is the Neutron driver, from OVS to OVN, so that it will deploy the OVN images, and you call the OpenStack container image prepare command so it prepares the images. Here is a container image list grepped for OVN: it makes sure, or in this demo I make sure, that the OVN images are already there on the undercloud and ready to be used by TripleO.

Now we're about to start the migration. I put some monitoring here: in the top right corner we are on the controller, and we'll be observing how the Podman containers are switched during the migration; at the bottom we'll look at the OpenFlow rules in br-int. With OVN, everything is actually implemented with OpenFlow, so when you have a compute node and you look at the br-int OpenFlow rules, you see many of them compared to ML2/OVS. What's expected here is that there's some number, 39, which represents the number of OpenFlow rules currently on br-int with ML2/OVS, and we can expect that with ML2/OVN it's going to be much higher. In the bottom left we'll observe the same services as on the controller, but on the compute node. So here (I know it wasn't observable, because now the layout is kind of broken) I issued the OVN migration start-migration command, which is the big process that's going to consume the TripleO roles and run across the cloud. It's going to stop some services from ML2/OVS; you can see the services are disappearing.
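The container-prepare change mentioned above is a one-line edit. A sketch of the relevant fragment, assuming the usual ContainerImagePrepare layout (the exact value used on the OVS side varies by release, so take the commented line as illustrative):

```yaml
# containers-prepare-parameter.yaml (fragment): the only change needed for
# OVN is the neutron_driver value; everything else stays as it was for OVS.
parameter_defaults:
  ContainerImagePrepare:
    - push_destination: true
      set:
        neutron_driver: ovn       # was e.g. "openvswitch" with ML2/OVS
        # ...the remaining "set" values are unchanged...
```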
I will pause it quickly here. On the right-hand side, on the controller, you can see the Neutron API is running, and there are some Neutron haproxy, qrouter and qdhcp sidecar containers running, but there is no Neutron DHCP agent: the agent is now down, the control plane is now down. Okay, I don't know why it's so broken; I tested it before the talk and it was fine, so I apologize that it's so messy. But now it's running. The speed is much higher than in normal life, so this is a lot faster; it usually takes a lot of time, but since we have limited time, it's sped up. Now we're at the point where TripleO is doing its stuff, so it's going to deploy the whole cloud. Unfortunately, it tries to deploy all the services, not just Neutron; that's something that could be improved, since we just need the Neutron bits, but it's going to check all the services that are in your cloud. Now we have to wait until something interesting pops up... Something interesting popped up: we have here ovn-dbs-bundle, which is the database of OVN. So now we have the OVN DB there, but it's not used yet; it's empty. It first deploys all the services, then it starts populating the data, and it will do the switch later. Now time is flying. There is the new neutron-db-sync container that will be used later to sync the databases, and now it has started to deploy things on the compute node as well. You can see here there is ovn-controller, right here, so ovn-controller is now running on the compute nodes, and the OVN metadata agent is running there, but it's still not used yet. We have to wait a bit until it deploys all the services, and the migration continues. There we go: now OVN is deployed, and we're about to start it.
So what happens now: the sync of the Neutron DB with OVN, right here, will look at the OVN database and see that it is empty, so it will start creating, in OVN, the objects corresponding to the resources that are in the Neutron database, and populating them there. Then some cleanups, and the switch. I don't see the flows here, but... there we go: now there are 536, which means there are currently 536 OpenFlow rules on br-int. Before, it was thirty-something, so it was much lower; now the data plane has been switched to OVN. The sidecar containers and the external resources left over from the OVS agents were stopped and cleaned up, and now it's deleting the agents. As you can see here, the neutron-openvswitch agent, for example, is down, it's not alive, but we have ovn-controller, which is happy and alive and already serving the data plane. We wait a little bit for things to be cleaned up, and now you can see (oh, it's gone) that there are only OVN services. We try to ping our VM: it works. We try to connect there, check it, and it was up and running; we were able to SSH to that VM using OVN. That's it, that's the demo, and that's the end of my presentation. So thank you again for staying this late. Are there any questions? We have 20 seconds for questions. Okay, perfect. Hello.

Thanks for the talk. I was wondering, from the point of view of a Kolla user, if there are things in your roles that could be extracted and reused by other deployment systems. I saw at the end you had something generating a script to do a cleanup.

Is that... you mean the roles for the migration itself? Yeah? So actually, I can show it to you on GitHub. If you go to GitHub, to openstack/neutron, there is a tools directory, ovn_migration, so it lives in Neutron already. Okay. Yeah, it's all right.
It was in TripleO before, and if you go to the tripleo_environment playbooks and roles, there are a bunch of roles that were used, and yeah, they can be reused. There are just some configuration parameters: for example, if you go to the template that activates OVN, you can see there are some Jinja parameters, like the OVN bridge. Maybe it has a default; it could be here in the role, but since it's not intended for general-purpose use, it could be missing. Well, it is here, so good. Okay, but yeah, it could be used by others.

Okay, and is there downtime when you switch from one to the other? I didn't really see it in the demo.

Yeah, I apologize for the demo; this is not how I planned it. There is a little bit of downtime. It really depends on your scale, on the scale of your cloud; when there are multiple machines, it tries to do the switch as fast as possible. What it actually does is this: ovn-controller has a configuration option to pick the bridge, and during the migration it is set to the temporary bridge. The activation basically means it goes to each node where ovn-controller is running, changes this configuration to br-int, the bridge that really has the workloads, and restarts the ovn-controllers. The ovn-controller then needs to fetch the logical flows and implement the OpenFlow rules, and that's where the downtime is. In our environment, in our tests, it is around one to two seconds, but we don't really test it at scale.

And can you do each hypervisor sequentially, or do you have to do the whole cloud at the same time?
It really depends on how you configure it. Here, this is the file that's generated by generate-inventory, the very first command in the demo, and I think there is a setting for it. Actually, no, it's in the Ansible config. Here is the forks setting: that's basically an Ansible setting that tells how many parallel connections there will be from Ansible to the nodes. So if you want to do it one by one, if you don't mind that some VMs will be on OVN and some VMs will be on OVS in the meantime (OVS will still use VXLAN, so your provider networks will work just fine, but the tunneling will be different), then by changing this forks value to one you can really do it one by one. Yeah. Thank you. Thank you for your questions.

And maybe one fun question for the end: in your Lego Duplo diagram, there was a toilet. Which service is that?

That's a good question, actually. I think you can think about it as some really important service, because, like, a toilet is important; everybody uses it. It was not supposed to be shaming; it's more about showing that this is a really important service. So it can be any service that you consider important. Okay, sure.

One question: I saw that, when you switched over, at the end the OVN agent controls the OVS switch afterwards.

Okay, can you get a little bit closer to the microphone? Closer to the microphone.

I saw that the OVS switch, being controlled by the OVN controller, gets programmed by the OVN agent. Is that how it really works?

Yes. So the OVS switch goes down and comes up again, controlled by the OVN agent, and the OVN agent is called ovn-controller.
So how it works is that it's a binary running there on the compute node, and it fetches what are called logical flows from the southbound database. Here, here is the southbound database: this is basically what ovn-controller connects to. It gets a logical representation of how the network looks, and it translates that into the OpenFlow rules that are implemented in the OVS bridge. Okay? Does that answer your question? Yeah, that was my understanding. Okay, thank you very much. Okay, thank you for your question. Any other questions? Okay, then thank you again, and have safe travels home.
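For the curious, the activation step described in the Q&A comes down to flipping one Open vSwitch external-id and restarting ovn-controller. A sketch under the assumption of systemd-managed services (a containerized deployment would restart the ovn_controller container instead):

```shell
# On each chassis: point ovn-controller at the real integration bridge
# (during the migration it initially points at a temporary bridge)...
ovs-vsctl set open_vswitch . external-ids:ovn-bridge=br-int

# ...and restart it, so it re-fetches logical flows from the southbound
# database and reprograms br-int with OpenFlow rules. This brief window
# is where the one-to-two-second downtime comes from.
systemctl restart ovn-controller
```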