Hello, my name is Frode Nordahl. I'm a senior engineer at Canonical and the PTL for the Charms project in OpenStack. I'm here to talk to you about how to perform a hassle-free migration from OVS to OVN.

First, an introduction to OVN. OVN is the next-generation SDN for OpenStack and other cloud management systems. For OpenStack, the components involved are the central components, and on each hypervisor you place the chassis component, which controls the local Open vSwitch. The high-level architecture of OVN is that it presents an API to the cloud management system, called the northbound database. The cloud management system, in this case OpenStack, uses this API to program high-level rules: logical switches, routers, access control lists, load balancers, and things like that, by communicating with the northbound database. OVN then has a northbound daemon, called ovn-northd, which translates this high-level logical information into low-level constructs in the southbound database, which is consumable by ovn-controller, which runs on each hypervisor. ovn-controller connects to the southbound database, loads all the information relevant to that hypervisor, and uses it to program the local Open vSwitch.

So why do we want to use OVN? OVN is virtual networking for Open vSwitch, built by the Open vSwitch community. It is cloud management system agnostic, which means you can use it for OpenStack, Kubernetes, and possibly others; I believe there is now support for it in LXD as well. It has different scalability characteristics than the ML2 OVS solution, and it also programs each hypervisor more efficiently. OVN provides distributed routing by default for east-west traffic. It provides HA north-south routing by default, and it also supports DVR if you have a use case for that.
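To make the northbound-to-southbound flow concrete, here is a minimal sketch using the standard `ovn-nbctl` and `ovn-sbctl` tools on a host with OVN central installed. The switch and port names (`demo-ls`, `demo-port`) and the addresses are made up for illustration; Neutron would normally create these objects on your behalf.

```
# Program a high-level logical switch into the northbound database.
ovn-nbctl ls-add demo-ls
ovn-nbctl lsp-add demo-ls demo-port
ovn-nbctl lsp-set-addresses demo-port "00:00:00:00:00:01 10.0.0.10"

# ovn-northd translates this into logical flows in the southbound
# database; inspect what each hypervisor's ovn-controller will consume.
ovn-sbctl lflow-list demo-ls
```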
OVN uses OpenFlow heavily, so security groups, layer 3 routing, and all kinds of network constructs are programmed into Open vSwitch using OpenFlow rules. This in turn allows all of it to be offloaded to hardware when you have a capable SmartNIC and the right kernel and Open vSwitch versions. The ML2 plus OVS solution does not allow hardware offloading of layer 3 routing, for example; this is something you can do with OVN. The control plane uses TLS and role-based access control by default, which means that if anyone compromises a single hypervisor, they cannot use its credentials for the OVN database to damage other parts of the cloud. OVN also implements native DNS and DHCP services distributed in each Open vSwitch, which means there are fewer agents running on each hypervisor. The OVN community is very active: both software distributions and large consumers of the project are actively contributing. It's a vendor-neutral solution, and you can also do things like integrate with hardware network infrastructure if you have a hardware switch capable of using the OVSDB hardware VTEP schema. And of course, the central control plane is HA, using active-active replication with Raft as the consensus algorithm.

So what do you need to plan for when doing a migration? First, I'll talk a bit about versions and other prerequisites. During the OpenStack Ussuri cycle, Neutron adopted the ML2 OVN driver into the main tree. So our recommended upgrade path is to first upgrade to OpenStack Ussuri on Ubuntu Focal or Bionic, and then perform the migration from ML2 OVS to OVN. Before you start, you need to use the Open vSwitch firewall driver, because this will allow Neutron to clean up any iptables rules. The next big thing to be aware of is MTU, the maximum transmission unit, for your network. Existing clouds typically use GRE or VXLAN tunnels, and OVN uses Geneve tunnels.
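As a concrete example of the firewall driver prerequisite, here is a sketch of how you might switch it over, either through the neutron-openvswitch charm or directly in the ML2 agent configuration. The option names are to the best of my knowledge and should be checked against your release:

```
# Charm-based deployments: ask the neutron-openvswitch charm to use
# the Open vSwitch firewall driver instead of the iptables hybrid one.
juju config neutron-openvswitch firewall-driver=openvswitch

# Manual deployments: the equivalent setting lives in the
# [securitygroup] section of openvswitch_agent.ini; restart the agent after.
#   firewall_driver = openvswitch
```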
The GRE overhead for each packet is 22 bytes. For VXLAN, it's 30 bytes. For Geneve, it's 38 bytes. The question you need to ask yourself is: does your cloud and network equipment have headroom available to increase the packet size? If it does not, the only way forward is to reduce the MTU on each individual instance. This can be accomplished through DHCP for instances that use automatic IP configuration. Do bear in mind that time needs to pass between changing the configuration and the instances actually picking it up when renewing their leases. It's also worth checking that even though a workload configures its IP address using DHCP, it may still statically configure the MTU, so you should go through your typical workloads and validate that they actually take the MTU from DHCP. Instances using IPv6 auto-configuration will adjust their MTU automatically in real time when OVN takes over sending the router advertisements. Statically configured instances require user intervention prior to migration, so you would have to notify the users of your cloud to change their configuration before doing the migration.

Then some information about control plane availability and data plane downtime during the migration. When you are migrating, whole regions must be migrated within a limited timeframe. The reason for this is that during the migration, the ML2 OVS control plane will not be able to communicate with the Neutron server, so you will not be able to make changes if something happens. The Neutron agents, the components programming the hypervisors, will also get into trouble if they are without communication with the Neutron server for too long. You can migrate one hypervisor at a time or do all of them at once, at your discretion.
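The headroom question above is simple arithmetic. Assuming a conventional 1500-byte underlay MTU and an IPv4 underlay, the largest instance MTU each tunnel type leaves you is:

```shell
# Per-packet encapsulation overhead on top of the inner frame:
# GRE 22 bytes, VXLAN 30 bytes, Geneve 38 bytes.
underlay_mtu=1500
for entry in GRE:22 VXLAN:30 Geneve:38; do
    name=${entry%:*}
    overhead=${entry#*:}
    echo "$name instance MTU: $((underlay_mtu - overhead))"
done
```

So moving from VXLAN to Geneve without touching the underlay means instance MTUs have to drop by a further 8 bytes; if your switches can carry larger frames, raising the underlay MTU instead avoids touching the instances at all.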
But do bear in mind that instances running on the ML2 plus OVS estate will not be able to talk to instances running on the ML2 plus OVN estate during the migration window. Control plane and individual hypervisor downtime depends on the number of instances and the load. As you will see in our demonstration soon, we built a three-node cloud on modest hardware running 1,000 instances across 100 projects. The control plane downtime we had there from start to finish was 25 minutes. The hypervisor data plane downtime was about three minutes to clean up after ML2 OVS, and we used about one and a half minutes to do the initial configuration and start up the OVN controller.

Now let's talk quickly through the steps of a manual migration, first for the control plane. That involves, as we discussed, reducing the instance MTU, if required, prior to starting anything. You need to deploy the OVN central components so they are ready for use and for synchronization with Neutron. You need to stop the Neutron agents on the hypervisors as a first step. The next thing to do is adjust the MTU on the networks, if that's required. This is important to do before synchronizing with the OVN database, so we get it right from the start. The tool for this is neutron-ovn-migration-mtu, which is part of the upstream Neutron project. The next step is to migrate the Neutron data into OVN, and we do this with the neutron-ovn-db-sync-util tool. An optional step is to change the project network types to Geneve in the Neutron database. The Charms project provides a script you can use to do this; it's open source, and we're planning to upstream it to Neutron.

The steps on each hypervisor are to clean up after ML2 OVS. This consists of removing the patch ports on the integration bridge and removing the whole tunnel bridge.
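The control plane steps above can be sketched as a command sequence. This is illustrative only: the two utilities are the upstream Neutron tools named in the talk, but the exact configuration file paths and flags are assumptions you should verify against your deployment.

```
# 1. Stop the Neutron agents on all hypervisors (method depends on tooling).

# 2. Adjust network MTUs in the Neutron database before the first sync.
neutron-ovn-migration-mtu --config-file /etc/neutron/neutron.conf update mtu

# 3. Sync the Neutron database into the OVN northbound database.
neutron-ovn-db-sync-util \
    --config-file /etc/neutron/neutron.conf \
    --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
    --ovn-neutron_sync_mode migrate

# 4. Optionally convert project network types to Geneve in the Neutron
#    database (the Charms project ships a script for this).
```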
The reason this is very important is that once we start ovn-controller, it will program tunnels and patch ports in a different manner than ML2 OVS, and if both of these are present on the hypervisor at the same time, you will have created a very effective infinite loop generator that can bring your network down. You need to remove the bridge controller configuration, which Neutron sets up and which is no longer used. And you need to remove all the namespaces created by the Neutron agents: the DHCP and metadata namespaces, et cetera. The neutron-netns-cleanup script can be used to do this. If you used the iptables driver at some point, you can use the neutron-ipset-cleanup tool to clean up any ipsets created by Neutron. The last step is to start the OVN controller.

Now I want to demonstrate how this can be accomplished with automation, using the charms. For the demonstration, we have prepared a three-node cluster running OpenStack Victoria. As you can see, it's deployed in a hyper-converged topology, which means that every node is running Ceph OSDs for storage, and we have the Nova compute units, which represent the hypervisors. There is no dedicated gateway here, as we are using the Neutron Open vSwitch ML2 driver with DVR enabled, and we are running the DHCP and metadata services on each compute node. We have prepared a number of instances, a thousand to be exact, spread across around 100 projects. Each of those projects has its own router, its own network, its own subnet, et cetera, which adds up to a number of ports that is useful for showing how the migration performs under load.

So the first thing we need to do is get the control plane components ready. As we showed in the previous slides, we need to add the OVN central components, and we have prepared that by adding a bundle, which defines the model the charms are deployed in.
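Done by hand, the per-hypervisor cleanup and cutover roughly corresponds to the following. The bridge and port names (`br-int`, `br-tun`, `patch-tun`) are the Neutron ML2 OVS defaults and may differ in your deployment, and the configuration file paths are assumptions:

```
# Remove the patch port towards the tunnel bridge, then the tunnel
# bridge itself, so ovn-controller can build its own tunnels cleanly.
ovs-vsctl del-port br-int patch-tun
ovs-vsctl del-br br-tun

# Remove the OpenFlow controller configuration Neutron placed on the bridge.
ovs-vsctl del-controller br-int

# Remove namespaces left behind by the DHCP, metadata, and L3 agents.
neutron-netns-cleanup --force --config-file /etc/neutron/neutron.conf

# Only needed if the iptables firewall driver was ever in use.
neutron-ipset-cleanup --config-file /etc/neutron/neutron.conf

# Finally, hand the hypervisor over to OVN.
systemctl start ovn-controller
```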
We can see the difference from the current bundle, which boils down to adding a neutron-api-plugin-ovn unit. That's basically a subordinate unit which adds the required connections for Neutron to talk to OVN, and it gets some extra configuration and a relation to the certificate authority, because, as we said, OVN uses TLS for authentication. The OVN central components are defined here, and we add them to three LXD containers for HA. And we add ovn-chassis, which is the OVN counterpart of the Neutron Open vSwitch charm. So let's go ahead and deploy that onto our existing deployment, and we can watch the progress here.

It's important to note that in the bundle we added a configuration option for the ovn-chassis charm which instructs it to start with the units paused. This is very important at this stage of the migration, because on the hypervisors we already have Neutron ML2 OVS managing the local Open vSwitch, and if we were to have both ML2 plus OVS and OVN trying to manage the hypervisor at the same time, we would get a network loop.

Now we have deployed the control plane components, and those changes have been made to the running cloud. We can confirm that we have connectivity to an instance on each hypervisor: we have three instances with a floating IP across three different hypervisors. Let's see if we can ping them. All right, connectivity confirmed.

Now let's start the actual migration. The first thing we will do is pause the Neutron agents on the hypervisors. Let's wait for that to settle. The next step is to migrate the MTU of the Neutron networks. Now we will enable the OVN plugin on the Neutron API so that we can start the initial sync. We can still ping the instances, as you can see. Now we will pause the Neutron API units, and then we can start the Neutron to OVN DB sync. That's done.
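With the charms, the sequence so far is driven through Juju. The sketch below reflects the demo; the overlay bundle filename is hypothetical, and the charm option and action names (`new-units-paused`, `pause`, `migrate-mtu`, `migrate-ovn-db`) follow the OpenStack Charms migration procedure as I remember it and should be checked against the charm documentation for your release:

```
# Deploy the OVN control plane alongside the running cloud, with the
# chassis units starting paused so ML2 OVS stays in charge for now.
juju deploy ./ovn-migration-bundle.yaml   # hypothetical bundle name
juju config ovn-chassis new-units-paused=true

# Pause the Neutron agents on the hypervisors.
juju run-action --wait neutron-openvswitch/0 pause

# Migrate network MTUs, then sync the Neutron database into OVN.
juju run-action --wait neutron-api-plugin-ovn/0 migrate-mtu
juju run-action --wait neutron-api/0 pause
juju run-action --wait neutron-api-plugin-ovn/0 migrate-ovn-db
```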
Now that we have the Neutron control plane down anyway, we will also perform the offline migration of the network type. Then we resume the Neutron API unit. Now we are ready to start the hypervisor migration. You can see that our instances are still responding to ping. For the sake of simplicity in this demonstration, we will migrate all hypervisors at the same time. We will start by running the cleanup action on the neutron-openvswitch units. As you can see, the instances soon stop responding to ping; this is of course because we are removing the configuration from the hypervisors. We wait for that task to complete. Now the cleanup process is complete, and we resume the ovn-chassis units, which performs the initial configuration of the OVN controller and starts it. As you can see, the instances are now starting to respond again. We completed the migration losing around 300 pings, which means a total downtime of five minutes. Thank you for watching. Are there any questions?