Hello. Today our topic is seamless migration from nova-network to Neutron in eBay production. My name is Chen Yuan; I'm from the eBay Cloud team in Shanghai. Hello, everyone. I'm Han, and I'm very glad to be here to share our story with you. Thank you.

Okay. So this is our agenda today. First, we will give an overview of the eBay Cloud environment. Next, we will give the details of the migration steps: I will introduce the control plane part, and Han will cover the data plane migration and the post-migration work. Finally, we will do a demo of this nova-network to Neutron migration.

Okay. This figure shows our original Folsom network environment. On the right side are the OpenStack controller nodes; these are quite normal. The nodes on the left side are all nova-compute nodes; in eBay we also call them hypervisor nodes. We configured them in nova-network multi-host mode, so you can see there is a nova-network process running on each compute node. There are also several dnsmasq processes running on the compute nodes; they provide the DHCP service for all the local VMs. All the VMs are running in bridge mode: in Folsom, Linux bridges are used. In our configuration there are two interfaces for each VM, and each of them connects to one Linux bridge. So this is the Folsom environment.

This figure shows our target Havana environment. On the right side, all the OpenStack components are migrated to Havana, with the Neutron service running. In addition, we also have SDN controllers there; we are using VMware NSX as our SDN controller. On the nova-compute side, you can see there is no nova-network running, no dnsmasq, and the Linux bridge is not there either. Instead, we install Open vSwitch, and all the existing VMs are still using bridge mode, but the bridge interfaces are all migrated to Open vSwitch bridge interfaces. So this is our target.

Here I will describe how we do the control plane migration. First, we keep all the Folsom OpenStack controller nodes in place. Meanwhile, we set up the Havana OpenStack controller nodes, including Keystone, Nova, Glance, Neutron, RabbitMQ, MySQL, all of that. Then we create another set of nodes for the NSX controllers. For the NSX part, we first need to create a transport zone. The transport zone is a concept from NSX; it is just for underlay network connectivity. Each hypervisor interface must belong to one transport zone so that the SDN controller can understand the topology of the whole underlay network. That's why we must create this transport zone. We also must register the hypervisors with the NSX controller. This is the first step.

The next step is the DB migration. For Keystone, Glance, and Nova, this is quite simple: we just export the databases from Folsom, import them into Havana, and run the DB sync. The next part is Neutron. As we know, there is no Neutron DB in Folsom, but we still need to migrate all our network configuration from Folsom to Havana. So we read the network information from the Nova DB tables, the networks and fixed_ips tables, and then we call the Neutron network and subnet create APIs to import this information into the Neutron DB. This completes all the network and subnet creation from Folsom to Havana.
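To make this network and subnet migration concrete, here is a minimal sketch using the Neutron CLI. We read from the Nova DB tables named above; the network name, tenant ID, CIDR, and gateway below are hypothetical examples, and our real migration made the equivalent API calls from a script.

  # Read the Folsom network definitions from the Nova DB (illustrative query).
  mysql -N nova -e "SELECT label, cidr, gateway FROM networks;"

  # Recreate each network and subnet in Havana Neutron.
  neutron net-create prod-net --tenant-id $TENANT_ID
  neutron subnet-create prod-net 10.244.248.0/21 \
      --tenant-id $TENANT_ID --name prod-subnet --gateway 10.244.248.1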
After the network and subnet creation, the next step is port creation. One thing must be noticed: the VMs are already running, so the tap devices are already there. So when we do the port creation, we must not touch anything on the hypervisor; we must not create a second tap device. For this purpose, we enable the fake virt driver in our nova-compute. Then we start the port creation. For port creation, we get the MAC address and IP information from the Folsom database, and we get the network ID and subnet ID from the Neutron database we just created. After we have this port information, we call the Neutron create-port API and then attach the port to the VM. So this port information is now in the Neutron DB and the Nova DB, and it is also in the NSX controller. And you can see that because the fake driver is there, the OpenStack controller doesn't actually do anything on the hypervisor side; the hypervisor stays exactly the same.

Okay, the last step is about security groups. In Neutron, the NSX plugin will create a default security group with a self-referencing rule (the group's own UUID) for ingress, and the purpose is to only let same-tenant traffic in. But as we are migrating from Folsom to Neutron, we need to stay compliant with the previous network firewall rules, so we don't want this limitation. That's why we update the security group for each port: we only keep simple rules that allow our traffic through. And this is all for the control plane part. Now we go to the data plane. Han, please.
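As a rough sketch of the port creation and security group update described above, again with the Neutron and Nova CLIs (the MAC, IP, the UUID variables, and the exact rule set are hypothetical; our script made the same calls through the API):

  # Create a Neutron port matching the running VM's existing NIC.
  # MAC and IP come from the Folsom Nova DB; network and subnet IDs
  # come from the Neutron DB we just populated.
  neutron port-create $NET_ID \
      --mac-address fa:16:3e:aa:bb:cc \
      --fixed-ip subnet_id=$SUBNET_ID,ip_address=10.244.248.115

  # Attach the port to the VM. With the fake virt driver enabled,
  # nothing is touched on the hypervisor itself.
  nova interface-attach --port-id $PORT_ID $VM_UUID

  # Replace the NSX plugin's self-referencing ingress rule with a simple
  # allow rule, to stay compatible with the previous firewall behavior.
  neutron security-group-rule-delete $SELF_REF_RULE_ID
  neutron security-group-rule-create $SG_ID \
      --direction ingress --remote-ip-prefix 0.0.0.0/0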
Thanks, Chen Yuan. Okay. The data plane migration has to be done for each hypervisor node, so here is one hypervisor as an example. For a hypervisor node, data plane migration means moving the live traffic of all the VMs running on that node from the Linux bridge to Open vSwitch. How many of you have been using Open vSwitch? Could you raise your hands? Okay, great. So our goal is moving to Open vSwitch so that we can use the SDN controller to control all the flows. That's our goal. And the key requirement, as our topic says, is seamless. Seamless means that as we migrate the flows, we need to keep the connection downtime as short as possible, so that the applications running on these VMs are impacted as little as possible. That's our requirement.

The first step is just preparation: we install the OVS package. Then we remove the VM interface from the Linux bridge; if you ask for the command, it's just a brctl delif command, pretty straightforward. And now the connectivity for all the VMs is lost. Please note that this picture shows just one vnet interface, but that's for simplicity: in reality every VM has a tap device connected to the Linux bridge. So you can start counting now, to see how fast we restore the connections.

Next we rename the interface, because we are moving from Folsom to Havana and Havana has a different tap naming convention. So we just rename the interface name. Then we remove the physical interface from the Linux bridge, and now we can remove the Linux bridge itself and its kernel module. Then we start the OVS service. You might ask why we didn't start the OVS service before cutting the connection. It's because in our environment the hypervisors doing this migration are running Red Hat (we have Ubuntu in other environments, but for this task it's Red Hat), and we have two versions: the old version is Red Hat 6.3 and the new one is 6.4. Part of the hypervisors run 6.3, the others run 6.4. For 6.3, there's a limitation that OVS and the Linux bridge cannot run at the same time on the same hypervisor, because of a Linux kernel API conflict. That's why we do it in this order. So this is for 6.3, the worst case. For 6.4, we just start the OVS service before cutting the connection.

Okay, now we set up the integration bridge. The integration bridge, br-int, is the one supposed to be controlled by the SDN controller. And br0 corresponds to the physical interface. In our real environment there were two physical interfaces, but for simplicity only one is shown. Then we attach the physical interface to br0, and for the hypervisor itself the connectivity is now restored; sorry, after also moving the host IP and the routing entry for the default route over to br0.

Okay, the next step is to add the VM tap devices back, onto the OVS bridge. So what happens now: is the connectivity restored? Right, no. Why? Because with OVS, no one can send or receive traffic until the SDN controller says so, and right now we don't have a controller. So the next step is to set the SDN controller; in our case it's NSX. What happens in this step? First, the hypervisor connects through OVSDB to the SDN controller. As mentioned by Chen Yuan in the control plane migration, we have already registered the hypervisor with NSX and already created the bridge transport connectors. So the NSX controller now knows that this hypervisor is up, and it starts configuring it: you can see this connection between br-int and br0; it's a patch interface created as instructed by the controller, so that bridged traffic can go through.

Also, if you noticed, there is a Neutron port UUID given when we add the VM interface, and this is very critical information. In the control plane migration we already created the Neutron port, and the port UUID was populated to the SDN controller through the NSX Neutron plugin; it's in the NSX database. Now we attach the same information, the Neutron port UUID, to the OVS port, and OVS populates it through OVSDB to the SDN controller. So the SDN controller knows: okay, this port is alive on this hypervisor. It calculates what flows need to be pushed down to this hypervisor for this port. Once all the VM ports on the hypervisor node have their flows, the traffic connections are restored.

So how long did it take? Maybe 10 minutes? We are using scripts, and in our real production environment it's about three to five seconds. I think that's okay for most of our applications. It really depends on how fast the NSX controllers react to the OVSDB connection, how fast they calculate the rules for the ports, and the connection latency between the hypervisor and your SDN controller; those are the factors for the speed.

Maybe I should also mention that this is for bridge mode, but the same solution can be applied to overlay. If it's overlay, in the control plane you just set up not a bridge connector but a tunnel connector, an STT connector or a VXLAN connector, whatever; and in this step, when the OVSDB connection is established, the SDN controller will instruct the hypervisor to set up the STT or VXLAN tunnels. So the steps are very similar.
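Putting the data plane steps together, here is a minimal sketch of the per-hypervisor command sequence, assuming the Red Hat 6.3 ordering described above (the bridge and device names, the Havana-style tap name, and the NSX manager address are illustrative; the real script loops over all VM interfaces and checks every step):

  # 1. Cut over: detach the tap and the physical NIC from the Linux bridge.
  brctl delif br0 vnet0
  ip link set vnet0 down
  ip link set vnet0 name tap3a4f1c2d-9e   # rename to the Havana tap convention
  brctl delif br0 eth0
  ip link set br0 down
  brctl delbr br0
  rmmod bridge                            # 6.3 only: bridge module conflicts with OVS

  # 2. Bring up OVS and restore host connectivity on br0.
  service openvswitch start
  ovs-vsctl add-br br-int                 # integration bridge, controlled by NSX
  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 eth0
  ip addr add $HOST_IP/$PREFIX dev br0    # move the host IP and default route
  ip route add default via $GATEWAY

  # 3. Re-attach the tap with its Neutron port UUID, then point OVS at NSX.
  ip link set tap3a4f1c2d-9e up
  ovs-vsctl add-port br-int tap3a4f1c2d-9e \
      -- set Interface tap3a4f1c2d-9e external-ids:iface-id=$NEUTRON_PORT_UUID
  ovs-vsctl set-manager ssl:$NSX_CONTROLLER_IP:6632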
And after this, we're not done yet. We need to be prepared for things like a network service restart on the host, or a host restart. So we need to update the ifcfg interface configuration files so that the OVS information is there; on the next network service restart, the OVS bridge will be brought up for this interface. If you're using Ubuntu, it's just a different file location; it's similar.

And you need to be prepared for the next VM reboot. This is a libvirt thing. What we did was move the interfaces from the Linux bridge to the OVS bridge without telling libvirt, but libvirt is the one that is supposed to manage that. So we need to update its configuration. The standard way is virsh define: you update the VM's XML configuration file with the virtualport type openvswitch and the Neutron port UUID in its parameters. But this is not enough, because the libvirt configuration only takes effect the next time you start the VM; the runtime information is still not updated, even after you execute virsh define. Our solution is a bit of a hack: we found you can just update the runtime XML file under the /var/run/libvirt directory and then restart the libvirtd service, and it takes effect. Without this, what would happen? When you shut down the VM, libvirt would not know to do the cleanup job for the OVS port, so the OVS port information would not be cleaned up; and the next time you try to start the VM with libvirt, there would be a conflict and it would fail. That's why we did this, and this is very important for us. OK.
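To illustrate these two persistence steps on Red Hat, here is a sketch of what the files might look like (the device names, IP placeholders, MAC, and port UUID are hypothetical, and the exact ifcfg keys should be checked against your Open vSwitch packaging):

  # /etc/sysconfig/network-scripts/ifcfg-br0
  DEVICE=br0
  DEVICETYPE=ovs
  TYPE=OVSBridge
  ONBOOT=yes
  BOOTPROTO=static
  IPADDR=<host ip>
  NETMASK=<netmask>

  # /etc/sysconfig/network-scripts/ifcfg-eth0
  DEVICE=eth0
  DEVICETYPE=ovs
  TYPE=OVSPort
  OVS_BRIDGE=br0
  ONBOOT=yes

  # VM <interface> definition, applied with `virsh define` and mirrored into
  # the runtime XML under /var/run/libvirt before restarting libvirtd:
  <interface type='bridge'>
    <mac address='fa:16:3e:aa:bb:cc'/>
    <source bridge='br-int'/>
    <virtualport type='openvswitch'>
      <parameters interfaceid='$NEUTRON_PORT_UUID'/>
    </virtualport>
    <target dev='tap3a4f1c2d-9e'/>
  </interface>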
OK. So: rollback. This is very important; you don't want to be surprised. No matter how confident you are, you always need to be prepared, because you never know what will happen. Maybe a hypervisor just has some configuration drift, or some configuration you never expected, or maybe something bad happens when you try to start the OVS service. You don't know. So the key is to be prepared. Fortunately, the steps we described before can simply be reverted step by step: you detach the VM interfaces from OVS, shut down OVS, bring up the Linux bridge, and attach the interfaces to the Linux bridge again. That's all the work. But you need to make this all automatic: you detect whether the migration is still ongoing, you keep pinging, you keep checking which step you are at, things like that. Then you don't have to panic when something happens. We are all adults, right? OK. So Chen Yuan will give you a recap.

OK, thanks, Han. So before the demo, let me summarize all our steps in one figure. Here, this is the environment, everything on Folsom. First, we shut down the Folsom nova-network and nova-compute, then start the Havana nova-compute, with the fake driver first. Then the control plane part and the port creation are done. Then we install Open vSwitch, as Han has described. The next part is the data plane: remove the interfaces from the Linux bridge and delete the Linux bridge; then create the Open vSwitch bridge interfaces, attach the tap devices and attach eth0; then set the manager to the NSX controller, and then br-int and br0 get connected. At this point, the traffic is working again. And finally, we update some files to prepare for the next VM reboot or hypervisor reboot, so the system doesn't break. Our main migration code covers exactly this part. The database migration is not covered here, as it is one-time work: once it's done, the migrations of all the other hypervisors don't need to do it again. So our script focuses on migrating one hypervisor.

OK, so, sorry, one question. Yeah. Flat DHCP, you mean in Folsom? Not VLAN mode, actually. For Folsom, it's nova-network multi-host mode, and you could see the DHCP at the very beginning: dnsmasq is running on each hypervisor. So it's FlatDHCP mode, I think.

OK, so let's go to the demo. In this environment, we have already done the control plane part: the database has been migrated, so the Folsom Nova DB has been migrated to Neutron. This window, this command line, is for Folsom. We can see there is one virtual machine running on the hypervisor, and its IP address is 10.244.248.115. As this is Folsom, you can see the Neutron service is not there. As we have already done the DB migration, you can see that nova list still works here; but please notice that the network information is empty, as we haven't done the port creation and port attachment yet. The Neutron port is also not created, so this is empty. This node, NC003, is the hypervisor node, and we can see that one KVM instance is running here, using Linux bridge mode; the tap device name is vnet0. OK, we just ping the VM here: the VM can be pinged from outside, and we keep that running.

Now we start the migration script. It does just what the figure described. First is the control plane part: it gets the port information from Folsom and does the port-create work in Neutron. Then it goes to the hypervisor and does the bridge interface detach, and also the IP and routing migration work. Here it has uploaded the Open vSwitch package and is installing it. And just watch this ping to the VM: the VM is still pingable, so you can see how long the traffic is broken. We may still need to wait some time while it installs the packages. Yeah, we can see whether it succeeds. Now I think everything has passed. And this is 6.4, so we don't do the bridge module removal. Yeah, now it's done, and I think the ping is still working. So it's very short; you don't even notice how long the traffic is broken, as it's 6.4, which is easier than 6.3. So almost no traffic is broken; our SDN environment here is also simple, as we only have one VM. This is our lab, not the real production; it's a lab demo. And here we can see OVS: all of the Open vSwitch interfaces are created, the tap device has been renamed and attached to br-int, and the VM is still working. Okay.
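For reference, the checks shown in the demo map to commands along these lines (illustrative; the node name, VM IP, and what each shows are the ones from the demo):

  nova list              # the VM, with network info filled in after port attach
  neutron port-list      # the migrated Neutron port
  virsh list             # the KVM instance on hypervisor NC003
  ovs-vsctl show         # br-int, br0, the patch ports, and the renamed tap
  ping 10.244.248.115    # continuous ping to watch the cutover window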
Thank you. Thank you. Any questions? Yeah, please. Sorry? The code? The code is still internal, but if you want, maybe we can post it to GitHub or something; we need to discuss it within the company, but I think it would be okay. We will also upload the slides somewhere and publish them. Yeah, sure. Yeah, please. The Havana nova-compute has actually been set up beforehand, yes, prepared for the migration. Sorry? Roughly how many hypervisors? Okay. We have done this verification in eBay production on several nodes so far, several nodes in one rack, as the whole migration plan is postponed because the shopping season is coming. We will continue the whole migration process after the shopping season. It has been verified in the production environment and everything went smoothly, and the target environment for this is nearly 1,000 hypervisors. Yes. Sorry? The MAC address: it's bridge mode, so the MAC address was allocated back in Folsom when the VM started, and during the migration we reuse the same MAC address. Actually, we don't change anything on the tap device; we just detach it from the Linux bridge and attach it to the OVS bridge. Yeah. Okay. That's all for today. Thanks, everyone. Thank you. Thank you.