Yeah, it's working. Well, welcome to our session today. My name is Jack McCann, and with my colleagues here, Vivek, Rajiv, and Swami, we're going to present an architectural overview of the new DVR feature in Juno Neutron. We've got quite a bit of material to cover today, so I'm not going to go through the details of the agenda; I'm just going to jump right into it.

So, to set the stage: legacy routing in Neutron. What we have up here on top is the virtual configuration: a couple of VMs, a green network, a blue network, and a router connecting them out to the red external network. In a physical deployment, that might look like what's on the bottom here: a couple of compute nodes hosting the VMs, and a network node hosting the router function. Can I ask a question: how many folks run Neutron in this model? Okay, looks kind of familiar. Good.

In this model, that network node is doing a lot of work on behalf of the VMs. It provides the IP forwarding for inter-subnet traffic, the east-west traffic between the VMs shown by the blue line here. It also handles floating IP traffic north-south for VMs with floating IPs out to the external network; you can see the red line from VM2 going out to the external network. It provides a default SNAT function for VMs that don't have floating IPs; you can see that for VM1, the brown line going out to the external network. And access to the Nova metadata service typically tags along. So that network node is doing a lot of work, and the issues there are performance and scalability. It can also be a single point of failure: you lose that node, and you've lost all communication for the VMs behind that router.

Enter distributed routing. Same virtual model, with a couple of changes on the physical side. The first thing you'll notice is that there's now an instance of that router down on each of the compute nodes. You've got a router down here where VM1 resides, and you've got another instance of the same router down here on the second compute node. The router follows the VMs behind it to the relevant compute nodes; it will not go across all the compute nodes, only where it has to in order to follow the VMs. The other change is that the external network now stretches across the compute nodes, so you have direct external network access into each compute node.

What that gives us is that each compute node provides forwarding for inter-subnet traffic, and you can see the blue line now goes directly between the compute nodes instead of through that network node. The same goes for floating IP traffic: the red line goes from VM2 directly out to the external network, not through the network node. The metadata agent tags along, so you've got the metadata service out on all the compute nodes. This has nice scaling and performance properties, and it also has a really nice property in that it limits the failure domain. If you lose this router here, it only affects the VMs on that compute node; if you lose the whole compute node, you've lost those VMs anyway, and the rest of the VMs keep working. One limitation of the current implementation is that default SNAT is still centralized, so that traffic still goes through a network node.

So, high-level requirements and goals for DVR.
One of the first was to help close the feature gap with Nova, to help achieve Nova parity. Nova solved this problem three years ago in Diablo. Anybody out here running Nova Network? Nova Network multi-host? Okay, not so many. For those that did, this model should look familiar.

We wanted this to be a provider feature: tenants shouldn't have to know or care whether their routers are distributed or centralized. We wanted to be able to configure it on a per-router basis, with a global config knob that sets the default for new routers. An important one: we wanted to be able to deploy this into existing environments, so you can take the DVR code, put it into an existing environment, and still have the existing routers function. Then you might want to turn on some distributed routers, so centralized and distributed routers have to be able to coexist in the same cloud. Eventually, if you're confident enough, you might want to take some of the old centralized routers and migrate them to be distributed. Because the external network stretches across all the compute nodes, we wanted to minimize the use of public IP address space. And we wanted to leverage the existing code base; we didn't want to go and create new agents, things of that nature. With that as a basis, I'm going to turn this over to Rajiv to get into more of the details. Rajiv?

Thank you, Jack. To address the requirements Jack went over, we have made architectural changes at both the layer 2 and layer 3 levels. Through a combination of control plane and data plane changes, we have disaggregated the centralized routing plane such that most routing decisions can be made locally on the compute node where the VMs reside. All of these changes have been made within the OpenStack Neutron architectural framework, that is, plugins interacting with agents through RPCs, and users consuming the functionality through the API and CLI interfaces. There are very minimal changes to the API and CLI, because most of this has been accommodated within the existing interfaces.

Let me illustrate some of the high-level changes architecturally through this diagram. This is a typical Neutron deployment: a controller node here, a set of network nodes, and a set of compute nodes hosting the VMs. The first change is that the L3 agent is now deployed on each compute node. This brings the L3 control plane onto the compute nodes. Next, the router namespace as well as the subnet gateway ports are replicated onto each compute node where the relevant VMs reside. For instance, in this example there are two VMs on the red subnet, so the router that services the red subnet has its namespace and gateway ports created on this compute node; these are replicated on the nodes where the red subnet is present. Then, in order to allow floating IP access to the external network directly from the compute node itself, a floating IP namespace, one per compute node, is instantiated. This floating IP namespace has a port on the external network called the agent gateway port. Through the combination of the agent gateway port and the floating IP namespace, all the floating IPs on that external network are serviced directly from the compute node, irrespective of which router they belong to.
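To make that concrete, here is roughly what the namespace layout looks like on a Juno-era DVR deployment. This is a sketch; the truncated UUIDs are placeholders for the actual router, network, and external-network IDs:

```bash
# On a dvr_snat network node:
$ ip netns
snat-a4b2c6e8-...      # centralized default-SNAT namespace for the router
qrouter-a4b2c6e8-...   # the router namespace itself
qdhcp-91f0b2d4-...     # DHCP namespace for a tenant network

# On a compute node in dvr mode:
$ ip netns
qrouter-a4b2c6e8-...   # same router ID: the local replica of that router
fip-7c3d9f10-...       # one per external network, shared by all floating IPs
```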
So the overhead is down to just one extra IP address per compute node, but it provides direct connectivity.

Enhancements have also been made to the layer 2 OVS agent. As I mentioned, the subnet gateway ports are replicated on each compute node, but this replication is visible only to the VMs, not to the underlay and the intermediate switches. This is achieved through enhancements in the OVS agent, which Vivek will go over subsequently. As Jack mentioned, the SNAT functionality continues to be provided in a centralized fashion, so on the service node, the network node, you will still see a SNAT namespace. This SNAT namespace uses the external gateway port allocated to the router, which has always been the case for the legacy router as well; the same port is used for routing all the default SNAT traffic. And to maintain compatibility, all these changes have been made in such a way that legacy centralized routers and distributed virtual routers can coexist within the same deployment; the L3 and L2 agents have been enhanced to handle that.

Now let me switch gears and talk about how to configure DVR. We have added just a few parameters. The first is a parameter called router_distributed, on the server side. It determines what kind of routers are created when tenants create routers: if it is set to true, all routers created by tenants are of the distributed type; if it is false or absent, routers are legacy, that is, centralized. Next, I mentioned the L3 agent now appears on compute nodes as well as network nodes, so a mode has been defined for the L3 agent. On the network node, the L3 agent is expected to service DVRs, legacy routers, and the SNAT service, so agent_mode needs to be set to dvr_snat. On a compute node, it should be set to dvr. And if you just want the legacy model, set it to legacy, or leave the parameter out altogether for backward compatibility. Similarly, to enable the L2-level distributed routing functionality, there is a flag called enable_distributed_routing, which you have to set to true to support DVRs. Currently, DVR requires VXLAN: VXLAN tunneling and L2 population have to be enabled, and that's listed out here for your use.

Now, a lot of us do our development in DevStack, and there is a macro available to ease the configuration: Q_DVR_MODE, which goes in the local.conf file. To illustrate how it can be used: setting Q_DVR_MODE=legacy gets you the legacy setup. For a multi-node DevStack setup, set it to dvr_snat on the network node and to dvr on the compute nodes, and you get a deployment where the namespaces appear in the correct places and the L3 agents are spawned on the compute nodes. Further, as floating IPs are added, you will see the floating IP namespaces appear dynamically and get hooked up to the external bridge for routing external traffic. And you can still create legacy routers: there is an admin-level CLI option available to override the default setting and create legacy routers. That's about the configuration. Now I'm going to dive a little deeper and go over how north-south routing is accomplished.
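Pulling those knobs together, here is a minimal sketch of the Juno-era settings just described. File locations vary by distribution, and the values are illustrative:

```ini
# neutron.conf on the controller: default type for tenant-created routers
[DEFAULT]
router_distributed = True

# l3_agent.ini on a network node (services DVRs, legacy routers, and SNAT):
[DEFAULT]
agent_mode = dvr_snat

# l3_agent.ini on a compute node ("legacy", or omitting the option entirely,
# keeps the old centralized behavior):
[DEFAULT]
agent_mode = dvr

# OVS agent section of the ML2/OVS agent config, on all nodes; DVR currently
# also needs VXLAN tunneling and L2 population:
[agent]
enable_distributed_routing = True
l2_population = True
tunnel_types = vxlan
```

The DevStack shortcut is then Q_DVR_MODE=legacy, Q_DVR_MODE=dvr_snat, or Q_DVR_MODE=dvr in local.conf, according to the node's role.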
As Jack mentioned, there are two flavors of north-south routing, and most of you are well aware of them: floating IPs and default SNAT. Let's start with floating IPs. Here I'll take the example of a VM, the red one, sending traffic out to the external network using a floating IP. The traffic arrives from the VM into the router namespace. At this point, in the router namespace, we have a set of IP rules. These IP rules categorize the traffic as either default SNAT or floating IP traffic, and depending on the categorization, different routing entries apply. In this case, since this is floating IP traffic, a routing entry applies that forwards the traffic down to the floating IP namespace. Before the traffic leaves the router namespace, iptables NAT rules configured for the floating IP translate it, and it ends up in the floating IP namespace; as mentioned earlier, there is one for the whole compute node. The floating IP namespace has an external port on the external network, and the traffic goes out through that path. To support incoming traffic from the external network to the VM's floating IP, the floating IP namespace has a host route that forwards this traffic back to the qrouter namespace so it can be delivered to the right VM. It also has proxy ARP enabled, which allows ARP requests from the external network for the floating IPs to be answered.

Now let's switch to the other form of north-south: default SNAT traffic. In this case, we have a SNAT namespace created on the network node. This SNAT namespace has ports on the internal networks as well as the external gateway port onto the external network. In a similar illustration, VM1 sends out traffic. The traffic arrives at the router, and this time the IP rules determine that it is default SNAT traffic, so the traffic is forwarded across to the SNAT namespace. That leg travels just as if it were east-west traffic, which Vivek is going to talk about more after this. Once the traffic reaches the SNAT namespace, the NATing and connection tracking take place, and the traffic is sent out of the external port onto the external network. Now I'll hand it off to Vivek for east-west.

Thanks, Rajiv. We're going to look at how east-west routing happens in DVR. For east-west routing to work, three sets of elements inside the current Neutron architecture have to interoperate in order to push a routed packet out of the compute node. The first is the router namespace itself, represented as QR: the distributed router namespace hooked onto the integration bridge, responsible for taking in packets and routing them out, like a normal router does on a network node. The second is the LMAC, the local MAC: for every compute node that runs Neutron, we designate a unique MAC, called the DVR local MAC, and it is this DVR local MAC that is carried in all the egress frames pushed out of the compute node as part of distributed routing. The third element is a set of OVS rules in the OVS bridges, that is, in the integration bridge and the tunnel bridge. These OVS rules, importantly, identify which packets are distributed-routed packets.
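As a rough sketch of what those IP rules can look like inside the qrouter namespace on a compute node (the addresses, priorities, table numbers, and UUIDs here are placeholders, not exact Juno output):

```bash
$ ip netns exec qrouter-<uuid> ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
...:    from 10.0.0.4 lookup <fip-table>      # fixed IP that has a floating IP
...:    from 10.0.0.0/24 lookup <snat-table>  # everything else: default SNAT

# The floating-IP table points at the veth pair into the fip- namespace:
$ ip netns exec qrouter-<uuid> ip route show table <fip-table>
default via 169.254.31.29 dev rfp-<uuid>   # link-local hop into fip-<net-uuid>
```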
If they recognize a distributed-routed packet, they replace the source MAC of the frame coming off the integration bridge with the unique DVR local MAC assigned to this compute node. That's what they do on egress; and while egress packets get their source MACs translated to the unique LMAC here, the complementary action happens on the destination compute node, where a received unique LMAC is translated back to a local DVR router interface MAC.

Let's go a bit deeper into egress. This is a simple diagram showing two VMs, one belonging to the red network and one belonging to the green network, and these two VMs have to talk to each other, so we need a router to route traffic between them. With a distributed router in place, the VM first sends the frame to its own gateway, using the gateway's MAC, and sends the IP packet to the gateway. The integration bridge knows the QR router is here, so it forwards the frame to the QR interface. The QR interface is the distributed router, so it takes the frame, strips the L2 header, looks at the packet, figures out the MAC of VM2, rewrites the destination MAC to VM2's MAC, puts the green interface's MAC in as the source, and pushes it back to the integration bridge. The traffic then flows to the tunnel bridge, and the tunnel bridge recognizes that this is a distributed-router packet, so it takes responsibility for swapping that QR green MAC for its own DVR local MAC. Then, say you've configured VXLAN, it adds the global VNI, which is the green VNI here because it's a packet on the green network, puts in its own local MAC, and forwards it into the data network.

On the receiving end, the tunnel bridge receives the frame and doesn't do much. It knows the local VLAN for this green VNI, so it substitutes the green VNI with its local VLAN and forwards the frame, still carrying the other compute node's local MAC, all the way to the integration bridge. The integration bridge recognizes that this is a distributed-routed packet, rips off that local MAC, and puts in the equivalent green router interface MAC. The router interface MACs and router interface IPs are the same in all the QRs: for any given distributed router, they're identical wherever the router is replicated. And since packets are only routed at the source, at the destination the packet reaches the VM directly: the integration bridge translates the LMAC to the QR green MAC and delivers it to the destination VM. That is how distributed routing is actually accomplished here.

One more important thing: while the tunnel bridge does the translation of the router interface source MAC to the LMAC, the tunnel bridge also has L2 population turned on, so it is educated by the OpenStack Neutron server about exactly which destination node it has to send this frame to.
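On where those LMACs come from: each host's unique DVR MAC is drawn from a server-side pool and handed to the agents. A sketch of the relevant Juno server option follows; the value shown is the stock default:

```ini
# neutron.conf on the server: pool that per-host DVR local MACs are carved
# from. Keeping it distinct from base_mac makes DVR-routed frames easy to
# spot in packet captures on the underlay.
[DEFAULT]
dvr_base_mac = fa:16:3f:00:00:00
```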
So it knows which VTEP to forward this frame to, and it puts it out the right VTEP, so that the frame goes straight from the source node to the destination node with no intermediate flooding in between. That is how the typical east-west packet flow works with DVR enabled in Neutron.

Let's go a bit deeper into what happens in the bridges themselves. As we saw in the last figure, the integration bridge and the tunnel bridge cooperate with each other to figure out that a packet is distributed-routed, how to send it out, and also how to consume it on the way in. We introduced three new tables: the DVR process table in the tunnel bridge, and, to handle ingress, a DVR-to-LMAC table in the integration bridge plus a DVR learning-blocker table in the tunnel bridge.

Going back to the egress logic, this is how a distributed-routed frame is sent out of a compute node. The packet comes from the VM here, and this rule won't hit, because the packet's source MAC is the red VM's MAC, so it goes through the NORMAL OVS action, which forwards it to the locally hosted distributed router. The distributed router understands the packet is for itself, because it carries the local router interface MAC, so the QR here routes the traffic and sends the routed traffic back to the same table maintained by the integration bridge. Again the rule won't hit, because the source MAC is now the QR interface MAC, so again the NORMAL action applies, which pushes the packet into table 0 of the tunnel bridge. There, the tunnel bridge figures out that the packet came from the integration bridge, so it passes it to a special table, the DVR process table.
Normally a packet would go from table 0 directly to table 2, but here we interrupt the traffic and send it to the DVR process table. The DVR process table checks whether the frame's source MAC is a router interface MAC, that is, one of the replicated router MACs. If it is, it concludes this is a distributed-routed packet, so it translates the source MAC to the DVR LMAC assigned to this particular compute node and forwards the frame to table 2. From that point on, the action is as before: the green frame is forwarded to table 20, and table 20 uses the pre-populated L2 population rules to figure out, based on the destination MAC in the frame, which VTEP to forward it out of; only that particular tunnel port is chosen to carry the frame. The translation from the local VLAN to the VNI happens in this table, and the determination of whether a packet was distributed-routed is done by the new table interposed between table 0 and the earlier table 2. That is how an egress packet flows all the way to the destination node.

Now let's look at ingress. The frame arrives at the destination compute node, hits the tunnel port, and comes into the tunnel bridge. If it is a VXLAN tunnel port, the bridge forwards it to the table where the tunnel ID is examined: the green VNI carried in the frame is translated to the local green VLAN used on that compute node, say VLAN green-2, and the frame is forwarded to table 9. This table is again important, because we are using a standard local MAC in the underlay, and we do not want to learn that local MAC: the same local MAC is potentially used to route traffic across many tenant VMs. So we ensure the MACs in these incoming frames are not learned; we bypass the MAC learning that is normally done in the tunnel bridge, and the frame is forwarded directly to the integration bridge. At this point, the frame still carries the remote node's local MAC, but the green VNI has already been translated to the green VLAN. The integration bridge receives the frame and figures out that it is a DVR-routed frame, because the source MAC carries the DVR LMAC of some other node, so it forwards it to another new table, which takes care of translating that unique DVR MAC into the bridge's own replicated router interface MAC. The crux here is that every node is aware of the other nodes' DVR local MACs as well as its own; that logic operates here. So this node knows the frame is a routed packet from some other node, and based on that, the LMAC is stripped and the proper local router interface MAC is reinserted. After that is done, since the integration bridge knows where the VM port is attached, it forwards the traffic directly to the destination VM. And that is how ingress from the cloud operates.
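To make the table walk concrete, here is an illustrative sketch of what such flows could look like if you programmed them by hand with ovs-ofctl. The table numbers follow the talk's description, and the MACs, VLANs, and port numbers are placeholders, not actual agent output:

```bash
# br-tun, DVR process table (egress): a frame whose source MAC is a
# replicated router-interface MAC gets it rewritten to this host's DVR LMAC.
ovs-ofctl add-flow br-tun \
  "table=1,priority=4,dl_vlan=2,dl_src=fa:16:3e:aa:bb:cc,\
actions=mod_dl_src:fa:16:3f:00:00:01,resubmit(,2)"

# br-tun, learning-blocker table (ingress): frames from a remote DVR LMAC
# bypass the normal MAC-learning path and go straight to br-int.
ovs-ofctl add-flow br-tun \
  "table=9,priority=1,dl_src=fa:16:3f:00:00:02,actions=output:1"

# br-int, DVR-to-LMAC table (ingress): swap the remote DVR LMAC back to the
# local router-interface MAC and deliver straight to the destination VM port.
ovs-ofctl add-flow br-int \
  "table=1,priority=4,dl_vlan=2,dl_dst=fa:16:3e:dd:ee:ff,\
actions=strip_vlan,mod_dl_src:fa:16:3e:aa:bb:cc,output:7"
```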
Thanks a lot. Swami, please.

Thank you, Vivek; thank you, Jack. I hope Vivek hasn't bored you; he went into the details of the OVS rules, and it's better to know the details. So let's get into scheduling. Rajiv and Vivek covered both east-west and north-south; the other part of DVR is the scheduler. When routers are created on the compute nodes, as well as on the service node, they are created on demand. They are not created as soon as a tenant creates a router, and they are not created just for the sake of existing in the database; they are only created if there is a need for them. Only when there is a VM on a particular network that is being routed are the QR and SNAT namespaces created on those nodes.

The scheduler plays a major part here, in conjunction with the L2 agent and the L3 agent. For every VM port created on a compute node, or on the service node if you're using a single-node installation, as soon as a VM comes up, if that VM resides on a network that is being routed and the router happens to be a DVR router, a message is passed from the L2 agent through the ML2 plugin to the L3 plugin in order to create the QR namespace, and from the L3 plugin a notification goes back to the L3 agent saying: go deploy these QR routers.

Let me show what triggers a scheduling event. Creating a router does not, by itself, create a scheduling event; you are just creating it in the DB. Even in the legacy scenario, the only time a router is deployed is when you add an interface to it; then a QR namespace is created. In this case, you add one or more subnets with VMs to the router: you have already created a VM on a subnet, but it isn't routed yet, and now you are adding those two subnets to the router. Then, as I said, the events are triggered. The VM comes up, so a DHCP namespace is created on the service node; the whole interaction between ML2 and the L3 plugin happens; and the QRs are created on the compute node as well as on the service node. If you have default SNAT configured, a SNAT namespace is created; we wait until you have configured the default SNAT, and once you have, we go ahead and create the SNAT and the QR on the service node. Otherwise, if you are not running Nova on the service node, we are not going to create the QR namespace there, because you are not going to deploy VMs on the service node; it exists only for the SNAT.

Next, what happens when a FIP is scheduled. You have a scenario where DVR is implemented: a compute node and a service node. This represents north-south communication by SNAT only; you don't yet have a FIP namespace created. Now you want a floating IP for VM3 here, so you configure a FIP. Once you configure the FIP, a FIP namespace gets created on the compute node and gets assigned an external IP address on the external network, because the FIP namespace consumes one external IP address per compute node; internally it uses a local IP address for the translation path. Once that is achieved, all of VM3's traffic flows directly out to the external network; you no longer need to go through the service node.
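As a quick illustration of that trigger, using Juno-era python-neutronclient syntax (names and IDs are placeholders):

```bash
$ neutron floatingip-create ext-net
$ neutron floatingip-associate <floatingip-id> <vm-port-id>

# On VM3's compute node, the namespace now exists; its fg- device (the agent
# gateway port) holds the one extra external IP for the whole node:
$ ip netns | grep fip-
fip-<ext-net-uuid>
$ ip netns exec fip-<ext-net-uuid> ip addr show
```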
So that is how scheduling is achieved in DVR. The next thing is: okay, you've scheduled the router, but what happens to the namespaces that have been created, and how are they cleaned up? We have an option to turn on namespace cleanup; it is left to the operator to configure. I think we showed it in the configuration part: you can enable in the L3 agent whether to clean up the namespaces or not. Once you have enabled that option, three different actions will trigger namespace cleanup.

For FIP namespace cleanup: we don't clean up the FIP namespace until nothing is using it. When the last VM using the FIP namespace is removed, that is when we go and remove it, because it's a single FIP namespace on the node, and we keep using it for the VMs that still need it. For router namespace cleanup: when there are no VMs on a compute node that are using the router's networks, we go and clean up the router namespaces. This is how the scheduler, in combination with the agents, takes care of cleaning up your namespaces, so you are not accumulating namespaces on your nodes. As I said, this is a configurable option we have provided, and I think the best choice is to configure it so things get cleaned up properly. For the SNAT namespace, it's the same: when you remove the default SNAT service from your router, the SNAT namespace gets cleaned up. But while the SNAT namespace is being cleaned up, keep in mind that the QR namespace on the service node is also utilized by DHCP and other services that use DVR ports, so that QR namespace may not be removed while certain ports are still using it. I will go through some of the services that use ports on DVR networks when I get to load balancers; we don't want to remove the QR namespace there just because we are removing the SNAT.
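The L3 agent knob being referred to is, to the best of my knowledge, the Juno-era option sketched below; in that release the default left namespaces in place:

```ini
# l3_agent.ini: allow the agent to delete router namespaces once the last
# port using them is gone, instead of leaving empty namespaces behind.
[DEFAULT]
router_delete_namespaces = True
```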
That being said about scheduling, now let's go to the services. Currently there are VPN, load balancer, firewall, and metadata as services, so which of them do we support with DVR, which don't we support, and what is in our roadmap? Let's list them out. For LBaaS, we do have DVR support. For FWaaS, we have support for north-south; thanks to the firewall-as-a-service team, who worked with our DVR team to implement it. East-west we had an issue implementing, because with DVR the routing decision in a given direction is made on only one of the two compute nodes, so we have to revisit implementing east-west during the Kilo cycle. For the metadata service: if you enable it, we do support it with DVR. And VPN is currently still supported as a centralized service; that is one reason our design keeps SNAT centralized on the service node. We do have a patch for distributed VPN-as-a-service, but it is still in progress, so by Kilo we should have that implemented.

On service deployment, I wanted to go into the details, because there was a question in the previous session about how firewall-as-a-service is implemented and used with DVR. Today, for LBaaS, if there is a VIP port created where an LBaaS agent is running: as I said, for ports used by DVR-related entities, we don't clean up the QRs until those ports are gone, so if they are in use, we leave the namespace there, and we can route packets to the load balancer with no issues. We have also landed a couple of fixes for LBaaS, so no issues there. For the firewall: the legacy firewall is implemented in the router namespace, but for the north-south case we have to implement the firewall on the service node as well as on the compute node. The difference between the two is that on the service node we implement the firewall only in the SNAT namespace; it is not required in the QR namespace there, because we only support north-south. On the compute node, firewall-as-a-service is implemented in the QR namespace. FWaaS reads the L3 agent's configuration and, based on the agent's mode, implements the firewall rules in either the QR or the SNAT namespace, whichever is applicable. And as I said, for VPN we have a patch, and the implementation is that VPN will also reside in the SNAT namespace.

Next, API changes and extensions. As Rajiv mentioned, we have only minor changes in the APIs and the DB. The API change gives administrators control over both legacy and distributed routers: there are create and update commands you can use to override the global flag and create either distributed or legacy routers. On the DB side, we added a router extra-attributes table, so when a router is distributed, the attribute is recorded there; we keep it separate and don't disturb the router table. Then there is a DVR host-MACs table used by the L2 agent, a CSNAT L3 agent bindings table to record where the SNAT namespace is bound, and an ML2 DVR port bindings table used by the ML2 plugin. Those are the changes we made in Neutron to support DVR.
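A sketch of those admin-level commands, in Juno-era CLI form (names and IDs are placeholders, and migration support was still settling in Juno; the update generally requires taking the router administratively down first):

```bash
# Override the global router_distributed default at creation time:
$ neutron router-create dvr-router --distributed True
$ neutron router-create old-style-router --distributed False

# Migrate an existing centralized router to distributed:
$ neutron router-update <router-id> --admin_state_up=False
$ neutron router-update <router-id> --distributed=True
$ neutron router-update <router-id> --admin_state_up=True
```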
As for future plans: as I mentioned earlier, we have plans to support VPN, and there's a patch out for review; full migration support for DVR routers; and HA for the service node. We are planning to work with the HA team on that; it applies only to the SNAT namespace, because the SNAT namespace residing on the service node is the one piece that needs an HA option, so we are going to do HA for the SNAT namespace. And then IPv6, VLAN support, an L3 agent refactor, distributed DHCP, performance tuning, and probably distributed SNAT in the future.

I hope you guys enjoyed the session. Any questions? Thank you. There's a microphone; sure, come to the mic, please.

Wanted to ask you: is it possible to have a node working in both dvr_snat and dvr modes, for example if there was only one node, just for testing? Is it possible to have the SNAT...?

You can come up here. Yeah, sure, give me a minute.