Hello everyone, my name is Assaf Muller, I'm a software engineer over at Red Hat. Hello everyone, my name is Sylvain Afchain, I'm a software engineer at eNovance. By the way, these lights are blinding. That's amazing. I thought that since we're right after lunch we should start with a quick warm-up, so I wanted to talk a bit about Red Hat and, you know, how we're so successful now. We bought all of these companies, right? We purchased eNovance, we purchased Inktank, and we purchased parts of the moon, I think. In fact, we're so successful now that they're pairing us up in hotel rooms. So when we come to this conference, we sleep two people in the same room. It's very European. And, you know, we're an innovative company, and everyone is talking about distributed computing and parallelization, so we thought: why not parallelize basic human operations? So instead of me waking up, going to the bathroom, and going back, and then my roommate waking up, going to the bathroom, and going back, why not go together? So we tried it out (there's Mike right there): we went to the bathroom, and we took a bath together. It was lovely, thank you for asking. And, you know, we ate a big breakfast, so what do you do after that?
So we took a shit together, and that was lovely as well. No, seriously, it's possible if you just kind of align yourselves.

Okay, before we start, we want to share how we feel when we have to manage or operate an L3 agent. This presentation is about HA in Neutron, especially for the L3 node, and I think when we have to operate an L3 node we feel a bit like this guy, and I'm pretty sure we're not the only ones. So let's see why exactly.

Since the Havana release we are able to spawn as many L3 agents as we want, and we are able to distribute the load across all the deployed L3 nodes. All the L3 agents are able to connect to the same external network, or to many external networks. But we still have the same issue: if we lose one of the L3 agents, all the VMs connected to it, or relying on it, lose their L3 connectivity, which means no more private-to-private traffic and no more private-to-external traffic. That is the situation in the Havana release.

There are some solutions, and I'm going to present one of them, which is the L3 agent healthcheck. The goal of this tool is to recover from such a situation by rescheduling the virtual routers hosted on a failing agent to the remaining L3 agents. In order to use it, you have to deploy this tool on each L3 agent; it embeds a synchronization mechanism in order to coordinate the migration in case of failure, and this mechanism is built on the RPC bus already used by Neutron.

So let's see what happens when there is a failure. One of the healthcheck agents will take care of it and will ask the controller to reschedule the virtual routers still hosted on the failing agent to the remaining agents. In this case, the virtual router R1 moved from agent one to agent two.
So there are some pros and cons to using this kind of tool. Currently, the pros are: it doesn't affect your current deployment, so you can use it right now on your Havana architecture or infrastructure. It is able to remove a node if it is isolated, which means there is something like a ping mechanism to detect whether a node is isolated from an external point of view, and it can handle the migration. It is a distributed service, so there is no single point of failure in the service itself. It works since Grizzly, and it's quite lightweight.

The cons now: it's still not fully HA, since it's not stateful. The migration can be quite long, depending on the number of virtual routers hosted on the infrastructure. It is not an official OpenStack project: it is open source, you can find it on the eNovance GitHub, but it's not an official product. And finally, it is not the right way to achieve HA.

So let us introduce the Juno work. Sure. Just before we talk about Juno: there's a similar solution, which I saw in a few places on the web and in blogs and such, which is basically to write a Python script that uses the SDK to detect when an agent goes down, take all of the routers on that agent, reschedule them, and then just run that script in a crontab or something. So there are a lot of these similar solutions out there, and what Kevin did in Juno is to take something like that and bake it into Neutron itself, in fact into the Neutron server. So there's basically code that runs in a loop, gets all of the agents, and detects if one of them is dead; basically, if an agent hasn't sent an RPC heartbeat in a while, it's considered dead, so we take all of the routers on that agent and just move them over. This is an optional feature.
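The crontab-script idea Assaf describes reduces to a small piece of pure logic: find agents whose heartbeat is stale, and pick a surviving agent for each router they hosted. A minimal sketch of that logic; the agent and router records here are plain dicts standing in for what you would fetch through the Neutron API, and the timeout value is made up for illustration:

```python
import time

HEARTBEAT_TIMEOUT = 75  # seconds; illustrative value, not Neutron's default

def plan_reschedules(agents, routers_by_agent, now=None):
    """Return (router_id, dead_agent, target_agent) moves for every router
    hosted on an agent whose last heartbeat is older than the timeout."""
    now = time.time() if now is None else now
    dead = [a for a in agents if now - a["heartbeat"] > HEARTBEAT_TIMEOUT]
    alive = [a for a in agents if now - a["heartbeat"] <= HEARTBEAT_TIMEOUT]
    moves = []
    for agent in dead:
        for i, router_id in enumerate(routers_by_agent.get(agent["id"], [])):
            # Spread the orphaned routers round-robin over surviving agents.
            target = alive[i % len(alive)]
            moves.append((router_id, agent["id"], target["id"]))
    return moves

agents = [
    {"id": "agent-1", "heartbeat": 100.0},   # stale: presumed dead
    {"id": "agent-2", "heartbeat": 1000.0},  # fresh
]
routers = {"agent-1": ["r1", "r2"]}
print(plan_reschedules(agents, routers, now=1000.0))
```

In a real script, each planned move would then be executed with a remove/add pair of L3 agent scheduler API calls; the point is that detection and planning are this simple, which is why so many home-grown versions exist.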
You can turn it on with that configuration flag over there, and it suffers from the same shortcoming as all of these similar solutions, which is that you reschedule routers one by one. You basically have to take the entire router and configure it on the new agent, and that can take a while; it's linear in nature. It's decent for a small-scale cloud, but once you cross around a thousand routers, we've seen it take over an hour, and that's an hour's worth of downtime, which is a lot.

So we thought we'd do something a bit different. Instead of reacting to a failure and moving over all of the routers after it happens, we preemptively schedule a router on two different network nodes, or however many you configure, and then we do a really quick failover once we detect a failure. It uses keepalived internally, which itself uses VRRP internally, a protocol used in the physical routing world. If you want redundancy for your default gateway, your physical routers can talk via VRRP: the master router basically keeps saying "I'm the master, I'm the master, I'm the master." It's kind of a narcissist protocol. The other router listens to these master advertisements, and if it hasn't received three of these messages in a row, it basically says: you're dead, I'm the new master. So keepalived uses VRRP internally, and we'll talk about that in a bit.

Just a few points I want to highlight from a top-down point of view. The routers themselves are active-passive: we schedule the same router on two agents, so one of those router instances is active and the other is passive. However, from an agent's point of view all of your agents are now active, because if you have, for example, three agents, as you can see, we take router one and schedule it on the first agent and the second agent.
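As an aside, the "three messages in a row" rule is VRRP's standard liveness timer: a backup declares the master dead after three advertisement intervals plus a small priority-dependent skew. Sketching the arithmetic from the VRRP spec (RFC 3768); the 2-second interval and priority 50 are plausible keepalived-style values, not figures this talk specifies:

```python
def vrrp_master_down_interval(advert_interval, priority):
    """RFC 3768 timers, in seconds:
    Skew_Time = (256 - Priority) / 256
    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time"""
    skew = (256 - priority) / 256.0
    return 3 * advert_interval + skew

# With 2-second advertisements and backup priority 50, the backup
# declares the master dead after roughly 6.8 seconds:
print(round(vrrp_master_down_interval(2, 50), 3))  # prints 6.805
```

That order of magnitude lines up with the few-seconds failover times quoted later in the session.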
Right, and for example router three we schedule on the second and third agents. So all of your agents actually have active, or master, instances of routers, so they're all forwarding traffic and taking some of the load.

A more interesting point to talk about is how you actually segment your routers into clusters, both within a tenant and between different tenants. Keepalived uses VRRP, and VRRP has this thing called a VRID, which is basically the identification number of a router cluster. Here I define a cluster as the master and the backup instances of router one; that's one router cluster, and we have three clusters here in total, so it's basically one cluster per router.

What we do is that when we create the first HA router, we detect that it's the first router for a tenant and we create a new Neutron network. This is just a normal tenant Neutron network, using whatever segmentation technology you have defined, VLANs or tunneling or whatever. All of your HA routers for that specific tenant will use that network from then on to pass the VRRP master advertisements that we talked about earlier. So between different tenants we can segment the VRRP traffic, because each tenant has its own network. But inside a single tenant, for example here router one and router two belong to the same tenant, we allocate a unique VRID per router, which is what you can see in the diagram.

So, going deeper down into the details: we made changes to the server and to the layer 3 agent.
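Because a VRID is a single byte, each HA network can only distinguish 255 router clusters, which is the limitation that comes up again near the end of the talk. A toy per-tenant allocator, not Neutron's actual code, just to make the bookkeeping concrete:

```python
class VridAllocator:
    """Hand out unique VRIDs (1-255) per tenant HA network."""
    MAX_VRID = 255  # VRID is an 8-bit field in the VRRP header

    def __init__(self):
        self._used = {}  # tenant_id -> set of allocated VRIDs

    def allocate(self, tenant_id):
        used = self._used.setdefault(tenant_id, set())
        for vrid in range(1, self.MAX_VRID + 1):
            if vrid not in used:
                used.add(vrid)
                return vrid
        raise RuntimeError("all 255 VRIDs in use; another HA network is needed")

    def release(self, tenant_id, vrid):
        self._used.get(tenant_id, set()).discard(vrid)

alloc = VridAllocator()
print(alloc.allocate("tenant-a"))  # 1
print(alloc.allocate("tenant-a"))  # 2
print(alloc.allocate("tenant-b"))  # 1 -- VRIDs only need to be unique per network
```

Two tenants can reuse the same VRID because their HA networks are separate segments, which is exactly why the per-tenant network trick works.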
That was pretty much it for the server: we basically talked about creating the VRRP, or HA, networks, and then each router instance gets a unique HA port inside that network and uses that port to send and receive the VRRP traffic.

For the agent: the agent basically manages a keepalived process per router, and all of the IP addresses, as we can see here. We have two agents; the left one would be the active instance and the right one would be the backup instance in this case, and there is a keepalived process on both agents. The blue HA box is the HA port that the router gets. Beforehand, you basically had the internal device, or qr device, and the external device, or qg device: if you did `ip netns exec` with the name of the router, you would see those two devices plus the loopback device. Now you would see a new device, the HA device, which gets an IP allocation from a configured subnet, and all of the keepalived traffic goes through that HA interface.

The IP addresses themselves are only configured on the master instance. This is something that keepalived does for us: we configure keepalived and tell it that the internal and external IP addresses, as well as all of the floating IPs, are all basically VIPs, or virtual IP addresses, so only the master instance configures the IP addresses. Once we have a failover, say the left agent died, the box died, the cable got disconnected, whatever, keepalived detects it and kicks in. Right now it takes around seven seconds to fully configure the new router instance; the IP addresses will appear on the new agent, and a gratuitous ARP will be sent per new IP address so that your network switches and whatnot know where the new addresses are.

Another interesting distinction is that the HA network and the HA ports within that network are actually hidden from the tenants, as
it's pretty much an implementation detail. From an admin's point of view, if you list networks you can see all of your HA networks; from a tenant's point of view you cannot. Same for the ports: the HA ports are hidden from tenants.

So what if I have three agents, just like the example we saw, and I want to know where my router instances are? We can do that by listing all of the agents per router. This is what you'll get in Juno; hopefully for Kilo you'll also get to see where the master is, and maybe even influence that.

Let's head over to a demo. Okay, in order to better understand what happens when there is a failure, and how it works, we're going to show you a little demo. Let's start by creating a virtual router as usual, and let's set a gateway. In this demo you will see what kind of recovery time you can expect. Now that we have the router, let's check that we have the same namespace on both nodes, which are node two and node three on the left side. We can check that we have the HA interface and the external interface, which are the ha- and qg- devices, and we're going to start a little tcpdump to see through which agent the traffic is going.

Okay, so we are going to attach a private network, in order to be able to boot a VM on it and test the connectivity. That's what we are doing right now. Now we are going to boot a VM, attach it to the private network, let's call it VM1, and finally we are going to associate a floating IP with it. Okay, we have our VM booting and the floating IP is associated, so now we can check if the VM is up.
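While the VM boots: under the hood, the ha- device seen a moment ago carries VRRP advertisements for a per-router keepalived process, whose configuration boils down to a `vrrp_instance` block listing the router's addresses as VIPs. A simplified generator; the interface name, priority, and interval here are illustrative defaults, not Neutron's actual template, which also carries things like authentication and notify scripts:

```python
def build_keepalived_conf(vrid, ha_iface, vips, priority=50, advert_int=2):
    """Render a minimal keepalived vrrp_instance section.
    vips: the router's internal/external addresses and floating IPs,
    which only the elected master will actually configure."""
    lines = [
        "vrrp_instance VR_%d {" % vrid,
        "    state BACKUP",   # both instances start as backup;
        "    nopreempt",      # the VRRP election then picks the master
        "    interface %s" % ha_iface,
        "    virtual_router_id %d" % vrid,
        "    priority %d" % priority,
        "    advert_int %d" % advert_int,
        "    virtual_ipaddress {",
    ]
    lines += ["        %s" % vip for vip in vips]
    lines += ["    }", "}"]
    return "\n".join(lines)

print(build_keepalived_conf(1, "ha-1", ["10.0.0.1/24", "172.24.4.3/32"]))
```

Starting both instances as non-preempting backups means whichever comes up first wins the election, and a recovered node does not steal the mastership back and cause a second outage.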
I think we are going to see that just after, so let's do a little ping on the floating IP and wait for the VM. We can see that the traffic is going through the node at the top right, which means that this is the master. Now that we have the ping, we are going to shut down this node, the master, to see what recovery time we can expect. Normally you should see the traffic going through the node at the bottom right. Okay, the traffic has now stopped, and we can see that the traffic goes through the slave node, which is now the master.

So, to give you a little bit of data from a really basic lab test that we did: it took around seven or eight seconds to fail over a single router, and then ten seconds to fail over 30 routers. With a more traditional solution it's basically around five seconds per router, and it scales completely linearly, so for 30 routers that would be 30 times five seconds, which just isn't as nice.

Okay, now we are going to look at what remaining work we have to do, and what the ongoing work is. There are some improvements that we are going to push in the Kilo release. As I said, we're able to know exactly where your virtual routers are hosted, but we don't know which one is the master; we are going to fix this in the next release of Neutron so that this information gets back to the controller, and thanks to the API you will be able to know where your master is. Currently this is not a stateful solution, so we are going to introduce conntrackd in order to keep sessions up across a failover. There is ongoing work to give you the possibility to migrate a legacy router to an HA router. And currently, in Juno, the DVR solution has been introduced, but there is no way right now to use both solutions together, especially for the SNAT traffic.
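As back-of-the-envelope arithmetic, the lab numbers quoted above (five seconds per router when rebuilding sequentially, versus a roughly constant keepalived failover) look like this; the small per-router growth term is just a rough fit to the 7 s and 10 s figures, not a measured model:

```python
def sequential_failover_seconds(n_routers, per_router=5.0):
    """Rescheduling-style recovery: each router is rebuilt one by one."""
    return n_routers * per_router

def keepalived_failover_seconds(n_routers, base=7.0, per_router=0.1):
    """L3-HA-style recovery: standby instances already exist, so the
    time grows only slightly with the number of routers."""
    return base + per_router * n_routers

print(sequential_failover_seconds(30))  # 150.0 seconds of downtime
print(keepalived_failover_seconds(30))  # 10.0 seconds
```

At a thousand routers the sequential estimate is over 80 minutes, which matches the "over an hour" figure from earlier in the talk.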
Another limitation: currently we only use one HA network per tenant, so we have a limit, due to the VRRP protocol, of 255 virtual routers per HA network. We could remove this limitation by allowing more than one HA network per tenant, but it's quite good for now. We could also improve this solution for east-west and north-south traffic, but that is mostly the goal of DVR, so that's why I'm going to give you a little overview of the DVR approach.

The goal of the DVR approach is to remove almost all the traffic going through an L3 node. All the virtual routing is distributed to the compute nodes, which means that you no longer have east-west traffic going through an L3 node. It is mainly the same for north-south traffic when you are using a floating IP, but you still have to use an L3 node when you are using the SNAT mechanism, that is, when your VM doesn't have a floating IP. Some services could also be distributed, like Firewall-as-a-Service or DHCP, for example.

Without DVR, in Icehouse, we have this situation: two VMs are on the same compute node but attached to different networks. If VM1 wants to send some traffic to VM2, the traffic will go through the L3 agent. This is quite inefficient, and we have a bottleneck here, and a SPOF, a single point of failure. It's the same thing if VM4 wants to send packets to VM1. With DVR, for east-west traffic, there is an instance of the DVR on each compute node. Thanks to that, there is no more traffic going through the L3 agent; the traffic stays at the compute node level. So no more single point of failure, no more bottleneck, and it's more efficient. For floating IPs, each floating IP is scheduled on the compute node which hosts the VM, so we can see here that there is no more traffic
going through the L3 agent. It's the same for north-south traffic with floating IPs: the NAT is done by the DVR instance. In fact there is more than one namespace involved when you are using DVR, it's more complicated than that, but I've just put one DVR instance here for better understanding. And as I said, we still need a central exit point, a central L3 node, in order to have external connectivity when there is no floating IP. That's what we see here: VM1 doesn't have any floating IP, so the traffic has to go through the L3 node. In this slide you can see an HA SNAT node, which is not the case in the Juno release, but we are going to work on this for Kilo, and in the Kilo release we hope that there will be no more single point of failure for this kind of traffic.

So, just a summary of all of the different solutions here. The first solution that Sylvain mentioned is the RPC healthcheck one. We have the rescheduling, which is the loop that's built into the Neutron server. We have the layer 3 HA one, which is based on keepalived, and we have DVR. If we look at it from a high-level point of view and outline the differences: the healthcheck one has been there for a while and is in production in a few different clouds. Rescheduling is really simple; it's basically the simplest way you could solve it. With layer 3 HA the failover is very quick. Also, as opposed to the previous solutions, it's indifferent to the management plane: the failure detection and the failover itself work in the data plane, which can be a pretty significant advantage. And DVR basically kills all traffic to the network node, or most of it, which is something that we really want to do. As for the releases: DVR at this point requires tunneling and L2 population enabled.
There are talks about introducing VLAN support as well. And as for what you actually need to do to your cloud if you want to upgrade and use one of these solutions: the first solution requires you to install a new agent on your network node, which is probably not such a big deal. Rescheduling is the simplest one: you just enable a configuration option and you're good to go. Layer 3 HA, I want to talk about that for a minute; it's basically the same for DVR and layer 3 HA: the admin can enable or disable a configuration option, and from that point on all of the tenant routers will be HA, or DVR, by default, while admins can always overrule that. From the Horizon point of view, when you create a router you can choose; it's a radio button right now, and it's mutually exclusive: you can either create a legacy router, an HA router, or a DVR router. We would like to be able to create a router that's both; that's for Kilo, or maybe even for the L release, honestly. DVR requires some topology changes: you obviously need your compute nodes to be able to connect to the external network. We'd love to take some questions.

Is this supported with the Linux bridge mechanism driver, or OVS only, and why is that? They used Open vSwitch and basically flow-based logic. Okay, so is there any hope of having Linux bridge support for DVR? Really unlikely. Okay, well, that's sad to hear. One observation is that DVR without Linux bridge support is kind of useless in my opinion, because I view OVS as, it's like adding three extra gears to a transmission: it does nothing for you, it just slows your machine down. Anyway, let's not get into that. There is a significant bug in HA right now, in the sense that you can't use L2 population with HA. Yeah, we know. Is that going to be fixed in Juno, or is that going to be put off to Kilo? Well, Juno is already out; we're definitely going to fix it for Kilo. It's probably backportable.
I think so. Yeah, because keepalived doesn't feed anything back to L2 population, and I know it's a problem. Okay, I was just curious when it was going to be fixed. Yeah, it's a high-priority one for sure. But not until Kilo, then; it's not going to be in between. Okay.

I had a question about the VRRP limitation over here. So VRRP, keepalived: does it just detect that the node is down, or if, say, one of your network nodes loses its uplink, you know, its way out, does it detect that too and fail over? That's a decent enhancement; I think we have an open bug on that, but no, right now it's just if you don't receive hello messages. So if the link is dead, if the node itself dies, if the keepalived process dies, something like that; but if the uplink goes out, right now we don't detect that. No. Thanks.

By the way, there's a list of open bugs tagged l3-ha, so if you have an issue, first check there; otherwise just open a bug or talk to either one of us, and we're super happy to get more feedback.

In a typical two-node HA configuration, how do you prevent split brain? Well, there are a few scenarios in which I managed to cause split brain, but they're very artificial, just from our testing.
We haven't been able to induce split brain at this point, and I think it would behave very similarly to a physical setup; basically, it's very similar to just taking two boxes and configuring keepalived manually, like so many people are doing. It's a pretty proven technology, but if you manage to reproduce a split brain, report the bug and we'll fix it.

Don't you think that automatic rescheduling should be replaced by VRRP instead, so there is only one HA solution for the L3 node? Well, they're different. For one, the rescheduling loop depends on the RPC bus. On the other hand, I think it's less risky: if you're more conservative and you just upgraded to Juno and you want to try it out, try the rescheduling loop first; it's literally just turning on a configuration option and seeing if it works for you. Layer 3 HA is a bit more complicated; it's around a hundred times more lines of code, and I'm sure there are more bugs there. We're not that good developers, honestly.

First of all, great stuff guys, especially the bathroom stories. My question is about Firewall-as-a-Service: if you use the VIP solution, is it still possible to use Firewall-as-a-Service, since it's running on the L3 agent and using the interfaces? Well, I think this is pretty much the same thing, but we didn't test this very well; this is something that we have to do in the Kilo version. Mainly, all the interfaces are now managed by keepalived instead of the L3 agent, but the interfaces themselves are the same. It's a matter of maturity and testing. Yeah, but if I'm not mistaken, it should work. Yeah, that's a famous developer phrase.

What about IPv6? Do DVR and keepalived support IPv6 with fully functional virtual IPs? No, I don't think that IPv6 works with DVR; I think that is ongoing work, but we would have to check with the DVR team. For L3 HA, yeah, I think
it's pretty much the same thing. Any more questions? Okay, great. Thank you guys. Thank you.