Hello everyone, this is Daniel Alvarez. I've been working for Red Hat for the past five years, almost six years already, focusing mostly on Neutron and OVN, and today, together with my colleague Luis, we're going to introduce you to a new project, OVN BGP agent. Let's get it started.

OK, so we're going to start with the why: why are we starting a new project? You've probably seen in the keynotes today and yesterday that there's a trend in the data centers, and we're seeing this very frequently: the shift is towards pure L3 deployments, where the layer 2 normally just ends at the top-of-rack level. Beyond the top of rack everything is routed, and this is how the underlays are trending. We're seeing needs for scaling as well; we've seen this today too. Not just increasing the number of nodes for the sake of it, but also having better maintenance and better day-2 operations, and having multiple smaller clouds versus one large deployment with a lot of nodes is typically preferred as well.

We explored what is available upstream, and there are a couple of projects you're probably familiar with: one is neutron-dynamic-routing and the other is networking-bgpvpn. Those are the two main projects available upstream, and there were some gaps there that we wanted to explore how to close; we'll walk through them a little later in the presentation. So we decided to start a new project, which is basically a building block that can be integrated into those two projects. Of course we open sourced it, so right now it's under the OpenDev umbrella; there are going to be links at the end. Essentially what we did was to start a proof of concept to play around with, to gauge the interest from the community and the users, and basically all of you can help with the future direction of the project. We did it under one premise, which is essentially not to be intrusive to any existing project: we didn't want to make any changes to either Neutron or core OVN. We wanted something that, from a deployment perspective, is just this agent that we're going to talk about today, which is minimally invasive. Of course it has some gaps and limitations, and we're going to walk through those too, and it serves as a functional block for what is coming next.

OK, so we'll start with why we want to move away from layer 2. Normally the aggregation layer would be the demarcation point between the L2 and the L3 domains, and there are some issues with large broadcast domains, like large failure domains, broadcast traffic, and convergence: you have a virtual IP, it can be moved to another node, there are going to be gratuitous ARPs, and there are going to be a lot of devices in the network learning the new location of that virtual IP, which means CAM table and memory pressure in the devices. These are the typical layer 2 issues that probably most of you are familiar with and have experienced at some point. And there are a lot of protocols involved, right? For achieving bonding with LACP you need to configure LACP in the fabric; the Spanning Tree Protocol usually extends beyond a single VLAN and spreads across VLANs; and they can cause a big increase in the traffic level, like broadcast and multicast, right?
I'm going to walk really quickly through this. This is where we're trying to shift to: essentially bringing routing all the way down to the lowest layer of the hierarchy, all the way to the top of rack and down to the compute nodes. There are more diagrams here, but this is roughly how such an architecture could look: you would have your rack, and that's the L2 boundary; you can host workloads from different clouds there and interconnect them somehow, but essentially the point is that all the traffic will be routed at the top of rack. The advantages are, of course, reducing the size of the layer 2 domain and therefore the failure domains, faster convergence, and we can achieve load balancing: instead of using traditional layer 2 bonds we'd be using ECMP, with BFD for failure detection. For this we need to run some sort of dynamic routing protocol in the fabric, and what we're going to show uses BGP, the protocol that we selected. One of the advantages of using BGP is that most of the time, from what we're seeing today, BGP is already running in the fabric, so it's kind of easy to plug into, right? Another advantage is that by doing this you avoid static configuration in the fabric: you don't need to go and configure static routes in the fabric and maintain them, and things like prefix delegation for IPv6 are a little too complex to implement otherwise.

So this sounds ideal. Of course there are also some pitfalls, like with everything. You're probably familiar with the outage last year, October I think, when none of us could access Facebook or WhatsApp: the DNS servers of Facebook were disconnected because the BGP routes were withdrawn. The DNS servers were actually up and running, they were functioning, but the routes were withdrawn, right? So this is not free of problems.

OK, real quick, this slide is a little bit messy, but the point of it is to show some of the benefits of having BGP running in your OpenStack cloud. You could have applications that live in different clouds, so you achieve better resilience; you'd have active-active: basically you're announcing the location of a virtual IP from two separate places, and the fabric will do its magic to direct the traffic either to one or the other. That leads to disaster avoidance and recovery, you can achieve geo-redundancy, and essentially you avoid the usual L2 stretching and all those problems that come with L2 when you want to span across different sites or geographies. So it's about, and there was some reference to this this morning, not just growing your cluster, but growing your cluster and expanding it across sites, or taking it to the edge, spanning multiple physical locations. Again, it makes for simpler day-2 operations: usually it's going to be easier to upgrade a smaller cloud, or drain it and just migrate your workloads off a smaller cloud, so it should simplify day-2 operations.

And with that, Luis will deep-dive into the actual implementation details and flows. But note that we're not touching the east-west traffic.
So OVN is still in the picture, and all the east-west routing keeps working on the overlay. We're focusing mostly on workloads on provider networks and floating IPs, although we don't want to leave behind the other use cases, like tenant network advertisement with EVPN and so on. And we're relying on BGP in the fabric, typically with a spine-leaf topology, for north-south traffic.

How do we do this? We chose FRR as the routing suite; there are a lot of tools out there, and we can do BFD as well. Luis will dive into the details. What we're doing is running a BGP speaker on each node, so the peer would typically be the top of rack; of course, you can also peer with the DC gateway, and we advertise directly connected routes. For the control plane we need to have something in place, and we did it pretty much just at the deployment layer: create a loopback IP that's going to be used to peer with the top of rack, deploy FRR and its configuration, and set up the agent. As I said, no changes to Neutron or OVN, nor any configuration at all on those components.

And for the data plane we're relying on this BGP agent, which is connected to the OVN database. This is actually not OpenStack specific; we've made it agnostic. Of course the focus was OpenStack, but it just reads from the OVN database and configures things to trigger FRR to advertise the routes. In the case of EVPN it will configure the VTEP endpoints for setting up the tunnels, and it will be in charge of redirecting the traffic to and from the OVN overlay. And over to you, Luis.

Thank you. So I'm going to give a little more detail about the BGP control plane and especially the data plane for BGP. I guess you're already well aware, but BGP is a dynamic routing protocol, basically to exchange routes, and it's based on the concept of autonomous systems: depending on whether you're exchanging those routes within or across autonomous systems it's iBGP or eBGP. We use a bunch of features from it, like route reflectors, BGP unnumbered, ECMP and default routes, and we're really leveraging the VRF one, virtual routing and forwarding. This is similar to network namespaces, but just for layer 3, so in a sense it's a kind of routing table isolation. The tool we chose to leverage BGP, as Daniel mentioned, is FRR, which is a routing protocol suite; it's not just for BGP, it has all the protocols, but in our case we just focus on BGP. It's a Linux Foundation project.
It was a fork of Quagga, and just as a pointer, in RHEL 8 Quagga was deprecated in favor of FRR. It comes with a shell, vtysh, which is what the agent will use to reconfigure FRR when needed. I have just one example here, a random one, of a configuration file for BGP. There's no need to take a deep look at it; I just wanted to highlight the 'redistribute connected' part, because that's the feature we're going to abuse, or leverage. Basically it means that the local routes are being exposed, so on the node, everything we add to the devices is what's going to be exposed outside, and that's what the agent will do to ensure that the VMs' IPs are exposed outside.

For the control plane we moved to a pure layer 3 control plane. As you can see, we have this kind of spine-leaf topology where now the OpenStack controllers can be placed in different racks instead of having to be on the same one. Thanks to this layer 3 design we made a TripleO adaptation so that we support this kind of deployment and do the FRR configuration according to our needs. We have dedicated networks to connect the controllers and the computes to the leaves so that we ensure isolation there, and basically we expose the VIPs of the OpenStack endpoints just by adding them to the loopback interface on the controllers. This is then, thanks to BGP, shared with the infrastructure, and a failover is as simple as adding that VIP to a different controller: automatically it will converge and the new location will be detected. I'm not going to give a lot of details about this because the talk is focused more on the data plane than on the control plane, but just to give you a heads-up here, and of course any questions, we'll try to handle them.
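Just to make that 'redistribute connected' part concrete, a minimal FRR BGP stanza would look roughly like this; the AS numbers, router ID and neighbor address are made-up placeholders, not values from the talk:

    router bgp 64999
     bgp router-id 172.30.1.1
     neighbor 172.30.1.254 remote-as 65000
     address-family ipv4 unicast
      redistribute connected
     exit-address-family

With 'redistribute connected' in place, any address the agent adds to a local device gets picked up and announced to the peer, so FRR itself doesn't need per-route reconfiguration for this part.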
For the data plane, as Daniel mentioned, the OVN BGP agent is what we've been working on. It's basically just a Python-based daemon that runs on every OpenStack node. It's connected to the OVN southbound database just to detect events, so it's kind of a read-only mode; it detects the events that are needed to figure out when we need to advertise and expose a VM. It basically leverages a couple of things: on the one hand, as we explained, FRR BGP to announce the relevant IPs, in our case the VMs' IPs and the load balancer VIPs; and on the other hand, the kernel networking to redirect the traffic, once it's on the node, to the OVN overlay. It has some requirements: of course we need FRR on the node so that we can use BGP; we need to be connected to some peer, usually the leaf, but it can also be the DC gateway if need be; and we need to enable ARP and NDP proxy so that the traffic can be redirected to the OVN overlay and the OVS provider bridge can answer for those addresses.

This is just a simple diagram of how it works, highlighting again that the idea here was not to need to modify core OVN or Neutron at all, so it's just an addition, an extra agent there. It has two main components, the watcher and the driver, and it's built in a way that you can plug in different watchers and different drivers depending on what you want to integrate with. In our case we have a couple of drivers supported now, BGP and EVPN. Basically, what the watcher does is look at the OVN southbound database, especially focusing on the Port_Binding table events, and when certain things happen, usually when you boot up a VM or a load balancer VIP on the provider network, or you associate a floating IP to them, this event is detected and the watcher calls the driver, and the driver is the one in charge of doing the needed actions to ensure that the VM is reachable on this node from that point.

So the first thing to do is basically to ensure that we kind of wake up FRR to make the IP available from that point. Basically we just need to add this IP to a dummy device created by the agent, usually in a separate VRF so that it doesn't interfere with the node's routing, but it's just for the matter of exposing the IPs from that point: all the infrastructure will know that this IP is reachable through this node. Then the second part of the work of this agent is, once the traffic can reach this node, to ensure that this traffic is actually redirected to the OVN overlay. To do that we basically use a set of kernel routing pieces, like IP rules and IP routes, to redirect the traffic, and we also need to play with adding some extra OVS flows to the provider bridge so that we handle the proper MAC addresses.

It's simple to add new watchers or drivers. For instance, as Daniel mentioned, this is not relying on Neutron or anything, it's just looking at OVN, so you could add a driver for OVN Kubernetes, for instance, looking at the southbound DB and detecting the events, probably in a slightly different way, maybe detecting only the events that you're interested in, and then maybe the driver needs to trigger some extra actions. We actually did a PoC where we played around a little bit with eBPF and XDP, and that's basically like adding yet another box there, another set of actions to be done to expose the VM from there.
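Just to sketch that watcher/driver split in code, here is an illustration only; these are not the agent's actual classes and the names are invented:

    # Illustrative sketch of the watcher/driver split; names are invented,
    # not the real ovn-bgp-agent code.
    class VMOnProviderNetworkEvent:
        """Reacts to Port_Binding rows that get bound to the local chassis."""

        def __init__(self, driver, local_chassis):
            self.driver = driver
            self.local_chassis = local_chassis

        def match(self, row):
            # Interesting events: a VM port, a load-balancer VIP on a provider
            # network, or an associated floating IP lands on this node.
            return bool(row.chassis) and row.chassis[0].name == self.local_chassis

        def run(self, row, ips):
            # The watcher only detects; the driver does the wiring: add the
            # IPs to the dummy device so FRR advertises them, then add the
            # kernel rules/routes and OVS flows to redirect the traffic.
            for ip in ips:
                self.driver.expose_ip(ip, row)

A different integration, say OVN Kubernetes, would plug its own events and driver in behind the same two hooks.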
I'm going to explain briefly the couple of drivers that we have so far. We have the BGP one. Its main target is basically to expose the VMs and the load balancers on the provider networks, or those which have a floating IP attached to them. One caveat here is that it doesn't have an API yet, so everything is exposed by default, everything that goes onto the provider network. But it allows us to have different VMs in different OpenStack clouds being exposed with the same IP and then rely on BGP to load balance between them, or to redirect the traffic to one or the other, the one that is available. And it also allows us to do this without needing to change the physical network configuration afterwards; when you want to add a new provider network, it's as simple as that. It also comes with an option to expose tenant networks, but with some limitations: the traffic needs to go through the network node, there's no support for overlapping IPs, and there's no API. That's why we also worked on the other driver, the EVPN mode, to address some of those limitations; I'm going to expand on that later on.

I actually already covered this with the previous slide, but basically the watcher is the one detecting this type of event, either VMs or load balancer ports being created on the provider network, or floating IPs associated to them; this calls the driver, and the driver triggers the kernel, OVS or FRR reconfigurations. The good thing is that traffic then goes directly to wherever the VM or the load balancer is, without having to go through the network node. Again with one limitation: for OVN Octavia load balancers, as there's no VM associated to them, the traffic needs to go through one of the network nodes or the controllers.

This is just an example of how this works, kind of step by step, trying to zoom in a little bit on a couple of hosts and a couple of leaves. Basically, we have a device there inside a VRF; that's the device that the agent creates to be able to later advertise the IPs via BGP. And then there's a set of IP rules and a new routing table associated with, or related to, the br-ex provider bridge from OpenStack, which at the beginning are empty. As soon as one VM is booted up, in this case on host 1, what the agent does is add this IP to that device in the VRF, which is kind of isolated. This actually makes FRR advertise that IP, so from that point on anyone on the spine-leaf topology knows that this IP is reachable through that node, so the traffic knows how to reach that point. Then the agent is also in charge of adding this IP rule, which basically says: all the traffic, coming from anywhere, that goes to this IP, 172.24.1.15, check what to do with it by looking at the br-ex routing table. And this br-ex routing table for now only has one default route that sends it to the br-ex bridge. There are other routes that get added there when exposing tenant networks, but for the floating IPs and the IPs on the provider network this is enough.
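Here is a rough sketch of that wiring; the device, VRF and table names, the table numbers and the address are illustrative placeholders rather than the agent's actual ones, and in practice the table name has to be registered in /etc/iproute2/rt_tables:

    # Rough sketch of what the BGP driver wires up for one exposed IP.
    # All names, numbers and addresses are illustrative placeholders.
    import subprocess

    VRF = "bgp-vrf"       # keeps the dummy device out of the host routing
    DUMMY = "bgp-nic"     # addresses added here are what FRR redistributes
    BRIDGE = "br-ex"      # OVS provider bridge
    TABLE = "br-ex"       # routing table name, mapped in /etc/iproute2/rt_tables
    VM_IP = "172.24.1.15"

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    # One-time setup at agent start-up: a dummy device inside a VRF.
    sh("ip", "link", "add", VRF, "type", "vrf", "table", "10")
    sh("ip", "link", "add", DUMMY, "type", "dummy")
    sh("ip", "link", "set", DUMMY, "master", VRF)
    sh("ip", "link", "set", VRF, "up")
    sh("ip", "link", "set", DUMMY, "up")

    # Per-VM wiring when the watcher reports a new port on this node:
    # 1. add the address to the dummy device; 'redistribute connected'
    #    makes FRR start announcing it,
    sh("ip", "addr", "add", f"{VM_IP}/32", "dev", DUMMY)
    # 2. steer traffic destined to that address into the provider-bridge table,
    sh("ip", "rule", "add", "to", f"{VM_IP}/32", "table", TABLE)
    # 3. whose default route hands everything to br-ex, where the OVS flows
    #    and the ARP/NDP proxy push the traffic into the OVN overlay.
    sh("ip", "route", "replace", "default", "dev", BRIDGE, "table", TABLE)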
If you want a few more details on that, this is just the traffic flow: when you come from outside, from another place, the traffic will come in through one of the leaves, one of the NICs, and then through the kernel routing it will be redirected there. But of course, if you have another VM on the same node, it doesn't need to go all that way; it will just go as east-west kind of traffic and hit the internal flows on br-int.

Well, for the other driver, the main idea was, instead of focusing on the public IPs or the provider network, to focus on the possibility of exposing the tenant networks, and also allowing overlapping CIDRs, overlapping IPs, meaning that two different tenants may want to expose the same kind of subnet, like 10.0.0.0/24 or something, and you don't need to synchronize them, so they can do whatever they want. They are advertised, at the end of the day, on different VXLAN IDs, so that's how it's segregated, or tunneled. This one actually has an API: we're leveraging networking-bgpvpn as the API for this, and I'm going to show it in the next slide. The one caveat at the moment is that the traffic of course needs to go through the network node, at least for the time being, because this is a limitation of ML2/OVN and how we deploy OpenStack: the SNAT traffic is centralized, meaning the traffic to the tenant networks needs to be injected through the OVN router gateway port that's connected to the provider network. But this driver basically allows us to interconnect different OpenStack clouds, either in the same data center or across data centers, for different tenants. I'm not sure we're going to have time, but we have a demo at the end; it's already uploaded in case anyone wants to take a look.

This is a bit of a schema of how this works. The API, as I said before, is based on networking-bgpvpn, but it didn't come with support for OVN, so we added a driver, and the work-in-progress patch is there. Basically, networking-bgpvpn is just an API, a plugin and API that allows us to expose Neutron ports, networks and routers and connect them through an EVPN/BGP VPN. The agent is using this. We added just one really simple service plugin driver, and the only thing it does is take note of the information about the VXLAN ID, or VNI, that's going to be used, and the information about the autonomous system for BGP that we need to use, and it just writes that into the OVN northbound DB, on the ports associated with that network.
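As a rough idea of what driving that API looks like from the CLI with the networking-bgpvpn OpenStack client plugin; the VNI, route target and names here are made-up values, and the exact flags are worth double-checking against the plugin documentation:

    # admin: create the BGPVPN, pick the VNI, and hand it to a project
    openstack bgpvpn create --type l3 --vni 1001 \
        --route-target 64999:1001 --project <tenant-project-id> tenant-a-vpn

    # tenant: attach it to one of their routers (or networks)
    openstack bgpvpn router association create tenant-a-vpn router-a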
This gets automatically translated and propagated to the OVN southbound DB, in the Port_Binding table, and this, again in the same way as with the other driver, triggers the OVN BGP agent. So the watcher detects this event and is again in charge of triggering or calling the driver, so that it performs the needed steps, which are of course different for this driver, but at the end of the day it's the same model: FRR for advertisement, and then kernel routing and OVS flows for traffic redirection. One thing to highlight here, as you can see, is that this is running on the OVN gateway node: as I said before, the traffic doesn't go directly to the VM but needs to go through the networker node, or wherever this cr-lrp port, which is the router gateway port, is located; from that point it's injected into the OVN overlay and it goes, as with any east-west traffic, through the Geneve tunnel to wherever the VM is.

This is an example of how this works. Basically, the admin is the one creating the BGPVPN, associating a VNI with it; that may imply some tweaking on the infrastructure, for instance if you have certain VXLAN IDs that you want to use to interconnect things. The admin is then the one in charge of assigning it to one tenant or another. Then the user, once they get assigned this BGPVPN, can freely associate it with their networks or with a router, and automatically it will get applied to all the networks connected to that router. And that's basically what triggers the EVPN driver to do the needed kernel networking tweaking.

It has to do several things. It's based, as I said before, on the VRF concept, so it creates one associated with this VNI, and it also needs to create some other devices to do the VXLAN encap and decap, so it creates a bridge and a VXLAN device and connects it all together. Then it uses a veth pair to actually connect this VRF to the OVN overlay through the br-ex bridge. Then it adds the IP rules and routes in the routing table associated with this VRF, so that when the traffic goes to this network it's redirected through that veth device to the OVN overlay. And in the other direction, for traffic leaving OpenStack, we also need some OVS flows on br-ex, so that if the traffic is coming from this subnet we redirect it to the VRF and it goes out with this VXLAN ID instead of going out the normal path without encapsulation. And then, just to advertise the IPs of the VMs, it's basically the same behavior as before: we just add these IPs to a dummy device as well, the loopback-like one in the diagram, and those IPs are the ones that get automatically advertised through EVPN by BGP over that tunnel.
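Here is a rough sketch of that per-VNI plumbing; all names, the VNI, the table number and the VTEP address are illustrative placeholders rather than the agent's actual ones:

    # Illustrative per-VNI plumbing for the EVPN driver (placeholder names).
    import subprocess

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    VNI = 1001
    LOCAL_VTEP = "172.30.1.1"   # placeholder VTEP/loopback address

    # VRF plus bridge plus VXLAN device give this tenant its own routing
    # table and the encap/decap for its VNI.
    sh("ip", "link", "add", f"vrf-{VNI}", "type", "vrf", "table", str(VNI))
    sh("ip", "link", "add", f"br-{VNI}", "type", "bridge")
    sh("ip", "link", "set", f"br-{VNI}", "master", f"vrf-{VNI}")
    sh("ip", "link", "add", f"vxlan-{VNI}", "type", "vxlan", "id", str(VNI),
       "dstport", "4789", "local", LOCAL_VTEP, "nolearning")
    sh("ip", "link", "set", f"vxlan-{VNI}", "master", f"br-{VNI}")

    # A veth pair stitches the VRF side to the OVS provider bridge (br-ex).
    sh("ip", "link", "add", f"veth-vrf-{VNI}", "type", "veth",
       "peer", "name", f"veth-ovs-{VNI}")
    sh("ip", "link", "set", f"veth-vrf-{VNI}", "master", f"vrf-{VNI}")
    sh("ovs-vsctl", "add-port", "br-ex", f"veth-ovs-{VNI}")

    # Dummy device whose addresses get advertised over EVPN for this VNI.
    sh("ip", "link", "add", f"lo-{VNI}", "type", "dummy")
    sh("ip", "link", "set", f"lo-{VNI}", "master", f"vrf-{VNI}")

    for dev in (f"vrf-{VNI}", f"br-{VNI}", f"vxlan-{VNI}",
                f"veth-vrf-{VNI}", f"veth-ovs-{VNI}", f"lo-{VNI}"):
        sh("ip", "link", "set", dev, "up")

    # On top of this come the IP rules and routes in the VRF table and the
    # OVS flows on br-ex described above.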
Back to you, Daniel. OK, just real quick to wrap up, and we'll leave a couple of minutes for questions. So yeah, this is the project and the code; you can check it out, it's upstream, and you can send reviews as you're used to doing with any other OpenDev project. We have a Gerrit, you can submit your patches there, check out the code, whatever. There are plans to add more test coverage, more OpenStack specific, and eventually to move it under the Neutron umbrella. And there's a lot of information on the flows that we described, with precise examples, detailed deep dives and even some videos, at that URL; that's a really interesting blog from Luis.

Just some limitations and future directions, real quick. Of course, as we've been saying all along, the idea was to keep everything as it is, so there's no integration with datapath acceleration: we're falling back to the kernel, which obviously doesn't play well with things like SR-IOV or OVS-DPDK. We have things in mind, but we're happy to hear ideas. So far, for the BGP mode there's no API; it could be easily integrated with, and probably extend, the API of neutron-dynamic-routing, although the approach that we've taken is slightly different. And traffic to tenant networks is still traversing the gateway node; we know that's how Neutron works today for SNAT traffic. As future directions: coming up with an API, trying to make north-south routing distributed to avoid traversing that extra hop I was talking about, and maybe moving some of the pieces that we have into core OVN, like the NDP or ARP proxy, or even the routing part, so that it could be offloaded at some point. Some of the interesting things are eBPF and XDP: Luis has been playing with that to accelerate the ingress part so it doesn't need to go through the kernel. And that's pretty much it. No time for the demo, we just have a few seconds left, so please feel free to ask any questions, and thank you so much for coming; we really appreciate it.

Right, just let me repeat the question for the audience here. The question is about performance and scale: have we taken any measurements of IP advertisement, the time it takes for it to be propagated, failover, how many routes we're able to handle, and so on. We have done just very limited tests; we haven't tested the performance at scale. We have just tested the failover time, which is below a millisecond; it really depends on how you configure BFD, since BFD can be parameterized for different convergence, so there's certain tuning you can still do. But unfortunately we don't know what the actual scale is, how many routes we can handle; I guess it depends on a lot of factors, but we don't have an answer now. We have a colleague, though, who did some testing, not with this type of deployment, but to check BGP and this stack; it was more for the OpenShift use case, and it got really nice numbers related to the scale and the amount of routes that you can expose. The number of peers was not affecting it that much unless you go really big. In this type of environment we didn't test it; it was focused on OpenShift, but using FRR and the same type of approach, and there was no real bottleneck observed. That's a good point, thanks for the reminder. Thanks.

Sorry, I cannot hear you; you have the mic there, but we can try. Oh, thank you. How far along is the integration of IP version 6? The question is how far along the integration of IP version 6 is.
It's already working with IPv6 too. Yes, I mean, we gave it a try, and actually when setting up the rules it's exposing both of them, IPv4 and IPv6, and that's why we actually need to enable the NDP proxy, to have something similar to what the ARP proxy does for IPv4. So we gave it a try with IPv6 and it was working; we didn't measure the performance either, but it was working fine. It's the same approach, but instead of using the ARP proxy we're using the NDP proxy, and obviously there's no floating IP involved because there's no such thing in Neutron for IPv6. But otherwise, yeah, it should be working, and we're using BGP unnumbered in this configuration, using IPv6 to distribute the routes for IPv4. Thanks.

Was there another one somewhere? Oh, there's a line there. OK.

So we've got a similar setup; we don't use OVN, we use Calico. The drawbacks are that it doesn't support a lot of the Neutron semantics. Some of the things we picked up: you mentioned at some point that you're putting the local IP address on the localhost interface. We found that putting it on a dummy interface is a little bit better, because you can down the dummy interface and you can change the MTU on it; it's a little bit less intrusive. So we made a fabric-0 interface for that. I took a look at the Neutron semantics, and I think the reason why there's this reliance on the networker node is that the ML2 driver network types are all assuming L2 adjacency; it's most of the types there. So, I might be wrong here, but I think a new Neutron network type is needed so that you can flag that the provider network is now an L3 fabric and you can route directly to it; you don't need to go via the networker node. That's most of the stuff. The other thing is that a lot of the DHCP and network auto-configuration is still not really mature in this space, so it's still something we need to do on all of our machines. Is there any help in that respect that's already been done?

I didn't catch the part about DHCP.

So with DHCP and those kinds of things you can push the normal L2 configuration, but there's no auto-configuration for BGP routers yet.

Right. Yeah, actually with OVN the DHCP would be served locally, so it would be handled by ovn-controller. You mean when the hosts get provisioned? At provisioning time, yes. Oh, okay. Yeah, so that's kind of a problem at our stage: we need to hard-code, or push, the BGP configuration using custom settings; there's no way to discover that remotely through the network yet. Yeah. So far we're relying on the top of rack being the next hop for all the traffic going out, so we're not importing any routes; we're just advertising. But that's a good point: we're relying on a single pre-defined gateway for all the traffic leaving the node, or actually two, because what we're doing today is ECMP, so typically you would have two NICs and an ECMP route outgoing. You don't need to know exactly where to send it, because you're going to reach the top of rack anyway, so you're moving the problem up to the top of rack and beyond.

I think the only thing our top of rack exports is the default route in any case.

Yeah. Thank you, good points.
Thanks. Yeah, thank you first of all for this topic; I've been waiting a long time for this feature to come, so thanks for working on it. I've built OpenStack installations with layer 3 routing in the underlay for quite some years, so I was waiting for this. What I'm wondering about is the management of the switches to get the BGP sessions set up: are there any facilities in the project, do you envision something like that coming, or is this still something I have to do by hand?

At least for now that was out of the scope of this. For playing around with it we have a set of Ansible playbooks to set it up, but it was more a PoC kind of thing, not something for production, to configure the switches and that kind of thing. So we can then look at it. Yeah, thank you.

Good, thanks. And I would just continue: my question is that the EVPN feature sounded a little bit like the OVN interconnect feature that OVN has on its own to connect different OVN deployments. Is there overlap that you intentionally built, or is it just two different solutions for the same problem space?

To us it's similar, but the thing is really different, because the interconnect is more for the whole tenant, right, for the whole OpenStack, and this one is to segregate the tenants onto different VXLAN IDs, right? So I think there are some limitations. But we actually were taking a look at the interconnect to see if it was good for us for this kind of case, and we actually wanted to use it, but didn't find out how to fit it in.

You mean OVN interconnect? Yes, I think you meant OVN interconnect. Yeah, but that's using transit switches, so you need to agree between the sites, because you need to share part of the networking: you'd create a transit switch where the two clusters connect. Actually, we've been looking into that to extend, to scale a little bit better, like when you go past a certain size; essentially you'd have two OVN control planes, but you need some management between them. We haven't thought about that in the context of BGP, but it could be interesting, because you could advertise the routes to reach those transit switches. Yeah, thank you.

Thanks very much. Thank you. All right, thanks. Thank you very much.