Okay, so I think the room is very full, so I'm really glad to be here. Hi everyone, I'm Christophe Fontaine, and I shouldn't actually be the one up here: the colleagues who prepared this talk were not able to attend the conference, so they asked me to present the topic in their place. But I'm happy to be here and to see all these familiar faces. So what we will be talking about today is BGP: how do we enable multi-cluster connectivity in an efficient manner? But what are we actually talking about? Are we talking about deployments? Does it work? And on the efficiency side, we will look at the performance numbers. It will be a fairly small set of performance metrics, so bear with me, and I will try to make it quick so we have plenty of time for Q&A. About the spine-and-leaf deployment challenge: you already know that, there is nothing new there, spine-and-leaf deployments are fairly common now. We do have some challenges on the controller side, because we have to replace all the layer-2 mechanisms with layer-3-capable mechanisms to deploy the controller nodes on three different sites, and on the compute side, well, that's somehow easy to do as well. The actual challenge is the top-of-rack configuration: how do we integrate all of that? We could go with networking-ansible, and that's great, but sometimes it's not enough. Where the fun begins is the multi-cluster story. How do we integrate that? When we have one tenant, one customer, which deploys a set of virtual machines across multiple data centers or across multiple regions, how do we interconnect that? Basically, even that issue is already solved, because with Neutron ML2 we can bring in third-party SDN vendors, and that's feature-rich: we already have a solution to interconnect multiple clusters with BGP, and honestly, it just works. You also have, thanks to those third-party SDN vendors, the ability to configure the physical switches in addition to the virtual switch. But sometimes the cost means that we have to do something upstream. And what can we do? The solution is called OVN BGP Agent. We do have two drivers, one for each use case, basically: the first one for BGP, as a simple BGP speaker, and the other one, and that's what we will mainly focus on today, is the EVPN driver. So how do we enable multi-cluster connectivity through BGP? I will not go over that whole presentation again, because some colleagues already did that last year in Berlin: they presented the whole architecture and how they integrated it into OpenStack. Unfortunately, they aren't here today. Compared to the BGP driver, the EVPN driver also brings compatibility with the actual OpenStack API, so that on the tenant side you have the ability to interconnect multiple tenants. And the workflow is fairly simple: you use the OpenStack CLI or the OpenStack API to configure the proper resources, the database is updated within Neutron, and you have all the hooks within the EVPN driver. And voila, you have those interconnections. So honestly, if it works, don't fix it. But does it work for everyone? Because if we look at that diagram specifically, what do we have? We have the data path: from the virtual machine directly to the integration bridge on one node, your packet is tunneled to the network node, then again through the integration bridge, and then what do we have? Kernel routing.
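To make the workflow just described a bit more concrete, here is a minimal Python sketch of that hook chain: an API call records an association in the database, and a driver callback reacts by advertising the route. All names here are purely illustrative and are not taken from the actual ovn-bgp-agent code.

    # Illustrative sketch only: a fake "Neutron database" notifying an
    # EVPN-style driver hook whenever a new association is recorded.
    class FakeNeutronDB:
        def __init__(self):
            self.associations = []
            self.listeners = []

        def add_association(self, network_cidr, vni):
            # The API call ends up here: persist, then notify the hooks.
            self.associations.append((network_cidr, vni))
            for callback in self.listeners:
                callback(network_cidr, vni)

    def evpn_driver_hook(network_cidr, vni):
        # A real driver would program FRR to advertise the prefix in the
        # EVPN instance matching this VNI; here we only print the intent.
        print(f"advertise {network_cidr} in EVPN/VNI {vni} via FRR")

    db = FakeNeutronDB()
    db.listeners.append(evpn_driver_hook)
    db.add_association("10.0.0.0/24", 1001)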
What if, on those nodes, on those OVS bridges, we have OVS-DPDK? Let's keep it very simple with OVS-DPDK: we still have kernel routing. This means that we have to push all the packets from user space down to kernel space, and this will actually slow down the complete data path, unfortunately. So is that efficient enough? We're saying that the kernel is actually slowing down the whole data path. Don't get me wrong, the kernel stack is awesome. It has a lot of features, the ability to terminate TCP connections, it's doing a marvelous job, really. So we could say that it's really fantastic for 99% of IT workloads. But sometimes we just want to push the cursor a bit further and go one step above. How do we remove that bottleneck for VMs which are exposed through BGP when they use OVS-DPDK, SR-IOV, OVS hardware offload, those acceleration technologies? So what do we need? We have a DPDK application within a virtual machine, with all the usual tuning: huge pages, dedicated CPU cores, and so on and so forth. We have OVS-DPDK underneath, once again with huge pages and dedicated cores, and then we have a kernel router. So the answer is simple: we need a DPDK-based virtual router. But how can we do that? If it were as simple as that, we would already have done it. We have multiple options. OVS stands for Open vSwitch, so that's a layer-2 switch, and that's not a virtual router as such. It still has some routing capability, because when you have a Geneve tunnel, what do you do? You are actually injecting the packet into that tunnel, and if you look at the OpenFlow rules, well, the packet is directly routed by OVS. So how does it do that? Basically, OVS already has some routing capability, specifically for tunnels. That route is already configured thanks to netlink, because OVS listens to the routing table, so we have a copy of the kernel routing table directly within Open vSwitch. And that's great, because it has worked for ages, with ML2/OVS before and ML2/OVN today, so with OVS and VXLAN tunnels before and OVN and Geneve tunnels today. But it was really meant only for tunnels and for tunneling. So does it support VRFs? Does it support IPv4 and IPv6 stacks in order to answer ICMP messages or neighbor discovery messages? As of today, no. This is really only a layer-2 switch. So this means that we would have to implement all those different features to transform it into a router. And because it's a switch and not a router, we also have additional issues: what if you have an MTU mismatch? You cannot fragment the packet. Okay, that's no longer an issue with IPv6, but we still have a lot of IPv4 workloads today. So this means that even in the upstream community this would be extremely difficult to integrate; not impossible, but transforming a switch into a router would require a lot of effort, and we need that feature now. So what can we do differently? We already have a router: the logical router in OVN. So why can't we use it? Because basically we need to inject the proper routing flows directly into Open vSwitch, and OVN is meant for that, so let's try to reuse it. What about ARP messages and ICMPv6? That's already handled, because we have that ovn-controller running locally, so all those packets can be directly steered to ovn-controller, and it will do its magic to either generate or answer those ARP and neighbor discovery messages.
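As a side note, declaring such a route on an OVN logical router is a one-liner at the OVN level. A hedged sketch, assuming ovn-nbctl is available on the node and using made-up router and next-hop names, could look like this:

    # Sketch: program a static route on an OVN logical router via ovn-nbctl.
    # ovn-northd then turns the Northbound entry into logical flows, and
    # ovn-controller renders them as OpenFlow rules on the local bridge.
    import subprocess

    def add_logical_router_route(router, prefix, nexthop):
        subprocess.run(
            ["ovn-nbctl", "lr-route-add", router, prefix, nexthop],
            check=True,
        )

    # Names and addresses below are examples only, not taken from the talk.
    add_logical_router_route("bgp-router", "203.0.113.0/24", "192.0.2.1")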
So on that front, we're covered. But in order to have that local OpenFlow rule, what do we need? We need that route declared in the OVN Northbound database, directly on the logical router. But how do we access that Northbound database? Through the Neutron API? That would mean that for every chassis acting as a gateway node, for every network node, we would need a kind of feedback loop directly from OVN BGP Agent, so directly from FRR, which would inject the routes into the Neutron API. Then the Neutron API would do its magic, forward the call to the proper OVN ML2 plugin so that the route lands in the Northbound database, gets transformed into the proper logical flow in the Southbound database, and is finally translated by ovn-controller into an OpenFlow rule on the integration bridge. But that's somewhat difficult, because with one node that's fine, but what if you have 1,000 nodes? With the BGP driver in OVN BGP Agent, you have one FRR instance, and so one OVN BGP Agent container, running on every compute node. This is why it would be difficult to do it that way: we cannot scale it, let's put it that way. So the third and final option was: okay, we already have one OVN cluster, what if we bring in another one? That other one would be extremely small, because it would run locally on each network node, but the concept would be exactly the same: OVN BGP Agent would just forward the routes locally to the other OVN cluster, well, a single-node OVN, and that would push the proper routes directly to another bridge. So that was the approach taken by Luis. And so, what are the numbers? When we're talking about numbers and performance, we need to think globally: it's not a control plane issue anymore, because we now have an end-to-end DPDK-enabled data path. So how did we measure that? It was a very simple setup, only one deployment in that specific case, with one provider network so that we have one leg directly connected to the DPDK-based application. And on the external network, we have a floating IP held on br-bgp. So everything's great. On the traffic generator side, we decided to use TRex, because that's what we use on a daily basis, at least at Red Hat, to generate and simulate the proper traffic. On the virtual machine, we actually run Open vSwitch, OVS-DPDK as well, because we could have brought in a third-party DPDK-based router, but we already have that layer-2 switch, which is OVS-DPDK, so let's use it there as well. And everything is configured, I would say, in a standard NFV configuration: huge pages, dedicated CPUs, proper isolation with TuneD and the cpu-partitioning profile, both on the hypervisor and within the guest. The only thing that we vary is that we either deploy with the second OVN cluster or without it, so either with the DPDK data path or with the kernel data path. The kind of test and traffic that we inject is fairly similar to RFC 2544, so that's a 64-byte frame size, really small, but this is a standard kind of test. On the configuration, we also have multiple routes, static routes and so on and so forth, but honestly, nothing fancy. What does that mean?
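For reference, the theoretical maximum packet rate that such a 64-byte RFC 2544 test is measured against can be computed quickly; the per-frame overhead on the wire is 20 bytes (preamble, start-of-frame delimiter and inter-frame gap):

    # Theoretical line rate in packets per second for small frames.
    # Each 64-byte frame costs 64 + 20 = 84 bytes (672 bits) on the wire.
    def line_rate_pps(link_bps, frame_size=64):
        return link_bps / ((frame_size + 20) * 8)

    print(f"10G : {line_rate_pps(10e9):,.0f} pps")   # ~14.88 million
    print(f"100G: {line_rate_pps(100e9):,.0f} pps")  # ~148.8 million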
If it's not fancy, it just means that we are basically back to the existing performance numbers that we have for kernel versus OVS-DPDK, because we have a dual-port Intel NIC with 10 gigabit interfaces, and within OVS, within the virtual machine, we have some specifically crafted OpenFlow rules simply to simulate a router. So, nothing fancy. As I said, on the kernel routing side we are at roughly 100,000 packets per second; it's not low, it's not high, that's basically what we have. And with OVS-DPDK, we were able to reach a bit more than four million packets per second, so that's a huge, huge difference. But unfortunately that's only the base case, because doing that means we are not using any kind of NAT or any kind of fancy feature. This is really great for the multi-cluster connectivity, when you are just routing the packets from one leg, from one network, to another one. What if we want to use the initial deployment, which is basically the BGP driver? Doesn't that mean we have a floating IP? If we have a floating IP, we have NAT, and if we have NAT, it really means that we have conntrack. And unfortunately, as soon as we enable conntrack, the performance drops. Roughly, still with the same characteristics, we go from four million packets per second down to 1.2, 1.3 million packets per second. Still, it's much better than what we could have with the kernel data path, because that's really a zero-loss test. So, what else can we do? I talked about OVS TC flower offloads, but what about those SmartNICs? We still have two physical interfaces. On the compute side, nothing changes: we have either OVS-DPDK or OVS hardware offload, and in that L2 mode we are able to reach line rate with a ConnectX-5 or ConnectX-6, with 10 to 25 gig interfaces. Honestly, that's completely flawless; we get SR-IOV-level performance. That's great. As I said, we don't want that routing capability to be a bottleneck, so let's try to put exactly the same setup on the network nodes, with OVS TC flower offloads, so OVS hardware offloads. And that's where it hurts. Why? Because the physical-to-physical flow cannot be offloaded. That's an existing limitation on those ConnectX NICs, and we cannot do anything about it. So on the TC flower story, that's great if you are only using one driver, the standard BGP driver, where you have the virtual machine and the external network directly connected on a single node. From the virtual function to the physical function, so from the VM to the physical interface, you can offload that flow and you are able to reach line rate. As soon as you move to a network node, so physical to physical, just like any other appliance, you cannot do that today. Of course we're trying to improve that, but that's how it is. But the real question is: do we need TC flower offloads in that case? Because we have the other option, which is OVS-DPDK. And with OVS-DPDK, on physical-to-physical forwarding, that's where DPDK really shines, and we can be completely at line rate, flawlessly. In fact, with only one core we're able to forward 29 million packets per second. And that's a network node, which means that we have the complete node dedicated to forwarding: we don't have two cores, we may have 20 cores to do that forwarding. So with 100 gig interfaces, routing is easy. And thankfully, even if we do have OVS-DPDK and conntrack, the user-space conntrack issue, the latest improvements upstream basically mean that we can actually be at line rate there as well.
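Going back to the floating IP point above: in an OVN setup a floating IP maps to a dnat_and_snat entry on the logical router, and NAT in the data path relies on connection tracking, which is where the slowdown comes from. A hedged sketch of adding such a NAT entry, with made-up router and address values, assuming ovn-nbctl is present:

    # Sketch: associate a floating IP with an instance IP on an OVN router.
    # 'dnat_and_snat' is what a Neutron floating IP maps to in OVN; the
    # resulting flows use the connection tracker (conntrack) in OVS.
    import subprocess

    def add_floating_ip(router, external_ip, instance_ip):
        subprocess.run(
            ["ovn-nbctl", "lr-nat-add", router, "dnat_and_snat",
             external_ip, instance_ip],
            check=True,
        )

    # Example values only, not the addresses used in the benchmark.
    add_floating_ip("bgp-router", "198.51.100.10", "10.0.0.5")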
And we can push much, much more data. Which means that we can still deploy OVS TC flower offload for your workloads on the compute nodes, but for the network nodes, as of today, we could go directly with OVS-DPDK in a single deployment. So, what can we do better? BGP as an interior protocol: is that really so great? That's an open question, because we have something else, called OSPF, which was basically meant for that. I see some weird looks, but that could be another option as well. And with OVN BGP Agent we are basically using FRR, so switching from one mode to the other could be an option. It really depends on you, whether you tell us: okay, it would be really interesting to have OSPF as the IGP and to keep BGP for the multi-cluster connectivity, of course. Maybe that's an option. We know that some hardware vendors charge much more as soon as you want to enable BGP, compared to OSPF, which is enabled and available by default. So, is that something that would interest you? That's an open question. As I said, the physical-to-physical flow offloads, the so-called hairpinning, where you come in on one interface and may go out on the same interface: we are actually working on that. In fact, on the DPDK user-space side, for the rte_flow offload, we know that work is ongoing in the ConnectX driver, so that could be great as well. Also, all those elements are only BGP speakers, which only announce either a single IP, so a /32 or a /128, or eventually a network. But do you see any use case where you would want to learn routes directly on the compute nodes? Because today the upstream router is used as a route reflector, that's great, and that upstream router is also used as the default gateway, so today we don't need to learn routes directly on the compute node. Do you think that's something that could be valuable for you as well? So basically, the open question is: what do you need? And I kept my promise, we have 10 minutes, a complete 10 minutes, for Q&A. So the mic is open. If anyone wants to ask some questions, and if you're too shy to do that... yes, please. Excuse me? No, no. Today it's EVPN type-5 routes that are used for the multi-cloud interconnect, for the BGP VPN with EVPN and VXLAN. I don't think so, no. And is there any specific use case where you would prefer type-2 instead of type-5? Yes, there is another one, but let's take that separately. Okay, yes? In terms of scalability, how many locations or geographies, what kind of scale, how many stacks are we talking about here? Oh, actually, it's about the interconnect, the connectivity, so there is no limit in the sense that it's a control plane protocol. If it's only about reaching the other cluster, well, there is no limit in that sense. You will of course have the latency between two data centers, and we cannot do anything about that, but on the scale side, basically no issue. Well, on my team we really focus on the data plane performance and on how we can enable that feature for these use cases, not on the scalability side, as of today. Yes? Yes, when we say that we're at line rate with RFC 2544, it really means that we want to be able to push as many packets as the wire is capable of. So for a 10 gigabit interface, that's roughly 14, yes, 14 million packets per second, and of course for 100 gig, that's 140 million packets per second. So, from a BGP peer perspective, are you peering with the top-of-rack switches, or is there always...
In that case, if I go back to the general architecture, yeah, we are only peering with the leaf switches, which are route reflectors. Yes? Yes? Can we go back to the subject of this DPDK virtual router? I don't know if I got it properly: you said that this is a separate OVN stack running on the network node. So yes, indeed, in that specific deployment we will have two ovn-controllers running on the same network node, one dedicated to OpenStack with the integration bridge, br-int, and another one dedicated to another bridge, which is br-bgp. So indeed, we'll have two independent OVN clusters, but that second cluster is really limited to that single node; we will not create any kind of tunnels with the other network nodes. That's really local. So, don't we have a dedicated interface between these two...? Yes, we do have a patch port between br-int and br-bgp. It's not exactly the wiring that you will find on a compute node, it's a simplified one, but you will indeed have an intermediate bridge between the two, and everything is connected through patch ports. Which means that, at the end, even if you have multiple bridges interconnected with patch ports, you end up with only one OpenFlow, well, one datapath flow. So whatever the number of bridges you have, you always end up with a single datapath flow, and that's why we can leverage a single Open vSwitch with DPDK. So the assumption is that there is a specific interface doing this interconnection between the two sides, right? That's a patch port, yes. And that's my second question: only OVN BGP Agent should write to that second OVN cluster? Absolutely, yes. We always have a unique master for each element: on the OpenStack side, we only have Neutron, which configures the OVN Northbound database, and for the second OVN cluster, it's only OVN BGP Agent which actually pushes routes to that Northbound database. Yes. A question about FRR: what's the implication of FRR? Well, FRR is here in order to peer with the actual router, and we configure FRR in order to advertise the different networks or individual IPs. Oh, so you're just driving FRR and just announcing...? Yes, exactly. FRR actually runs within the context of the OVN BGP Agent container, so it's within that; I don't know if it's a separate container within a single pod, but I think it's something like that. So, if you don't have any other questions, I will be hanging around here anyway. So, yes? I just wanted to ask, what's the progress and what's the status of the project right now? Yes, that's a good question. As you can see today, the OVN BGP Agent code has been moved from the private, well, from the GitHub repository, directly to the OpenStack community infrastructure, so it's now under the official umbrella. For the second OVN controller part, we still have one or two patches that are still pending, but they should be merged; if not yet, they will be in the upcoming weeks. So everything should be available upstream, if not already. Yes. The next question, and the tricky question, is about the installation phase and how we deploy all of that, but that's completely outside of the OVN BGP Agent project. Okay, so. Thank you very much.