Hello everyone. Today we're going to cover living on the edge, with a focus on DCN networking. I'm Anita Tragler, a product manager for networking at Red Hat. And I'm Bernard Cafarelli, principal engineer on networking, also at Red Hat and working with Anita.

So let's dive into DCN edge — distributed compute node — networking and look at some high-level use cases and requirements. With DCN, we have centralized control and distributed edge sites with compute nodes. The goal is to move applications closer to the user, to reduce the fault domain, and to lower latency. Industrial IoT and retail customer-premises-equipment type vendors are interested in these use cases. There's also a use case for time-sensitive applications moving to the edge site to minimize latency: gaming, AR/VR, smart cars, robotics, AI/ML, as well as NFV telco use cases for vRAN and mobile edge computing applications. For NFV we also have high-throughput applications such as the mobile packet core, which are looking to conserve bandwidth by moving applications closer to the edge.

With edge use cases, the nodes typically have smaller footprints with fewer NICs, and a requirement for hyperconverged infrastructure with storage. The typical needs, for retail and industrial IoT alike, are deploying at specific sites using availability zones and composable roles to pick whether you want HCI or non-HCI. There are also hardware performance needs — for example, if you're doing high-performance computing at the edge, the types of hardware you need: GPUs, ASICs, FPGAs, SmartNICs for bandwidth sharing.

Now we will cover telco edge NFV use cases and dive deeper into the requirements and the architecture. For the telco edge, we will first look at the control plane and the provisioning networks. In this use case, the central site is typically connected to the edge sites via SD-WAN and L3 routing; routed networks are a key cornerstone for this use case. The deployment is more like a spine-leaf, hub-and-spoke type deployment where each site has its own segment or VLAN subnet for each network type — provisioning, control plane, storage — so you have a separate subnet and segment for each of these. Another big requirement is encryption from the central site to the edge sites for API and storage management. Edge sites are typically identified by host aggregates or availability zones, to ensure deployment lands on the right site. IP address management is a key requirement: DHCP services over L3 routed networks are a must-have. And IPv6 addressing is required as well, since we're running out of IPv4 address space, so more customers are looking at IPv6, especially for provisioning, control plane, and storage.

Okay, so now to dive a bit deeper into some of these components. Talking first about the controllers: we regroup them on the central site, mostly to have good HA between all of them. So we need layer 2 connectivity there, with Pacemaker and fencing (over IPMI management) for failures. These controllers are regrouped on the central site, which you can see on the left of this drawing. And then we've got multiple provisioning networks — provisioning, internal API, storage — that will allow us to provision all the nodes on the edge sites.
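As a concrete illustration of that last point — identifying edge sites with host aggregates and availability zones — here is a minimal sketch with the standard OpenStack CLI; the site, host, image, and network names are made up for the example:

```sh
# Create a host aggregate exposed as an availability zone for one edge site
openstack aggregate create --zone edge-site-1 edge-site-1-agg
openstack aggregate add host edge-site-1-agg edge-1-compute-0
openstack aggregate add host edge-site-1-agg edge-1-compute-1

# Later, land a workload on that specific edge site
openstack server create --availability-zone edge-site-1 \
  --image my-image --flavor m1.small --network my-network vm-edge-1
```

The zone name is what Nova matches on at boot time, so nothing spawns on the wrong side of the country.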
So we have a concept of one subnet per network, either IPv4 or IPv6 — you can choose, but you cannot do both at the same time. This is not an actual problem, because it is limited to the provisioning networks, not the data plane. And this sits in a supernet, which in this IPv4 example is 10.1.0.0; then you have a separate subnet per site — .0.0 for central, .1.0 for the first leaf site, .2.0 for the second site, and so on for internal API and storage. One of the things we have to watch out for when we do provisioning is how to manage the IP addresses on the remote sites, because DHCP is a layer 2 protocol, so it does not travel natively outside of the central site. Because we do not have OpenStack nodes already in place at the edge to run DHCP agents for provisioning, we use a DHCP relay that forwards the DHCP requests and replies back and forth between the central site and the edge sites. Or, if your hardware allows it, you can use config drive directly, and the node will configure itself from what's stored in that configuration. Next slide, please.

So, continuing on bare metal provisioning. We mentioned routed networks before, and of course we are also using them for the provisioning networks. Here is another simplified drawing where we've got this supernet, split into subnets per site, for the control plane, API, and storage. I already mentioned IPv4 and IPv6: you can also provision with IPv6 and do PXE boot over DHCPv6. You can do it with ML2/OVS and OVN, the network backends we usually use. One caveat for OVN bare metal is that you need the Neutron DHCP agent running, which kind of makes sense, because you do not have OVN already running on the node to relay OVN's DHCP services. And of course, as we mentioned, it's better to have TLS everywhere enabled for encryption, as you are running across different sites and possibly unsecured networks. So, encryption with TLS everywhere. Next slide.

All right, let's now dive into data plane network support with provider networks. Our goal is to orchestrate VNFs at specific edge sites using Nova availability zones, and your data plane traffic is mostly north-south and should be restricted to the edge site. We need dual stack support for provider networks, since not all VNFs may support IPv6. In the use case we're showing right now, we have a provider network with dual stack support: the VNF has SR-IOV data plane interfaces using IPv6 and a management interface using IPv4. So we need the ability to do dual stack deployments for these types of use cases. Network segmentation is also crucial: we need separate subnets for each edge site, with isolation between edge sites. Distributed, site-local services are a must — DHCP and metadata can be done using Neutron availability zones or using distributed services. We also need distributed DHCP services for SR-IOV. On the fast data path options for high throughput and low latency, we need to support SR-IOV VFs; this is the high priority and the simpler option. Looking to the future, we also need OVS-DPDK and OVS hardware offload, using composable roles to identify these. For traffic, in our use case here, we have north-south traffic coming in from the mobile users, traveling over our external provider network, running through the SR-IOV ports, and coming back out.
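To make the dual stack point concrete, here is a minimal sketch of a provider network carrying both an IPv4 and an IPv6 subnet; the names, VLAN ID, and address ranges are purely illustrative:

```sh
# Admin-managed provider network (VLAN-backed, externally routed)
openstack network create --provider-physical-network provider1 \
  --provider-network-type vlan --provider-segment 300 vnf-provider

# IPv4 subnet, e.g. for the VNF management interface
openstack subnet create --network vnf-provider --ip-version 4 \
  --subnet-range 198.51.100.0/24 vnf-provider-v4

# IPv6 subnet with SLAAC, e.g. for the SR-IOV data plane interfaces
openstack subnet create --network vnf-provider --ip-version 6 \
  --subnet-range 2001:db8:300::/64 \
  --ipv6-address-mode slaac --ipv6-ra-mode slaac vnf-provider-v6
```

A port created on this network can then carry both address families, matching the VNF layout we just described.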
This is a typical data plane traffic deployment for VNFs at the edge.

Okay, so giving a bit more detail about this provider network — the one in the middle right of this drawing. What we call a provider network, in our definition, is usually an admin-managed network that is externally routed via the top-of-rack switches or a gateway. It's not a self-service or isolated network; it needs to be physically connected to the real world. As we said, this one is of course for the user data plane, so it can be IPv4, IPv6, or mixed dual stack, because sometimes you do not yet have IPv6 support, so you need to support both. And similarly to the control plane, you may also need DHCP services in your edge site, as presented here. You could use what was possible before with a DHCP relay, but that's not really a robust way of doing it. With ML2/OVS, you can do it with site-local services: you deploy DHCP agents and metadata agents per site, so they are local to the site. And if you use OVN, then it has built-in DHCP, integrated into the OVN nodes. Also, talking about OVN, we have distributed IPv6 DHCP services — stateless, stateful, and SLAAC — with a small caveat that is still under investigation: there is a small restriction when you use SR-IOV, which we will come back to.

And now let's look at provider network segmentation: being able to support separate segments and subnets per edge site. As we said earlier, the connection between edge and central sites is via external routing, typically SD-WAN, MPLS VPN, or BGP options. We do not have a stretched L2 network, but multiple externally routed L3 segments. Each edge site needs to have its own network segment, needs to work independently, and has its own subnets, which could be dual stack. One option is to create a provider network for every edge site, but this results in a scaling issue, and it requires you to configure all of those provider networks on all the central controllers. Instead, the option that was selected is routed provider networks: you have a single provider network with multiple segments, one segment per edge site. You should then have the ability to replicate that VLAN segment across sites, so that you can support site replication for industrial IoT and retail use cases. You also want to support site-local or segment-local services — DHCP and metadata specifically — and your traffic should stick to your edge site, with distributed or site-local services.

We've talked quite a bit about routed provider networks already, so we wanted to give a small recap slide on what they are. Before, in OpenStack, a Neutron network in general was a layer 2 contiguous network — it had to be Ethernet connectivity everywhere, basically. But now you can use routed provider networks, which have been around for some time and are quite well adapted to edge sites, because these are layer 3 networks composed of layer 2 segments connected to each other via routing outside of OpenStack. So you can see it aligns quite well with the edge topology. And we have this segment definition, which is just a physical network name, a segmentation type, and an ID — the typical example would be provider1, VLAN, and VLAN ID 2016, something like that. Next slide, please.
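Since we just recapped the segment definition, here is roughly what a routed provider network looks like with the Neutron CLI — a sketch only, with hypothetical names, VLAN IDs, and ranges:

```sh
# One provider network for all sites; its first segment is created implicitly
openstack network create --share \
  --provider-physical-network provider1 \
  --provider-network-type vlan --provider-segment 2016 edge-provider

# Add one more segment for the next edge site (different physnet/VLAN)
openstack network segment create --network edge-provider \
  --physical-network provider2 --network-type vlan --segment 2017 \
  segment-edge-1

# Each segment gets its own subnet(s), which can be dual stack
openstack subnet create --network edge-provider \
  --network-segment segment-edge-1 \
  --ip-version 4 --subnet-range 10.1.2.0/24 subnet-edge-1-v4
```

The key point is that there is a single network object for the whole deployment, while each edge site only sees its own segment and subnets.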
So, as I said, these are well adapted for these kinds of large-scale deployments like edge. Routed provider networks are supported with ML2/OVS and SR-IOV, and we added OVN support in the Victoria cycle — you will have the links in the presentation. We have a few limitations, but nothing major. You can only have one segment — typically a VLAN ID on a physical network name — per provider network per node, which is not really blocking, because in most situations you will not have multiple physical networks on the same compute node. For SR-IOV, you cannot reuse the segment ID across sites, so you need to use unique VLAN IDs. And for DPDK we have another restriction: we do not directly support DHCP services, meaning a DHCP agent running locally. It would work, but it would have a big performance impact, so in that case use config drive or a separate DHCP agent. OVN with OVS-DPDK is still work in progress. Finally, routed provider networks work fine for north-south traffic, as shown by Anita. But if you want routing between north-south and a self-service network — east-west — where you would need centralized SNAT or floating IPs, this is not supported yet with routed provider networks. There is work in progress, maybe for the Wallaby cycle. Next slide.

We also have some limitations on the Nova scheduler part, because routed provider networks are a feature with interactions between Neutron and Nova. Currently, if you use SR-IOV or PCI passthrough, you have to use the same physical network names in the central and remote sites. You cannot reuse segment IDs, which is not too bad either, because it makes it easier to find which segment belongs to which site. The Nova scheduler is also not segment-aware, so you've got to map carefully between subnets, segments, and Nova availability zones — I will detail that a bit later. I listed here workarounds for booting VMs with Nova today when using routed provider networks. And for similar reasons, cold and live migrations will only work if you specify the destination host; it will not work magically. But don't despair, because all of these are in progress on the Nova side: the specification itself was merged during the Victoria cycle and the implementation is in progress, so expect it in an upcoming release. Next slide.

Okay, so I've talked quite a bit already about availability zones, so here are some additional details, because availability zones in OpenStack are a big subject and most projects have their own definitions. What's important here for edge: we have Nova availability zones, which are the user-facing interface to host aggregates. A host aggregate is a group of compute nodes that you regroup into an availability zone, which in our case will usually be an edge site. This is required to schedule VMs on the correct edge site — you don't want them to spawn on the other side of the country — and also for performance optimizations like image caching. Neutron of course also has availability zones, with two concepts: network and router. The network one groups the network nodes that run the services — for ML2/OVS that's mostly the DHCP and L3 agents — and schedules a network's agents within its availability zones. And you also have router availability zones, which do the same but for routers, so the L3 side. Cinder also has availability zones for storage.
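As a small sketch of the Neutron side of this — assuming an availability zone named edge-az-1, which is a made-up name — the zone hints look like this:

```sh
# Pin a network's DHCP/metadata agents to one edge site's network nodes
openstack network create --availability-zone-hint edge-az-1 site-1-net

# Same idea for a router and the L3 agents that will host it
openstack router create --availability-zone-hint edge-az-1 site-1-router
```

The hints tell Neutron's schedulers to keep those services on that site, so the data path does not detour through the central site.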
If you are curious about these different concepts, I suggest you join our other presentation, which is more focused on storage in DCN and edge topics. Next slide.

Okay, so I mostly mentioned ML2/OVS on the previous slides — what about OVN? Here I listed the availability zone extensions for Neutron: we've got the base one, which is mostly for display, then the network availability zone, then the router one. For network, OVN has built-in distributed DHCP support — it runs inside the OVN agents, so that one is good. And we added router availability zone support for OVN in the Victoria cycle; the review is here. Also, as we mentioned before, we have a small limitation for SR-IOV plus OVN plus DHCP: OVN uses a concept called external ports for this case, which is not yet aware of the network availability zones. So that one is still in progress. Next slide.

All right, let's look at telco edge requirements for QoS and bandwidth sharing. Edge nodes, as we said, have a small footprint and typically have only a single NIC with two ports, with bonding enabled. Even if you're using 100 Gig, for example, it can't be dedicated: in most cases, you have to share the bandwidth between different network types. There is a new feature called NIC partitioning that allows you to distribute SR-IOV VFs across network types. These network types are: control plane (SSH, SNMP), storage, control plane VNFs, VNF management, and data plane VNFs. The goal is to protect against noisy neighbors and to guarantee a minimum bandwidth, so that your data plane VNFs do not steal all the bandwidth from your control plane and API, and your API networks are not starved. We also want to make sure the Nova scheduler is not overcommitting the NIC: the VNFs are expected to declare their minimum bandwidths, and Nova will schedule using bandwidth-aware scheduling.

NIC partitioning is one way of bandwidth sharing: you distribute the SR-IOV VFs across the network types. We have bonding support, but it's limited, as marked, to active-backup; LACP bonding is not an option here. We've got minimum bandwidth support now — we support it for SR-IOV, per VF. If you want maximum bandwidth limits, that is still in progress for OVN: we have some patches merged in the Victoria cycle, but some are still in progress.

And now let's look at telco edge DCN options. Today, the highest priority is SR-IOV. We have VNF vendors looking to deploy with SR-IOV, and this is a simple, easy-to-deploy first option. It's already production-ready and mature, and it will give the VNF vendors the latency and performance that they need. But it has a limited feature set: no live migration (only with hot-plug and unplug of the VF), no data path switching capabilities, no security groups, and so on. OVS-DPDK and OVS hardware offload are the next options needed; they will provide the switching capabilities and the SLA guarantees — zero loss — as well as management options, with the option to fall back to the vSwitch for switching in the OVS hardware offload case. Basically, OVS hardware offload holds a lot of promise, because with all the flows offloaded, you get almost bare-metal or SR-IOV performance. You can fall back to the kernel datapath if you have capacity issues or if you cannot offload — typically for control plane or DHCP requests, that kind of stuff. You get rate limiting per VF. So, lots of good promise, but it's still hard and complex to deploy.
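To make the minimum bandwidth guarantee concrete, here is a rough sketch using the Neutron QoS API; the policy, network, and port names are made up, and exact enforcement depends on your backend and release:

```sh
# QoS policy with a minimum bandwidth rule (5 Gbps egress in this example)
openstack network qos policy create vnf-dataplane-policy
openstack network qos rule create --type minimum-bandwidth \
  --min-kbps 5000000 --egress vnf-dataplane-policy

# Attach it to a direct (SR-IOV) port; with bandwidth-aware scheduling,
# Nova only places the VM where the NIC has enough capacity left
openstack port create --network vnf-provider --vnic-type direct \
  --qos-policy vnf-dataplane-policy vnf-data-port
```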
You've got limited hardware support for now, and not all features are supported yet, but it is promising. Next slide.

Now, let's look at industrial IoT edge requirements and architecture. The main goal here is to reduce the fault domain to the edge site, and to be able to replicate and reuse a deployment across different factory or retail edge sites. In this use case, you have both routed provider networks for external connectivity and a need for tenant self-service networks for east-west traffic. Here as well, you want to be able to replicate your VLANs and your segments, and restrict your overlays to your edge sites only. Take a look at our deployment here: we have our routed provider network with a VLAN at this site, and floating IPs for our instances. We have two instances, an IoT VM on one node and a database VM on another node. Traffic coming in from the retail site needs to go to the IoT VM, which in turn passes data to the database VM. So we have north-south traffic coming in, and east-west traffic going over an overlay between the IoT and DB VMs. For this use case, the need is again for distributed routing and DHCP services on the site-local nodes. And if there is a significant amount of routing, you also want your network nodes and your OVN gateway nodes scheduled on the same site. Each edge site is independent: if there are network connectivity issues or a temporary disconnection from the central site, we want the edge sites to continue working independently and reconnect later.

Okay. So, as mentioned, IoT has a key difference from our previous use case: we have more self-service networks. This is typically east-west and fully encapsulated, so VXLAN or Geneve tunnels everywhere. There are also some requirements, like the latency to the central site should be below 100 milliseconds. Think, for example, of autonomous cars: you do not want the lag to be too long if the car has to make a query to a central site. Same for resilience: you do not want the car to crash if it has network issues or enters a tunnel. We also have a current limitation in that we still do a full mesh of Geneve or VXLAN tunnels between all the nodes; we do not limit them to availability zones or to just the needed compute nodes. And if we have a temporary connectivity loss to the central site, we need to recover properly, so you have to check and configure proper timeouts. OVN also has an advantage here with incremental processing: it can recover better by getting a diff of what happened between before and after the disconnection, and recover quickly.

Okay. And to quickly summarize what we've got today with DCN edge: we have control plane provisioning with support for IPv6, as well as routed spine-leaf deployments with supernets and DHCP relays, as we covered. Then we have the telco data plane provider network options: dual stack capabilities, routed provider networks with segmentation, support for availability zone scheduling, fast data path with SR-IOV (and OVS-DPDK and OVS hardware offload coming soon), distributed services for DHCP — including DHCP services for SR-IOV — encryption with TLS everywhere, NIC partitioning, and QoS with bandwidth guarantees and bandwidth-aware scheduling.
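Before the final recap, here is a rough sketch tying the industrial IoT pieces together — a site-local self-service network and router behind the routed provider network, with every name and range illustrative only:

```sh
# Self-service network for east-west traffic, pinned to the edge site
openstack network create --availability-zone-hint edge-az-1 factory-internal
openstack subnet create --network factory-internal \
  --subnet-range 192.168.10.0/24 factory-internal-v4

# Site-local router, uplinked to the routed provider network
openstack router create --availability-zone-hint edge-az-1 factory-router
openstack router set --external-gateway edge-provider factory-router
openstack router add subnet factory-router factory-internal-v4

# Floating IP on the provider network for north-south access to the IoT VM
openstack floating ip create edge-provider
```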
For industrial IoT, we add two more requirements: self-service networks, as well as distributed routing with site-local routing services. And that's it. Thank you.