Hey folks, my name is Greg Elkinbard. I'm a Senior Technical Director at Mirantis. I'm here today to present our efforts in accelerating the network throughput of a VM using Neutron and SR-IOV pass-through. We've done it with several different interfaces, and during this presentation I'll describe what we did and why.

All right, so first let's go through the agenda. I'm going to cover the use cases. Primarily, this is designed to handle NFV-type traffic: traffic that needs high-performance, low-latency networks. I'll give you a quick history of SR-IOV development and why we went ahead and did the work with the Intel 82599 chip, as well as what the other options are. Some of them are available from folks here on the floor; some of them Mirantis is working on in concert with folks like Intel. And I'll give you a brief comparison as to when you should focus on which option. After that, we'll have some time for questions.

All right. We've been working in the telco NFV sector for a while, and we've identified roughly two design patterns that people want to deploy. In the first pattern, you have a large backend data center providing mostly OSS/BSS services, and all of your network-heavy, high-throughput devices get pushed into PoPs out on the edge. So you have relatively small regional clouds, close to the client to remove any residual latency, and the keys to such a cluster are efficiency, performance, and cost. The larger backend data center that implements your OSS, BSS, and some of the less latency-sensitive network functions will be a larger, multi-purpose, typically centralized cloud, and the key to success there is balancing flexibility and cost.

In addition to the telco NFV use case, there is a lot of NFV need in the enterprise. This typically amounts to taking traditional network devices and converting them into low-footprint, low-cost VMs: router as a service, firewall as a service, load balancer as a service, IDS, application firewall, the typical things that IT departments need to deploy. The deployment pattern here is relatively similar to the telco backend case: you'll have a large multi-purpose centralized cloud, and the key trade-offs you need to make are the flexibility of the design and the cost.

All right, so SR-IOV has been around in OpenStack for a while. The work started originally with the PCI pass-through effort done by multiple companies. SR-IOV itself got started around Essex with changes in Nova and Neutron and was pretty much complete by Icehouse. It was mostly focused on getting NIC pass-through without regard to multi-tenancy; most of those efforts targeted flat networking environments. However, a few companies, Mellanox in particular, came in and said: well, we can offer intelligent processing on our NIC, so let's go ahead and introduce SR-IOV with multi-tenant networks. The original driver was introduced in Folsom, got upstreamed in Havana, and was integrated with the generic SR-IOV framework in Icehouse.

Mirantis has been working with that driver for a while. We originally deployed it in the Havana timeframe and integrated it into our own OpenStack installer, Fuel, around Icehouse. With completely unoptimized numbers, we got 23 gigabits just by bringing up the systems; this was on a completely unoptimized guest.
So if you have a DPDK-enabled guest, you can get 40 to 80 gigabits of throughput relatively easily. Now, the problem with Mellanox is that you need an extra plug-in NIC, which adds extra cost. Some of our environments, especially in the telcos, are blade and hyperscale chassis where there is simply no room to plug in an extra PCI device. So people have been asking us for a while to develop a solution that would let them use one of the onboard LOM devices and achieve similar performance.

So what we selected is the Intel 82599. It's a relatively common chip that's available in servers out there. We haven't played with the newer Intel chips; I'm sure they have similar capabilities, and sometime during this year we'll enable them as well. But frankly, this work was done for a particular design for an OEM customer of ours, so it focused on this particular chip. The chip has a built-in packet classifier that can support addition and removal of VLAN tags and has basic anti-spoofing capabilities. That is just good enough to enable multi-tenancy. We did our changes on Juno and we'll be upstreaming them in Liberty, hopefully, if the community accepts them. The changes focused on getting multi-tenant networks to work and on getting network multi-path to work in the VM. Honestly, this is an area that has been somewhat neglected. Most network devices out there have more than a single interface: they have two or three, so that one interface can be outbound and one inbound, or to enable network HA, and mapping these to physical interfaces has not been addressed before. So we decided to add that code to OpenStack.

All right, so for the multi-tenant network, the NIC virtual function gets a MAC address and a VLAN assigned. The VLAN tag is removed from incoming packets before they are forwarded to the VM. Now, all of this is done by the hardware, right? There's no software switching involved. And the VLAN tag is added to outgoing packets, fully preserving the ability to interoperate with OVS VLAN or other ML2 plugins. The anti-spoofing is optional: we have a way to turn it on or off. If it's on, it checks both the VLAN and the MAC of the outgoing packet. But that means you cannot do things like migrate a MAC from one SR-IOV device to another, and we needed to preserve that ability to allow for failover of the network interfaces. So, optionally, we can turn the anti-spoofing filter in the Intel chip on or off. Unfortunately, there is no firewall, because the only capabilities we have are what's in the chip, so there is no security group support. You could enable security groups by controlling the ACLs on the switch port front-ending the device; we haven't done that work, but we know how to do it, as we've done similar work in the past. So the Juno code base actually mostly worked; it just needed a few tweaks to the ixgbe driver in order to selectively enable or disable the anti-spoofing filter.

Okay, so the network multi-path actually required a bit more code. Typically, as I mentioned before, physical devices have multiple ports; the primary use cases are HA or segregation of networks. Very few one-arm routers or one-arm load balancers exist; most of them operate in pass-through mode. We needed to model this in the virtual world as well. So what we've done is create a concept called device groups: essentially, we allocate VFs from different PFs and allow them to be assigned to a single guest.
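To make that concrete, here is a minimal, illustrative sketch (not our actual driver code) of the per-VF programming this relies on, using the standard iproute2 `ip link ... vf ...` commands exposed through the 82599's ixgbe driver. The interface names, MAC addresses, and VLAN ID are made-up examples.

```python
"""Illustrative only: mirrors, at the host level, what the SR-IOV plumbing
does on an 82599-class NIC.  Interface names, MACs, and the VLAN ID are
made-up examples, not values from our deployment."""
import subprocess


def configure_vf(pf, vf_index, mac, vlan, spoofchk):
    """Program one virtual function: MAC address, VLAN tag, anti-spoof filter.

    The NIC strips the VLAN tag on ingress and re-adds it on egress in
    hardware, so the guest only ever sees untagged traffic on its VF.
    """
    base = ["ip", "link", "set", "dev", pf, "vf", str(vf_index)]
    subprocess.check_call(base + ["mac", mac])
    subprocess.check_call(base + ["vlan", str(vlan)])
    # Spoof checking must stay off if the guest is allowed to fail a MAC over
    # from one SR-IOV device to another (the HA case described above).
    subprocess.check_call(base + ["spoofchk", "on" if spoofchk else "off"])


if __name__ == "__main__":
    # A "device group": one VF from each of two different PFs, both handed to
    # the same guest and kept on the same tenant VLAN.
    configure_vf("enp4s0f0", 0, "fa:16:3e:00:00:01", 1042, spoofchk=False)
    configure_vf("enp4s0f1", 0, "fa:16:3e:00:00:02", 1042, spoofchk=False)
```

In a real deployment this programming is driven by the Neutron plugin and Nova's PCI assignment rather than done by hand; the sketch is just meant to show how little the hardware needs in order to do VLAN-based multi-tenancy.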
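And from the tenant's point of view, a device group just looks like a guest with more than one SR-IOV interface. As a rough sketch of that end state, assuming an openstacksdk client and hypothetical cloud, network, image, and flavor names, this is how two 'direct' (SR-IOV) ports on separate physical networks can be requested; the guarantee that the two VFs actually come from different PFs is what the device-group work is meant to provide.

```python
"""Rough sketch with openstacksdk: request a guest with two SR-IOV ports,
one per physical network.  All names here are hypothetical."""
import openstack

conn = openstack.connect(cloud="mycloud")

ports = []
for net_name in ("sriov-physnet1", "sriov-physnet2"):
    net = conn.network.find_network(net_name)
    # vnic_type 'direct' asks Neutron/Nova for an SR-IOV VF instead of a
    # regular virtio/OVS port.
    ports.append(conn.network.create_port(
        network_id=net.id,
        name="vnf-%s" % net_name,
        binding_vnic_type="direct"))

server = conn.compute.create_server(
    name="vnf-router",
    image_id=conn.compute.find_image("vnf-image").id,
    flavor_id=conn.compute.find_flavor("m1.medium").id,
    # One port per physical network; inside the guest these can then be
    # bonded for failover or used as separate inbound/outbound interfaces.
    networks=[{"port": p.id} for p in ports])
conn.compute.wait_for_server(server)
```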
There are some updates to the Nova scheduler as well. I've listed the proposed changes; they were done for Kilo, and we're going to resubmit them for Liberty after we come back from the summit. We do plan, with the community's help, to upstream this work in Liberty. One caveat: multi-path is supported, but you can't really enable LACP. Multiple LACP groups on the same physical link run into limitations of the switches; you may find an occasional switch that supports it, but most switches out there won't. So the failover modes are either active-passive or Linux bonding mode 6 (balance-alb), which plays a few ARP games; there is no LACP mode.

All right, quick performance results. This is not an extensive performance study; what we wanted to figure out is whether the work we've done is more efficient than just using OVS. So we tested both small-packet and large-packet performance. And yes, I know that MSS 64 is not a 64-byte packet; thank you, that's already been pointed out. We got a 50% throughput gain at the small packet sizes while using 40% less CPU. We were using m1.medium, a single-core instance, which means this test is largely CPU-bound on the guest. If we went to a larger guest, or a more capable test harness, which unfortunately we didn't have at the time we were preparing this, we would be able to get relatively close to wire speed on the small packets. For the normal packet sizes, we got a four-and-a-half-times improvement in throughput versus OVS while using 35% less CPU. Again, this is largely guest-bound; OVS is able to deliver roughly five gigabits of single-stream performance on a 10-gigabit interface, but at the cost of significantly more CPU. We plan to rerun these tests and publish a much more extensive blog on how to use all of this sometime in the summer.

All right, so I've discussed what we've done, but obviously there are always alternatives to the design. One of the more popular alternatives is a DPDK-enabled virtual switch. It addresses your need for more flexibility: maybe VLANs are not enough and you need VXLAN or something else, you need security groups to enable true multi-tenancy, or you need to do other transformations such as assigning DiffServ tags or otherwise manipulating packets. A software switch is inherently more flexible, which is why the software switch is the most popular option for virtualization out there. There are a couple of DPDK options, or two DPDK streams, I should say. There's a side branch that Intel has maintained up to version two, and now they're working on upstreaming the code. We've tested their initial OVS upstreaming effort, and it does work; it works pretty well, and we're working on incorporating the DPDK-enabled OVS into the Mirantis release sometime towards the end of the year. Currently, security groups are still not enabled in the Intel code that's in the mainline OVS branch, but Intel is working on it, so that should be available by the end of the year. In addition to Intel's DPDK work, which is available if you just grab the latest OVS, there is 6WIND, a company that ships a commercial OVS with DPDK support, and they have released fairly good performance benchmarks. So there are other options available today if you need something now and you're not willing to wait for the community OVS bits to stabilize.
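For reference, and purely as a sketch, this is roughly what wiring a DPDK datapath into OVS looks like with a recent OVS build (2.7 or later); the exact knobs have changed between releases and were different again in the early Intel branch we tested, and the PCI address and CPU mask are made-up examples.

```python
"""Sketch only: enable the userspace (DPDK) datapath in a recent OVS build
and attach one physical NIC to it.  PCI address and CPU mask are examples."""
import subprocess


def vsctl(*args):
    subprocess.check_call(["ovs-vsctl"] + list(args))


# Turn on DPDK support in ovs-vswitchd (takes effect when the daemon restarts).
vsctl("--no-wait", "set", "Open_vSwitch", ".", "other_config:dpdk-init=true")
# Pin the poll-mode driver threads to dedicated cores -- this is the extra
# CPU cost that comes up in the comparison below.
vsctl("set", "Open_vSwitch", ".", "other_config:pmd-cpu-mask=0x6")
# A userspace bridge with one DPDK-bound 10G port.
vsctl("add-br", "br-dpdk", "--", "set", "bridge", "br-dpdk",
      "datapath_type=netdev")
vsctl("add-port", "br-dpdk", "dpdk-p0", "--", "set", "Interface", "dpdk-p0",
      "type=dpdk", "options:dpdk-devargs=0000:04:00.0")
```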
So, a quick comparison: when would you use one versus the other? DPDK will impose a higher CPU load; it will potentially consume multiple cores just to process one or two 10-gigabit interfaces. You get a relatively higher degree of hardware independence: you still need a DPDK driver on your Linux, but lots and lots of network cards are supported, certainly more than with our SR-IOV approach. And you have greater flexibility, to use VXLAN, to apply DiffServ, to do other packet manipulations. It will impose a bit more latency, not much, but a bit more, because there is some context switching involved in getting the packet through the system. And it's typically going to be used in the backend data center, in the telco use case or in the enterprise NFV use case, where the key success criterion is flexibility.

SR-IOV has a significantly lower CPU overhead and lower latency: you're essentially talking to the PCI device directly from your VM. But it is hardware specific. Mellanox support has already been incorporated into Fuel, and the Intel 82599 support will now be available in MOS and then shortly in Fuel. Because it is hardware specific, the work has to be done against specific drivers. We've talked to QLogic, who now seems to own the Broadcom chipset; their chip seems to have similar capabilities, we just haven't gotten around to playing with it enough to get it to work. So eventually we'll get the Broadcom/QLogic chip to work as well, and we'll look at the latest Intel 40-gigabit parts and others to make sure we can get them to work in this fashion. Some of the latest 40-gigabit parts also claim to have VXLAN offload, so maybe we'll even get VXLAN encapsulation working in addition to VLAN. But the typical use case for SR-IOV is when you want to build these small PoP clouds, where you have a small, relatively tightly managed cluster and the additional CPU load of the DPDK approach is simply not acceptable.

All right, so possible future extensions. We've thought about how to combine this with a larger universe, and one of the most obvious options would be to combine it with Cisco ACI, which allows you to do VLAN-to-VXLAN translation. And then, all of a sudden, you can run this in your big backend cloud, which is pretty cool. We haven't tested those bits yet, but we plan on doing so. Obviously, Cisco ACI is not the only SDN that can provide the big-cloud expansion story: you can use it with Big Switch's OpenFlow-based forwarding as well, and it should be possible to extend either the Cumulus or the Arista plugins to provide similar VLAN-to-VXLAN transformations, allowing this to run on L3 data center fabrics.

All right, that's pretty much it for the presentation. The links are available, and I'll post the presentation on the Mirantis site, so you don't need to sit here and take snapshots. But I welcome any questions you may have. Going once, twice. All right, everybody understands everything. If you have any follow-up questions, feel free to contact me. Thank you.