Alright, thanks, guys. Thanks for joining. My name is Jeff Collins, I'm from Ericsson, and I'm joined today by Simon Horman from Netronome. We're going to talk to you a little bit about Layer 3 tunneling support in OVS: what we've been doing, what we're working on, and where we're at. We're going to start out by taking a look at where OVS is and where it's going, and what kind of requirements and use cases we're looking at. We'll then go into the implementation details; Simon's going to walk us through how we're doing it. We'll then do a brief demo, take a look at some performance figures, and cover the changes we're going to see coming up soon.

So to start out: OVS has primarily, typically, been a Layer 2 type device, right? Usually we just need to get some kind of Layer 2 service provided into the VMs, which works great and solves most of the use cases that are typically out there today. And if you needed any kind of additional Layer 3 functionality, that was typically done on the Linux OS itself: IP forwarding, iptables, things like that. When it came to Layer 2 forwarding, there were typically two different types of implementations we looked at: one where we provided straight VLANs up to the VMs through the Linux kernel, and then we started out with VXLAN tunnels. But now we need a bit more capability, more Layer 3 type functionality, within OVS itself.

When you take a look at how OVS is being used today, it's pretty much become the de facto standard when it comes to virtual bridging. It's used by a lot of the SDN controllers out there today: OpenDaylight with NetVirt, which is one that Ericsson supports and does a lot of development in, OVN, Nuage, and a lot of others. So it's a pretty critical piece of functionality for connecting your VMs up into the network. Now, what's important here is that OVS is really becoming the place where all these new technologies get implemented. Layer 3 forwarding is being done in OVS now, NAT, firewalls, VPN as a service, things like that. It really gives us a nice opportunity to define how we're going to accelerate OVS and how we bring in these different tunneling technologies, through things like DPDK and hardware offloading, to address some of the use cases that are coming up.

Now, the first use case we focused on is L3VPN, doing MPLS over GRE. That was our primary use case, and it's now fully developed and complete. MPLS over UDP is pretty near the end of completion and should be contributed back upstream soon. The other use cases we're working towards: VPN as a service, being able to take IPsec and provide that into the VM itself; LISP is another one being driven, as is IP over GRE. One of the more popular ones right now is SFC, the Service Function Chaining project, with NSH; that one's getting a lot of attention in terms of how we implement it in OVS. And then VXLAN-GPE, this is a really interesting one. It's a new one that's picking up traction, because not only do we need the flexibility to use whatever tunnel we want, we also need the flexibility and the diversity to carry whatever we want inside the tunnel as well.
Some of the protocols we can do that with: GRE, VXLAN-GPE, which we just mentioned, and Geneve. We need to be very flexible and versatile with the tunnel itself, as well as with what's going on inside the tunnel; we need to provide integration between the tunnel and the OVS pipeline, for both tunnels internal and external to OVS; and then we need to offload that and support both Layer 2 and Layer 3 functionality, in software as well as in the hardware itself. Now it comes to the implementation, and Simon's going to walk us through some of the details of how we're doing that.

Yeah, so I wanted to talk a little bit today about some work that's been going on in this area. This is mainly focused on MPLS over GRE, but it also solves some of the important problems that would be needed to make the other protocols work. The scope of the work so far is fairly limited, I suppose. We're looking at both data paths, the Linux kernel data path and the user space data path, so that's both with and without DPDK. But on the encapsulation side, we're just looking at GRE, and this enables us to do MPLS over GRE.

An important thing I'd like to point out before I go into a bit more detail: from my point of view, OVS is really a mechanism. It allows you to do lots of different things, but it doesn't really enforce policy, because that's pushed up to the higher level, so your OpenFlow rules and so on express the policy. And you can implement policy using the mechanism that might be out of spec, if you want to.

I'm going to start this section by giving a little bit of background about some key concepts, basically the parts of the code we had to touch in order to implement Layer 3 tunneling. The first of these is the tunnel vports. The thing about tunnel vports, which is maybe a little bit different from the non-tunnel variety, is that the encapsulation and decapsulation is done in the tunnel vport, and this is out of scope of OpenFlow or the rules themselves. It's like a black box, if you like. There are a few different varieties of these tunnel vports. There's the version that goes into the kernel data path, and these are backed by normal net devs. Here, on receive, the net dev decapsulates the packet and then seeds some metadata which describes where it came from, describing the former outer headers. On transmit it's the opposite: some metadata is supplied describing what the outer encapsulation header should look like, and then the net dev performs the encapsulation based on that information.

With the user space data path, things look a little different. Here we arrange things such that the ingress and egress of the encapsulated frames actually occur on a different bridge from the decapsulated frames. Now, the kernel data path can also support this configuration, but for user space mode it's mandatory. On this separate bridge we have separate rules which match the ingress and egress packets and perform push and pop tunnel actions accordingly. And this is the part that's quite different from the kernel: there are no push and pop tunnel actions in the kernel.
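As a rough sketch of that two-bridge arrangement for the user space data path, something like the following could be used. This is a hedged illustration only: the bridge and interface names (br-int, br-phy, eth1) and the addresses are assumptions for the example, not taken from the talk.

```
# Integration bridge carrying the decapsulated traffic (userspace datapath)
ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
ovs-vsctl add-port br-int gre0 -- set interface gre0 type=gre \
    options:remote_ip=172.31.1.2

# Separate bridge where the encapsulated frames ingress and egress via the NIC
ovs-vsctl add-br br-phy -- set bridge br-phy datapath_type=netdev
ovs-vsctl add-port br-phy eth1

# The local tunnel endpoint address lives on br-phy so the encapsulated
# packets can be routed and ARP resolved
ip addr add 172.31.1.1/24 dev br-phy
ip link set br-phy up
```

With a layout like this, the rules on br-phy see the encapsulated frames, while br-int only ever sees the inner packets.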
But like the kernel, it's based on metadata: after decapsulation, the decapsulated packet is accompanied by metadata which describes what the tunnel header used to look like, and on transmit it's the opposite case.

So there are three areas of the code we had to touch, broadly speaking, to implement this. One is distinguishing between Layer 2 and Layer 3 vports. The second is two new actions, push and pop Ethernet. And lastly, some attributes to be able to recognize whether a flow is L2 or L3.

I'll do the vports first. There were a few different iterations of this on the implementation side, but the approach that seems to be winning is to simply have a mode for the vport. In this model the default mode is Layer 2, which is the behavior of all vports up to now, and then we add a new mode, which is Layer 3.

Then we have the push and pop Ethernet actions. Conceptually this is quite simple: push Ethernet basically takes a packet and adds an Ethernet header, and pop Ethernet does the opposite. It's not very clear how VLANs should be handled in such a case, so for now that's been deemed out of scope, although earlier versions of the code tried to handle it. MPLS is neatly left alone for now, which is nice, because MPLS is often a headache. These push and pop Ethernet actions are not exposed up at the OpenFlow level; they're internal actions managed by the OVS user space at this time. So you can see them when you dump the flows, but you can't add or remove them as such using rules.

And lastly, the attributes. We'd like to be able to distinguish between a packet which is Layer 2 and one which is Layer 3, and this is basically done through the absence of the Ethernet attribute, where the Ethernet attribute refers to the source and destination MAC addresses. The Ethertype, however, is present regardless, and you might be thinking: how can you have an Ethertype if it's L3? Well, this is because encapsulation formats that allow carrying L3 packets usually also allow describing what the packet is; otherwise you can't interpret it, you don't know what it is. In the case of GRE, which is the focus of this work, there's the protocol type in the GRE header. I've included a picture of the GRE header here. So this allows us to pretend, if you like, that the packet is Layer 2 even though it's not.

OVS is aware of which vports in the system are running in Layer 2 mode or Layer 3 mode. It's also, obviously, always aware of which port a packet arrived on. So it knows whether a packet is L2 or L3 on the input side, and it knows on the output side whether the vport is L2 or L3. From this simple amount of knowledge it's able to determine when it needs to do a push Ethernet or a pop Ethernet: if it has an L2 packet and it's putting it out to an L3 vport, at that point it needs to pop the Ethernet header off, and conversely. I didn't draw all the possible combinations here, but these are maybe the two more interesting ones: we go from Layer 3, push an Ethernet header on, and output to a Layer 2 vport, and the converse.
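Since these push and pop Ethernet actions are visible when dumping flows but can't be programmed via OpenFlow, one way to observe them is a datapath flow dump. The action names follow the patch set's naming and the sample output below is hypothetical, not captured output:

```
# Dump the installed datapath flows; the internally-generated Ethernet
# actions show up here even though they can't be added via OpenFlow.
ovs-appctl dpctl/dump-flows

# Hypothetical examples of what such flows might look like:
#   L3 packet from a tunnel vport, pushed out to an L2 port:
#     ... actions:push_eth(src=aa:bb:cc:dd:ee:01,dst=aa:bb:cc:dd:ee:02),output:2
#   L2 packet headed out an L3 tunnel vport:
#     ... actions:pop_eth,set(tunnel(...)),output:3
```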
Now, vports are a little bit of a confusing topic, for me at least. There are sort of three different things inside OVS which are called vports: the non-data-path portion of user space has a notion of vports, and the kernel data path and the user space data path both have their own implementations of vports, and none of these three things are the same, at least from an implementation point of view.

With the modifications being done for Layer 3, the vports in user space have this new layer 3 flag, and this means that data path vports of the same type, for example GRE, may actually be running in two different modes at the same time. This flag only exists in the user space, non-data-path code, not in the data path code itself.

To handle this on the kernel data path, some switching around occurred a couple of kernel versions ago. The ip_gre type was usually used for L3 GRE and the gretap type for L2 GRE; that's been reworked so that ip_gre can now handle both, although OVS will be the first user of this and there are no others in the pipeline. This basically facilitates allowing the same data path vport to pass packets both with and without MAC headers in the encapsulated packet. The reason this approach was taken, rather than another one I implemented, which was to have separate vports, is that there was a concern there could be an explosion in the number of different types of vports. If we consider the GRE case, with L2 and L3 we've just doubled it; but what if another new feature came along and suddenly we had to double it again to four? With this approach it stays nicely contained.

For the user space vports, the way it was handled there is that these are more flow-based, so I added a new flow attribute, the next base layer. Essentially, when a packet is decapsulated on the bridge which handles decapsulation, the resulting flow will have this new attribute present, and when the packet arrives on the next bridge in the chain, that bridge can read it and determine what to do. I have to say this is probably the weakest part of the design, but it does work.

As far as configuration goes, things are very simple. The first four lines, which is the complex part, are what you need regardless of whether you're doing Layer 2 or Layer 3, and then with just the part in bold, the very last line, you flip the vport from Layer 2 to Layer 3 by setting it to true. That's a nice part of this design that's come out of it.

As for possible future work, Jeff covered most of this already, but the hard part is teaching OVS how to deal with Layer 3 packets. Once that's done, we can support many different combinations: MPLS over IP is one, MPLS over UDP, NSH, which relates to service function chaining, and so on. LISP is kind of interesting on this list, because the motivation for the original work in this area was actually to enable LISP to be supported in the kernel data path, where it's not supported. That's not included in the current patch sets, only because it's not the focus, but the infrastructure that's there should afford reaching that goal quite easily, I believe.
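Going back to that configuration slide for a moment, a hedged reconstruction of it looks something like this. The addresses are illustrative, and the exact option name follows the patch set as described in the talk, so treat it as an assumption:

```
# The first four lines are what you need regardless of Layer 2 or Layer 3
ovs-vsctl add-br br0
ovs-vsctl add-port br0 gre0 -- \
    set interface gre0 type=gre \
    options:remote_ip=10.0.0.2 \
    options:layer3=true   # this last line alone flips the vport to Layer 3
```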
And I'd just like to close out by giving some credits, because I've by no means done all the work in this area. Lorand Jakab did the original work, and Thomas Morin also did some work in this area. Jiri Benc has done a lot of work here: he posted the updates to the kernel data path, sorry, the ip_gre changes I mentioned earlier, changing the tunneling implementation in the kernel to allow this to be supported, and he's actually now taken over trying to get the data path changes merged. Now it's demo time. Over to Jeff.

Thank you. So OVS 2.5, that's the version we're working on right now. In terms of products supported, Ericsson's Cloud SDN switch, based on OVS 2.5, is available today; we're going to take a look at some performance numbers on that coming up. Agilio version 2.2 is the hardware acceleration and offload card, and that's what we're going to be demoing here in a second. The SDN controller that's actually programming the flow entries into the Netronome SmartNIC is OpenDaylight Beryllium. Then, when it comes to the setup of the DC gateway, there aren't really any hard requirements there. It works well with pretty much any kind of data center gateway you can think of, Cisco ASR9K, Juniper MX routers; there's no real preference, it's up to you how you want to implement that.

For the demo setup, we've got two compute blades, each one representing its own data center, its own cloud instance. Flows are pushed down through two different SDN controllers, again running Beryllium. Then we have MPLS over GRE connecting the two computes and carrying the traffic back and forth; they're connected over 40-gig Netronome Agilio SmartNICs. Here's just a picture. I'm going to switch over to the demo now. It's a really quick demo, especially since we're just going to see traffic flowing; this is going to ramp up here in a second. On each of these computes we're running one VM, and each VM has four VNICs attached to it. We're running a 128-byte packet size in this example, with 10,000 flow entries on each one. What we can see here is that it's capping out right around 20 million packets per second; that's where it's leveling off. I'm just going to stop that and switch back over.

Throughput comparison: what are we looking at in the performance numbers? Running across the bottom, that's Ericsson's Cloud SDN switch, again OVS 2.5. The numbers you're looking at here are with a single core and a 64-byte packet size, again doing MPLS over GRE. It's not just a basic VLAN or VXLAN; it has one more match and swap to do, so it's a fairly complex flow. Between 1,000 and 10,000 flow entries we're seeing a little over 2.4 million packets per second; that's with one core dedicated to packet processing, running on OVS. Across the middle, sitting around 11 to 12 million packets per second, is running with the Agilio virtio relay. This is the case where a VM doesn't have a native driver built into it: we can intercept the packets and still take advantage of the SmartNIC hardware offloading. The most impressive, which is the one we were just looking at in the demo, is the very top line at almost 20 million packets per second; that's using the full hardware offload acceleration with the Netronome Agilio SmartNIC.
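Before drilling further into the SmartNIC numbers, it may help to sketch why these MPLS-over-GRE flows involve that extra match and swap compared to plain VLAN or VXLAN forwarding. The rules below are a hedged illustration only; the port names, label values, and addresses are hypothetical, not the demo's actual rules:

```
# VM-to-network direction: push an MPLS label onto the packet, then output
# to the L3 GRE tunnel vport, which adds the GRE/IP encapsulation itself.
ovs-ofctl add-flow br0 "in_port=vm1,ip,nw_dst=192.0.2.10 \
    actions=push_mpls:0x8847,set_field:100->mpls_label,output:gre0"

# Network-to-VM direction: match the label arriving from the tunnel,
# pop it back to IPv4, and deliver to the VM.
ovs-ofctl add-flow br0 "in_port=gre0,mpls,mpls_label=200 \
    actions=pop_mpls:0x0800,output:vm1"
```

Each flow therefore carries MPLS label handling on top of the tunnel encapsulation, which is the extra per-packet work reflected in the comparison.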
Focusing in on the hardware offload case, looking at just the numbers for the 40-gig SmartNIC and comparing them to the theoretical maximum: we're able to hit line rate up to about a 500-byte packet size, and below that we pretty much stay around 20 million packets per second.

So in summary, we took a look at OVS. We've pretty well seen that it's become the de facto standard, and everything is moving towards being integrated into OVS, both the Layer 3 functionality and the Layer 2 functionality that's been there for quite a while. It's been well accepted across the majority of the vendor SDN controllers out there today. And as we do this implementation, as we work on providing the Layer 3 support, we're making sure we have the flexibility there in the tunnel support within OVS itself. Our latest enhancements are targeted to be available in OVS 2.7 and, hopefully, Linux kernel 4.10. The patches are already integrated into the Netronome Agilio SmartNICs as well as the Ericsson Cloud SDN switch that are available today. And we took a look at the performance numbers, where with hardware offloading we were able to get around 19 to 20 million packets per second, and with the Cloud SDN switch, without hardware offloading, around 2.4 million packets per second on a DPDK-enabled data path. Okay? All right. Any questions, guys? Oh, hang on, I think we have a mic here.

The question is: is this going to be a new plugin, or is it native to OVS, or some kind of new Neutron plugin? So for the OVS portion, the aim is to have it integrated into the main code. Any other questions? You guys are going to keep it easy? Okay, there we go.

Any plan to support L2TPv3, Layer 2 tunneling? Sorry, one more time? L2TPv3. Not at this time, not yet anyway.

You mentioned hardware offloading; which mechanism in particular, SR-IOV or something else, is being used for the 20 million packets per second? That one's SR-IOV.

Up front? Just to understand the changes that were made to enable Layer 3 support. When it comes to Linux kernel changes, those are not always needed: the Linux 4.10 kernel is not needed if you're using the OVS user space data path with, say, DPDK acceleration. So in that case there are no Linux kernel changes. Any other questions? Thanks, guys. Thank you very much. Thank you.