Hello everyone, I'm Ivar Lazzaro, software engineer from Cisco. I'm Miguel from Red Hat, and I'm a software engineer at Red Hat. I'm Justin Pettit, I work on Open vSwitch at VMware. And I'm Thomas Graf, I work for Cisco as well, and I work on OVS and the Linux kernel. Today we're going to talk to you about Neutron and how to improve data path performance with an Open vSwitch solution for security groups. So at a very high level, this discussion will go through a problem statement and a possible solution to this problem. And more importantly, we're going to show you the performance results, which will compare our solution against the initial problem.

So let's start from the beginning. Whenever you create a virtual machine today in OpenStack using the Neutron ML2 Open vSwitch driver, your compute node looks more or less like this. And you can clearly see that there is something wrong with this. More specifically, what you have is a couple of Open vSwitch bridges: one is for the uplink and the other one is the integration bridge. And when you trigger the VM creation, Nova is going to create a veth pair; one side of this pair will be attached to the integration bridge, and the other side will be attached to one Linux bridge per VM. So you can see there is an extra layer of indirection here. And on top of all this, there's going to be a tap interface, or even an extra veth depending on what you're attaching to these bridges, and they will connect the VM to the Linux bridge. So for each VM you create, you're going to create four to five network devices (there's a sketch of this wiring below). And definitely when you see this, you may ask why you are doing it. And you may think that the answer is this one, but in reality the answer is security groups.

So security groups in Neutron are a way to provide port-level security to Neutron users, which means that the filtering rules you specify with security groups are brought as close as possible to the virtual machine itself. And this architecture is used so that you can push your iptables rules into these intermediate Linux bridges, and OpenFlow rules into the integration bridge for tenant isolation. And then you may ask: why not do all this with Open vSwitch and OpenFlow rules? Well, the reason is that at the time this was implemented in the Neutron community, there were not enough tools in Open vSwitch to build a solution at feature parity with what iptables offers. But you will learn more about this throughout the presentation.

So how can we set things up properly in this architecture? One possible solution, for instance, could be to remove that extra layer of bridges, so that when you create your VMs, or even containers, you can use just one single device to attach them to the integration bridge, using OpenFlow rules to implement security groups with the new tools that Open vSwitch can offer us. But before going further, let's go through some background with Justin.

So first I wanted to just go over quickly what Open vSwitch is. I think most people are probably familiar, so we'll do this fairly quickly. But Open vSwitch is a virtual switch that works at multiple layers of the OSI model, and it has extensive programming capabilities. To program how you want to treat flow behavior, you can use the OpenFlow protocol, and we support all versions of OpenFlow, although not all the features are necessarily supported, plus a number of vendor extensions. And actually, the vendor extensions are one of the things that we'll be talking about today to enable the connection tracking.
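To make those four or five devices concrete, here is a hedged sketch of the wiring the hybrid plug strategy sets up for each VM; the qbr/qvb/qvo/tap names follow Neutron's usual convention, and "1234" stands in for the real per-port ID:

    # Hypothetical per-port suffix "1234"; Nova/libvirt normally create the tap.
    ip tuntap add tap1234 mode tap                   # the VM's tap device
    ip link add qvb1234 type veth peer name qvo1234  # the extra veth pair
    brctl addbr qbr1234                              # per-VM Linux bridge, where iptables hooks in
    brctl addif qbr1234 tap1234
    brctl addif qbr1234 qvb1234
    ovs-vsctl add-port br-int qvo1234                # OVS end attached to the integration bridge

That is one tap, one Linux bridge, and two veth endpoints per VM, on top of the OVS port itself: the indirection this talk proposes to remove.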
Probably the biggest use case for OVS is supporting tunnels, so that we can create overlay networks. And in addition to using OpenFlow to configure the flow table, we have the OVSDB protocol, which can be used to remotely manage other aspects, for example creating bridges and tunnels. And then it has extensive monitoring capabilities as well.

And I wanted to mention a new project, OVN, for Open Virtual Network, or we call it "oven". So this is a new project that we've created that will provide virtual networking for OVS. And it's made by the same team that made OVS, and it will work on a number of different platforms. So it will work on Linux, Hyper-V, and we've designed it from the beginning to work with containers as well. And so the point of OVN is to create virtual networks, as I mentioned. We'll have support for logical switches and routers, ACLs, and software- and hardware-based logical-to-physical gateways. But of most interest to this group is that, from the beginning, we'll be using this new connection-tracking-based model to implement security groups, which will have much better performance and security than what's currently available. And it works of course with OpenStack, as well as other CMSes.

So to implement a firewall in OVS... and this is why we ended up with that model that was shown before, where we had to go through iptables... OpenFlow is really designed to work with just stateless matches. So you can match on what's in a particular field of a packet, and maybe some metadata. But that doesn't work for something like a firewall, where you need state. So to implement a firewall in OVS, in the past there were basically two ways you could do it. You could implement it by matching on the TCP flags. So you could, for example, enforce your policy on the SYN packets and then allow all ACK and RST packets through. And this works pretty well because you can create megaflows, which are a way of doing wildcarding in the kernel. But the problem is that it's not really a great security solution, because you're letting all ACKs and RSTs through; you're not just allowing connections that had been allowed previously. Another option, which is a little bit more secure, is that OVS supports this learn action, which allows you to insert new flows into an existing OpenFlow table. And the way this works is that when a packet comes in that you want to allow, you look at the ephemeral port, so for example the TCP source port, and then you create a reverse flow that will allow the return traffic through. And this works a lot better, because you're only allowing previously established connections in. But the problem is that this is really slow, because we can't do things like megaflows. We can't do wildcarding in the kernel, so every new flow has to go to user space. And this can affect performance by orders of magnitude for flow setup. And neither of these works with things like allowing FTP data connections through, and they don't do things like enforce the TCP window. (There are sketches of both approaches below.)

So Linux has this feature called conntrack. It's a module within the kernel, and it's actually what iptables uses as well. And its job is fairly simple. All it needs to do is... well, it keeps track of all of the connection entries. It's a very specific job. It doesn't enforce policies. It just says: is this a new connection? Or is this a previously established connection? Or is this a related connection? So, as I mentioned with the FTP example, maybe it's the FTP data channel.
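Here are hedged ovs-ofctl sketches of the two stateless approaches just described; the br-int bridge name and the allow-TCP-port-80 policy are assumptions for illustration, not the actual Neutron rules:

    # Approach 1: enforce policy only on SYN packets, pass anything with ACK or RST.
    # Megaflow-friendly and fast, but any packet with ACK set slips through.
    ovs-ofctl add-flow br-int "priority=100,tcp,tcp_flags=+syn-ack,tp_dst=80,actions=normal"
    ovs-ofctl add-flow br-int "priority=100,tcp,tcp_flags=+ack,actions=normal"

    # Approach 2: when an allowed packet is seen, learn() a reverse flow matching
    # this exact connection (addresses and ports swapped). Safer, but the learned
    # flows cannot be wildcarded, so every new connection goes to user space.
    ovs-ofctl add-flow br-int "priority=100,tcp,tp_dst=80,actions=learn(table=0,priority=110,idle_timeout=30,eth_type=0x800,nw_proto=6,NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],NXM_OF_TCP_SRC[]=NXM_OF_TCP_DST[],NXM_OF_TCP_DST[]=NXM_OF_TCP_SRC[],output:NXM_OF_IN_PORT[]),normal"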
And so what we can do with OVS is... we can add the state of the connection to the metadata of the packet that we're looking at. And this kind of allows us to do stateless matching on something that's stateful. And this is how the proposed solution works: there will be a new action to send the packet to this connection tracking module, and then we will kick the packet back to OVS to have another look at it, this time with the connection status bits. And I don't think I need to preach why it's good to do things at the edge, but you get much better performance doing this at the edge: you don't have a central choke point, and you have better visibility, because you're doing the detection closer to the source of the traffic, as opposed to further away, so you have more context.

And we actually have a working prototype of this. This will be going into OVS 2.4. The code's actually been ready for quite a while, but we've been working with the Linux kernel community to get this upstreamed. Because while we have control of the OVS user space parts, the kernel is actually maintained by the kernel community. And so while we can make suggestions about what's included, it's not ultimately our choice, but we think we're making some progress there.

So here's just a diagram of what the flow of the traffic will look like. Here in step one, the traffic will come in and hit this OVS flow table. And the OVS flow table will say: okay, we want to enforce a security policy here, but we need to know what the state of the connection is. So then, through this conntrack action, we will send it to the netfilter connection tracker. And that has a hash table over here that maintains the state of all of the connections. And so now the packet will go back to the OVS flow table, and this time it will have the connection state set, so we'll know if it's a new connection or an established connection. So we kick the packet back to the OVS flow table, now with those connection status bits set, and now we can enforce the policy and say: if it's in the new state and we don't want to allow the traffic from that source, then we can enforce that policy. And one of the conntrack features that we use is this idea of multiple zones. And this allows us to have overlapping addresses, which you need in an overlay network. So if you have two IP addresses that are the same, they won't conflict with each other.

Okay, so I'm going to explain how we put this all together for the Neutron security groups. The work is based on the original proposal by Amir Sadoughi. I don't know if he's around; if you're around, thank you. And you have the code for the proof of concept available at the second URL; that's a Gerrit review. Basically, we implement the firewall driver interface that is used by the OVS agent of Neutron. We have methods to update security group rules and members, and we have methods to manipulate and create firewall rules for new ports. And this module basically translates all those methods and calls into OpenFlow rules with the special connection tracking rules. We actually made the OpenFlow rules at parity with the current state of iptables filtering. We have all the anti-spoofing rules for port security, and we also have source MAC filtering, which is not yet available with iptables. And now I'm going to show a little example, without going into too much detail, with two VMs talking to each other over a typical HTTP connection.
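Before the walkthrough, here is a hedged sketch of what this conntrack integration looks like as OpenFlow rules, using the ct() action and ct_state bits discussed above; the table numbers, the zone, and the port-80 policy are illustrative assumptions:

    # Untracked IP traffic: run it through the connection tracker in zone 1,
    # then recirculate into table 1 with the ct_state bits filled in.
    ovs-ofctl add-flow br-int "table=0,priority=100,ip,ct_state=-trk,actions=ct(zone=1,table=1)"

    # New connections matching the policy (TCP port 80 here): commit the entry
    # so the reply direction will be recognized, then forward normally.
    ovs-ofctl add-flow br-int "table=1,priority=100,ct_state=+trk+new,tcp,tp_dst=80,actions=ct(commit,zone=1),normal"

    # Established or related (e.g. FTP data) connections: just forward.
    ovs-ofctl add-flow br-int "table=1,priority=100,ip,ct_state=+trk+est,actions=normal"
    ovs-ofctl add-flow br-int "table=1,priority=100,ip,ct_state=+trk+rel,actions=normal"

    # Anything else in table 1 is dropped.
    ovs-ofctl add-flow br-int "table=1,priority=1,actions=drop"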
VM one is in a security group where all the outgoing traffic is allowed, and VM two is in a security group where only port 80 is allowed. So we have a shortcut for ARP traffic, with some filters to avoid spoofing, so they can learn each other's IP addresses. And here's where I won't go into too much detail; you can see it on the slide later. But basically, the first packet going out from one VM to the other will have to match the specific rule that is allowing the traffic, and we put that connection in the connection tracking table. And then this packet goes to the second VM. In this case, everything is on the same host, so it will have to match the incoming rule that allows the destination port, and then it is sent to VM two. So the return traffic, the outgoing traffic that would not normally be allowed for VM two, because it doesn't have an output filter allowing anything out, will go through the... okay, wrong slide. But basically, it will hit these rules saying that any traffic that is tracked with an established or related state will be sent on to its destination port. It will act as a normal switch, so everything goes pretty fast. Thanks, Miguel.

We've seen that we have a possible way to simplify how the bridge model looks on the compute node, and we also saw that we'll be able to solve all of this with OVS, and later on OVN. But none of that would matter if the performance numbers aren't right. And I think this is what most people came here for: how are the numbers? Fortunately, they're looking pretty good. But before we go into the numbers, I want to explain what exactly we measure, because comparing numbers is very difficult, and you have to know what kind of setup we're talking about for the numbers I'm about to show you.

So basically we have two compute nodes, and we're running a netperf and corresponding netserver pair on the compute nodes. The compute nodes are Ivy Bridge 24-core machines, two-socket machines with 3.5 gigahertz CPUs. We're running RHEL 7 with a 3.10 kernel, and we have eliminated the virtualization overhead, and I'm going to explain why. What we really want to measure is the performance of the actual packet processing and security group enforcement, and not the path in and out of the VM. So the numbers here are netperf and netserver running bare metal on the actual compute node. This would be what you would see running a container, not necessarily an actual VM, on the compute node. You will not see these numbers if you run VMs; you will have additional virtualization overhead, in particular if you're using virtio in software mode.

We're always showing two different kinds of numbers. Local, meaning the packet goes over the loopback: this would be one container talking to another container on the same compute node, or one VM to another VM on the same compute node. And multi-node, which is across the network, where we have a 10-gig NIC. We're doing TCP stream tests for various packet sizes, and we're doing TCP request tests, measuring the number of requests per second. So the stream test is measuring bandwidth, or throughput, and the requests per second are measuring latency: the lower the latency, the more requests per second you can get through. All right, enough explanation, let's dig into the numbers. On all the slides, you'll always see a dark blue and a light blue. The dark blue is the existing enforcement of security groups through iptables, and the light blue is the pure OVS solution.
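For reference, the stream and requests-per-second measurements described above map onto stock netperf invocations along these lines; the peer address and the message and transaction sizes are assumptions for the example:

    # Throughput: a 60-second TCP stream with a given message size
    # (the talk sweeps sizes from 64 bytes up to 64K).
    netperf -H 192.0.2.10 -t TCP_STREAM -l 60 -- -m 512

    # Latency: TCP request/response, reported as transactions per second.
    # A connection-per-request variant, closer to the description on the
    # later slide, would use -t TCP_CRR instead.
    netperf -H 192.0.2.10 -t TCP_RR -l 60 -- -r 64,64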
The bar is, in this case, representing the bandwidth in megabits, and the line is representing the number of CPU cycles spent. So for the bar, higher is better, and for the line, lower is better, because we are spending fewer cycles per megabit. And what we can see here is that OVS is doing better at all packet sizes, from 64-byte minimal packets all the way to 64K jumbo frames doing GSO. We're seeing 22 megabit... gigabit, sorry, 22 gigabit, 22,000 megabits, on a local TCP stream. So OVS is doing considerably better. And on the left side, for the smaller sizes, you can see another essential piece, which is that OVS is doing better while at the same time consuming fewer CPU cycles. Consuming fewer CPU cycles means those cycles are available for your actual workloads; we're not wasting them on packet processing.

So that was local, with a single netperf thread: one core sending packets, one netperf thread sending packets. Obviously, if you have multiple containers, you would have multiple cores sending, and that is what this next slide is representing. For 16 netperf threads sending locally, we're seeing OVS doing extremely well for large packets, but we're also seeing something that we're still looking into, which is this piece here: for certain packet sizes, the OVS solution is doing worse. We're looking into this, and we actually noticed that on a different system we didn't see this glitch. This is definitely something that we need to look into before we can get this merged. It is the only occasion we found where OVS was not doing better.

So that was local traffic again. Let's go to multi-node. This is going over the network, over a 10-gig NIC, and you can see we're hitting line rate at 512-byte packet sizes. Interestingly, both the iptables-based solution and the OVS solution achieve the same performance there, but the OVS solution does it in fewer cycles. And for smaller packet sizes, that's the first two bars, OVS is doing better while also consuming fewer cycles per megabit.

TCP requests: that's a TCP request being sent to the server and sent back, then the connection is closed, and then the next request is sent. So this is measuring latency. And we can see in this case, whether it's 64 bytes or 64K, OVS is doing better for all of them, and consuming considerably fewer CPU cycles as well. This is how TCP requests scale, using one core, four cores, eight cores, 16 cores. OVS is doing better at all of them. And in this case we can see something very interesting: as we scale up, the iptables-based solution is always doing about the same degree worse, until 16 cores. Remember, this is a 24-core machine, so we have 16 cores sending and 16 cores, or 16 tasks, receiving, meaning we have a total of 32 netperf threads sending and receiving. And for the iptables case, there seems to be some kind of contention issue, because the performance is actually worse with 16 cores, or 16 tasks, than with eight. You can see this clearly: iptables is exploding at 16 cores, whereas OVS keeps scaling up and stays fairly even.

Multi-node: this is sending TCP requests over the NIC. Obviously the NIC puts an upper cap here. You can see OVS is doing better for all packet sizes, and you can see that the larger the packet size gets, the more CPU cycles we need to spend, obviously, because there is more data to send. Same request numbers, but by number of cores, so how does it scale? Again, we can see OVS is scaling nicely.
OVS uses almost the same number of cycles per request, regardless of the number of cores utilized, whereas for the iptables solution, the number of cycles consumed increases the more cores we use for packet sending. The OVS-based solution scales much better.

Conclusion: we have seen that both throughput and latency are considerably better than with the iptables solution, except for that single occasion that I showed on the second performance slide, and we are still investigating that. We're fairly convinced that it's either a bug or something specific to that system. We'll definitely verify and explain that and try to find the root cause before we attempt an upstream merge. When limited by the NIC, OVS is consuming fewer CPU cycles, which is highly interesting for any kind of compute workload.

I'm going to cover the next steps on the kernel side, and then I'll hand it over to Miguel for the other ones. As Justin already mentioned, this work will be available in the 2.4 release of OVS. There is kernel support that is needed for this feature to work; it's the integration of connection tracking with the OVS kernel module, and we are working with the upstream communities to get that included. We're going through the standard process and working on the feedback that we are receiving. We're fairly convinced that we'll get it merged in the next couple of weeks. At that point we will, I think, finally release OVS 2.4, and we can go full steam ahead and actually do the OpenStack piece as well. Miguel, do you want to cover the OpenStack next steps?

On the Neutron side of OpenStack, what we have now is just a proof of concept, so we have a few gaps to cover. We need functional tests to verify that all the security group cases and corner cases work as we want. Currently we are only handling IPv4 traffic, and we have to cover the gap for IPv6. We also have a few gaps when we switch port rules: right now we delete the old rules and set the new ones, so there are a few instants when you don't have security on your port. That's not good, but it's just a matter of refining it. And also we need, of course, to propose this to the Neutron upstream community formally: we'll write down a specification, and we'll all decide how we are going to do it. And even though there is a proposal, as has been done for all the other drivers, to move the reference implementation out of tree, it's still valuable for experimentation, or, yeah, to have another working solution. Also, of course, this will have to be integrated into the OVN solution to have port security. Hopefully this will be available for Liberty.

And as a last note, we have a Vagrant configuration: if you want to try this on your own compute node, on your own VM locally, you can use this Vagrant definition to deploy an RDO Juno all-in-one with the backport of these patches and try it locally. It will compile the latest OVS with connection tracking and make everything available to you in the dashboard. And there's a link to the main OVS project. The connection tracking code currently is in my private repo, but hopefully, once that gets upstreamed in the kernel, we'll merge it into the main OVS repo, into master. And then this was a presentation that Thomas and I did at the OVS conference back in November, and it has additional data, a little more detail about how connection tracking works, and also some additional performance numbers.
And with that, I think we have just a couple of minutes for questions, if there are any.

I should repeat the question. The question is whether, when we use this implementation, we only need to switch the security group firewall driver. You only need to do that. There are a few changes to the platform to properly communicate to Nova that we don't need the hybrid VIF driver, but that's all so far. To integrate with that, also keep in mind that there may be parts in the OVS agent where you may need to change the priority of rules, because now that everything is living in the integration bridge, some priorities may need to change. So there may be small changes in the agents as well. And of course we need compatibility with the rules, and the rules will be completely compatible. Of course, we will need the new version of OVS with connection tracking, otherwise it won't work.

So the question was: since OVS is stateless, where are the iptables rules stored? There are no iptables rules. What we're doing is writing them into the OpenFlow flow table. So you would write a flow which matches and enforces based on the IP address and the port. Which, the bottom slide? Yes, there are more details about how the connection tracker works in that. Sorry? That is a presentation. Oh yeah, it's online too. If you go to the OVS webpage, you can find links to the videos of the conference.

We do analytics on all of the firewalls, all the security events in our enterprise, both in the cloud and not in the cloud. So we took the ML2 OVS driver that writes the iptables rules and hacked it up a little bit so that we get iptables logs of all the drops and all the accepts. Any thought on how you could generate those security event logs in your solution, when you don't use iptables? I think some of those stats will be available through the connection tracker itself, right, Thomas? And then I think... I don't want stats, I actually want log events for all the drops and the accepts. You're talking about auditing, right? If you drop a packet, you want to have an audit. If I drop a packet, I want a log event. That currently does not... through the audit system, that does not exist for OVS. But it is definitely something that we could have. There's an audit system in the kernel, used by SELinux for example, and if you drop a packet there's an audit iptables target that will log the event that a packet was dropped in the audit subsystem. That's definitely something that we could implement in OVS, through an audit action for example. It currently does not exist, though. Thank you.

Migration: are you guys looking into it, from iptables to the new implementation? During our benchmarks we of course had to switch back and forth between both solutions. Online or offline migration: offline, you switch off all the VMs on a compute node, you switch the driver, you restart the OVS agent, and you start the VMs again; that works. But if you want online, that's... there may be solutions with live migration, so you could place this new solution on an empty host, for instance, and then move the VMs over from another one, and do it again with live migrations, but this is not something we dug into. More questions?

So with live migration, does the connection tracker support that? Is there any code you have to add, or does it just work?
Well, since we use the connection tracker from the kernel, there are the conntrack utilities that ship with Linux as part of that, and so you could imagine something like conntrackd. But I'm not sure of all the implications; there would probably have to be some OpenStack integration to make sure that all of the mappings happen correctly. If you wanted to do that, we could deploy conntrackd between the original node and the new node and... Yeah, I figured you could do that; I was just wondering if you've done the work yet, or anything to do with the timing and so on. So basically we're at the stage where we're presenting the numbers and we're looking for feedback on whether we should continue with this work, and... if there is an official driver, then migration is the next step, obviously. Okay, great.

Hi. If you're using kernel conntrack, do you know if it is a separate database to the one maintained by iptables? Because in the past I've had problems where the connection tracker has seen the packet twice. So for example, if there's a reset, the state will be torn down, and then something else, say for example at the moment the Linux bridge iptables, sees it again and rejects the packet. So is the conntrack database kept by this separate to the one that iptables keeps? Basically, you can have both. As Justin explained, there are zones, and if you put them into separate zones, you can have a zone for your iptables and one for OVS. But if you put them into the same zone, then the OVS connection tracking entries can be the same and are seen by iptables as well. So you could have, for example, the state created by OVS, and you can match on that state from iptables. Basically, in the next patch there are some details to fix, but every different tenant network goes to a different zone, so overlapping IPs are not going to be a problem, and also connections on the host will be out of any zone, so they are independent connection tracking tables. Great, thank you.

Possibly as a slight follow-up to the last question: are there any performance characteristics due to the size of the conntrack table with this work, or not? Absolutely, but since the iptables solution and OVS use the same conntrack technology, they will have the same issues, I would say. Thank you.

For the upstream kernel work, is that going to be available as a kernel module that will work on what we are deploying today, CentOS 7.0, or are we going back to the bad old days where you have to install custom kernels? Do you want to take it? Should I? So basically, it is the same as for any new kernel feature: you need a backport into your distribution kernel. And given the attention that is being paid to this feature at this point, I think all the distributions will be happy to backport it for you and make it available in your favorite distribution kernel.
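Picking up the zones point from that answer, here is a hedged sketch of per-tenant separation; mapping tenant networks to zones by VLAN tag, and the specific numbers, are illustrative assumptions rather than the actual patch:

    # Tenant A's network (assumed VLAN 101) is tracked in conntrack zone 101...
    ovs-ofctl add-flow br-int "table=0,priority=100,ip,dl_vlan=101,ct_state=-trk,actions=ct(zone=101,table=1)"
    # ...and tenant B's (assumed VLAN 102) in zone 102, so identical,
    # overlapping IP addresses never collide in the tracker.
    ovs-ofctl add-flow br-int "table=0,priority=100,ip,dl_vlan=102,ct_state=-trk,actions=ct(zone=102,table=1)"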
The question is that currently, in the Neutron reference implementation with iptables, there is an issue when you switch the rules of a security group: existing connections may keep flowing because they are in the connection table. So we are leveraging the zones of connection tracking, using the same strategy that iptables is going to have, which is still under development. That means that when you change a security group with new rules, basically, if you take out rules, it is going to reset the connection tracking table for that zone. That of course will break other connections, but it is more secure than leaving connections flowing. It is still not perfect, because... I think that in some cases it is going to break some connections. But it is really a matter of how the rules are written. The connection tracker does not actually have policy in it; it just has the state of the connections. So if you write an OVS rule that says allow all established connections, then they would continue to flow. If you wrote it a little more specifically than that, so it reflects the new policy, so it's established and it matches the policy, then the new policy would be enforced.

So the question was how this will work on other platforms that are not Linux-based. I think the plan is that there will probably be an effort to create a connection tracker that works on other platforms, like DPDK and Hyper-V. Okay, that's good, thank you.