I think we're good to go. Yep. Good morning, everybody. Thank you for coming to our talk, Neutron DSCP: Policing Your Network. My name is Nate Johnston. I'm an engineer and developer at Comcast. My name's David Shaughnessy. I'm a network software engineer at Intel. Excellent. So let me tell you what we're going to be talking about. First, I'm going to tell you what DSCP is. I'm going to talk about how you use it. David and I are going to talk about how we implemented DSCP control in Neutron QoS. We'll cover what some of the next steps are. And in conclusion, we'll show you some of the resources for DSCP in case you want to do more reading about it. First, what is DSCP? DSCP is kind of a mouthful of an acronym. It stands for differentiated services code point, which doesn't really tell you much about what it is. But DSCP is a mechanism for categorizing and controlling network traffic, intended to make sure that certain kinds of traffic take precedence over others. And I'll talk more about what that really means in the use cases for DSCP. But essentially, the primary use of DSCP is to make sure that when you have too much traffic and some needs to get dropped, the traffic that you want to get dropped is dropped, and the traffic that's really important isn't. DSCP is defined by standards, RFCs, primarily RFC 2474. And I have a couple of RFCs pointed out here, but later I'll show you that there are a great number of RFCs that define DSCP and how it's managed. So what is DSCP? DSCP is a six-bit field in the IP header. There's a particular field called the DiffServ field. It's one byte, so eight bits. The top six bits of that define DSCP. So it's very concise. And it exists in both the IPv4 and IPv6 headers. So this is a representation of the header of an IPv4 packet. The DiffServ area, you'll see, is in red there. You'll note that it's really close to the beginning.
This indicates that DSCP is considered very important, because as a router is reading the packet byte by byte, the intention is for the most important things to come first, because they affect the processing earlier on. So putting it before the length means this is something the protocol designers considered extremely important. In IPv6, it's even more prominent: it's right after the IP version. So again, in red here, you can see IPv6. The source and destination IPs occupy much more space, but DSCP is still there and is still very important. So that byte I was talking about, the DS, or differentiated services, byte, is divided into two sections. The top six bits define DSCP. The bottom two define something else called explicit congestion notification (ECN), which we did not touch as part of our implementation in Neutron QoS. That's something we'll take a look at as follow-up work. And within those six bits, there's a little bit of categorization. This is really important if you're going to be using DSCP in depth, or if you're looking at it in a tcpdump: the top three bits are precedence, and then the bottom three define some slightly different behaviors: delay, throughput, and reliability. Not all of those bits are used. Reliability, I don't think, is used in any of the currently defined code points, which are the marks that actually get put into those six bits. So let's say you have DSCP defined. You've got something in those bits. What does that mean? Well, each of those code points, each of those numbers that goes into those six bits, corresponds with a per-hop behavior. And the per-hop behavior is what I was talking about before: defining what priority this traffic has in relation to other traffic, so you know whether or not it's going to be dropped, or for any other categorization purpose. And there are different kinds of per-hop behaviors that have slightly different meanings.
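To make the layout concrete, here's a quick sketch of pulling the two subfields out of the one-byte DiffServ field; the 0x68 example value is the AF31 marking that comes up again later in the talk.

```shell
# DiffServ byte = DSCP (top 6 bits) + ECN (bottom 2 bits)
tos=0x68                 # example ToS/DiffServ byte as seen in a tcpdump
dscp=$(( tos >> 2 ))     # shift off the two ECN bits -> 26 (AF31)
ecn=$(( tos & 0x3 ))     # bottom two bits -> 0 (ECN untouched)
echo "dscp=$dscp ecn=$ecn"
```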
Mostly, you can just use them as a list. But there are a few that have special connotations. For example, there's one called EF, expedited forwarding. That's a specific code point, and it means this is something that really, really has to get through; it's almost at the very top. So in the RFCs, there's some definition that some of these codes may, in certain situations, correspond to certain kinds of applications, like emergency phone calls or things like that. That's something the RFCs have suggested as a use case. So this is a chart of all of the DSCP marks, or code points. You see that the class column identifies what the DSCP mark is named. I've color-coded it for ease, and you can see we've provided some conversions. So hopefully this is a helpful chart. But basically, CS0 means no mark. If you don't use DSCP at all, that's what you get. And they get increasingly more important the further down you go, so something marked CS7 is the last thing that will ever be dropped in a congested situation. So, a couple of use cases. How is DSCP used in the real world? First off, what this was designed to do: preferential treatment under congestion. In the standard case where your links are not saturated, DSCP really doesn't matter, because all of the traffic is going to get through. It's only when you have either an artificial or an incidental limitation on your bandwidth that the network devices will start examining the DSCP marks and making decisions based on them. So in your networks, you can prioritize certain traffic. For example, let's say you have a voice application and you need to serve certain kinds of calls that are more important, say, calls to emergency services. Those might be something you could prioritize over other kinds of traffic, like games or YouTube videos, just to make sure they get through. I think that's discussed in the RFCs.
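As a rough guide to reading that chart, the assured forwarding (AF) marks follow a simple formula from RFC 2597: AFxy encodes class x and drop precedence y. A small sketch (the helper name `af` is just for illustration):

```shell
# AFxy code point value = class * 8 + drop_precedence * 2  (RFC 2597 layout)
af() { echo $(( $1 * 8 + $2 * 2 )); }
af 3 1    # AF31, the mark used in the demo later
af 4 3    # AF43
```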
Or, in a back office implementation I'm more familiar with, to make sure that certain absolutely mandatory streams that have to do with transaction data, streams that can't be dropped, always get through, even in cases where, say, there's a device outage and you have a temporary artificial limitation on your bandwidth because you don't have redundancy in the network path. The other use case, and this is very interesting, is kind of an off-book use of DSCP that arose. It's not what it was intended for, but it is extremely useful. And that is using DSCP marks as a security policy. Because you've categorized your traffic, you can instruct your router ACLs or your firewall rules to inspect that DSCP field and make firewall decisions based on that value. And this is one of the more interesting uses of DSCP. So let's say you're using a Neutron that has DSCP enabled, and you configure all of your production VMs to have a certain DSCP mark and all of your development VMs to have another. Then you can collapse your firewall rules down to just look at that DSCP mark and say, if you have this DSCP mark, then you have access to the production resources; if you have that DSCP mark, you have access to the development or QA resources. In a very large and heterogeneous environment, having that kind of ability to simplify your firewall rules really makes things easy. And especially with the elasticity that you can have in an OpenStack environment, it means you can potentially define things on the fly instead of having them defined in advance through CIDR-based network firewall categorization. So I have here an example of what the Cisco syntax would be. There is similar syntax for Juniper and Nokia firewalls and most other hardware-based traditional network firewall or packet processing devices. So, implementing DSCP in QoS: how did we do it? How does it work? First off, we tried to get this in for Mitaka.
We didn't quite make it, but it is already merged for Newton. So when Newton is delivered, you'll have this capability ready to go. The Liberty release had the first implementation of QoS, generally speaking, and that was bandwidth limiting. The DSCP implementation extends the QoS implementation to include DSCP marking as an additional item that can be part of your QoS configuration. You can have bandwidth limiting, you can have DSCP, or you can have both. It's not a one-or-the-other kind of situation. Attaching a QoS policy to a port is exactly as it has been in Liberty with bandwidth limiting. You create a QoS policy, and you can put in a description for that policy. But instead of adding a bandwidth limiting rule as you would in Liberty, you do a qos-dscp-marking-rule-create with the given DSCP mark. And then once you've added that, you can assign the QoS policy to a port. So except for the middle line, this is all identical to what you have in Liberty and Mitaka. It's just the additional DSCP capability. Here we show the interchange between the Neutron server and the Neutron agent. The reference implementation is OVS-based, so we're using the ML2 plug-in with the OVS mechanism driver. And as part of the previous QoS work, there was the L2 agent extension, so you can have an extension loaded into your L2 agent to facilitate control on the compute node. So there is DSCP-specific code that gets loaded on both the controller and the agent. So let's say you want to assign a particular DSCP mark to a particular port. You initiate a transaction to update the port. The QoS policy gets inspected, the compute node fetches the rules, the compute node indicates that the subscription to the policy has been accomplished, and the controller notifies the compute node when the policy changes.
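The command sequence just described looks roughly like this. This is a sketch using the Newton-era neutron CLI; the policy name, port ID placeholder, and DSCP value are illustrative:

```shell
# Create a QoS policy, add a DSCP marking rule to it, and attach it to a port
neutron qos-policy-create my-dscp-policy --description "Mark egress traffic AF31"
neutron qos-dscp-marking-rule-create my-dscp-policy --dscp-mark 26
neutron port-update <PORT_ID> --qos-policy my-dscp-policy
```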
So here's a little bit more about the QoS extension architecture that I mentioned before. In the controller, in the neutron-server process, you have the core API and then the QoS API extension, which talks to the ML2 plug-in and the OVS mechanism driver. And the OVS driver communicates over RPC with the QoS agent extension in the L2 agent, which then programs the OVS agent using the ovs-ofctl command line. As with all the other actions that control Open vSwitch flows, it uses that interface to OVS to control the configuration. All right. So here's a provider network with OVS. This looks similar, I'm sure, to many Neutron diagrams you've seen. But specifically, indicated here in the integration bridge, you can see exactly where the DSCP markings are applied. And this makes sure that all of the traffic coming out of the instance gets marked. So for example, let's say you have a tenant that doesn't know what they're doing, or they're doing something wrong, and they say, all right, I'm going to decide to mark my DSCP traffic this way. Well, that doesn't matter, because when their traffic comes out, the OVS integration bridge has the OVS flows that mandate the DSCP. They will erase whatever the current marking is and apply the marking that's been defined in Neutron. And then that traffic goes out through br-ex and onward. So I'm just going to run through quickly a little bit about the OpenFlow switch. The OpenFlow switch has a number of tables that you can define flow entries in. A flow entry consists of match criteria, counters, and an action list. The match criteria are what you want to match on. Counters keep track of the number of packets that go through it, the number of bytes that go through it, the time since it was created, and the idle time. And then the action list defines one or multiple actions. So you can choose to drop the packet. You can send it on normally.
Or, in this case, you can modify the type of service field, the DiffServ field, to change the DSCP value. When we made the first implementation, we didn't want to disrupt the default Neutron flow table that much, so we wanted to keep as much the same as possible. What we did was make a low-priority flow just above the default normal action at priority 0; we set it to priority 1. It would match on the port the traffic was coming in from, in this case in_port 6, and it would modify the type of service field, in this case setting it to 104. The reason for that is that when you write to this field, you have to bit-shift the DSCP value by 2 to the left, which has the effect of multiplying it by 4. The original value there was 26, and it has to be shifted so that it doesn't overwrite the ECN field, so it becomes 104. After the flow has been added, you can see in the tcpdump output that it has marked the type of service field, shown there in hex as 0x68. So that shows the rule did apply. When you're checking to make sure you've applied the DSCP value, it's important to check that the last hex digit is 0, 4, 8, or C; that means you haven't overwritten the ECN bits. Also, something to be aware of is that the lowest DSCP bit isn't normally set, so really the only valid last digits are 0 and 8. If it's 4 or C, it means you've set the lowest bit, which isn't part of any valid DSCP mark, of which there are 21, as shown before. And as we can see in Wireshark, it shows the exact same thing. You have the mark 0x68, and it actually shows you the classification as well: assured forwarding class 3, with a drop precedence of 1. So there were some challenges when we were doing this, one of which was that the L2 agent appended a cookie value onto flows, matching a UUID that was in the OVS Neutron agent. This UUID was unique to the session ID for the agent.
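As a sketch, the arithmetic behind the 26 to 104 shift, and the flow it ends up in, look like this. The bridge name, port number, and mark are the illustrative values from the slide; mod_nw_tos takes the already-shifted ToS byte:

```shell
# DSCP 26 (AF31) must be shifted left 2 bits before writing the ToS byte,
# so that the two ECN bits stay clear:
dscp=26
tos=$(( dscp << 2 ))      # 104, i.e. 0x68
printf 'mod_nw_tos value: %d (0x%x)\n' "$tos" "$tos"

# The resulting flow, as added on the integration bridge (illustrative):
#   ovs-ofctl add-flow br-int "priority=1,in_port=6,actions=mod_nw_tos:104,normal"
```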
And it was used to clean up any stale flows that were left in the flow table from, say, an unexpected restart, so it would prevent disruption. The problem with this was that extensions couldn't get access to it. It was unique, and extensions couldn't put it on their flows. So what would happen was, every time a port was updated or a firewall was updated or the agent restarted, it would delete all the flows that didn't have a cookie ID matching the session ID. The solution we had for this, to describe it in a very short way, was that we created an API that is passed to the L2 extensions and assigns each of them its own unique cookie ID. This helped, first of all, by giving extensions their own ID, which is very helpful, especially when you're trying to modify and delete flows after you've created them, for any reason you might want to. And it helped preserve the flows whenever a port or firewall was updated. More importantly, they are no longer deleted, or they are successfully recreated, when the agent restarts. So, a bit more detail on the L2 extensions API. It's initialized in the OVS Neutron agent and gets passed into the L2 extensions manager. And a consume_api method was added to the class that all L2 extensions are derived from. This wasn't an abstract method; it was a method that accepted an object and just did nothing. That way we wouldn't break any other extensions when we implemented this. The specific change we made was that just before an extension is initialized, its consume_api method is called and the API object is passed in. If an extension hadn't implemented it, the method wouldn't have been overridden, so it just dropped the API object rather than using it. So, more specifically, on what it does.
As you can see here, when the API is consumed, the extension can choose what to take from it. It could take the integration and tunnel bridges that are exposed through it; these are the same bridges shared by the OVS Neutron agent, so they have their own session IDs in there and everything. But another thing we did was create a new mix-in called the cookie bridge mix-in. What this does is generate more unique cookie IDs whenever they're requested, and add them to a list of approved IDs, which will not be deleted when a port is updated, a firewall is updated, or the agent restarts. These are usually requested in the initialize method of the extensions. Another thing it does is that when you request these bridges from the L2 extensions API, it wraps them in a class called the OVS cookie bridge. For all intents and purposes, this OVS cookie bridge is a pass-through class, where everything you do basically goes straight to the integration or tunnel bridge you've requested. What that means is that you have all the access to the integration and tunnel bridges from the extensions. What it doesn't directly pass through, though, are the add flow, delete flow, modify flow, and dump flow methods. There, if no cookie is specified, it will append the unique cookie ID of that extension to the flow. In this way, if you aren't aware that you need a unique cookie to do this, it'll do it for you. But if there is a specific cookie you want to specify, it won't overwrite that for you. And if you want to have a look at what flows exist from other extensions, you can do that by specifying criteria for all cookies to be looked at. Yes? So yeah. So another problem that came up was feature isolation.
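The underlying mechanism can be illustrated with plain ovs-ofctl: OVS lets you stamp flows with a cookie and later match or delete by cookie and mask, which is what scoping flows per extension relies on. A sketch with illustrative bridge and cookie values:

```shell
# Stamp a flow with this extension's cookie (0xd5c9 is illustrative)
ovs-ofctl add-flow br-int "cookie=0xd5c9,priority=1,in_port=6,actions=mod_nw_tos:104,normal"

# List only this extension's flows: cookie/mask, where a mask of -1 means
# every bit of the cookie must match
ovs-ofctl dump-flows br-int "cookie=0xd5c9/-1"

# Remove only this extension's flows, leaving everyone else's intact
ovs-ofctl del-flows br-int "cookie=0xd5c9/-1"
```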
Now that we've let all the extensions use the flow table, there's another problem that comes up: extensions won't be aware of what other extensions might want to do. When you're developing an extension, you might not anticipate another extension using the flow table. You'll obviously see the OVS Neutron agent's flows, and those are probably the only things you'll consider. But if, let's say, networking-sfc is on at the same time and it's forwarding packets to other places, then they won't necessarily reach the DSCP marking rule. Because, if you remember, the priority of the flow we created was 1; it's the lowest priority, to affect the rest of the flow table as little as possible, and then it forwards the packet normally. Most extensions that use the flow table will probably forward the traffic as soon as they're finished with it. And that causes a problem, because if we're later on and we're never going to get the packet, the DSCP marking rule won't work in combination with other extensions. So we tried to preemptively address this using the resubmit action. There are optional registers available in the OVS flows; they're metadata that sticks to the packet inside the switch but isn't sent with the packet. So what we decided to do was mark one of these registers, and also make that register being clear part of the match criteria for entering our flow. The effect is that when a packet hits our flow, it modifies the type of service field, but it also writes a value into the register. And then, rather than sending the packet on normally, we resubmit it to the default table, table zero. So it comes back in, and when it hits our flow again, it doesn't match the criteria anymore.
And it'll continue through table zero to the next flow it meets. If there are no other flows, it'll just hit the default normal action at priority zero. If there are, those other features will be able to work with it. Now, the other problem is what happens if there's a higher-priority flow. Other extensions might not have given the same consideration to this, so they could send the packet on and it would never get to us. And that's a problem if you're trying to use DSCP. So how we addressed this was we simply made ours the highest priority on the table. It will always hit our flow first, then it resubmits, the packet goes back into the table, and any other feature that wants it has access to it. So if you just see what we have here: this isn't the exact flow dump of what we have at the moment, but a future plan. There are plans at the moment to work on flow management a bit more with all the extensions; it's something we're planning on addressing at the summit. One of the many things we could do is define certain tables for certain extensions and ask that from these tables the packets get forwarded back to table zero, so that all the other extensions can get at them as well. As you can see here, you match register 2 equal to 0x0, meaning the packet isn't marked yet by default, and forward it to table 10. At table 10, you load the value 0x37 into bits 0 through 5 of the register, then you mark the type of service field, and you resubmit to table zero. This way, it goes back to table zero, gets processed, and whatever else needs the packet can use it. So, another problem: because this is another iteration of the QoS rules and policies, a new rule that's just been added, it raises the question of what happens if there is an agent that is older than what supports this QoS rule.
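Written out as ovs-ofctl commands, the scheme just described might look like this sketch. The table numbers, register, and loaded value follow the slide; the priority of 65535 stands in for the "highest priority" mentioned above, and the bridge name and ToS value (0x37 shifted left two bits gives 220) are illustrative:

```shell
# Table 0: any packet not yet marked (reg2 == 0) is diverted to our table
ovs-ofctl add-flow br-int "table=0,priority=65535,reg2=0,actions=resubmit(,10)"

# Table 10: record the mark in reg2 bits 0-5, set the ToS byte (DSCP << 2),
# and resubmit to table 0 so other features still see the packet
ovs-ofctl add-flow br-int \
  "table=10,actions=load:0x37->NXM_NX_REG2[0..5],mod_nw_tos:220,resubmit(,0)"
```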
So when we added this, the QoS rule type version incremented to 1.1. What happens if an agent's version is only 1.0? The server needs to know which versions are out there and how to address this. One of the mechanisms for this is versioned objects, with the RPC callbacks rolling upgrades. As you can see here, it consists mainly of three parts: the Neutron OVS agent, the RabbitMQ advanced message queue, and the Neutron server. What happens first is that the Neutron OVS agent reports the version of the QoS rule type it understands, in this case version 1.0, to the RPC resources namespace queue, and that goes to the Neutron server. The Neutron server creates a fan-out queue for QoS rule type 1.0, and that connects to the QoS rule type 1.0 objects. It fans out, so all the agents on that same version connect through that queue. So what happens when a version 1.1 agent comes along? Almost exactly the same thing. It reports its version through the resources namespace queue, the server creates the QoS rule type 1.1 fan-out queue, and all the agents on version 1.1 are connected to that queue. So I'm just going to hand back to my co-presenter. Thank you very much. Sure. OK, so the reference implementation was the beginning, but it doesn't cover everything. A few things that we didn't look at but are on the roadmap for the future. First is ingress DSCP filtering. This is something I think can be part of the Firewall as a Service project in the future, as they move forward with their version 2 spec. The idea is implementing filtering at the tenant level, so that you can say, only allow in traffic with a certain DSCP mark, essentially replicating the kind of filtering capability you have in your network devices within OpenStack. We're not going to do this in traditional security groups, because Neutron security groups are very specific.
They have certain compatibility requirements, but this is a great item for a possible future roadmap for Firewall as a Service. And then second, marking encapsulating packets with the DSCP mark of the traffic inside. So if you have a VXLAN tunnel, for example, and the traffic inside the tunnel has a certain DSCP mark, you can also mark the VXLAN encapsulation frame with that same DSCP mark. We didn't include this because we figured there probably aren't going to be many situations where you have a bandwidth-constrained situation within a cluster, because the kinds of networking you have within an OpenStack deployment typically have a lot of bandwidth. So this is a lower-priority item, but it is something we want to look at in the future, because it seems like a natural next step. So here I have some links, if you get the slides, for some of the additional future roadmap items that we have. We want to look at Neutron support for the explicit congestion notification bits, which are wholly separate in how they operate and what they do from the DSCP part, even though they occupy the same byte in the IP header. There's also some work toward more generalized Neutron traffic classification. In addition to the existing quality of service bandwidth limiting, we also have on the roadmap QoS minimum bandwidth guarantees, which involves some integration with the Nova scheduler to say, this VM will always, under all circumstances, have at least this much bandwidth. And finally, ingress bandwidth limiting: the current implementation is egress bandwidth limiting, and we want to provide the same capability for ingress. All right, concluding. So how do you use it? On the Neutron server, in neutron.conf, you engage the QoS plugin as one of the service plugins. In the ML2 plugin and the L2 agent, you note that the extension for QoS is to be loaded.
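As a sketch, those settings look something like the following; the option names follow the Newton-era QoS configuration guide, and the file paths are typical but should be verified against your deployment:

```ini
# /etc/neutron/neutron.conf
[DEFAULT]
service_plugins = router,qos

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
extension_drivers = port_security,qos

# /etc/neutron/plugins/ml2/openvswitch_agent.ini  (the L2 agent)
[agent]
extensions = qos
```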
Since DSCP is just part of the broader QoS extension, this is pretty much the same as for Mitaka and Liberty. In DevStack, again, you enable QoS; there's an enable_service for q-qos that's available for that. And here are the directives to engage it, essentially the same configuration directives that I showed you before, but in a DevStack environment. Other QoS-related resources: there's a lot out there that's helpful if you're interested in this, about DSCP and QoS in general. There was a great presentation in Tokyo about the initial QoS effort. We have documentation already in the networking guide. We have some work in progress right now to implement the DSCP controls for Heat templates. And we've linked to all of the changes that have already merged or are currently in progress here. So if you're really interested, not just in the DSCP work, but also the L2 agent extension work that was merged in Mitaka and the RPC rolling upgrades, which we described earlier and which were merged for Mitaka, you can take a look at those through these links. I mentioned earlier that DSCP is defined by RFCs; there are a lot of RFCs that talk about DSCP or differentiated services, which is the general category that DSCP falls into. And so these are the canonical references. They define every aspect of how DSCP is implemented and handled, not just what we're trying to use as our standard. They will also tell you how network devices will inspect and handle traffic that is DSCP marked, and there are patterns for how DSCP can be used and implemented for various business cases. And finally, I just want to say thank you to a number of people in the community. Margaret Francis with Comcast helped hugely in the efforts.
Miguel Ángel Ajo, Ihar, Victor Howard, James Reeves, Gary Cotton, and John Schwartz. That's not an exhaustive list, but this was a really big change to get merged, and we just want to say thank you to everybody who helped out with that. Legal mumbo jumbo, thank you. And I think we have time for one question; we have one minute left. Does this only work with ML2? It doesn't work with the monolithic Neutron, or can it be used with that? It is only implemented in ML2 for OVS at this point. Future implementations, like Linux Bridge or other mechanisms, are definitely something that we want to look at. But just to get it out the door and get the concept solid, that's the path we chose for the first implementation. All right, thank you very much.