Okay, we'd better start. My name is Armando Migliaccio; I work for HP as a Neutron core developer, and I used to be a Nova developer. Oh, I see another one is coming in, welcome.

Before we share this presentation with you, I'd like to set the context a bit. When dealing with bandwidth guarantees, or quality of service in more general terms, there are a few elements that can make this problem particularly tricky to deal with. Quality of service can be seen by many people as an overloaded term. It's mostly a networking concern, but it can apply in other contexts like compute and storage; here we'll be talking about networking specifically.

From an open source perspective, and in OpenStack in particular, when multiple parties try to come up with a common logical representation of the problem and a potential solution, it can be tricky to take that logical representation and map it onto multiple different technologies and vendor-specific solutions. This is made even more challenging when applying that logical solution to the heterogeneous cloud provider environments out there. And even when a logical model has been agreed upon, it can be hard to determine which part of that model is considered the foundation, the core of it, and which parts can be seen as extensions used to diversify a specific solution.

In OpenStack this aspect has been dealt with in the past in a number of ways, in various projects, with mixed results. In Neutron we've looked at this space before, and again there have been mixed results. What we can certainly say is that we haven't looked at it from a holistic point of view: there hasn't been a coordinated approach, which means this problem hasn't been worked on by more than one person at once. So I'm now going to hand over to Yoshio, and he's going to take a deep dive into this bandwidth guarantees experiment that HP has been involved in.

Thanks, Armando. I'm Yoshio; I used to work at HP Labs. Armando painted a picture of QoS spanning multiple dimensions: networking, storage, compute. I'm going to drill down on networking quality of service, and in particular bandwidth guarantees. There are many other aspects to network quality of service, like latency, prioritization, limits on packet losses and so forth, but I'm just going to focus on bandwidth guarantees. Even within that context it's a fairly broad topic, because there are many different ways to express bandwidth guarantees, and we're going to describe just one technique that we have some experience with.

So why would you want bandwidth guarantees? For me the main reason is to provide insulation from noisy neighbors. If you have multiple tenants sharing a network in a cloud, you want the property that if one tenant is using a lot of bandwidth, that should not disrupt the application-level performance that the other tenants receive. So really I view bandwidth guarantees as a mechanism for providing predictable application-level performance, in throughput and response times. And it's really not just a performance issue.
It can also be a reliability and uptime issue: if you have such a severe disruption of your performance that it leads to application-level timeouts, for example, this can lead to cascading failures and downtime of your service. A second reason you may want bandwidth guarantees is to provide a richer specification of service-level agreements between the cloud provider and the consumers. And finally, you may want bandwidth guarantees to provide special service for special flows, like video, for example. I'm not going to talk much more about that, although I think it is an important topic.

If we look at the work that's been done previously on cloud network performance, there's been a lot of work in the research community, and a bunch of papers are listed there. If you look at ACM SIGCOMM or the USENIX NSDI conference, for example, you see a lot of work done over the years in both academia and industry, representing a lot of smart people working very hard to come up with innovative solutions providing better fairness or bandwidth guarantees and so forth. Unfortunately, a lot of this work has kind of stopped there: at least as far as we know, it has not been put forward in a manner that is open for widespread consumption, like an open-source solution that's easily consumable by lots of people. If we look at what's happening in OpenStack, there have been some vendor plugins that provide some level of assurance for bandwidth, and recently there's an effort in Neutron to define a flexible QoS API, which looks really interesting to us.

So at HP Labs we decided to take two related efforts of ours, Gatekeeper and ElasticSwitch, and implement them on OpenStack as a proof of concept. The hope is that this work, as we share it with the community, can be used to bridge the gap between some of this research work and what's available in OpenStack and freely consumable.

Let's look a little more at Gatekeeper and ElasticSwitch. Both of them provide a very similar abstraction to the cloud tenant. The cloud tenant sees a virtual network, and from a bandwidth perspective that virtual network should behave as if all the VMs of the virtual network were attached to one big switch with links of guaranteed bandwidth capacity: any traffic pattern that such a one-big-switch could support, the virtual network should be able to support. It's really the job of Gatekeeper and ElasticSwitch to provide the performance property of this one-big-switch abstraction, even though the virtual machines are in reality distributed across a network and interconnected by potentially several switches.

To understand a little more deeply how Gatekeeper and ElasticSwitch work: here we have Gatekeeper on the top and ElasticSwitch on the bottom. The basic trade-off between them is that Gatekeeper is a simpler mechanism, while ElasticSwitch provides finer control than Gatekeeper.
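To pin down what that abstraction promises, here is a minimal illustrative sketch of the one-big-switch (hose-model) feasibility property. The function and names are ours for illustration only; they are not part of either system:

    # Hose model: traffic a VM sends or receives may not exceed its
    # port guarantee, exactly as if every VM hung off one big
    # non-blocking switch.
    # guarantees: per-VM port bandwidth in Mb/s on the virtual switch.
    # demands[(src, dst)]: requested rate between a pair of VMs.
    def feasible(guarantees, demands):
        for vm, cap in guarantees.items():
            tx = sum(r for (s, d), r in demands.items() if s == vm)
            rx = sum(r for (s, d), r in demands.items() if d == vm)
            if tx > cap or rx > cap:
                return False
        return True

    # Example: VM "a" guaranteed 1000 Mb/s cannot send 600 + 500.
    print(feasible({"a": 1000, "b": 1000, "c": 1000},
                   {("a", "b"): 600, ("a", "c"): 500}))  # False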
So let's look more deeply into how these work. Before we talk about Gatekeeper's basic strategy, let's talk about a common concern: admission control. When we place virtual machines, or instances, on a cloud network, we want to make sure they don't over-allocate the network resources they need. In the case of Gatekeeper, the only thing we check is that we're not oversubscribing the host NIC: as virtual machines land on a compute node, they allocate portions of that NIC's bandwidth, and we want to make sure the sum of those allocations doesn't exceed the NIC bandwidth. ElasticSwitch does the same thing, but in addition it considers what's happening inside the network core and tries to make sure we're not over-allocating resources there, whereas Gatekeeper assumes the network core is well provisioned and is not the bottleneck.

Okay, so now let's look at the different mechanisms. The basic strategy Gatekeeper uses is weighted packet schedulers on every compute node. In our implementation we use the Linux traffic control subsystem to do this, which has very rich mechanisms for queuing and scheduling of packets. It turns out that just using this mechanism, and configuring it right, provides that one-big-switch guarantee as long as all the traffic is congestion-controlled TCP. However, not all traffic is congestion-controlled TCP, so Gatekeeper has a fallback mechanism: it constantly monitors traffic rates and identifies non-compliant traffic. Once such traffic is identified, compute-node-to-compute-node, hypervisor-to-hypervisor signaling occurs, which causes the transmission side to apply stricter rate limits when it receives these congestion notifications.

For ElasticSwitch it's a little different. ElasticSwitch sets up all-pairs virtual-machine-to-virtual-machine tunnels. On each tunnel it monitors the traffic demand, and also the congestion experienced by that traffic in the network. These metrics are then exchanged between communicating hypervisor nodes, and that information is used to dynamically control a transmission rate limit on each tunnel. In this manner it provides that one-big-switch abstraction.
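As a concrete illustration of the kind of Linux traffic control configuration that Gatekeeper's weighted schedulers amount to, here is a minimal sketch using HTB classes, driven from Python. The device name, rates, and VM addresses are assumptions for illustration, not the actual Gatekeeper code:

    import subprocess

    def tc(*args):
        # Thin wrapper over the tc(8) command line (requires root).
        subprocess.check_call(["tc"] + list(args))

    def setup_root(dev, nic_rate="10gbit"):
        # One HTB hierarchy per NIC; unclassified traffic lands in 1:99.
        tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "htb",
           "default", "99")
        tc("class", "add", "dev", dev, "parent", "1:", "classid", "1:1",
           "htb", "rate", nic_rate, "ceil", nic_rate)
        tc("class", "add", "dev", dev, "parent", "1:1", "classid", "1:99",
           "htb", "rate", "100mbit", "ceil", nic_rate)

    def add_guarantee(dev, classid, vm_ip, minimum, maximum):
        # 'rate' is the minimum guarantee; 'ceil' the maximum limit.
        # HTB lets a class borrow up to ceil when siblings are idle,
        # which is the work-conserving behavior shown in the demo later.
        tc("class", "add", "dev", dev, "parent", "1:1", "classid", classid,
           "htb", "rate", minimum, "ceil", maximum)
        tc("filter", "add", "dev", dev, "parent", "1:", "protocol", "ip",
           "u32", "match", "ip", "src", vm_ip, "flowid", classid)

    setup_root("eth0")
    add_guarantee("eth0", "1:10", "10.0.0.5/32", "5gbit", "8gbit")  # blue VM
    add_guarantee("eth0", "1:20", "10.0.0.7/32", "2gbit", "8gbit")  # red VM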
Okay, now I'm going to hand it over to Jun Myeong; he's going to talk about the OpenStack-specific implementation.

Thank you. My name is Jun Myeong, and I'm working for HP Labs right now. I'm going to talk in more detail about how to integrate Gatekeeper with OpenStack. Because of the time constraint we won't cover the ElasticSwitch implementation on OpenStack, but we have implemented ElasticSwitch based on approaches similar to what I'm going to talk about now.

I'll start with the architectural design for integrating Gatekeeper with OpenStack. We have developed Gatekeeper against the Neutron OpenStack Kilo release as well as the Juno release. When you deploy OpenStack you can use many different deployment models; I'm going to show a simple model based on one controller and several compute nodes. For the networking part of OpenStack, we run neutron-server on the controller, and you can use different types of plugins in Neutron. For example, if you run the ML2 plugin, you can use different type drivers and mechanism drivers to support the networking part. Then there are several compute nodes, each of which includes a hypervisor and some agent modules supporting OpenStack. As Yoshio mentioned previously, Gatekeeper and ElasticSwitch run on the hypervisor, so we are currently running Gatekeeper on each compute node. Gatekeeper monitors the network status from the NIC, exchanges control messages with the other Gatekeepers, and then controls the NIC based on the given bandwidth guarantees.

To integrate Gatekeeper with OpenStack, we need some additional information for the bandwidth guarantee: we have to define minimum and maximum bandwidth for a tenant, a virtual network, or a port. But there were no APIs available for setting this information in Neutron. The Neutron QoS group is currently trying to define such QoS APIs, such as QoS policies that include minimum and maximum bandwidth information, but because no APIs were available yet, we defined our own bandwidth guarantee API extension. We also needed to develop a Gatekeeper mechanism driver to support this additional bandwidth guarantee information, and agent extensions that take the information from the control side and pass it to Gatekeeper.

For example, when a user sets the minimum and maximum bandwidth for a port or tenant, we use the port update API that already exists in Neutron, extended with two more attributes: minimum and maximum bandwidth. These values are passed to the Gatekeeper mechanism driver, which passes them to the Gatekeeper agent extension; after taking these values from the mechanism driver, the agent extension passes them to Gatekeeper, which then provides the given bandwidth guarantee.
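A minimal sketch of what such an API extension can look like in Kilo-era Neutron: the attribute names (minimum_bandwidth, maximum_bandwidth) mirror the talk, but the code is illustrative, since the actual proof-of-concept extension was not published:

    from neutron.api import extensions

    EXTENDED_ATTRIBUTES_2_0 = {
        'ports': {
            'minimum_bandwidth': {'allow_post': True, 'allow_put': True,
                                  'default': 0, 'is_visible': True},
            'maximum_bandwidth': {'allow_post': True, 'allow_put': True,
                                  'default': 0, 'is_visible': True},
        }
    }

    class Bandwidth(extensions.ExtensionDescriptor):
        """Adds min/max bandwidth attributes to Neutron ports."""

        def get_name(self):
            return "Port bandwidth guarantees"

        def get_alias(self):
            return "port-bandwidth"

        def get_description(self):
            return "Minimum/maximum bandwidth attributes on ports"

        def get_updated(self):
            return "2015-05-01T00:00:00-00:00"

        def get_extended_resources(self, version):
            # Attach the extra attributes to the core 'ports' resource.
            return EXTENDED_ATTRIBUTES_2_0 if version == "2.0" else {}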
Because Gatekeeper provides bandwidth guarantees across the cloud data center, we also applied this information to the Nova scheduler. As you can see, there is a Nova scheduler in the control node, and a nova-compute server and compute agent in each compute node. Currently only computing resources can be used in the Nova filter functions for the scheduler, such as the number of virtual CPUs, the amount of memory, and the amount of storage; there are no networking-related resources in the flavor. So we modified nova-compute to report available bandwidth information to the Nova scheduler, and we modified the filter function to support this available bandwidth information; a sketch of such a filter follows after the examples below. This available bandwidth is a useful piece of information we can use for scheduling.

Okay, I'll talk through one more example, first without the bandwidth guarantee. We assume a 10 Gb/s NIC, and we have two tenants, blue and red; we launch two virtual machines on one compute node, one for the blue and one for the red tenant. First we generate one TCP flow in the blue tenant, then 16 UDP flows in the red tenant, so there's a bottleneck at this compute node. Here's the result without the guarantee: if there is no traffic in the red tenant, the blue tenant can get more than 8 Gb/s as its maximum throughput. But if there is UDP traffic in the red tenant, it consumes all of the bandwidth. TCP suffers from these UDP flows, so we cannot guarantee the bandwidth for the blue tenant. Without a mechanism for bandwidth guarantees, we cannot provide this in OpenStack.

Now here's a result based on our Gatekeeper implementation on OpenStack. We set 8 Gb/s as the minimum bandwidth for the blue tenant and 2 Gb/s as the minimum for the red tenant. As you can see in the pie chart, the two tenants, blue and red, share the bandwidth according to the given guarantees. And because there are five compute nodes, Gatekeeper assigns the available bandwidth equally across the compute nodes.

Here are more examples with multiple tenants. We set the same minimum and maximum rate for all of them, up to five tenants, with mixed TCP and UDP flows in each tenant. I want to point out the work-conserving behavior here: when there is traffic in only one tenant, even though we set a small minimum bandwidth, that tenant can get more throughput if there are no competing tenants. And Gatekeeper shares all the bandwidth evenly across the different tenants. Even though we show only five tenants here, we have already tested with several hundred tenants, and Gatekeeper assigns the bandwidth evenly based on the given minimum and maximum bandwidth guarantees.
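To make the scheduler change concrete, here is a minimal sketch of a bandwidth-aware host filter in the Kilo-era filter API. The stats keys and the scheduler hint are hypothetical names for what the modified nova-compute would report; the actual patch was not published:

    from nova.scheduler import filters

    class BandwidthFilter(filters.BaseHostFilter):
        """Reject hosts whose NIC cannot cover the requested minimum."""

        def host_passes(self, host_state, filter_properties):
            hints = filter_properties.get('scheduler_hints') or {}
            wanted = int(hints.get('min_bandwidth_mbps', 0))
            if not wanted:
                return True  # no bandwidth guarantee requested
            stats = host_state.stats or {}
            free = (int(stats.get('bandwidth_capacity_mbps', 0)) -
                    int(stats.get('bandwidth_allocated_mbps', 0)))
            return free >= wanted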
Okay, next Mario will show more practical and interesting demonstrations instead of these static results. Over to you, Mario.

Thank you, Jun Myeong. Hello, everyone. My name is Mario, as Jun Myeong said, and I will be sharing with you a video demo of our implementation of Gatekeeper. So that hopefully nothing goes wrong, this is recorded, so we should be fine.

Basically, in this scenario we again have two tenants, blue and red. The blue tenant has two VMs, the red tenant has three VMs, and all five VMs are spread across four different compute nodes. Each compute node is connected to the network through a 10 Gb NIC, and the bandwidth available between the compute nodes in the network is 10 Gb as well. Here we're fast-forwarding the video a little to show you the automatic creation of these five VMs on these four compute nodes, and on the right we're showing a Horizon view of the VMs being created automatically. We decided to create them using a Python script; we could have used Heat or something similar, but we just decided to use Python in this case.

Next I'm going to drag to the left three terminals, which correspond to the consoles of the three VMs on the left: the blue tenant's VM 1 and the red tenant's VMs 3 and 5. The figure at the bottom of the screen will show the throughput as seen by the two VMs on the right side of the screen: VM 2 for the blue tenant and VM 4 for the red tenant.

Once this is set up, we start with an experiment where there are no bandwidth guarantees, and we have a scenario where the blue tenant sends a single TCP connection from VM 1 to VM 2. As we expect, the TCP connection is able to get all the available bandwidth, since there's no competing traffic in this scenario. Next we repeat the same scenario, but we start injecting UDP traffic from the red tenant, specifically from VMs 3 and 5 towards VM 4 on the right side of the screen. What we see is that the moment the UDP traffic starts flowing on the network, just like Jun Myeong showed before, UDP basically hogs the entire available bandwidth and TCP basically dies. This is no surprise; it's exactly what we expect given that we have no bandwidth guarantees, right?
In this case, when the UDP flows die, we see that TCP is able to recover and take all the available bandwidth to itself until the flow is completed.

Next we show the same scenario, but first activating our Gatekeeper component in our DevStack configuration. In this case we set the minimum rate guarantees and maximum rate limits to the same values for each of the VMs of the two tenants: the blue tenant's VMs get a minimum and maximum of 5 Gb, and the red tenant's a minimum and maximum of 2 Gb. Again we do a very fast forward of the video to show how we use Horizon, which we extended so that it exposes the new attributes we attached to ports in Neutron when they are created, to set the minimum and maximum for each of the ports attached to the VMs of the different tenants. So in this particular case, again, the minimum and maximum are set to the same values: 5 Gb for the blue tenant and 2 Gb for the red tenant.

Next we just repeat the same experiment, and what we see is that once the TCP flow starts, it is quickly limited to the 5 Gb guarantee that was set. When we start the UDP flows, they are likewise limited to the 2 Gb guarantee that was provided. More importantly, the UDP flows no longer interfere with the TCP traffic, and once the UDP flows die, the TCP flow continues happily until it finishes, achieving the 5 Gb throughput, the minimum guarantee, that we assigned to it.

Finally, we show a last experiment, where we set the maximum rate limits greater than the minimum rate guarantees, for all of the VMs of the blue and red tenants: the blue VMs get a min and max of 5 and 8 Gb respectively, and the red VMs 2 and 8 Gb. By setting the maximum greater than the minimum, a tenant is able to use more bandwidth than its minimum guaranteed bandwidth, as long as there is enough available, unused bandwidth in the network. Again we do our magic and set up the ports to the specified values, 2 and 8 and 5 and 8 respectively, and then we repeat the same experiment. What we see next is that when the TCP flow starts, it is able to achieve more than its 5 Gb guarantee, but it is limited by the 8 Gb max rate limit that we set for it. Next we start the UDP flows, and we see that UDP as well is able to achieve more than its minimum of 2 Gb but less than its 8 Gb maximum. And when the UDP flows die, the TCP flow goes up again and uses the available bandwidth up to the maximum that was set for it.
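The same port attributes can also be set without Horizon. Here is a hypothetical client-side equivalent of what the demo configures (5/8 Gb for blue, 2/8 Gb for red), assuming the custom port attributes sketched earlier, with placeholder credentials and port IDs:

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='<password>',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    def set_guarantee(port_id, min_mbps, max_mbps):
        # minimum_bandwidth/maximum_bandwidth are the PoC's custom
        # attributes, not part of upstream Neutron.
        neutron.update_port(port_id, {'port': {
            'minimum_bandwidth': min_mbps,
            'maximum_bandwidth': max_mbps,
        }})

    set_guarantee('<blue-port-uuid>', 5000, 8000)  # blue: 5 Gb min, 8 Gb max
    set_guarantee('<red-port-uuid>', 2000, 8000)   # red: 2 Gb min, 8 Gb max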
So in this scenario, where the maximum is greater than the minimum, we are showing a work-conserving strategy: we are able to better utilize the bandwidth in the network, with the caveat that we are sacrificing some determinism and predictability on the network. So I'm going to hand it back to Yoshio.

Okay, I just want to talk about a few challenges that we see for getting this stuff into OpenStack, some of the things that we encountered. Maybe the main difficulty was the cross-project dependency that Jun Myeong alluded to for the admission control, where we basically had to modify the notion of the host resource and extend the Nova scheduler a little. That's the kind of dependency we'd like to get away from in the longer term. It looks like there's an effort to move the scheduler out of Nova, and that seems like a better fix in the longer term. But in the shorter term, we might be able to sidestep the issue and eliminate this dependency with a strategy of proportional resource allocation: if you allocate network bandwidth in proportion to some other resource the Nova scheduler knows about, like memory, and you set the proportion right, then you can ensure you always run out of memory before you run out of bandwidth, and so you eliminate that dependency. But it's a bit of a trick.
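A small numeric sketch of that proportional-allocation trick, with all numbers assumed: tie each flavor's bandwidth to its memory so that the scheduler's existing RAM accounting implies bandwidth admission control.

    # Host: 10 Gb/s NIC, 256 GB RAM, 10% NIC headroom kept back.
    NIC_MBPS = 10000
    HOST_RAM_MB = 256 * 1024
    MBPS_PER_MB = (NIC_MBPS * 0.9) / HOST_RAM_MB  # ~0.034 Mb/s per MB

    def implied_bandwidth_mbps(flavor_ram_mb):
        # If every flavor's guarantee is derived this way, RAM is
        # exhausted before the NIC is, so no bandwidth-aware scheduler
        # filter is required.
        return flavor_ram_mb * MBPS_PER_MB

    # e.g. a 16 GB flavor implies ~562 Mb/s of guaranteed bandwidth.
    print(implied_bandwidth_mbps(16 * 1024))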
The other issue we had is that both Gatekeeper and ElasticSwitch work by implementing a Linux kernel module; in the case of Gatekeeper this is mostly for fine-grained monitoring of the flows. That's kind of a non-starter for a contribution to OpenStack: first we'd have to contribute to Linux and get that accepted before OpenStack could consume it, and that's really not going to go very far. So what we're doing now is looking at an interim solution that would eliminate the kernel module and leverage existing features in Open vSwitch to get an almost-as-good solution.

Looking forward, we're hoping that this work can be useful for the community, and we hope that over time the code can be contributed and be useful for people. We'd like to work with the Neutron QoS group and the community to really make bandwidth guarantees a reality for everyone. We've been thinking about how we might stage the contribution, how to contribute it in bite-sized chunks that would be easily consumable by the OpenStack community. One possible staging is shown here: you could take the static rate limiters, which Gatekeeper uses to control TCP traffic and which are really just configuration of Linux traffic control, and use that as an initial contribution. Then, going forward, add the congestion signaling mechanism between the hypervisors; again, this does not involve any interaction with a controller. And then finally, over time, we could implement more precise monitoring in the Linux kernel, hopefully get that accepted, and then pull it back into OpenStack.

Some other things we encountered along the way are shown on the bottom here. One issue is that bandwidth in megabits per second is not the only thing you need to think about, even just for throughput guarantees: packet rate, the number of packets per second you're processing, is also important. Especially for UDP traffic, the amount of CPU you're using to process every packet can, if the packet rates are high enough, become the bottleneck before your bits per second does. And of course a lot of services are really concerned about latency more than bandwidth, so we need a solution that balances these two concerns. We currently rely on hypervisor-level support for the signaling, so over the longer term we're going to need solutions that can work for bare-metal hosts as well. And finally, we've adopted the one-big-switch performance model, called the hose model; there are other models in the literature that might be interesting to look at over the longer term, and of course there are things like per-priority and per-flow guarantees and so forth. Okay, so I'd like to hand it back to Armando to close it out.

Thanks, Yoshio. Before we open up to Q&A, I want to thank everyone for sticking around. So if you have any energy left and would like to challenge us with questions, please go ahead. Here are a couple of pointers along those lines: in the Liberty time frame this problem space will be looked at with lots of interest and we'll try to make progress, so you'll see a couple of pointers where you can stay abreast of developments. And in the next couple of days there will be a couple of sessions that may be of interest to you. I'm not sure whether the rooms will be big enough to fit everyone, but there are other ways to follow along associated with those rooms, and those are the tools that can be used to stay up to date. So, there's a person at the mic.

I was just wondering, maybe I missed it: how exactly were you doing enforcement in this?

In Gatekeeper there are two mechanisms. One mechanism is packet schedulers at the endpoints, at the compute nodes; that is sufficient for a lot of TCP traffic. For non-congestion-controlled traffic, we send feedback messages from one hypervisor node to another, agent to agent, and when such feedback is received, the receiving agent imposes harsher rate limits on the transmission side.

And where are the packet schedulers sitting? Are they at the hypervisor level?

They're in the Linux kernel, in the host, in the compute node host.

Oh, in the compute node. Okay.

I have one question regarding the min/max guarantee and the work-conserving aspect of your product. Let's say for TCP traffic I set a minimum bandwidth of two gigs, and that flow is currently not sending anything, and there is competing UDP traffic that could go up to, let's say, ten gigs. Do you allow that?
Yeah, we allow that, and the way we get around that is to make sure we never allocate the full NIC bandwidth: we allocate, say, 90% of it, so there is always headroom. When we're above that headroom, we become suspicious that some flows are not getting their minimum guarantee, and we send feedback in that case kind of proactively. That's how we avoid having to do rate limiting and feedback in the network itself; we can do it on the host.

So my question was just clarifying: would the UDP traffic that is currently running get the full 10 Gb?

It would not get the full 10 Gb, but it can get up into this headroom area, and then it would get rate limited.

Okay. Okay, thanks.

First of all, this is a really cool demo and presentation, so thank you so much. I also have a couple of questions that need clarification. The first one is: does this solution really work if, for example, my VM has north-south traffic? In other words, the traffic does not go to another VM but instead goes somewhere out to the internet. In this case, does the solution really work?

Yeah, for north-south traffic this is a concern. If you wanted to adopt these kinds of mechanisms, you'd basically have to have a Gatekeeper proxy or something sitting at the edge.

Yeah, I see. Okay, another thing, from an architecture perspective: I saw in your diagram there's a mechanism driver, the Gatekeeper driver, right? And the Gatekeeper driver, if I remember correctly, talks to the OVS agent, there is an extension there, and from there you send instructions to the Gatekeeper on the compute node. Is there a reason why you have to go through the OVS agent? Why not have your driver communicate with the Gatekeeper on the compute node directly?

Actually, in this case there is no reason to depend on OVS at all; we just decided to do it that way for this particular PoC. We could just as easily create our own agent on the compute nodes that talks directly to the daemons to set the speeds. You're absolutely right.

Okay, so in other words the OVS agent is not actually involved, it's not required for any purpose, right? Okay, awesome. I think that's all my questions, thank you so much.

Hi. My question is very relevant to the first question of the previous speaker; it's about bandwidth guarantees when the back-end network is over-provisioned. Can Gatekeeper have agents deployed on top-of-rack switches or aggregation layers, where they report back the usage? Is this a model you're going after?

No. All Gatekeeper does lives on the compute nodes; there is really no support in the network fabric itself. For ElasticSwitch, we've done experiments where we take advantage of ECN capability in the network to get a better indication of the congestion that the individual tunnels are experiencing.

Okay, so going forward, if you want to include that support, would having appropriate modules deployed across the fabric be the path?

Yeah, that can only make it better, more accurate, because right now we're kind of inferring things based on end-to-end behavior; if you got explicit signals from the middle of the fabric, then you could do a better job.

Okay, and is this on your roadmap or something later?
It might be on HP's roadmap; I don't know.

Okay, got it. Sure, thanks. One question regarding scalability: could you touch on the protocol that you're using for Gatekeeper? On what network do you run this protocol, and how scalable can it be if you have a very large number of compute nodes?

Yeah, so we do a few tricks to gain scalability. Basically you're asking how we send the feedback in a scalable way, right? We have this notion of the severity of the feedback. If you send feedback and you find that you're not getting the response you expected, maybe because there are too many senders and it would take a long time to send to all of them, then what we do is start bumping up the severity that we send back to the individual transmitters. What the severity does is cause them to recover their original rates more slowly than they would otherwise. It's kind of like TCP behavior, if you're familiar with additive increase, multiplicative decrease: they go through the same multiplicative decrease, but with increasing severity the additive-increase part becomes much slower. So over time, no matter how many senders you have, eventually the severity is so high that in the aggregate they recover kind of like a single flow would.

Is it based on UDP, or something else?

I actually don't remember the exact format of the messages that are sent between the compute nodes and intercepted when they are received on the other side.

And what network are you using for carrying this traffic?

It's a 10 Gb Ethernet.

So is it the same interface as the data path for the VMs, or a different one?

Yeah, it's in-band. The packet has to be of a format that we can detect and intercept as a congestion feedback notification before it would be sent to the VM, but it's basically just going through the same data path that normal traffic on a virtual network goes through. So whatever it is, VXLAN or VLAN or whatever, it traverses the same path, and at the destination it gets intercepted.

Thank you.

It's a comment more than a question... I'm sorry, I'm told we're out of time, so let's take it offline. Okay, we'll have the comment. All right, last question.

Oh yeah, so for the implementation you could have used OpenFlow, where you could use different tables for the QoS settings, manage the bandwidth according to that, and actually write the flows accordingly. Did you consider doing it like that? What's the limitation?

I think that's an implementation detail. OpenFlow is basically a protocol, right? But at the end of the day, down in the Linux kernel, you need a real mechanism that provides the enforcement. We're using our own protocol, but the mechanism at the bottom is probably the same. We could swap out the protocol with OpenFlow; I'm not sure I see a big advantage in doing that.

Thank you.
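A minimal sketch of the severity-scaled AIMD recovery described in that answer; the class name and constants are illustrative assumptions, not the actual Gatekeeper code:

    class TunnelLimiter:
        """Per-destination rate limiter with severity-scaled recovery."""

        def __init__(self, guarantee_mbps, nic_mbps):
            self.rate = float(nic_mbps)        # current transmit limit
            self.guarantee = float(guarantee_mbps)
            self.severity = 0

        def on_feedback(self, severity):
            # Multiplicative decrease on congestion feedback, but never
            # below the sender's minimum guarantee.
            self.severity = severity
            self.rate = max(self.guarantee, self.rate / 2.0)

        def on_interval(self, step_mbps=100.0):
            # Additive increase; higher severity slows recovery, so many
            # senders in aggregate recover roughly like a single flow.
            self.rate += step_mbps / (1 + self.severity)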