Hi, I'm Gary Kotton and this is Gilad Zlotkin. I'm a staff engineer at VMware and Gilad is the VP of Cloud at Radware. We're going to be presenting a few ideas on differentiated services and how they relate to differentiated scheduling. How did we come upon this idea? At the moment Nova relates to all kinds of resources in the same way, and we'd basically like Nova to have some kind of differentiated scheduling which is triggered according to types of service or special applications. We're going to explain various ideas that are going on in the community, developments, and current support that enable one to perform some kind of differentiated services and hopefully provide an enterprise-grade cloud, so that people can run mission-critical applications on it.

So essentially the idea is, hopefully, one day to get OpenStack to be enterprise ready. What exactly does that mean? It means that OpenStack should somehow or another be able to run mission-critical applications in the public cloud; that is, applications which today are running in the private cloud, which are highly available, which have very good performance, which have guaranteed security, and which comply with existing architectures, which are multi-tier with various fault-tolerance models. Throughout our presentation we'll describe these, go into various features that Nova has, and provide some details on future developments to show how things can be improved and what can be leveraged at the moment.

In terms of availability there are several models, and we are oversimplifying them a bit. We call fault tolerance an application architecture that is able to sustain a single server failure, sometimes more than a single failure: it is resilient to any fault, so the application continues to be available and there is not even a recovery time. High availability is something where it would take the application between seconds and minutes to recover. And usually when we're talking about disaster recovery, it's within hours and sometimes days to recover. We just want to define those terms so we know what we are talking about when we look at different architectures, how we're going to support them, and how to migrate them to OpenStack.

When we talk about performance, it's usually measured in two different metrics: transaction latency, milliseconds up to seconds, and transaction load, transactions per second. So when we are trying to optimize availability we mainly talk about fault tolerance; when we try to optimize performance we actually relate to both of those metrics; and when we talk about security there are several issues, and we are not covering all of them right now: mainly data privacy, data integrity, and being able to sustain a denial-of-service attack. We will address some of those issues, and you are probably wondering how all of this relates to the Nova scheduler.
Well, it does, and we will explain exactly how scheduling policy can be triggered to address each and every service-level requirement on this board. When we talk about the availability model, there is the availability-zone architecture, which is very cloud oriented; actually not many legacy enterprise applications are using this model for availability, and certainly not for fault resiliency. We have server redundancy, which is the classic enterprise high-availability model. And of course we have both together, which I would classify as the enterprise disaster-recovery approach. We will see in detail what those architectures are.

This is the classic cloud high-availability architecture. Since you cannot really know on which host each VM is placed, you design your application to be resilient to an entire zone failure. This would be an oversimplified, typical availability model where the database is being replicated either synchronously, in a two-phase commit, or asynchronously. And actually every server failure, certainly the database but also the load balancer, every server failure is a kind of disaster-recovery failover to the backup domain. Global load balancing is typically done by DNS load balancing, so the global load balancer may be able to detect a load balancer failure and fail over automatically, but it certainly will not be able to detect, at least not immediately, a database failure or any web server failure. So the detection of the failure may take several seconds or minutes. I would classify this as high availability and not necessarily a fault-tolerant architecture.

This would be a typical fault-tolerant architecture that you will find in an enterprise mission-critical deployment. You have a load balancer and a backup load balancer. You have at least one additional web server, so even if one server is down you have sufficient capacity to support the transaction rate the application needs to provide, and you have two databases that synchronize, and this is local synchronization; in the previous model this would be cross-zone synchronization, and it actually costs money to communicate between zones in a typical public cloud. So this would be server redundancy, and it is designed to be resilient to any single server failure. That is a typical enterprise mission-critical architecture, and it is interesting to see what it means to take applications with this high-availability model and migrate them to OpenStack, and this is exactly what we are focusing on.

Another model, of course, is the combination of both. You have server resiliency in each zone, and when more than one server fails, or the connectivity to the zone fails, or the whole zone fails, then of course you can use the backup zone. I'm looking now at an active-active configuration; actually in some deployments you don't really need four databases.
It's sufficient to have one in each location. So if the database server fails, then this is a disaster-recovery event, and this can be configured to be fully fault tolerant, meaning zero recovery time, or not, depending exactly on how the database synchronization is managed.

Let's talk about other aspects of availability. With the advent of SDN, networking also becomes an issue: the logical and the transport networks also need to be highly available, and somebody needs to guarantee that they are up and available. Take for example VMware's NSX, which would be a highly available solution to manage the network traffic between the various instances, as Gilad showed. Say one of the networks goes down; NSX will ensure that the various flows are rebuilt correctly so that traffic can get to and from the various instances, or virtual machines, that are running within the cloud. In addition to guaranteeing highly available traffic, another thing which is very important is to have highly available controllers; having no single point of failure is also very important.

High availability also needs to be built into the load balancer itself. As you saw before, typically we have two load balancer instances, one active and one standby, and there is continuous synchronization of configuration and of persistent state, so a failover will result in minimal impact on incoming transactions, even zero impact. The standby load balancer actually checks the liveness of the active load balancer, and when the active load balancer is down the standby automatically takes over. This is done almost instantly, so it is fully fault resilient. The typical Radware Alteon supports that, and other load balancers support it as well. Just as an example, in OpenStack load balancing you will find an HAProxy implementation, and unfortunately HAProxy does not support automatic failover to a standby instance; for mission-critical applications you will need to deploy something like that to provide availability at the load-balancing layer. So availability is one aspect.
We will visit the other aspects of performance and security, but availability alone leads us to understand that we need to schedule an application like this, a seven-VM application, as a group rather than as independent placement procedures: to make sure the redundant VMs are not placed on the same server, and we will see an example; to make sure they are placed in close proximity to each other on the network, so there is low latency in all the internal application communication; and there might be other aspects, like security, which we will visit. So this is the group scheduling initiative. We started it more than a year ago, it's an ongoing effort, and we will share with you the state of the project, what was already done, and what we are planning to do going forward to support mission-critical applications in OpenStack.

Okay, now I'm going to provide an example of how OpenStack schedules various instances, and show how the example we've used from the start, a highly available redundant application, can be deployed in the cloud. The scheduling at the moment in OpenStack is best effort. One can make use of scheduling hints, but those are a bit laborious and cumbersome. So one could deploy the application shown in the picture and OpenStack could make a bad host selection where duplicated virtual machines are deployed on exactly the same host. What's the problem here? If that host fails, the whole application goes down, and there's a high probability of host failure in the cloud, so that's something which can certainly happen. The ideal solution to that is anti-affinity. Just taking one step back over here: the grey squares are the slots that are already utilized on the host and the white ones are free, so we see that certain instances can only be placed on certain hosts. The scheduling decision is not trivial, and it requires an overview of the whole picture of the system at the current point in time. So, as I explained, a bad scheduling decision could put the two database servers on the same host, and if that host goes down the whole application goes down.

Okay, so what we've proposed with group scheduling is to have a placement strategy of anti-affinity, which enables fault tolerance. What does that mean? No two instances of the same tier will be placed on the same host. So what we can see here is that one database will be placed on host 2 and the other database will be placed on host 3, so if one of those hosts goes down the application will still be up and running; disaster recovery then becomes another issue, how to bring those instances back to life. So in order to do that we've got a number of placement strategies. Anti-affinity is one example: again, without giving scheduler hints we would never know whether two VMs that actually back each other up end up residing on the same host, and this is the first policy. To assure performance there are other placement policies we want to take into consideration. One of them is network proximity.
We need to make sure all application VMs are close to each other on the network. As Gary said before, placement currently is best effort, and the Nova placement takes only Nova considerations into account; it doesn't really take networking considerations, and we need to combine those considerations to provide network proximity.

Another aspect is host capability. The cloud may have different types of hosts: some of them stronger, some with SSDs, some better connected to the internet, some closer to the storage, and we want to take those aspects into consideration when we are doing the placement. Maybe the database VM should be in close proximity to the storage because it has heavy I/O requirements, and we need to take those considerations into account.

Security is another big item. Here we are starting to cover what we call resource isolation, or exclusivity. You may want your VM to never share a host with a VM from another tenant: you don't trust the hypervisor to prevent other VMs from sniffing your memory and reading your data after it was decrypted. There may be many reasons for you to insist on having exclusive resources; it can be compute resources, of course, it can be network resources, and others.

Yes, of course, that would be the scheduler's responsibility to make a global consideration, and the placement request may fail; the scheduler can reply by saying: I cannot place this request, I don't have enough space, there is a conflict between the requirements. The customer will then decide whether to relax some of the constraints or to call up the administrator to solve the problem. Even today you may submit a placement request that cannot be admitted because there is no space, so it's the same case. What's the time now, actually? Okay, it's like 11:13.

Yes, I'm not exactly sure I understand the question, but basically you want them to be on two separate layer-two domains? Yes. They make use of a protocol called VRRP. Basically, it's a protocol which I believe originally came from Cisco. The question was, for load balancing, how to do failover between different layer-three domains. If I understand correctly, and I may be a bit rusty, basically over here the clients of the load balancer are turning to one IP address, and whoever is publishing that address is the owner of that IP address. So say load balancer one is the primary device; that's the one which is advertising itself, saying I'm up, everybody turn to me, and load balancer one will publish a MAC address for its IP address. The fact that load balancer one is up means it is sending notification messages to the network, which anybody monitoring VRRP is listening to. So, for example, load balancer two is listening to that multicast address and it will see that load balancer one is up; that message is sent every few seconds. When load balancer two stops receiving those messages, it will publish that it is now the owner of the virtual IP address. And as far as my understanding goes, that works within a single layer-two domain, sorry, not across layer-three domains. A much-simplified sketch of that standby takeover logic is shown below.
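To make the takeover mechanics concrete, here is a much-simplified Python sketch of the standby behaviour just described: listen for periodic advertisements and claim the virtual IP when they stop. This is an illustration only, not real VRRP; the multicast group, port, and intervals are placeholder assumptions (real VRRP uses its own IP protocol, 112, and the 224.0.0.18 group), and a production deployment would use an actual VRRP implementation such as keepalived.

    # Simplified illustration of the standby takeover logic described above;
    # NOT an implementation of VRRP. The multicast group, port and intervals
    # below are placeholder assumptions.
    import socket

    ADVERT_GROUP = "224.0.0.99"          # placeholder multicast group
    ADVERT_PORT = 5405                   # placeholder UDP port
    ADVERT_INTERVAL = 1.0                # active node advertises every second
    DEAD_INTERVAL = 3 * ADVERT_INTERVAL  # three missed adverts = active is gone

    def run_standby():
        """Listen for advertisements from the active node; take over on timeout."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", ADVERT_PORT))
        # Join the multicast group the active load balancer advertises on.
        mreq = socket.inet_aton(ADVERT_GROUP) + socket.inet_aton("0.0.0.0")
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.settimeout(DEAD_INTERVAL)
        while True:
            try:
                sock.recv(1024)          # advertisement seen: active is alive
            except socket.timeout:
                # No advertisement for DEAD_INTERVAL: claim the virtual IP.
                # In practice this is where the new owner announces the VIP's
                # MAC address to the network so traffic is redirected to it.
                print("active load balancer is silent, taking over the VIP")
                break

    if __name__ == "__main__":
        run_standby()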
Okay, so we've shown a lot of pictures and given a lot of explanations; now I'd actually like to show what exactly is implemented in OpenStack and try to describe what's missing. I'd like to say huge strides are being taken to make it a lot better, but following a few discussions from yesterday we could say a few baby steps are being made to progress things. Originally in Grizzly we added an anti-affinity hint, which was extended a little bit in Havana. Basically, when somebody boots an instance, you're able to pass a scheduling hint which associates that specific instance with a group. What does that mean? If you deploy another instance with the same group, then an anti-affinity scheduling decision takes place: the scheduler will decide on two different hosts for the placement.

At the Havana design summit in Portland in April we extended that. Originally Gilad and I had a proposal for something called VM ensembles, and that evolved into something called instance groups. Sadly, we never managed to get instance groups into the Havana cycle, but we were literally towards the end of the finish line and hopefully we'll be able to get it in soon. So I'll describe what an instance group is. Somebody is able to define an instance group, and each instance group has a policy. At the moment the supported policy is anti-affinity; additional policies we'd like to add in the future are network proximity and host capabilities, for example. In addition, each instance group will be able to display which members are currently part of it, and somebody can attach some key-value pair data, because we don't want a strict configuration and we want to be able to flow extra data with the group; that's extensible and usable within the scheduler. A small sketch of the existing hint, which you can use today, follows.
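Here is a rough sketch, using the Havana-era python-novaclient, of the group hint just described. The credentials, image and flavor identifiers, and the group name "db-group" are placeholder assumptions, and the group anti-affinity filter (GroupAntiAffinityFilter in that era) has to be enabled in the scheduler's filter list for the hint to have any effect; this is the hint that shipped, not the proposed instance-group API, which had not merged at the time.

    # Rough sketch of the Grizzly/Havana anti-affinity hint, not the proposed
    # instance-group API. Credentials, image/flavor IDs and the group name are
    # placeholder assumptions; GroupAntiAffinityFilter must be in the
    # scheduler's enabled filter list.
    from novaclient.v1_1 import client  # Havana-era client module

    nova = client.Client("admin", "secret", "demo",
                         "http://controller:5000/v2.0/")

    # Boot the two database servers with the same group name; the anti-affinity
    # filter then refuses any host that already runs a member of that group.
    for name in ("db-1", "db-2"):
        nova.servers.create(
            name=name,
            image="IMAGE_UUID",              # placeholder image ID
            flavor="FLAVOR_ID",              # placeholder flavor ID
            scheduler_hints={"group": "db-group"},
        )

Booting both database servers with the same group name is enough for the filter to keep them on separate hosts.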
As I mentioned, the instance groups sadly never really made it into Havana, but yesterday it seems like we got some kind of approval to get them into Icehouse, and hopefully we'll be able to get this in pretty soon. And just to answer the question: no, Horizon is not updated at the moment, because first of all the instance groups never got in; maybe in Icehouse, I hope, crossing my fingers.

As an example we've shown anti-affinity; now I'd like to give an example of network proximity. Say somebody is using blade servers in different racks. The initial naive placement by the scheduler could be that the whole application we've got on the left-hand side over here is placed randomly across racks and across blades within the racks. That could mean the performance of the application isn't great: if your database isn't sitting close to the web server, then access to the web pages could take a long time and the user experience might not be as good as expected. Using network proximity, and also anti-affinity, we'll be able to place the application instances as close to one another as possible, on the same blades within the same rack, and that will provide, as shown before, a highly available and performance-optimal setup.

In addition to that, as Gilad mentioned, we'd like to take host capabilities into account, for example for I/O-intensive applications, CPU-intensive applications, and network-intensive applications. To reflect these ideas there were a number of proposals at the recent summit. One was smart resource placement by Yathi and Debo; Yathi is sitting in the back over there, so he's here for that, and he's got a nice scheduling patch which enables one to do things like that. In addition, there's a guy from Intel who wants to extend host capabilities so the scheduler can take that into account.

One of the issues which has also come up a number of times is storage proximity. Essentially what we'd like is to have the Cinder volume as close as possible to the actual instance, because then access to the storage will be optimized and a lot better. This has been addressed in two places at the moment. One is scheduling across services, which sadly was a topic that wasn't really resolved at the summit; it was brought up and discussed a lot, and essentially I think everybody came to the conclusion that we're not ready for that yet. Hopefully in six months' time we'll be able to discuss it, but my gut feeling is that when we get to Europe we will still be discussing the same issue. And then Yathi and Debo's smart resource placement proposal also takes that into account.

So, resource isolation: you're probably familiar with the Amazon Web Services Virtual Private Cloud service.
It's basically a cloud service that dedicates resources specifically per tenant, and we thought we could provide a similar type of service without actually dedicating capacity up front per tenant; it can be done on a per-server-request basis. We call this resource exclusivity, and this can also be a Nova placement hint: that all VMs belonging to a certain application, or to a certain tenant, should not share resources with any other tenant. It can be storage resources, compute resources, or network resources. So instead of dedicating physical infrastructure per tenant, like VPC, we can do it on a shared-resource cloud service, but still allocate the specific cost per tenant. And actually there is some activity around that.

There was one design session that was spent on this issue. That's basically to enable private clouds, or what HP refers to as pclouds; it's something being driven by Phil Day and Andrew Laski. The idea is to take the notion of host aggregates, which today are configurable by an admin user, and expand it so that a user can also create some kind of aggregate where host allocation is controlled by the user. What does that mean? Basically a user can deploy a kind of cloud within a cloud where they'll have their own dedicated hosts, and, as Gilad mentioned, they can have security and performance and all kinds of enhancements which aren't really guaranteed in the public cloud today. A rough sketch of how today's admin-side host aggregates can approximate this is shown below.
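As a rough illustration of the direction, here is a sketch, again with the Havana-era python-novaclient, of how an administrator can approximate dedicated capacity today with host aggregates and flavor extra specs. The host name, aggregate tag, flavor sizing, and credentials are placeholder assumptions, and the AggregateInstanceExtraSpecsFilter must be enabled; unlike the pcloud proposal this is admin-only, and by itself it only steers the tagged flavor onto the dedicated hosts rather than guaranteeing full exclusivity.

    # Admin-only approximation of "resource exclusivity" using host aggregates.
    # Host name, tag, sizing and credentials are placeholder assumptions;
    # AggregateInstanceExtraSpecsFilter must be enabled in the scheduler for the
    # flavor extra spec to be honoured.
    from novaclient.v1_1 import client  # Havana-era client module

    nova = client.Client("admin", "secret", "admin",
                         "http://controller:5000/v2.0/")

    # Put the hosts reserved for one tenant into an aggregate and tag it.
    aggregate = nova.aggregates.create("acme-dedicated", None)
    nova.aggregates.add_host(aggregate, "compute-07")          # placeholder host
    nova.aggregates.set_metadata(aggregate, {"dedicated": "acme"})

    # A flavor whose extra spec only matches hosts carrying the same tag, so
    # instances booted with this flavor land only on the dedicated aggregate.
    flavor = nova.flavors.create("m1.dedicated", ram=4096, vcpus=2, disk=40)
    flavor.set_keys({"aggregate_instance_extra_specs:dedicated": "acme"})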
A few additional scheduling topics came up and were dealt with. One: there's been an effort by the guys from Mirantis, led by Boris, to improve scheduling performance; that's basically to try to lower the amount of interaction with the database and to cache information locally on the hosts. In addition, the allocation and management of host statistics is also something being discussed; that was by the guys from Intel.

Also, one of the contentious issues is how all of the scheduling data is gathered and accessed. One option is to read it directly from the hypervisor, and option two is the monitoring and metering feature in OpenStack called Ceilometer. So how are we going to gather these statistics for the scheduler, and how are we going to use them? That's something being dealt with by Paul Murray from HP. And in addition to this, there was a feature added by a few guys from IBM in the last cycle, which also sadly didn't make it, that lets somebody make use of multiple schedulers. Today the scheduling in OpenStack is global: you define in a configuration file which scheduling filters you want to run. They basically proposed that you can have scheduling policies for certain applications, with different schedulers running, so for certain applications you may want to run just, say, the RAM filter, and for others the anti-affinity filters. So they enable one to dynamically decide which filters are going to be used.

In Icehouse, what we wanted to do, and what was discussed yesterday, was to expand on the work for instance groups. This was a collaboration with Debo and Yathi from Cisco and Mike from IBM, who's sitting here, and the idea was to have a kind of tree which we could schedule at once, and to define how this would be managed and accessed in OpenStack. The problem was that the discussion seemed to focus around the fact that Heat is maybe responsible for the orchestration and the management, and the general feeling was that this is something that, at the moment, is too complex for the Nova scheduler. So we've gone back to the drawing board and will have to rethink a few of these ideas, and the consensus was that we should go back to the initial instance group support and hopefully build upon that.

So I'm just going to describe a little bit about the instance groups so that people can get a bit more of an idea, and how that can be used; hopefully within Icehouse, in the first milestone, we'll be able to get this through. Essentially each instance group will have a unique UUID, and that will be what is used to reference the instance group throughout all of the interactions with it.
It will also have a human-readable name; in OpenStack the names aren't unique, the uniqueness comes through the UUID. Each tenant will be able to own their own instance groups, and, as mentioned before, each instance group can have its set of policies, for example anti-affinity, network proximity, or host capabilities. At the moment, for each instance group one will be able to read the members that are within it. One of the changes we wanted to make was for somebody to be able to configure the members and the properties; that's pending for the next version.

So I'll describe the flow for the anti-affinity filter. Basically a user creates an instance group, and that instance group is assigned an anti-affinity policy. When the scheduling happens, somebody does a Nova boot and passes the instance group UUID as a scheduling hint to Nova. Nova selects a host which doesn't have any of the group's instances running on it, and once the host has been selected the member is added to the instance group so that it can be used for future reference. As I wrote here, there is pending support for groups of groups, which sadly was nacked yesterday, so we should have deleted that bullet item. And group members will be removed when an instance is deleted.

So, in the debate of whether a cloud should be application ready or applications should be cloud ready, we actually believe both. Most of the workloads that currently run in cloud environments are applications that were written specifically to run in the cloud, and we see a lot of opportunity in helping existing mission-critical applications move to the cloud; to enable that, the cloud needs to be application ready. And it doesn't really mean too much, just some key features in the placement: anti-affinity, proximity, resource exclusivity. What's important here as well is the aspect of differentiated service. It's not that every application needs every feature, but instead of providing the same best-effort service to all applications, the same cloud infrastructure may need to provide different services to different applications, still within the same cloud service-delivery environment. You don't need to create different silos for each service level; you can combine, you can deliver differentiated services on the same cloud infrastructure by, for example, implementing differentiated scheduling policies. And this is basically the message here. I believe best effort was a very good starting point, and it does address many types of workload, but the next step is to start implementing specific features to make OpenStack enterprise ready, and this is exactly what we are working on. Actually, you are welcome to see a demo of a load-balancing failover in OpenStack at the Radware booth; it works, you can see it in front of your eyes, and it's zero recovery time. And you are welcome to start using anti-affinity: it's part of Havana, and even though it's not accessible through Horizon, it does have full accessibility through the API. Thank you. Any questions?
Yeah, I wanted to say that the networking guys are raising the level of abstraction of the networking APIs to be more like what we're talking about here, and you raise another interesting dimension, which is prioritizing between applications, which is a useful additional dimension to add both here and in the networking. Okay, any more questions?

I wonder: you made a statement at the very end that the first attempt was pretty good, right? It gets you going, but obviously there's a lot, as you outlined, a lot of additional needs and requirements. I wonder how much of this you think fits an 80/20 rule. Of the things you discussed today, and I'm not looking for an answer, but I think we need to think about it as a group, what are the priorities? Which ones are most critical now? And do we iterate through these, instead of trying to solve all of them in one shot and waiting three or four cycles? Because if you solve a couple of the problems, that will make a big dent in the overall challenges.

So basically the order that we chose, we believe, is also the priority: starting with availability, then performance, and only at the end security. Why security last? Because I believe the hypervisor provides sufficient security to start with. And this is basically the roadmap we are pursuing. It's certainly not going to be all at once, certainly not in one OpenStack cycle, and we are taking incremental steps in providing the framework that will support all of these scenarios. So I think what is more important is the framework that supports various kinds of policies, because each of those scenarios we mentioned ultimately translates to some policy which has to be implemented and supported in the scheduling. Okay, I think we're running out of time. Yeah, okay. Thank you.