So, actually a lot of the material that Deepak went over in his talk will be repeated a little bit, but this is more of a zoom-in, a closer focus on part of the things that he talked about. I'll go slowly over some of the things, and I'll focus only on this part, which is autonomic management of resources in clouds. What I'll try to do is go over the scenario again; if people are new to something, then, speaking as a professor, I think it's always good to repeat things. So we'll go over the scenario again, with the problems, the challenges, and the opportunities. I will do a little bit of review of academic work; there is so much work that an exhaustive survey in 25 or 30 minutes was not possible, so I thought I'd just give you a little glimpse. And I'll conclude with what right now are mostly questions in my mind. I don't have solutions; there is a lot of work, but I don't know whether all the problems are solved. Since there are industry people here, I thought it was a good opportunity to pose aloud some questions that keep coming to my mind when my colleagues here and I are engaged in research work: is this a correct assumption, is this what the industry really wants, is this something we can assume we have? So I'll conclude with those questions. So what is the cloud? We actually learnt a lot about it already in the last half hour; I'll just repeat some of it. Allow me to represent the cloud provider by this cloud, with a lot of hardware resources, lots of machines and servers; that's the cloud provider, the entity that owns all of the infrastructure. Then you have cloud users, or software-as-a-service providers, which for the moment let's just call websites.
There are small websites that don't want to own their own infrastructure; they can get that infrastructure from the cloud. And then there are end users. So this is showing that the ellipses of each color belong to a particular website. This is just making the point that each website owner could be getting resources from any physical machine, and for example sports.com's owners will not know exactly where the resources are coming from; they're just buying this thing from the cloud, so the resources can be anywhere. That's the basic idea. And then you have the end users, the actual users of the websites: the people who are surfing the internet and clicking on various links, which are going to end up getting serviced in the cloud. So these are the three entities or actors in the cloud, and we want to figure out what it is that each actor wants; all of this is a setup so that we understand what the real technical problems are. Actually everybody wants a lot of things; for this talk we will focus only on performance, because the spectrum of wants and needs is infinite. So, from the point of view of performance, as a user of a website you really want low response time. You want to click on a link and instantly get a page. What about the cloud users? These entities, the website providers, are called the cloud users. What do they want? They want to keep their end users happy. They want lots of users, lots of usage, lots of throughput, lots of clicking on links and buying of stuff going on. And they're buying something from this provider, so they want minimal expense. What would the provider want?
The provider would want to keep their customers happy; their customers are the cloud users. And they want minimal operating cost, of course, because they're getting some revenue from the users and they want their cost low, so that revenue minus cost, which equals profit, is high. That's what they want. This is what I think; I don't own either a cloud or a website, but I think it's pretty obvious that this is how the economics and the performance would be related. So what happens is that since the infrastructure and all the technical stuff is here in the cloud, and the website side is just content and maybe some business logic, the cloud provider's task becomes pretty complex. They want to keep the cloud users happy, so this bubble basically becomes part of the cloud bubble; the provider's task becomes the sum of all three sets of wants. The cloud provider has to make sure that whatever is running here ultimately gives these end users low response times, and they need to support high throughput for each of these websites. Sports.com was worried only about itself, but the cloud has to make sure all of its users get good throughput. They should also make sure they're not over-provisioning resources for the cloud users, because the whole model, as Deepak said, is pay as you go. If I just sell more, I might satisfy some local greediness, but my value will not be perceived as high if I'm overselling resources to the cloud users. So I must also help the cloud user minimize their expense, and of course my own cloud operating cost should be minimized. So where is all this leading? If you look again, this is what the cloud provider needs to do. Coming back to the cloud user, the middle entity: I'm paying as I go, right? I'm paying for whatever I'm using, so I want to minimize what I use.
If I'm paying for usage, then I don't want to use more than I need, okay? That is what I want, and at the same time I don't want to make my customers, the end users, unhappy; I need to keep the response time low. So look at this graph. The X axis is the CPU slice and the Y axis is the response time of an application that we were measuring; it's an application in our testbed in this lab on the fourth floor here. This was generated on a Xen virtualized machine, which allows you to give a fraction of the CPU to a virtual machine that you create on the physical machine. So this axis is increasing CPU slice being given to a virtual machine, and this axis is increasing response time, and the various curves belong to different load levels being generated on the application running on the virtual machine. So you've got the setup: there is a virtual machine, there is a web-based application running on it, we are generating some request rate on that web server, and for a given request rate we vary the fraction of the CPU allocated to the virtual machine. If you take one curve, it corresponds to one particular request rate, and as you can imagine, as the CPU cap is decreased, the response time increases. If the load level is high, that increase is sharper. So as the resource given to an application decreases, the response times seen by the end user will increase. The point we are making here is that there will be a desired level at which we want to operate. Either there will be a requirement that comes from some marketing folks or something which says your application response time should be 200 milliseconds.
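As an aside, the sizing decision being described, picking the smallest allocation whose profiled response time meets the target, can be sketched as a simple lookup. The profile numbers below are invented for illustration, shaped to mirror the 70% and 50% figures quoted in the talk; they are not the testbed's measurements:

```python
# Illustrative offline profile: measured response time (ms) for each
# (load level, CPU cap) pair. The numbers are made up for this sketch.
PROFILE = {
    # load (req/s): {cpu_cap_percent: response_time_ms}
    3: {30: 900, 40: 350, 50: 200, 60: 180, 70: 170},
    5: {30: 2000, 40: 800, 50: 400, 60: 250, 70: 200, 80: 180},
}

def min_cpu_for_target(load, target_ms):
    """Smallest profiled CPU cap whose measured response time meets the target."""
    for cap in sorted(PROFILE[load]):
        if PROFILE[load][cap] <= target_ms:
            return cap
    return None  # no profiled cap meets the target at this load

print(min_cpu_for_target(5, 200))  # 70: at 5 req/s, 200 ms needs 70% CPU
print(min_cpu_for_target(3, 200))  # 50: at 3 req/s, 50% is already enough
```

In practice the profile itself is the hard part, which is exactly the offline-profiling concern raised at the end of the talk.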
And then that tells you that, at a load level of five requests per second, if I want the response time to be 200 milliseconds, I should get 70% of the CPU. This will change with the load: if the load level is 3, I can make do with less CPU to get the same response time; at 50% itself I get the same response time. So this is all interrelated: it depends on the load and on how much resource you have, and these are the relationships. The idea is that I should give only what is needed to meet a certain performance target. There is also a notion that if I don't have a specific target, I may want to optimize my performance, and the optimal point in the performance domain is called the knee. Again, if you imagine this graph turned the other way, then these points would look like a knee; it is just before the sharp increase in response time happens. So we could also say that I just want to hang out near the knee: whatever happens, give me enough resources so that I am below the knee of the curve. Now, this again becomes the cloud provider's task. One part is to figure out each cloud user's resource requirement. This was one graph for one application, one web server, but in reality there are multiple sets of applications running on virtual machines, and I need to figure out what resource to give to each of them. The other part is to allocate the resources in such a way that the operating costs of the cloud are reduced. For this point I actually had a really great lead-in slide in the previous talk, with a graph where the actual operating costs of the machines have started to really dominate. So this has become very, very important. It is not enough to say, well, I have all of these servers, so I might as well allocate the resources.
If I have the resources, I will just use all of the servers; what is the big deal? I have already invested the capital to buy them. That logic does not work anymore. You want to actually optimize which servers are operating and which are off, and you want to minimize the number of operating servers. So then the challenge: there were some generic graphs in the previous slides, and now I will show you some graphs from a paper that studied this in a real scenario. This is from a paper published at ITC 2003; it is a study of the workload patterns of 22 websites. And it looks exactly like the generic graph that was shown previously: the workload goes up and down. So the challenge is that it is all fine, I can figure out the resources, but the load keeps changing, and we know that handling that is not easy. This is just to remind you of the previous graph: the correct optimal resource allocation was different for different loads. If the load was 5 requests per second, then I wanted 60 percent CPU; if the load is 3 requests per second, 40 percent CPU is enough. So this keeps changing, and the resource requirement varies significantly. So what are the options? Again, we went over this previously. One very naive option would be to provision for the peak, and that is clearly wasteful; it won't keep the business profitable, since all of your servers will be on all the time and you will see those operating costs going up. So really what you want is to do this dynamically: allocate resources and power machines down dynamically, and consolidate. Earlier, as you saw in this picture, the virtual machines were all over the place. Now, if I have sized the resources correctly, I can consolidate them onto a smaller number of servers and turn off some of the machines. So that is what really I need to do.
I need to get what in old networking lingo was called statistical multiplexing gain, and that is very important in the cloud environment too. I should be able to operate the servers accordingly. There is another knob that I have with modern servers, which is the various power modes; I should be able to use that as well. So the question is how. We have all the technologies. We have dynamic resource slicing: in all of the virtual machine monitors, without booting anything up or down, you can change the allocation given to a virtual machine. There is something called live migration: without shutting a virtual machine down, you can move it from one physical machine to another, and you can power servers up and down, all of it remotely. All of these technologies are there. What we need is algorithms, mechanisms, and the intelligence to figure these things out autonomically. We need to find that 60 percent or 40 percent or whatever the magic fraction of resources is, so that the end-user performance is met while making sure that if what was needed was 40 percent, I am not giving 80 percent. This is the fundamental optimization problem. Then, once the resource requirement is figured out, we need to allocate the resources, which is a sort of complex bin packing problem. It is not directly bin packing, but we need to solve some manifestation of the bin packing problem so that we can consolidate the resources and turn servers off. So I will just go over a few of the approaches that we have seen in the literature. There are game-theoretic, control-theoretic, reinforcement-learning-based, and heuristic approaches; I will talk mainly about a couple of slides each on these. Game theoretic is something I am still studying, so I am not talking about that today. So what is control theoretic?
Basically, feedback control theory is a classical theory from electrical, chemical, and the other traditional engineering disciplines, which is represented with a picture like this. There is a target system which we want to control. What do we want to control? Some quantitative output of the system, which is passed through some filters so that it becomes more easily measurable; that gives the measured output. Then there is a controller, which sets the value of something called, in the traditional language, a control input. It is as simple as air conditioning: the temperature is set at a certain point, and there is a control system inside the air conditioning unit which maintains the temperature at 25 degrees C, with some thermostat turning things on and off, and that needs to be controlled. The reference input is something that is directly compared with the measured output: you want to maintain the measured output at some level, and that level is given by the reference input. This block represents an operation which gives you the error; the difference between whatever you are measuring and whatever it is supposed to be determines the value of the control variable. In our scenario, the target performance measure is the thing we want to maintain at a certain level, for example the response time. Therefore we will also measure the response time and compare it against the target to get the difference.
The typical target system is going to be a virtual machine running an application, and the actual control variable that we have is the CPU fraction value. So a simple, intuitive rule is: if the measured response time is much larger than what you want it to be, you should increase the fraction of CPU being given to the virtual machine. That is the basic logic that exists in a controller. The question that control system theory tries to answer is: how much should I increase it by, so that you do not get into things like oscillations? This is very real, by the way. If I just go into a data center and write some naive program which increases the allocation when the response time is too high and decreases it when the response time is low, you can very easily get into a situation where, in one interval, the resource allocation is too high, which is not what the user wanted to pay for, and the response time drops far below what the customer actually needs; so in the next interval you reduce the CPU too much, and the response time shoots up again. You can very easily get into oscillations, up and down and up and down. It is like the AC: suddenly it could become hot, suddenly it could become cold. If you say the average should be 25 degrees C, but it is varying between 0 and 50 so that the average is 25, the system is basically not working.
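To make that loop concrete, here is a toy proportional controller; the simulated system and the gain are invented stand-ins, not the testbed model. With a small enough gain the loop settles on the right slice; a much larger gain makes the update overshoot back and forth, which is exactly the oscillation just described:

```python
TARGET_MS = 200.0   # reference input: the desired response time
GAIN = 0.0005       # controller constant; deriving this from a system model
                    # is where the real control-theoretic work lies

def measured_response_ms(cpu_slice, load=5.0):
    """Stand-in for the real system: response time rises as the slice shrinks."""
    return 30.0 * load / max(cpu_slice, 0.05)

cpu_slice = 0.3  # start under-provisioned
for _ in range(60):
    error = measured_response_ms(cpu_slice) - TARGET_MS  # positive means too slow
    cpu_slice += GAIN * error                            # proportional update
    cpu_slice = min(max(cpu_slice, 0.05), 1.0)           # keep the share valid

# Settles near the 0.75 slice that yields 200 ms in this toy system.
print(round(cpu_slice, 2), round(measured_response_ms(cpu_slice)))
```

The update rule here is the "constant multiplied by the error" form discussed next; everything interesting is hidden in how that constant is chosen.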
Oscillation is not what you want, and that is what the whole theory of control systems is about. So what you have is this: the next slice value is based on some constant multiplied by the error, the difference between the measured and the desired values, and the big deal is finding this constant. I am compressing a whole course into one slide here: this constant is based on something called a system model, which relates the response time measured in the next interval to the response time in the previous interval and the CPU slice value. How to build this model and derive the constant from it is where all the hard work of control theory is. So I will just give you a glimpse of the control-theoretic work that we have done at IIT. We have a CPU share controller, a CPU slice controller based on feedback control theory; this is joint work with Puru and Pugh and Sujisha here. We have little controllers that, based on the response time, determine the share for each of the virtual machines, and we also have a migration manager unit which, based on the resources determined for each of the virtual machines, issues migration commands. I will not go into the details of how it is done at all; I just want to show you a graph as a proof of concept that it can work, that it has the potential to work. So this is the graph of CPU allocations when there are two virtual machines and two physical machines, each VM deployed on one physical machine, and each gets whatever resources it asks for; the load coming to it is in the form of a sine wave, and this represents an over-provisioning scenario. In this one, two virtual machines are on one physical machine, so they hit a bottleneck quickly, and in this case there is some migration going on.
So initially one physical machine is off; one virtual machine's load increases, and when it crosses a certain threshold, with the allocation determined by the control-theoretic controller, another machine starts, and the cycle continues. So this is migration based on resource allocation suggestions made by the control-theoretic controller, and I just want to show you that the over-provisioned and the migration-managed scenarios show the same performance, whereas the under-provisioned one has higher response times; these are the response time graphs. This is just to show, as a proof of concept, that with a little bit of intelligent management things can work. The problem is that you need the system model; you need to do some offline profiling to get those constants so that the model can work and you can figure out how much to increase or decrease your allocations by. There are real theoretical problems, like non-linear relationships and the fact that whatever model you have derived works only in a very specific load region. These are genuinely difficult problems, and once you start solving them you get into theoretical mazes of very hard things; and then the big question remains of how this really scales to large infrastructures. The next one is a paper that has been published recently which is based on reinforcement learning. The basic concept of reinforcement learning is that there is an agent which learns the value of any state it is in, in terms of the potential of that state to take it to a really good state of affairs. The state here is the VM deployment: the virtual machine placements and resource allocations. So the RL agent takes actions, and what it gets back from the system is a measurement of the goodness of the action.
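A toy sketch of such an agent, with everything invented for illustration: discrete allocation levels as states, a reward of throughput minus an SLA penalty, and a bandit-style value update that learns from immediate reward only, rather than a full RL formulation. Epsilon-greedy action selection gives the exploration/exploitation split described next:

```python
import random

random.seed(0)  # deterministic sketch

STATES = [0.2, 0.4, 0.6, 0.8, 1.0]   # discrete CPU allocation levels (the state)
ACTIONS = [-1, 0, 1]                 # decrease, keep, or increase the level
EPSILON, ALPHA = 0.2, 0.5            # exploration rate, learning rate

def reward(cpu):
    """Invented reward: throughput minus an SLA penalty minus a holding cost."""
    throughput = 100.0 * min(cpu / 0.6, 1.0)  # saturates once the app has enough
    sla_penalty = 80.0 if cpu < 0.6 else 0.0  # below 60% CPU the SLA is missed
    return throughput - sla_penalty - 10.0 * cpu

Q = {(s, a): 0.0 for s in range(len(STATES)) for a in ACTIONS}
state = 0
for _ in range(2000):
    if random.random() < EPSILON:      # exploration: try something untried
        action = random.choice(ACTIONS)
    else:                              # exploitation: use the learned values
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt = min(max(state + action, 0), len(STATES) - 1)
    Q[(state, action)] += ALPHA * (reward(STATES[nxt]) - Q[(state, action)])
    state = nxt

best = max(range(len(STATES)), key=lambda s: Q[(s, 0)])  # best level to sit at
print(STATES[best])  # 0.6: just enough CPU to meet the SLA, and no more
```

Even this toy shows the costs discussed below: the agent must visit bad states to learn they are bad, which is exactly why live exploration in a data center is expensive.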
The feedback it could get is, for example: I increased the CPU resource allocation; did the response time improve, or did some overload start to happen on the physical machine? Did something bad happen, did something good happen? It keeps learning these things. It is fully learning-based, and it works on two fundamental concepts: exploration and exploitation. Exploration means trying different things, things you have not tried before, and learning what happens; this is very natural, it is how humans work. Exploitation is using the knowledge base that you have created and choosing actions from it. So, for example, the reward could be the throughput that you are getting, minus a penalty which you are slapped with because you are not meeting the response time target. This is a model from a recent paper. Again there are problems here. Exploration in the field, during live operation, can be very costly: you try something in the live data center and the response times are bad, or you did some configuration by which a virtual machine crashed; it can be pretty bad. The environment changes continuously: it takes a long time to converge to a usable knowledge base, a usable probabilistic model, and in the meantime things change. So this is a big problem. The set of possible actions is huge, and overall learning can take a long time. And then again the theoretical techniques point you back to offline models: you need to train the agent using offline data, and you are back to square one, okay, I need a system test lab in which to profile the applications. So again, how do you scale this? What we really need here are data center automation solutions, autonomic computing solutions, that actually exploit the fact that there is this changing workload to realize statistical multiplexing gains. You need to account for VM migration costs. You can't do something like: the load has gone down, so
I migrate this VM somewhere else, not knowing that the next second the load is going to go back up again. Again there will be an oscillation problem: you just decided to migrate something, and if this is an autonomic system, it autonomically migrates it right away. So you have to be at least as smart as a human administrator, who will not do this, who will remember that the last time the machine was migrated, the load went back up the very next second, and will hold off. So there are migration costs, VM interference issues, and oscillations to avoid, all while meeting application performance targets. Based on this, I would say that for the research community some questions emerge, which we can discuss during the panel discussion. What inputs are realistic? A lot of papers assume being fed knowledge of the workload patterns; can we really get that? We assume we will get response time measurements. A lot of the work assumes that the actual costs of these operations will be known; is it really going to be easy to size these costs? What kinds of assumptions are okay about switching servers on and off? How many times can you really do that; does it harm the servers? Offline profiling of running VMs: every paper eventually has this little section saying that we need offline profiling. What kinds of economic models are realistic? Finally, is autonomic management really required, or are human administrators just fine, so that all we need is good graphical user interfaces? So we will discuss that in the panel. The graph that you showed, the CPU was going beyond 100%? That is just a convention: if there are four CPUs it goes to 400%, with two CPUs it goes to 200%. Yes, that is a multi-CPU machine. Yes, of course, people take each workload and do the profiling at each load level, but assuming all of that is known appears to be rather theoretical.
Because if you are going to be a cloud that allows many people to come and host their applications, you would not have the luxury of going and profiling each application. So the practical thing, and I come from industry, so obviously that is how I look at it, would probably be to let it go and somewhat over-use the capacity, so long as it is not completely out of bounds in terms of economics, rather than to make it so optimal that you go into oscillations. But that is only a suggestion, I have to say. The existing algorithms appear to be looking only at the load which is appearing at the server. Has there been any attempt to model the environment? Because that is essentially what causes the load. For instance, if someone is modeling a banking application or whatever, the client of the application, if you like, can essentially give you cues which will help you model the system better. Has any work like this been done, where the workload is predicted, with the clients themselves saying this is how my workload will evolve or change? The client has a problem there: the client is not one client; what the server sees is the sum total of 10,000, 20,000, 50,000 clients. It is very difficult. But if it is a big client, like say a stock exchange, they can tell you this is how my load will evolve.