Hello, everyone, and thanks for attending this session. My name is Pramod Bhandiwad, and I have my colleague Saurabh Gupta with me. Today we'll talk about dynamic over-provisioning. Before we start, I just wanted to tell you where we come from. We work for Tata Communications, which operates OpenStack-based clouds in nine locations across the globe. Our customers come primarily from APAC and Europe, and most of the time our target customers are enterprises who are just moving onto the cloud. Either they come from a dedicated setup and want to move their workloads onto the cloud, or they are existing VMware customers who want to migrate to OpenStack for the self-service capabilities and the other wonderful features OpenStack provides. When we migrate these customers, they have a different set of expectations. They usually expect enterprise-grade features that are already available with VMware, and they have often done things like clustering on their dedicated setup, so they expect those capabilities from the platform. These apps are not cloud-native by nature. That's why we make a conscious effort to give them a minimum set of enterprise features that keep their workloads available and meet the peak demands and varying patterns of those workloads. Last time around, we gave a talk at the Austin Summit on instance HA, where we showed what has to be done when a compute node goes down. This is a continuation of that project. We have seen issues in our deployments where we couldn't meet some peak demands and couldn't onboard new tenants because we didn't have enough capacity, even though the utilization showed that resources were unused. There were idle resources available, but we couldn't onboard new customers. We saw fragmented systems lying around, and tuning the over-provisioning ratios didn't help, because the way hardware resources are allocated to OpenStack is pretty static. Horizontal scaling is available, but it has its own cost. So we'll tell you what we've done to overcome this problem, and how we've tried to get to a state where our systems are utilized optimally without impacting performance. A word of caution: this may not be a very generic solution, and it may not suit all kinds of workloads out there. But that's how it goes; we tried to solve a specific problem for a specific set of customers. So this is how today's flow goes. We'll talk about how resources are currently provisioned in OpenStack, quickly go through the challenges with the current approaches, then look at the different kinds of compute patterns we've seen — workloads are characterized by a diverse set of patterns — and why we need to change from static over-provisioning to dynamic, performance-based provisioning. By the end of the presentation, you'll see that we're not actually over-provisioning; we are right-sizing and using the resources optimally. Finally, we'll cover the considerations for dynamic provisioning.
We'll look at the different resources — VMs and groups of resources — that we need to consider, what actions and monitoring are required, and what we have done about those considerations. Then we'll talk about the design architecture and the changes we've made. We've tried not to make any disruptive changes: we've used features already available in the hypervisors and simply connected the dots to build the solution. Then we'll show a quick demo of a couple of the features, followed by future development, key takeaways, and references. So, how do you over-provision in OpenStack right now? You go to nova.conf and set the CPU, RAM, and disk allocation ratios. There is a default set of values, and if you set them on the scheduler, all compute nodes have to adhere to them. If you want to change them dynamically, you have to restart services and whatnot, so this might not be suitable for everything. You can still set these on individual compute nodes where the compute services run, but any change to the values requires restarts. For an instance, you use flavors, and this assumes that over the VM's life cycle the flavor size is enough. So what do we generally end up doing? We provide more resources than required — we go to the upper ceiling and say, you might require these resources. That results in underutilized systems. And then we have quotas for tenants: when onboarding a tenant we set fixed quotas and assume they're not going to change for that tenant's VMs over their lifetime. So this is again a static setting. The challenges with these techniques: workload patterns vary, while the flavors we associate with a VM are very inflexible — there is no way to resize a flavor without disrupting the workload. Performance requirements vary drastically depending on the time of usage, the SLA, and the performance tier, and there's no easy way to predict workload patterns. There is software out there that will give you an indication of how workloads might behave, but nothing you can readily use. And then we have underutilization of resources. Because of these static settings and the current provisioning techniques, we have generally seen that around 30% of resources sit idle. The allocation-based figures will say that most of the resources have been allocated, but if you measure the actual usage, roughly 30% of the resources are idle. Because of this we cannot meet peak demands — as I said before, if there's a tenant you have to onboard, the system won't let you onboard them. We've also seen resource fragmentation over time: when you use these static flavors in a multi-tenanted environment and spin up all those VMs on shared hosts, some CPU-intensive workloads hog a system while RAM or IO on it is still available. And finally, there's performance impact due to ill-configured over-commitment ratios.
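For reference, the static over-commit knobs in question are real nova.conf options; the values below are just the commonly cited defaults of that era (16x CPU, 1.5x RAM, no disk over-commit), shown as a sketch rather than a recommendation:

```ini
# /etc/nova/nova.conf — static over-commit ratios. Set on the
# scheduler node they apply cluster-wide; they can also be set per
# compute node. Either way, changing them requires service restarts.
[DEFAULT]
cpu_allocation_ratio = 16.0   # schedule up to 16 vCPUs per physical core
ram_allocation_ratio = 1.5    # allow allocating 1.5x the physical RAM
disk_allocation_ratio = 1.0   # no disk over-commit
```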
This is a given: with the current strategies you are going to see performance impact, and even if you increase the over-provisioning ratios to a crazy level, there's no way to measure how it's impacting the workloads right now. And as I said, there are hypervisor capabilities — especially with KVM, ESX, and Hyper-V — that let you do live migration and live resize in an automated manner, but those things are not in OpenStack yet. These are the varying compute patterns we've seen. First, batch job workloads, which generally run off-peak, usually during the night; for those VMs we see a big inactivity period. Then there are cases with unpredictable demand: for instance, when you take an application from pilot to production, there's a sudden surge in resource requirements. Generally what we do is take that peak value and assign those resources, assuming they'll be used throughout, but that's not the case with workloads like this. And then there's predictable burst: workloads with a predictable pattern, like HPC, where we know when the peaks and troughs are. Here again, resource allocation becomes challenging because of the way flavors and quotas have to be managed in OpenStack. So, having talked about the challenges, what are the main drivers of change? These are the three main goals that make the case for dynamic provisioning. One: optimize cost. How do you optimize cost? You reclaim unused resources and consolidate workloads, thereby reducing the need to add more compute resources than what is required. Two: improve efficiency by right-sizing and being dynamic. We'll show you how we right-size a VM — we've added reservation properties to the flavor itself, which give you room to play within a set of reservations for a VM — and how we stay dynamic: we've built a new API that does a live resize of an instance, and we'll show you how we did that. Three: improve utilization. We've added new filters to the existing scheduler filters that let you schedule based on utilization rather than allocation, so we get the right placement. These are the various considerations for performance-based cloud provisioning. You can provision at the level of a whole VM, or for individual VM resources like RAM, CPU, and IO. You can scale horizontally or vertically; sometimes both are required to have a stable system. And when do you take these measures? You could be proactive: have a modeling mechanism, do workload characterization, get inputs from those workloads, and adjust the resources accordingly — migrate or scale. Or you could be reactive: measure performance, and whenever a threshold is reached for a resource, or you see swap-in/swap-out in the memory case, you act — you add more resources to that VM. The ideal scenario is a hybrid approach where you are both proactive and reactive. And then the decision-making is measurement-based, driven by algorithms.
We apply algorithms to all the metrics we've collected and take intelligent, informed decisions on how to act when there's a resource crunch. The techniques, again, are monitoring, migration, and resizing. These are our design philosophies: the goal for our solution was optimal resource utilization with minimal performance effect, and of course we've tried to keep it simple — we didn't want to make too many changes that would be difficult to maintain later. This is how our architecture looks. On the left side is the controller, and then we have the compute node. We have a stats collector that gathers both host and VM metrics and sends them to an aggregator; the aggregator is a simple Python script that reads from the OpenStack message bus and writes to a stats DB. Then there's a resource scheduler that reads data from this DB. It keeps a snapshot of instance usage and host usage — the current utilization parameters for each instance and each host. On top of that sit the policy engine and workload characterization. Workload characterization is still work in progress, but we have a simple policy engine that governs when to act, how to act, and how to take countermeasures for any anomalies we've seen. Digging deeper: we have the compute node, the controller, and the Snap tasks. For performance measurement you could use Ceilometer or various other mechanisms out there; we've used Snap, Intel's open telemetry framework. We gather libvirt and compute host parameters, the aggregator agent reads them and puts them into the aggregator DB, and the resource scheduler, as I said, keeps polling for data and maintains a snapshot of the current usage. The policies are applied, and then we invoke actions based on what we've seen in the metrics: we either live migrate, or dynamically add vCPUs or memory. The communication happens from nova-compute down to libvirt, and we maintain an action history DB. So what have we changed? We've added a new API to resize an instance on the fly — live resize is the command. Live resize makes use of the hypervisor functionality for dynamically increasing memory and vCPUs; KVM has had this feature since version 1.2, I believe. It takes three arguments: for a given server you can add vCPUs, memory, or disk — disk is not yet implemented, but vCPUs and memory are. We also have a new scheduler filter to provision based on utilization. Currently there are no filters out there that make scheduling decisions based on performance metrics, so we've added this utilization filter, which works together with an allocation policy. The allocation policy does provisioning based on the reservations set on the flavor — we'll show you how those reservations are set. That brings us to the extended reservation properties on flavors. For every flavor, the default allocation — what you specify on the flavor — becomes the upper bound, while you can set a minimum reservation for each resource: you say, this is my minimum reservation, and the CPU, RAM, and other flavor parameters become the upper bound.
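A rough sketch of what setting those reservations could look like through python-novaclient. The extra-spec key names here are illustrative — the talk doesn't show the exact keys their patched Nova reads — but the numbers match the flavor used later in the demo (32 GB / 8 vCPU upper bound, 4 GB / 4 vCPU reservation):

```python
from novaclient import client

# Legacy novaclient auth: version, username, password, project, auth URL.
nova = client.Client("2", "admin", "secret", "admin",
                     "http://controller:5000/v2.0")

# The flavor itself defines the upper bound: 32 GB RAM, 8 vCPUs, 15 GB disk.
flavor = nova.flavors.create(name="m1.elastic", ram=32768, vcpus=8, disk=15)

# Hypothetical extra-spec keys carrying the minimum reservation the VM
# boots with; the live-resize machinery can then grow the allocation
# anywhere up to the flavor's upper bound.
flavor.set_keys({
    "reservation:memory_mb": "4096",
    "reservation:vcpus": "4",
})
```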
So this gives us a delta to play with whenever we see a need to add more resources. We've also added Horizon screens that show the actual utilization; you'll see those screens in the demo that follows. The hypervisor functionality we've exploited is memory ballooning. It has been around for a long time, but there was no code in the Nova libvirt driver to use it. setMemory is the call that actually allocates the required memory to the VM, provided the VM has a memory balloon driver — and by default, most images ship with the balloon driver. We've used the setVcpusFlags function to dynamically allocate CPUs. In the domain XML you can see the current vCPU count — current equals four, which comes from the flavor reservation and is what gets allocated to the VM — whereas the upper bound here is eight. This required changes to the libvirt driver and a means to call this API. Other changes we implemented: we've integrated with Intel Snap for measurements, with multiple tasks collecting the required data — we can dig into any kind of data, to any depth, and take informed decisions based on those tasks. As shown in the architecture, we've built the aggregator agents and the resource scheduler agents, and in the Horizon UI we've added new pie charts that show what was allocated versus what is actually being utilized. This is the typical action matrix we've arrived at: for each symptom, what we measure and what action we take. If you see high CPU usage in the VM, the measurement metric is VM CPU time, and the action is to call our live resize API to add vCPUs. Same for memory: if there's high memory usage, the measurement is either a threshold on memory used, or swap-in/swap-out activity — if pages are being swapped out, memory is under pressure and it's time to add more, so we use live resize to add RAM through the balloon driver. If you see high load on the VM and high load on the host together, that's when you live migrate, because the host is incapable of providing more resources to the VM; you simply live migrate it to a host that is underutilized at that time. For high memory and CPU usage on the host itself, you live migrate a bunch of VMs rather than a single one, which balances out the workloads — and this can bring cost savings in terms of power as well. Finally, say there's a cool-down period and nothing much is running: then we go and set the allocations back down to the minimum reservation set on the flavor.
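At the libvirt level, the live-resize actions just described boil down to two calls. Here's a minimal standalone sketch, assuming a guest defined with headroom — the <vcpu current='4'>8</vcpu> pattern mentioned above — and a balloon driver in the guest; the real code lives in their modified Nova libvirt driver and is invoked through the new API:

```python
import libvirt

# Connect to the local hypervisor and look up a guest by its
# (hypothetical) instance name.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000001")

# Grow guest RAM from 4 GiB to 6 GiB via the balloon driver.
# setMemoryFlags takes KiB; VIR_DOMAIN_AFFECT_LIVE applies the
# change to the running guest without a reboot.
dom.setMemoryFlags(6 * 1024 * 1024, libvirt.VIR_DOMAIN_AFFECT_LIVE)

# Bring 6 of the 8 defined vCPUs online (CPU hot-plug). This works
# because the domain was defined as <vcpu current='4'>8</vcpu>.
dom.setVcpusFlags(6, libvirt.VIR_DOMAIN_AFFECT_LIVE)
```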
So now I'll hand over to Saurabh to show you the demos of what we've talked about — we'll show two demos. Thanks, Pramod. So, as previously mentioned, we have added a new Nova scheduler filter, the utilization filter, along with an allocation policy for dynamic over-provisioning. Now we're going to see how we use these techniques to provision VMs dynamically and achieve higher VM density. So we'll start the demo now. This is how our setup looks: we have one controller node and two compute nodes, and we've created a particular availability zone using one compute node. We'll be using this particular flavor for spinning up the VMs: 32 GB of RAM, eight vCPUs, and 15 GB of disk. And this is how the allocation ratios look in nova.conf — the defaults. Now we go to Horizon. This is the hypervisor summary; it currently shows no instances running, and we've added pie charts and columns for the allocation and utilization of RAM and vCPUs. Now we go to Instances and spin up some VMs, using that same availability zone — which has one compute node, node six — and the same flavor, launching two instances. The VMs are up now, and the hypervisor summary reflects it: approximately 64 GB of RAM is allocated out of 70 GB, yet the utilization shows only two GB actually being used across the two running instances. Now we go back and try to spin up one more VM with the same attributes — it also needs 32 GB of RAM. It fails. It fails because the flavor needs 32 GB of RAM, but only about six GB is left to be allocated. That's exactly the problem we've tried to solve: the RAM actually utilized is only two GB, but because only six GB remains unallocated, the scheduler refuses the VM. To fix that, we enable our utilization filter: we go to nova.conf, add the utilization filter in the two places where the filter lists are configured, and restart the Nova scheduler service. Once it's restarted, we go back to Horizon and try again with the same attributes and the same 32 GB flavor, this time launching two more VMs. Now we are able to spin up two more VMs, because the utilization filter decides based on the utilized RAM. The hypervisor summary reflects the same: RAM utilization is only 3.7 GB out of 70 GB, and we now have four instances running, even though we've allocated more than the physical total. So this is how we achieve higher VM density using the utilization filter. That concludes our first demo, on VM density.
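For a sense of what such a filter involves: Nova scheduler filters of that era are small Python classes implementing host_passes(). The sketch below is our illustration of the idea, not the actual code from the talk — the metrics lookup is a hypothetical stub standing in for their stats DB:

```python
from nova.scheduler import filters


def get_measured_ram_usage_mb(hostname):
    """Hypothetical stub: in the talk's design this would query the
    stats DB fed by the Snap collectors and the aggregator agent."""
    raise NotImplementedError


class UtilizationFilter(filters.BaseHostFilter):
    """Admit hosts based on measured free RAM, not allocations."""

    def host_passes(self, host_state, spec_obj):
        requested_mb = spec_obj.memory_mb  # RAM asked for by the flavor
        used_mb = get_measured_ram_usage_mb(host_state.host)
        free_mb = host_state.total_usable_ram_mb - used_mb
        # Pass if the host has enough *actually free* memory,
        # regardless of how much has been allocated on paper.
        return free_mb >= requested_mb

# Enabled by appending it to the filter list in nova.conf, e.g.:
# scheduler_default_filters = ...,ComputeFilter,UtilizationFilter
```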
Now consider a scenario where instances like these are up and running and the load — the resource requirement of a VM — suddenly shoots up. How do we handle that? That's the second demo. For this demo we use this particular flavor. It's exactly the same, except that we've added reservations: a memory reservation of four GB and a CPU reservation of four. The total RAM remains 32 GB, and the vCPUs remain eight. This is how nova.conf looks: we have the utilization filter set, and we also have the allocation policy set to dynamic. What this allocation policy does is ensure that resource allocation is based on the flavor reservation. So this is our current setup: Horizon shows two instances up and running, spun up with the flavor I just mentioned. Now we go inside the instances and look at their stats. We go to node six, which is hosting these two VMs. These are the two instances currently running; we've added new columns for the allocation and utilization of RAM and vCPUs. It says four GB of RAM is allocated and 1.4 GB is being utilized. The same can be seen in the virsh dumpxml output on the compute node: for this particular VM the total RAM is 32 GB, but the current memory is only four GB, and similarly the current vCPU count is four. Now we log into one of the instances and use the stress command to drive up memory utilization. As memory utilization increases, our resource scheduler agent sitting behind the scenes keeps monitoring it, and once a threshold is reached it takes the necessary resize action. Here we can see free memory going down; then, checking with vmstat, the available memory suddenly jumps up — meaning the resize has happened and we now have more RAM. We verify with dumpxml: initially it showed current memory as four GB, now it's six GB — the agent has resized it. The same is reflected in Horizon: it now shows six GB where previously it was four, with utilization at 3.5 GB of RAM. Now we perform the same exercise for the CPU. In dumpxml the current vCPU count is four, while the total is eight. We go inside one of the instances and check the lscpu output — it says the same thing, four vCPUs currently allocated. Again we run stress, this time to drive up CPU utilization; as it crosses the threshold, the resource scheduler kicks in and performs the live resize. We can check the same in the compute log: the live resize API has been invoked, and the vCPU count has been changed to six. The lscpu output now shows six where it initially showed four. Finally we verify in Horizon: the vCPUs were initially four, now bumped up to six, with CPU utilization close to 23.35 percent. The same can be confirmed in dumpxml: current was four, now it's six. So this is how the dynamic live resize works, using these newly introduced parameters. That concludes our demo.
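The agent driving this demo is, at heart, a threshold-check loop. Here's a minimal sketch of the reactive memory path under stated assumptions: get_vm_metrics is a hypothetical callable reading the stats DB, and nova.servers.live_resize stands in for the new API — the talk tells us only that it's a live-resize call taking the server plus vCPU/memory arguments, so the exact method name is illustrative:

```python
import time

RAM_THRESHOLD = 0.85   # act when >85% of currently allocated RAM is used
RAM_STEP_MB = 2048     # grow in 2 GiB steps, as in the 4 GB -> 6 GB demo


def watch_and_resize(nova, get_vm_metrics, poll_seconds=30):
    """Reactive resize loop (sketch). For each VM, compare measured
    usage against its current allocation and, when pressured, grow
    the allocation — capped at the flavor's upper bound."""
    while True:
        for vm in get_vm_metrics():   # hypothetical stats-DB reader
            if vm.ram_used_mb > RAM_THRESHOLD * vm.ram_current_mb:
                new_mb = min(vm.ram_current_mb + RAM_STEP_MB,
                             vm.ram_max_mb)  # flavor upper bound
                if new_mb > vm.ram_current_mb:
                    nova.servers.live_resize(vm.uuid, memory_mb=new_mb)
        time.sleep(poll_seconds)
```

A production agent would also cover the vCPU path, the live-migration escalation when the host itself is saturated, and the cool-down scale-back to the flavor's minimum reservation.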
Back to Pramod now — just a few more slides, which we'll quickly go through. This slide shows the density graph we've observed with the utilization filter and live resize enabled: when you set the allocation policy to dynamic and provision based on utilization, we're able to achieve much higher density, and thereby we've reduced the number of underutilized systems. This is our future development: we plan to work with the community to integrate our code and take it upstream. We're also trying to work with Intel on a workflow engine, and we'll probably integrate with Congress or something similar for the policy engine. We want to evaluate workload characterization to do modeling, and add finer-grained monitoring, including storage monitoring. Key takeaways: you don't need to set static over-provisioning ratios anymore. We've seen that using performance metrics you can do right-sizing and right allocation without impacting the performance of the VMs. We've also seen how to maximize resource utilization non-intrusively: previously, a flavor resize required shutting down the VM, disrupting the workloads. With this new API you don't need that; you can scale vertically anywhere between the minimum reservation and the maximum upper limit, and beyond that you can always do a live migration and then horizontal scaling. No great engineering was required — it was all there; we just added code to stitch things together. Some things to note. As I said before, there are prerequisites, so this might not fit everyone. The guest VMs need the QEMU guest agent, which is what acts when you issue the live resize command. Then there's NUMA topology: you cannot simply add vCPUs when the flavor was asked to spin up with a specific NUMA topology, so right now we don't do any live resize if a NUMA topology is enabled — we'll have to see how that can be handled when we work with the community to upstream this. The same considerations apply to vCPU pinning and placement. Memory backed by huge pages: again, we don't act if guest memory is backed by huge pages right now. If cputune or memtune parameters exist in the virsh dumpxml, we likewise skip live resizing. And this might impact your metering and billing, because of the way allocations change, so it could affect your existing systems. So that's it. All right, thank you. Questions? Q: Do you do any capacity planning, and how do the live adjustments impact your capacity planning? A: Yeah, that's one of the biggest advantages we've seen from this solution. It gives us the right picture of what capacity is actually being used right now, rather than what has been allocated, which might be for future use. The reports we gather from the collected metrics show the exact utilization. It doesn't predict future capacity requirements, but it gives you a dashboard where you can see, at a fine-grained level, how much RAM, CPU, and IO are being utilized, so you get an idea of what to add when there's a peak demand. Q: So as far as deterministic capacity planning and your level of elasticity going forward: are you able to predict the levels of elasticity you need for growth, say 20% on either side? What is your churn rate now that you know your utilization? What are you seeing as far as elasticity?
A: I think this system has to run over a period of time for us to get those numbers, and we've been running it in production for a much shorter time than that. I don't think we've got to the point where we can deterministically say what our elasticity is going to be going forward. But the data is there; we just need to put things together and show that number. Q: [partially inaudible] Can they be created? A: Yeah, you can use the Nova live resize API. We've integrated it with python-novaclient, so you can use the python-novaclient APIs to trigger it yourself. Q: Is there a way to test this — on GitHub or somewhere? A: Yeah, I forgot to add that link. When this presentation goes online, I'll put up the references chart with the GitHub link; you can pull that code and test it out. Q: [partially inaudible] ...does this make flavors less relevant? A: Right — so the inflexibility of flavors is what led us to develop this solution, but I don't think they've become less relevant. They still give you the limits to set for a VM. There are, however, limitations in the current solution: you cannot change the upper boundary in a live manner; that still requires a reboot of the VM. Q: It sounds like you're removing the onus on the application owner to pick the right flavor, and you're compensating for their lack of knowledge or experience by resizing for them. Did you ever consider just having one basic flavor — say, a medium flavor — and letting your process move things wherever they need to be? Just thinking about the future. A: Yeah, we could get there, but there are still workloads that ask for a minimum set of requirements. How do you set that minimum? The upper bound we can grow beyond, but when a customer says, whatever you do, I still need my 8 GB of RAM up front, we need to honor that request. That's why I think flavors will stay, though the elasticity might increase: the medium flavor could become an upper-limit flavor, where all these VMs can use up to those resources — and that upper limit could even be unbounded. Q: And if you're measuring utilization on their apps — say it's a WordPress app running across your whole enterprise — would you still give them that, or course-correct them to a different flavor, or just continue operating as you do now? A: The minimum is a guaranteed reservation, and we bill for it — we're getting money for it even when they're not using it. The minimum is something they ask for, so we have to honor it; that's why flavors will continue to carry it. But we can give them reports and charts showing they're not utilizing those resources, and if they agree, based on some approval process, we can scale it down. Q: First of all, thanks a lot, this was very interesting. Do you plan to combine this with auto-scaling capabilities — regular auto-scaling? And if so, how do you see all this automation happening at the same time? A: You mean horizontal scaling? Q: Yes, in Heat, for example. A: Okay — we can consider that; we haven't given it thought yet, so good point. We can probably work on this and see how we would integrate with Heat, or a Murano kind of orchestration.
Q: Can you please repeat that? When you over-provision — say you've hosted 20 VMs on the same host and their utilization is correlated, so they all hit their peak at the same time — does your system break, or how do you handle that kind of correlated workload situation? A: Yeah, we've built some basic correlation handling into our resource agent. If you look back at the architecture, the resource agent also gets data from the Nova DB itself, so it knows that a given set of VMs belongs to a given set of tenants, and they're governed by policies at the VM level, at the tenant level, and at a broader global level. Based on the metrics, we act as per those policies. I don't think we can make a decision based entirely on metrics; it has to be governed by policies. Did that answer your question? We can take it offline. Yep, thank you. Thank you, folks. Yeah, thanks.