Hi, thank you for coming. My name is Steven Rosenberg; I work for Red Hat. The mission statement for this presentation: the goal is to predict when processes will complete, in order to start the migration of other processes early and use the soon-to-be-freed resources sooner, when there is added value in doing so, such as performance improvement or cost reduction. When I discuss process migration, I'm referring to the migration of virtual machines and the processes that run on them as a self-contained unit. The topics will cover load balancing, fault tolerance, scheduling, types of solutions, live migration, and predictive analysis.

When you want to load balance, you can use a round-robin approach: VM 1 goes on host 1, VM 2 goes on host 2, process 3 goes on host 3, and then process 4 starts again on host 1. We want to prioritize, however, and add priority based upon urgency, so we'll use even distribution within categories. The categories are: urgent, high priority, neutral, low priority, and no importance.

So what could go wrong? Well, things can go wrong, and that's where fault tolerance comes in. Network elements can fail, hardware resources can fail, and there are other failures: OS, BIOS, and kernel failures, and process failures. One solution is redundancy. You can have two hosts that are exactly the same, where one is active and one is passive. You can have multiple network connections, so if one fails the other is still viable. You can have multiple storage devices and many different connections. And what does all of that equal? High cost. So the idea is to reduce the cost.

Scheduling and dispatching concepts: you have processes on a queue. The queue can be a regular queue, or it can be based upon time.
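The round-robin-with-priority-categories idea can be sketched as follows. This is a minimal illustration of the concept from the talk, not an existing scheduler API; the category names and the `assign_hosts` helper are my own:

```python
from collections import defaultdict
from itertools import cycle

# Priority categories from the talk, most urgent first.
CATEGORIES = ["urgent", "high", "neutral", "low", "none"]

def assign_hosts(vms, hosts):
    """Round-robin VMs onto hosts, evenly within each priority category.

    `vms` is a list of (name, category) pairs. Higher-priority categories
    are placed first, so urgent work lands before the hosts fill up.
    Returns a dict mapping VM name -> host.
    """
    placement = {}
    host_ring = cycle(hosts)            # round-robin over the hosts
    by_category = defaultdict(list)
    for name, category in vms:
        by_category[category].append(name)
    for category in CATEGORIES:         # urgent first, then high, ...
        for name in by_category[category]:
            placement[name] = next(host_ring)
    return placement
```

With two hosts, the two urgent VMs are distributed evenly before the neutral one is placed, even if the neutral VM was submitted first.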
It can be based upon priority, and it can be sorted in any kind of order. Basically, the idea is that you want to be able to schedule process 8. With the round-robin, even-distribution approach, process 8 would start on host 2 and process 9 would start on host 3. But what if process 8 needs two slots? Then you could use smart loading concepts and try to move process 6 to host 2, so that you can fit process 8 into host 3; but then there's no space for process 9. If, however, you know that process 5 is going to finish after a certain period of time, you can predict that and start the migration early: move process 8 to host 2 while the other process is completing, and move process 9 to host 3, and you have a more predictive balance. That's the concept.

So scheduling is the ability to launch processes based upon needed resources and to monitor the amount of resources each process utilizes. The launching scenarios can be: initialization, of course, as we covered; migration for maintenance, where the hard drive or the operating system needs to be upgraded and you need to first migrate the processes before you bring the computer down; rebalancing, migrating to another host to take advantage of freed resources; and fault recovery, migrating after system or process failures.
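The predictive placement decision just described can be sketched like this. The `plan_placement` function and its inputs are my own illustration of the idea, assuming the predictor supplies how many slots each host is expected to free up soon:

```python
def plan_placement(need_slots, hosts, predicted_free):
    """Pick a host for a process that needs `need_slots` slots.

    `hosts` maps host -> slots free right now; `predicted_free` maps
    host -> slots predicted to free up soon (e.g. a process is about
    to complete). Prefer a host that fits now; otherwise pick one
    whose predicted frees close the gap, so migration can start early.
    Returns (host, start_early) or (None, False) if nothing fits.
    """
    for host, free in hosts.items():
        if free >= need_slots:
            return host, False           # fits now, no prediction needed
    for host, free in hosts.items():
        if free + predicted_free.get(host, 0) >= need_slots:
            return host, True            # fits soon: start migration early
    return None, False
```

In the talk's example, process 8 needs two slots, host 2 has one free slot, and process 5 on host 2 is predicted to finish soon, so the early migration to host 2 can begin.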
Well, hopefully, if you can predict the failure, you can migrate earlier.

Okay, so policy units, the attributes of scheduling. We have filters: you want to be able to match a process to multiple hosts, so the first thing you do is check whether the process requirements match the host. If the process is mission-critical and needs a high-performance network card, then hosts with weak network cards get filtered out. Then you score and weight the hosts that are left, and the best score wins. But you also consider how to balance. You have even distribution, which we discussed. You have power saving, because you might want to spare cores, so you control how much power saving you want. You have prioritization, which we covered. You have affinity: some processes work well together, so they have positive affinity, while other processes might have to be on separate hosts, such as if they both use the same port, so they have negative affinity. And you have pinning, for optimal performance: if a process works best on a certain hardware configuration, you pin it to just those hosts that have that configuration.

So, types of solutions: we have live migration and we have redundancy. Live migration is more flexible: you get load balancing and fault recovery, but you have to minimize the pause during live migration; the negative part is the pausing, and it takes longer. With redundancy, you have duplicate processes running simultaneously, and for fault recovery the pause time is much less because the process is still running, but the cost is high.

So, live migration: how does it work?
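The filter-then-score-and-weight policy can be sketched as follows. The host fields and the scoring formula here are stand-ins of my own invention; a real scheduler would have many more filters and a configurable weighting:

```python
def pick_host(process, hosts):
    """Filter out hosts that can't satisfy the process, then score the rest.

    Each host is a dict with 'name', 'fast_nic', 'free_cpu', 'free_mem';
    the process dict says whether it needs a fast NIC. The score here
    (free CPU plus free memory) is a stand-in for a real weighted policy.
    Returns the winning host name, or None if every host was filtered out.
    """
    # Filter phase: drop hosts that fail hard requirements.
    candidates = [h for h in hosts
                  if not process.get("needs_fast_nic") or h["fast_nic"]]
    if not candidates:
        return None
    # Score-and-weight phase: the best score wins.
    return max(candidates, key=lambda h: h["free_cpu"] + h["free_mem"])["name"]
```

A mission-critical process that needs the high-performance network card never sees the hosts without one, even if they score higher on free resources.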
Well, you have a source host that the process is running on, and you have a destination host. The first thing you have to do is set up the network connectivity: the IP addresses, everything, has to work the same on the destination host and look the same as it did on the source. Remote disk availability needs to be the same. Then there is migrating the local disk data, which has to go from the source to the destination. Then there is copying the memory state. We do that in phases, because we still want to keep running the process on the source until we're ready to move it to the destination, to reduce the pause time. So we first move all of the memory contents while the process is running, and then we send the current differences that the process is still making, because it's still running. Only when we reach the minimum difference do we pause the process. We copy the rest of the data, we copy the CPU state, and then we bring up the process on the destination host, because the goal is to limit the pause of the process. Then we clean up the source.

This slide shows the transitions. We first set up and synchronize the disk while the process is running live on the source. We start the memory transfer, we calculate the estimated downtime, and we continue the memory transfer in deltas. Then we pause the process, activate the network, complete the memory transfer, and run the process on the destination host, so that we can then clean up the source.

For copying over the actual local disk data, you can see here that the guest moves from host 1 to host 2. It can still access the data that's still on the storage of host 1 through controller VMs, and we basically pass the data from the local storage on host 1 to the local storage on host 2 while the VM has already been migrated, because if we waited until all the data was passed, it would take too long. Once all the data is copied over, then we're okay.
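The iterative pre-copy phase just described can be sketched as a small simulation. The page counts and the round limit are my own illustrative parameters, not values from any real hypervisor:

```python
def precopy_migration(memory_pages, dirty_per_round, bandwidth, max_downtime_pages):
    """Simulate the pre-copy memory phases of live migration.

    Copy all memory while the guest keeps running, then keep copying
    the pages it dirties each round; pause only when the remaining
    delta is small enough that the final copy fits the downtime budget.
    Returns (copy rounds, pages copied during the pause).
    """
    remaining = memory_pages
    rounds = 0
    while remaining > max_downtime_pages:
        rounds += 1
        copied = min(remaining, bandwidth)               # pages sent this round
        remaining = remaining - copied + dirty_per_round  # guest keeps dirtying pages
        if rounds > 1000:                                 # deltas aren't converging; give up
            break
    # At this point we pause, copy the final delta plus the CPU state,
    # and resume the process on the destination host.
    return rounds, remaining
```

The loop only converges when the bandwidth outpaces the dirtying rate, which is exactly why the talk emphasizes finding the minimum difference before pausing.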
The guest can then access the data locally.

So now we'll talk about predictive analysis. The concept is predicting future occurrences via analysis of past performance. We'll cover techniques for predictive analysis, the process for developing a prediction model, types of prediction models with examples, and then applying the concept to scheduling.

Predictive analytics methodologies: we start at the top with historical data and go to the left. The first thing we do is create a training set from the historical data, so we do some preprocessing: we normalize the data and we order the data. Then we pass it to an algorithm that will look for patterns in the data and match them; a neural network, for example, would adjust its weights accordingly and create a model. Then you go to the right side: you create the testing data, you feed it into the model, it gives you the results, you compare the results to the expected results, you calculate the percentage of error, and then you see if it's good enough. Usually you have to repeat that process many times until it learns, just like when we fall down, we learn from it, we get up, and after a while we succeed; the concept is the same as with humans.

There are many techniques for developing the algorithm. The idea is not to be a solution looking for a problem: you should define the problem first, and then look for the solution that fits it.

The process for developing a prediction model: again, we start at the top. We define the project, we collect the data, we analyze the data.
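The train-then-test loop just described can be sketched with a simple least-squares line fit standing in for the neural network; the function names are my own, and the percentage-of-error metric is the mean absolute percentage error:

```python
def fit_linear(train):
    """Least-squares fit of y = a*x + b on (x, y) training pairs."""
    n = len(train)
    sx = sum(x for x, _ in train)
    sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def percent_error(model, test):
    """Mean absolute percentage error of the model on (x, y) test pairs."""
    a, b = model
    return sum(abs((a * x + b - y) / y) for x, y in test) / len(test) * 100
```

If the error percentage is not good enough, you go back, gather more data or adjust the model, and repeat, just as the methodology loop on the slide shows.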
We validate the data, we create a model as we showed, we deploy it, we monitor it, we see where we are, we redefine the project, and we do it over and over again until we get the improvement we need, until we get it right.

So, the types of predictive models, with examples. The support vector machine model is a classification model, used to predict a category; an example is predicting stock price increases or decreases. For predicting a quantity we can use regression; an example is predicting a person's age based upon height, weight, health, and other factors. Detecting anomalies is about normal behavior, classifications versus exceptions, such as money withdrawal anomalies: if too much money is coming out of a bank account too soon, that's a red flag, something you want to know about. Clustering discovers structure in unexplored or unstructured data, such as finding groups of customers with similar behavior in a large database containing their demographics and past buying records; that would be useful for marketing specialists who are looking for market segments.

So, applying predictive analytics to scheduling. The criteria for the data can be processing time or iterations; you can adjust the number of iterations based upon the amount of data that needs to be processed. You can also adjust for resource capacity and priority, that is, the percentage of resources used adjusted for capacity and priority, and make adjustments for anomalies when calculating averages, which I can talk about as well. So, some ideas.
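The withdrawal-anomaly red flag can be sketched with a simple statistical test. The z-score threshold and the `flag_anomalies` helper are my own illustration of the idea, not a production fraud detector:

```python
from statistics import mean, stdev

def flag_anomalies(withdrawals, z_threshold=3.0):
    """Flag withdrawals whose z-score exceeds the threshold.

    A withdrawal far above the account's usual pattern is the
    'too much money too soon' red flag from the talk.
    Returns the flagged amounts (empty if the history has no spread).
    """
    mu = mean(withdrawals)
    sigma = stdev(withdrawals)
    if sigma == 0:
        return []
    return [w for w in withdrawals if abs(w - mu) / sigma > z_threshold]
```

Ten routine withdrawals of 100 followed by one of 10,000 trips the flag; a flat history flags nothing.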
We can take selected techniques applied in other scheduling applications, such as combining regression-like modeling with functional approximation, using a sum of exponential functions to produce probability estimates; that would be combining statistical with mathematical techniques. Others, such as machine learning with advanced mathematical models, would put more AI behind the mathematical models.

So this is the proposal: a predictive analysis architecture. You have schedulers, as we said, and various parameters to consider: CPU, memory, storage, networking, and the scoring; and of course, for example, even distribution. What's new is the historian. The historian collects the data; it collects the metrics, which could come from logging or from other places. The predictor then reads that historical data, makes its predictions, and passes suggestions to the scheduler, such as: hey, this process is going to go down soon and another process needs the resources, let's migrate.

Tracking historical data means recording the time each process starts and terminates, the resources used by each process, the time each process takes to migrate, and the time or iterations that memory or disk transfer takes per size. Based upon analysis, the considerations are whether early migration can proceed and when early migration shall start, with error correction and anomaly detection for accurate results.

For anomaly and error calculations, there are methods to consider in order to narrow down the results and get more accurate results. You can use statistical techniques: calculating the percentage of error from the mean and eliminating results outside of a threshold. You could be fancy and try to use signal processing techniques, smoothing and filtering, to eliminate the glitches. You could try to use machine learning techniques to analyze the patterns and categorize between normal and out-of-range behaviors. But again, the solution should fit the problem.

Thank you. I'm flattered.
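The statistical technique mentioned last, eliminating results outside a percentage-of-error threshold before averaging, can be sketched as follows; the threshold value and the `robust_average` name are my own:

```python
from statistics import mean

def robust_average(samples, max_pct_error=50.0):
    """Average after discarding samples too far from the mean.

    Drops samples whose percentage error from the mean exceeds the
    threshold, then averages what's left; a simple statistical way to
    keep one glitchy runtime sample from skewing a prediction. Falls
    back to the plain mean if everything was discarded.
    """
    mu = mean(samples)
    kept = [s for s in samples
            if abs(s - mu) / mu * 100 <= max_pct_error]
    return mean(kept) if kept else mu
```

With runtimes of 10, 11, 9, and one glitch of 30, the glitch is more than 50% off the mean and is discarded, so the predicted runtime stays near 10 instead of being dragged toward 15.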
Thank you. Any questions or comments?

[Audience question] What version of what? I didn't hear.

Well, this is a proposal. It's not implemented; there are pieces. The whole idea was to give a presentation that would show what we have and what can be done. Okay, very good.

Yes, well, you have my email address. Anyone who's interested in this can contact me; we could, you know, do something to move it forward within Red Hat, within the community, with people who are interested and want to get involved. I'd really appreciate getting the feedback and seeing what the enthusiasm level is, and then we can really do something with this. It's an innovative idea, I think, and I predict that it will be a future implementation, if only because you need to close the gap between live migration, which is more flexible, and redundancy, which is too expensive; and it's all about reducing the cost for the customer.