Let's get started. So good morning, everyone. My name is Antoine Cabot. I'm a cloud computing lab manager at b<>com, and this is Christophe Dion, who is a cloud architect in my team. So I hope you all had a great HP party last night. Hope you're not too tired today. So today, we talk about cloud orchestration with Watcher. I will go, of course, into details about what Watcher is in this presentation. Just to let me know, can you please raise your hands if you are an ops guy in your day-to-day job? OK, so Watcher is built to help ops using OpenStack as an everyday tool. So I hope you will be interested in that. So a short summary of my presentation: a word about who we are, then the private cloud reality, then what Watcher is and how it works in detail. I hope we will have time to do a short demo, and then a Q&A session. So what about who we are? We are quite new, actually. We are a private research institute. So it's a bit special to be here at the OpenStack Summit. Actually, it's our second summit. We were in Paris, and now we are here with this presentation. The research institute started in January 2013, so two years ago, and it's based in France. Maybe you know the equivalent in Germany, which is called Fraunhofer. So this is quite the same as Fraunhofer, but in France. Our activities are focused on innovation in networks and security, hypermedia, and health. What we do, as a private institute, is take most of the academic research around us in France, try to bring it to a certain maturity level, and then bring it back to the industry with our partners. One of our biggest partners is Orange, the French telco. And it's very important for us to take things from the academic side and push them to the industry. The common time frame at b<>com is two or three years. So products we are building now are targeted to be in production or on the market in three years.
So my lab is focusing on cloud computing and beyond. The idea is to think about how we can improve cloud computing management in the coming years. And we are really committed to OpenStack. I think, personally, that open source is the best way to produce better code. As a research institute, it's very important to deliver code which can be trusted by industrial partners. So I think it's important to be open source, and OpenStack is a good way to do it. Actually, my team is composed of multiple skills. We have engineers, developers, cloud architects, and also PhD students. So we are lucky enough to have PhDs to deploy OpenStack. That's pretty good. So what about the private cloud context? What do we see today? Actually, I will tell you a little story. Imagine you are a new engineer who is hired by a company that says, OK, we want to have a cloud infrastructure because we think it's better for us. It's better for efficiency, for productivity, and so on. So you just come into this company and they say, OK, here is the infrastructure. Now make it a cloud-ready thing. So of course, you choose OpenStack, because this is the best tool you can use which is ready on the market. And then you have all the departments of the company in front of you, the IT department, the marketing department, and all of them. And you say, OK, now we have a private cloud. You can use VMs. You can use everything you need on demand, with unlimited resources. So please do it. And then they start. At the beginning, they will all use golden VMs, because this is what they used previously. So they want VMs that you will never stop, never ever. They will tell you, OK, please be aware that this VM must never be moved or migrated or anything. And the IT department will also need containers, because it's probably sometimes better for specific services.
So they will ask you to have containers in your cloud. The marketing department will also use golden VMs. And the sales department needs VMs only to do sales computations at the end of the month. So they will probably need them maybe one day during the month. So they will just ask for ephemeral VMs, to do some computations and then stop them. And finally, the accounting department, which is a bit better than the others in terms of technology, will say, OK, we also have apps that are cloud-native apps. So they are really built in a way that they can scale horizontally and everything. So you have all these kinds of VMs in your cloud now. And as time goes on, you have more and more golden and ephemeral VMs and so on. And so your private cloud is probably not enough. So you will have to connect to a public cloud to get a hybrid cloud. So you will have to connect to Amazon Web Services, to Google Compute Engine, or to Rackspace. Of course, all these VMs can be CPU intensive or memory intensive or IO intensive at times, because workloads are constantly changing. And the needs from the IT department or from the marketing department always change. So workloads will be fluctuating. I think it will be OK if you have tens of VMs, or 100 VMs if you scale up the admin team. But with thousands of VMs, it just becomes a nightmare, and you will probably be like this guy. OK, what about the optimization options we have today? There are three options, actually. You can rebalance the load over the entire cluster. For example, you take two VMs that are CPU intensive and you move them to two different physical hosts to have better performance. You can also consolidate VMs to reduce the number of physical hosts. The idea is to better use your infrastructure. And you can also free up physical servers. And the best solution is complex, because it's probably a combination of these three.
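The consolidation option mentioned above can be sketched with a classic first-fit-decreasing bin-packing heuristic. This is only an illustrative toy, not Watcher's actual strategy, and the VM loads and host capacity here are invented:

```python
# Toy server consolidation via first-fit decreasing (FFD).
# NOT Watcher's real algorithm, just an illustration of packing
# VMs onto as few hosts as possible.

def consolidate(vm_loads, host_capacity):
    """Pack VMs (name -> load) onto hosts; return a list of VM groups."""
    hosts = []  # each host: [remaining_capacity, [vm names]]
    for vm, load in sorted(vm_loads.items(), key=lambda kv: -kv[1]):
        for h in hosts:
            if h[0] >= load:        # first existing host with enough room
                h[0] -= load
                h[1].append(vm)
                break
        else:                        # no host fits: open a new one
            hosts.append([host_capacity - load, [vm]])
    return [h[1] for h in hosts]

# Four VMs with invented load percentages fit on two 100% hosts:
placement = consolidate({"vm1": 50, "vm2": 30, "vm3": 40, "vm4": 60}, 100)
print(placement)  # [['vm4', 'vm3'], ['vm1', 'vm2']]
```

A real strategy would also weigh migration cost and SLA constraints, which is exactly why the talk calls the combined problem complex.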
So it's quite a hard problem to solve, and Watcher is trying to do this. There are many use cases from the community. There is a very good article from Adam Spiers, which is here, that he posted two days ago on his blog, about what we can do about cloud rearrangement when we run a private cloud. There is also a blueprint which has been submitted by OVH, ovh.com, about how we can move specific flavors, like Windows instances and so on, all at the same time onto the same physical host. And there are also many things about energy efficiency for compute hosts. There have already been projects around this: OpenStack Neat is one, which has been abandoned since, and Blazar, which is available on Stackforge. And one last thing I want to mention is a blog article from the ICCLab in Zurich, Switzerland, where they are really focusing on making OpenStack more energy efficient. OK, so let's see what Watcher is exactly. Watcher is an OpenStack module, so we will have all the required specs. It's very important for us, because now with the big tent, we have to comply with the requirements from the OpenStack Foundation. It's specifically targeted at private cloud deployments; we don't do public cloud. It will be available on Stackforge; we are still waiting for the infra team review to be on Stackforge, but it will be available probably on Monday. It's easily deployable on any OpenStack cluster. The idea is that you have your OpenStack already deployed, and you just add this module, Watcher, and it will do all the things you need. It has no impact on OpenStack core components. This is very important: we don't want to change, for example, filters or anything in the Nova scheduler. It uses Keystone authentication, which is really important too; we don't need any additional identification system so far. It provides a CLI and an API for third parties, if you want to automate something with Watcher. What are the benefits of Watcher? First, it automates lifecycle management of cloud resources.
The idea is to analyze, on the go, the workloads of the VMs, to detect if they are CPU intensive, memory intensive, or IO intensive, and then move them on the go according to SLA rules. So what we do is take all the customers' constraints, and we try to better organize the VMs according to what they do on a day-to-day basis. Of course, this can also be an energy consumption goal. You can say, OK, I want my cloud to reduce its energy consumption, and Watcher will be able to do it. You can also anticipate system overload and performance bottlenecks, because as we are analyzing the workload on the go, we can anticipate system overload. You can also easily plan maintenance operations, like: I want to stop this cluster and move all the VMs from one host to another during a short period of time. And finally, you can reduce the risk of human failure. One thing I should say is that with the current Nova implementation, you can do all these things. You can use aggregates. You can use cells. You can use regions. You can use server groups. All these things can be done, but we think that it's too manual. You have to change your aggregates all the time. You have to say, OK, this cell is, for example, the IO-intensive cell, and this one is more for CPU-intensive workloads, with flash storage, and all these things. So you have to do it very manually. The idea with Watcher, as a starting point, is to say: we can do it with an automation tool. A word about our technical assets, what we are good at: the profiling of VMs. It's very important to be able to classify a VM and to say, OK, this VM is running like this, so we will probably schedule it on this host. And the next time we have to schedule the same image, we will probably do it on this other host, because it's better for performance and so on. And we will also provide advanced algorithms to provide dynamic orchestration of multiple VMs. Actually, in Nova today, you can only schedule one VM at a time.
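The profiling idea above can be sketched very simply. This is a minimal toy with invented metric names and a made-up threshold; Watcher's actual profiling relies on machine learning, as described later in the talk:

```python
# Toy VM workload classification. Metric names and the 0.7 threshold
# are hypothetical; this only illustrates tagging a VM as CPU, memory,
# or IO intensive from averaged utilization ratios.

def classify(metrics, threshold=0.7):
    """metrics: dict of resource -> utilization ratio in [0, 1]."""
    profile = [res for res, util in metrics.items() if util >= threshold]
    return profile or ["balanced"]

print(classify({"cpu": 0.9, "memory": 0.3, "io": 0.2}))  # ['cpu']
print(classify({"cpu": 0.4, "memory": 0.5, "io": 0.3}))  # ['balanced']
```

Once each VM carries such a profile, an orchestrator can avoid packing several CPU-intensive VMs onto the same host.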
If you want to migrate some VMs, you have to do it one by one. With Watcher, you can give it a goal, and then it will move all your VMs in one go. OK, I'll let Christophe continue this presentation. Thank you, Antoine, for the first part of the presentation. I propose now to go deeper into Watcher. Today in OpenStack, there's a great component that does virtual machine placement: the Nova scheduler. It works fine. It is used in many production data centers all over the world. The Nova scheduler takes into account many constraints, and it is done with a pipeline of filters, the Nova filters, such as affinity, anti-affinity, host reservation for specific tenants, and many more. Now let's see how the Nova scheduler makes a simple placement, an initial placement, with affinity and anti-affinity. We have four nodes and eight virtual machines. Virtual machines one and four have an affinity. There is another affinity between virtual machines two, three, and five. Notice that six and eight must never be on the same compute node. And there are no constraints on VM seven, the purple one. So the Nova scheduler provides us something like that, which is fine, because it's the kind of thing we expect. Now let's do a maintenance on hosts B and C with the Nova scheduler. So the Nova scheduler has to move some virtual machines and put them on the remaining hosts. One thing that you can notice is that the Nova scheduler doesn't keep the affinity and anti-affinity between the virtual machines, because it does not save the constraints between the VMs. And that's detrimental to clients and to system operations. Now let's do the same maintenance, but with our module. Watcher keeps the affinity between the virtual machines. One more word about the Nova scheduler: it makes a mostly static placement, but the system is alive.
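The affinity and anti-affinity example above can be written down as a small constraint checker. This is a toy sketch whose VM and host names just mirror the slide; it is not Watcher or Nova code:

```python
# Toy checker for the slide's constraints: VMs 1 and 4 share a host,
# VMs 2, 3 and 5 share a host, and VMs 6 and 8 must be separated.
# Purely illustrative; not actual Watcher or Nova code.

AFFINITY = [{"vm1", "vm4"}, {"vm2", "vm3", "vm5"}]
ANTI_AFFINITY = [("vm6", "vm8")]

def valid(placement):
    """placement: dict vm -> host. True if every constraint holds."""
    ok_aff = all(len({placement[v] for v in group}) == 1
                 for group in AFFINITY)
    ok_anti = all(placement[a] != placement[b] for a, b in ANTI_AFFINITY)
    return ok_aff and ok_anti

# After evacuating hosts B and C, a constraint-preserving placement:
after = {"vm1": "A", "vm4": "A", "vm2": "D", "vm3": "D", "vm5": "D",
         "vm6": "A", "vm7": "A", "vm8": "D"}
print(valid(after))  # True
```

The point of the slide is that a maintenance move which ignores these saved constraints would make `valid` return False, which is what Watcher avoids.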
Virtual machines are changing all the time, in terms of number, in terms of CPU, in terms of memory, network, and disk usage. Without changing the cloud configuration, there is fragmentation, and the fragmentation causes system performance issues. It's like in your laptop: remember, you sometimes have to do a disk defragmentation. It's the same thing in an OpenStack cluster. We have to rearrange the virtual machines, or just move them. Constraints, that's a keyword for ops. On the one hand, we have the user constraints, described with a service level agreement. And if we do not conform to the SLA, we get penalties. On the other hand, we have the cloud provider's constraints. These are the goals. The cloud provider wants the system to work fine, but they also want to optimize costs. They want to lower the power consumption. They want to minimize the number of nodes used, and many more. And we also have the hardware constraints, which are the limited resources. And every day, ops have to find a good trade-off between the constraints and the goals. So we propose Watcher. To achieve this challenge, Watcher uses complex event processing, a time series database, machine learning, and optimization algorithms. Because we want to satisfy the trade-off, which is a complex task because of the dynamicity of the system, we use an adaptation control loop method. We have the monitor, which collects the topology, the metrics, and the events from the system. Then the decision engine makes complex data analysis, and if changes are needed, it provides a change request to the planner. The planner builds a workflow of actions and provides it to the applier. And the applier triggers OpenStack modules with the sequences of actions. And it's a loop, because it's an iterative process. In the middle, we have the knowledge. It collects and saves the relevant data, and shares them among all of the components.
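The loop just described can be sketched as a skeleton. The structure, function names, and the toy threshold policy below are invented for illustration; they are not Watcher's classes:

```python
# Skeleton of the adaptation control loop described in the talk:
# Monitor -> Decision engine -> Planner -> Applier, with a shared
# knowledge base in the middle. An assumed structure, not Watcher code.

def control_loop(monitor, decide, plan, apply_, knowledge, iterations=1):
    for _ in range(iterations):            # iterative process
        metrics = monitor()                # collect topology, metrics, events
        knowledge.append(metrics)          # shared knowledge base
        change_request = decide(knowledge) # complex data analysis
        if change_request:                 # act only if changes are needed
            actions = plan(change_request) # build a workflow of actions
            apply_(actions)                # trigger OpenStack modules

# Toy run: request a rebalance when the average load exceeds a threshold.
log = []
control_loop(
    monitor=lambda: {"avg_load": 0.95},
    decide=lambda k: "rebalance" if k[-1]["avg_load"] > 0.8 else None,
    plan=lambda req: [f"{req}: migrate vm1"],
    apply_=log.append,
    knowledge=[],
)
print(log)  # [['rebalance: migrate vm1']]
```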
Maybe some of you have noticed that it's a MAPE-K feedback loop. Now let's see each module, starting with the monitor. I said that the monitor collects data from the system, but there is a huge amount of metrics. It's not big data, but the amount is big enough. And the question is how to get and collect real-time information from the system. For that, we use complex event processing, because it can collect the data, it can analyze the events, it can make aggregations, it can also make some correlations and filtering of the data, and it may reorder them. And the relevant data, we store into a time series database. But we must pay attention to the veracity and to the quality of the data; otherwise, the next module, the decision engine, will provide wrong results. One thing also that the complex event processing can do is detect some specific events and create some kind of automatism to pilot OpenStack. Today, we collect information from Ceilometer and from two other modules: PDUs for energy consumption and a kernel module for disk usage. And they are processed by the complex event processing. The monitor can also learn from the past, and for that we are using machine learning tools. It learns from Watcher's previous actions. It learns from the workload of the virtual machines. And with this knowledge, it can compute new constraints. It can make some predictive metrics. And that's very useful, because we can prevent noisy neighbor issues, we can detect heavy communication between virtual machines, and many more. Let's see an example of how Watcher works with profiling, virtual machine profiling, because there is another kind of profiling, which is system profiling. So, Watcher with VM profiling. Our machine learning tool learns that there is heavy communication between VMs six and seven, and that there is a heavy CPU workload on VMs one, two, and five. And remember, we still have the affinity and anti-affinity constraints. Nodes B and C are back, and some virtual machines moved.
And it's what we expect, because we keep all the constraints. The decision engine, that's the hardest slide of the presentation. Optimizing a cloud configuration is a multi-objective problem. Optimizing a cloud configuration is an NP-hard problem. That means there is no unique solution that optimizes every objective at the same time, but we can find many optimal solutions. And for that, we use optimization algorithms, because they can handle multiple objectives. They are here to solve NP-hard problems in a limited time. They can adapt to changes dynamically. They can provide fast convergence to an optimal solution. And for that, we use numerous algorithms, which are exact algorithms, heuristics, and metaheuristics. So Watcher is here to simplify the ops' life. The admin makes an optimization request. The decision engine selects a strategy, and so a specific algorithm, which provides a solution. If changes are needed, then the action planner builds the appropriate workflow of actions. This is an action plan that we can give back to the admin. And then the administrator, if he wants, can apply it through the applier. And the applier triggers OpenStack modules. This can be done manually by the admin, or automatically. So, we saw the current state of things. We saw the adaptation control loop method, the MAPE-K, and the different modules: the monitor with the complex event processing, the time series database, and machine learning. We also saw the decision engine, the planner, and the applier. And now let's go for a demo. Antoine? OK, so we have time to do a quick demo now. You will be able to play this demo yourself if you clone the code from the Stackforge project next week. So I will go on this screen. That's good. OK, I hope that everyone can see it. We just sourced our creds. So OpenStack is already up and running, of course, with two compute nodes. And this is just to show you the algorithm we have in the conf file.
This demo is very simple, but it's just to show you that server consolidation can be done automatically. I'll just show you that all the services are running. There it is. We have an API, a decision engine, which we talked about earlier, and the applier, which is responsible for applying actions to all the OpenStack modules. So everything is running. That's good. OK, so we will just check how many servers we have. Yeah, OK, so actually we have five VMs running in our cloud. We will just ping every one of them, to check that everything is up and running. And then during this demo, we will start an audit. And we will see that some VMs will be moved from one host to another, to consolidate all the VMs on only one host, and then put the hypervisor state to disabled on the other machine. So here is the CLI from Watcher. We will create an audit template. OK, so we just call it "my first audit", and we set it to the server consolidation strategy. So it will be created, and then you can list it. You have all the CLI commands that we are used to in OpenStack. And then we will create an audit based on this template. And we want to do it only one time, so we say it's a one-shot audit. You can also say it's periodic, so you can ask Watcher to run this audit every hour or every two hours, to try to find a better strategy. So now we will see that this audit will be started, and we will get an action plan, a list of recommended actions. So here is the result of the audit. We have an action plan which says: OK, we should change the hypervisor state, then we should migrate one VM to another host, then another one, and then change the power state of the node. These are all the actions we have to do, and Watcher will automatically order them in the best way. So here is the detail of one action: we want to migrate from one host to another. So now let's see the VM list again.
So we have five VMs. We have two hosts, number five and number six. OK, so on both hosts, the hypervisor is enabled. So we can schedule VMs wherever we want on these two hosts. Now let's go back to the action plan, and we will start the action plan now. We just get the UUID, and then Watcher will try to apply all these actions on the different modules we have in OpenStack, so Nova, Neutron, and all these things. And now you see that the state is changing. The first one is done. The second one is done. And the third one is ongoing. So you have this feedback that all the actions are played seamlessly, and you have a state for each one. Of course, if any one of them has an error, it will roll back the whole action plan. So now we have executed this action plan. We will do the same thing, looking at where the VMs are located. They are all on host six now. And if you check the hypervisor state, we will see that host five has been deactivated. Here it is. And then, just to show you that it's not fake, we just ping all the VMs again. And now they are all on the same host, which is number six. So as you can see, it's a really basic example, and this is really a starting point of what we want to do. Now we want to get some feedback from the community, to see what we can do with this and how we can improve it. We need partners, of course. We need contributions. So this really must be seen as a starting point for something we can do in OpenStack to improve. Let's go back to the slides. You can find the wiki page on the openstack.org website, and the Stackforge repository will be available, I hope, next week, with all the code. And we will also, of course, welcome community feedback, especially from the Congress team, because Congress is working on the SLA part. And we really want to be in their discussions, to include all the SLA constraints, all the cloud user constraints we can, from Congress.
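The rollback behavior mentioned during the demo, where one failed action undoes the already-applied ones, can be sketched like this. It is a toy model with invented action names, not Watcher's applier implementation:

```python
# Toy action-plan runner: apply ordered actions and roll back the
# already-applied ones, in reverse order, if any step fails.
# Illustrative only; not Watcher's actual applier.

def run_action_plan(actions):
    """actions: list of (apply_fn, rollback_fn). Returns True on success."""
    done = []
    for apply_fn, rollback_fn in actions:
        try:
            apply_fn()
            done.append(rollback_fn)
        except Exception:
            for rb in reversed(done):   # undo in reverse order
                rb()
            return False
    return True

trace = []
plan = [
    (lambda: trace.append("disable hypervisor 5"),
     lambda: trace.append("enable hypervisor 5")),
    (lambda: 1 / 0,                      # simulated migration failure
     lambda: None),
]
print(run_action_plan(plan))  # False
print(trace)  # ['disable hypervisor 5', 'enable hypervisor 5']
```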
And as you saw, we also talked about energy efficiency. So we are really open to discussing energy efficiency, especially with Huawei, with whom I talked yesterday, and all their partners. If you are interested in this, maybe we can start something like a working group around energy efficiency. So please come back to us if you want to discuss it further. And I think that's it for us. So if you have any questions, please feel free to ask. No questions? Free time for everybody, then. I'm sorry? Yeah, so actually, the template is just an object that you create and that you can reuse all the time. For example, when you say, OK, I want a template for server consolidation, you will use it every time you want to do server consolidation. So I can show you all the things which are inside this object. Yeah, it's just an object made to be easy to use every day, because you can use it many times. So you want me to tell you more about the machine learning algorithm? Actually, we have experts in the team, so it's really hard to talk about it right now. We can discuss it further later. The idea behind machine learning is really to better profile VMs. So the idea is to run a VM, then profile it, and then save that profile for the next time you will have to run it on your cloud. The idea is that with machine learning, we can find the best place to run this VM on our cloud. So this is really the important part. OK, so I think that's it. Thank you very much for your time. Enjoy the summit.