Well, good morning everybody. Thanks for being out here with us today. We're on day three of the summit, so the nights tend to get a little longer and the mornings a little more difficult, so we're happy everybody could make it out here. We have a big room to fill, so there are plenty of seats. We're excited to be here today to talk about Watcher. I'm Joe Cropper from IBM; this is Susanne Balle from Intel, and Jean-Émile from b<>com. We want to talk about the resource optimization service for OpenStack. We'll give you a rundown of the key features, goals, and initiatives of Watcher, where we've been from a historical standpoint, what we accomplished in Newton, and where we see the project going in Ocata. Also shown here is the Watcher project logo, the jellyfish. For those of you wondering what a jellyfish has to do with resource optimization: the jellyfish is actually the most energy-efficient swimmer of all animals, so we thought that was a good pairing with data center optimization. If you want to read more about the anatomy behind that, Wikipedia has a lot of good information; I don't pretend to understand all of it. So what is Watcher? Watcher is a flexible resource optimization service for OpenStack clouds. OpenStack already does a good job with initial placement: when you're getting ready to place a virtual machine in the cloud, the Nova scheduler handles that. What happens over time, though, is that clouds tend to become imbalanced. You put workloads out there initially, you've got hundreds of compute nodes, and over time resource utilization can differ greatly.
Workloads spin up and become very busy while other hosts may sit idle. So how can we better optimize our environment? Depending on whether you want to balance on CPU utilization, memory utilization, or energy awareness, there are many different optimization strategies you could apply. There are many commercial offerings that do this; I'm sure many folks are familiar with VMware DRS, and we're looking at bringing similar technology into open-source ecosystems like OpenStack. In addition to providing several out-of-box optimization strategies, one of the things we want to accomplish is a flexible framework. We realize that different cloud providers, whether you're running a private cloud or a public cloud, will probably want to optimize their infrastructure in different ways. So one of the goals with Watcher is a framework flexible enough that you can plug in your own custom optimization scheme. If you want to keep some things proprietary, that's perfectly fine; you can still use Watcher as the overall optimization framework. It can also integrate with external engines. Susanne will talk about this a little later, but there are nice plug points so that from your optimization routine you can call out to what we call scoring modules, through clean interfaces that tie into those systems. So think of Watcher as out-of-box value, with several optimization schemes you can use, but flexible and powerful enough that you can fit your own routines in as well. One of the other things we're very excited about with Watcher is that we have contributors from all over the globe.
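To make the "plug in your own scheme" idea concrete, here is a minimal Python sketch of what a custom strategy plugin could look like. The `Strategy` base class, the `execute` method, and the dictionary-based cluster state are simplified stand-ins for illustration, not Watcher's actual plugin API.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Base class every custom optimization strategy implements."""

    @abstractmethod
    def execute(self, cluster_state):
        """Return a list of proposed actions for the given cluster state."""


class CpuBalanceStrategy(Strategy):
    """Toy strategy: migrate one VM off any host above a CPU threshold."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def execute(self, cluster_state):
        # cluster_state: {host: {"cpu": float, "vms": [vm_id, ...]}}
        actions = []
        idle = min(cluster_state, key=lambda h: cluster_state[h]["cpu"])
        for host, info in cluster_state.items():
            if host != idle and info["cpu"] > self.threshold and info["vms"]:
                actions.append({"action": "migrate",
                                "vm": info["vms"][0],
                                "src": host, "dst": idle})
        return actions
```

A real strategy would read state from Watcher's cluster data model and emit Watcher action objects rather than plain dictionaries, but the shape of the contract is the same: state in, proposed actions out.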
In fact, I was excited to learn about Watcher from the folks at b<>com back at the Vancouver Summit. It was started by their team, based out of France, and they did a lot of great work. IBM and Intel then joined to help, and there are many other companies involved: Walmart has since joined, as have Servionica, ZTE, and the other folks up on the screen. It's really a global team, and that's great; we're getting interesting perspectives from all over the place. The team is very diverse, with a lot of new ideas from many different backgrounds, and we're always welcoming more people. We're out on IRC and would be happy to have additional folks come join us. So, a little bit about the key features, which we've already touched on. One of the out-of-box optimization approaches we provide uses VM live migration. For instance, as a host becomes heavily utilized, I have a whole bunch of workloads on host A and need to get them over to host B; how do I solve that problem? I can invoke VM live migration. Most hypervisors in the OpenStack ecosystem support this operation, and we can invoke it to try to rebalance the cloud. Again, it's a very flexible infrastructure, so there are other sorts of operations you could plug in; VM migration is just one example. You could also power-cycle compute nodes via Ironic, for example, or eventually do right-sizing of a virtual machine, shrinking it down or growing it up. There are many different options, depending on what your overall optimization goals are.
And again, Watcher provides a flexible framework to tie those in. There are also two modes in which Watcher can run, which is very useful. The first is what we call one-time execution mode, sometimes called one-shot execution: as a cloud administrator, you run a single optimization loop on demand. There's also a continuous mode, where your audits run continuously in the background, fully asynchronously. Think of it like a disk defragmentation process: it's always running silently, you don't really notice it, it's just happening in the background. We expect most users would end up going down that latter path; that's the way many of the commercial technologies work. But again, you have flexibility in how you run Watcher. So how does Watcher tie into the rest of the OpenStack ecosystem? That's one of the key things we want to make sure people understand. Think of it like a hub-and-spoke model with Watcher in the center, leveraging the other projects and the services they provide. A perfect example: when we want to optimize the data center and need to invoke VM live migration, Watcher doesn't implement that itself; it delegates the work to Nova, which has already tackled that problem. For gathering metrics, there are Ceilometer, Monasca, and other projects already doing that; Watcher is not a metrics collector, and we delegate that work to those projects. Keystone, of course, handles all the authentication.
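The two execution modes could be sketched roughly like this; the names and loop shape are illustrative placeholders, not Watcher's internal implementation.

```python
import enum
import threading


class AuditType(enum.Enum):
    ONESHOT = "oneshot"        # one optimization pass, run on demand
    CONTINUOUS = "continuous"  # keeps re-running in the background


def run_audit(audit_type, optimize_once, interval=60.0, stop_event=None):
    """Run a single pass, or loop until stop_event is set (continuous mode)."""
    if audit_type is AuditType.ONESHOT:
        return optimize_once()
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        optimize_once()           # like a defrag pass, silent and periodic
        stop_event.wait(interval)
```

The continuous variant is what the "defragmentation" analogy describes: the same optimization pass, repeated on an interval without operator involvement.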
We have Oslo for the common libraries and routines. So depending on how you want to optimize, you can call out to any of these other projects; if you wanted to start doing bare-metal operations, you would tie into Ironic. That's the relationship between Watcher and the rest of the OpenStack ecosystem: we leverage those services to optimize the environment in whatever way makes sense for your data centers. And at the end of the day, this enables new ways to reduce your TCO, however you define it. Some people are looking for the most energy-efficient data center; that may be one way to reduce TCO. Other folks want to get the most work done; maybe your TCO is defined by memory utilization. There are many different goals you may be trying to achieve, and as dynamic clouds and resources change over time, the idea of Watcher is that every time an optimization loop runs, things are in better condition than they were before. Okay? So this slide talks about the overall optimization loop. If you take a step back from the code for a minute, this is the high-level process we're trying to cover with Watcher. I'd like to start at the bottom of the loop, with the monitoring component. This is the phase where all the metrics are collected into the system. Again, this happens asynchronously, as I mentioned, with data coming in from Ceilometer or Monasca; you could plug in other metrics systems as well. The next phase in the optimization loop is the analysis phase. This is a fairly theoretical picture, and it's a part of the loop we need to keep working on in Watcher.
But in theory, one of the things that happens here is data aggregation. You look at the information coming in from the metrics systems and start aggregating it; maybe you have very granular metrics that you want to summarize into a time-series format to reduce the overall data volume. That aggregation occurs in the analyze phase. This is also where you could start tying into other analytic systems; Susanne will touch on this, and we did a lot of work on this piece in the Newton release. You could hook into another system, profile the data, and identify trends or predictions: looking at all these metrics, what might the past tell me about the future? All of that feeds into the optimization phase. At this point, I understand the landscape of my data center and want to optimize it. The input to the optimizer is a number of pieces of information, such as constraints (maybe certain virtual machines need to be co-located or anti-co-located) and the overall objective: do I want to pack a certain host full of VMs, or redistribute VMs based on CPU or memory utilization, or some combination thereof? The optimizer ultimately decides what steps need to be taken to improve the data center. That feeds into the planning phase, where the planner figures out, for instance, which migrations need to happen. The planner puts them together; think of it like a graph, where certain operations, certain migrations, can occur in parallel and some need to run serially.
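As a toy illustration of that parallel-versus-serial ordering, the sketch below groups migrations into stages so that no two migrations in the same stage touch the same host. This is a deliberately simplified scheduling rule, not Watcher's actual planner logic.

```python
def plan_stages(migrations):
    """Group migrations into stages: migrations in the same stage share no
    hosts, so they can run in parallel; stages themselves run serially."""
    stages = []
    for mig in migrations:
        placed = False
        for stage in stages:
            # Hosts already busy with a migration in this stage.
            busy = {h for m in stage for h in (m["src"], m["dst"])}
            if mig["src"] not in busy and mig["dst"] not in busy:
                stage.append(mig)
                placed = True
                break
        if not placed:
            stages.append([mig])   # conflicts everywhere: open a new stage
    return stages
```

So two migrations between disjoint host pairs land in one stage, while a migration reusing a busy host is pushed to a later, serial stage.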
So it's putting together the master plan that takes you from state A to state B, state B being better or more efficient than the initial state. The last piece is the applier. The plan has been put together; now it needs to be executed and turned into reality. Again, think of this as a loop that runs continuously, and the idea is that every time it runs, things are more efficient than they were previously. It's very important to realize that running this once doesn't mean the result is perfect and will remain perfect; it's a continuous set of steps that we expect to keep happening. This next chart talks about the overall architecture of Watcher, and there are some things in common with other OpenStack services. For instance, there's a Watcher API process, and we leverage the OpenStack message bus, very similar to other projects. I put some numbers on the chart to help follow it. In the lower left-hand corner is where metrics collection occurs: as I mentioned, Ceilometer or Monasca can be used, mutually exclusively, one or the other. That information feeds into a cluster data model, an in-memory representation added in this release, which can be updated and which the optimization process can query for the current state of the system. Step two, really the second component, is where the administrator interacts with Watcher. As an admin, I have a CLI; in the upper left-hand corner you can see it also ties into the OpenStack CLI, so `openstack optimize` calls into the Watcher CLI.
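Stepping back for a moment, the full loop described on the previous chart, monitor, analyze, optimize, plan, apply, can be condensed into a few lines of illustrative Python. Every function here is a hypothetical placeholder standing in for a whole Watcher phase.

```python
def optimization_loop(monitor, analyze, optimize, plan, apply_plan):
    """One pass of the loop: each phase feeds the next."""
    metrics = monitor()         # collect raw metrics (e.g. from Ceilometer)
    model = analyze(metrics)    # aggregate/profile into a cluster model
    actions = optimize(model)   # decide which actions would improve things
    ordered = plan(actions)     # order actions (parallel vs. serial)
    return apply_plan(ordered)  # execute, e.g. by delegating to Nova
```

In continuous mode, this pass would simply be repeated, with the expectation that each run leaves the cloud no worse than the last.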
I can run my audits and optimization processes there, and there's also a Horizon plugin. All of these flow into the Watcher API, in the mid-center, when you make a request. The Watcher API can then delegate the work to the other components. Number three is the overall optimizer for Watcher; this is where the planner we talked about on the previous optimization loop lives, and it's the component actually making the decisions about what steps need to be taken to optimize the data center. Number four: it leverages the cluster data model to figure out what the state of the cloud looks like, which VMs are on which hosts, and so forth, and uses all that information to make its decisions. Number five is the Watcher action applier, the component that takes all that information and applies it, and it leverages the components in box six, in the upper right-hand corner. That's where you call out to the other services in the ecosystem: for migrations you leverage Nova, and eventually we want to tie into Neutron and Cinder. The dotted lines represent future sets of actions we would like to take; there are a number of optimizations we can do there longer term, for instance around which volumes are attached to which host. So that's how it all fits together. With the next chart I'd like to turn it over to Susanne to talk about the Watcher history and the roadmap: what we accomplished in Newton and where we see ourselves going in Ocata. Susanne? Thank you. I'm quickly going to go over, as Joe said, the Watcher history and where we're going.
The first time I heard about Watcher was at the Vancouver Summit, where b<>com presented it, and as Joe mentioned earlier, IBM and Intel were very interested in this project. In September 2015 we had a formal meeting over at b<>com, and that's when we kicked off the initiative. At that point we were really focusing, at least from an Intel perspective, on bringing in a small set of what I would call strategies; even though they were simple, they still allowed us to test the infrastructure and make sure it could handle the kinds of optimization we were looking for. Now, in October 2016, the infrastructure is in a very good state, so one of the things we're looking for is people contributing more strategies, so that Watcher becomes more and more useful to the community. One big thing that happened since we last met in Austin is that we're now an official project: on May 31st we became part of the OpenStack Big Tent. Yes. Thanks, everybody, and thank you to everyone who participated in making that happen. Over the next ten minutes I'm going to talk about all the great work we did in the Newton release and how far we've come since we became part of the Big Tent, and Jean-Émile is going to talk about our plans for the Ocata release. If we look at the accomplishments of the Newton cycle, one thing that was of a lot of interest, at least for Intel, was the Watcher scoring module.
The way it is today, we want to achieve certain goals, and we've implemented strategies as plugins within Watcher; but a lot of data scientists work with a different set of tools that are part of an external framework. The idea here was to integrate with external scoring engines so that we can take those into account when we execute strategies and try to achieve goals; I'll go over this in a couple more slides to give you a better visual idea of how it works. The other thing we accomplished is adding the active mode. As Joe mentioned earlier, up until Newton the admin would manually trigger an audit, get an action plan, and then carry it out by hand. Now we have this notion of a continuous audit: the cloud can be audited continuously while its action plans are automatically carried out. I think it's important that we keep both modes, because not everybody is going to want all actions carried out automatically, and looking at it from an internal perspective, our admins need to trust the system before they'll let it do automatic migrations. Eventually, the way I envision this, for the easy, no-brainer tasks people will say: yes, please carry those out automatically without my approval; but before moving one of your bigger applications around, you're going to want to make sure it really pays off. So I see both modes being very valuable. Getting back to the value of the action plan, or of your optimization: one of the things added to Watcher this time is the notion of an efficacy indicator. What that means is that when you write a strategy, the strategy writer can express what type of efficiency a given plan is going to give you. So when you run an audit and get an action plan back, you can actually see what the actions are and what efficiency you'll gain by applying the plan. That motivates the admin to carry out an action plan; if you don't know what your optimization gives you, there's really no reason to think you're going to get anything out of it. I was just talking to some members of the Watcher team: we've been running scalability tests on an Intel cluster with about 25 compute nodes, and as we ran them we ended up creating 177 VMs and migrating them around so that we suddenly got 23 out of the 24 nodes free, with an efficacy indicator showing that this was worth doing. One of the other things we've been focusing on, as I was telling you, is more value-added optimization strategies, and I have a slide listing some of the ones we added for Newton. Among the core features we added to Watcher that might be less obvious to the user, but good in terms of efficiency and performance, is the local cluster state model: where before the model was created anew for each audit, it's now held in memory and updated from the relevant services as needed. And the last thing is scalability testing. Thirty nodes doesn't sound like much, but there's a big difference between running on 5 and on 30, so it's one step; through the Open Compute initiative we can get a larger cluster, and we've actually submitted a request for an even larger one, because a lot of
us are trying to be ready to run Watcher in production, so we do want to make sure it scales to a decent size. These are the strategies we currently have in Watcher, and again, this is very pluggable: anybody who wants to contribute their own strategy is more than welcome to do so. And as Jean-Émile mentioned yesterday, not all strategies have to be open-sourced; we have customers writing strategies and keeping them for themselves. We just think it's important that Watcher has a base of strategies that lets you get value out of it pretty much out of the box. As you can see, many companies have contributed strategies, ranging from thresholding on server outlet temperature, to VM migration, to basic consolidation, to workload stabilization around VMs. I won't go over each of them; you can read them, and they're also in the blueprints. One of the interesting and, I think, powerful things about Watcher is that I really look at it as an orchestrator with a data analytics engine. As we move forward, we're also looking at maybe having more of a data analytics stack in there, so that we can do clustering and classification right there, in real time, around the fingerprinting of workloads, so that we can do better placement and also measure contention between workloads. Another thing I personally find very cool, and wanted to highlight here, is that we have people on the Watcher team writing blogs about how their contributions are making a change in their environments. This is, I think, the first blog I've seen from somebody who has actually been using OpenStack Watcher with their specific plugin, showing that in this case they got a more efficient deployment. And this is also, for
all the Watcher participants, great for us, because the more we talk about Watcher, the more people learn about it and become aware of our existence. So, going back to the Watcher scoring module. One of the things we were talking about earlier, and maybe I'll go to this slide first and come back: if you look at where Watcher fits in the ecosystem, it's really optimizing an OpenStack cloud based on strategies which, up until today, were implemented as part of Watcher itself. With the Trusted Analytics Platform, and this is just one example of an implementation, it could be a different external service, the scoring engine can be published into the Watcher scoring module and then used just like any other strategy. What we created here is a nice tie-in with external analytics platforms through well-defined APIs. The reason we thought this was a very interesting use case is that we wanted the data scientist to not have to touch the cloud, and we wanted the admin of the OpenStack cloud to not have to understand what a plugin is doing, or what needs to be done in order to integrate one. The other thing this gives us is really about keeping the data scientist working with the tools he or she is good at, and letting the administrator feel safe that nobody is actually touching or messing with the cloud. It also allows the model to be trained and to learn outside the OpenStack ecosystem and, in close to real time, publish the scoring engine and take action. Going back to this: as I said, the scoring engine is really what we would call a generic machine-learning service, and it has a standard API to interact with any external analytics
service; in the case of TAP, which is open source as well and is an effort led by Intel, hopefully more people are going to start contributing; we picked it as our first example simply because we had experience with both TAP and the Watcher piece. And with that, I will pass it on to Jean-Émile. So, what are the plans for the Ocata release? First of all, we want to work on the audit scope. Right now an audit is done on the whole infrastructure, and sometimes, depending on what you want to optimize, it's better to scope the optimization: for example, we would like to be able to optimize just an availability zone or a host aggregate, and sometimes we want to focus only on storage or only on network. The idea behind the audit scope is to be able to scope the audit to a subset of the resources available in the OpenStack cloud. The other thing we're looking at is improving action plan storage in Watcher. In the Watcher applier we use a workflow engine called TaskFlow, and we want to leverage TaskFlow's concurrent actions in order to migrate several virtual machines at a time: the Watcher planner can already schedule several actions at a time, but we cannot store that information in the current database, and we want to be able to, for performance reasons, for safety reasons, and so on. Another thing we want to work on is graph-based cluster models. Right now we use a flat structure for the infrastructure, which means we cannot, for example, efficiently model network topology, and sometimes we want to work on optimizations involving the links between services; the idea behind the graph-based cluster model is to be able to model network topology and so on, according to what we want to optimize. Another thing we want to do in the Ocata release is add workload characterization; quality of service is something that
matters a lot for us and for many of the customers we have. One way to understand how the virtual machines are using the resources is workload fingerprinting, or workload characterization; knowing how the virtual machines use resources allows us, for example, to proactively load-balance them, and we can think of many other use cases. We also want to use notifications in Watcher. Right now, when we perform an audit, the other OpenStack components are not aware of it; for example, other services may want to be aware that we are applying an action plan to the system, and Nova may want to know that we are doing certain things. So we want to let the other OpenStack components react to these events in an event-driven fashion. As we've said many times, we want to add many new strategies. One of the key features of Watcher is its pluggable architecture, which means that if you have an optimization problem, we think Watcher is the best place to solve it; we try to provide everything you need so you can just focus on your optimization problem. So if you have an optimization problem and you want to work with us, we will be really happy to help you. These are the strategies we plan to add in the Ocata release. First of all, Walmart and Intel want to work on using fingerprinting in order to guarantee service-level objectives. The idea behind that is that some of the workloads running in the infrastructure are critical for the business and some are less important; we can imagine, for example, continuous integration jobs that are not critical, which we can kill in order to guarantee the service of critical systems. Another thing we want to add in the Ocata release, working with ZTE, is an elastic resizing strategy: sometimes the flavor chosen initially is not a good choice, and we want to be able to add, for example, some vCPUs or RAM to guarantee the service. So, thank you very much. We have a
wiki with a lot of documentation; for example, it explains very well how to write your own strategies. We have an IRC meeting every Wednesday, and if you want to see Watcher in action, we have a video online. Thank you. Is there a mic? If you ask your question, we will repeat it. Yeah, we will repeat it, that's fine. I can take that one. So I think the question was: are there other projects doing things like self-healing networks and those sorts of operations, and how does Watcher tie into them? That's a really interesting question, because the whole idea with Watcher is providing this generic framework, and this is my take, others can feel free to add to it, but it provides a nice framework for optimization. The types of optimizations we're doing right now are mostly compute-centric, so we're starting with, for instance, CPU utilization and balancing, or packing workloads onto a host, or doing round-robin sorts of deployments. Eventually, if you remember one of the architectural pictures I showed, you could tie Watcher into Neutron or Cinder or some of these other things, so I see it as providing a much broader scope, whereas some of those other projects may focus on very specific things that are relevant only to their own project. If you look at your overall data center, there could be a number of different ways you're trying to optimize it, and again, I see Watcher as a nice utility or service you can start running that has these generic out-of-box routines, where others can then plug in their own implementations. So I guess the short answer to the question is that the Watcher system is very broadly defined, whereas some of the others are maybe a little more specific. I don't know, Jean-Émile,
anybody want to add to that? So, we are coming at Watcher from the workload optimization side: for example, fingerprinting an application, with its phases and trends, and being able to predict, as I launch my next app, whether there is going to be contention with existing apps. That's really where I see Watcher. I wouldn't say that self-healing networks is anywhere in that category; self-healing networks is, to me, what I would call a niche, a smaller piece. I'm really looking at how to increase the utilization of my overall cloud, from, say, 20% server CPU utilization to 50% or more, towards the same type of utilization that large service providers are getting, just because they have a much bigger mix of workloads than our customers, in my opinion. Many other projects are rule-based, whereas Watcher is more about analytic models. For example, you have an application; we learn the baseline of the application, how it usually works, and once we've learned that, we can say: okay, compared to the history, this machine is not showing the same behavior. Many other projects work like: if this virtual machine is above this threshold, do that; we have more learning-oriented ways of doing things, right, within the overall framework. Any other questions? Yes. I think the question was about running Watcher with older versions of OpenStack, or what level it supports. I think it's Mitaka; we really went into heavy development back in Mitaka, and obviously now with Newton, we would really recommend people start with the current release. In the Watcher configuration file you can specify the current version of Keystone, so you can, yeah, sure. Sorry, there's a question just there: can this be used for initial VM placement? That's a great question: can Watcher be used for the initial placement of the VM? And the short answer is really no; in
fact, that's what the Nova scheduler does when you initially place a VM, and that would continue to be the case in the future. Down the road, what we want to use Watcher for is that post-deployment optimization process. If you look at the workloads in the cloud, you're going to have long-running jobs and short-lived jobs, so even if it looks like you have the optimal initial placement, over time you suddenly end up with a lot of fragmentation. We're focusing on that fragmentation, making sure we optimize the cloud as it's running, given that we feel that's something that's currently missing, and we need time to learn how the virtual machine is running anyway. So, anything else? Thank you so much; join us on IRC if you're interested. Appreciate it.