Hello, and a very good morning — welcome to this Healthnmon session. This is the first session we are having on Healthnmon, and we have some follow-up discussions in the unconference as well, today at 5 o'clock and tomorrow. Let me introduce myself first. I'm Divakar, one of the architects working at HP, and I've been working on the cloud software that we have built at HP. Today I'm going to take you through the journey we have had with Healthnmon. Healthnmon is a cloud monitoring software which we have built on top of OpenStack. We'll see more about what Healthnmon is; we are interested in providing a monitoring solution for the OpenStack cloud, and we have implemented some of the blueprints that are out there in the OpenStack blueprints, which we'll walk you through. So, when we start with the OpenStack cloud, you see the OpenStack cloud with network, compute, and storage, and you see the different personas here: a cloud admin, a tenant admin, and a cloud user. The way these three personas see the data is going to be different, depending on what he or she is looking for. Here we have this Healthnmon module, which sits on top of OpenStack and looks at the OpenStack cloud deployment. It provides the cloud inventory — it learns what this OpenStack cloud is deployed on and what the hypervisor behind the scenes is — and it provides that cloud inventory data. It also looks at the cloud resources and finds their utilization data. At the same time, it provides alerts and notifications, and data for thresholding based on the alerts, notifications, and utilization data.
So, mainly we want this Healthnmon solution to be the main monitoring solution for any of the OpenStack cloud resources you will see in the cloud, and also for any of the components you will see deployed in OpenStack. As part of this, there are many functional areas we want to cover, and we want to provide a common architecture with which you can monitor different components and cloud resources as well as the cloud deployments you will see. We are looking at a pluggable architecture, similar to what we have seen with other OpenStack components like Nova, where you have different pluggable drivers for different implementations. Similarly, we have a pluggable architecture defined for Healthnmon, where you can plug in drivers for the different hypervisors that you will build your OpenStack cloud on. We have defined a standard resource model and a persistence model for Healthnmon — a resource model which will work for most of the cloud resources that you will see in a cloud today. There are different choice points here with the resource model: whether we want to go with a generic model or a specific model. Right now we have defined one resource model, and it is a debatable point whether we want to go with a generic model to persist your data, or with a specific model defined for each and every resource you will see, with persistence for the same. We will see more on that in the coming slides. We also want to provide cloud resource lifecycle events: say, if you start or deploy a new instance, when that instance gets created you will get an alert and an event for the same.
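The generic-versus-specific choice point for the resource model can be sketched in a few lines. This is purely illustrative — the class names and fields here are assumptions, not the actual Healthnmon model:

```python
# Illustrative sketch (not the actual Healthnmon code) of the trade-off
# between a generic and a resource-specific persistence model.

class GenericResource:
    """Generic model: one class, arbitrary properties in a dict.

    Works for any resource type without schema changes, at the cost of
    weaker typing and validation."""
    def __init__(self, resource_id, resource_type, **properties):
        self.resource_id = resource_id
        self.resource_type = resource_type
        self.properties = dict(properties)

class VmHost:
    """Specific model: one class per resource type, explicit fields."""
    def __init__(self, resource_id, cpu_mhz, memory_mb):
        self.resource_id = resource_id
        self.cpu_mhz = cpu_mhz
        self.memory_mb = memory_mb

# The same KVM host expressed both ways:
generic = GenericResource("host-1", "VmHost", cpu_mhz=2400, memory_mb=16384)
specific = VmHost("host-1", cpu_mhz=2400, memory_mb=16384)
```

The generic form absorbs new hypervisor-specific properties without schema churn; the specific form lets the persistence layer validate and index each field.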
And you will have similar events provided for all the monitored services — it can be compute, storage, or network. There is an event collection framework we have defined within the Healthnmon module which provides the event notifications for the different services. As you know, Nova has REST APIs which provide the data for cloud resources today, and along with the resource extensions it provides an extension model for the Nova APIs. We have used that extension-based model to provide the Healthnmon APIs as an extension of the Nova API itself, so that once you deploy the Healthnmon solution, you should be able to access the Healthnmon REST APIs through the Nova API. We are defining two more components here: the data provider and the proxy drivers. With the data provider, we are mainly looking at providing the data to any consumer — it can be a pull model or a push model. If somebody wants a push model they can go with a push model, and somebody interested in a pull model should be able to do that as well. Right now we are defining this as an architecture; we don't have the data provider as of yet, but we are going to define it. As for the proxy driver — we will talk more on this one — it mainly drives the implementation for the different hypervisors, and how we plug in the health and monitoring implementations behind the scenes. So, when we look at the different cloud personas, let's look at the data requirements of a cloud user. Here is a cloud user, and you have a cloud where the user has deployed his services. He will use the deployed services, and for those services he will pay the bill. So, what are the key use cases we are looking at here?
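The pluggable-driver contract just described can be sketched as an abstract interface that each hypervisor driver fills in. The names below are hypothetical, not the real Healthnmon API:

```python
import abc

class HealthnmonDriver(abc.ABC):
    """Hypothetical sketch of the pluggable driver contract: each
    hypervisor (KVM, ESX, Hyper-V) supplies one driver that returns
    inventory, utilization, and alert data in the common resource model."""

    @abc.abstractmethod
    def collect_inventory(self):
        """Return a list of discovered resources."""

    @abc.abstractmethod
    def collect_utilization(self, resource_id):
        """Return utilization samples for one resource."""

class FakeKvmDriver(HealthnmonDriver):
    """Toy stand-in for a real libvirt-backed driver."""
    def collect_inventory(self):
        return [{"id": "host-1", "type": "VmHost"}]

    def collect_utilization(self, resource_id):
        return {"resource": resource_id, "cpu_percent": 12.5}

driver = FakeKvmDriver()
```

A pull-model data provider would call these methods on demand; a push model would have the driver emit the same payloads to a queue as they are produced.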
Okay: as an end user, I am interested in knowing the current metrics of the services I have deployed, in terms of memory utilization, disk utilization, IO, and network, because accordingly I am going to pay for the same. I might be interested in a history of utilization data and similar statistics, and it is important to have analytics on that data for the end user's deployed services. Likewise, there are a lot of use cases which will drive the data that is acquired from the cloud resources. If we jump to the tenant admin: the tenant admin has a cloud and has access to the quotas, the available capacity, the images, and the different flavors. He can create the users for his tenant, and the tenants can deploy their services. Similar to the data a cloud user would have seen, the tenant admin looks for the same kind of data, but in a different context. Now, if you take a cloud admin's perspective — a logical view of the cloud — the cloud admin has access to all the services deployed in the OpenStack cloud, so you will see the different components here. If you run a nova-manage service list today, that is basically going to give you the different services that you will see as part of the cloud. The cloud admin will have a view of the cloud resources behind it — the servers, network, and storage — and the cloud admin can create different tenants. As part of this, you will bring up a cloud where people can subscribe to the services and deploy their services. So, as part of Healthnmon you will see this data. Today, as part of compute, what is the data that compute publishes? In general, the inventory — everybody is familiar with that — is the compute and the instances.
And you have the storage where you will deploy your instances, which is defined by an instances path, and you have a compute network. These are the generic constructs that you will see as part of the compute, network, and storage within OpenStack, and against which you will correlate the related usage data as well as the alerts and notifications data. So, compare this with a virtualized view: let's say I built my OpenStack cloud using KVM — how is this data going to differ? You have seen the compute and instances, and if, as part of Healthnmon, you look at a KVM cloud, then as part of the inventory you will see a compute. If you want to drill down into what a compute consists of, you will see the cluster, and it can be a VM host. Today we have seen that compute is no longer tied down to one particular host, and different models are being defined — I think there are going to be blueprint talks in the Havana summit where one compute service can manage multiple compute nodes: it can be a cluster, a group of clusters, a resource pool, a group of resource pools, or a combination of clusters and resource pools together. As part of that, it is important to know what constitutes a compute today, so you may have to drill down between the cluster and the VM host. Similarly, an instance in KVM terms is a VM, right? The instances path is basically a storage pool in KVM, and for the network you will have a virtual switch and virtual ports. In correlation with all the resources that you collect as inventory, you will need to collect the usage data for them, and you may also want to get the alerts and notifications and arrive at thresholds based on the data you collect.
So, when I look at the same inventory data for ESX, you can see that, as I told you, a compute can be a cluster, a resource pool, or a VM host. An instance is again a VM here. For the instances path — in the KVM case we saw that it is a storage pool — depending on what hypervisor you are using to build this cloud, you may want to know the different resources that are behind the scenes, and there are different properties attached to the individual resources when you realize it. In the compute network you will have a virtual switch — it can be a DV switch — and there are different port groups available, and for each you will again collect similar data. Drilling down to Hyper-V: again you have the VM cluster and VM host, the instance is a VM, the instances path here is a disk volume, and the compute network is a virtual switch and switch port. So, as you see, the data differs between the different implementations here. The logical concept today is the compute, network, and storage that you see as one entity; if you want to drill down into each of these implementations and make use of the particular data the hypervisor provides, it is important to have insight into the hypervisor-provided data. Then there is the usage and the alerts and notifications corresponding to the inventory you collect, and this is what we provide as part of Healthnmon, where we provide the required drivers for KVM, ESX, and Hyper-V. Similarly, when I get into the cloud infrastructure: here, a cloud admin will see what the cloud infrastructure is built on.
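The per-hypervisor naming just walked through can be captured as a small lookup table. The values come from the talk itself, except the ESX instances-path entry, which the talk does not name and is my assumption:

```python
# How the generic OpenStack constructs map onto hypervisor-specific
# resources, per the talk. Purely illustrative.
RESOURCE_NAMES = {
    "kvm": {
        "compute": "VM host (or cluster)",
        "instance": "VM",
        "instances_path": "storage pool",
        "network": "virtual switch",
    },
    "esx": {
        "compute": "cluster / resource pool / VM host",
        "instance": "VM",
        "instances_path": "datastore",  # assumption: not named in the talk
        "network": "DV switch / port group",
    },
    "hyperv": {
        "compute": "VM cluster / VM host",
        "instance": "VM",
        "instances_path": "disk volume",
        "network": "virtual switch / switch port",
    },
}
```

A common resource model has to accommodate all three columns of this table, which is exactly the generic-versus-specific tension mentioned earlier.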
What enclosure you have, what server pool, the physical servers and virtual machines, what disk — whether you have used an FC LUN or an iSCSI LUN — what switch and port it is connected to: there are a lot of physical hardware details available, and a lot of opportunities here to provide this kind of data in the inventory. So, if you look at the cloud inventory manager use cases for a cloud admin, these are some of the bigger use cases available. You may want to provide a catalog of the available resources, which can be used in a scheduler to make intelligent decisions based on the inventory data. You may want a view of the cloud resources along with their usage. Another special case we are bringing out here: today we can manage what is available as part of the OpenStack resources, but if there are instances deployed outside, we will not have a view of them. Tomorrow there might be use cases where we say, okay, there are already-deployed instances available and I want to bring them into OpenStack; if you want to do that, you should be able to do it with the inventory that we collect. For ongoing management of the infrastructure services, the metadata you collect may be required for autonomic analytics use cases — you may be able to make better provisioning judgments about where to deploy your new instances, and you can make the scheduler logic better using the data gathered as part of Healthnmon. Again, this is more related to the autonomic data you collect, where you can make decisions to say, okay, when this particular threshold is crossed, what action do you want to take — such things can be automated based on the data you collect.
So, likewise, you have different alerts and notification use cases here for a cloud admin. For the utilization data: you may want to collect the utilization data for each of the cloud resources for analytics purposes, to get better scheduler logic, and for better capacity planning — if I collect utilization data over a period of time, I may be able to identify the different cloud workloads I am running, whether I can optimize some of the deployed workloads, and how many more instances I might need depending on the workload. You may also be able to derive cloud-based thresholds from the usage data we collect. So, where are we currently? We have this Healthnmon module today, which is available as open source under StackForge — you should be able to check out the code by going to github.com/stackforge/healthnmon. We had three blueprints — Inventory Manager, Alerts and Notifications, and Utilization — and we have those implemented for KVM today. We have the build scripts available for the RPM, which you can deploy on Fedora or CentOS, and shortly we will have build scripts available for Ubuntu as well. You should be able to deploy it anywhere in your OpenStack setup — wherever you have deployed the Nova compute or the controller. This is how you do it: you just do a git clone, go to the healthnmon directory, and run the build script, and you should get an RPM package. So, how do I install and configure Healthnmon? Now you know how to build the RPM. If I am using a single-node setup, build the RPM; you will get two packages today, the python-healthnmon and healthnmon RPMs.
So, when you install them, you automatically get the scripts which add the required entries for Healthnmon to the Nova config. As I told you earlier, we are using Healthnmon as an extension of the Nova API, so any data you want to access from Healthnmon goes through this Nova API extension, and this is the entry that is required in nova.conf to make the extension available — this is done automatically by the RPM package itself. Since this is a KVM implementation, and since this is a single-node setup, you can set the libvirt URI to the local KVM instance. Similarly, if you want to manage multiple KVM hosts, we have support for both SSH and TLS, and we have documentation available on how you would set up SSH between the controller node and the KVM servers. Similar to the nova-manage CLI command, we have a healthnmon-manage CLI command, with which you can set up the DB and do the upgrades, like what you do with nova-manage. We have a service called healthnmon today, which you will start. So, when I show an architecture or deployment view of this cloud: as I told you earlier, we have the healthnmon service running on the controller. We are trying to refine this controller and Healthnmon architecture a little bit. You will see the controller node here, where you deploy a Healthnmon collector — we are mainly breaking the Healthnmon component into two basic components. One is the collector, which runs on the OpenStack controller, and the other is the Healthnmon virt proxies. This proxy is essentially the driver, from a Nova compute perspective. Now, if I want to manage a group of KVM computes, I should be able to do that through this Healthnmon virt proxy.
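The nova.conf entries just mentioned might look something like the fragment below. This is a hedged sketch only — the RPM's post-install script adds the real entries, and the exact option names and module path may differ from what is shown here:

```ini
# Illustrative nova.conf fragment (option names are assumptions, added
# automatically by the healthnmon RPM in a real install):
# register the Healthnmon extension with the Nova API
osapi_compute_extension = healthnmon.api.healthnmon.Healthnmon
# single-node setup: point at the local KVM libvirt instance
libvirt_uri = qemu:///system
# for remote KVM hosts, an SSH transport URI would be used instead,
# e.g. qemu+ssh://root@kvm-host-1/system
```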
So, I can have a KVM virt proxy, I can have a Healthnmon virt proxy which manages ESX, and I can have a Healthnmon virt proxy which manages Hyper-V. You can deploy this virt proxy on any node — it can be a physical node or it can be a VM as well. As you can see, you have this virt proxy model which can be distributed, and you have a centralized service running which can manage any number of nodes. That is how we are addressing scale-out here — the proxies basically collect the inventory, usage, and alerts data. Here is a further drill-down into what goes into the collector and what goes into the Healthnmon virt proxy. In the Healthnmon collector you will see the event listener and handler and the event/perf data handler, and here you have the alerts and notification engine, which collects from the inventory monitoring and performance monitoring here. So, if I am trying to build a new driver for Healthnmon, this is what is required: you just need to write one Healthnmon virt proxy driver which you can deploy, and that should collect the required inventory, perf data, and alerts data. As long as you write this virt proxy, you should be able to send that data to the collector, and the collector will process the data into the DB. For doing this inventory collection into the collector, we have defined a standard resource model, and as long as you stick to that resource model, you should be able to plug in your data and provide it. Let's say I want to build a driver which collects the data using, say, collectd: if I want to write that driver using collectd, I should be able to do it and just send the data over here, as long as I stick to the resource model. So, as part of the next steps for Havana — right now we are on StackForge.
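The write-one-proxy-and-hand-it-to-the-collector flow just described can be sketched minimally. Class names here are illustrative, not the real Healthnmon API:

```python
# Minimal sketch of the "one virt proxy feeds the collector" flow.
# Names are illustrative, not the real Healthnmon code.

class Collector:
    """Stands in for the Healthnmon collector on the controller."""
    def __init__(self):
        self.db = []

    def process(self, resources):
        # The real collector validates against the standard resource
        # model and persists to the DB; here we just append.
        self.db.extend(resources)

class CollectdProxy:
    """A proxy that could gather data via collectd (or any other
    agent), as long as it emits resources in the common model."""
    def collect(self):
        # Toy payload shaped like a resource-model entry.
        return [{"id": "baremetal-1", "type": "VmHost", "cpu_percent": 7.0}]

collector = Collector()
collector.process(CollectdProxy().collect())
```

The key property is that the collector never knows (or cares) whether the data came from libvirt, the VI SDK, or collectd — only that it conforms to the resource model.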
We want to get into the main OpenStack tree. We are currently implementing these blueprints for ESX and Hyper-V, and we are looking at integration with multiple projects. We will start with integration with Ceilometer, where Ceilometer and Healthnmon together can address both the metering and the monitoring solution for the OpenStack cloud. As I said, we will soon have the Debian scripts available — there is a switch here: you can see that earlier it was -r, and next it is going to be just -d, and you should be able to get the Debian build. So, I will quickly walk you through a demo of the cloud data. Here I am logging in as an admin — this is the administrator here. You will see the Healthnmon tab, and here is the drill-down into the resource model: the cluster resource, VM hosts, and instances, all under Healthnmon. Today we do not have a view of any of the computes anywhere in Horizon; this is just an attempt we have made to show what data is available, and we will go to the community to get buy-in from the Horizon folks on what we can expose from Healthnmon in Horizon. Here you are looking at a KVM host, where you see the data for the KVM host — you have the utilization data here for the host, and there are various parameters. You are seeing the relation between the VM host and its instances here, and you can further drill down into the network switches and port groups available. This is the storage volume into which your services go when you deploy them. Further drilling down into a VM, you will see the generic details, the utilization data, and the specific details related to the VM disk.
So, you have the data storage volume, and if it is attached to a network, you will get the details about the network adapters or vNICs available on that instance. We just looked at the KVM part; now let's drill down into a cluster which is available as part of ESX. This is the cluster you have — there are three clusters. As part of the cluster, you are seeing the name of the cluster here, and a further drill-down into the hypervisor-specific details related to the ESX cluster. The capabilities we have defined here are DPM enabled, DRS enabled, and HA enabled; all these values are coming from the ESX cluster itself. These are vendor-provided properties which we can make use of wherever required. For example, in the DRS case, we may be able to make use of a DRS-enabled cluster: let's say I always want to deploy my instances onto a DRS-enabled cluster or a highly available cluster — I should be able to make use of this data when I do the scheduling. Here you are looking at a resource pool, and here at an ESX VM host with different instances deployed on it. We are still working on some of the storage-related details for the ESX VM host. Here is a drill-down into a Hyper-V VM host, with the usage data at the VM host level; there are two instances on this VM host. This is the drill-down for a particular Hyper-V VM. And here is the alerts and notifications part: all the alerts and notifications that you will see — for the instances and VM hosts, and also anything related to storage or network — you can view in this dashboard. This is a work in progress, and we will have a drill-down into what an alert is and how you can take actions based on those alerts. So, that's the short demo of Healthnmon.
So, with that, I will open it up for questions. Yeah — right now what we are looking at is providing the required data to any system that wants to integrate with Healthnmon and take action based on the data we provide. Right now we don't have a mechanism where we take action on an alert; that could be integrated with, say, Heat, which provides a CloudWatch capability — we can feed the data into CloudWatch, and it can provide the required actions on alerts. Yeah — the inventory information that we pull is event-based wherever we are able to do event collection. Say, if I am collecting for ESX, I can do that based on the events given by the VI SDK that I use to connect to the vCenter or the ESX server. That is very nearly instantaneous: you will get any event that occurs, and any inventory change that happens in an ESX environment will be picked up as soon as it happens. In the case of KVM and Hyper-V, it is based on a polling cycle, which can be configured as part of a nova.conf entry — by default we make it five minutes, and if you want a finer interval you can make it, say, one minute. Usage data today is collected every five minutes by default; again, it is a configurable parameter that you can set in nova.conf, so you can collect at any interval you want. Yeah. So, the question is how often we collect the inventory data and the utilization data. The answer is: the inventory data we collect by default every five minutes, and wherever the hypervisor supports an event mechanism, we collect the inventory whenever that particular inventory change happens. In terms of utilization data, we collect it every five minutes, so you will have a sample for the VM host and the VM every five minutes.
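The collection policy just described — event-driven where the hypervisor supports it, otherwise a configurable poll defaulting to five minutes — reduces to a few lines. This is a sketch of the policy, not the actual implementation:

```python
# Sketch of the collection policy from the answer above: use hypervisor
# events where available (e.g. ESX via the VI SDK), otherwise poll at a
# configurable interval. Illustrative only.

DEFAULT_POLL_SECONDS = 300  # five minutes, overridable via nova.conf

def next_collection_delay(supports_events, configured_interval=None):
    """Return seconds to wait before the next inventory collection."""
    if supports_events:
        return 0  # collected as soon as the event arrives
    return configured_interval or DEFAULT_POLL_SECONDS
```

So an ESX deployment sees inventory changes immediately, while a KVM or Hyper-V deployment sees them on the next polling cycle.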
So, right now we have not implemented logic to persist the data depending on the user's requirements; users can pull the data from Healthnmon and persist it themselves. We are going to work on a next blueprint to define how much of the data we want to save and how much data will be available — that's in the future. Yeah. Right now we have the thresholds data, and we are looking at integration with other systems where you can define the thresholds. As I said, Heat is one place where we can feed in the data and you can define what threshold rule you want to apply — say, if my CPU utilization goes beyond 80%, give me an alert; I should be able to do that. Right now we are collecting the metrics related to CPU, memory, IO, disk, and network. On how collectd would be integrated: right now, as I showed you in this picture, you basically need to implement a virt proxy. Today, in order to interact with the KVM server, you use the libvirt interface, and for ESX you use the VI SDK. Similarly, if I want to use collectd to interact with any of the servers — let's say I am going to write a bare-metal driver, and I am going to use collectd to collect the inventory and alerting data for bare metal — I should be able to just write that driver and plug it into this service. How much overlap do you see between Healthnmon and Ceilometer, and in the long run, if there is overlap, how do you work with each other? Yeah — with Ceilometer, what we see is that we are mainly looking at the monitoring data here, and Ceilometer is mainly providing the metering data. We are working towards the metrics, and Ceilometer has the meters.
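A threshold rule of the kind mentioned above ("alert me when CPU utilization goes beyond 80%") is a simple predicate over the utilization samples. Per the talk, this evaluation would live in an integrating system such as Heat, not in Healthnmon itself; the sketch below is illustrative:

```python
# Sketch of a threshold rule over utilization samples. In the talk's
# architecture this check belongs to an integrating system (e.g. Heat),
# which consumes Healthnmon's data; names here are illustrative.

def evaluate_threshold(samples, metric, limit):
    """Return the samples whose metric value exceeds the limit."""
    return [s for s in samples if s.get(metric, 0) > limit]

samples = [
    {"host": "h1", "cpu_percent": 85.0},
    {"host": "h2", "cpu_percent": 40.0},
]
alerts = evaluate_threshold(samples, "cpu_percent", 80.0)
```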
And with the Ceilometer and Healthnmon integration, we should have a combined solution, converging on one solution to provide both the metering and the monitoring data. Yeah, that's correct — you really don't see too many agents here. We are looking at a proxy model where we will not have one proxy per compute; any number of computes can be managed through one virt proxy itself. You don't need to deploy one virt proxy per node — you can manage, say, 100 KVM hosts using one proxy. In fact, with the single-node setup I told you about, if you want to manage hundreds of KVM servers, all that is required is that you deploy your controller, and on the controller itself you run this one virt proxy. Yeah — when we say optimization here: let's say I am collecting the inventory data, and when I drill down into the inventory it gives me the details about the cluster and the VM instances. If I see a VM instance getting moved from one host to another as part of the inventory, and I see that resources are hitting a threshold, I may want to add more resources at the VM host level, or I may want to add a new host to the cluster. Such optimizations are possible, and optimization is also possible on the provisioning side, where I can say I always want to deploy to a DRS-enabled cluster or an HA-enabled cluster — I can do that kind of optimization with the data that we provide. Yes — since the data is available as part of the APIs here, you should be able to access it through the Nova APIs, and you can write a scheduler which makes use of the Healthnmon data. Today, in fact, it is the Nova compute which periodically provides the scheduling data.
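The scheduling example above ("always deploy to a DRS-enabled or HA-enabled cluster") amounts to filtering candidate clusters on the capability flags Healthnmon collects. The shape below loosely mirrors a Nova scheduler filter but is purely illustrative:

```python
# Sketch of a scheduler decision using Healthnmon-style capability data
# (the DPM/DRS/HA flags shown in the ESX demo). Illustrative only.

def filter_clusters(clusters, require=("drsEnabled",)):
    """Keep only clusters whose required capability flags are all set."""
    return [
        c for c in clusters
        if all(c.get("capabilities", {}).get(flag) for flag in require)
    ]

clusters = [
    {"name": "cluster-a", "capabilities": {"drsEnabled": True, "haEnabled": True}},
    {"name": "cluster-b", "capabilities": {"drsEnabled": False, "haEnabled": True}},
]
eligible = filter_clusters(clusters)
```

A real integration would expose this data through the Nova API extension so a custom scheduler filter could consume it alongside the usual host state.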
So, tomorrow we are looking at an integration between Nova compute and Healthnmon, where Healthnmon can provide the required inventory and alerts data to be integrated with the schedulers. Are you asking about fine-grained collection? Okay — as I said, by default we are going to collect every five minutes, and if your deployment requires a different value, it can be increased or decreased. Also, depending on what hypervisor you are using, you may be able to do it near-instantaneously: as soon as the cloud inventory changes, you should be able to sync up with that change and always provide the latest and greatest data. Yeah, we use the libvirt drivers for KVM. The part which talks to the KVM compute lives in this virt proxy, which makes a remote libvirt connection to the KVM host; it uses the libvirt APIs, and also the libvirt event APIs, which watch for any changes that happen to the VMs. Yeah — so what I was asking is mainly whether the five-minute collection interval is hard-coded, or whether it adjusts itself to collect above or below five minutes. Monitoring here is a continuous process, and we collect the data as per the configuration settings. Wherever eventing is provided, you will have the data instantaneously, when that particular event occurs; otherwise, by default, you go and collect every five minutes — that's the logic. If I understand your question correctly, you are asking: we collect the data every five minutes, and do we have some kind of aggregation logic? Right now we have not defined any aggregation blueprints as such; we intend to do the aggregation, post-processing, and pre-processing in the future — it is definitely in the plans. Sorry, I didn't get you.
Since we are listening to the events occurring on the hypervisor side — whether a new instance came up, an instance was deleted, or an instance was updated — we get to know the current status of that VM. So we monitor the status of the VM, say its power state: if the power state goes down or comes up, accordingly we will provide the alerts. As part of these alerts, you can listen to them and take actions. One last question. That is actually part of the autonomics that we are thinking of; it is still in the plans and we haven't implemented anything yet. Thank you very much.