Hello everyone, my name is Iury Gregory, I'm a software engineer at Red Hat, and today I'll be presenting the Ironic Prometheus Exporter. This will be a bit shorter than the presentation I gave at the Open Infrastructure Summit last year. So let's go ahead. On the agenda for today: I'll introduce the Ironic Prometheus Exporter, present a demo, talk about advantages and limitations, and then open for questions and answers.

So, the Ironic Prometheus Exporter. Our use case is hardware monitoring. When you have an infrastructure, it doesn't matter whether it's small or huge, you always want to keep an eye on how your physical nodes are doing: the temperature, the power consumption, the memory usage. You want to predict when something might go wrong, and when something does go wrong, you want to know which machine has the problem.

In the Juno release of OpenStack, Ironic added support for sensor data. That sensor data was meant to be stored in Ceilometer, and it was only available for IPMI hardware. In the Train release we introduced the Ironic Prometheus Exporter. It basically has two pieces: an oslo.messaging notifier driver that receives the sensor data Ironic collects from the nodes and parses those metrics into the Prometheus format, and a Flask application that makes them available. So you can connect your Prometheus server to the Ironic Prometheus Exporter application, get the metrics, and define alerts for your infrastructure.

The demo I will show consists of a Bifrost setup that I used to get Ironic running. I have two Ironic nodes: the Dell PowerEdge R640 using IPMI and an HP ProLiant using Redfish. On my laptop I have a Prometheus server and Alertmanager running in containers.
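The pipeline described above (Ironic collects sensor data, a notifier driver writes Prometheus-format text per node, and an HTTP application serves it for Prometheus to scrape) can be sketched with only the Python standard library. This is not the exporter's actual Flask code: the directory path, the function and class names, and the choice to merge repeated HELP/TYPE lines coming from different node files are assumptions made for illustration. Port 9608 is the default port mentioned in the talk.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path: in a real deployment this is whatever the "location"
# option in [oslo_messaging_notifications] points at.
METRICS_DIR = "/tmp/ironic_prometheus_exporter"


def merge_metric_files(texts):
    """Merge per-node exposition texts so each metric family appears once.

    Assumes gauge-only metrics, so every sample name equals its family name.
    """
    families = {}  # family name -> {"help": ..., "type": ..., "samples": [...]}
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith(("# HELP ", "# TYPE ")):
                parts = line.split(maxsplit=3)
                name, key = parts[2], parts[1].lower()  # key is "help" or "type"
            elif line.startswith("#"):
                continue  # other comments carry no data
            else:
                name, key = line.split("{", 1)[0].split(" ", 1)[0], "samples"
            fam = families.setdefault(
                name, {"help": None, "type": None, "samples": []})
            if key == "samples":
                fam["samples"].append(line)
            elif fam[key] is None:
                fam[key] = line  # keep only the first HELP/TYPE line per family
    out = []
    for fam in families.values():  # dicts keep first-seen order
        out.extend(line for line in (fam["help"], fam["type"]) if line)
        out.extend(fam["samples"])
    return "".join(line + "\n" for line in out)


def collect_metrics(metrics_dir):
    """Read every per-node file in metrics_dir and merge them."""
    texts = []
    for name in sorted(os.listdir(metrics_dir)):
        path = os.path.join(metrics_dir, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                texts.append(f.read())
    return merge_metric_files(texts)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the merged metrics at /metrics; Prometheus pulls (scrapes) them."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect_metrics(METRICS_DIR).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9608):
    """Block forever serving /metrics on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at port 9608 on that host would complete the loop. Merging the families matters because each node file carries its own HELP and TYPE lines for the same metric names.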
For the demo, which was pre-recorded (I used it during the Summit), I already have the setup running, and I'm listing the services to show that the Ironic services are running and enabled. The Ironic Prometheus Exporter runs on port 9608, and we can see there is a Flask application already running on it. Now I'm listing the bare metal nodes I have. There are two: the Dell machine, which is powered on and active, and the HP, which is powered on and in the enroll state. Next, I confirm that each machine is using the correct driver: running node show for each machine, we can see that the Dell is using IPMI and the HP is using the Redfish driver.

Now I'm going to show the configuration you need to set up the Ironic Prometheus Exporter. First, you need to change some options in the [conductor] section of the Ironic configuration file. You need to set send_sensor_data to true. If you also want to get data from undeployed nodes, that's possible too: just set send_sensor_data_for_undeployed_nodes to true. And send_sensor_data_interval sets how long to wait before collecting sensor data again from the available nodes; in this case I've set it to 9 seconds.

The second part is in the [oslo_messaging_notifications] section. There we enable a specific driver, prometheus_exporter. The transport_url needs to be set to fake, and location tells the exporter where to store the data collected from each node. If we go to that directory, we can check what is already available there: for each node there is a file that contains its metrics, and each time new sensor data is collected, the metrics in those files are updated.
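The configuration from the demo can be summarized by generating the two ironic.conf sections with Python's configparser. The option names are the ones described in the talk and the interval is the demo's 9 seconds; the location path is a hypothetical example, and writing the fake transport as fake:// is an assumption. Sections and option names may differ in other Ironic releases, so it is worth checking the configuration reference for the release you run.

```python
import configparser
import io


def build_ironic_conf(location="/tmp/ironic_prometheus_exporter",
                      interval_seconds=9, include_undeployed=True):
    """Return the ironic.conf sections described in the talk as INI text."""
    conf = configparser.ConfigParser()
    conf["conductor"] = {
        # Ask the conductor to collect sensor data and emit notifications.
        "send_sensor_data": "true",
        # Also collect from nodes that are not deployed.
        "send_sensor_data_for_undeployed_nodes": str(include_undeployed).lower(),
        # Seconds between collections (the demo used 9).
        "send_sensor_data_interval": str(interval_seconds),
    }
    conf["oslo_messaging_notifications"] = {
        # Notifier driver that writes Prometheus-format files.
        "driver": "prometheus_exporter",
        # No real message bus is needed (assumed spelling: fake://).
        "transport_url": "fake://",
        # Directory where one metrics file per node is written (hypothetical path).
        "location": location,
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(build_ironic_conf())
```

Generating the fragment rather than hand-editing it is only a convenience for the sketch; the resulting INI text is what would go into the Ironic configuration file.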
We can see that the number of lines differs between the files. This is due to the driver used for each machine, because IPMI reports different types of metrics than Redfish, and the hardware you are using can also change which metrics are enabled to be collected and exposed through IPMI or Redfish.

This is what a metrics file looks like. For each metric, the first line, HELP, is a description of the metric; then comes TYPE, which for now is always gauge; and then the name of the metric with the labels that identify each unique series. You can set different types of labels; we are using the instance UUID, the node UUID, and the entity ID we get from IPMI or Redfish, basically to say "this is the entity reporting this metric". The last value on the line is the value that was collected.

Looking at files isn't a great way to see metric values, so, since I have Prometheus running locally with Alertmanager, I'll show you the interface.
Alertmanager is running on port 9093 and Prometheus on port 9090. Accessing my machine, we can look at a specific metric, filter it, and check the time series we have for that machine and metric. You can filter by metric name and also by the labels each metric has. This is how Prometheus looks for people who are already used to it: you can select the metrics you want to monitor and build your dashboard. Here I'm showing a different metric, baremetal_power_status: 0 means the machine is powered off and 1 means it's powered on.

You can also define alerts in Prometheus for any metric you want, and they show up in Alertmanager within a few seconds. For example, clicking Alerts in Prometheus, we have a power supply failure alert, which I defined, and we can see that for one node the power supply is failing. If we go to Alertmanager, we can see the alerts that were triggered and find all the information about the metric that triggered each one. There is a description of why the alert was triggered, and it shows the expression that was used: the metric name, baremetal_power_status, different from 0.

You can also see the configuration I used to start the Prometheus container locally. Prometheus works on a pull model, so this part identifies the endpoint Prometheus should collect data from and the interval at which it should scrape that endpoint. By default, the Ironic Prometheus Exporter runs on port 9608.

With that, the demo is over. Now, some advantages and limitations of the Ironic Prometheus Exporter: it's easy to
start and configure; it's vendor agnostic; and the data collection is non-intrusive: because the node is deployed by Ironic, you don't need to install anything else on the machine to collect and expose the data, Ironic itself does that. And it integrates with Prometheus. As for limitations: at the moment it only works with the IPMI and Redfish drivers; we only support gauge metrics, which basically means metrics represented by a single value (Prometheus also supports histograms and other types, but for now we only support gauges); and the set of metrics available for each node depends on the hardware type. So let's go to questions and answers.

Thanks a lot, Iury, it's a very impressive demo. I have a couple of questions, unless someone else wants to go first. No? Okay. My first question: the data center in the picture, have I ever been in that data center?

Yes.

Okay, I recognized that data center on slide one.

Yeah, I took the picture during the midcycle we attended.

Very nice. So my next question: if I want to extend this, say I want to collect something else, which parts would I need to touch? You said it's basically defined by the hardware type, but what actually defines which metrics are collected?

Basically, what I was able to identify is that if you are using IPMI for two different machines, let's say the Dell machine and the HP machine, they will probably represent the metrics a bit differently. So when I was creating the Ironic Prometheus Exporter, I used another oslo notifier driver that only collects the sensor data payload, and with that I was able to identify some of the patterns across the hardware and come up with the metric names. The names were basically defined by me, but okay, things can
change, and people may not like a metric's name and want to change it, so we would need to think about how to improve this and let the operator say "I want to change this metric name to this one". That would be possible, I would say. What normally happens is that different IPMI versions may report things in a different format, with different names, and you may not have all the sensors: the sensors on HP machines are not the same as on the Dell machine, so the reported data is different, but it will still have the usual values, temperature for example.

What can also happen is that a sensor is not enabled on the machine, so you won't be collecting, for example, the fan speed for your hardware. The sensor data collection will try to get the payload from the machine using IPMI and report it back to the Ironic Prometheus Exporter, and the exporter will try to parse the information it has from the machine. If there is no value, or something that tells it there is no information to create a metric with, it won't create one; but if there was already a metric before, we put a value that identifies a problem. For example, if I remember correctly, the power status uses zero and one, and according to the documentation, if it's different from zero, you have some problem in that machine. So if we couldn't collect anything and the machine already had a problem before, we keep reporting that the problem is still going on, since we don't have any new information.

And for Redfish, I think we introduced support for collecting its metrics after Train, so it would be in Ussuri. There are a lot of differences between the Redfish implementations that vendors are doing, so it's a bit weird
when you try to collect the metrics and you realize they don't follow any kind of pattern, so it's complicated to find the patterns to identify the metrics. When I was working on the demo, I found some bugs related to Redfish and was able to fix them, so there are bug fixes already available in the latest release, Victoria if I remember correctly. If you want to use a different driver, let's say iDRAC or iLO, they already have support for sensor data. I didn't test deploying a machine with the iDRAC driver and collecting the sensor data, but since the support already exists, people can try to collect this data using the file driver for notifications. And if they don't want to add the support to the IPE (the Ironic Prometheus Exporter) themselves, but can provide me the payload, I would be happy to figure out how to parse it and add the support to the IPE.

There are plans for improvements in the IPE; I mentioned many of them in the Open Infrastructure Summit presentation I gave, "Casting Light on Bare Metal: Meet the Prometheus Exporter", which is already available on YouTube on the Open Infrastructure channel. If you have any questions related to that, feel free to send them to me, to the openstack-discuss mailing list, or ask on IRC, or now if you already have one. This is also used by the Metal3 community, it's enabled in Metal3, and I remember that some people were asking how to use it. You can also try it with virtual bare metal if you are using Redfish, because with virtual IPMI you are not able to collect any type of metrics.

Okay, thank you. Are there more questions for Iury? If not, thank you very much, Iury.

Thank you.
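The Q&A describes two pieces of logic: deriving metric names and values from the raw sensor payload, and what to do when a sensor reports nothing. A minimal sketch of both, assuming IPMI-style reading strings such as "24 (+/- 0) degrees C", is below. The naming scheme, the function names, and the rule of carrying a previous non-zero (problem) value forward are one reading of the talk, not the exporter's actual code. The last helper mirrors the demo's alert, which fires when the status metric is different from 0.

```python
import re

# Leading number of an IPMI-style reading such as "24 (+/- 0) degrees C".
_READING = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")


def metric_name(sensor_type, unit):
    """Hypothetical naming scheme: baremetal_<sensor type>_<unit>."""
    slug = re.sub(r"[^a-z0-9]+", "_", f"{sensor_type} {unit}".lower()).strip("_")
    return f"baremetal_{slug}"


def parse_reading(raw):
    """Return the reading as a float, or None if the sensor gave no usable
    value (for example a disabled sensor reporting "no reading")."""
    if raw is None:
        return None
    match = _READING.match(str(raw))
    return float(match.group(1)) if match else None


def next_status(previous, raw):
    """Next value of a status gauge where non-zero flags a problem.

    - a usable reading replaces the old value;
    - with no usable reading, a previous problem value is carried forward;
    - otherwise no metric is emitted (None).
    """
    value = parse_reading(raw)
    if value is not None:
        return value
    if previous is not None and previous != 0:
        return previous
    return None


def problem_nodes(statuses):
    """Nodes whose status is different from 0, like the demo's alert expression."""
    return sorted(node for node, value in statuses.items()
                  if value is not None and value != 0)


statuses = {"node-a": next_status(None, "0 (+/- 0) unspecified"),
            "node-b": next_status(1.0, "no reading")}
print(problem_nodes(statuses))  # ['node-b']
```

Keeping the last problem value, rather than dropping the series, means an alert like the one in the demo keeps firing while the sensor stays silent instead of silently resolving.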