Hello, welcome everyone. Thank you for attending our presentation today. My name is Pramod Bhandevaad, from Tata Communications; I work with the Cloud Enablement and Network Services group. I have with me Pratik Goyal from Tata Communications, Nishi Ahuja from Intel, and Jason Venner from Mirantis. Today we are going to talk about an exciting new solution we are building with the help of Intel and Mirantis, which will help assure workload availability on OpenStack clouds.

Here is what we are going to cover. First, we'll look at the challenges enterprises are facing right now in bringing their workloads onto OpenStack; then, to address those challenges, what needs to be monitored and what can fail; what the gaps are in the ecosystem for addressing those failures; and what options are available today. Then we'll dig into how we went about solving this problem: our design goals, our architecture, the algorithms we built, and the components involved. We'll cover our future development plans, and we'll also take you through a demo.

This is a very simple ask from an enterprise customer: keep my VM running. Earlier today we heard from Dana about the two modes, mode one and mode two: mode one covers the legacy applications, while the other category is the cloud-native applications. When enterprises move their legacy applications onto the cloud, they expect continuous operation and fast, automated recovery. OpenStack does provide scale and redundancy at the infrastructure layer, but its HA story is aimed at horizontally scaling cloud applications, applications built and designed for failure. It seems some features were deliberately left out so as not to impact OpenStack's ability to scale, and that's the primary reason we see this gap.

To talk about the challenges involved, I would like to invite Jason Venner from Mirantis. He's a chief architect at Mirantis, and he'll talk about the challenges enterprises face in bringing their workloads onto OpenStack clouds. Jason?

I've been working with OpenStack since roughly 2010, in the Bexar time frame, and the number one ask has always been: I don't want to be a sysadmin. I've deployed my tool; keep it running. That really defines the bulk of the workloads we see today: the developer or workload owner expects someone else to take care of the care and feeding of the platform, and assumes the hardware will be maintained to high levels of perfection. In the cloud world, that doesn't work. So what do we have to do?

We see two common patterns. First, horizontally scaled, resilient applications, the modern cloud architecture, where hopefully the team has built a complete infrastructure and cloud deployment automation framework so that any failure is automatically handled by the operational tooling. Maybe 1% of cloud applications have that. The bulk have some manual level of installation, and anything that reduces the operational load of keeping that workload up, whether on the dev side or the production side, is a serious enabler for the user. And then we have the 99% case: the legacy applications that are not horizontally scaled or resilient.
They depend on the infrastructure to maintain their availability, and OpenStack has really let these users down. Ultimately, what this team has done is build a framework for intelligent, automatic operation of workloads, dealing with issues on the hypervisor so that the users and the operations teams don't have to. How many of you here are operators dealing with "my VM failed, help" on a regular basis? This story is for you.

So I think it's clear that enterprises still have concerns about moving their workloads onto a cloud platform with ease. To provide the kind of availability enterprises have come to expect, we need extensive monitoring, and to determine what needs to be monitored, we first need to understand what can fail. Failures can be broadly classified into three categories. The first is infrastructure failure: OpenStack services can fail, and so can the hardware running OpenStack, meaning the physical compute nodes, the controllers, the network switches, routers, storage nodes, and all the other associated components. Then there is guest VM failure, where the operating system the guest is running can fail, or the virtual networking or storage might fail. And then there is application failure. Our focus here is primarily on infrastructure failure: how do we handle it intelligently?

To reiterate, the challenge is being prepared to handle both component failure and total failure. For any kind of application, the objective is to be prepared to rapidly recover from any service disruption or disaster. Our answer is to build auto-remediation using predictive algorithms. Auto-remediation involves much more than just clearing an alert: it involves a rules engine and a workflow engine that analyze the event, notify through an ITSM notification mechanism so the operators get to know about the problem, and then tell the framework how to go about resolving the error.

That being the challenge, let's look at the typical points of failure in a deployment. This is a typical OpenStack deployment in an enterprise. The controllers can fail; the monitoring server itself can fail; the compute nodes can fail; the storage network and the management network, where the operations team logs into the systems, can fail; and lastly the OpenStack control network, followed by the VM tenant network. At the control plane, OpenStack handles high availability pretty well through the use of Pacemaker and Corosync. But at all the other layers, there are still gaps in how well you monitor these systems and how you take remedial action.

Having looked at what can fail, these are the gaps we have identified in assuring workload availability. The first gap is compute node failure. It's pretty well known: what happens when a compute node crashes? We don't have a mechanism where all the VMs are automatically evacuated.
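In essence, the missing piece is something that notices the dead host and drives Nova's evacuation API for every instance on it. A minimal sketch of that step with python-novaclient follows; the host name, credentials, endpoint, and the shared-storage assumption are all hypothetical:

```python
# Hedged sketch: evacuate every VM off a failed compute node
# using python-novaclient (Kilo-era API; all names are hypothetical).
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://controller:5000/v2.0")  # hypothetical endpoint

failed_host = "node-48.domain.tld"  # hypothetical failed compute node

# Find all instances on the failed host, across tenants.
servers = nova.servers.list(search_opts={"host": failed_host,
                                         "all_tenants": 1})

for server in servers:
    # Let the scheduler pick the target host; assumes shared storage,
    # so each instance is rebuilt from its existing disk.
    nova.servers.evacuate(server, host=None, on_shared_storage=True)
    print("evacuating %s (%s)" % (server.name, server.id))
```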
There are solutions out there, like Pacemaker remote, which have been integrated, but as far as we have observed it's still pretty complicated.

Then there's predictive component health degradation. How do you predict whether a compute node will go down or not? What are the factors involved in predicting total failure? There can be correctable errors as well as uncorrectable errors; how do you determine whether an event that has happened on a compute node will eventually lead to a crash? There are no known solutions out there for gathering this intelligence from the platform.

Then there is thermal awareness: we need tools that measure exhaust temperature and do a thermal stress evaluation. And how do we balance the nodes so that they are equally distributed and the thermal footprint stays well below the margins? Even a thermal overshoot will impact your CPU cores, your DIMMs, and other components, so we need to handle this.

Another gap is that QoS and performance degradation are not considered today; there are a lot of actions that could be taken when performance degradation is observed on a compute node.

And the largest gap is a framework for auto-remediation. There are monitoring servers and services deployed out there, but what do we do with their events? Right now it's just a flag or a notification: they send emails, they open a ticket, and that's where the action ends. What we have done is build a framework for auto-remediation where we pick up those events, analyze them, and take corrective actions.

These are the current monitoring options. We talked about the two aspects of the solution: one is event gathering, the other is using those events to auto-remediate a problem. We've listed some of the tools in the market that can be used as a monitoring framework. They have their own pros and cons; we're not going to dig deep here, this is pretty well known.

Based on the gaps and on what's out there, we set these design goals for our framework to provide continuous workload availability. Goal number one is to take reactive measures: a problem has already happened, say the compute node has crashed, so what do you do? The simplest action is to evacuate all the VMs to a healthier node. The next goal is to be proactive by predicting failures: there can be failures in CPU and memory, there's thermal, and there's the requirement of putting a host into maintenance mode, where all the VMs are automatically live-migrated to a new node so that the original host can be brought down. Goal number three is to be flexible: it should work with any kind of monitoring framework out there, and be extensible, so you can add new samplers, new collectors, and new rules to the rules engine specifying which metric needs which action. And it should be scalable, which is a pretty obvious requirement.

This is how our architecture looks. One of our main mantras was to keep it simple.
We used Zabbix, which we found to be a pretty powerful monitoring framework, to collect all the metrics. What's new here is on the right-hand side, on the compute side, where you see the different data collection tools. We use smartmontools to get intelligence from disks, like disk errors and disk temperature. We use the Intel Machine Check Architecture: enhanced MCA logging gives early indication of hardware failures. Say there's a memory failure; we get the actual DIMM address, and the action could be to deactivate a DIMM or deactivate a CPU core. We use the Intel Node Manager to get thermal intelligence, and we have custom tools and SNMP data to collect performance metrics so we can do QoS/performance-based migration. And we can keep adding any kind of intelligence we want to gather.

All this data is routed through Zabbix; it goes through a transformation flow from the Zabbix monitoring server and is then put onto the OpenStack notification bus, where we have written an instance HA agent. The instance HA agent picks up these messages, analyzes what kind of event each is, and puts it into a DB. Then there are two agents, a monitoring agent and an action agent, watching this DB for events. Say a DIMM failure occurred: the action agent reads this event and runs it through a policy engine, and the policy engine tells the framework what action to take; we'll see shortly what the policy engine looks like. The monitoring agent then tracks the action: suppose the action was a live migration of a specific VM, the monitoring agent checks whether that action was successful, and it has its own workflow for what needs to be done.

Digging deeper, this is how the flow happens. On the compute node we collect information with the various tools mentioned earlier; from the monitoring agent it goes to a data receiver on the Zabbix monitoring server. We have defined new templates and triggers inside Zabbix that tell us what needs to be done on certain events, and we have written scripts that do a bit of caching to avoid dumping all the notifications onto the bus at once. The instance HA agent picks up the messages and writes them to the node-status DB; the action agent reads the node status, runs it through the policy engine, the rules engine, and the workflow engine, and calls the relevant Nova APIs for live migration, evacuation, scale up, or scale down; and the monitoring agent watches for the success or failure of the action.

These are the changes we have implemented as part of the solution. We have added multiple samplers, using the Intel Node Manager and the Intel Machine Check Architecture; we've added new collectors; we've built these health agents, which ideally we would want to be part of Nova, so that Nova knows the exact health of the compute node; we've built new Nova scheduler filters, which consult the DB we mentioned so that the scheduler can take appropriate action; and we have added new screens to Horizon that show the actual node health.

This is how a sample policy file looks. You can see exhaust temperature and inlet temperature: if there's an anomaly there, what is the action? The action is to do a live migration, and the notification mechanism is to raise a ticket through your ticketing system. Similarly, we have other errors listed here. One more thing to note is the management network failure: sometimes the OpenStack control network or the management network can fail, and we don't want to evacuate VMs in that case, so you can specify what kind of action to take in those scenarios. Similarly, for a storage network failure, you can do a storage backend migration from one storage platform to another.
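The actual policy file format isn't shown in detail in the talk, so here is a hedged reconstruction of what such a policy could look like, expressed as a Python structure; every key name and value below is an assumption based on the entries just described:

```python
# Hedged sketch of a remediation policy, reconstructed from the talk.
# All field names, event names, and action names are assumptions.
POLICY = {
    "exhaust_temp_anomaly": {
        "action": "live_migrate",      # move VMs off the hot node
        "notify": "itsm_ticket",       # raise a ticket in the ITSM system
    },
    "inlet_temp_anomaly": {
        "action": "live_migrate",
        "notify": "itsm_ticket",
    },
    "correctable_memory_error": {
        "action": "live_migrate",      # proactive: DIMM may be degrading
        "notify": "itsm_ticket",
    },
    "mgmt_network_failure": {
        "action": "none",              # mgmt/control network down: do NOT evacuate
        "notify": "itsm_ticket",
    },
    "storage_network_failure": {
        "action": "storage_backend_migration",
        "notify": "itsm_ticket",
    },
    "host_unreachable": {
        "action": "evacuate",          # total failure: rebuild VMs elsewhere
        "notify": "itsm_ticket",
    },
}

def lookup_action(event_name):
    """Return the configured action for an event, defaulting to notify-only."""
    return POLICY.get(event_name, {"action": "none", "notify": "itsm_ticket"})
```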
These are the workload migration algorithms we have built, and I would like to invite Nishi on stage to take you through what we have done and the different algorithms we implemented.

So, what is Intel doing with Tata? We are working with Tata to expose the platform telemetry, which includes the sensor information from the platform, so that actions can be taken around thermal stress, component failures, thermal conditions, or the system going down. If you look at the other monitoring, orchestration, and automation tools, not many of them use these platform features today. That is one reason you can't detect thermal issues, and can't detect reliability or failure issues. This gives the data center operator visibility into the platform features and into component-level, thermal, and reliability issues. It helps reduce your OpEx, reduce your total cost of ownership, and keep your environment reliably available all the time.

Some of the algorithms we are working on with Tata are very simple but very powerful. Take a thermal-event-based algorithm that is driven by just the outlet temperature. Now you'll ask, what is an outlet temperature? It's a sensor that is not a physical sensor on the board, but an average temperature across the server. It's a very important value, because some components at the back of the server, and some of the cables behind the server, have lower temperature thresholds. There is a trend in the industry to move to higher ambient temperatures, and that results in higher outlet temperatures; higher utilization on the server also raises the outlet temperature. If that exceeds the thermal threshold, we have seen issues like systems shutting down. So although it's a very simple scenario, if you have a way of monitoring the outlet temperature, you can take an action: if it exceeds a certain threshold, migrate the work to another node to relieve the stress on that system. That's the simple scenario.
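A minimal sketch of that outlet-temperature rule, assuming a polling watchdog; the sensor-reading function is a hypothetical stand-in for whatever exposes Intel Node Manager data in a real deployment, and the threshold mirrors the value used later in the demo:

```python
# Hedged sketch: outlet-temperature watchdog.
# read_outlet_temp() is a hypothetical stand-in for a real sensor source
# (e.g. data exposed via Intel Node Manager); 40 C mirrors the demo's
# trigger threshold, and 5 s matches the polling range quoted in the Q&A.
import time

OUTLET_TEMP_THRESHOLD_C = 40.0
POLL_INTERVAL_S = 5

def read_outlet_temp(host):
    """Hypothetical: return the averaged outlet temperature for `host`."""
    raise NotImplementedError

def relieve_thermal_stress(host):
    """Hypothetical hook: ask the framework to live-migrate work off `host`."""
    print("outlet temp high on %s: requesting live migration" % host)

def watch(host):
    while True:
        if read_outlet_temp(host) > OUTLET_TEMP_THRESHOLD_C:
            relieve_thermal_stress(host)
        time.sleep(POLL_INTERVAL_S)
```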
Another one is an airflow-based sensor, and taking action on that. Airflow is a pretty important capability that was not available in prior-generation platforms; from Haswell onwards we have this new feature, which tells you the amount of air the system requires. That's important because if the systems are not getting the right amount of air, components can fail, or you can end up with long-term reliability issues. Using not just the airflow sensor but also the inlet and outlet sensors, we have built algorithms that can identify component failures, such as whether a fan is failing or a heat sink is fouling, and then you can take actions based on that.

Then we have other algorithms, like thermal stress prediction. This is purely predictive, not reactive: it determines whether the system is becoming thermally stressed. We are building algorithms based on machine learning that take not only power but also the inlet temperature, outlet temperature, and airflow, and build a predictive capability on top, predicting thermal stress before it even happens and taking action. And as Pramod stated, we are also exposing a lot of errors from the platform, which can help us build failure predictions for CPU, memory, and disk. There are others, like performance degradation and compute host issues; we are still working on those algorithms, but this is a sample of what we are working on with Tata.
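The talk doesn't disclose the actual models (some are Intel IP), though the Q&A later mentions SVM and k-means. As a hedged illustration only, here is what a thermal-stress classifier over power, inlet, outlet, and airflow readings could look like with scikit-learn; the feature layout, training data, and labels are entirely made up:

```python
# Hedged sketch: predict thermal stress from platform telemetry with an SVM.
# Everything below (features, data, threshold) is an illustrative assumption.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [power_watts, inlet_temp_c, outlet_temp_c, airflow_cfm]
X_train = np.array([
    [180, 22, 34, 60],   # healthy
    [210, 23, 36, 62],   # healthy
    [320, 27, 47, 41],   # stressed: high power, low airflow
    [340, 28, 49, 38],   # stressed
])
y_train = np.array([0, 0, 1, 1])  # 1 = thermally stressed

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X_train, y_train)

# Score a fresh sample; if the stress probability is high, the framework
# could trigger a proactive live migration before any threshold is crossed.
sample = np.array([[300, 26, 45, 45]])
p_stress = model.predict_proba(sample)[0][1]
if p_stress > 0.8:
    print("predicted thermal stress (p=%.2f): schedule live migration" % p_stress)
```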
This takes us to the slide on what we are planning next. This is more from Tata's side, and we will be collaborating with them on some of it. They want to enhance the user interface work by importing templates into the monitoring framework and building more policies and workflows. They want to integrate the instance HA agents into Nova, and integrate the workflow engine into OpenStack. They want to enhance the algorithms: today they basically pick random VMs, but they want to select the most impacted VMs to migrate. VM tagging is a similar approach, where you tag the particular VMs whose workload you want migrated. As we saw, they have added quite a few samplers, including the Node Manager and other sources they can pull data from, and they will be adding more. They are also going to work on storage monitoring and backend migration, and work with Mirantis on Fuel plugin development.

With that, I'll bring Pratik on stage so he can demonstrate that simple scenario of outlet-temperature-based migration.

Hi, I'll be demonstrating how our framework deals with three different scenarios. The first scenario is a thermal overshoot. The second is where you want to put a host into maintenance mode. The third and last is when a compute host crashes and you want to evacuate the VMs.

This is a schematic diagram of the scenario we are talking about. We have an OpenStack Kilo deployment with Mirantis OpenStack 7.0, with one controller, node 46, and two computes, node 48 and node 49, and a certain number of VMs running on the node 48 compute. Along with these nodes we have a Zabbix monitoring server that collects data from them. With our framework plugged into this deployment, the Zabbix monitoring server starts collecting the platform-specific data as well, and one entity in that platform data is the exhaust temperature. Say there is an exhaust temperature overshoot on the node 48 compute. Whenever Zabbix picks up this data from node 48, it checks whether the value is within the threshold limit we have set. If it crosses the threshold, it alerts our framework that this node has overshot the exhaust temperature. Our framework then comes into play; it's configured to live-migrate a VM from node 48 to node 49, so basically from an unhealthy node to a healthy node.

So let's look at the video of it now. This is the Fuel environment that we have. Sorry? The video is not up? Can the video come up on the screen? We have a video player playing up here; maybe it needs to be extended. Okay, if you do that, then we'll proceed.

We are on the controller right now, and this is the service list. You can see the services for node 48 and node 49: both compute services are up and enabled, and we have four VMs, stress zero, one, two, and three, all active on host node 48. Let's go to Horizon and look at the instances. These are the four instances. Of the four VMs, two have the tiny flavor and two are boot-from-volume, so that we cover the different flavors as well.

This is the custom node health page we have built. It shows the overall health of each node, and it lists the instances on it: there are four instances, the state is OK, there's no faulting event right now, everything is fine. If you want to look at a node's platform-specific events, you click on it, so let's look at node 48. There's exhaust temperature, inlet temperature, disk temperature, and correctable memory errors. The second column shows the current value the Zabbix monitoring server is reading for these events from node 48, and the next column shows the threshold values we have set; an empty entry means we have not set a threshold for that event. Since everything is within the threshold limits, the status is OK.

Now let's look at the Zabbix screen for a moment. This is the node 48 definition; you can see in the row that there are triggers set on it, and the template we have is the node health template. This is the trigger we were talking about, the exhaust temperature overshoot trigger, marked in blue, with a threshold of greater than 40. Whenever the value Zabbix reads from the server crosses this threshold, we have actions defined: the trigger goes into the problem state, and we run a command to inform our framework about it. Similarly, we have another action defined for when the exhaust temperature comes back below the threshold: the trigger returns to OK, and again we inform our framework.

So let's simulate an exhaust temperature overshoot. The simplest way is to change the threshold to below the current operating value, so we're going to set it to 20. Then we go and look at the node health page again. Right now it still says all OK, because Zabbix has to read that value from the server first. And how did we change the threshold value, by the way? That's done by calling the Zabbix API: our Horizon page called the Zabbix API to change the threshold, and it received a positive response from Zabbix.
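The demo doesn't show the exact call, but a threshold change like this is an ordinary Zabbix API request that updates the trigger expression. A hedged sketch using the pyzabbix client; the URL, credentials, trigger ID, item key, and the Zabbix 2.x-style expression syntax are all assumptions:

```python
# Hedged sketch: lower a trigger threshold through the Zabbix API.
# URL, credentials, trigger ID, item key, and expression syntax are
# assumptions (Zabbix 2.x-era syntax, matching the Kilo time frame).
from pyzabbix import ZabbixAPI

zapi = ZabbixAPI("http://zabbix-server/zabbix")  # hypothetical URL
zapi.login("admin", "zabbix")                    # hypothetical credentials

TRIGGER_ID = "13500"  # hypothetical ID of the exhaust-temperature trigger

# Rewrite the trigger so it fires when exhaust temperature exceeds 20
# instead of 40, simulating an overshoot without heating the node.
zapi.trigger.update(
    triggerid=TRIGGER_ID,
    expression="{node-48:exhaust_temp.last(0)}>20",
)
print("threshold updated; Zabbix will alert on the next poll")
```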
Similarly, in the logs you see the last line, which says "received update from Zabbix for node 48 on event exhaust temperature". That means Zabbix has seen that the new value it received crosses the threshold, and it has updated our framework accordingly. Let's check whether the node health page reflects it: yes. Here you see that the state has become warning, the faulting event is exhaust temperature, the corrective action is live migration, and the action status is started. At this point the framework has not yet picked a VM; it just knows about the event.

This is where the Nova scheduler filter also comes in: once the framework picks the VM it wants to move, it checks which node is healthy. In this case it sees node 49 is healthy and tries to move the VM there. The last line says it started the live migration of VM test_3, with its VM ID, from host node 48. This is a split screen: on the right-hand side you can see that the event has started, and you can see the VM ID, so it has selected a VM and is migrating it. Let's wait a few seconds. Now the action status has changed: it is migrating the VM. And now it says the VM has been migrated; the live migration has completed, and node 48 has only three instances. We'll also show you that the trigger value for the exhaust temperature has changed to 20, because we called the Zabbix API to change it. Looking at the list of instances, test_3 has moved to node 49 and the rest are on node 48; on the node health page, one instance has gone to node 49 and three remain on node 48.

Having done this, node 48 is now classified as an unhealthy node by our framework, while node 49 remains healthy. So if we create a new VM, it should go to node 49 and not node 48, because we have also written a Nova scheduler filter that consults our framework to find out whether a host is healthy. So we create a VM now; under admin instances, we see that test_4 is on node 49, and on the node health page there are three instances on node 48 and two on node 49. The same screen also shows the Nova scheduler logs, where we see that node 48 is not healthy, and hence is filtered out of the list, while node 49 is healthy, so the scheduler continues to pick it.
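A hedged sketch of what such a filter could look like against the Kilo-era Nova filter interface; the node-status lookup is a hypothetical stand-in for the framework's DB described earlier:

```python
# Hedged sketch: a Nova scheduler filter that skips unhealthy hosts.
# Kilo-era filter interface (host_passes(host_state, filter_properties));
# get_node_health() is a hypothetical stand-in for the framework's
# node-status DB lookup.
from nova.scheduler import filters


def get_node_health(hostname):
    """Hypothetical: return 'healthy', 'warning', or 'unhealthy' from the
    framework's node-status DB."""
    raise NotImplementedError


class NodeHealthFilter(filters.BaseHostFilter):
    """Only pass hosts the remediation framework considers healthy."""

    def host_passes(self, host_state, filter_properties):
        health = get_node_health(host_state.host)
        if health != "healthy":
            # Mirrors the demo log: "node 48 is not healthy,
            # filtering it out from the list".
            return False
        return True
```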
This is the second scenario we have, maintenance mode. Before that: the system setup remains the same, the controller, the two computes, and the Zabbix monitoring server. Now the admin wants to put node 48 into maintenance mode. To complete this process, the Zabbix server also has to understand the activity, because it is continuously collecting monitoring data from all the nodes; if you take node 48 into maintenance mode, the Zabbix server should stop collecting that data so it does not generate false alerts during the maintenance period. So node 48 informs Zabbix about the maintenance mode, Zabbix then informs our framework, and our framework understands that when a host is going into maintenance mode, it has to live-migrate all the VMs from node 48 to another healthy node.

Let's look at the third scenario, evacuation, where the node crashes. This is one of the most common scenarios we are all aware of. Say, for some reason, the node 48 compute crashes while a certain number of VMs are running on it. The Zabbix monitoring server is collecting data from node 48 at regular intervals. When node 48 crashes, Zabbix first tries to communicate with it on one of the links; if that link fails, it tries the multiple other links it can talk over. If all reachability to node 48 from the monitoring server fails, Zabbix declares node 48 unreachable and updates our framework, and our framework knows that when a node is unreachable, it has to evacuate all the VMs from that host.
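As comes up in the Q&A that follows, the framework fences the failed node over IPMI before evacuating. Here is a hedged sketch of that fence-then-evacuate step; the BMC address, credentials, and helper names are hypothetical, and the evacuation loop is the same pattern as the earlier novaclient sketch:

```python
# Hedged sketch: fence a dead compute node via IPMI, then evacuate.
# BMC address and credentials are hypothetical; evacuate_all_vms() stands
# in for the novaclient evacuation loop sketched earlier.
import subprocess

def fence_node(bmc_addr, user, password):
    """Power the failed node off via its BMC so it cannot come back up
    and touch shared storage while its VMs are rebuilt elsewhere."""
    subprocess.check_call([
        "ipmitool", "-I", "lanplus",
        "-H", bmc_addr, "-U", user, "-P", password,
        "chassis", "power", "off",
    ])

def evacuate_all_vms(hostname):
    """Hypothetical: loop over instances on `hostname` and call
    nova.servers.evacuate() for each, as sketched earlier."""
    raise NotImplementedError

def handle_unreachable(hostname, bmc_addr):
    fence_node(bmc_addr, "admin", "secret")   # hypothetical credentials
    evacuate_all_vms(hostname)
```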
It's controller-side functionality: all the information is there on the controller, so we don't have to reach the VMs on the compute in order to evacuate. It's a restart on the destination host. No, it's not a migration, it's an evacuation. So yes, that's all we had. We are open for Q&A; we have a minute or so.

That's based on the scheduler changes we have made: the scheduler checks our DB for the healthy nodes available, and then the filter kicks in. Sorry, I'll just hand over the mic. Oh, it's out here.

In order to move machines to another host, or multiple hosts in the case of multiple VMs, you have to decide which host is the right one to take each VM in. For that you usually have to solve a bin-packing problem, with first-fit descending or some other mechanism, right? So which algorithm did you use to place the VMs onto the existing hosts?

We've left that to the Nova scheduler. We don't decide which host; we don't want to duplicate the functionality the scheduler already provides. What we have done is add filters to the Nova scheduler which, as part of the selection process, read our node health DB, the node-status DB we saw, pick a healthy node, and move the VMs there.

Okay, that's fine, because the Nova scheduler is not very smart, actually, so you do need some sort of solution there.

Yes, that's why we have linked it to our framework. Thank you.

Two questions. One: where do I find the code for the work you've done?

We'll share the presentation; there's a link in it. We are going to post all the code, the documentation, and the API access onto that link.

Fantastic. Second question: can you give some characterization of the latencies? Say a node fails in a 100-node cluster; how quickly do you detect the failure?

Currently it all depends on the polling intervals you set, on how frequently you want to poll for the errors. We've kept it at three to five seconds for some events, and up to 30 seconds for others. We can't give a definitive answer on what the polling interval should be; it depends on the workload and the events being generated. Ideally, it's three to five seconds.

But have you tried getting it down to the millisecond range? Does that work?

No, we have not; probably we'll try as part of the testing process. (From the audience: That's not going to work; you're not going to be polling at milliseconds.) I don't think we'll get there.

What happens if your monitoring server can't see a compute node but Nova can?

Okay, so that depends on which links have access. The way we have deployed it, Zabbix is part of the controller network, so whatever Nova sees from the controller, Zabbix will see, unless the interfaces on the Zabbix server itself go down. And on node evacuation, if the compute node is able to reach the other services, we don't evacuate; we don't evacuate in that case. It's similar to the consensus approach the distributed HA folks have implemented: the knowledge of node reachability is spread across those links.

Thank you for the presentation. I'm working on a similar thing; it's called Masakari, it's on GitHub. Have you read about it?

Yeah, I saw it. We're working on the same kinds of aspects.

One question, though: when you evacuate the VMs, you have to fence the original compute node, right, to make sure it actually went down?

We do fencing, yes. Through ipmitool, we just do a shutdown of the compute node.

Oh, you do it through IPMI? Okay. And do you have any monitoring at the KVM level?

That's part of the future development plan. We want to add those checks, because we've seen problems there, and there's information available there that we think we can take corrective actions on. That's going to happen soon.

Okay, thank you. I'll look at the code in detail.

Yeah, sure.

Hi, I'm Sudhanda from Intel. I just wanted you to know that we've started a project called Watcher, and it's doing a lot of the same things. I think it would be great if you came and participated, because it sounds like we have a lot of parallel efforts, and we're working with Nishi too; her algorithms are already in the Watcher repos. I just wanted you to be aware of it.

We are aware; we are from Intel, so we are working with her, and she's already mentioned Watcher to us. We will see how we can integrate with that.

Perfect. How many of these projects are going on? This is the third one I know of. Could I invite all of you to the session that Adam Spiers and Dawid Deja, also from Intel, are running: "HA for Pets and Hypervisors: State of the Nation"? Same thing you said: could we all get together and agree on one approach?

Exactly, let's consolidate. When we look at deciding what metering or metrics you need for your algorithms: we have some algorithms that are part of the package, but it's really about creating a very flexible framework. I'm not trying to tell you that you need to do things differently; I just want to take the best of what you have and make sure we can have one single framework where everybody gets everything they need out of it.

Sure. I think we're leaning towards a Mistral-based solution at the moment, but that can be flexible.
And #openstack-ha is also an IRC channel; it would be great if everyone could join there. (Crosstalk.) Are you one of the authors of this? No, no, I'm the Pacemaker guy; let me just take a photo. And, you know, we have an IRC channel, and I can also announce where we're meeting on the OpenStack dashboard. You said it can be sent out as events each time? Yes, I can do that, so everybody can see it. I thought we'd done that already. Oh, really? We'll do it again. Okay. Thank you.

Thank you, folks. You can visit us at booth D-15 if you want to discuss this further.

I have one more question. Oh, sorry. Somebody mentioned that you have predictive algorithms for estimating when the workload is going to increase or when the temperature is going to be higher, right? So what techniques are you using for that? I'm curious: are you using statistical learning, whether it's moving averages, or something like neural networks?

SVM algorithms and k-means, those kinds of algorithms. Some of them are Intel IP, but some of them we are exposing to the community, so yes, we can share that. If you want, we can work together. Did that answer your question?

Yeah. Maybe you could send me an email, and then we can follow up.

Yeah, we will. Thank you. Thanks.