OK, thanks everyone for joining this presentation. This is Xinhui Li from VMware, and this is Qiming from IBM. We are both cores of the Senlin project, and Qiming is also a Heat core. I will first hand over to Qiming for the background introduction.

OK, thank you. This topic has two parts. The first part covers the requirements for HA solutions in OpenStack, and the second part is the auto-healing solution we implemented using the Senlin project.

So, the first part. We have already had several presentations on HA at past summits, which shows that HA is a very important topic in the OpenStack community. The goal of an HA solution is to eliminate single points of failure and to help achieve the SLA of the cloud and of the applications our clients run on it.

When we build a single system, we may face many facility failures, like a power loss or a fire in the data center, so we need HA solutions to protect the services running in our cloud. And it is not a perfect world; anything can fail at any time. The network components in your system are not reliable: a network card can fail, a cable can be disconnected, and the physical network can go offline. So we normally use bonding for the network cards and connect the two cards to different physical switches. The storage components are not reliable either, so we use RAID to build a redundant group of hard disks. Our operating system is not reliable; it may crash because it runs out of memory or hits a kernel error. And the applications running on the operating system may be badly coded, may not handle unexpected errors, or may be attacked. All of these things cause failures in a single system, and in the cloud we have hundreds of thousands of machines, so anything can fail at any time. We must prepare for these failures.

In the cloud we run many services, like the cloud management services and the network controller services, and we use software-defined technologies such as software-defined networking and software-defined storage. All of these exist to serve the virtual machines running in our cloud, and the virtual machines in turn run the applications used by our clients. So the SLA of our cloud directly impacts the SLA of our clients; that is why we need HA in our cloud and why we must design HA solutions for our clients.

When we build HA solutions in the cloud, we need to consider many aspects. Normally a cloud has multiple availability zones and multiple regions, and we tell our users to deploy their virtual machines across different availability zones, so that if one availability zone becomes unavailable, or some hosts are under maintenance, the applications and virtual machines running on the cloud do not stop their service. When we use multiple regions, we help our clients design their network topologies and keep their data consistent. That is how we normally do it in the cloud. But OpenStack does not actually have an auto-healing solution for virtual machines: if a virtual machine fails, who will recover it, and who will guarantee its SLA? There is no such solution in OpenStack yet.
OK, I will go through this picture. In an OpenStack cloud we actually need four levels of HA solutions. From bottom to top, the first level is the host level; there are existing solutions for it, so we won't worry too much about it here. The second level is the OpenStack level. At this level we run OpenStack services, such as the OpenStack APIs, and software-defined networking technologies, network controllers like Dragonflow, OVN, or OpenContrail, and we need to provide HA for these services. We should also run the database and the message queue as clusters. We have established technology for this level, like Pacemaker, HAProxy, and ZooKeeper.

The third level is the virtual machine level. At this level we normally use an instance group, and we should be able to attach a health policy to that group. When we provide HA at this level, we have to consider what happens when a virtual machine migrates: how do we migrate the network, and how do we migrate the volumes attached to the virtual machine? And if we could use KVM live backup for the virtual machine, that would be great. But we don't have a common solution for this level yet.

The fourth level is the application level, and this is the most important level for our clients, because our clients actually run their applications in our cloud; they don't just rent a virtual machine. But we haven't reached the application level yet. Maybe we will in the future.

This slide shows how our clients use our cloud. They normally use an orchestration engine, like Heat, to create a group of virtual machines as a cluster, distribute them across different availability zones, and then use a software deployment technology, like cloud-init, Ansible, Puppet, Chef, or Salt, to deploy their applications onto the virtual machines (a minimal sketch of such a template follows below). But Heat is not a lifecycle management tool, so Heat will not recover failed virtual machines. Also, in our cloud we treat the virtual machine as a black box, while the user actually cares more about the applications running inside the box. Normally we have no way to monitor the applications inside the virtual machines; we can only monitor the status of the virtual machine itself, unless we install agents inside the virtual machines to monitor the application status and send heartbeats to some monitoring service. Or we can use a load balancer to monitor an exposed port: if the port cannot be reached for a period of time, we can assume the application has died and start the recovery operations.
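As a concrete illustration of that usage pattern, here is a minimal, hedged sketch of a Heat template that spreads two servers across availability zones and deploys an application with cloud-init. The image, flavor, zone, and network names are all hypothetical; a real deployment would add load balancing and much richer configuration.

    # Minimal sketch only; image/flavor/AZ/network names are hypothetical.
    cat > web.yaml <<'EOF'
    heat_template_version: 2016-04-08

    resources:
      web_az1:
        type: OS::Nova::Server
        properties:
          image: ubuntu-16.04
          flavor: m1.small
          availability_zone: az1
          networks:
            - network: private
          user_data_format: RAW
          user_data: |
            #cloud-config
            packages:
              - nginx

      web_az2:
        type: OS::Nova::Server
        properties:
          image: ubuntu-16.04
          flavor: m1.small
          availability_zone: az2
          networks:
            - network: private
          user_data_format: RAW
          user_data: |
            #cloud-config
            packages:
              - nginx
    EOF

    # Heat will create this stack, but it will not detect or recover a
    # failed server afterwards -- that is the gap auto-healing must fill.
    openstack stack create -t web.yaml web-stack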
OK, so that is how we designed our auto-healing solution for OpenStack. Normally the client creates a group of virtual machines as a cluster, a VM cluster, and normally we could install an agent in the guest OS to monitor the status of the applications and send heartbeats to the monitor service. But some of our clients refuse to install agents, so instead we can use libvirt to check the virtual machine status, or use HAProxy to check the status of the exposed port, and send the heartbeat to the monitor service through the hypervisor. When the monitor service does not receive the heartbeat for a period of time, it will conclude that the applications running in this cluster have failed or become unavailable.

When the monitor service receives a failure message, it sends an alarm to the HA agent. When the HA agent receives this alarm, it checks whether the VM cluster has a health policy attached; through the health policy we can actually provide different levels of SLA for different users. If the cluster already has a health policy, the HA agent will try to bring the VM cluster back to a healthy state, performing recovery operations like rebuild, recreate, or migrate. So that is our design.

Actually, this design goes through three phases: failure, recovery, and monitoring. When a normal virtual machine fails, the first thing we should do is detect what kind of failure it is, a hardware failure or a software failure, and then decide whether we need to do fencing. Fencing helps us avoid split-brain situations. For example, if one hypervisor's network is disconnected, we will stop receiving the heartbeat and will start the recovery operation, recreating another virtual machine and installing all the applications on it. Then the network comes back, and at that point we have two copies of this virtual machine, which causes the split-brain situation. So when we do fencing, we kill the original virtual machine, so that only one copy is running in the cloud.

Once the failure is detected, we move to the recovery operations. Before we recover the virtual machine, we temporarily disable the monitoring policy for it, and then we recreate or rebuild the virtual machine. When the machine is ready, we install the applications into it, then we re-enable the monitoring policy, and the monitor service continues to monitor the virtual machine, or the applications running in it, until it dies again. The design in this picture is simple, but the details are not easy to handle. So for the next part, I will hand over to my colleague, Xinhui, to give you more details about how we implement the auto-healing solution using the Senlin project.

Thank you, Qiming. OK, so you already know that availability is very complicated and very complex, and anything can break it. We are trying to leverage Senlin to help with this scenario. We cannot say Senlin will handle everything; we are just trying to help. This problem domain is very large, so we can only pick the jobs that fall within Senlin's scope, and here we focus on VM-level availability.

First, a quick overview of the Senlin scope. Senlin provides a clustering service that helps with cluster creation, provisioning, and operational management. As this graph shows, we encapsulate different kinds of resources, compute, networking, storage, behind a profile; the profile is the abstraction for describing what kind of resource you want. Using the profile, you can create, delete, resize, and scale out or scale in your cluster. This makes Senlin very good at cluster operation management.
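To make the profile abstraction concrete, here is a minimal, hedged sketch of a Nova server profile spec and the commands to build a cluster from it. The names and property values are hypothetical, and the exact spec fields can vary between Senlin releases.

    # Hypothetical flavor, image, and network names.
    cat > server_profile.yaml <<'EOF'
    type: os.nova.server
    version: 1.0
    properties:
      name: web-server
      flavor: m1.small
      image: ubuntu-16.04
      networks:
        - network: private
    EOF

    # Create the profile, then a cluster of two nodes built from it.
    openstack cluster profile create --spec-file server_profile.yaml web-profile
    openstack cluster create --profile web-profile \
      --min-size 2 --desired-capacity 2 --max-size 4 web-cluster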
Besides the cluster-level actions, you can operate on any single node, and you can also divide the nodes belonging to a cluster by role, which means we can support blue-green deployments well; that is very useful for transparent rolling upgrades. And you can attach and detach policies on a cluster to manage and guide how the actions I just mentioned work. This makes Senlin very useful for auto-scaling and auto-healing scenarios. All of these functions are exposed over RPC and as a REST API, and they are covered by documentation and examples.

I want to emphasize two key plug-in mechanisms we provide, because they are the foundation of our auto-healing support. One is the profile: the abstraction for describing what kind of resource you want to manage. Today we support four kinds of resources: containers, Heat stacks, Nova server VMs, and Ironic physical hosts. The other is the policy: a group of rules that are checked and enforced before or after cluster actions are performed, covering things such as placement, auto-scaling, and deletion. Different policies can cooperate with each other: for example, if you attach both a scaling policy and a placement policy, you can control where a node is placed when a scale-out or scale-in happens. That is very useful, and you can leverage these mechanisms to set up your own end-to-end customizable auto-scaling loop for any special purpose.

That is the foundation we provide. On top of Senlin's cluster actions and the profile and policy plug-ins, we have indeed extended some functions especially for auto-healing support. As I show here, we start a health manager inside each Senlin engine (Senlin supports multiple engines for scalability). The health manager provides polling-based and listening-based detection to learn the status of the cluster and its nodes, and then it can run the recovery, closing the loop. This is the built-in detection we provide for the typical but generic scenarios.

Besides the built-in detection mechanism, we also provide the receiver, an abstraction that receives messages, or exposes a URI, so that third-party monitors, OpenStack or non-OpenStack, can call in. We provide the lifecycle actions you already know, delete, create, resize, and so on, but we have also added the cluster check and cluster recover actions especially for auto-healing (a quick sketch of them follows below). Because all of these functions are exposed as a REST API, those who care about application-level failures can use the REST API to close the loop with their own application failure detection mechanisms. And we provide different kinds of policies, such as the health policy, the placement policies, and the scale-out and scale-in policies, which collaborate to build the loop and let the user customize it.
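For example, the two auto-healing actions can be driven by hand from the CLI, or by any external system through the equivalent REST calls; the cluster name here is the hypothetical one from the earlier sketch.

    openstack cluster check web-cluster     # re-evaluate the health of every node
    openstack cluster recover web-cluster   # recover the nodes found unhealthy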
OK, I will introduce these extensions one by one. The first one is placement. Auto-healing is a key capability of cluster management and operation, but the strategies used during the placement or resource-scheduling stage have a long-lasting impact on the availability of the workload. So we provide different policies, such as affinity, anti-affinity, cross-AZ, and cross-region, to help at this stage. Today we provide the anti-affinity policy for Nova resources, which means we can help decide whether to put the VMs on the same hypervisor or on different hypervisors. With cross-AZ and cross-region, we can give weights to the different zones and regions; Senlin asks Keystone and Nova to validate whether the given zones and regions exist, and then schedules the nodes into the different zones and regions. The cross-AZ and cross-region policies can be applied with the Nova profile and the Heat stack profile. So that is our help on placement.

The next one is failure detection. As we already analyzed, availability is very complicated and very complex; any component can fail, and nothing is immune to failure. Trying to provide fully comprehensive detection is, I think, impossible for Senlin; that is definitely out of our scope. What we try to do here is help with the typical, generic scenarios, and provide plug-in mechanisms to collaborate with third parties to close the loop. The first piece is the health manager, which cooperates with the health policy: once you attach a health policy to a cluster, underneath, the policy registers the cluster with the health manager, and the health manager starts detection based on the detection type defined in your health policy. Two types are supported now: node status polling and VM lifecycle events, that is, either polling Nova or listening to Nova's events to learn the status of nodes and clusters.

That is just the built-in mechanism; it is simple, but it helps with the basic scenarios. If you want more comprehensive detection from third-party monitors, such as Ceilometer, or Nagios, which is a totally non-OpenStack monitor, or an enterprise monitor such as vROps, you just use the receiver, a resource created inside the Senlin engine. Users can create a receiver to trigger a specific action on behalf of some user or program when a particular event or alarm fires; that is how we can cooperate. We have two types of receiver. One is the webhook, a URI exposed so that third-party monitors can simply post an HTTP request. The other is the message type, where on the back end we create a Zaqar queue and receive messages from the monitor to trigger the specific action. So that is the detection part, and we believe that through the external receiver collaboration loop we can work with application-level failure detection and recovery scenarios, although that part is definitely beyond the Senlin scope itself. A hedged sketch of both detection paths follows below.
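Both paths might look roughly like this; the spec field names follow the Senlin documentation of that era and may differ between releases, and all object names are hypothetical.

    # Built-in detection: attach a health policy so the health manager
    # polls node status (LIFECYCLE_EVENTS is the listening alternative).
    cat > health_policy.yaml <<'EOF'
    type: senlin.policy.health
    version: 1.0
    properties:
      detection:
        type: NODE_STATUS_POLLING
        options:
          interval: 60          # seconds between polls
      recovery:
        actions:
          - name: REBUILD       # recovery operation to apply on failure
    EOF
    openstack cluster policy create --spec-file health_policy.yaml health-policy
    openstack cluster policy attach --policy health-policy web-cluster

    # External detection: a webhook receiver that a third-party monitor
    # can POST to in order to trigger the recovery.
    openstack cluster receiver create --cluster web-cluster \
      --action CLUSTER_RECOVER --type webhook recover-hook
    # 'openstack cluster receiver show recover-hook' prints the alert URL;
    # the monitor simply POSTs to that URL when its alarm fires.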
The next one is the recover action. Recovery is actually independent of the policies driving it, so we provide diverse options inside the health policy for users to choose from. For the Heat stack profile, the list of actions we support today is recreate and update, and we can definitely support more as they become ready. For the Nova profile, we support reboot, rebuild, and recreate. For recreate, as shown on this slide, underneath we implement fencing at the VM level, meaning we really delete the VM before starting another one. We also support the migration operations, migrate and live-migrate. And of course we can support the special capabilities of different hypervisors; for example, we are trying to collaborate with VMware and KVM so that their special capabilities can be exposed as Senlin actions and invoked in the auto-healing loop.

All of this introduction has been conceptual, so now I will give some examples of how to use these functions. There are three ways to consume them. The first one is, of course, the command line; Senlin has a very good client and command line support. The second one is a Heat template: you can define a template that creates a Senlin cluster and attaches a health policy or a placement policy, all in one file, with the whole process fully automated. And the third is the REST API, which we recommend, because all the functions are already exposed as REST APIs, for those who are developing their own solutions or products. There is a fourth way I didn't mention: Senlin has its own dashboard, but we won't show the dashboard here.

As shown on this page, you can use a profile to create a cluster and attach a cross-AZ policy to the cluster. After the attachment, the placement of new nodes, and the choice of which node to delete from the cluster, are all guided by the policy. If I give the two AZs the same weight, that means I want the nodes placed across the zones in a balanced way: whenever a scale-out happens, we will put the new nodes into the different zones, and similarly, when you scale in, meaning you want to delete something, we will choose proper candidates to keep the balance across the zones.

The next example attaches a health policy, which registers the cluster with the health manager service. The service then starts listening to the VM lifecycle events, the Nova notifications, and triggers Nova to do the recovery actions. And the third example is quite different from the previous pages, because it does not stay inside Senlin: here we use a webhook to cooperate with any third-party monitor. I show vROps alarms: whenever the alarm fires, vROps posts an HTTP request to the cluster's webhook and triggers the recovery actions.

On the next pages I want to go very quickly through how to write the templates and which commands you can use. The first one is the profile: you can put the flavor, the image, and all these things into a YAML file, use that file with the command line to create your profile, and use the profile to create the cluster. As I show here, you can build a cluster with a minimum size of two instances and whatever maximum size you want to define; very easy to use. And this is a sample of the cross-AZ policy: here I list two zones and give them the same weight.
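The cross-AZ sample just described would look roughly like this as a spec file; the zone names are hypothetical, and the equal weights mean new nodes are spread evenly across the two zones.

    cat > zone_policy.yaml <<'EOF'
    type: senlin.policy.zone_placement
    version: 1.0
    properties:
      zones:
        - name: az1
          weight: 100
        - name: az2
          weight: 100
    EOF
    openstack cluster policy create --spec-file zone_policy.yaml zone-policy
    openstack cluster policy attach --policy zone-policy web-cluster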
You can definitely put all the property content into a YAML file like this and attach the policy to the cluster to enable placement of the nodes across the different zones. The health policy is similar; I didn't list the command lines here because they follow the same pattern: create the policy and attach the policy. One thing I want to emphasize is that the detection type used here is listening to the VM lifecycle events, and the recovery action we define is rebuild. The last example, beyond the profile and template, is about the receiver and alarms. Here I show how to build the vROps alarm: we use the remaining disk space to trigger it, which means that if the VM's system disk has less than two gigabytes free, vROps will call the receiver. The receiver is defined on the left side; you can see that the action attached to the receiver is the cluster resize. So once the remaining disk space drops below the desired threshold, we trigger the resize.

OK, after all these introductions: during our development we identified some tasks as our next steps. The first is to extend the detection types we support; we will add two. One is member failure in a load balancer pool, and the other is host status events. Today the load-balancing service, Octavia, already knows the status of the members in a load balancer pool, but regrettably it doesn't publish those events externally; similarly, Nova has a bit of this, but it doesn't send out notifications about host status. We are trying to change that for this auto-healing purpose. The next item is the recover actions: we already provide rich cluster management and operation actions, and here we want to help with host failures, so we will add host fencing to match our support for the new detection types. And we can definitely collaborate with other projects on workflows, meaning that once a failure is detected, we can trigger a workflow to run. Then we are trying to extend all the mentioned functions to containers, so not only Nova and Heat; we want to support container resources for auto-healing too. And for customizable actions, besides workflows we are trying to support scripts specified or written by the users themselves, so we can bring them into the loop for automation. That is everything we have identified so far.

One thing I want to clarify: high availability and auto-healing are very important but hard to solve in one day. What we are doing now is subdividing the problem domain into different problems and resolving them step by step, and along the way we will definitely keep our minds open to collaborating with any projects and any ideas toward the same goal. So thank you, and it's Q&A time. Thank you. Thank you.

Why do you need the Senlin database, and what data is stored there? Actually, in the Senlin database we store several things. One is the models: the clusters, the nodes, the policies, the actions, all the management models are stored in the DB. Senlin creates all these objects, and Senlin operates and manages them, so we have to keep everything together, including the engines' runtime information, to support multiple engines and to operate everything. Yeah.
And as I understand it, the Senlin installation itself should also be highly available. How does that work now? I'm sorry? As I understand it, you need to provide HA for Senlin itself, right? Yes; actually that's a different problem, as we described in the first part. Those are two different things. What we are trying to help with here is the cluster auto-healing. As for Senlin itself, it belongs to the control plane, where there are more mature solutions, such as Pacemaker, Pacemaker remote, and Corosync; that's another field. Go ahead, thank you.

Thank you for the presentation. I'm Sampath from NTT. I couldn't quite figure it out: you said that when you do the recovery actions, you're going to delete the node and start a new one, right? The VMs. Yeah. How do you provide HA once you delete a node? I mean, you lose all the contents of the node, right? So once you start it again, it will be a totally new node. Yes; actually, that's the reason why we expose the recovery actions as options for the user to choose, because only the user knows what the impact is if a failure happens, and they will choose the proper way to recover. Okay, that means it's totally configurable, right? Yes. Okay. We give them richer support here.

Okay, so another question: all the monitors have to be pre-installed into the cloud, right? Yes. So on all the compute nodes; you don't know whether the user is going to deploy an HA cluster on a given node or not, but you have to deploy to all the nodes. Yeah, that's a good question. The monitors are actually outside the Senlin scope. What we deliver is the built-in monitoring and failure detection, as I presented earlier: if you attach a health policy, that enables the built-in detection. But if you want to integrate with a third-party monitor, you provide the failure detection and the sending side, and we provide the receiver to close the loop. Okay, thank you. Thank you.

Hello, thanks for the presentation. Are you considering it to be in scope to support high availability of, like, a single pet instance? Would you consider that a cluster of size one, or is that out of scope for Senlin? Actually, that's a good question, because to be honest we haven't thought about that yet. But we can collaborate with other projects: if they provide the functions, we can work together, and Senlin can provide the basic plug-in framework to close the loop. Yeah. Because it seems to me that this is focusing more on the cattle side than the pet side, and with pets, recovery after you've done fencing can be a bit more complicated, because rather than just booting up a new version of the instance, you might have to recover the data with the same volume attached somewhere else, do a nova evacuate, and so on. Yes, that's the reason why we are trying to support customizable actions, because it's very important that the user can define the scripts, and the whole process, for how to recover the node and the cluster. Yeah. Okay, thank you everyone. Thank you. Thank you.