Okay. Good morning, everyone. I'm a little bit nervous because, well, it's a huge crowd. And I'm nervous also because I'm presenting a very controversial topic today. This is all my personal opinion, so whatever I present today, don't take it personally. Thank you. Yeah, just yesterday I got some t-shirts about cattle; I think a lot of you have heard of that topic, about pets and cattle. There are people who just don't believe high availability is relevant in the context of cloud. So let's see some different thoughts today.

I'm going to start with an overview of high availability, for those people who are wondering why they are here at all. Then I will present my understanding of high availability in the context of OpenStack; there are four different types of high availability, in my opinion. Then I will talk a little bit about OpenStack HA per se, and after that I will focus on some technologies that can deliver VM HA and application-level HA, and where we are heading next. And finally there's a crazy idea about whether HA can be offered as a service.

So the first part is about high availability, the concept. As you know, even though we are using cloud today, we are still facing a lot of failures: the hardware, the hypervisor, the host operating system, the virtual machine, the guest operating system, your own software, your application; anything can fail. And what makes things even more complicated is that software-defined shared storage and software-defined networking make for a pretty complicated setup. So getting things highly available is not a trivial issue.

There are different thoughts about high availability, about whether it is needed at all. My understanding is there are some cases where it is. For example, not every application today is designed and architected for cloud. There are a lot of traditional enterprise applications being migrated to cloud, and those users, those customers, really want to name their VMs. They really want to protect their VMs and applications and make sure things don't break. There are other use cases too: even if you are running a public cloud, maybe you can offer high availability as a value-added service and say, do you want to pay 50% extra if I offer you a highly available VM or application? Maybe that's something doable. And there are some other use cases as well; we can touch on those later.

How to achieve HA? In general, there are three things to do. Number one, for whatever kind of HA, you need some kind of redundancy. At the hardware layer, you may need redundancy at the register level, the bus level, the chipset level, but that kind of thing is not the focus today. Above that, at every layer of software, if you want to achieve a certain kind of high availability, you need some notion of redundancy. And if you want to provide redundancy, there are things you need to pay attention to. For example, you need to make sure your capacity is well planned: if you want to migrate, if you want to fail over, you need that spare capacity in place. And there are associated considerations about cost, about whether you can afford it.

The next technology is detection. There are a lot of failure detection technologies out there today. You can monitor yourself using some kind of watchdog technology: if something fails, the watchdog will reboot you, something like that.
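To make the watchdog idea concrete, here is a minimal sketch on Linux, assuming the softdog kernel module and the stock watchdog(8) daemon; the pid file path is illustrative:

```bash
# Self-monitoring with a kernel watchdog: if the timer is not refreshed
# within soft_margin seconds, the kernel reboots the machine.
sudo modprobe softdog soft_margin=60

# The watchdog(8) daemon refreshes /dev/watchdog only while its health
# checks pass, e.g. while this pid file points at a live process:
#
#   /etc/watchdog.conf (excerpt)
#     watchdog-device = /dev/watchdog
#     pidfile         = /var/run/my-app.pid
#
sudo service watchdog start
```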
Or you can send heartbeat messages to the outside world so that someone is watching; when you fail, they will reboot you. There are different kinds of failure detection technologies.

And the third thing is recovery. If you don't want text messages or phone calls early in the morning telling you, okay, my VM failed, can you reboot it, can you recover it, then high availability has to be automatic, has to be transparent. How can we achieve that? And for recovery, what users really want is no service interruption time at all. Can we achieve that? So that's what I mean by high availability: it's not just availability. It means things have to be automatic and keep running out there. You don't have to touch it; if something fails, it recovers by itself.

Next I will talk about the four different types of high availability in the context of OpenStack. The first is OpenStack HA itself. If you search Google for OpenStack high availability, most of the information you get today is about how to make the OpenStack services (the compute controller, the network controller, the message queue, the database) highly available. That's the community's focus today. But I think there is something more you may want to pay attention to. For example, host-level availability: you don't want to use some pretty old machines that can crash at any moment. And there is more than just compute; there are also the network, storage, and all the other physical-layer resources. That is the foundation of the whole high availability solution, though it is not my focus today.

The next type is about the virtual machine, your virtual server. Can we make that highly available? This layer is really about how to make the virtual resources, the virtual machines, virtual networks, and virtual storage, highly available, and how to make managing that high availability easier.

The last type is about user applications and user services, and that is what our customers, our users, really care about. They don't care whether your VM failed, the host failed, or the storage failed. As long as their application, their service, is available, they just don't care. It's your job to make things highly available; it's not their concern. So the fourth type is application- and service-level availability.

About OpenStack HA today, you can get a lot of information online. There is a High Availability Guide from the community; you can download it and follow the instructions there to set up your own highly available OpenStack deployment. Today it is mostly focused on avoiding single points of failure in whatever services are in the service portfolio. The implementations I see today are mostly Pacemaker/Corosync setups; that's a Linux-HA technology that has been around for quite some time. There are other proposals using HAProxy or Keepalived; that's a different option. But the problem I see is that OpenStack HA, as it is today, is largely a deployment pattern. It is not part of any OpenStack project. You still need some special skills, some special expertise, to get it set up right and to manage it well. That's one problem. Another problem is that it is only about deployment; it is not about runtime management after deployment. I'm wondering if any developers from the community are interested in improving this situation and making this OpenStack HA management part of OpenStack.

Here is a picture I stole from Red Hat. It's their recommended setup for an HA deployment.
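To give a flavor of what such a setup involves, here is a minimal sketch of the usual pattern, a virtual IP fronting HAProxy for the API endpoints, using the pcs tool; the address and resource names are illustrative:

```bash
# Virtual IP that clients use to reach the OpenStack API endpoints
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s

# HAProxy load-balances the API services across the controller nodes;
# --clone runs a copy on every cluster node
pcs resource create haproxy lsb:haproxy op monitor interval=10s --clone

# Keep the VIP on a node where HAProxy is actually running, and start
# HAProxy before bringing the VIP up
pcs constraint colocation add vip with haproxy-clone
pcs constraint order start haproxy-clone then vip
```

And that is just two resources; a real deployment repeats this for the database, the message queue, and every service.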
It's not an easy thing to do. Besides the Pacemaker/Corosync-based setup, we can see a lot of efforts from the community to support high availability; I'm calling it intrinsic support from OpenStack. For example, in Nova you have host aggregates, but host aggregates is a concept that is used for everything. There are a lot of debates about host aggregates: can two host aggregates overlap? Can I have one host aggregate inside another one? It's pretty flexible, but it's really up to you how to make use of host aggregates; the concept itself doesn't tell you anything beyond that. It is sometimes associated with availability zones. Inside Nova today we have service groups, a kind of internal heartbeat so that you can monitor whether the OpenStack services, especially the Nova components, are alive or not. There is other support in the message queue, in Cinder, in Swift, and in the other projects. People are working on this, although we are still not there yet.

Here I'm showing what you can get from OpenStack today about service status. The upper table shows the Nova services, when they were last updated, and their status (this is what nova service-list reports); for Neutron you can get a similar status for the agents. But this is all about the availability of the OpenStack services per se.

Next I would like to talk a little bit about VM- and application-layer high availability. The most straightforward way to get VM- and application-level HA is to deploy Pacemaker/Corosync, the HA stack, at the VM layer. You treat the VMs as your physical nodes, and then you can protect whatever application or service just as you did before you were using virtualization or cloud technology. You can do that.

The good news is this can be done more easily today using Heat. In Icehouse, Heat gained software-configuration and software-deployment support. I think the Heat core team will present that today or tomorrow; if you are interested, you may want to join their design summit session. There are software-config and software-deployment resources, and you can make good use of configuration technologies such as Chef or Puppet if you are really into that.

This is an experiment I did using Red Hat's high availability extension. I set up a cluster at the VM level, then I tried rebooting one of the VMs. You can see so many things happened. The right-hand side is the node I rebooted. After 12 seconds, the service I was protecting, a load balancer, started running on the left-hand side, on VM1. So it can be done, and it's not very slow. The thing is, getting Pacemaker/Corosync set up correctly still takes some effort; you need a few months to get yourself familiar with it.

That's something we can do today, but there are limitations. One is ease of management. Even if you can set up a cluster at the VM level, you still have to manage it: if you want to adjust the HA policy, add new resources to be protected, or tune some parameters, you need to do all of that yourself, and it is not part of OpenStack. The other limitation I see is that you need to develop resource agents. A resource agent is a concept from Pacemaker/Corosync: an adapter that can tell the Pacemaker layer whether your application or service is active, how to restart it if needed, how to stop it if needed. It's an application-specific component. If you want to use Pacemaker/Corosync to protect your own application or service, you need to develop this and test it.
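To show what that means, here is a minimal sketch of an OCF resource agent for a hypothetical my-app service; real agents also validate parameters and ship inline metadata:

```bash
#!/bin/sh
# Minimal OCF resource agent sketch. Pacemaker invokes it with an action
# argument and interprets the exit code.
. ${OCF_ROOT:-/usr/lib/ocf}/lib/heartbeat/ocf-shellfuncs

case "$1" in
  start)   service my-app start ;;
  stop)    service my-app stop ;;
  monitor) # 0 (OCF_SUCCESS) = running, 7 (OCF_NOT_RUNNING) = cleanly stopped
           service my-app status >/dev/null 2>&1 \
             && exit $OCF_SUCCESS || exit $OCF_NOT_RUNNING ;;
  meta-data) cat /usr/share/my-app/ra-metadata.xml ;;  # describes the agent
  *)       exit $OCF_ERR_UNIMPLEMENTED ;;
esac
```

Writing this for your own service is not hard, but it has to be written, tested, and kept up to date.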
That is how things are done today to protect the OpenStack services themselves. The third limitation is intrusiveness. You have to install something into the VMs, but sometimes our customers don't accept that. They really want us to treat their VMs, their images, as black boxes. One of our customers told us: we have a lot of images that have been in use for a long time; we don't want to install anything into or customize those images; we have hundreds of them. Some are from the finance department, and they don't want a single bit in those images touched; they are very sensitive. How can we provide high availability while treating the VMs as black boxes? In this case we cannot use software-config or software-deployment, because you can't touch anything inside the image. We need some technology to solve this problem.

Before I talk about a solution we have tried, I will show you what we can get for VM- and application-level HA today from OpenStack. First, redundancy: can we get redundancy? Yes. In Nova we have server groups. Server groups is now a topic of debate: there are some people who really hate the idea and think server groups should be removed, and there are people who are really in favor of it. There is no consensus yet. There are proposals about virtual clusters or virtual ensembles; I don't see where that is going. If you are using Heat, there are resource types today for instance groups and resource groups.

For detection purposes, I think RPC notifications are widely used today; later on, as things mature, we may migrate to other notification mechanisms. When something interesting happens, you get notified and you can react to that event. Ceilometer is evolving down this path.

Finally, recovery. There are several interesting things happening in the community. Some of my colleagues are proposing fencing support in Nova, Cinder and Swift. Fencing, for those who are not very familiar with HA, means that if I suspect a node is down, I need to make sure that node is actually down, down for sure. Sometimes we can't get any heartbeat from a VM over the management network, but that doesn't mean the VM is really dead. It may still occupy an IP address; it may still be writing to shared storage; it may still be doing work right now. The only thing you know is, okay, I'm not getting any heartbeat, so we need to make sure that node is really down, powered off; then we can rebuild it, reboot it, or take whatever recovery actions are appropriate. That's fencing support; our colleagues have proposed this blueprint and implemented it.

Then there are the existing rebuild and evacuate operations. There's another term for this today: remote restart. If a VM fails on one host, you may want to restart it on another host; that is called a remote restart. The good thing about reboot or rebuild is that the VM ID stays unchanged and the IP address is the same, so it's a completely transparent solution. The question left is whether we can leverage this Nova support at the orchestration layer.

Another support for recovery is the HARestarter resource from Heat. There are example templates if you have tried Heat. Using such a template, when your VM fails, which is detected using a heartbeat sent from inside the VM, your VM is deleted and recreated. I'll show you how it works in a moment. So there is some support today, but it's still ongoing work.

Here is how it works if you are using Heat to create an HA setup, using the sample templates you can get from the heat-templates project on GitHub. Basically, what you do is create a template, roughly like the sketch below.
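This is a rough sketch of such a template, modeled on the HARestarter examples in the heat-templates repo; the image name and the heartbeat metric are illustrative:

```bash
cat > vm-ha.yaml <<'EOF'
heat_template_version: 2013-05-23
resources:
  server:
    type: OS::Nova::Server
    properties:
      image: my-image-with-cfntools   # heat-cfntools baked into the image
      flavor: m1.small
  restarter:
    type: OS::Heat::HARestarter
    properties:
      InstanceId: { get_resource: server }
  heartbeat_alarm:
    type: AWS::CloudWatch::Alarm      # served by Heat's CloudWatch API
    properties:
      MetricName: Heartbeat           # pushed by cfn-push-stats from the VM
      Namespace: system/linux
      Statistic: SampleCount
      Period: '60'
      EvaluationPeriods: '1'
      Threshold: '1'
      ComparisonOperator: LessThanThreshold
      AlarmActions:
        - { get_attr: [restarter, AlarmUrl] }
EOF
heat stack-create vm-ha -f vm-ha.yaml
```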
In that template, you create a Nova server; that's your VM. You create an HARestarter resource and point it at your VM. You create an alarm; this alarm will trigger the HARestarter. And you associate the alarm with your Nova server, so that when something happens, good or bad, inside the Nova server, you get an alarm from that alarm resource. That is how you create the setup.

Here is how it works. You need to install heat-cfntools into your VM image. In that image there are tools called cfn-push-stats and cfn-hup; these can be hooked into crond so that every 10 seconds, or every minute or so, the VM sends out a heartbeat. cfn-hup is a tool modeled after the CloudFormation technology: you can specify that some service is supposed to be active all the time, and it will check whether that service is still there; if it is not, it will trigger an alarm. In the implementation, the alarm is sent back to the CloudWatch API implemented by Heat, because in the old days everything in Heat was modeled on CloudFormation; that is how the project started. After that, the alarm arrives inside Heat, where some background worker threads check whether an alarm should fire, and then your VM instance is deleted and recreated. That is how VM HA can be done.

I learned here just yesterday that the Heat team has decided not to do anything more regarding the CloudFormation support. The CFN-compatible API is duplicated effort, so they won't do anything more inside Heat about watch rules or that alarm resource.

Here is what can be done today instead. It's pretty similar, but there are some differences worth highlighting. The event trigger, the heartbeat message sent from inside the instance back to Heat, is unchanged. But for getting metadata about your VM or the services in that VM, the team is using os-collect-config today. That means something more to inject into your VM image; that could be acceptable. For the alarm, today you can use Ceilometer, and the team believes that is the right direction to go: whenever possible, you should use Ceilometer to monitor and to send back alarms.

What's next? Here is really my personal opinion; I'm still arguing about this with the Heat team. We can make this whole thing a little simpler. First, we don't need to duplicate the CloudWatch API; I think that decision has already been made by the Heat team. Second, we need something inside the VM that sends heartbeats, or any availability-related events, directly back to Ceilometer to trigger an alarm there. If we cannot touch the image, we can leverage Nova's support to get some heartbeat information. This is the difference: if we can get a monitoring agent installed and configured inside the VM, we can monitor the availability of your application or service; if that is not acceptable, the only thing you can get is VM-level availability, from Nova notifications. Third, we need the signal or alarm to go from Ceilometer directly to Heat; we don't need any proxy or broker along the path. The fourth and fifth things we need are a new resource type in Heat that provides a notion of VM groups, groups of VMs that protect each other, so that if any one fails, another will cover for it, and the ability to specify high availability policies on this VM cluster resource type. Then, when something really bad happens, I would suggest leaving recovery to Nova rather than to that HARestarter resource; actually, it is being deprecated anyway.
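The recovery primitives the orchestration layer would call already exist in Nova; roughly, from the CLI (server, image, and host names are illustrative):

```bash
nova reboot --hard my-server                   # restart in place
nova rebuild my-server my-image                # re-download the image, keep ID and IP
nova live-migration my-server other-host       # move it while it is still alive
nova evacuate --on-shared-storage my-server other-host   # remote restart
```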
We can leverage this support from Nova directly to do whatever is wanted: reboot the VM in place; rebuild it, which means downloading the image again and restarting it; migrate it if it is still alive; or evacuate it. We have a lot of options here, and the option could be specified when you create the VM cluster resource.

It looks like a solution, but there are still a lot of open questions. For example, can we do this: we are deploying a group of VMs, but we don't want these VMs to share the same PDU, the same power supply, and we don't want these VMs to share the same network switch. Can we do that? It's about physical-level placement. There are scheduling extensions desired as well: can we reboot this VM first if possible, and then, if there are abundant resources, reboot the others? That's a priority between VMs. Can we do that? It's an open question.

For failure detection, it's really up to you what you mean by failure. If I'm expecting 10 megabytes per second of throughput from my VM but I only get one megabyte, is that VM still available to me? Maybe, maybe not. If it's not meeting my QoS requirement, I may consider that VM unavailable: you had better reboot it, restart it. That's a different question; it's really about your definition of availability. And for reliable detection, sometimes your application or your service needs to get involved, because only your application or service knows whether it is alive. It's a pretty difficult topic. Even today, using Pacemaker/Corosync, we can often only check whether the process is still there, whether the PID file it created is still there. If it is not there, I would suspect the process is dead, but that may not be true. So reliable failure detection is really a difficult challenge.

The next open question is about reasoning. If I see frequent failures from a host, that means something to me: I need to check that physical host. But how can I get that insight? We need to reason about frequent failures, frequent types of failures, and see whether there is something we can do beforehand to make sure things are really robust. You can do this by collecting logs and analyzing them, but sometimes logs are sensitive; logs reveal information you are not supposed to read.

The next question is about HA management. So far I've been really focused on compute, on virtual machine availability, but in a real cloud setup the implication is more than that. You have storage and networks, and, well, the network is really a mess; you have a thousand ways to break it. When something is not available, can we detect it and recover it automatically, and on time? That is a really difficult question. In terms of Heat support, I would really like to see VM availability evolve into stack availability. If the whole stack is running as expected, that's good. But if any component in that stack breaks, the VM may be there, yet it means nothing to me now: the port is gone, so I cannot communicate. So maybe we can go one layer up and provide high availability for the end users.

Next: maybe we can leverage whatever support exists at the hypervisor layer or the host layer. We have already done that when providing high availability for OpenStack itself, using Pacemaker/Corosync, but there are other choices. For example, if you want to provide VM-level availability and you are using vSphere, the support is already out there. Why isn't OpenStack using it?
If you are deploying VMs on Power machines, for the Power platform you have PowerHA. If you are deploying Hyper-V machines, they already have their availability set concept. So maybe we can leverage all the existing HA capabilities from that layer. And I have some good friends working on QEMU to make QEMU capable of fault tolerance: you run one VM, and there is a shadow one on another host, and in the background the memory pages are synchronized very frequently, maybe using RDMA. There is a lot of interesting technology out there we can leverage. So if all this technology can be used, maybe we can do something like HA as a service.

So next is my crazy idea: maybe we need another project for HA purposes. For this kind of project I'm proposing a lot of things, but it is not such a complicated one. We need a generic HA management service that can be used for host-level, VM-level, application-level, and even OpenStack HA; it's almost the same problem. We can do that, and we can leverage whatever support exists at the hypervisor layer, say KVM or Hyper-V or vSphere. The only thing that differs is who you are. If you are an operator, you can take care of OpenStack HA. If you are a user on the same infrastructure, you are not allowed to see anything about OpenStack HA, only your own application, your own service. But the whole infrastructure, the code base, can be the same. What we need is really some well-defined service APIs and some common HA policies that can be specified.

If that is not clear, let's see a few pictures. Here is OpenStack HA as a pretty abstract diagram. We have a high availability enabler technology; it could be Pacemaker, vSphere, or whatever. What we have here is an HA daemon installed on every host, just like nova-compute. We can make this an open architecture where the driver is a plug-in, and we can provide this HA as a service with its own APIs and engines. Once we have this project, this kind of support, the whole concept can be leveraged by other projects, such as Hadoop or Heat.

Just a quick switch of the picture: here is how VM-level HA can be provided. We still have the same enabling technologies, with the VMs as the resources to be protected; what we get from this picture is VM HA. And next, if these boxes are VMs, what we get here is application HA. Most of the code base could be the same.

I don't know whether some developers in the audience are interested in this; we could work on it together. What we need is just some common APIs: create a cluster, add a node to a cluster, remove a node from a cluster, specify its policies, define resources to be protected, define resource groups, that is, services to be protected. You may need to define some fencing support as well. I think that's something some of you may be interested in. If you are, please let me know.

That's the end of my presentation. And as an advertisement, here are some IBM-sponsored sessions you may want to attend.