We're going to get started. This presentation has become a tradition of sorts at the OpenStack Summit; I think it's the third time that we're doing this, so we're calling it the High Availability Update. As we can see from the turnout here today, we have a significant portion of the OpenStack user community that is very interested in high availability and in following the progress that OpenStack is making in the high availability field. What we would like to present to you today is a brief overview of the recent changes, additions, and improvements made to OpenStack in the recently released Havana release, and a bit of an outlook as to what we can hope for in terms of high availability in Icehouse. And here is where this thing is failing us. Carry on, I'll fix this.

So, as we heard yesterday in the keynote, two-thirds of the people here are new, so you must be asking: who are these guys? Let me introduce you to Florian Haas. He is the CEO and principal consultant at hastexo, and he is one of the most sought-after high-availability experts in the OpenStack space. Whoops, that was too quick. No, I got it, it's fine, thank you. Take your hands off the keyboard. Good.

And this is Syed Armani. Syed works for us out of Delhi, India. He joined us this year, and I'm very happy to have hired him, because I think he is one of the most brilliant minds in the Indian OpenStack community and we're very lucky to have him. He is with us as a senior consultant, and he has gotten his hands dirty on a number of highly available OpenStack projects this year, so I'm very happy to have him with me on stage.

This is exactly what we do not want, right? This is the kind of thing that we want to protect against. At a certain scale, some failure is inevitable; at scale, something will always fail. So the real challenge for high availability in OpenStack, or in any cloud platform really, is: how do we make sure
that an individual hardware or software failure does not affect our user-facing services, does not generate downtime for our user-facing services? It is ultimately high availability that gets us there.

What we want to start with today, as always, as we're doing here for the third time at this summit, is high availability at the OpenStack infrastructure layer and the changes there. The infrastructure layer consists of those services which underpin an OpenStack cloud: some of these services are part of OpenStack itself, and some are services that we merely consume, like the database and the AMQP broker. So infrastructure HA means high availability for things like the API services and the AMQP broker, RabbitMQ for example, and of course that includes the OpenStack API services themselves.

We're really talking about two different categories of services. On the one hand, there are services that are essential for OpenStack to function but are not part of the OpenStack code base proper. Examples are our relational database management system (most people use MySQL, but PostgreSQL and other backends are also supported) and the AMQP service (most people will be using RabbitMQ; Qpid is also supported, and ZeroMQ has some support for specific OpenStack services). On the other hand, we're also talking about those services that underpin an OpenStack cloud and are part of the OpenStack core code base, such as the API services, the controller services, and so on. We talk about infrastructure high availability specifically to distinguish it from high availability of OpenStack guests, that is, OpenStack virtual machines; that is a separate topic that we're going to get to near the end of the talk.

Here's a change for those of you who may have been sitting in the predecessor of this presentation in Portland. When we talked about this in Portland, we had five different node types that we needed to consider for high availability purposes. Since then, something important has happened: we have two new integrated OpenStack projects. We've got Heat, which is OpenStack orchestration, and we have Ceilometer, which is the OpenStack metering and monitoring service that we can eventually use for billing, alerting, and so on. So rather than five different node types, with two new services we've got seven, and the two new ones are at the bottom of the list. I'm going to go through the full list really quickly.

We typically have some form of cloud controller node that runs services like, for example, our Glance registry or our Nova scheduler, and other scheduling and registration services. We've got our API node, which runs one, or typically several, of the OpenStack RESTful APIs; those are essentially clever web servers that interface with the relevant OpenStack components. We've got the network nodes: the node type that ensures connectivity, not only between the individual guests, but also connectivity of the entire cloud and its tenant networks to the outside world, to the public internet. We have compute nodes; I guess that's a no-brainer, as any cloud environment needs nodes that host hypervisors and manage and run virtual machines. We've got our storage controller nodes: that's a special one that manages the cinder-volume and cinder-scheduler services, so anything in Cinder that is not API and not the actual storage backend belongs on a storage controller. And then there are the two new ones. From Ceilometer we get a new node type called the metering node, which runs, for example, the Ceilometer central agent; it is responsible for collecting various counters and gauges from various bits and pieces of OpenStack. And then we've got our orchestration node, which runs the Heat engine: the component of Heat that is actually responsible for implementing stacks as per the definitions in either Heat Orchestration Templates or AWS CloudFormation templates, depending on which template DSL we want to use.

So out of these seven, five are old and two are new, as a result of ongoing development and the two newly integrated projects that just had their first official release with Havana three weeks ago. The good thing about this is that for all of these we can essentially use the same high availability stack; we only need to configure that stack differently based on the node type. For some of these it makes perfect sense to have more than two instances of a specific service: the API nodes, for example, can handle two or three or four or five different instances for scale-out if we need that. For others it makes more sense to deploy in an active/passive configuration.

The high availability stack that we're talking about here is the Pacemaker stack. Pacemaker is sort of the default Linux HA stack, and has been exactly that for the last decade or so. We have reference configurations for Pacemaker OpenStack HA covering nearly all the components, and we have resource agents written by Martin. These are basically cookie cutters: you just copy and paste these reference configurations to your systems, and you can make all your services highly available. And this is one such example, cloud controller HA: you have this cloud controller node running your messaging service, AMQP, so you could run RabbitMQ or you could run Qpid, and then there's your database, all of which is managed by a Pacemaker cluster, and you can do active/passive failover for that.
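To give a flavor of what such a configuration looks like, here is a minimal sketch in crm shell syntax. To be clear, this is not one of the published reference configurations: the resource names, the IP address, and the exact choice of resource agents are illustrative assumptions only.

    # Minimal Pacemaker sketch for an active/passive cloud controller
    # (illustrative names and address; see the reference configurations
    # for the real thing)
    crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip=192.168.42.100 cidr_netmask=24 \
        op monitor interval=30s
    crm configure primitive p_mysql ocf:heartbeat:mysql \
        op monitor interval=30s timeout=60s
    crm configure primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
        op monitor interval=30s timeout=60s
    # Grouping makes the VIP, database and broker fail over together
    crm configure group g_controller p_vip p_mysql p_rabbitmq

Pacemaker then monitors each resource, tries to restart it in place on failure, and moves the whole group to the surviving node if the active node dies.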
Now, one thing that we want to get into in a little more detail is database high availability. There we've seen a change, not so much in OpenStack itself, but in the database replication technology that is becoming more and more popular among real-world OpenStack users interested in HA for their relational database. For MySQL, that is Galera. Galera is a multi-master replication facility for the MySQL database, implementing a technology that it refers to as write-set replication, or wsrep, and it is becoming the de facto standard for synchronous multi-master replication for MySQL. In contrast to legacy options, such as tossing the MySQL database onto DRBD (the Distributed Replicated Block Device, a kernel-level block replication technology), we can extend our high availability across more than two nodes; in fact, in Galera we have to have a minimum of three nodes, or two nodes plus an arbitration daemon, to ensure replication and high availability.

The good thing here is that we can use it both for high availability and, to a certain extent, for load balancing, simply because when we are replicating MySQL with Galera, we can read from all of the database nodes at any given time. So we have a sort of natural read scale-out, and there is also a certain amount of multi-master write capability, meaning there is the possibility to write to multiple masters at the same time. However, in order to ensure consistency, Galera must of course take care of conflicts internally, which means that inevitably, if we've got a really busy application hammering a dozen different Galera nodes at the same time, some of those transactions will fail and will need to be rolled back because of conflicts, and that generally tends to hurt our average database performance. So what most people do with Galera is follow a single-writer, multiple-reader approach.

Another advantage of this solution over solutions that operate on block replication technology is that at any time, the database on all of these Galera replication nodes is actually consistent. So on failover, there is no recovery necessary for a quote-unquote crashed database; instead, it's completely consistent, we can fail over directly and immediately start using the database. That's nice because it significantly reduces our failover times, and it therefore contributes to overall availability.
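For a rough idea of what a Galera node's configuration involves, here is a minimal sketch. The cluster name, the node addresses, and the provider library path are illustrative assumptions; the path in particular varies by distribution.

    # Minimal Galera/wsrep sketch (illustrative addresses and paths)
    cat > /etc/mysql/conf.d/galera.cnf <<'EOF'
    [mysqld]
    binlog_format = ROW
    default_storage_engine = InnoDB
    innodb_autoinc_lock_mode = 2
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name = openstack
    # All cluster members; the first node bootstraps with an empty gcomm://
    wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
    EOF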
RabbitMQ. RabbitMQ... a question? The question was whether I have experience using Postgres. Yes, I have a lot of experience using Postgres in synchronous replication mode, but not with OpenStack clouds specifically. Postgres, as of version 9.1, has a synchronous replication mode that you can use relatively similarly to Galera. I would say that its user base is much smaller than that of MySQL with Galera, and at least in my experience, that is particularly true for people running a database for OpenStack. So by all means, yes, it should work, but I can't offer any specific hands-on experience with Postgres synchronous replication and OpenStack in that combination. If there is someone in the room that can, please raise your hand now; I'll be happy to have a discussion afterward. Do we have a Postgres-for-OpenStack expert in the room, by any chance? Nope. Then that makes it a question for the mailing list, or for IRC (there is plenty of help around), or for a hallway discussion.

So, RabbitMQ is the AMQP implementation that most OpenStack deployments are using, and you can make RabbitMQ highly available in two ways: first, you can have active/passive clusters of RabbitMQ running, and second, you can have mirrored queues. The pain point with mirrored queues is consistency issues: on some nodes you will see duplicate messages, and on some nodes you will find that a message hasn't arrived. And in the case of an active/passive cluster, you must have the same Erlang cookie on every node that runs your AMQP broker. Kombu, by the way, is the Python library that OpenStack uses to talk to RabbitMQ.
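As a sketch of the mirrored-queues option (RabbitMQ 3.x policy syntax; the policy name and the catch-all pattern here are our own illustrative choices):

    # Mirror every queue across all nodes of a RabbitMQ 3.x cluster
    # (policy name and pattern are illustrative)
    rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'
    # Clustering, mirrored or not, requires an identical Erlang cookie
    # on every node, typically /var/lib/rabbitmq/.erlang.cookie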
Glance multiple image locations: Glance has implemented this functionality in Havana, so that you can provide alternate locations for downloading an image. You can provide two URLs, so in case Glance fails to download the image from one location, it can fail over to the alternate location.

And there is now a Cinder backend driver for Glance. What does that do? If your Cinder backend is highly available, say it's RBD or something similar, and Glance is storing images in it, then your image storage automatically becomes highly available as well.
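A minimal sketch of the configuration side of this, using the openstack-config helper; the option name below matches the Havana era, but treat it as an assumption and verify it against your release's documentation:

    # Allow an image to carry more than one location, so a client can
    # fall back to an alternate URL when a download fails (Havana-era
    # option name; verify against your release)
    openstack-config --set /etc/glance/glance-api.conf \
        DEFAULT show_multiple_locations True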
We had a question at the back. The question was: are there any comments on ZeroMQ as a message broker service? Yes, absolutely. I would love for ZeroMQ to be completely supported and fully available, because if you have a completely peer-to-peer, brokerless message bus, then you don't need to worry about failing over the broker state or synchronizing the broker state. Is Eric in the room, Eric Windisch? He was here earlier; yeah, he seems to have left. He is the guy that wrote most of the ZeroMQ implementation in OpenStack originally. ZeroMQ currently has known issues, in the sense that it is not entirely clear whether they've been completely eradicated, with Neutron, specifically the Neutron DHCP agent. And apparently the sad state of affairs there is that no one has really made the effort to completely stamp out those bugs. So that is unfortunately sort of unchanged from Grizzly. If your cloud infrastructure happens to use nova-network, you will be unaffected by this, because this is a Neutron-specific issue. But unfortunately, ZeroMQ support is not quite where we would love it to be, because from an HA perspective, having a brokerless peer-to-peer messaging service would solve a lot of issues in one fell swoop. That would be great.

Okay, we were here; we have covered this; next. And here is sort of a standard example of API node HA. The nice thing about the OpenStack API services is that all of them are essentially locally stateless; that is to say, they don't write any local data to the node that they are running on. Pretty much all of the data that they need to share with other services that is in some way volatile, that is to say, has a lifetime of about 30 seconds or less, typically goes onto the message bus, into the AMQP queues, and anything that needs to be persistent goes into the relational database. There is very little that these API services actually store locally. So in the end, what that means in terms of making these services highly available is that pretty much the only thing a high availability manager needs to worry about is keeping the services themselves highly available: that is to say, it keeps tabs on whether a specific API service is running. If it happens to not run, because it has crashed or it has run into a problem, what the HA manager, Pacemaker, will do is attempt to restart and recover the resource in place. If that fails, then we can simply fail over to another node.

I saw a hand at the back. So the question was: you're relying on layer-2 VIP failover, versus load balancing? It is actually layer-3 failover, because you're failing over IP addresses, not MAC addresses, but you announce the change via an ARP broadcast, yes. Why that and not load balancing? Because a load balancer like HAProxy is great for load balancing, but if you have a service that is actually inherently active/passive, such as, for example, highly available RabbitMQ without mirrored queues, then the load balancing service itself doesn't help you that much. Instead, you can simply fail over the virtual IP address, and that's it, right? Sure, that's another option. There's an old Perl saying, "there's more than one way to do it", which also applies in Python-based projects.

Yes, so the question was: in the real world, everyone does load balancing; what's the point of using Pacemaker these days? For this case, I'll dispute that in the real world everyone does load balancing, simply because I've seen plenty of projects that use exactly this approach. The question for many people that we've worked with has been: if I need failover for a set of services, and one set of those services mandates one failover technology, like Pacemaker, then I can just go ahead and use that same technology for the other services as well, where it is also good enough. If, for example, you were building a high availability setup that is entirely based on HAProxy (I don't know how you would do that across the board in OpenStack, but if you did), then it would be perfectly fine to say, I'm going to use HAProxy for this as well, or ldirectord, or whatever you like. But since you're already deploying Pacemaker, why not use it for this as well? That makes the whole system a little more streamlined.

And I saw another hand here. So the question was about authentication tokens, Keystone authentication tokens: how are those synchronized? It's actually an excellent question. One of the things that happened in the Grizzly release is that we moved, by default, from UUID token authentication to PKI-based authentication. Making that failover work in Keystone (and this is exclusively about the Keystone service, rather than the other API services) requires a certain amount of extra work in my PKI infrastructure. It requires no extra work if you go back to the UUID-based tokens that we previously had. I'm afraid we don't have the time to go into this in much more detail, but please find us afterward if you want to explore this further. I saw another hand up in the front; I would like for us, if possible, to keep rolling, because we do need to take the schedule into account, and we'll be happy to take the questions later. I hope that's okay with everyone. Okay.
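For reference, reverting to UUID tokens was a one-line configuration change in the Grizzly/Havana era. This is a sketch with era-specific option names (later releases renamed this setting), again using the openstack-config helper:

    # Grizzly/Havana-era setting: use UUID tokens instead of PKI, which
    # avoids extra PKI setup on Keystone failover (this option was
    # renamed in later releases)
    openstack-config --set /etc/keystone/keystone.conf \
        signing token_format UUID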
So, there are interesting developments that have happened in compute, that's Nova. In Grizzly we had nova evacuate, so we could evacuate an instance from a dead host to another compute host. And the important part there was: yes, we could only evacuate a specific instance, one instance at a time, and you can only do it from a dead compute node; that is still the case right now. But what happened in Havana is that we got a new feature, an extended version of evacuate: we moved to host evacuation. Now you can evacuate an entire dead host, all its virtual machines, to other compute nodes, which is much more user-friendly than having to evacuate, say, a hundred guests one by one, if you can just evacuate the host.

Still, the same issue applies that I mentioned in my Portland talk, for those of you who have seen it on YouTube: the term "host evacuation" is a bit of a misnomer. If your city is under threat from an incoming typhoon, for example, what your civil defense agency would most likely do is initiate the evacuation before the storm hits, right, and not when the city is already leveled. Whereas what nova host-evacuate really does is let you evacuate only a host that is already down. So in other words, the typhoon has already struck, and now you are getting the people out. That's a bit unfortunate in terms of naming.

Okay, and one thing that we of course would like to see eventually is the ability to evacuate while a host is acting up, or during scheduled maintenance, or because we want to consolidate guests so that we can shut off nodes and save energy. We would love to see that eventually, and there's actually some pretty interesting work happening at SUSE at this time. There is a blog post from Adam Spiers from SUSE where he goes into these issues, and host evacuation, in a lot of detail. Do you happen to have the title of the post handy? All right: his blog is called Structured Procrastination. If you Google for that, you will find it easily, and it has a very interesting recent article on the problem of host evacuation in OpenStack.
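On the CLI side, this is roughly what the two operations look like with a Havana-era python-novaclient; the instance and host names here are invented:

    # Grizzly and later: evacuate one instance off a failed host
    # (instance and host names are illustrative)
    nova evacuate --on-shared-storage web-vm-01 compute-07

    # New in Havana: evacuate every instance on a dead host in one call
    nova host-evacuate --on-shared-storage compute-03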
That is an excellent question. The question was: for host evacuation, how do you detect that the host has in fact failed? We currently have no automated facility in OpenStack for doing that, so that is the next step. This is an iterative process: we're adding features to OpenStack as we go, and what you have identified is exactly correct. The next thing that we need here is a facility to detect automatically that a host is down, and then we can evacuate it. As of Havana, that is not in OpenStack, but some important stepping stones have already been built for this functionality.

And there is another important stepping stone, which doesn't seem to do a whole lot of good for us by itself, but lays the groundwork for a fair amount of future work. This is another important addition to Nova: a new scheduler call. It is very important because it facilitates a lot of features for guest high availability. Earlier, Nova used to call the scheduler to find a host on which to spawn an instance. Now a new call has been added whereby Nova calls the conductor, and the conductor queries the scheduler to give it a list of hosts; it then decides on which host it wants to spawn the instance, and in the case of a failure, it can also decide where to respawn the instance. So right now, the only thing that we have is this facility to basically ask the scheduler: where would you place this guest? And that, of course, is something that comes in extremely handy once we move to automatic host evacuation, because then what you really want to do is say: I have this list of, say, 20 virtual machines, 20 Nova guests, that we now need to mark as failed because the compute node that hosted them has failed; and now we need to make sure that we don't overload other nodes, and we still follow our scheduler hints, and we follow our host aggregates, and everything else that goes into the scheduling decision of where to place a specific VM. That is an important stepping stone, because this feature will eventually enable us to actually ask: dear Nova, I have a list of 20 virtual machines that I now need to place anew on different hosts; tell me where you would place them. It is also very important because it will enable implementing a blueprint to find a host and evacuate: relocate an instance, rebuild it on some other compute node, with the scheduling of the rebuilt instances onto compute hosts handled automatically.

Okay, and this is something where, again, the situation is kind of unchanged. And I'm sorry, we're not going to be able to take any more questions, otherwise we're going to ruin the schedule here, so we'll be happy to take them later. We still currently don't have any evacuation support in Horizon, the OpenStack dashboard. For those of you who are new to OpenStack, that's not necessarily a surprise: any new feature first goes into the API, then into the CLI tools, and ultimately into the OpenStack dashboard. But nova evacuate, or nova host-evacuate, support is not in the dashboard at this time.

Networking: OpenStack Networking, that's Neutron. In Neutron we have similar active/passive failover clusters; you can use the same scripts to make your Neutron services highly available. The DHCP agent is already highly available, and the L3 agent as well. But still, compared to nova-network, we don't have the multi-host feature. For those of you not so familiar with nova-network: multi-host is the ability to run specific network services on the compute nodes themselves, and duplicating that functionality has been one of the most sought-after features in Neutron pretty much since its inception. We're still not quite there yet, but we're making progress in other fields. There is going to be a design summit session on improving the L3 agent. What we are going to discuss is how to make these virtual routers highly available by using keepalived and conntrackd and VRRP: essentially pairing the virtual routers, so that if one goes down, you can fail over to another router. Currently, with the L3 agent, you have a scheduler that can spawn multiple L3 agents on multiple nodes, but if one of the nodes goes down, all the virtual machines connected to that router go down; that means you lose connectivity to all those virtual machines. So we are going to have discussions on that.

Okay, my favorite topic: storage. There have been a few new things happening in the storage space, maybe not really spectacular, but kind of nice from the HA perspective. For example, we now have additional volume options in Cinder that we can use for the GlusterFS and NFS backends, which means that if a specific host that we connect to, for example a GlusterFS host or an NFS host, is unavailable, we can instead try a list of other hosts to connect to. For those of you familiar with GlusterFS: it is completely unimportant which GlusterFS server I connect to, as I can download the volfile from any of them, and then I'll be aware of the entire cluster. So we have these additional volume options for GlusterFS and NFS. There are other backends where that functionality had already been built in, such as, for example, the Cinder RBD backend using Ceph, because in RBD it is a standard feature that we can give a client a list of multiple Ceph monitor servers, Ceph MONs, and if one of them happens to be inaccessible, it will try the others in sequence until it finally finds one that works.
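A sketch of both ideas follows; the share names, server names, and monitor addresses are invented, and the exact per-share mount-option syntax should be checked against your Cinder release:

    # Cinder GlusterFS backend: one share per line, with mount options
    # naming fallback volfile servers (illustrative names and syntax)
    cat > /etc/cinder/glusterfs_shares <<'EOF'
    gluster1:/cinder-volumes -o backup-volfile-servers=gluster2:gluster3
    EOF

    # Ceph RBD has had the equivalent all along: ceph.conf lists several
    # monitors, and clients try them in sequence, for example
    #   mon_host = 10.0.0.1,10.0.0.2,10.0.0.3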
Global cluster support in Swift is something that most of you will have heard about in the keynote from Dan Wilson from Concur. This is a really, really important feature, not so much in the HA space, but in the DR space, the disaster recovery space. Having the ability to asynchronously replicate across geographies in Swift is a really helpful feature for very many users, as is evident from Dan's keynote.

Stuff that has happened in Heat: a lot of stuff has happened in Heat; it's a new project and a newly released project, but we're going to talk very briefly about the HA features in Heat. Heat has a certain amount of HA capability built in. There is a Heat resource called HARestarter. What we're able to do with that is designate individual services running inside an instance that Heat can restart for us when they fail, and we can define high availability for individual instances as well, in the definition of a Heat stack. Beside the fact that in Heat we can also do things like auto-scaling, adding new guests, new Nova instances, as we hit capacity limits, what we can't do as yet is define high availability for full stacks. So we don't have a way to say in the Heat template: make this entire stack highly available.
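A minimal sketch of the per-instance case, in Havana-era HOT syntax: the image and flavor are placeholders, and in a real template the restarter would be wired to an alarm or watch that triggers it, so treat the whole thing as illustrative.

    # Havana-era Heat sketch: an instance plus an HARestarter that can
    # re-create it (placeholders throughout; a real template would add
    # an alarm that signals the restarter)
    cat > ha-stack.yaml <<'EOF'
    heat_template_version: 2013-05-23
    resources:
      my_server:
        type: OS::Nova::Server
        properties:
          image: cirros       # placeholder
          flavor: m1.small    # placeholder
      my_restarter:
        type: OS::Heat::HARestarter
        properties:
          InstanceId: { get_resource: my_server }
    EOF
    heat stack-create -f ha-stack.yaml my-ha-stack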
Then there is the new project Ceilometer, which provides metering services. It runs a lot of agents on different types of nodes, like polling agents on the compute nodes, and then, on your cloud controller node, it has a central agent. The central agent is the agent that collects a lot of metering data from your controller node. The problem with the central agent is that you cannot make it highly available in active/active mode; you can make it highly available only in active/passive mode. The reason behind that is simple: if you were to make it highly available in active/active mode, there would be redundant metering messages. So you can only run it in active/passive mode, so that you don't have those redundant messages; otherwise, you basically get duplicate ticks in your counters and gauges.

OpenStack is extremely vibrant and, as you saw in Brian's and Mark's keynote this morning, it is an extremely, insanely fast-growing project, with thousands of developers and new features added pretty much on a daily basis. So one thing that we can guarantee you: there is almost certainly stuff in the HA space that we have omitted, simply because there's stuff out there that is so new that we don't even know about it yet, because it's perhaps just in a developer's head; and, more importantly and more likely, it's just impossible to fit everything into a 40-minute talk. That being said, we'll be very happy to field additional questions during the break. If you are willing to forgo your lunch, we'll be very honored to help you with that.

If you liked this talk, it would be great if you could send us a tweet about it. These slides are available under the CC BY-SA 3.0 license; you may use them and reuse them for any purpose you wish, as long as you quote and attribute the original source. The sources are in the slides' GitHub repository, and here's a link to the slides on our website, which is also the link that you're going to find on the page where you can download the OpenStack Summit slides. With that, we thank you very much for your kind attention. Thank you for this great turnout, thanks for coming, and enjoy the rest of the summit.