Okay, since you're all here, we'd appreciate it if you... Let's see, here we go. Oh, sorry about that. Oh, this is great, right? This is a wonderful demo effect. Right at the minute you're going to start your presentation, that's when your presentation medium crashes. This happens, ladies and gentlemen, and I'm very sorry for it, but it does happen. So here we go again. And now what is that doing to me? Ah, now it's being nicer to me. I much appreciate that. Come on. There we go. We'd appreciate it if you checked in. This is something that takes you to Twitter, and if you take a quick picture of that with your mobile device, we'll let everyone else know you're here. If you haven't done that yet, you have about 15 seconds to do so, because we are under a little bit of time pressure. This is a very, very crammed tutorial, both attendance-wise and content-wise. We have instructions for you to follow along with this tutorial. Now, this says optional, and that's entirely accurate. It is completely optional for you to actually follow along with the steps here in this tutorial. All of this material is available for you on the web, and we're going to give you a link for that in just a moment. So here it is. That's another QR code there for you, and for those of you who are on a regular device, that is at goo.gl/Wk4d9n, capital W, lowercase k4d9n. And those are the instructions. You are entirely welcome to follow along not here, if you so choose, but instead at home or in your office, wherever you'd like. And Adam is going to walk you really, really quickly through the checklist here. Most of you have already grabbed the images, and for setting up your host-only networks and booting your admin node appliance and vagrant-upping the other VMs, Adam has just a few words for you, but the rest is all in that material. Oh, the script, yeah, sure. Here we go. Here's our script. That's the prep here. You're going to need VirtualBox. You want to import the cloud3admin.ova. We need a separate host-only network. That is the network with the 192.168.124.0/24 class C, and make sure that DHCP on this network is disabled. We're running into a little bit of a VirtualBox limitation here, so if you do have other existing host-only networks, to be safe, just delete them and recreate a new one. That's what we found to be the easiest. And like we said, we made these instructions available for you, partly so that you're able to replicate what you're seeing here at home or in your office, but we also did it so that you can follow along with this tutorial by looking at this stuff. So again, that's your direct link. And here's your quick checklist. That's basically what it amounts to. Say again, please. Yeah, sure, here you go. You want to snap that. I hope that's working for everyone. Google Goggles is really good that way. Question? What about the slides? Yes, I have the same thing for the slides in just a moment. The slides are on github.io, and I'm going to put that up in just a second. So the files, all of that stuff, are available on GitHub as well. They're KIWI and Vagrant images, so you can replicate that as well. So all of that is also available. Oh, and it would be absolutely wonderful if you could turn your mobile device to silent while you're here. That would be absolutely great. Okay, so that's your checklist. This is what this talk is about. We're talking about automated deployment of a highly available OpenStack cloud.
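For those of you who would rather script the VirtualBox prep than click through the GUI, it boils down to something like this with VBoxManage. This is a sketch only: the interface name vboxnet0 and the .1 host address are assumptions, so use whatever name the create step actually prints and the addressing from the attendee instructions.

    VBoxManage hostonlyif create                       # prints the new interface name, e.g. vboxnet0 (assumed below)
    VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.124.1 --netmask 255.255.255.0
    VBoxManage dhcpserver remove --ifname vboxnet0     # make sure DHCP is off, if VirtualBox created a server for it
    VBoxManage import cloud3admin.ova                  # import the pre-built admin node appliance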
And these are the slides, the rendered slides of this presentation. So that's exactly what you're seeing here in the background. That is all on github.io. You can also replace fghaas with aspiers, because then you get Adam's branch of this, but it's essentially the same thing. What you're going to learn... Go ahead. I'm going to put this up in... I'm going to keep this up for about a minute or so while I keep talking. So generally, what to expect from this tutorial, what you're going to learn here, is, and I'm going to give it about 10 more seconds, so take your pictures now. 10, 7, 6, 5, 4, 3, 2, 1. 4, 3, 2, 1.5, 1.25. Done. So what you're going to learn in this tutorial is, first, we're going to address the question of why exactly we want OpenStack HA, and specifically OpenStack infrastructure HA. So what are the infrastructure high availability options that we need, and what are the high availability requirements that we need to address within OpenStack? Since this is not meant to be a product pitch, we are going to walk you through the options that specific OpenStack vendors offer in terms of high availability. There is not an OpenStack distribution available today that doesn't address high availability in some shape or form, but they all happen to do it differently, and it's good to know how specific vendors do certain things, so we want to walk you through that. And then finally, we want to explain to you how SUSE Cloud does this specifically. But before that, we owe you a very quick introduction. So the person here in the front is Adam Spiers. Adam is a professional cellist who was classically trained from age zero, practically, and has recently diverged into jazz and tango. He's also a lapsed ex-semi-pro triathlete, and he happens to be an engineer at SUSE working on OpenStack and high availability technology. And this is Florian Haas, who, I guess, many of you probably know from pretty much every summit previously where there's been an HA session, Florian's been involved, very much a leading figure in the OpenStack HA community, and he authored pretty much all of the OpenStack HA guide. Co-author. Co-author. I had a fair amount of help. And he has a very patient family, I guess, because he travels an awful lot, and likes food, and founded and is CEO of hastexo, a professional services company specializing in cloud, virtualization, storage, and... things. Clever stuff. Right. Exactly. So why do we actually want high availability in OpenStack? This is actually a more controversial question than you'd think, if this is your first OpenStack summit. As Adam mentioned, I led one of the first high availability-related design summit sessions in San Francisco in the spring of 2012, and this question had not been settled. So there were people who basically said, well, we don't need to address high availability specifically in OpenStack, for the simple reason that everything's distributed. Right. Everything's shared-nothing. And whenever one of our components fails, we always have another one that can take over. Right. Well, actually, that's not quite the case. Many of you will be familiar with this little graphic. This is from Ken Pepple. It's been in various pieces of OpenStack documentation for a long time. It's basically an overview of the OpenStack architecture. And when you look at this simplified view of OpenStack, this has no Heat, this has no Trove, it's really, really simple.
But if we look at this simple graphic, out of those seven components that are mentioned here, we have no fewer than five, which means the majority of our services, that actually rely on shared infrastructure. And in particular, that shared infrastructure consists of our AMQP bus, which is either RabbitMQ or Qpid. So that's one example. This is what OpenStack uses to pass messages between services. Those messages are considered volatile and are expected to generally have a lifetime of 30 seconds or less. So we can't do without this. We need AMQP connectivity and AMQP communications for the majority of OpenStack services to actually work. But the only thing that we really need to care about is that the service as such is available, not so much what data is in the bus. Because what we have on the AMQP bus is inherently not stateful. All of OpenStack is basically built to resend messages when they get lost on the AMQP bus. And so generally we have to have one, but the data that's in there is certainly important, but it's not important that we actually replicate that data over multiple locations so that we still have a certain data set when one of our AMQP services fails. That is strikingly different from another problem domain, which is relational database management systems. And by default, most people will be using MySQL; some people use Postgres. But relational database management systems are where we store non-volatile, that is to say persistent, data in OpenStack. And now this is a bit trickier than the message bus problem, because as far as our RDBMS is concerned, we can't do without it, just like the message bus, but in this case the data that's in there is actually really, really important. It's stateful; we have to make sure that it is available to a backup service when the primary service fails, and we need to replicate, or have some other way of not only keeping our service available, but also keeping the data available that's in it. So what is it that infrastructure high availability actually does for us? On the one hand, the infrastructure HA bits in OpenStack ensure service availability. So we need to make sure that our critical services are running and are responsive. And for those services where it's relevant, we also need to make sure that we have data availability. So for stateful services, we additionally need to ensure that they can find their data where they need it, which may or may not involve replication. It may involve sharing of that data. It may involve replication. It may involve something else. So how do individual OpenStack vendors approach this? When we talk about that, we need to really talk about three different things. Number one, how do they deploy their OpenStack cloud in the first place? There are many approaches to that. And when we talk about what they use to deploy OpenStack infrastructure, that's also normally the deployment facility that they use for an HA manager, kind of obviously, because anything else would be brain-dead. The second thing that we need to look at is what specific vendors use for HA management. So what high availability manager or managers does the vendor support for ensuring service availability? And finally, how does a particular solution ensure state management and data availability? So that is where all the replication or shared storage management and so forth comes into play.
So, one of these, arguably the first distro vendor that put high availability somewhere on the map for a generic Linux distribution that supported OpenStack, was Ubuntu. They were probably the first established distro vendor to put HA on the agenda. Now, when we look at Ubuntu, what do they use for deployment? Well, they generally tend to favor Juju, Juju and/or MAAS. They use the Pacemaker high availability manager for management of highly available services. And they tend to focus on Ceph for replicated block storage, Ceph RBD to be precise. So that was sort of the first one. Then there is another vendor that cared about high availability, and still cares about high availability a lot, and that's Cisco. And Cisco basically said, oh, all of this Pacemaker stuff is way too complicated and complex, and the user experience and the UI are generally awful, and it's horrendously complicated. And they came up with an alternative solution that, were you to print it out on standard ISO A4 paper, would fill 57 pages of documentation. So much for reduced complexity. But what they do is, unlike Ubuntu, they prefer a deployment mechanism that's very well known and established, and that's Puppet. They focus on that very heavily. They use HAProxy and keepalived for the actual high availability. And as far as replication of data is concerned, they take an approach that is very much centered on application-specific replication. So for the MySQL database they would use Galera; for RabbitMQ replication they use synced and mirrored queues, that sort of thing. So it's all very much application-based; none of it would rely on something like shared storage or replicated storage for that purpose. We have another vendor that has HA relatively high on the agenda, and that's Piston. So that was one of the first OpenStack vendors to actually recognize that HA is important. You will, however, find that Piston is completely unlike the other platforms that we're discussing here, because all of what Piston does in terms of high availability is built into, or makes use of, basically their secret sauce, which they call the Moxie Runtime Environment. And then there is a vendor that was relatively late to the party as far as high availability is concerned, in fact even as far as OpenStack in general is concerned, but they certainly have caught up. And that's Red Hat. They only very recently went public with an HA solution. They also focus on Puppet for deployment, but they generally tend to advocate that you don't deploy with naked Puppet, but instead with the Foreman. They also make use of Pacemaker for high availability management. And as far as state replication of the database is concerned, they are very much focused on Galera. And then we're at SUSE. And in SUSE what we have is deployment with Crowbar. And we're going to get into these a little more precisely later on. So, Crowbar for deployment; Pacemaker and HAProxy for HA management, that is, Pacemaker for services that do not need load balancing, and Pacemaker managing HAProxy for services that do; and various bits and pieces, various solutions that are supported, related to shared storage and/or DRBD. So what exactly is this SUSE Cloud thing? Let's take a quick look at what that is all about. SUSE Cloud is SUSE's OpenStack-based cloud product. So there was never a SUSE Cloud that was not OpenStack. It is an OpenStack cloud deployment and management solution which includes SUSE packaging of OpenStack components and automated deployment and management facilities.
Generally, what you would expect in an OpenStack product. Its first release was called SUSE Cloud 1.0. It was based on OpenStack Essex. It was released in 2012. The second release was called SUSE Cloud 2.0, which was based on Grizzly, released in 2013. And since we had SUSE Cloud 1.0 and then SUSE Cloud 2.0, it follows logically that the next incarnation is just SUSE Cloud 3, without a zero, in the interest of continuity. This is based on Havana, and this was the first SUSE Cloud release that actually included HA support. It is, as you would expect, based on... It's based on SLES 11 SP3. I'm not a SUSE employee, so I get to say SLES. Their marketing department basically beats their guys into submission and always says you have to expand the acronym to SUSE Linux Enterprise Server. But I get to say SLES. Which is, of course, the latest release of SUSE's enterprise Linux product. So with all that said, SUSE Cloud may look just a little blurry to you still, and what we're going to do next is make that a little clearer for you. So, a very important concept in SUSE Cloud is the concept of node roles. In SUSE Cloud, deployment and management of services is centered on this concept, and the concept of node roles, by the way, is not unique to SUSE Cloud. It's a rather common method of abstracting node functionality. And generally, the node role types that we have in SUSE Cloud are these: there is an administration server, which runs the Crowbar components, but also a PXE boot server and TFTP server and so forth. Then we have the control node, and this is the thing that actually runs the OpenStack control services. And then we have essentially any number of compute and/or storage nodes that we can run here. Okay, so what about this Crowbar thing? Show of hands, please. Who's heard about Crowbar? A few. Okay, so for the rest of you there's going to be something new here. So what is Crowbar? Crowbar is a software deployment and automation framework that originally came out of Dell, and it has a very, very unique distinction out of all the software projects out there, open source or non-open source: Crowbar has by far the scariest mascot ever. If your small children do not get nightmares from this, you have been a terrible parent. I mean, look at the eyes. It's evil. The story behind this is, of course, it was originally sort of a naming discussion within Dell. They basically said, okay, we have this deployment framework but we have no name for it. Someone got fed up with it and basically said, well, shoot, for all I care it could be named purple fuzzy bunny. And purple fuzzy bunny is not a good project name, but it serves for a mascot. But why they had to add this horrible bow tie, these terrible switched-off eyes, and this melee weapon, I do not know. It is cute and fluffy. It is not cute and fluffy. It is the stuff of nightmares. Did you read Stephen King for bedtime stories when you were a kid? Something like that. So, Crowbar has individual application deployment units, which are called barclamps. And there were some interesting design goals and interesting design paradigms that were observed when the HA components were added to Crowbar.
Yeah, so the first thing that we decided in architecting this, in the early days, was we wanted obviously to be able to build a highly available cloud from scratch, but we also wanted to support people who already had a cloud out there, and it generally does not go down too well, it turns out, when you ask somebody to tear their whole cloud down and rebuild it from scratch. So we decided upgrade was a pretty key thing. And we wanted to be as flexible as possible in terms of where you put stuff. So we did not want to dictate that you have to have one cluster or three clusters; we wanted to leave that up to you and give you control over where you put stuff within those clusters and what the size of those clusters is. So we didn't want to be too opinionated, and we also wanted to allow growing the cluster later on, because obviously demands change over time and it's good to be able to scale out. And obviously, the whole point of this talk is that we wanted to automate this as much as possible and avoid the kind of pet situation which is the traditional approach to HA clusters with things like Pacemaker. You do end up with a bit of a pet-type scenario rather than cattle; it takes a lot of manual setup to build the cluster. So we wanted to avoid that, reduce complexity, and reduce the learning curve as well, because building clusters is a complicated thing. So we wanted to expose the options that need to be exposed for flexibility, and hide the rest. So the way we did that was by introducing a new barclamp for Pacemaker, and I suppose we'd better just quickly explain what barclamps are, right? So a barclamp in Crowbar is basically a deployment unit that lets you deploy something like, say, Keystone. It encapsulates the logic, the configuration management logic, for how to provision it. It encapsulates things like the interface for exposing the parameters that are available for that deployment, and it just makes the deployment of one component of something like OpenStack a nice, easy thing to deal with. So we built a new barclamp, as I said, called the Pacemaker barclamp, and this had two key functions, really. One was to provide library code for the other barclamps, so that, without being too intrusive to the existing code in the other barclamps, we could extend them and add HA support to them using this library code in a reusable fashion. And the second main function of this barclamp was to just automate the deployment of the base clusters, or cluster, if you choose to only deploy one. And that includes deployment of the cluster interfaces that let you manage the cluster, and in that we support interacting with the cluster through the web interface, through the command line, and there's also a desktop GTK native client for Linux. Alright, should we cut over? Yeah. If you bring up your Vagrant box, I'm sorry, not your Vagrant box, your OVA, your virtual appliance, and you boot that up, it has basically all the stuff already pre-installed. If you haven't been able to do that while we're talking here, while we're doing the intro and following the steps, don't worry about it. Like we said, the steps are fairly self-explanatory. You can do it now; you can do it later on. You may get more out of this when you're able to follow Adam's demo along, but this is entirely up to you. But when you do bring this thing up, it comes up with a Crowbar interface. That is password protected. The password is super, super, super secret, so please, it stays in this room, don't tell anyone.
You're going to log in as crowbar, and your password is going to be lowercase c, lowercase r, lowercase o, lowercase w, lowercase b, lowercase a, lowercase r. Okay. Top secret. And there we go. Do we want to save this password? Chrome is asking me. Well, it's pretty hard to remember. Yeah, it is. Well, actually, no. That would be way insecure. So, here we go. Yes? Yeah, the host-only network. So, very briefly, in the attendee instructions that you were able to grab earlier, there is something about a host-only network that you need to define in VirtualBox. If you have already defined a host-only network on your VirtualBox, please just remove it. You can always bring it back later. And the network configuration, the configuration for that host-only network, is also in your attendee instructions. This is basically a quirk, a VirtualBox quirk, that we need to work around. You may be aware of the fact that if you want a NAT interface, which is the only type of interface that lets VirtualBox actually talk to the Internet, that's always eth0. And for Crowbar to actually talk to the individual nodes, we have to have a separate interface. We can... Hang on a second. This was the thing that we showed in the very beginning, so let me quickly flip back to that. These are the attendee instructions, and here's that. So if you want to copy that down real quick, you can do so. Okay, so that's in your attendee instructions. Like we said, if you're not there up to this point, no worries. You can follow all of this from the comfort of your home or office. I'm sure many of you are exhausted from the summit, or maybe from the parties, I don't know. I mean, you will have plenty of time to duplicate this. Okay, now we need to go back to where we were. Adam, can you... So, the question was that there is no option to configure this network here. Just, can you do us a favor? Let's try this, okay? When you have a question, a general question for the audience, could you raise your right hand? And if you just need help with the setup, then raise your left hand. And then one of these friendly SUSE folks will stop by. Okay, if we actually have too many here, then I'm sorry, we're just not going to be able to cater to everyone, because otherwise this becomes a terrible mess. Okay? So it seems that if you're using a Mac, in the VirtualBox interface... I was in the wrong menu. Okay, depending on the version of VirtualBox you're using: I think in 4.2, at least on Linux, it kind of hides the NAT networking option and only exposes an option for host-only networking. And in 4.3, it exposes both in the interface. But you really want one NAT network, and one host-only network that has the IP address set correctly as in the guide and also has DHCP turned off. That's very critical. Otherwise it's not going to work. Okay, okay. So I'm going to start going through the interface of the various barclamps so you get a feel for what Crowbar looks like, what the various options are that are exposed, and the order that things get done in. But hopefully you'll see it's pretty simple. So the first thing we're going to do is set up the new cluster. And so we go to the Pacemaker barclamp for this. So this is the dropdown at the top. And so here are all our OpenStack barclamps, and they're ordered in a typical deployment order. So if you just go through these, starting at the top and working down, then you're probably going to be doing it right. So the first step is to create what, in Crowbar terminology,
is called a proposal, which is a slightly strange choice of word and actually ended up changing with Crowbar 2. But the idea is, it's a proposal for the thing you're about to deploy. So you set the configuration, and then you apply the proposal, and you end up with the deployment that you requested. And in the Pacemaker case, the name of the proposal corresponds to the cluster name. So we're just going to rename it to cluster1. Is there a mouse? Can I use this as a mouse? Oh, great. Oh, I can't. So we're going to create this proposal, and now here we can see the options that are exposed for creating the cluster. So the first critical option is how to deal with quorum. For those of you familiar with HA cluster theory, you need quorum so that the nodes within the cluster have a majority and there's no uncertainty when the cluster splits into partitions. And here's where we get to everyone's favorite acronym when they deal with clusters. So if you haven't heard of STONITH, it obviously stands for Shoot The Other Node In The Head. Which is a fine way of ensuring that a cluster node is in fact down when you think it is down. And there are multiple ways of doing this. Here, we are going to be configuring this with SBD, which is a, there we go, which is one way of doing fencing, which in this virtual setup has the added benefit of not relying on any out-of-band management like IPMI, which tends to be not available in a VirtualBox environment. So we're going to use that. Yeah, so actually, I need my notes up. So for those of you who've already got through the point where it says vagrant up, and it boots up the two controller nodes, you'll have, at the bottom here, two extra available nodes, and you can go ahead and drag those into the pacemaker-cluster-member box there, and also into the hawk-server one. And at that point you'll see they'll also appear in this list of node names here. So what we're going to do is, Adam is just going to bring those nodes up right now while I'm walking you through the configuration steps and what they mean and what they're good for. We do have a few slides for that as well. So, oops, sorry. Okay, so we need to set a specific configuration mode for STONITH. We configure that with STONITH Block Devices, or SBD. SBD in these images has actually been pre-configured on the /dev/sdc drive. And we're using DRBD for Postgres storage, and there you would also have other options, like for example SAN storage and so forth. Okay. Sorry, I forgot to say one thing. It's in the guide; just to emphasize: when your controller nodes come up, you can go to the nodes dropdown at the top of the screen, and you can click on bulk edit and just rename those nodes to controller one and two. Because by default, when the nodes come up, they register with Crowbar, and the primary MAC address, the MAC address of the primary interface, is taken as the host name. Obviously that's not a very friendly thing to deal with when you're allocating roles to nodes. So you just rename those, as described in the guide, to controller1 and controller2. If you do that before creating the cluster, then you'll see the names appear in the drag area at the bottom of the Pacemaker barclamp that I just showed you. Yeah. So if you want to connect to any of these machines directly, you can do that. Well, probably the simplest way is just to SSH to the IP address. So the IP address for the admin node, which was the first one you booted up, is 192.168.124.10.
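Incidentally, the proposal workflow you just saw in the web UI also maps to a crowbar CLI on the admin node. Roughly like this, as a sketch only: the commands follow the general Crowbar 1.x CLI pattern, so treat the exact syntax as an assumption and check the SUSE Cloud deployment guide.

    # on the admin node
    crowbar pacemaker proposal create cluster1   # create the proposal
    crowbar pacemaker proposal edit cluster1     # tweak attributes and role assignments
    crowbar pacemaker proposal commit cluster1   # apply it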
They're all on the same /24 subnet. So it's dot... oh, you mean the SSH. So the question was, what are the login credentials for the admin node? You mean the SSH credentials? Yes. So the SSH credentials are just root, password linux. But it's in the guide, I think. It should be in the guide. Yeah. Is there a question over here somewhere? Sorry? Okay. So, hands up, who's got the admin node booted up? Okay. And who's able to see the web interface? Okay, so it is working, right? So who has got to the point of doing a vagrant up for the controller nodes? Okay. And are they appearing in the Crowbar interface? Yep. Okay, cool. Yeah, so we originally wanted to provide you with a single OVA with all the VMs just pre-built and pre-registered, to save time. Unfortunately, it turns out that VirtualBox has a number of significant bugs when it comes to OVA export of multiple machines, especially when it comes down to shared disks, which we need. So we couldn't do that. So the compromise that we chose was to pre-build the admin node and then to build the other ones through Vagrant. You can actually, from the materials that we give you, do a vagrant up of all four. So you can build the admin node from scratch yourself, but we just provided you with the OVA to save time. So, the people who've managed to get their controller nodes up: have you been able to rename those in the bulk edit? Yep, okay. And then you're able to go to the Pacemaker barclamp and drag those into the... yes, okay, so it looks like people are getting there. It's going to take some time, because the vagrant up of the two controller nodes, and the compute node as well, does take some time to build those VMs from scratch and then to register them against Crowbar. Yeah, so the question was, in the bulk edit page, should you change the alias or the public name? Very important that you change the alias and don't touch the public name. The public name actually refers to entries in an external upstream DNS, effectively, which is used for other things. And obviously, in this scenario, it's a standalone thing; there is no upstream DNS, so things will fail later on. Great, okay. So, where's my mouse gone? Okay, so here are my two controller nodes that I was talking about, and there's also the compute node as well. So, I'm going to drag these in, the two controller nodes. So, pacemaker-cluster-member is one of the roles that we were talking about earlier. So, when you allocate the nodes to those roles, they become members of the Pacemaker cluster. And we're also going to allocate them the hawk-server role. Hawk is a web interface for looking into what's going on in the cluster. So, we're going to automatically install that, so we can have a closer look. So, the SSH credentials for the admin node are root, linux, and, as we said, the super secret web interface credentials from earlier are crowbar, crowbar. So, once you've dragged the controller nodes into those roles, there are per-node settings for SBD here. So, we have to tell the barclamp which block devices we're going to use for SBD, the fencing device, on each node. And these have been pre-set up. There's a shared disk that Vagrant will have set up for you automatically between the two controller nodes on /dev/sdc. Hang on a second. So, this is where... this is why I'm typing this in. /dev/sdc for the block devices. /dev/sdc, here we go. And where's our DRBD down here? Ah, here: prepare cluster for DRBD. True.
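If you're curious whether that fencing device is really set up, you can inspect it from either controller once it's booted. A quick sketch: sbd's dump subcommand reads the on-disk header, and the controller address placeholder below is whatever Crowbar shows you in the node dashboard.

    ssh root@<controller1>    # password: linux; substitute the controller's actual address
    sbd -d /dev/sdc dump      # prints the SBD header: slots and timeout settings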
And if you feel like it, you might also want to use the non-web GUI, hb_gui, for Pacemaker, if you're familiar with that and would like to do that. So, and off we go. Question there. If you have a Windows-specific issue with your VirtualBox, please see us later. Thank you. Maybe one of our glamorous assistants can help. I'm just going to press on with this, because we're running a bit behind. Yeah. Okay, so we can zoom that screen in a little bit, but we also have a few slides here. So, this is what you want to set. You want to set... oh, sorry. I was saying we could zoom in on the screen, but we also have a few slides for you here. So, you want to set SBD, which has been pre-configured for you on /dev/sdc, and you need to add that per controller. And this, by the way, is also all in the attendee instructions. You want to enable your cluster for DRBD for Postgres storage, and whether or not you want to install the Pacemaker GUI, yes or no, is essentially up to you. So, there we go. All right, let's go ahead. Okay, do that. And just for fun, we're also going to install the Linux native client, which is called hb_gui. That's this one here, making sure that I've got everything. Okay, then we hit apply, and at this point... Okay, here. So, while that is working in the background, let's talk really quickly about what's special about what SUSE Cloud does with Pacemaker and Crowbar. As Adam has already mentioned, it basically provides library code for individual OpenStack services to then make themselves highly available, which is kind of cool. There is a basic idea of usurping System V, or normally System V init-managed, services for Pacemaker. And then, of course, in Pacemaker we have things like maintenance mode to deal with restarts triggered by config changes. We can do migrations and whatnot. DRBD is used for replicating Postgres storage. HAProxy is used as a load balancer, and there is automatic cluster configuration. For those of you who hate high availability clustering because it's complex to set up, this is what takes that complexity away from you. So, this takes care of quorum setup. This takes care of setting up fencing, including protection from what we call a shootout at the cluster corral, which is two nodes trying to fence each other. And it also installs the appropriate UIs. So, in a way, it provides orchestration and synchronization of your services. There is flexible node allocation and the appropriate UI extensions, and you also get notifications. Now, with that, that's sort of the very basis of our high availability system, with Pacemaker. Now, that is still processing. So, the next thing that we're going to do, and as we said, we're basically going through these barclamps one by one. So, even if we happen to run out of time at the end here, no worries. You can always go back to the attendee instructions and run through that. At the end, you're going to have a fully deployed, highly available OpenStack cloud. So, the next thing that you're going to deploy, and if you already have the Pacemaker barclamp deployed, you can go straight ahead with that right now, is the database barclamp. So, again, under Barclamps, OpenStack, there is one for database. And what that does for you is, it installs Postgres in high availability mode. Again, there are certain things that you need to configure for this, which is the Postgres high availability mode. In this case, we're doing that with block device replication with DRBD.
And in here, in the database barclamp, you also have the ability to assign a size for the DRBD device that you are about to create. That's all. We're looking there. We're good. By the way, that size, one gigabyte, is very important, because in this demo environment we've only set up a shared block device that's just over two gigabytes, which has to have space for both the database LVM volume and the RabbitMQ one. So, definitely put one there. If you put anything else, then you'll run out of space on those. And also, if you want to take a look at what's going on behind the scenes when you click apply: we appreciate we're automating a lot of stuff here, and there's a lot of complexity being hidden. If you're curious about what's going on behind the scenes, what you can do is SSH to the admin node, again, that's ssh as root to 192.168.124.10, password linux, and then, if you go into the /var/log/crowbar/chef-client subdirectory, you will see logs that come automatically from the various nodes. They're collected onto the admin node. So, for a consolidated view, you could, for example, do a tail -f on both of those files from the two controller nodes, and you can see the stuff scrolling past, and you'll see a lot happening behind the scenes as it installs packages, lays down configuration files, stops and starts services, and so on. So, again, that path on the admin node is /var/log/crowbar/chef-client, and you'll see some log files in there, which should be of definite interest; that exposes all the chef-client runs that are happening. Okay, our network cable seems to have deserted us. Okay, here we go. Much better. So, do we... So, the next thing we want to do is set up our database and our RabbitMQ. So, let's go ahead and do that. So, back to the barclamps. Same as before, just create a new proposal. And now, here, by default, it's suggesting that we just deploy the database in a non-highly-available fashion, just using a single controller, controller1, that's this one on the right-hand side. But obviously, we want to deploy it in HA mode, so we're going to delete that, and then we just assign the cluster to the role, instead of assigning a single node. Okay, now, because we've done that, we see some new options appear, because in cluster mode, obviously, it's a more complicated setup. Where are you going to put the data for your database? So, we have a couple of options here. The shared storage, in this particular case, we've already mentioned. Oops. Okay, and... Here's the important part. That 50-gigabyte default, while great for an actual production setup, is not going to work too well in this virtual setup. I don't think minus three would work too well either. Yeah, so, let's make that one. And apply that. And for those of you who have ever deployed a database in high availability mode manually, this is kind of neat. You know, it's actually... Can we show the logs, is it? It's going past? Yeah. Yeah, it's a bit complicated, because we're running the stuff on one laptop here and we're presenting from another, so the network is a bit funky. Okay, so, next thing. We have a database, which is what we need for our stateful, non-volatile data. And the next thing is, we're going to need an AMQP service. And guess what? There's a barclamp for that. That's the RabbitMQ barclamp. And that, guess what, installs RabbitMQ in high availability mode. This also uses DRBD.
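Once these DRBD-backed proposals have applied, you can verify the replication from either controller node. A sketch, assuming the DRBD 8.x tooling that ships with SLES 11:

    ssh root@<controller1>    # password: linux; address as shown in the Crowbar node dashboard
    cat /proc/drbd            # connection state, roles, and sync progress per resource
    drbd-overview             # compact per-resource summary, if the utility is installed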
Now, replicating the queue state is actually somewhat optional with Icehouse, because in Icehouse, a lot of the AMQP code has been sufficiently cleaned up that it really no longer cares what's in the queue; if something drops on the floor, it just gets resent. In Havana, there were some limitations to that, so since SUSE Cloud 3 is Havana-based, it's actually a fairly good idea to synchronize the broker queue states. And one way of doing that is to just have the working directory that RabbitMQ uses also be replicated, whether on a block or on a file level. Now we're looking here. Okay, successfully applied proposal. That looks very nice. So we're going to go ahead and deploy our RabbitMQ barclamp. Okay, let's... What we'll do is, we'll go on to RabbitMQ and then I'll show you something else interesting. So it doesn't actually matter whether you apply Keystone or RabbitMQ first. Okay, same thing here. It's DRBD, one gigabyte, not minus three, and apply. And the next bit is going to be Keystone. There's a Keystone barclamp. Now, here's where we actually enter the realm of OpenStack scalability awesomeness. Because once we actually start talking about OpenStack API services, those are, in fact, inherently stateless. So all of their stateful information goes into a database, all of their volatile information goes onto an AMQP bus, and nothing else is stored in a stateful data store. So the only thing that we need to do here is to actually make sure that we have a Keystone service available. And that's why, or that's where, Pacemaker comes in here. Because Pacemaker, contrary to what many of you may have heard about it, is perfectly fine for managing a scaled-out service. In Pacemaker, we can do things like tell this thing to deploy X services of a specific type, and then make sure that we always have that many instances actually available. And then... With that... While that's applying, I'm going to show you Hawk very quickly, which is the... Another tab. There we go. What was the host name again? Let me type that for you while you talk about it. So Hawk is the web interface that provides a deeper look into what's happening in the cluster. There's a link to that from... if you go to Nodes at the top, the drop-down, go to the nodes dashboard and then just pick either of the controllers. And once you've got Hawk and your cluster deployed, you'll see a link from the node page that links to the Hawk web interface. And from there, you can get a view of your whole cluster. And that's a much more in-depth look. The Crowbar interface is kind of giving you a higher-level view of the whole deployment and the orchestration, whereas Hawk is giving you an in-depth look into a single cluster. So if you have multiple clusters, you have multiple endpoints for Hawk. Yeah, so that one's still... Oh, that one's done applying, so that's good. All right. Oh, that's... Can I please have an IP address for this box, because apparently your Avahi is doing strange things. All right, great. Okay. So... Sanity check. Who's got a cluster up and running? Okay, great. So I guess some of you maybe didn't get the files or have encountered other problems, but like we said, all this material will be available afterwards. We can answer any questions, fix any issues you may have had, or clarify things that maybe we didn't make clear in the script. Yeah, okay. Oops. There we go. What the hell? So, those of you who have the database and RabbitMQ applied, go ahead, do Keystone.
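While those proposals apply, here's roughly what the behind-the-scenes log watching we mentioned earlier looks like in practice (the individual log file names vary per node, hence the wildcard):

    ssh root@192.168.124.10            # admin node, password: linux
    cd /var/log/crowbar/chef-client
    tail -f *.log                      # consolidated view of the chef-client runs on all nodes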
Just keep working through the script. Most of these barclamps... we're going with mostly default options. There are just a few tweaks here and there, which are listed in the script. Oh, yeah. So the question is, what is the username and password for the Hawk HA web interface. That's the one that's hyperlinked from the nodes: when you go to the node dashboard and then to a node, there's a hyperlink in there for the Hawk web interface. And the username is hacluster and the password is crowbar. That should be in the script as well. Okay. And all the other barclamps are actually really simple. So there is a Glance barclamp, which, as you may have guessed, installs Glance under Pacemaker management. It has no specific settings to modify, nothing like that. The same is true for Cinder, which installs Cinder under Pacemaker management. For the purposes of this tutorial, we have set that up with the volume type set to local file, which means that your persistent volumes are actually going to be stored on your compute nodes. Again, that is just something that you would do here in this tutorial. But generally speaking, in a production environment you might be using LVM and iSCSI, you might be using Ceph, or whatever strikes your fancy as far as your volume storage is concerned. And another thing that is supported is having Cinder talk to your SAN interfaces, to your SAN storage. Do we have a more reliable connection now, so we can actually show that, or no? Okay. So let's quickly go through that. Here you go. All yours. Okay, so we can see from the little green bubbles that we've got the Pacemaker cluster laid down, and the database and RabbitMQ are running on top of it. Let's go to the dashboard. Here's the link that I was talking about, to the Hawk web interface. We've got a strange network setup here, so we can't show you that, but hopefully you can. So we're going to carry on deploying services. These are all default options here. Just drag the cluster in again. It's just suggesting, by default, a non-HA configuration, so we fix that by dragging the cluster in. Oh yeah, okay, here's Hawk. So, hacluster and crowbar, and it's pulling the status from the cluster. Here we go. So Hawk has various views. I'm going to try... can you zoom in on this so that it's a bit bigger? Yeah, great. Yeah, so just in case this was beginning to look like smoke and mirrors: it really is deploying services, and hopefully you can see that on your own machines as well. So in this particular view in Hawk, the first column is for the first controller node and the second for the second controller. So you can see we've basically got all the services up and running on both. Feel free to poke around with this interface. If you noticed, there was actually a clone set in there that said Keystone, and there was actually an HAProxy that load balances access to that Keystone, which is kind of nice, because normally, if you need to do this manually, or if you need to hack your own Puppet manifests for this, it can become fairly tedious; in comparison, just dragging a few node names onto a few node roles is actually pretty compelling. So here we go with Glance. We're adding our cluster1 to the glance-server role again. This basically makes Glance magically become highly available. That's another thing that's kind of nice. So, Vagrant has automatically set up sharing of the local disk between the VMs. No, it hasn't. It's NFS on the admin node.
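Incidentally, if you'd rather watch the cluster from a terminal than from Hawk, the standard Pacemaker command-line tools are available on the controllers too. A quick sketch, with the controller address being whatever the Crowbar node dashboard shows:

    ssh root@<controller1>    # password: linux
    crm_mon -1                # one-shot snapshot of cluster nodes and resources
    crm status                # the crmsh equivalent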
And while that's deploying, let's switch back to here; we should be able to see some live updates as it deploys that. So you can see that everything prefixed with VIP is obviously a floating IP, and there's one on the admin network. So Crowbar is network-aware: there's an admin network and a public network, and it's created a virtual floating IP on both. That's used as the HAProxy front end. HAProxy itself is in the cluster, so that's highly available. So, the question was how you get to that page: from the nodes drop-down up here, if you go to the dashboard and select a node, then there's a link from there to it. I think we're probably going to run out of time and not be able to deploy all the services, but of course you can go ahead and do that on your own laptops afterwards. There we go. So that was Glance, and now we're just going to add the final barclamps here, for Cinder, Neutron, Nova, and Horizon. As you go through the attendee instructions, do note that you're obviously deploying your Nova compute services not to your controller nodes but to your compute nodes. Cinder goes to cluster1, so your controllers; that is, the Cinder controller role. Cinder-volume goes to one of the compute nodes, and we're using the local file backend here. So there are various views in Hawk, different ways you can look at this. So, for example, if you click on this, you can see all the started resources and see where each is started. Some resources are master/slave: the Postgres and RabbitMQ ones are master/slave because of DRBD. And we are going to switch here to the Neutron barclamp, which, as you would expect, basically installs the Neutron server, the Neutron API service, under Pacemaker management, and there is also a specific Pacemaker resource agent to make the Neutron L3 agent highly available and put it under Pacemaker management.
You can select various networking plugins. In this tutorial, we recommend you use the Open vSwitch plugin; there is also a Linux Bridge plugin, but Open vSwitch is the one that we have in the attendee instructions. And then there is a Nova barclamp, which actually installs our compute infrastructure. So it installs the Nova API and Nova scheduler services under Pacemaker management, and then deploys nova-compute to the compute nodes. All of that is also in your attendee notes. And then, finally, there is a Horizon barclamp, which installs the Horizon dashboard, also as a highly available, load-balanced service for you. If you set up this cluster in your hotel room tonight, or at home, or at your office, and you want to test your high availability, then you can retrieve your Horizon URL from Crowbar; there is an OpenStack dashboard URL that you can get to. You log in as admin and crowbar. And this is a standard Horizon dashboard, as you would generally expect: you select the openstack project, aka the openstack tenant, which is the default tenant that is being installed, and use that dashboard as you normally would. And then you can do bad things to services. For example, you could pkill your openstack-keystone on your controller node, or do the same thing to your openstack-glance. That kills your service on one of your nodes; it will seamlessly fail over and magically become available on the other node. Your OpenStack dashboard will typically not even have a hiccup, and will be happy. And while you do that, there are two things you can use to watch what happens in terms of failover. One of those things is Hawk, which we already showed you, which will show you that failover process. There is also a command-line utility, if you are more comfortable with that. This is something you would execute on the controller node; it is called crm_mon. crm_mon is simply an ncurses interface that also shows you the state of your cluster, and you will then see, okay, this monitored service has failed and has been recovered in place. You can do bad things to services in an HA cluster; you can also do bad things to nodes. You can, for example, do one of these on your controller nodes: you can do a poweroff -f, which basically kills a node immediately, or you can use the sysrq trigger, an echo o or echo b into /proc/sysrq-trigger, or whatever you prefer. In a VirtualBox environment, you can of course also take the machine and say shut down, power off the machine. And then, again, you'll see, either in crm_mon on the command line or in Hawk, that the failed node is being detected, usually in a matter of a few seconds; your services fail over, and everything continues to be available and continues to be hunky-dory. So with that, we are going to wrap up this tutorial with a quick summary. So, what you learned today: we gave you a little bit of info on the motivation behind OpenStack HA. Recall that not everything is cinnamon rolls and sunshine in terms of OpenStack; there are certain services that do rely on a shared infrastructure service, or rely on shared state, and for those we do have to think about high availability. And even for the OpenStack services themselves, what we have is the ability to load balance across them; what we don't have built into OpenStack is automatic service recovery, or the ability to say we always want X many services in a specific OpenStack cloud available at any given time. We summarized various vendors' approaches to OpenStack HA, Ubuntu, Piston, Cisco, Red Hat, and SUSE, and then gave
you an overview of SUSE Cloud HA. Now, as we said, please, by all means, feel free to continue to peruse the material we made available to you. That is both the slides and the attendee instructions, and of course you're also free to use your OVAs and your Vagrant boxes. All of this stuff is also on GitHub, and that includes the Vagrant definitions, and these SUSE images, these SLES images, were all built with KIWI, and all of what you need for that is up there as well. I'm going to put this back up in just a moment. I should add that the slides that Adam and I put together are all under CC BY-SA, so if you want to reuse any of these, feel free to do so. And again, that is the link to the material. With that, we're just about out of time. The next talk here is at 3. Thank you very, very much for coming, and thank you for your interest in this talk. Enjoy the rest of the conference, and for those of you hanging around for the remaining design summit sessions tomorrow, do enjoy those as well. And as always in OpenStack, remember: experiment, collaborate, contribute. That's how it becomes more and more awesome. Thank you for your time; see you soon. For those of you who have additional questions about your specific setups, we have another 5 to 10 minutes until the next speaker arrives, so if you have any questions, please raise your hands and we'll be happy to come to you and help you out. Thank you.