So, for anyone who has the files already: if you go to this web page, it will explain how you can kick off the build of the Vagrant environment, and if you're ready to do so, I'd recommend that you start right now, because it's a time-consuming process. The sooner you start, the more likely you are to get the whole way through the exercise and reach the stage of being able to fail over a control plane. What you're looking for in the README, once you've gone through the prerequisites, is basically a one-line command, a build.sh; you can just run that script and it will do the vagrant up and a few other things. You will start to see virtual machines building, and once they're all built it will start building a cluster. So feel free to start on that right now if you're able, or as soon as you have the files. On the hardware we've been using, we're probably looking at 20 to 25 minutes to build the four nodes, something like that, so it takes a while.

I mean, it would be nicer to just have it embedded — Shell In A Box. Well, that is not a problem; you can do that with noVNC. Sharing a session over VNC and integrating it into presentations is not a problem at all. No, I'd like to see Shell In A Box grow Socket.IO support, like Reveal.js has; that would be cool. You could basically subscribe to the terminal as it's being updated.

So, it looks like the USB stick is free again, right? If anyone doesn't have the files yet and wants them, there they are. Also, if you don't have Vagrant and VirtualBox and you're running on something other than OS X, we may be able to help — I think we have the downloads. Somebody at the back is looking for the files, I guess. Just out of curiosity, can I have a show of hands: who's planning to actually follow along on their laptops? Yeah, so quite a few. And how many people actually have the files already? Great. Okay, it was worth the effort.

So, we're about ready to start. Could everyone please take their seats, and then we're going to kick off this tutorial. Should be on. Okay, this is the automated deployment of a highly available OpenStack cloud tutorial. This being a tutorial, it's a double slot, so you're going to have us blabbing at you for the next hour and a half or so. And as Adam said in the intro, we have this virtual environment for you to follow along. Doing so is entirely optional: if you choose not to follow along this afternoon, you'll still take plenty out of this talk. All the materials are available to you and will remain available for the next few weeks, so if you want to duplicate this tutorial from the comfort of your home or office, that is perfectly fine. If you prefer to just sit back and watch the show for now, that's okay too.

In order for the show to actually succeed, we need your help really quickly. Adam has prepared something very important for this talk: a chicken sacrifice. And because we, of course, don't want to offend any vegetarians or vegans in the room, the chicken is made of paper. So, for the chicken sacrifice, we're just going to need you to hum along some positive energy, please, while Adam rips up this chicken. Could you please join me in — thank you for the good karma. That might not have made sense to a lot of people. It's fine.
But it will make sense to anyone who has given a technical demo in front of a live audience before: the chicken sacrifice is an essential part of having even a hope of it working. Okay. So, you already got the material, you saw the README, you got all that. You also have — because people always ask us for slides, or take pictures of the slides during the talk — a direct link to the slides we're presenting here. By all means, feel free to peruse those now or later on; you can use them for skipping back and forth in the talk, whichever you would like to do.

Okay, so that's that. First up, what you're going to learn, what you're going to take out of this tutorial. There are a few things we're going to be talking about. First of all, why would we even want OpenStack HA? We're not going to go into that in too much detail, because I think it's safe to assume that if you turned up for this tutorial, you already know that you want HA in OpenStack, and you know why you want it. But we're going to go over it very, very briefly. Then we're going to focus specifically on how SUSE Cloud does it. Now, that's not to say that SUSE Cloud is the only vendor product out there that does HA, or the only vendor product out there that does HA right. It's simply that we have another talk in the main conference program on Wednesday — at 3:40pm, if I remember correctly, in room 252 — where I'm doing a vendor-neutral overview of the various HA approaches pursued by various vendors and how they roll those into their products.

But before we go into that, you have a right to know just who the heck you're dealing with. First up, Adam has a much cooler daytime job than you or me or any of us: Adam happens to be a professional classical, jazz, and tango cellist who also moonlights as an engineer at SUSE. That picture is actually from the London Tango Orchestra, of which Adam is a member. Myself, my name is Florian Haas. I do much more boring things when I'm not at work, which is food and travel and photography. When people ask me where I am based, I tend to jokingly answer "in seat 10C", because planes are where I spend most of my time. And I am one of the co-founders of hastexo. We are a professional services company that provides vendor-neutral consulting and training for not only OpenStack, but also distributed storage and high availability, all of which is kind of rolled into SUSE Cloud. So all of these things that we're talking about today are near and dear to my heart and things that I tend to care about, and the same is true for Adam.

Okay, so why would we want high availability in OpenStack? What's the motivation for even considering a high-availability solution in OpenStack? Because on face value, when you read some of the OpenStack materials and some of the OpenStack documentation — specifically the OpenStack design tenets that were formulated some time between the Austin and Bexar releases, if I remember correctly — pretty much everything in OpenStack ought to be distributed, everything ought to be shared-nothing. So if we lose any component at any given time, then we always have another that takes over automatically, and everything is fine and dandy and wonderful and cinnamon rolls and sunshine. But it turns out that's not really the case. And so there is a very, very valid point and a very valid case to make for high availability in OpenStack.
First of all, I'm sure that quite a few of you are familiar with one version or another of this little diagram, this little chart. This is a grossly simplified overview of the various components that are part of OpenStack today. As you can see, there's no Trove here, there is no Heat here, there is no Ceilometer. So this is really the minimum viable compute cloud that you can build with OpenStack today, which typically consists of Keystone, our central identity service. We have an image store, which may or may not actually store images in OpenStack Object Storage. We have a compute layer. We have a network layer. We have a block storage layer to provide persistent storage to virtual machines that are otherwise ephemeral by default. And we have a unifying API layer and dashboard on top of all that. I'm sure you've seen one or another version of this; it originally came from Ken Pepple as part of the OpenStack documentation.

But even if you look at this very, very simplified view of OpenStack, you can readily identify at least five services in here that do in fact rely on some sort of replicated or shared state. For example, for Keystone, for our identity provider, we need some sort of shared identity backend, which then needs to be highly available and needs to be replicated in some shape or form if we want any sort of redundancy for the data in there. So whether we're using an SQL backend with MySQL, or we're using LDAP, or whichever, we've got this store here that needs to be persistent, that is stateful, and that needs to be replicated or duplicated or otherwise made highly available. The same thing is true, of course, for compute, for image storage, block storage, and network, because even though any of those may use an off-site data store that takes care of replication by itself, they have something really, really important that they do need to keep track of themselves, and that is the metadata — for volumes, for example, for images, for networks, and so forth. So even in this very, very simplified view, we can readily see that there are a bunch of services where we absolutely need HA simply in order to keep their data available.

And then there is a number of other services where this whole "we can have as many active components as we possibly want" doesn't really fly. Two examples where that has historically been a bit of an issue are the Nova scheduler and the Cinder scheduler, but also the Neutron L3 agents at some point. For the Neutron L3 agent specifically, we've had sort of a kludge to fix this up to this point, and we're finally getting something really interesting there. Well, it's actually in Juno, but it's marked experimental there; it's expected to land as a fully supported feature in Kilo. More on that in my talk on Wednesday.

Okay, so we absolutely have a need for high availability, for example, on our AMQP bus. Now, this is relatively simple, in that we absolutely can't do without it: not all, but a great majority of the OpenStack services rely on the availability of either RabbitMQ or Qpid or some other messaging broker. So we can't really run an OpenStack cloud without this. But thankfully, the information that we have in there, that we need to keep in there, is not exactly stateful.
Messages that are shared on a message bus typically have a lifetime of 30 seconds or less, and if any message ever gets dropped, then an OpenStack service must essentially resend it — it can never really rely on the reliable delivery of a message. So the only thing, quote unquote, that a highly available infrastructure must ensure is that we always have either RabbitMQ or Qpid actually available to services; we don't really need to worry about replicating their state.

This is vastly different when we talk about, for example, our persistent metadata backend, which is normally a relational database management system. It's frequently MySQL; in SUSE Cloud's case, it happens to be Postgres. And here it's actually much more challenging: we also can't do without it — pretty much every OpenStack service wants to store metadata in a database — but in this case it's also very, very stateful. The information that we keep in the database is something that we absolutely need to protect. In case of a failover, we absolutely need to make that information available on the machine that we fail over to, which in other words means that we need to replicate its data in the first place. So, much more challenging there.

So what does this infrastructure high availability actually do for us? Well, one of the things that we want to ensure is service availability. This means not only the services that OpenStack itself consists of, but also services like RabbitMQ, like the RDBMS — the kind of stuff that we need to keep available, and that we need to keep essentially a defined number of instances of. So that's one thing that we absolutely need to do. However, in a highly available environment, we also need to make sure that all the data that all these services need is available and consistent, and as a consequence, that means we need to properly replicate it.

Now, the approach to high availability tends to be, in present-day OpenStack, one of the key differentiators between the OpenStack vendors out there. And they differentiate along the following lines. First, what's the deployment facility? What are the automated deployment tools that you want to use? And trust me, if you deploy an OpenStack cloud, you do want to automate the deployment. People always ask me, well, why can't I deploy manually and hack my config files myself? And if so, how do I do that? And I always tell them that's essentially equivalent to asking, "if I wanted to drill a hole in my kneecap with a power drill, which drill bit should I use?" There is not a very good answer to that. The answer is, of course: please just don't drill a hole in your kneecap, it will be much less painful. So the automated deployment facility is a crucial differentiator between OpenStack vendors.

Another one is: how do we do HA management? Now, this is an interesting one. Compared to when we did an earlier version of this talk in Atlanta, something really interesting has happened, which is that pretty much all the vendors have settled on a single stack for node availability and service management. That wasn't the case six months ago, but now it is, which is actually quite nice, because it has saved us all from a lot of wheel reinvention.
And finally, the other differentiator is how individual vendors approach state management and data availability slash replication for their OpenStack deployment. Now, the SUSE way of doing this is, for deployment, SUSE Cloud is built around the Crowbar system automation and deployment framework, which wraps Chef. For high availability, SUSE Cloud, like everyone else, has standardized on Pacemaker for service availability, node monitoring, and failover. And for load balancing, for making services available active/active, SUSE Cloud has settled on HAProxy — again, just like pretty much everyone else. So the central part is actually something where we're seeing a fair amount of convergence in terms of the technology being used, although, as we may be able to highlight in this talk, and as I will certainly highlight in the talk on Wednesday, there are some pretty important and pretty interesting architectural differences in how the specific vendors have approached Pacemaker configuration. And SUSE has also made a design choice in relying essentially on shared storage or DRBD — which is a form of kernel-level block replication — for state synchronization between services, opting for Ceph only on the actual cloud storage side. There are other vendors that basically use Ceph all the way; there are vendors that mix in MySQL with Galera replication. For SUSE Cloud, it's either actual shared storage or DRBD that's supported.

Okay. So what exactly is SUSE Cloud? As you may have guessed, it is SUSE's OpenStack-based cloud product. It saw its first release in 2012 with SUSE Cloud 1.0, which was Essex-based. Then there was SUSE Cloud 2.0, which was Grizzly-based. And after SUSE Cloud 1.0 and SUSE Cloud 2.0, of course, what followed was SUSE Cloud 3, without a zero — I almost got bashed when I referred to it as SUSE Cloud 3.0 last time around, so I made a point of fixing the slide this time. This was based on Havana, it was launched in February of this year, and this was when HA support was actually added. And then SUSE Cloud 4, which is the stuff we're going to be deploying now — but it hasn't actually been released. No, it's the other way around: it has been released, but we're using the post-release fixes. So that's why, if things — well, no, nothing's going to break, because we sacrificed the chicken. Yeah. SUSE Cloud 4 was the release that added Ceph support, and that's what we're deploying. So we're effectively deploying SUSE Cloud 4 plus here. Yeah, 4 plus: some fixes that have been released, and others that are not yet released but will be very soon. Okay. This is based on SLES 11 SP3.

"Can you share your plans yet on whether this will be rebased on SLES 12, which has just been released?" Yeah, so for SUSE Cloud 5, which is still — yeah, it won't be this year, I don't think — we are, I believe, going to support having the compute nodes on SLES 12, but I think the rest of the control plane will still be on SLES 11 SP3. There you go.

SUSE Cloud has this interesting approach that I think is actually quite intuitive, which is that you're basically defining node roles: you're effectively defining a control node, or rather a set of control nodes in an HA configuration; you define compute nodes; you define storage nodes; and then you've got your admin node, which is the thing that actually runs Crowbar, and from which you can essentially deploy everything, whether you begin from bare metal or whichever way you choose. And this is sort of an overview of what that looks like.
And if you fired up your build.sh and you've got a reasonably fast machine, then at this time you probably have already deployed something like this, minus the storage node. Just because some people came in late, it might be worth reiterating: if you have the files and the Git repository, then please don't wait for us to tell you to do anything else; do fire up the build.sh script, which is documented in the README that was linked at the beginning. It takes a while for Vagrant to fire up these nodes, so it's good to have that running while you're listening.

So effectively, what you're getting out of that — and it doesn't matter whether you're doing it now or next week or whenever you'd like, from your home or office — is this: you get an admin node, a single node that hosts your Crowbar admin UI, hosts your Chef server, and in a bare-metal environment also does your DHCP, TFTP, PXE boot, whatever. Then you've got a control node — well, actually, in our environment we're going to have two. So we're going to have two controller nodes, which configure, in a highly available fashion, a database, a message queuing server, the OpenStack API services, the Horizon dashboard, the scheduling and control services, and also Keystone and an appropriate Glance configuration. So you get this control node times two. And as a departure from what you would deploy in production, but as a standard concession to the tutorial setting, you've got a single compute node. That's of course not realistic in a production environment — you're typically going to have any number of compute nodes — but in this case it's just one, so you're actually able to fire up a virtual machine. Whether or not we're going to get to that in the course of this tutorial, we won't guarantee, but the compute node should definitely be available for you.

By the way, also in the README are a couple of alternative deployment mechanisms, depending on your RAM constraints. Yeah. So if you look in the demos/HA subdirectory of the Git repository — which should be in the zip file you received as part of the files, unless you cloned it from online; either way, you should have the Git repository unpacked — you'll see there's a build.sh, which by default is configured assuming that you have 16 gig of RAM available, and it will build the architecture that Florian just described, including a single compute node. If you only have eight gig, then there's a build-8gb.sh, which just tweaks a couple of environment variables, and that changes the profile, the config. So it won't boot up a compute node; you'll just have the two controllers, and it reduces the number of OpenStack services that are deployed as a consequence. So it won't deploy Nova, because there's nothing to deploy instances on, but you'll still have a highly available cluster with some of the OpenStack services running in it. So if you have eight gigabytes and you want to run this, then just source that script. Again, the README in the demos/HA subdirectory explains this in full detail.
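Condensed into commands, the two profiles just described look roughly like this — a minimal sketch; the directory and script names are as spoken above, so double-check them against the README in your copy of the files:

    # Full build: admin node, two controllers, one compute node (~16 GB RAM).
    cd demos/HA
    ./build.sh            # runs "vagrant up" plus the cluster build; expect 20-25 minutes
    # Constrained build (~8 GB RAM): two controllers, no compute node, fewer services.
    source build-8gb.sh   # tweaks a couple of environment variables, per the README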
And then we also have a super-tiny four-gig deployment that only deploys the admin node, which is interesting if you're resource-constrained and you basically just want to look at what the Crowbar admin interface looks like for SUSE Cloud; but on account of those memory constraints, it doesn't really allow you to actually build a cloud. Yeah, if you want to do that, simply type vagrant up admin from the vagrant subdirectory, and it will boot that single node and launch the Crowbar installer.

Okay, so what's this Crowbar thing? Crowbar is a system deployment and software automation framework that comes out of Dell originally, and that SUSE Cloud has settled on as the way to deploy a cloud environment in an automated fashion. The other vendors do this differently: for example, Mirantis and Red Hat both focus on a Puppet-based approach, and Ubuntu/Canonical run the whole thing with Juju; for SUSE Cloud, it happens to be Dell Crowbar. The individual application units that are deployed in Crowbar are called barclamps — that's just their technical term. And the meat, the interesting stuff, the intelligence about an OpenStack cloud, is encoded in these barclamps. You've got a barclamp for, say, a Pacemaker cluster, you've got a barclamp for a database, you've got one for a message broker, you've got one for Keystone, one for Glance, and so forth.

And there were some specific design goals that went into adding the HA facilities to Crowbar, or to SUSE Cloud. You should be able both to build from scratch and to upgrade existing clouds — not just greenfield installations, but actually upgrade an existing cloud as well. There is flexible allocation of these node roles that I previously mentioned across potentially not one but multiple clusters. There is, of course, automated configuration. And the way the Pacemaker barclamp is wired is, I think, actually pretty interesting and pretty nice, in the sense that it makes something that is reasonably complex quite easy to use. One of the criticisms that the Pacemaker stack itself has faced in the past was never exactly about stability or fencing or recovery or whatnot; it was always essentially usability. This stuff is hard to get configured correctly, such that it actually does what you want it to do. And the remedy for that is, of course, to automate the whole thing along best practices, and that's exactly what this does.

Now, this is something I think is really kind of cool about the SUSE approach here: the Pacemaker barclamp itself provides basically HA library code for the other barclamps. So when you deploy, say, a database — a Postgres database, in this case — on SUSE Cloud, depending on whether you deploy it to a single standalone node or to a cluster, it will automatically figure out what it needs to do in order to be properly configured in that context, which is really kind of cool. So this is what the Pacemaker barclamp does. It of course installs the Pacemaker HA manager, which I guess is completely expected. It configures Pacemaker with a fencing facility, which is known by the affectionate acronym of STONITH — which of course stands for "shoot the other node in the head". In this case, it doesn't really do that.
It uses a form of self-fencing that is block-based, which is also what we're deploying here in this environment. This SBD is preconfigured on your /dev/sdc device if you're running on VirtualBox, or /dev/vdc if you're running on libvirt. Okay. And then we get our DRBD, and we get our Pacemaker GUI that's being set up: that's Hawk, the high-availability web console.

And now we're going to take a quick look at what this initial deployment looks like. Let me just kick that off here real quick. Hang on a sec. So this is a behind-the-scenes glimpse at the process that, for those of you following along right now, will be happening on your laptops at some point during the execution of this build script. What happened here is that it's essentially deploying the Pacemaker barclamp, and everything else happens in Chef. And then, once that has run and completed — and we should see it popping up here shortly — it's actually deploying a Pacemaker cluster. There we go. And now we're monitoring what that does. And there's our Pacemaker cluster running here. It's been completely configured for two nodes; it's got fencing set up — all the things that people hate to do manually on Pacemaker clusters, all of that is done automatically here. And that's basically what a healthy node looks like.

And then we can also look at this live here. Hang on one sec. There we go. So that would be the high-availability web console, called Hawk. This is actually the complete configuration already: as you can see, there are 39 resources configured. If your stuff is still deploying, then you will see maybe fewer resources, with more coming online as we go. Now, this is the live one, and Adam is quickly going to walk you through how to find the IP addresses for this environment. So feel free to cruise this. There you go.

Yeah, so this is the environment that should be coming up bit by bit as your Vagrant boots up the VMs. And it's hidden by the Chrome window — oh, we don't have the — that's fine, I just want to show a different URL. So if you go to http://localhost:3000, you'll get this web interface. So after the first node has finished booting and Crowbar has finished installing, you should be able to see your local installation of this web interface. And then, as the other nodes join, you'll see them appear here. Initially, they'll appear with just MAC addresses, referring to the interface that — the second — oh yeah: crowbar, crowbar. The username and password for this web interface on port 3000. Localhost, because Vagrant is setting up an automatic port forward from localhost to the virtual machine, just to save you having to type in the IP address. You can access it directly as well; it's just a shortcut. And just in case that got lost: if you open the Crowbar interface, the username is crowbar, the password is crowbar.

Yeah. So I'll just quickly give you a whistle-stop tour of the Crowbar interface, so that some of what Florian was describing earlier comes to life. So, this is the nodes dashboard. Obviously, if you go up here to barclamps, then we can see all the barclamps.
And again, just a reminder: the barclamps are essentially plugins to Crowbar that provide units of provisioning. From the names on the left-hand side here, it should be pretty obvious what the particular components are that each of them provisions. All the bare-metal provisioning and the Crowbar registration process for the nodes happen with the core Crowbar barclamps at the top, and then the ones lower down are for deploying OpenStack. The first of those, as you can see, is the Pacemaker one, which, on here, we've already deployed — we have the cluster up and running with some of the OpenStack services already in it for you. Your build probably won't have got to this stage yet, but I'll just go into the Pacemaker barclamp settings so you can see some of the things that Florian was describing in real life.

So the first thing to note here is that you can have multiple clusters, simply by clicking the create button and giving a new cluster a new name. And once you've done that, you can go into the cluster here. I'm just going to stick with one cluster, because we're running this all on a single machine in VMs, but you could easily have multiple ones. If we go into this, then we have some of the options that Florian was mentioning before. The STONITH option is currently set to shared block device — or STONITH block device, depending on what you want to call it — but there are other options for IPMI, out-of-band management, and so on. There's even a sort of non-production option for talking to the libvirt hypervisor: if you're doing this with libvirt, then you can use libvirt as the STONITH mechanism, but obviously not for production. There are the shared disk device parameters that Florian mentioned earlier, email notifications and so on, DRBD, HAProxy. We've just stuck with all of the defaults for this demo, and that should work fine.

The most interesting part of this screen is the allocation of Crowbar nodes to this HA controller cluster. Crowbar does not just bare-metal provisioning and registration, but hardware inventory and network management and so on. And once that whole process of getting a node into the Crowbar environment is complete, it will appear here on the left as one of the nodes available for allocation to a particular role. So to add it to the cluster, you simply drag the node from here to the right-hand side, to the pacemaker-cluster-member role. So it looks like that. Obviously controller one's already in there, so it's just going to say it's already assigned. But this also means that even though we have the two-node cluster up and running, if I had another node spare that I provisioned at a later date, it would just appear here; I could drag it into there, hit apply, and it would join the cluster automatically, and then all the services would become available on that node as well — at least all the active/active services. So that's a quick view of the Pacemaker barclamp.

And so, for those of you actually following along or running build.sh: how many of you can actually get to a Crowbar interface? Can we have a quick show? Okay, cool. All on this side, interestingly. This is the spectator side, I guess. Okay, so let me switch back here to the deck. We're going to come back to the Hawk interface later on. We do have a number of conspicuous green hats and green lanyards in the room, planted at strategic points.
So if you're having problems with any aspect of this, maybe raise your left hand to indicate you need help from someone who knows this environment, and hopefully they will notice and come to you. Bernhard, I think he needs a bit of help. Well, we'll have to do that within certain limits, of course, because we're not going to be able to have people darting around if you actually get completely stuck somewhere. And don't worry about it too much: like we said earlier, you're easily going to be able to recreate this whole environment. Adam did a fantastic job documenting the whole thing piece by piece. So by all means, if you do happen to get stuck here, it's perfectly fine if you want to just follow along, or replicate this tutorial later on.

I'd just like to point out quickly: the screen I just showed and those options — you won't need to apply those settings yourself, because the build script, the build.sh, after it finishes the vagrant up and boots all the machines, is going to launch another batch script, which will automatically start applying those settings, building your cluster, and launching the OpenStack services in it. So you don't need to do anything manually, but feel free to click around that interface. And you can also go to the Hawk interface on https://localhost:7630 and see a cluster view — but we'll get on to that in a second.

Okay, so very briefly, what's special about the way SUSE Cloud does HA: it uses these lightweight resource providers in Chef for deploying Pacemaker. And like I said, this is what effects, in the background, the fact that depending on what type of target — whether you're selecting a single node or a cluster to deploy a specific service on — it will automatically detect: okay, do I need to be deployed standalone here, or do I need to be deployed as part of a cluster? Which I always think is a really nice and elegant approach to doing things. DRBD, then, is also configured, for replication of your database and of your RabbitMQ message broker, and HAProxy is configured as the load balancer. The entire cluster configuration is essentially completely automatic, so it's pretty much completely hands-off: once you deploy these barclamps, you are essentially good to go. What this does for you is basically orchestrate and synchronize your data state across your cluster; it provides for flexible allocation of these node roles; it provides notifications; and all of that is wonderful.

So now, moving on to the database barclamp. This is where this really starts showing what it can actually do. What the database barclamp will do for you is deploy — lo and behold — a relational database management system that you can then use for persistent data storage in OpenStack. And the way it does that is it deploys Postgres. SUSE is, to the best of my knowledge, the only OpenStack vendor that has standardized on Postgres as its relational database; everyone else seems to prefer MySQL or some flavor thereof. So this installs Postgres in HA mode — that is to say, under Pacemaker management — and that includes the appropriate DRBD configuration for Postgres.
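In rough outline, the kind of Pacemaker configuration this generates is a stack like the following — a hand-written sketch with assumed resource names and a placeholder IP address, not the literal configuration the barclamp emits:

    # DRBD master/slave underneath, then filesystem, Postgres, and a virtual IP
    # on top, colocated with the DRBD master and ordered after its promotion.
    crm configure <<'EOF'
    primitive drbd-pg ocf:linbit:drbd params drbd_resource=postgresql
    ms ms-drbd-pg drbd-pg meta master-max=1 clone-max=2 notify=true
    primitive fs-pg ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/var/lib/pgsql fstype=xfs
    primitive postgresql lsb:postgresql op monitor interval=10s
    primitive vip-pg ocf:heartbeat:IPaddr2 params ip=192.168.124.210
    group g-pg fs-pg postgresql vip-pg
    colocation pg-with-drbd inf: g-pg ms-drbd-pg:Master
    order pg-after-drbd inf: ms-drbd-pg:promote g-pg:start
    EOF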
So that's actually a reasonably complex task, and it's quite different from a single-node Postgres deployment, and it's really kind of nice to be able to do all of that essentially in one fell swoop. So let's kick this off here real quick. That's our Postgres deployment. There we go. Let me go down here so we can actually see that. So this is what's happening in the Pacemaker cluster: we're seeing a bunch of stuff happening there in the background, and then, here we go — that's our DRBD being fired up automatically, and it's being synchronized. Then we get a file system that's being created on top of that in a highly available configuration, and that gets fired up. And then we've got our actual Postgres services being fired up here as well. Let me scroll that up a little bit so we can actually see it. And that's our Postgres database coming up, and now it's running. And then, finally, the last thing we fire up is a virtual IP, so we can make sure that regardless of which physical node the Postgres service is running on at any given time, all of our other services can happily connect to it. There we go.

Okay, next up: the RabbitMQ barclamp. This is relatively similar, in a way. It effectively configures RabbitMQ in high-availability mode. Again, this is something where, depending on whether you deploy it to a single node or to a cluster, it's configured completely differently. It also uses DRBD. And — actually, since this is pretty much the same thing as we get with Postgres, let's just skip over that for now, which saves us a little bit of time at the end of the talk.

And then what happens is that, in an orderly, linear fashion, you deploy all these other barclamps. So there's a barclamp for Keystone, which puts Keystone under Pacemaker management. I'm going to skip over the asciicasts now, because you can always refer back to those later. Then we deploy Glance, again under Pacemaker management. Cinder, under Pacemaker management. And — oops, there we go. What the hell? So, if your four nodes have booted up, you should be at the point where you're seeing the output from Vagrant as it applies the various barclamp proposals for these things, in the sequence that Florian just described. And it's doing that in batch, using a simple YAML file, which you can actually find in /root if you SSH in to the admin node; we can show that in a second. This is interesting. Let's actually — yeah. Okay.

And what we're going to do now — presumably, at this time, most of you will have this stuff essentially fully configured and fully deployed, so this is roughly what it should look like, roughly what your Hawk console should look like. This is not the full deployment yet: on this one that we've set up, I've deliberately stopped it short of deploying everything, so that we can actually see something deploy live, because after a while of just a bunch of slides, I think eyes begin to glaze over. So we're going to do that. Again, this is the Hawk web interface that gives you a view into the cluster. It's running on both nodes in the cluster, because obviously, if something goes wrong with one of the nodes, you still want to be able to get a view of what's going on in the cluster.
So the way you get to this, in this Vagrant-based environment, is https — the S is very important — colon, slash slash, localhost, colon, and then the port number: https://localhost:7630. That's for one controller, and 7631 is the other controller. You should be able to log into either of those, and you'll get the same view. So if you've got to this point, your clusters may not be as fully populated as this yet, but you should be seeing the resources appear as the script goes through applying the barclamp settings.

And what I can do now, in a bit, is apply another of the proposals, and we can actually see Chef doing stuff behind the scenes, and we'll see the resources kick in here. Let's do that real quick. We've got a couple of resources still missing. You want to do that on your box? I will do it on here; I think I'll SSH in from a terminal so that people can see what's going on. I think that's fine. And that's why I have a microphone stand here, because typing with one hand is not fun. Maybe what we could do first, before we actually deploy a new service, is show some high availability for some of these services themselves — basically, show service recovery. So Adam is now going to select one of the services that are deployed here, and he's going to kill it on one of the nodes, and then we'll see how we recover from that. There's the at sign.

Right. So now I'm on the demo laptop rather than the presentation laptop, and from there I SSH into the admin server VM, which is the first one that booted up when you ran the script. Okay, and in case you haven't found it in the documentation: the root password for all of these VMs is vagrant, if you want to SSH in as root, or you can log in from the console of your VirtualBox GUI — either way is fine. Right, so from here I can SSH into either of the controller nodes; Crowbar sets up a trust relationship between the admin node and the others, with the SSH keys shared through Crowbar and Chef. Okay, so now we're in one of the controller nodes in the cluster, and this is our command-line view of the state of the cluster.

So what I'm going to do is, from the admin node, I'm going to run the crowbar batch script, which will apply another barclamp, and then we'll look at the internals of the Chef logs as the config management kicks in, and we'll also look at the Hawk interface. Oh, shall I first kill the service and just demonstrate — yeah, okay. So before I do that, I'm just going to do a simple failover demo by killing, say, Keystone on controller one. Actually, that's going to be a relatively boring service recovery — not a full service failover — but still, we're going to see an automated service recovery there. "Which country does this keyboard come from?" Oh, it's a German keyboard, so good luck with that — in my defense, I did warn you ahead of the tutorial — but you'll manage. Okay, so on the penultimate line there, you see: process 6181 is the Keystone process.
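For those following along, the access path Adam just described condenses to something like this — the controller hostname is an assumption; use whatever names your nodes were given:

    # Shell into the environment; the root password everywhere is "vagrant".
    cd vagrant && vagrant ssh admin   # or SSH/console into the admin VM directly
    ssh root@controller1              # key-based trust is set up by Crowbar
    crm_mon -r                        # live command-line view of the cluster state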
This is the Hawk web interface, so we'll keep it where you can see it. This row here shows — each column represents which services are running on a particular controller node, so all the left-hand services are from one controller and the right-hand ones from the other. If you're at this stage, feel free to try this out; you can attempt a service-failure simulation. So I'm going to kill 6181 — and it's already started it back up, actually. You can see the warning sign up there, and a warning sign just here as well, but it started it back up. So you can see it started back up with a different process ID now. That's just a very simple service high-availability thing, but we'll kill a whole controller, which is far more interesting, in a bit.

So, the question was: is this done through Monit? No, this is actually what Pacemaker is good at, and it does it in a configurable way: you can effectively check not only for, say, the presence of a specific process, but you might also, for example, talk to a socket that the process listens on, check whether you actually get good data back, and so on. The monitoring facility in Pacemaker is built right in — it's built into every single Pacemaker resource agent — and it can also interface with any LSB (that is, SysV init), upstart, or systemd unit or job or script.

Okay, so we have all the barclamps up to Neutron deployed; the next one is Nova. So I'm going to kick off a deployment of Nova in the cluster, and we'll watch the internals as that goes through. I could do this through the web interface, through here, just by clicking create and then setting the parameters, but I've got a script that will do the same thing, and this is the script that your builds are already running automatically. Oh yeah, just in case you're interested: the YAML file that this tool consumes is in the /root directory on this admin node, so feel free to have a look at that. I'll just flash it up quickly. There's a section for each barclamp, and there, in the middle of the screen, is the one for Nova. It's very simple, really: all it's doing is taking the defaults — well, more or less. It's enabling kernel samepage merging, which, since we're using VirtualBox rather than libvirt in this case, is not particularly useful, but it doesn't do any harm because we're not using that hypervisor. It allocates the controller services from Nova, like nova-api and so on, to the cluster, and then it assigns the compute1 node that we built earlier to be the hypervisor — and it's a QEMU software hypervisor in this case. So I'm going to build that, and actually, I'm going to get another terminal ready so that we can tail some logs as that's building. Okay, so that's basically just another terminal that we're opening here on the admin node, so that we're able to tail our logs as we go along. Okay, there we go.

Right, so this is just going to watch all the activity that is happening behind the scenes, from the Chef configuration management platform running on the individual nodes, as orchestrated by Crowbar, as we apply the Nova installation process. Okay, so it's created a configuration, and it's now committing it. There we go. So these are the log files on both controller nodes, and in fact the compute node as well, so there'll be an awful lot of stuff here. You'll see various Chef log lines flip past referring to Pacemaker; I won't go into details about those now.
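(As an aside: if you'd rather drive a single proposal from the admin node's command line than from the web UI, it looks roughly like this — hedged, since the batch file name here is a placeholder and the exact CLI may differ between SUSE Cloud releases:)

    # Apply barclamp proposals the way the build script does, via crowbar batch:
    crowbar batch build /root/<batch-file>.yaml    # filename is a placeholder
    # Or per barclamp, with the classic proposal commands:
    crowbar nova proposal create default
    crowbar nova proposal commit default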
But the interesting part of what it's doing here is the synchronization, because at certain points it has to wait for all nodes to reach a certain point in the configuration before proceeding. For example, you don't want to insert a Pacemaker resource for nova-api into the cluster — which is an active/active clone resource — until you have all the necessary Nova packages installed and configured and ready to run, because if that resource gets inserted on one node in the cluster, it will run across all of them, since it's a clone resource. So there are various interesting challenges that we've had to overcome for this to work.

Let's switch to the Hawk interface. As you can see there in the background, the point we just got to was a nova-manage db sync: we're firing up Nova for the first time, and we're populating our database with the appropriate schemas. And now we've got a bunch of other Nova services, such as nova-consoleauth and so forth. Now we're adding those services to the HAProxy configuration, as you could see from some of the lines that flew by there, and at this point we're starting to add these services to the CRM configuration. They're being added here with the stopped role, which means: just add them to the configuration for now, and hold off for a bit before you actually start them. And here we go — these are all of our Nova services, now being deployed in a highly available fashion across those two control nodes. So now we've got not only a nova-api service, but also a scheduler, consoleauth, etc. And as you can see here, that barclamp is just about to turn green — it's still in progress, but we should see that little light turn green in just a little bit, and then we can continue on with Horizon, and then we're actually done with the configuration of a basic compute cloud. And if you want to continue on, you can still add Heat to that, you can add Swift to that, whatever you would like; it's up to you.

Depending on how far your laptop has got through this — applying all the barclamps and getting all the OpenStack services up and running takes quite a while, depending on the amount of RAM you have available, the processors, and SSD versus spindles and so on — if you want to attempt a failover and do nasty things to your cluster, you don't necessarily have to wait for the whole cluster to come up. Things might get a bit messy, but I've documented the process for simulating a catastrophic failure of the whole controller node. That's simply: you just hit the reset or power-off button in VirtualBox, and watch the Hawk interface running on the other controller node — not the one that you've just killed, otherwise it will just freeze, obviously. So you can either wait for this build process to finish completely, or you can just try the simulated failover.

So, should we — I could make it do all of the rest of them, or I could do a failover now. We could maybe take some questions about — I suggest we continue with a node failover. We have another 30 minutes, approximately, but I'm sure people are not hugely mad at us if they get to hit the booth crawl a little early. So anyway, let's go ahead with an actual node failover.
That's ultimately what we're building HA systems for. Adam, if you could perhaps show us a crm_mon that we keep running in one of these terminals, then we can flip back and forth between that and the Hawk console, is what I meant to say. So here we go. As you can see, we started this on controller two. For those of you who are not familiar with Pacemaker: pretty much any node in a Pacemaker cluster can act as a management node, so you can not only connect to the cluster but also configure it from any node. In this case we're connecting from controller two, but we of course get the full view of the cluster here — you can see all these resources that we're running. And because Adam cleverly connected to controller two to fire up crm_mon, I'm surmising that he's now going to kill controller one, so we can actually see this stuff fail over.

I'm going to have to do this from the other laptop, because VirtualBox is running on that one. Well, actually, I could just do an echo b — but let's go ahead and kill it from VirtualBox, in case someone doesn't believe what I'm doing. You know, the chicken is ripped up, so that's fine. So what we're now going to do is effectively pull the power plug, which, for those of you familiar with HA environments, is actually the simple failure. Pacemaker handles more complex failures really well too, provided that you've configured appropriate fencing.

I think this Hawk interface is actually the one running on controller one — that's why I suggested you just keep the crm_mon open. Oh, I see. Well, it's nice to see it happen in Hawk as well; I'll switch to the other Hawk, and we can flip between them as it's failing over. You could have just hit F11 for full-screen mode. Are you back in full screen? There you go, F11. So this is the local port forwarding from the host to the Hawk running on the other controller node. Oh, did we forget to say the login for this? It's hacluster, and the password is crowbar. Sorry — that may not even be documented. Whoops: it's hacluster and crowbar. That's why you come here, you know — to find these things out. Someone should have screamed if you were pondering that.

Okay, so we've got those two views of the same cluster from controller two. I'm now going to kill controller one, which is particularly mean, because controller one is currently the master for DRBD, so it will affect the database and Rabbit. All the active/active services will simply carry on running on controller two, and not much will happen there, but those will fail over. Okay, I've just hit reset on controller one. It takes a few seconds for the monitoring to kick in. There we go. The monitoring is set to 10-second intervals by default, so you'll get an average of a five-second wait. And you can see a bunch of services are now stopped on one of the machines, and stuff is happening. And yeah — now the other side is the master for DRBD.
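(For the record, the host-side equivalent of hitting the reset button in the VirtualBox GUI is something like this — VM names will differ on your machine:)

    # Simulate a hard controller failure from the host running VirtualBox.
    VBoxManage list runningvms                       # find the controller VM's name
    VBoxManage controlvm "<controller1-vm>" reset    # hard reset, or:
    VBoxManage controlvm "<controller1-vm>" poweroff # hard power-off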
There's one important thing that I want to point out here, and it's another point of differentiation, of how certain vendors do certain things in different ways, as I foreshadowed earlier on. As I said, pretty much all the OpenStack vendors these days use Pacemaker for service management. But as you can see here, the SUSE approach is one where you're using a relatively small set of virtual IP addresses — in fact, there are only three. There's one virtual IP address for your database — that's this one up here, oops, sorry, there we go. So you've got a virtual IP for your database, you've got a virtual IP for Rabbit, and then you've got a virtual IP for essentially all the OpenStack services. Now, that makes the whole thing relatively simple as a setup. It has a bit of a downside, which is the fact that while a service recovers, because the IP address is still available, that service will temporarily have a user-facing, say, "service unavailable" — a 503 or something like that. Other vendors use a slightly different approach here, which is a virtual IP address per service, which means that the minute something is wrong with a service, you can tear down the IP, so you make sure that there are actually no client accesses hitting it at all. It's essentially a conscious design decision; in this case it's been done this way, and it's not any better or worse than the other, but it's something you ought to be aware of. There is actually an improvement that I took the liberty of proposing last week, so this may become even more robust in the near future, but that's the way it's currently done.

And as you can see, you've got HAProxy itself being managed as a highly available resource, with a virtual IP address, and then all the other services act essentially as backend servers, or real servers, to HAProxy. And as you could see, in less than 30 seconds the entire failover completed. The other node is still marked as offline — so suppose it actually suffered some sort of permanent or catastrophic failure: that's fine, the services still hum along nicely. You make sure that not only are the services available, but also that they actually find the data they previously had. I just wanted to prove that: we can still nicely interact with our OpenStack services — there we go, that's our Keystone, still happily alive and working nicely, just as you would expect it to. So that's pretty much exactly what you want. Like I said, you can also get highly available OpenStack from Mirantis, you can get it from Red Hat, you can get it from Ubuntu; this is one way of doing it, and in my humble opinion it's actually a pretty elegant way of deploying it.

Okay, so let's see. You may have noticed earlier on that my presentation here went berserk briefly; let's see if it is treating us any more nicely now. No berserking anymore — wonderful. So, just a few words of background on how this is actually configured. Your Neutron configuration, in this default config, is with Open vSwitch with GRE tunnels, through the Modular Layer 2 plug-in — the ML2 plug-in. VLANs are also supported. And that's that. I don't think you're advocating nova-network at all, right? Which is great. This is actually a bit of a departure from what other vendors are doing: for example, if you deploy RDO, then it still has a nova-network option, but I think it's great to just go Neutron-only — I think that's perfectly fine.
Okay, a few words about the Nova barclamp here: Nova under Pacemaker management. This is the stuff that Adam just deployed, and you can also refer back to the little asciicast that we have for that. And then, finally, there is a Horizon barclamp, which deploys Apache and the OpenStack dashboard in a highly available fashion. This is actually not very spectacular, because Horizon itself is a completely stateless service. The only thing you need to make sure of is that you have a number of Apache instances that load the OpenStack dashboard configuration through mod_wsgi, and off you go. So that's really not a big deal, and the only thing that actually gets put under Pacemaker management is this: we have the several Apache backends that are plugged into the HAProxy configuration, and then a virtual IP address that gets an additional port, which we can then use to interact with Horizon, with the OpenStack dashboard.

Now, how do you test for high availability? Basically, if you've deployed this — the stuff that Adam did a moment ago — you can retrieve your Horizon URL, for example from Crowbar, and then you can select the default OpenStack tenant and use it as you normally would any other OpenStack installation. The only thing you're going to notice is that there's actually a SUSE logo in the top left. And then you can start doing bad things to services — such as, for example, go ahead and kill your Keystone, or kill your nova-api, or whatever you'd like — and then you can watch your services recover automatically, either in crm_mon or in the high-availability web console, Hawk. And frequently the recovery is going to be so fast that you're not even going to notice. So what I like to do is a kill -9 <process id>; crm_mon, to make sure that it starts monitoring the cluster recovery immediately.

And then you can also start doing bad things to nodes, such as a hard power-off or a hard reboot, or triggering — with echo o or echo b to /proc/sysrq-trigger — a hard shutdown or a hard reboot, whatever you prefer. echo o to /proc/sysrq-trigger actually shuts off the machine without doing anything else, and echo b to /proc/sysrq-trigger triggers a hard reboot. And there is a document — well, there are a couple of documents in that same demos/HA directory: one that goes through the process of simulating a failover, and another that explains doing recovery afterwards, if you want to clean up your cluster again and then try some more failover testing or deploy new services. And then you can watch not only the node being properly marked offline, or fenced, or whichever, but also the services failing over automatically, again with crm_mon or with the high-availability web console. And yeah, we already showed that, so that's good.
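Collected in one place, the failure-injection commands just described look like this (the process name and PID are whatever you find on your own system):

    # Service-level failure: kill a process and watch Pacemaker recover it.
    pgrep -f keystone               # find the PID of the service to kill
    kill -9 <PID>; crm_mon          # start watching the recovery immediately
    # Node-level failure: run on the controller you want to "lose".
    echo o > /proc/sysrq-trigger    # immediate power-off, no clean shutdown
    echo b > /proc/sysrq-trigger    # immediate hard reboot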
So, a quick summary of the things that we wanted to show you today, and I hope we did. First, what is the motivation behind OpenStack HA, why do we need infrastructure HA in the first place: recall, we have a bunch of services that we need to be available, and we have a bunch of services that also need some form of replicated or shared state. I briefly went over some of the other vendors' approaches to OpenStack HA; like I said, on Wednesday there's going to be another talk that goes into that in more detail. And Adam then demonstrated how this is actually done with SUSE Cloud HA: recall, Crowbar and Chef for deployment, Pacemaker and HAProxy for service management and high availability, and then finally, largely, DRBD or shared storage for storage and data availability.

Okay. If you want to use or reuse the slides, you are certainly welcome to do so; they are under a Creative Commons Attribution-ShareAlike license, so if you find anything of this useful, please by all means go ahead and use and reuse it. This is the link to the GitHub repo containing the sources of the slides; so the QR code that we showed at the top of the talk was the slides rendered, and these are the actual slide sources.

Yep, so the intention for the future of the Vagrant repository, which contains the Vagrantfile and the box definitions and all the documentation about how to set up this environment and play around with it: the intention is to maintain that and keep on providing it, and hopefully update it for future releases and so on, so feel free to keep an eye on it as it develops. But as it stands, you should be able to stand up the entire environment that we've shown you here within an hour or so, on any machine, assuming it has enough RAM. And then of course you can also deploy this whole thing to the bare metal, which is what it was originally built for. And at least from the Crowbar side, it's essentially a completely linear process: you basically start out with your Pacemaker barclamp and then you start adding barclamps as you wish, or of course you could use a Crowbar batch job to effect the same thing.

And with that, we've actually come to the end of the tutorial. We are, I think, a little early, which means that we can take a little more time, number one, for questions, and number two, if any one of you followed along and got stuck somewhere along the way, we can basically help you on your machines here, and we'll be happy to do that. If there are no further questions... then yes, you have one. Okay, there's a question, go ahead. Ralph, do you want to take that? No, let me back up on that, I have a slide for that that I skipped over earlier, just a second... hang on a second... oh, come on, where's my slide? No, that's taking too long. Anyway.

What you're highlighting is an important point with high availability for the Neutron L3 agent. As of, I think, Grizzly, we got what at the time was called the quantum scheduler, which allowed us to distribute virtual routers onto multiple L3 agents, but the assignment was completely static, which is bad, because if that L3 agent then dies, there's no automatic way to fail over. What SUSE ships is a thing called neutron-ha-tool, which basically detects that here's an L3 agent that has died, so now we need to grab the virtual routers that were assigned to that agent, which you can do essentially with the equivalent of neutron router-agent-list, or what is it, agent-router-list, I forget, whatever it is, and then you can fail them over. This is actually an important topic in my Wednesday talk, because there are some really interesting things happening in OpenStack right now with DVR and highly available L3 agents, which are marked experimental for Juno but are expected to be fully supported in Kilo.
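(For reference, a rough sketch of the manual rescheduling that neutron-ha-tool automates, using the Neutron CLI commands of that era; the exact command names are worth verifying against your client version, and the agent and router IDs are placeholders.)

    # list L3 agents and spot the dead one
    neutron agent-list

    # list the routers that were scheduled to the dead agent
    neutron router-list-on-l3-agent <dead-agent-id>

    # move each router over to a healthy agent
    neutron l3-agent-router-remove <dead-agent-id> <router-id>
    neutron l3-agent-router-add <healthy-agent-id> <router-id>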
And when that happens, we're essentially expecting to be able to do away with this kind of approach, where we have to do this manual translation, this manual shifting of routers. Yeah, that's a fair summary. So we had to write a new resource agent that wraps around the... that projector is going berserk again for whatever reason; yeah, I was trying to switch it away; no worries. So we have this fairly lightweight custom Pacemaker resource agent that wraps around the tool, and I think the stop action doesn't do anything, but it runs start and monitor... well, maybe it's just start, I can't remember. But if you want to know more details, Ralph is at the back, and he wrote it.

That's a general problem that we have in OpenStack, though, up to Icehouse, and that's exactly the problem that the highly available L3 agent in Juno fixes: it not only makes sure that the agent remains available, but actually does connection tracking replication, basically conntrack connection state replication, such that when we fail over, most applications shouldn't even be required to re-initiate a connection, which would be cool. Sure, because there's not much to figure out there; it's just a limitation of... yeah. There's a brief time where the network is disconnected, but you don't need to restart any instances or anything; the network continues. But the problem is that if you have any outgoing connections, for example TCP connections, then that connection may need to be restarted, either from inside the instance or from whatever service outside needs to connect into the instance, because there's no connection tracking. Right, that's true, but you don't need to reboot any instance. No, of course, you don't need to restart the instance. Okay, do we have more questions? Sir? Yeah, one more, go ahead.

Whatever you want, just any block device; it just happens to be sdc here, because sdb was what we used for DRBD, and that just happened to be the way I ordered the disks in the SCSI controller setup. And that is only if you actually want to use SBD for fencing, right. We had this same discussion this morning in the HA design summit session: what assumptions do you make, for example, as to the availability of IPMI devices that you can use for fencing? Or do you not, and then you have to have some other sort of fencing facility. So you can see here, you can use any block device path, and it can be different per node as well, depending on whatever you've set up. And if you choose a different STONITH option, then you get different parameter options. Well, yeah, in that case there's a separate IPMI barclamp for taking care of the iLO-type stuff, so this one just refers to the configuration of that one. So the parameters that you might need to change are exposed, and the ones that you probably don't are not exposed, although there are still some things behind the scenes that you can tweak that are not exposed in the interface; there's a raw view here as well, and there's of course a REST API and a command-line interface and so on, so you can do it programmatically.

Well, it depends on your fencing device, but for SBD the default timeout is three seconds, although actually for this environment I've upped that to 30 seconds, because VMs can be pretty sluggish. But yeah, it depends on the fencing device in question.
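(For reference, a rough sketch of an SBD fencing setup along the lines discussed here, as it would look on a SLES HA node; the device path, timeout values, and file locations are illustrative and should be checked against your SBD version.)

    # /etc/sysconfig/sbd on each node; the device path may differ per node
    SBD_DEVICE="/dev/sdc"

    # initialize the shared device once, with a longer msgwait for sluggish VMs
    sbd -d /dev/sdc -4 30 -1 15 create    # -4 = msgwait, -1 = watchdog timeout
    sbd -d /dev/sdc dump                  # verify the on-disk metadata and timeouts

    # the corresponding Pacemaker fencing primitive
    crm configure primitive stonith-sbd stonith:external/sbd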
All right, so if anyone has had problems with their setup that they want us to look at, or just questions about it, feel free to grab us afterwards, and we'll see you at the booth crawl. Thank you, and enjoy the rest of the conference. Thanks.