I've worked mainly on the application side: Java, you know, some MySQL, really more on the NoSQL side. So I'm pretty familiar with distributed systems and scale-out. And with that, let me introduce Jacob.

My name is Jacob Walsik. I am a solution architect in the Rackspace Private Cloud group. My background is all in infrastructure. I've worked for some of the largest universities in the world, I've done high-scale web commerce, a lot of media-facing applications. So I have a lot of experience designing and building infrastructure. We're here today to talk both about what HA is, as well as a few different options for making MySQL highly available within the context of your OpenStack environment.

One of the things to keep in mind about HA is that it's not just as simple as running two of something. It's a matter of having some way to ensure your application stays available at all the various layers. If we look at this chart over here that talks about the HA nines, which is how we typically see availability represented, either in SLAs or by various vendors talking about how their product is going to make your life easier: the further down that stack you get, as you go from days to hours to minutes to seconds of downtime over the course of the year, generally the more complex your solution is going to be, and frequently the more expensive it gets.

The way that we implement HA is by eliminating single points of failure from the environment. So at its most basic level, it is simply running two or more of something. But there's a lot more to it than that when we start dealing with services such as MySQL. We need our clients to be able to talk to a single source, or they need to be aware of the fact that they have to talk to multiple sources. Behind whatever that bit might be, we need some way of distributing our workload, simply some sort of load balancer.
And then for applications that have stateful services, we need some way to make sure that our data is getting replicated between the instances of that application, or that they have some single shared source for that data. When we talk about stateful versus stateless: if we look at the API services that are part of any of the OpenStack programs, we're looking at a stateless service. It's HTTP. It's a web server. It's very, very basic. We can run a lot of them. We know how to scale them. We know how to distribute that workload. Stateful services can be a bit different. They're all a little bit specific in how you implement failover, because you have to make sure they have access to whatever that state information is.

We also want to talk about what mode we're going to use for our failover. With a lot of networking devices, for instance, you might have two firewalls that are set up as an active-passive pair. There's some sort of heartbeat connection between the two of them, and if one of those physical devices dies, the other one takes over. It's a very common solution with network devices. Other services we can run in an active-active setup. That means we have more than one of them, and we have some sort of mechanism to distribute load between them. The different methods we're going to talk about for MySQL today are going to be a blend of these two options.

The first solution we're going to talk about is the one that is outlined in the OpenStack community guide for implementing highly available MySQL. It's also the one that Rackspace uses on our public cloud platform: Corosync, Pacemaker, and DRBD. The first component we want to talk about here is DRBD itself. Of the three options that we're going to cover today, what makes this one unique is that the data replication is all happening outside of MySQL.
When you're using this setup, MySQL has no idea that its underlying data is being replicated. We're talking about the common use case for DRBD, and there are options to do this differently, but the most well-known, most widely used setup is active-passive. You're going to have two nodes. You're going to have block replication happening between them, but you're only going to be reading data off of one of them. That means this option has some limitations. With some of the solutions where we're doing our replication within MySQL itself, you can do things like run backups off of a passive node. In this setup, you're going to be doing all of your active queries against one node, and you're going to be taking all of your backups off of that same machine.

When we expand this out, we have a few different components that work together to make MySQL fail over for us. We have Corosync, which provides the cluster management engine and lets us set up our heartbeat. It lets the two machines know that, yes, I'm still here, or it notices that the second node has gone away. Pacemaker is used to actually facilitate things like starting MySQL on the passive node in the event that the active node dies. DRBD is what's keeping our data in sync. Again, it's very important to keep in mind that MySQL has no idea that its data is being replicated, so you do need some way of telling MySQL on one node to stop doing work before it starts doing work on the other.

This is a tried and true solution for implementing highly available MySQL. But it does come with the rather ugly limitation that you're limited to two hosts. If you need a scale-out solution, this is not going to be the best option for a highly available setup. However, if you're just talking about building a cloud that is some hundreds of nodes, those two MySQL servers are going to serve you quite well for a long period of time.
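As a rough sketch, those three pieces are typically wired together with Pacemaker constraints along these lines, in crm shell syntax. The resource names, DRBD device, and mount point here are illustrative assumptions, not taken from the talk:

```conf
# Pacemaker resource sketch for MySQL on DRBD (crm configure syntax; names illustrative)
primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource="mysql" \
    op monitor interval="15s"
# only one node may be DRBD primary (master) at a time
ms ms_drbd_mysql p_drbd_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" notify="true"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
primitive p_mysql ocf:heartbeat:mysql \
    op monitor interval="30s"
group g_mysql p_fs_mysql p_mysql
# MySQL and its filesystem must run where DRBD is primary, and only after promotion
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
```

The colocation and order constraints are what guarantee MySQL never starts on a node whose DRBD volume is still secondary, which is exactly the "stop work on one node before starting on the other" requirement mentioned above.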
So in total we have three solutions that we're going to discuss, but obviously these are not the be-all and end-all; there are a lot more solutions than these. What I'm going to focus on is HAProxy, Keepalived, and VRRP, and then the next is Galera. Hopefully everything works well, and I'm keeping my fingers crossed: we'll do a demo of HAProxy, Keepalived, and VRRP, and then I'll talk about some of the steps you would have to do to use Galera replication. Jacob talked about the fact that you need replication services and you need some kind of a heartbeat mechanism, right? And really, if you break it down to those, if you don't want these schemes, you can go ahead and implement something else as well, just keeping in mind that you need some of those components to make HA possible, okay?

So, just so that I get an idea, how many of you are experts on HAProxy, Keepalived, and VRRP? Okay, so I've got to watch out for two of you guys, but other than that, I can say anything, right? So what I'll do is walk through this really quickly. And like I said before, it's pretty straightforward to implement. What I say is, if I can do it, pretty much anybody in the room can do it.

So Keepalived is based on the Linux Virtual Server project, and the actual protocol implementing the failover is VRRP. So Keepalived and VRRP work hand in hand. The idea is that it uses something called a virtual IP, okay? And this is a layer-three IP. Essentially what happens is, if this particular node goes down, that IP automatically moves to the other node, the node that is alive and well, okay? And when the original node comes back up again, if failback is configured, the IP goes back to that node. Otherwise, it stays pointing at the node it moved to, and only if that node dies does it float back to the other one, okay?
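The floating-IP behavior described above maps onto a keepalived VRRP instance roughly like this. The interface name, router ID, and priorities are illustrative assumptions; the VIP matches the one used in the demo:

```conf
# /etc/keepalived/keepalived.conf sketch on controller one (values illustrative)
vrrp_instance mysql_vip {
    state MASTER              # the peer controller would use state BACKUP
    interface eth1
    virtual_router_id 51      # must match on both controllers
    priority 100              # the backup node gets a lower priority, e.g. 99
    advert_int 1              # VRRP advertisement (heartbeat) interval, in seconds
    virtual_ipaddress {
        192.168.236.198       # the MySQL VIP from the demo
    }
}
```

When advertisements from the master stop arriving, the backup node wins the VRRP election and brings the VIP up locally; whether the IP fails back on recovery depends on the priorities and the `nopreempt` option.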
As you can see here, basically what I've set up is a controller one and a controller two, okay? All these demos are going to be based on the Rackspace Private Cloud, which provides some very easy Chef recipes to make this happen. In fact, all that you need to do is assign two roles, ha-controller1 and ha-controller2, and it'll inject all the appropriate scripts for Keepalived, VRRP, MySQL, and so on to make it happen. So really, once you install it, you don't have to do anything after that point, okay?

So what I did was set up two controllers. Controller one is using an IP address of 192.168.236.11. I did it all with Vagrant on VirtualBox on my laptop, so you can try it out yourself as soon as you go back, okay? Controller two is on .12. I could make a direct call to either .11 or .12 for MySQL, but that's not recommended, right? What it uses is master-master replication. So as long as .11 and .12 are both alive, you're fine, right? But the moment you're using only .11, if that particular node goes down, now you're in trouble, right? Because even though .12 is alive and able to do whatever it is that you want it to do, you cannot access it. So typically what you do is use the VIP. In this particular case, it's 192.168.236.198, and I'm using this for MySQL. I use .197 for the API services. The API services, as Jacob pointed out, are stateless, and really they're proxied, so a request can go to any node. Whereas MySQL is really more in an active-passive mode, so it's basically going to one of the nodes. Essentially what happens is, if one of the nodes goes down, the other one becomes the master, and so on, okay?

HAProxy, I'm really not going to spend too much time on it, but basically it's a software load balancer. And like I said, if you're using it for the API services, which are stateless, it doesn't really matter which node a request goes to.
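For the stateless API case, a minimal haproxy.cfg fragment might look like this. The API VIP (.197) is from the talk; the port (8774, the Nova API) and backend addresses are illustrative assumptions:

```conf
# haproxy.cfg sketch: spread stateless OpenStack API traffic across both controllers
listen nova_api
    bind 192.168.236.197:8774
    mode tcp
    balance roundrobin
    server controller1 192.168.236.11:8774 check inter 2000
    server controller2 192.168.236.12:8774 check inter 2000
```

Because the API services keep no state, either backend can answer any request, and the `check` keyword lets HAProxy stop sending traffic to a controller that goes down.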
You can effectively make use of the two controller nodes. This slide kind of puts everything together. Essentially you have an application, right? And the beauty of this is, even though we are using it for MySQL, you can do this for any application that you want, okay? So if you're building your own application, essentially what you have to do is provide the replication services and the heartbeat services, right? Or you can just leverage what's available: you can use Keepalived, you can use VRRP, right? The replication might be a little bit different depending on what kind of database you're using; if you're not using MySQL, you may have to use something else. So I have the active-passive infrastructure services, MySQL and RabbitMQ, on controller one, with controller two as the passive side of that pair, okay? And the API services are always active-active, because it really doesn't matter; a request can go to any of the different nodes, okay?

So this illustrates what happens when a particular node goes down, right? Effectively what happens is that the VIP is no longer pointing to controller one; it's going to be pointing to controller two, okay? And from the perspective of an application, or somebody who's using OpenStack, you don't even know what's happening, right? It's just a speed bump. There may be a little bit of a pause while the VIP is transferring over from one node to another, and then you're ready to go, okay?

All right. So what I should do at this point is change pace a little bit and do the demo. Can we switch to the demo laptop and see if I can get this going? Thank you. Just one second. Okay, so those are really, really small. I don't really care about too many of those windows, I just care about this. Can you guys see that? And let me go with this one. That's good. And maybe use another one.
Okay, so this is controller one, okay? What I'll do here is just list all the services, okay? As you can see, there are a bunch of services running on L1, and a bunch of services which are down; basically everything on L2 is down, okay? Everybody see that? So what I'll do at this point is bring up L2, okay? So let me bring up L2. Thank you. Okay, so while it's booting, I will show what is happening with the VIPs, okay? Essentially you can see here, can everybody see this? 192.168.236.198, okay? So what is happening is that the 198 VIP is assigned to controller L1. Is everybody with me, more or less? Pretty straightforward at this point, right?

So let's see if L2 is up. Okay? So everything is up right now, right? So what I'll do is shut down L1 and see if the VIPs move over to L2, okay? It's as simple as this: I'll go back and halt L1. That's probably the easiest thing to do, okay? Okay, that's good. It's shutting down, okay? So now I'll go into L2 and see what happens here. The first thing I want to check: I see that the VIPs have transferred over from L1 to L2, okay? So effectively, if I run my commands against the VIP, it's as if nothing has happened, because the VIP takes care of it, okay? Does that make sense? Now that the VIPs have transferred over, I could go on ad nauseam, shut this one down, and we'd see things go back to L1 and all that, but hopefully you got the point, right? I could also do a demo where I show, using an application program, that if I use the address of the server instead of using the VIP, then when the server goes down, my application is also hosed, right?
It's not available anymore, because it was not using the VIP. If I use the VIP, then I'm all set: the VIPs automatically take care of making MySQL highly available, okay? So if you don't mind, can we move back to the presentation laptop? Thank you. Perfect.

Okay. So the next thing we're going to talk about in this presentation is Galera, which provides multi-master replication for MySQL and InnoDB using a technique called wsrep. It stands for write-set replication, and essentially it replicates write sets and, in effect, turns your cluster into one big multi-master database. So, in effect, you don't even see it as a cluster; you just write to any of the nodes, and the write automatically gets replicated to all the other nodes participating in the cluster. Does that make sense? It's basically a technique used in conjunction with Galera, write-set replication, and you can use it active-active, right? It really doesn't matter where you're doing the reads, it doesn't matter where you're doing the writes; you can read from any node, you can write to any node.

Obviously, we all know that this doesn't come for free. There is one issue here, which is that if you're writing to the same row from multiple masters, then you have a problem. What Galera typically does is issue an error: basically, it errors out with a deadlock error, and we'll talk about that in a moment. But what Galera gives you is not only multi-master; it does true replication at the row level, with no slave lag or integrity issues. And for those of you who know about the split-brain possibility, you can eliminate it by having at least three nodes. If you have three nodes; actually, you can work with two nodes and an arbitrator.
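To give a feel for how Galera hooks into MySQL, here is a sketch of the wsrep settings each Galera node would carry in its my.cnf. The library path, cluster name, and node addresses are illustrative assumptions, not values from the talk:

```conf
# my.cnf additions on each Galera node (sketch; addresses and paths illustrative)
[mysqld]
binlog_format=ROW                  # Galera requires row-based replication
default_storage_engine=InnoDB      # Galera replicates InnoDB tables
innodb_autoinc_lock_mode=2         # interleaved autoincrement locks, needed by Galera
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="openstack_db"
wsrep_cluster_address="gcomm://192.168.236.21,192.168.236.22,192.168.236.23"
```

The `wsrep_provider` line is what loads the write-set replication plugin into MySQL, and `wsrep_cluster_address` lists the peers that form the multi-master cluster.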
But in my example, I used three nodes, and effectively, instead of pointing the VIP at the controller nodes, I pointed the VIP at the Galera nodes using HAProxy, and we'll see that in a second. Galera is based on optimistic concurrency control. The idea behind optimistic concurrency control is: let it happen, and we'll deal with it. Rather than trying to lock a row and take the performance hit, it's a case of let's see what happens, and then we'll deal with it. So if two transactions are modifying the same row, then obviously one of them has to win; the other one will get a deadlock error.

Now the problem here is that it's really not your application program that's going to get the deadlock error; it's the OpenStack infrastructure that's going to get it. I tried different combinations of things to see if this happens. Obviously, I didn't load it heavily, and I don't know all the intricacies of the internals well enough to figure out what workload might cause that deadlock error. But that's something to keep in mind, because it's not something you can work around in your application program; it's something the OpenStack infrastructure has to handle. And I'm not quite sure what happens in that case, but I'm sure some of you in the audience might have already seen it or may be able to answer it.

So the application needs to handle the error. And like I said, it looks like a multi-master cluster with one big database and multiple entry points, so it doesn't matter what node you write to or what node you read from. How do I deal with multi-master conflicts? Basically, one of the transactions is going to get a deadlock error, so either you retry the transaction or you abort it. It's really up to the application. And like I said before, here this happens at the OpenStack infrastructure level.
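The retry-or-abort choice described above can be sketched in Python like this. `DeadlockError` is a hypothetical stand-in for whatever exception a real database driver would raise on a Galera certification conflict; nothing below comes from actual OpenStack code:

```python
import time

class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error a Galera node returns."""

def retry_on_deadlock(fn, retries=3, backoff=0.01):
    """Run a transaction function, retrying if it fails with a deadlock error.

    On the final attempt the error is re-raised, which corresponds to the
    'abort the transaction' branch the talk describes.
    """
    for attempt in range(retries):
        try:
            return fn()
        except DeadlockError:
            if attempt == retries - 1:
                raise  # out of retries: abort, let the caller decide
            time.sleep(backoff * (2 ** attempt))  # back off before retrying

# Usage: a transaction that deadlocks once, then succeeds on retry.
calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] < 2:
        raise DeadlockError("deadlock found when trying to get lock")
    return "committed"

print(retry_on_deadlock(txn))  # prints: committed
```

The point of the sketch is simply that the handling has to live wherever the database call is made; if that call sits inside OpenStack's own code, an application on top of OpenStack never gets the chance to retry.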
And again, I'm not quite sure with which workloads it might happen; that's something you need to keep in the back of your mind. But installing Galera with OpenStack is pretty straightforward. This is a diagram I basically stole from a Severalnines slide on the Severalnines site, and essentially it talks about how you can take an existing OpenStack infrastructure and add Galera to it. I went about doing this for the Rackspace Private Cloud, and what I'll do is outline some of the steps, with the caveat that this is completely unsupported. The support guys are going to kill me if I recommend this to anybody, because all I've done is very simple testing.

Essentially, how I did this was: I installed the private cloud on two controllers in HA mode. If you don't want to install Rackspace Private Cloud and you want to do something else, the steps are pretty similar. So you have two controllers in HA mode. HAProxy, Keepalived, and VRRP are already installed by the recipes; the cookbooks and the recipes take care of all of that for you. You don't have to do anything; the VIPs, everything is set up. All you need to do is set up the environment file appropriately. The environment file has to have the virtual IPs for MySQL, for the APIs, and for RabbitMQ, and then you're all set.

Next, install Galera on three separate nodes. If you go to the Severalnines site, severalnines.com, and I have a pointer to them later in the talk, they have a configurator which allows you to very easily install Galera with wsrep on three nodes. If you want more nodes, I'm not quite sure; that configurator doesn't quite work for that, but I'm sure you can work it out. Then what I did was take the MySQL data from the existing controllers. What the Rackspace Private Cloud does is install the MySQL database on controller one and controller two with master-master replication, effectively.
So what I did was take the data from the controller nodes and essentially dump it into the Galera cluster. Does that make sense? So basically, I kept it in sync. In fact, I tried it out after starting some instances and all that, and everything seemed to more or less work fine. Then what you want to do is grant privileges to OpenStack. There are a bunch of privileges that need to be granted; I have a blog post that outlines how to grant them. Once those privileges are granted, with the appropriate passwords and all that, you're essentially ready to receive OpenStack requests.

Then what I do is update Keepalived and HAProxy on the controller nodes. What Keepalived does for MySQL on the controller nodes is check that the MySQL daemon is running, and if the daemon is down, it starts it back up again. So essentially I have to change some of that to have HAProxy point to the Galera nodes. Instead of checking for the MySQL daemon running on my controller node, I'm now pointing at the three Galera nodes. Does that make sense? Pretty straightforward. And then I essentially stop and uninstall all the MySQL services on the controller nodes, because we don't want to use the controller nodes anymore; we want to use the Galera cluster. Does that make sense? So it's pretty straightforward to do that, and I have a blog post, and I'll talk about where it is at the end of the talk.

So this is kind of how we got to Galera. And again, the idea behind Galera is that you can scale out and also make it highly available, which is kind of a cool thing. You really can't quite do that with just master-master replication. All right. So in summary, we've looked at three different options today for making MySQL itself highly available within the infrastructure services for your OpenStack cloud.
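The "point HAProxy at the three Galera nodes" step could be sketched as a backend like this. Marking two of the nodes as `backup` keeps writes going to a single node at a time, which sidesteps the multi-master deadlock issue discussed earlier. The node addresses and the health-check user are illustrative assumptions:

```conf
# haproxy.cfg sketch: route the MySQL VIP to the Galera cluster (values illustrative)
listen galera_mysql
    bind 192.168.236.198:3306
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check   # this user must exist in MySQL
    server galera1 192.168.236.21:3306 check
    server galera2 192.168.236.22:3306 check backup
    server galera3 192.168.236.23:3306 check backup
```

Dropping the `backup` keywords would spread writes across all three masters instead, trading the risk of deadlock errors for more write capacity.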
The first two that we looked at, the Pacemaker, Corosync, and DRBD setup and the Keepalived, HAProxy, and VRRP setup, are two very well-tested, well-known, very stable solutions that are commonly used today. Options like Galera, Percona, and a few others for doing massive scale-out, versus just high availability and making sure the service stays running, are much newer to the market, and people are using them with mixed success in different types of environments. We also didn't touch on the idea of using any other databases with your OpenStack environment. One of the things we'll probably all start to see, as we all start relying on Ceilometer very heavily, is that maybe we don't want to put everything in MySQL. Maybe we end up with multiple data stores providing back-end services with a place to keep their data.

So here are the resources that we used to gather some of the information, and where we stole some of our images for these slides. This deck will get uploaded to the conference SlideShare after this presentation, so if you want to pull it down and follow all these links, it will be available for you. We also wanted to plug the book that a couple of our Rackspace friends just finished. They've been giving it away all week. As you noticed, there's no more expo floor out there today, so I do not believe they will be out there giving away copies this afternoon. Hopefully you were able to grab a copy earlier this week.

Okay. So in summary, there are really a bunch of different options for implementing MySQL HA. I think what I like to do is keep it as simple as possible. I think Keepalived, HAProxy, and VRRP is going to be applicable for 80% to 90% of cases. But where you need scale-out, where you may need more than two controller nodes, and surely there are a lot of edge cases out there, there are definitely different options.
The beauty of OpenStack is that nothing here was invented from scratch, and that's a good thing, I think. Everything is based on existing infrastructure that's been pretty well tested. Corosync, Pacemaker, and DRBD is recommended by Oracle and has been around for a long time. And again, Keepalived, HAProxy, and VRRP have been around and deployed in the Linux world for quite a long time as well. But Galera is definitely gaining a lot of popularity. How many of you attended Florian's talk? It was pretty cool. Yeah, so he talked about Galera as well. So I think there is going to be a lot more happening, probably by next year when we come back, in Paris, I guess. But again, if you can start with a very simple install and keep it simple, I think that's the best way to go.

With Rackspace Private Cloud, the cookbooks and the recipes are pre-installed and ready to go, so it's very easy to use the existing HA infrastructure. But if you don't like that, you can slide something else underneath as well. I did the Galera one, but you can just as well do Percona or SkySQL; there are a bunch of different options out there. The OpenStack HA guide, which again I think Florian was instrumental in, is a great guide. If you don't know anything about HA, which is where I was a few months back, I think it's a great place to start.

I'm a mechanical engineer, that's my undergrad, and I like to see things move. So the best way to get your hands dirty and play around with this is to just install the Rackspace Private Cloud with the HA option and play around with it. There are a couple of blog posts in the resources which talk about how you can do that. And how many of you use Vagrant? A few? Okay. If you haven't used Vagrant, it's really, really easy. Of course, just before the talk, I had some issues with Vagrant, but that's a different story.
It's really, really easy to set up VMs and play around with them, okay? With that said, Jacob, anything else? No, I think we have time for a question or two if you guys have questions. Yeah, we have time for definitely a few questions. Here in the front.

[Audience question, partially inaudible.] Yeah, so the question, if I understand it, is about the Python application, or the Python library, right? When they are using Galera, will it recover from a deadlock error? Can it handle the deadlock error? And I don't know, I'm not a Python guy; does anybody else know?

[Audience comment, partially inaudible, suggesting an active-passive setup avoids the problem.] Yeah, that's a great point. So I completely agree that it's really at the application level: how do you handle it? But I thought your question was, does it handle the deadlock error, at least flag it back to the application? Okay, any other questions, comments? Like I said, I haven't done any of that testing. I think the point there was, if you want to be conservative, the better way of doing this would be to just go active-passive, right? That way you don't have to deal with the deadlock errors at all, right? That's the other solution we talked about, using Keepalived and HAProxy; that's why HAProxy is there, so that writes are only ever being directed to one host at a time. Obviously that is where we get some of the limitations on scalability, but between Rags and me, we definitely don't know the answer today as to how some OpenStack program using a SQL connection is going to respond there. Exactly, and it really depends on the workload for sure. So I'm not aware of that, but again, if somebody wants to comment on that, feel free to. Except the guy in the blue shirt. Yeah, sure. Yeah, absolutely. Yeah.
[Audience, partially inaudible]: We run a nine-node cluster where all the nodes are masters and all receive writes, and there are not that many deadlocks; we see no more than about 20 deadlocks. Okay. I mean, the use case for MySQL within OpenStack is such that it's not like an e-commerce site on Black Friday. It's a very, very different kind of scenario in terms of the chances that there might be a deadlock. So that makes sense. But it's really heartening to know. So you use a nine-node Galera cluster, with three masters, you said, or? [Audience]: They're all masters. Okay. No, I mean, I'm talking about the controllers. So all nine are masters, and Galera is installed on all nine of them? Yeah. Okay. And there are not many deadlocks.

Any other questions? Any other non-Galera questions? Yeah. All right. Thank you very much. Thanks for coming. Enjoy.