OK. Hi, everybody. My name's Curtis Colquett. I'm with Oro. And today I'm going to do a quick little demo of doing some high availability stuff. And I'll explain that a bit more, but I just want to frame this little demo in terms of, if you were a customer and you were going to use Oro to try to put up some sort of highly available CMS or some other highly available system. So that's the perspective that I'm taking for this talk. So again, my name's Curtis. I'm the lead OpenStack engineer at Oro. And that's all my information.

So what we're going to do is a quick example of running a somewhat highly available Django system. And this is what it kind of looks like. So it's pretty simplified. But basically we have an HAProxy node, three nodes that I sort of call zone nodes that are running the Django application, as well as a cluster of Galera systems. And so I'm going to flip back and forth here to the live system. But you can see that right now with HAProxy, we have three nodes up and running. And HAProxy is showing that.

So what I'm going to do is go over here. And we can see that in our OpenStack tenant, we have a few servers running. So the HAProxy node, the three zone nodes. And what I'm going to do is delete one of the nodes and then rebuild it. So just quickly, we'll delete that. So we can see that it's being deleted. And then back over here, we can see that in HAProxy, it's noticed that that node is no longer available because we deleted it. And we'll just kind of watch that.

So I'm going to do some stuff and sort of flip back and forth here between the slides. So what I've done is remove the node that has the application and the database, the Galera server, on it. So that's been deleted. And what I've done is use the Nova command to delete that instance. And once that's deleted, I'm going to rebuild it using Ansible. So I've got a bunch of Ansible playbooks here that sort of separate out this process.
So the first thing I'm going to do is tell Ansible to rebuild the node. So you can see that what it's doing is it's noticing that the first two nodes are already there. And now it's going to recreate the third node. And so that'll take a little while to run.

So I guess the hard part about doing high availability stuff is often trying to figure out where all your state is for your applications. So in this example with Django, there's basically three major pieces of state that you have to keep track of and manage in a highly available manner. So one of them is the files. So for example, static files, as well as any files that are actually uploaded into Django. And the way that I'm approaching this to get high availability is I'm using a plugin with Django to upload all those files into OpenStack Swift, which itself is a highly available object storage system. So once those files are there, I don't have to do any sort of shared file system or other stuff like that, which doesn't scale very well and is hard to run. Otherwise you'd have to run Gluster or some sort of file system like that. And that's still going.

So the other two major pieces of state are the actual database that Django uses. So of course, in this example, that's backed by MariaDB, which is in a Galera cluster. And then there's also the sessions, which are also in the database. So we have those three major pieces of state. And we're keeping them all highly available with a combination of MariaDB, Galera, and OpenStack Swift.

So that's still going. And we'll see how this timing works. I'm not too sure. I've done it probably 20 or 30 times. But I didn't do any caching or anything of packages and stuff like that. So that might take a little bit extra. But in general, this is a process. So when you make technical choices like this, like say when you use Galera or you use OpenStack Swift, you're making choices. And those choices come with positives and negatives.
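As a rough illustration of the database and session pieces of state described above, here is a minimal sketch of what the relevant Django settings might look like. The talk does not show the actual settings file, so the database name, user, password, and host are placeholders, and pointing each node at its local Galera member is an assumption.

```python
# Hypothetical fragment of a Django settings module (not the speaker's file).
# All names and credentials below are placeholders for illustration.

DATABASES = {
    "default": {
        # MariaDB speaks the MySQL protocol, so Django's MySQL backend is used.
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django_demo",    # hypothetical database name
        "USER": "django",         # hypothetical user
        "PASSWORD": "change-me",  # placeholder
        # Assumption: each application node talks to the Galera member
        # running on the same machine.
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}

# Database-backed sessions: session rows live in the replicated database,
# so any application node can serve any user's session.
SESSION_ENGINE = "django.contrib.sessions.backends.db"
```

With settings along these lines, both the application data and the sessions ride on the same Galera replication, which is why the talk counts them as state kept available by MariaDB and Galera.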
And generally speaking, if you're familiar with the CAP theorem concept, MariaDB and Galera choose consistency over availability. And generally speaking, for OpenStack Swift, you get availability over consistency. So you're making choices when you do those things. And you have to take that into consideration.

So let's see how that's going here. So as you can see up here, Ansible has requested that OpenStack recreate that system and apply a floating IP. So if we look, that system should be back. But it doesn't have any of Galera or the Django application or anything on it. It's just a plain node. So what I'm going to do now is run another playbook to restore the Galera system. Or actually, one thing I'll do first is SSH into one of the nodes and just show you the MySQL settings. Let's see here, that's 1-2. And let me just hop back here to grab that config. Sorry, I'm sort of hopping around a little bit here. Where is that? Here we go. OK, sorry. Show status. Well, at any rate, I'll just show you the, let's see here. Oh, sorry. I got a little lost here.

But what I'm going to do anyways is just start the Galera process. So in the background, now Ansible is going to use that node that I just created and reinstall Galera and bring everything back up online. And after that point, we'll have all three nodes back in with Galera. This will take some time, actually, to do so. But basically, the restoration process, once you delete a node like this, what we're doing is first I restore Galera, then I restore the Django environments. And then finally, after that, I tell HAProxy about the new node. So once it's all back and up, we'll see that the size of the cluster will be back to three. And then I can go on and restore Django. But this will actually take a couple of minutes.

So does anybody have any questions or pointers so far? The idea that I always wanted to do today was actually do a demo, because that was sort of what they suggested that this was going to be.
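The restoration order described above (recreate the instance, then Galera, then Django, then HAProxy) can be sketched as an ordered list of `ansible-playbook` invocations. The playbook file names and the inventory path are hypothetical, since the talk does not give them; the sketch only prints the commands rather than running them.

```python
import shlex

# Hypothetical playbook file names; the talk only gives the order of the steps.
RESTORE_STEPS = [
    ("recreate the deleted instance", "rebuild_nodes.yml"),
    ("restore Galera on the new node", "galera.yml"),
    ("redeploy the Django application", "django.yml"),
    ("add the new node to HAProxy", "haproxy.yml"),
]


def restore_commands(inventory="inventory"):
    """Return the ansible-playbook command lines in the order they must run."""
    return [
        ["ansible-playbook", "-i", inventory, playbook]
        for _description, playbook in RESTORE_STEPS
    ]


if __name__ == "__main__":
    for (description, _playbook), cmd in zip(RESTORE_STEPS, restore_commands()):
        # Printed rather than executed, because the playbook names are made up.
        print(f"# {description}")
        print(shlex.join(cmd))
```

Keeping the steps as separate playbooks, as the speaker does, means each stage can be rerun on its own if it fails partway through.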
But as you can see, what's happening is Ansible is installing all of the required packages and things like that. We can actually hop into that server and watch what's happening. So you can see it's going to be stuck here for a little while installing all the packages and downloading them. So once this process continues to go through, it'll install Galera, reinstall everything. And then it'll also restore the database to each node. And we can sort of watch that as it happens as well. But yeah, so this is a demo. We just have to wait for a little while. No questions or anything? Any ideas? No?

Ah, awesome. So now it's installed all the packages. That wasn't too bad. And it's going to go through and reconfigure Galera, install all the configuration files, and then eventually it will restore the database as well. So at this point, we can see that it's doing some backup stuff. So it's using xtrabackup; the nodes are talking to one another. It's going to pull that data back into this system. We can kind of see the wsrep xtrabackup stuff happening in the background. And so this node that I'm on in the bottom here is the brand new node. So we can see that it's only been up for a few minutes. And we'll just kind of watch as this happens.

With Galera, sorry? So in this particular example, OK, so that's part of the problem maybe with doing high availability, is the complexity, right? So when you choose to use something like MySQL Galera or MariaDB Galera, that comes with some additional complexity for sure. But in this example, actually, I'm using community playbooks. So somebody else has already done almost all of this work for me. So in a way, I'm just running Ansible using those playbooks. And it's doing it all already for me.

Yeah, sure. Sure, I can throw that on at the end. We can talk about that too. Let's finish off reinstalling Galera. And we can now see that this node has the Django database reinstalled on it and everything like that.
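The "show status" check attempted earlier, confirming that the cluster is back to three nodes, can be sketched as a small function over Galera's `wsrep_%` status variables. This is an illustrative helper, not something from the speaker's playbooks; it assumes the status rows have already been fetched (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) into a dictionary, and the sample values are made up.

```python
def galera_node_ready(status, expected_size=3):
    """Decide whether a Galera node has rejoined the cluster.

    `status` maps status variable names to their string values, as returned
    by SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    """
    size = int(status.get("wsrep_cluster_size", 0))
    state = status.get("wsrep_local_state_comment", "")
    cluster = status.get("wsrep_cluster_status", "")
    # The node counts as restored once it sees the full cluster, has finished
    # its state transfer, and belongs to the primary component.
    return size == expected_size and state == "Synced" and cluster == "Primary"


# Illustrative values only: a node still receiving its state transfer,
# and the same node once it has caught up.
joining = {
    "wsrep_cluster_size": "3",
    "wsrep_local_state_comment": "Joining",
    "wsrep_cluster_status": "Primary",
}
synced = dict(joining, wsrep_local_state_comment="Synced")
```

A check like this is a natural gate between the Galera step and the Django step of the restore.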
So the next thing we'll do quickly here is put Django back onto the nodes, also using Ansible. So this is just going to go through and put all the Django code back onto the node. So each node has a copy of the actual code.

And yeah, I can show you. No. So you were asking about the playbooks. So here's an example. So there's a whole bunch of roles inside of this playbook. Sorry, the text on the side might be a little bit small there. So there's three main Galera roles in this Ansible system. So there's the common, the config, and the bootstrap process. So all of the tasks and playbooks are all in here. So this is some of the stuff that's happening when Ansible is running. Does that sort of help to, yeah. And this is all open source stuff. So right now it's just putting back Django.

Any other questions or anything, or pointers, yeah? Yeah, it's in the cloud. So in this case, do you mean like are the Galera nodes, are they talking to one another over SSL, with encryption or something like that? Yeah, no, so it's all plain text, yeah. Yeah, so in most clouds, what you're going to get is your own private tenant network, right? So in a way, you've got to hope that that's secure anyways, right? So I know what you're getting at. And you could also probably add some encryption over the wire if you wanted to. But in this example, no, it's all just plain text. Ansible is communicating with them over SSH, so that's encrypted. But everything else is plain text, yeah.

Yeah, so maybe I'll go down here and talk a bit about the Swift part, because that's what's really interesting to me. So there's a kind of a plugin, or whatever you want to call it, a module for Django that allows you to use OpenStack Swift as the storage backend for files. So that does not only static files, but also any files that you upload into the application.
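The talk does not name the Django module used for Swift. As a sketch only, the settings below assume the `django-storage-swift` package and its documented setting names; the auth URL, credentials, tenant, and container names are placeholders, and the names should be checked against whichever plugin and version is actually installed.

```python
# Hypothetical Django settings fragment, assuming the django-storage-swift
# package (the talk does not say which plugin was used).

# Uploaded files (media) go to a Swift container instead of local disk.
DEFAULT_FILE_STORAGE = "swift.storage.SwiftStorage"
# Static files collected by `collectstatic` also go to Swift.
STATICFILES_STORAGE = "swift.storage.StaticSwiftStorage"

# Placeholder credentials and endpoint; none of these values are from the talk.
SWIFT_AUTH_URL = "https://keystone.example.com:5000/v2.0"
SWIFT_USERNAME = "demo-user"
SWIFT_PASSWORD = "change-me"
SWIFT_TENANT_NAME = "demo-tenant"

# Hypothetical container names: one for uploads, one for static assets.
SWIFT_CONTAINER_NAME = "django-media"
SWIFT_STATIC_CONTAINER_NAME = "django-static"
```

Because no application node keeps files on its own disk in this arrangement, a node can be deleted and rebuilt without losing uploads or static assets.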
I don't have a good example of a file upload for Django right now; I just didn't have time to do that. We can see when we go to the actual admin screen here. Oh, what happened? Oh, that's weird. Well, anyways, we can see on the page source that the, oh, this doesn't show that, yeah, right. That's why I wanted to log in there. So I just wanted to show you where the files actually come from. That's weird. I guess because Apache might be up already, there's a little bit of a problem there.

But when you use Swift with Django, when you upload files, they actually go up into Swift. And so in a way, it's kind of like a CDN, but yeah, so that gets you away from having to use an actual file system. Unfortunately, I don't think that there's similar kinds of plugins yet really for other major content management systems like Drupal and stuff like that, but Django definitely does have that kind of thing. And in this slide, I'm just showing you an example where I use the Swift command line. I've configured the Django application to actually use a specific container, and all of those static files are in there. And this is the example of the actual source once you get into the admin page. And you can see that the CSS files are actually coming out of OpenStack Swift as opposed to the file system that Django is actually on.

So now that that's done, the demo Django application has been installed. And then finally, the last thing I could do, which is a lot quicker than the rest of the stuff, is to run the HAProxy Ansible playbook and have the HAProxy config restored so that the IP from the new virtual machine is set up in the Ansible config. And I think that should mean that we get this back. Yeah, there we go. So in this page source, we can see that this CSS file is actually coming from OpenStack Swift.
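The HAProxy step described above, regenerating the config so it includes the new virtual machine's IP, could be sketched as a small template function. This is not the speaker's playbook: the backend name, node names, and IP addresses are hypothetical, and the `check` keyword is included on the assumption that health checks are how HAProxy noticed the deleted node.

```python
def render_backend(name, nodes, port=80):
    """Render an HAProxy backend section with one health-checked server per node.

    `nodes` is a list of (server_name, ip_address) pairs.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for node_name, ip in nodes:
        # "check" turns on health checking for this server, so a node that
        # disappears is taken out of rotation.
        lines.append(f"    server {node_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node names and tenant-network addresses.
    print(render_backend("django_nodes", [
        ("zone1", "10.0.0.11"),
        ("zone2", "10.0.0.12"),
        ("zone3", "10.0.0.13"),  # the rebuilt node, with its new IP
    ]))
```

Regenerating the whole backend from the current list of nodes, rather than editing one line, keeps the config consistent when a rebuilt node comes back with a different address.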
And yeah, so sorry, that was a little awkward, but basically what we did was we had three nodes, we deleted one, and then in a few minutes, we automatically restored everything on that system. And everything was still up and running. There's some things we could do to make sure that it actually stayed up when other things went down. But in general, as far as HAProxy knows, everything's back. And that's kind of it for my little demo. Does anybody have any questions or anything like that?

Yeah, sorry. Yeah, so I think Percona is right behind us. And I believe they're the ones that actually have worked on a lot of the Galera stuff. But yeah, in general, what you do is you have at least three nodes, or you can have two nodes and then an arbitrator, but you need three systems so that, if one goes down, there's still enough systems to keep everything running. I'm not using the right word for that, but in general, yeah, that's why I have three nodes in our system. And there is some additional complexity to running Galera, obviously, but you kind of have to. Even in the OpenStack world, that's basically what everybody does to achieve some high availability, is to use Galera in the back end of an OpenStack cloud. So that's what we're doing here. But there is additional complexity, but it's actually a pretty good system. And as you can see, it's really automatable, and you can set it up so that it automatically restores the database and is up all the time, yeah.

So how long did it take for me to use Galera for this example? I haven't personally had any major problems with it. There's some things you have to do in addition to learning how to configure it. So there's more configuration options and things that you have to figure out. So there is a bit of work. I don't know if I could give you a number in terms of, it's very similar. Like it's still MySQL in the background.
There's just this extra piece that does the clustering for you, yeah. But there is additional complexity, and you do have to learn a little bit about it and mess around with it and play with it and figure it out.

Probably the biggest thing that people don't get is that if you shut all the nodes down and just start them all up again, that doesn't work. Like you can't just bring them all down and bring them all up. One of the nodes has to come up first as a bootstrap node, and then the other nodes can come up, and that's what the playbook was doing already. So that's the thing that people probably get confused with the most, is they shut down all the database servers and they bring them up and it's not working. It doesn't work. One has to come up as a bootstrap node, yeah.

No, just one of the nodes has to come up first, yeah. That's a good question. Like in terms of data consistency, I'm not sure, I'd have to look that up, yeah. But in general, I usually just restart the first, like I have the node that I call the first node, and I always bring that one up first, yeah. But that's a good question.

It just doesn't, none of them will start. Yeah, yeah. So it just stops, and people are like, well, why didn't my database come back up?

So yeah, but I think that's all the time that I have. Sorry, that was a little kind of awkward and weird, but we did actually delete a node and put it right back and reinstall everything. So that was what I had hoped to accomplish. So yeah, thank you.
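The bootstrap rule from the Q&A, where one node starts a new cluster and the others then join it, can be sketched as a small planning function. The node names are hypothetical, and the function only produces an ordering; the actual bootstrap would be done with whatever mechanism the installed Galera version provides (for example the `--wsrep-new-cluster` option). The sketch defaults to the first node in the list, as the speaker does, although Galera's documentation recommends bootstrapping from the node with the most advanced state.

```python
def startup_plan(nodes, bootstrap_node=None):
    """Return (node, action) pairs for starting a fully stopped Galera cluster.

    Exactly one node bootstraps a new cluster; the rest start normally and join.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # Default to the first listed node, mirroring the speaker's habit.
    first = bootstrap_node if bootstrap_node is not None else nodes[0]
    if first not in nodes:
        raise ValueError(f"unknown bootstrap node: {first}")
    plan = [(first, "bootstrap")]
    plan += [(node, "join") for node in nodes if node != first]
    return plan


if __name__ == "__main__":
    # Hypothetical node names.
    for node, action in startup_plan(["zone1", "zone2", "zone3"]):
        print(f"{node}: {action}")
```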