All right. How's everyone doing? Good summit so far, I hope. It's Thursday. If you're not sure it's Thursday, you might be in the wrong place. My name is Billy Olson. I am a software engineer at Canonical. I've been here for about 16 months or so. It's one of the greatest places to work. If you're ever considering an opportunity, please look us up. We're always looking for well-qualified people.
Earlier this week, I gave a talk on upgrades in your OpenStack environment: how to plan for them and what to look for. Today, we're going to follow that up by putting my money where my mouth is and doing an actual live upgrade in place here. As we get through this, we'll take a look at how to do an upgrade with Juju, what makes it easy with Juju, and how we can actually run that.
In case you didn't notice, we're here at the Mitaka Summit, which means that Liberty is here. Specifically, Liberty is here on Ubuntu. As with all of our previous releases of OpenStack on Ubuntu, our release schedules line up. When Wily went out for Ubuntu, OpenStack Liberty was there with it, right as the release cycle closed. It is here, it is available.
Liberty comes out with new features, all kinds of new features that we may want in our own environments. Maybe some better IPv6 support. It's got some improved DVR support with L2 pop, in case you're using that. There are some stability and usability improvements, and we're getting to the point where it's getting really mature. So, of course, why would you not want to upgrade? Liberty is here, it's the latest and greatest, and if you're like me, you enjoy running bleeding edge as much as you absolutely can.
Now, Juju. If you've used Juju to deploy your cloud, then one of the nice things you find is simplicity with Juju.
What you simply do is define which services you want and how you want to relate them together, how many units of which service, and whether you want it in HA. And one of the nice things about Juju is that if you define the abstractions correctly, as they have done in Juju, then upgrades are fairly trivial.
Take a look at the OpenStack charms. The OpenStack charms simply define what service you want from an OpenStack perspective. So if you want to deploy some Nova compute, you have a service and a charm called nova-compute. And one of the nice things about that is it comes with a simple config parameter, openstack-origin, which basically says where you want to install this particular version of this service from. When you change that, your configuration will apply and it'll upgrade your cloud in the way that you want. So an upgrade from Kilo to Liberty is as simple as saying your new origin is now Liberty. And that's because the abstraction has been defined correctly. You can model what it is that you're really going after.
The upgrade within Juju for OpenStack is a rolling service upgrade. We don't download an image and upgrade the entire cloud at once. What we do is focus on one service at a time. So I've got my Neutron, my Nova compute, my Nova cloud controller, my Keystone services. You take a look at each of the services that are deployed within your environment and you do the rolling upgrade. And the nice thing about that is rolling upgrades minimize the risk to you, because you can do one at a time. With the maturity of OpenStack today, in most clouds services at two different version levels can communicate back and forth with each other.
And so if you get stuck partway through your upgrade, even though you want to get all the way through, you can at least stop where you are, debug the one service that's not upgrading or that you're running into problems with, and then move on. So that's a nice thing.
And with the OpenStack charms and with Juju, in the way that we deploy it, you get really minimal downtime, because all you do is say juju set, keystone, openstack-origin equals wherever. What this is saying is: I've got my service Keystone, and I want to change which version it's running from. And the nice thing is, because you've told it at the service level, within Juju each unit of the service gets a hook that fires. They all see the change at the same time, and all units across the service get upgraded. So if you've got a three-unit Keystone cluster deployed, when you say I want my Keystone service on Liberty, which is what this is saying, it will go out and upgrade all pieces of that cluster at one time.
Now, there's still some downtime with that, right? Because as soon as you say go, I want to change my openstack-origin, I want to do the upgrade, give me Liberty, all the units are going to start upgrading the software at once. Each of the API services being provided could be restarted at approximately the same time. So until the upgrade is done, it's a little racy as far as whether your service is going to be up or down, but the units are going to run as fast as they possibly can and get back up. So you've minimized the downtime, but it's not completely gone. And some people say that's not good enough. They really want zero downtime.
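To make the service-at-a-time idea concrete, here is a minimal sketch that only builds the command lines, one per service, and does not run anything. The `juju set ... openstack-origin=...` form is the Juju 1.x spelling of what is said in the talk; the service list and its order, the function name, and the `cloud:trusty-liberty` value are illustrative assumptions, not a recommended procedure.

```python
# Hypothetical list of deployed services; the order is illustrative only.
SERVICES = ["keystone", "glance", "nova-cloud-controller", "nova-compute"]


def service_upgrade_commands(services, origin):
    """Return one service-wide `juju set` command line per service.

    Each command changes openstack-origin for a whole service, so every
    unit of that service upgrades at roughly the same time.
    """
    return [f"juju set {name} openstack-origin={origin}" for name in services]


if __name__ == "__main__":
    # Printing rather than executing keeps the sketch safe to run anywhere.
    for line in service_upgrade_commands(SERVICES, "cloud:trusty-liberty"):
        print(line)
```

Running the commands one at a time and checking the service in between is what gives you a place to stop and debug if one service misbehaves.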
And there's been some really good work done in the Kilo and Liberty cycles by Nova and Neutron on their database migration schemes, which allow you to take an expand and contract approach to your database migration. You can expand your tables, and those tables are still backwards compatible. You can do a live in-place upgrade, and then at a planned outage later, if you need to, you can contract and shrink your database.
Within Juju and the OpenStack charms, we've embraced zero downtime because of the way some of our customers approach upgrades. So one of the things we focused on this last cycle is being able to upgrade one unit at a time. It looks very similar to what we saw on the previous slide. That one is upgrading an entire service, whereas this one is upgrading one unit of that particular service. So if I've got a three-node cluster of Keystone, I can upgrade one node at a time within that cluster, which really allows me to minimize my overall risk and downtime. If I run into a problem upgrading node zero in the cluster, then I can fix that problem, but I still have a highly available cluster, because I have two other units which haven't been upgraded yet and are still running fine. So I have no downtime, because even though that one guy's down, I still have services that are up. And we focused on this across all the OpenStack charms so that you have the ability to roll the updates across.
The other way still works. If you want the fastest possible upgrade in your environment and you're willing to take a little bit of downtime, you can just say, all your services, go and upgrade, and it works. But for those of you who want a little bit more control, who want absolute zero downtime, there's another option here as well. Whoops. So, demo time.
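The difference between the two modes can be sketched with a toy model. It assumes the worst case, where a unit serves no requests at all while it is being upgraded; real restarts are shorter and, as described above, only racy rather than guaranteed down. The function name and the step-based timeline are illustrative assumptions.

```python
def serving_units_per_step(unit_count, batch_size):
    """Return how many units are still serving during each upgrade step.

    batch_size == unit_count models the service-wide upgrade (all units at
    once); batch_size == 1 models the unit-at-a-time rolling upgrade.
    Worst-case assumption: a unit being upgraded serves nothing.
    """
    upgraded = 0
    timeline = []
    while upgraded < unit_count:
        in_flight = min(batch_size, unit_count - upgraded)
        timeline.append(unit_count - in_flight)  # units untouched this step
        upgraded += in_flight
    return timeline


if __name__ == "__main__":
    print("all at once:", serving_units_per_step(3, 3))   # [0]
    print("one at a time:", serving_units_per_step(3, 1))  # [2, 2, 2]
```

With three units, the all-at-once mode has a step where nothing is serving, while the rolling mode always leaves two units up, at the cost of three steps instead of one.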
Let's show you what this actually looks like, right? So, man's got a cloud. That man happens to be me. And my cloud looks a little like this. It's not a truly HA cloud, so we're not going to get a fully zero-downtime upgrade. There's going to be some risk here.
What you can see in this picture, which is kind of difficult to see: I've got Nova cloud controller up here in the top left. That's highly available, so my Nova API services and things like that are highly available. I've got a cluster for the OpenStack dashboard that's highly available, so that I know which IP to connect to. I've got multiple service units of my Neutron gateway. In this particular configuration, I've got the Neutron gateway set up with L3 high availability, so that if one gateway node goes down, it'll migrate the router over. I might see a couple of packets lost, but it gets picked up on the other gateway and I keep my availability there. I've got some Cinder, some basic things there. So I've got different pieces of the puzzle here.
And on this cloud, I am running a workload. I think the canonical workload at this particular summit is the real-time syslog analytics bundle that's out there in the big data world. I've deployed that, just like all the other presenters seem to have done as well. So you can see I've got the Spark and the HDFS and all that good fun stuff on there. This is just to show you that it is live. It's nothing surprising. It's actually there and live.
And for fun, I have also deployed a Plex server with my own movies loaded onto it, because if you're like me, when you go to do an upgrade, some parts of it take a while and I want to be entertained.
So I would normally watch a video or something like that on my laptop and keep an eye on things. But it's a lot more fun when you can run it in the cloud on someone else's thing. We might be battling a little bit of a bandwidth issue here. I think we'll be somewhat okay. I will go ahead and start the movie once we get going.
So let's see. This is a script that I'm going to be running that goes through the services to do the upgrade. For any service that is in a highly available state, it will do a rolling upgrade. For any service that is not highly available, like the single unit of Cinder I have deployed, it's going to do a bulk update with the old method of saying, service, go and do your upgrade.
If I take a look at my current status, I see that I've got various pieces out here. One of the things you'll notice is that on the far right we've got some status about things. We've got the unit is ready, we've got how many OSDs for my Ceph, whether we're up and whether we're clustered, what the actual status of my services is. That's an improvement that went in on the last cycle to give more feedback to the user about what's going on. And you'll see that with some actions going on.
So what I'm going to do is go over here and kick off a script, this do-upgrade, which will start doing upgrades in the background and print out what it's doing. So it's doing an upgrade of Keystone, and it's going to upgrade one unit at a time.
And I have saved probably one of the more trivial services, but one that's good for demo purposes, the OpenStack dashboard. With an upgrade using this particular method, I can get what actions are defined, because these are implemented as Juju actions.
So I can take a look here and see what actions have been defined for each individual service. I can say juju action defined for openstack-dashboard, and what I get back is two actions which are defined. I get a git-reinstall action, which is for when you are using our deploy from source: you can choose to reinstall from git and refresh based on what you have. And you can see that there's an openstack-upgrade action there, so you can perform the managed upgrade that way.
Now, before you do that, one of the things that's important to know is that the OpenStack dashboard has a setting. So I can say juju get openstack-dashboard and pipe it into less. This gives me the current config that's applied out there, and I can take a look at what the origin is from openstack-origin. Right here, you can see that I'm running from, where is it, over here, cloud:trusty-kilo/proposed. So I'm running off of the proposed pocket of the cloud archive's Kilo versions for Trusty. And what I'm going to do is change that up to Liberty.
But there's one more thing. If I just set that value, it would be like saying, okay, go upgrade the entire service all at once. So there's another option that I need to set here, which is action-managed-upgrade, which says: even though you can do it really fast on your own, I want complete control to roll it over and do it myself. So I will say juju set openstack-dashboard action-managed-upgrade equals true. What that does is tell it: when I change the next value, don't go out and update everything at once. Let me do it in my own good time.
So I can now say juju set openstack-dashboard openstack-origin equals cloud:trusty-liberty. I can set it, but it's not going out there and running yet. I can even look at the status with format equals tabular.
I can see here, if I look at the current status, that it's not doing anything. Everything's just ready and clustered. The unit is ready, and it's basically waiting there.
So now what I can do is upgrade a specific unit with juju action do, on an OpenStack dashboard unit, openstack-upgrade. When I submit the action, it gives me back an ID for the action. This is basically saying, okay, it's an asynchronous action that we're going to take care of, and here's the action ID that you have. So if you want to know anything about it, you can juju action fetch the current status of it. You can see that it's currently running.
If I go back and look at the OpenStack dashboard status output again, I can see that our unit is currently upgrading. The OpenStack upgrade is in progress. You can sit here and watch it and it will continue to upgrade until eventually it's done. You can watch the action fetch here to see whether it's still running. It'll continue to tell you it's running until it has completed its upgrade.
We can go ahead and start an upgrade on another unit if we want to. openstack-upgrade. If I could type it in. And then I can check the status of that one as well. Whoops. So they should both be running at this point in time, and we can check the OpenStack dashboard status output again and it will tell us whether it's upgrading or not.
So it looks like that first unit went ahead and finished upgrading, and it's currently ready. So that should be running on Liberty. If I go into openstack-dashboard slash zero, I can get in there and check what the apt-cache policy is on it. And I can see that it's now running the 8.0.0 version, which is the Liberty version, from updates there.
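The sequence just walked through by hand, together with the choice the background script makes between rolling and bulk upgrades, can be sketched as a small planner. This only builds command lines and runs nothing. The option and action names (`action-managed-upgrade`, `openstack-origin`, `openstack-upgrade`) are the ones named in the talk, written in Juju 1.x syntax; the function name, the unit numbering, and the `<action-id>` placeholder are assumptions, and this is not the speaker's actual script.

```python
def plan_upgrade(service, unit_numbers, origin):
    """Return the command lines for upgrading one service.

    More than one unit: managed, unit-at-a-time upgrade via Juju actions.
    A single unit: plain service-wide upgrade, since there is no peer
    left to keep serving anyway.
    """
    if len(unit_numbers) <= 1:
        return [f"juju set {service} openstack-origin={origin}"]

    plan = [
        # Must be set first, or changing the origin upgrades every unit at once.
        f"juju set {service} action-managed-upgrade=true",
        f"juju set {service} openstack-origin={origin}",
    ]
    for number in unit_numbers:
        plan.append(f"juju action do {service}/{number} openstack-upgrade")
        # The id is printed by `juju action do`; poll until it stops running.
        plan.append("juju action fetch <action-id>")
    return plan


if __name__ == "__main__":
    for line in plan_upgrade("openstack-dashboard", [0, 1, 2], "cloud:trusty-liberty"):
        print(line)
    for line in plan_upgrade("cinder", [0], "cloud:trusty-liberty"):
        print(line)
```

In practice you would wait for each action to complete, and check the unit's status, before moving on to the next unit.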
And then we'll want to check the status to see if the second node is done, and once that's done we can roll it over to the third node. It's still upgrading.
I can check on the status of the script in the background. The script in the background is running, like I said before. It's going to perform the rolling upgrades for the different pieces. You can see that it was upgrading Keystone, and it's finished Keystone. It did a big bang upgrade, upgrading the entire thing all at once, for the Glance service. It's upgrading Nova cloud controller, and it's doing that one by one on Nova cloud controller as well.
You can see I'm still in here. This is an SSH terminal into a virtual machine in the big data deployment that I have there. So if I go on to the plugin zero node here, I can still use it and run it just like normal. I can run the SparkPi and get some output there. Everything's still running as things go along. We haven't really been disrupted yet as far as what we're doing.
I can check on the status of that dashboard while that's running. And it looks like those are all upgraded there. Did I upgrade the last one? Which one was the last one I upgraded? Yep. So I need to do openstack-dashboard 2 to get the very last one there. And he'll finish that. He's doing the Pi calculations. Still doing some upgrades. He's doing the Nova cloud controller, Nova compute.
I can come down here. We can start watching our movie if you guys are interested. Maybe some Godzilla. So this will be streaming. Hopefully the bandwidth cooperates here, but it's a streaming low-quality version of the video. I'll put it off to the side so it's a little bit smaller. Maybe it buffers a little bit as we go, but we've got a nice little video playing and we can check on the status of that as it's going. Now this is the 1956 version. So don't think it's the true Godzilla version.
This is the Americanized version, the one that first came out as Godzilla, King of the Monsters. So we'll let that run in the background while the upgrade is still going on here.
We can see what the status is of our entire environment. Before, I was specifying which services I wanted to see the status of. Here I can see the entire thing, and you can see the overall status and which units are being upgraded, because we try to report that. This one is currently upgrading: openstack-dashboard 2, it looks like. How's our script running? It's now on to the Nova cloud controller, so it's still running off and doing that. A good bit of fun. Our movie's still playing, so we haven't seen any interruption yet. We can still see that everything's nice and live. We can still access all our stuff, so it's up and running while the upgrade is live.
While this continues on, are there any questions that you guys might have? I can field some now. Yes? Yeah, I'll repeat the question.
So you said you want to do a rolling upgrade with services that are in HA mode. Yes. You upgrade a service at a time, or an instance of a service at a time. If there's a schema change in the middle and the new service needs a new schema, how does it work? Because the other instances will still get requests through the load balancer, right?
So that's a good question. The question was: as I'm doing a rolling upgrade across different instances of a specific service, say I've got Keystone as a service and I want to upgrade Keystone 0, Keystone 1, Keystone 2, and the schema changes between, say, Kilo and Liberty, how is that handled?
So the good news for us is that they have the Grenade project within OpenStack, which focuses on testing upgrades and ensuring that the database compatibility is N minus 1 at the least, as well as some of the API calls. Now there have been breakages in the past, it's not been perfect. But because of the way the schemas are updated, and because they are continuously fighting to keep at least N minus 1 support, what you end up seeing is that I can do an upgrade from Kilo to Liberty on the first unit that gets upgraded, and now the database has Liberty's schema, and the Kilo versions can still access it because it is N minus 1 compatible at that point in time. That's how you can support the rolling upgrades.
So the schema itself is N minus 1 in the sense that Kilo Keystone can work with the Liberty schema? Yes, that is the theory, and it works for us in this particular case. There have been some cases in the past where there were issues. I think from Juno to Kilo there was a breakage.
But there are things being done, especially for your critical services. Everyone's service is critical, but Nova and Neutron are really the key critical services for most people. Nova, in the Kilo cycle, went to an expand and contract model for their database upgrades. With the expansion model, the new database schema is only adding columns and information to the tables. So even if the older level of code accesses it, the table still has all the columns and all the data that the old code needs. And then in the Liberty cycle, Neutron added that capability as well, so they also have expand and contract. So if you were just going with the expansion model, you'd be able to get that minimal-downtime, zero-downtime type of approach.
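The expand step can be illustrated with a small self-contained example using SQLite from the Python standard library. The table and column names are hypothetical and are not Nova's or Neutron's real schema; the point is only that an additive, nullable column leaves code written against the old schema working while new code starts using the new column.

```python
import sqlite3


def old_release_insert(conn, name):
    # N-1 code: only knows about the original columns.
    conn.execute("INSERT INTO instances (name) VALUES (?)", (name,))


def new_release_insert(conn, name, flavor):
    # N code: also writes the column added by the expand migration.
    conn.execute(
        "INSERT INTO instances (name, flavor) VALUES (?, ?)", (name, flavor)
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
    old_release_insert(conn, "vm-a")

    # Expand: purely additive change, nullable so old writers stay valid.
    conn.execute("ALTER TABLE instances ADD COLUMN flavor TEXT")

    old_release_insert(conn, "vm-b")               # old unit, new schema
    new_release_insert(conn, "vm-c", "m1.small")   # upgraded unit

    # Contract (dropping anything only old code needed) would be a
    # separate, later step once no old code is running; not shown here.
    return conn.execute(
        "SELECT name, flavor FROM instances ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

Rows written by the old code simply carry NULL in the new column, which is why upgraded and not-yet-upgraded units can share the database during a rolling upgrade.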
You're welcome. Any other questions? Jason?
Okay, so that's a really good question. There is a service orchestration framework that we find interesting for things like this, for doing coordinated upgrades. We are in talks about it, but I can't commit to whether we're doing that or not, because at this point in time it's just a discussion. These same types of capabilities are being baked into Autopilot, which will be able to handle that for you as well, having the logic driven there. But certainly the service orchestration piece is going to be the key piece for us in helping roll that across the different pieces of the pie. So does that answer your question? Okay, great. Any others?
If you have, let's say, Nova controller running in active-active mode, and I have Nova compute running on 10 nodes, would you recommend first updating controllers 1, 2, and 3, and then updating the compute nodes?
Right. So the question is: I've got 10 nodes of Nova compute and three nodes of Nova controller. How do you recommend upgrading them? Do you recommend doing them service by service, or do you recommend doing them part by part? And the answer to that is service by service. Focus on one service at a time. Upgrade your Nova controller piece, where you can keep the APIs the same, and then start doing your compute. Because really, compute is a lot more complicated for many people. Depending on what your service is and what your requirements are, you might need to live migrate all the instances off of a Nova compute node before you touch it and do any type of maintenance, in case something happens. I didn't do that in this particular demo, but it is pretty standard for a lot of people to evacuate a hypervisor.
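The ordering in that answer can be written down as a tiny planner: finish the controller service first, then work through the compute nodes, optionally draining each hypervisor before touching it. The unit names, the step labels, and the `evacuate` flag are hypothetical; how a node is actually drained depends on your deployment and is not shown.

```python
def plan_nova_upgrade(controller_units, compute_units, evacuate=True):
    """Return ordered (step, unit) pairs: controllers first, then compute."""
    steps = [("upgrade", unit) for unit in controller_units]
    for unit in compute_units:
        if evacuate:
            # Move running instances elsewhere before maintenance.
            steps.append(("live-migrate instances off", unit))
        steps.append(("upgrade", unit))
    return steps


if __name__ == "__main__":
    controllers = [f"nova-cloud-controller/{i}" for i in range(3)]
    computes = [f"nova-compute/{i}" for i in range(10)]
    for step, unit in plan_nova_upgrade(controllers, computes):
        print(f"{step}: {unit}")
```

Setting `evacuate=False` corresponds to what was done in the demo, where the compute nodes were upgraded with instances still running on them.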
Depending on the application, what do you recommend: live migration, or would a cloud-native application be expected to handle a node failure?
It's entirely up to the application itself. I don't know how it's specifically constructed. And it's also up to your business requirements. If you can just upgrade it, that's great. But a lot of times there may be an upgrade of libvirt or QEMU or something like that, and you want to take advantage of the new versions. Migrating the instance off the box lets you update QEMU to get new security fixes and things like that, and then bring the instance back. So, yep, you're welcome. Any other questions?
It looks like our upgrade is complete here. How's Godzilla doing? He's still playing. He's going on just fine. Things are getting ravaged, exactly how we want it to, right? It looks like he may have lost something there, but at this point in time we should have our cloud up and running. We should have an upgraded Horizon here, and we can see what instances we have running, and we should be on Liberty at this point in time. Of course, after I log back in.
Okay. Are there any other questions out there as it's logging in in the background? Okay. Well, thank you all very much. And if anyone is interested, I am going to put the scripts I used to do this migration out on GitHub so you can get an idea. They're not out there right now, but they will be up there, and I attached the link at the bottom of the demo here. So, okay. Thanks.