All right, let's start. Good afternoon, and thank you for joining us. We're going to talk about fast-forward upgrades: an easy button to move from Red Hat OpenStack 10, based on Newton, to 13. I want to remind everyone that this is the Red Hat sponsor track. Everything we're going to talk about is of course based on the work of the community, but we're going to be very specific about the problems, concerns, and issues our customers face and how we solve them. Feel free to ask any question, just know that's the context. So again, thank you for joining us. We're going to talk in general about the OpenStack lifecycle. You'll hear about some of the community projects we work with and how we build products and solutions for our customers. We are Maria and Chris, and our names are up there. We'll be sitting here at the end of the talk if you want to reach out to us with questions, and we also have some developers in the audience who have worked on these procedures. We're going to leave a portion of this talk at the end for questions and answers.
So save your questions for the end, or start thinking about them. All right, this is not the first time we've talked about fast-forward upgrades in OpenStack. In 2016 we introduced Red Hat OpenStack 10, based on Newton, which was our first long-life release. We did this in response to customers who wanted to stay on a release that already fulfilled their needs. Coming from 7, 8, and 9, release 10 felt feature complete for most of our use cases. Our customers didn't want to plan upgrades and changes to their infrastructure every six months, or even every year; they wanted to stay on a stable release that was already good enough for them, and we needed to give that stability to the customers who wanted it. At the same time, we didn't stop releasing. We kept up with the cadence of the community and continued to contribute upstream, which is what we do. In fact, we started working on this fast-forward procedure back in the Ocata cycle, with proofs of concept and conversations with the community.
Of course, we didn't know what Queens was going to look like back then, but we did know there was going to be a lot of disruption and change in the way we delivered OpenStack. If you remember, back in the Ocata cycle there was no talk of containers. Some projects like Kolla had started to come up, but we hadn't broadly adopted containers as a community. We knew that was coming, though, and we knew we needed a solution for the customers who were staying on the 10 long-life release. They needed a way to get to the next long-life release, which we knew was going to be 13, based on Queens. We needed to upgrade them through those three releases without having them stop at the intermediate releases and absorb the non-trivial changes that happen in between, one of which was fully containerized OpenStack services. As you can see here, OpenStack 10 still has extended life support, and we've seen a lot of customers adopt that route. They're also not in a hurry to try a fast-forward upgrade, because they still have quite a bit of time to remain on 10. At the same time, we have customers who have said that features in 11 and 12 are necessary for their operations; they have already built up OpenStack expertise and have teams ready to perform these upgrades, so they have upgraded from 10 to 11 to 12 to 13. That path is not the one we're going to talk about today. We're going to talk about moving from 10 all the way to 13 through the fast-forward upgrade. Also, this is not the first time we've talked about this: back in Sydney we introduced the long-life release cadence I'm describing. Oh, sorry, before we go into that, let's poll the audience. We have a pretty full room, and we want to find out whether any of you are running OpenStack-based clouds. Please raise your hand. That was a trick question, just to get everybody used to raising their hand.
Okay, so are any of you Red Hat customers? All right. Now, are you running a Red Hat OpenStack release prior to 10, so 7, 8, or 9? Hopefully nothing before that. Okay. How about Red Hat OpenStack 10? All right. Anyone on 11 or 12? All right, great. Those of you on 12 either upgraded in place from 10 to 11 to 12 or did a fresh install of 12. For you, the route is obviously the upgrade from 12 to 13, which we still support. We're not going to cover that, but you will see some of it, because this fast-forward upgrade reuses some of the same procedures. Another thing you might be interested in is that once you're on 13, you can stay there for a very long time, fully supported. All right, thanks for that. So back at the Sydney summit we talked about these long-life releases, why we were taking this approach, and why it was different from the cadence the OpenStack Foundation was following, which was basically a release every six months. We explained that while we provide this long-life support, we as Red Hatters continue to contribute to the upstream community and to release at that cadence. At the Vancouver summit we had a session with some developers who worked on the procedure, from the proof of concept through to the actual implementation, and we showed the developer angle: exactly how the fast-forward upgrade works, and all the intricacies of what gets upgraded and when. Today we'll hear from solution architects on the Red Hat Tiger Team. Their perspective is different.
They come from the field. They have seen multiple customer implementations, and they're going to give us a demo of exactly how this fast-forward procedure works. Their feedback and their angle are based on what our customers tell us. We'll leave time at the end of the presentation, and you can find us after the session. With that... Hi, everyone. To echo what Maria just said, the one thing that differentiates us from presentations you may have seen before is that we are the field people. We have the pleasure of working with our clients: we take the best technology our smart developers put together and translate it into business solutions, or use it to solve business problems. With that said, we want to split the rest of the session into two sections. The first section focuses on what our customers typically ask us when we run fast-forward upgrade workshops: what they usually worry about, what we typically recommend, and how we answer those questions. Then, together with my colleague, we're going to show you the fast-forward upgrade process end to end, which is pretty exciting. So these are the first three questions we typically get. First: what prep work is recommended to evaluate a fast-forward upgrade?
A couple of things come to mind right away. First, you definitely don't want to do this on your production environment right away. This might seem like common sense, but you would be surprised how many people want to jump right in and start rolling the updates into production. Another highly recommended step is to get familiar with OSP 13: set up an installation of the Queens release of Red Hat OpenStack that mirrors your environment. There are a lot of configuration changes between Newton and Queens. The biggest is our move to containerized services, which requires a certain amount of change and a new way of thinking about and looking at your services. Those are probably the main recommendations I would make there. Second question: which versions of OpenStack are supported by the FFU process? That's why we asked on purpose which versions you use. For the FFU provided by Red Hat, you have to use the Red Hat tools for the deployment: you have to be running Red Hat OpenStack Platform deployed with Red Hat OpenStack director. From a release perspective, you have to be on the Newton release, which is equivalent to OSP 10. Can you do it from an earlier release? You can, but with the extra step of getting to OSP 10 (Newton) first and then doing the fast-forward upgrade from 10 to 13, or from Newton to Queens. Third question: how can my ops team minimize the risk of putting my cloud into a bad state? That's pretty important. We don't want to start the process, end up in a bad state, and not be able to recover from it. So our recommendation is: make sure you back everything up.
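For the supported-versions question above, the rule is: releases before OSP 10 take regular N+1 upgrades to 10 first, and the fast-forward upgrade then goes from 10 straight to 13. The short Python sketch below plans that route. The release-to-codename table follows Red Hat OpenStack Platform numbering (7 Kilo through 13 Queens); `plan_route` is a hypothetical helper for illustration, not part of director or any Red Hat tool.

```python
# Hypothetical planning helper; not part of director or any Red Hat tooling.
OSP_CODENAMES = {
    7: "Kilo",
    8: "Liberty",
    9: "Mitaka",
    10: "Newton",
    11: "Ocata",
    12: "Pike",
    13: "Queens",
}

FFU_SOURCE = 10  # the fast-forward upgrade starts from OSP 10 (Newton)
FFU_TARGET = 13  # and lands on OSP 13 (Queens)


def plan_route(current: int) -> list[str]:
    """Return the upgrade steps needed to get from `current` to OSP 13."""
    if current not in OSP_CODENAMES:
        raise ValueError(f"unknown OSP release: {current}")
    if current > FFU_SOURCE:
        # Already past 10 (for example on 12): plain N+1 upgrades apply, no FFU.
        return [f"N+1 upgrade {v} -> {v + 1}" for v in range(current, FFU_TARGET)]
    # Releases older than 10 must first reach Newton with regular N+1 upgrades.
    steps = [f"N+1 upgrade {v} -> {v + 1}" for v in range(current, FFU_SOURCE)]
    steps.append(
        f"fast-forward upgrade {FFU_SOURCE} ({OSP_CODENAMES[FFU_SOURCE]}) -> "
        f"{FFU_TARGET} ({OSP_CODENAMES[FFU_TARGET]})"
    )
    return steps


for release in (8, 10, 12):
    print(release, plan_route(release))
```

A release already on 13 gets an empty plan, and anything older than 7 is rejected rather than guessed at.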
We're going to talk about that a little later, and there's another session we recommend that focuses entirely on automating the backup of your infrastructure. All right, moving on to the demo part of the presentation. Originally we prepared a cooking-show-style demo for you. By cooking show, I mean we spawned five separate environments and brought each of them to a different stage of the fast-forward upgrade. Unfortunately, when we timed it, we could not finish the whole process within a 40-minute time frame, even cooking-show style. So we did the next best thing: we recorded it and trimmed all the unnecessary pieces, and we're going to show you that in the videos in a bit. These are the environments we've been working with for this demo: six nodes total, with three controllers, two computes, and the undercloud. As I mentioned, the very first step is backup and restore, and there's a way to back up both the undercloud, which is your installer, and the overcloud, which is your production cloud. A good friend of ours sitting in the back there, Dan McPherson, has a session tomorrow at 1:40 that describes exactly what you need to do to get your environment backed up, and in an even more automated way. All right, in the first phase, the first video, we're going to show you how we move the Red Hat OpenStack undercloud from OSP 10. Actually, this first part is the minor update of the environment: we're still going from OSP 10 to OSP 10.z, where z is the latest one. Who here has actually done any kind of update or upgrade using TripleO?
Okay, so you all know the great messages that get spawned; that's the kind of output we cut out of what you'll see here. The first thing you do is stop the undercloud services before you actually perform the update, so here we stop all of the services. Next, I'm showing that we're running Open vSwitch 2.6.1, which is what OpenStack 10 is based on. When you run the openstack undercloud upgrade command, it updates all of the packages, but it doesn't actually restart anything. So at the conclusion of the update process, Open vSwitch is still at 2.6.1, which is why an actual reboot of the node is required. This is true for both your undercloud and your overcloud nodes. When you do the 10.z update on your overcloud, you can use live migration to move your workloads around and keep them up while the update is happening. All those messages flew by, but there's a clock at the top right that is consistent throughout all of the videos. Here we're just showing that it's still at 2.6.1; then we reboot the node and show that it's at 2.9.0.
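The reboot requirement above comes down to one comparison: after the update, the installed Open vSwitch package is newer than the daemon that is still running. A minimal Python sketch of that check follows, assuming you have already gathered both versions for each node; how you collect them is left out, and the inventory structure and function names are hypothetical.

```python
# Sketch: decide which nodes still need a reboot after a minor update.
# Assumption: for each node you already know the installed OVS package version
# and the version reported by the running daemon (collection not shown).


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.9.0' into (2, 9, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_reboot(installed: str, running: str) -> bool:
    """True when the package was updated but the running daemon is older."""
    return parse_version(running) < parse_version(installed)


def nodes_pending_reboot(inventory: dict[str, tuple[str, str]]) -> list[str]:
    """Return, sorted by name, the nodes whose running OVS lags the package."""
    return sorted(
        node
        for node, (installed, running) in inventory.items()
        if needs_reboot(installed, running)
    )


# Illustrative data mirroring the demo: packages at 2.9.0, daemons still 2.6.1.
inventory = {
    "undercloud": ("2.9.0", "2.9.0"),  # already rebooted
    "controller-0": ("2.9.0", "2.6.1"),
    "compute-0": ("2.9.0", "2.6.1"),
}
print(nodes_pending_reboot(inventory))
```

Comparing integer tuples rather than strings avoids surprises such as "2.10.0" sorting before "2.9.0".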
So again, this is just the undercloud, which we do first. The undercloud update runs fairly quickly: about 19 minutes to update the undercloud from the 10 version to 10.z. For Red Hat, 10.z means the latest bits for that major release. Once your undercloud has been updated, the next thing you do is update the images that were used to deploy your overcloud. Here I make a backup of the images, and then I inject a password so that I can get onto the nodes later if something goes drastically wrong. We update the images and load them into Glance, and the next step is to perform the 10.z update on all of the overcloud nodes. If you've done a minor update in TripleO before, you're probably familiar with this whole process. The key thing is the reboot that is required to move to Open vSwitch 2.9. Here I'm displaying the command used to perform the update. It's broken into two sections: the update plan, which updates the overcloud plan, and then the update run itself, which updates the nodes. The videos here are cropped into sections; the whole video, start to finish, is available on YouTube. Again, I'm displaying the difference between the deploy command that was used and the update plan command, so you can see that you use exactly the same environment files and everything else that was used for the original deployment. One difference between this fast-forward upgrade procedure and other fast-forward upgrade examples the community has done outside of a Red Hat distribution is that this update is exactly where we do the operating system upgrade, as well as the OpenStack update.
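The point about reusing the deployment's environment files can be checked mechanically. Below is a small Python sketch that extracts the `-e`/`--environment-file` arguments from two command lines and compares them in order; order matters because later environment files override earlier ones. The command strings and paths are illustrative assumptions modelled on the OSP 10 minor-update flow (a deploy run with `--update-plan-only`); verify the exact flags against your release's documentation.

```python
import shlex


def environment_files(command: str) -> list[str]:
    """Return the environment files passed with -e / --environment-file, in order."""
    args = shlex.split(command)
    files = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-e", "--environment-file") and i + 1 < len(args):
            files.append(args[i + 1])
            i += 2
            continue
        if arg.startswith("--environment-file="):
            files.append(arg.split("=", 1)[1])
        i += 1
    return files


def same_environment(deploy_cmd: str, update_cmd: str) -> bool:
    """True when both commands pass the same environment files in the same order."""
    return environment_files(deploy_cmd) == environment_files(update_cmd)


# Illustrative commands; the template paths are hypothetical.
deploy_cmd = (
    "openstack overcloud deploy --templates "
    "-e /home/stack/templates/network-environment.yaml "
    "-e /home/stack/templates/storage-environment.yaml"
)
update_plan_cmd = (
    "openstack overcloud deploy --update-plan-only --templates "
    "-e /home/stack/templates/network-environment.yaml "
    "-e /home/stack/templates/storage-environment.yaml"
)
print(same_environment(deploy_cmd, update_plan_cmd))
```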
That's really important, because in other examples we see the operating system upgrade done at the same time as the fast-forward upgrade, and upgrading the operating system is usually a lot more disruptive. This procedure allows for less disruption while the operating system is being upgraded, and then we move on to the fast-forward upgrade. Yes, so as part of the update procedure you can see that breakpoints are inserted, so you can specify which nodes you want to apply the update to. Again, this will not reboot the nodes, but you can choose to update specific nodes at specific times. What you see here is that I piped "yes" into the prompts, because I want to update everything for the sake of time in the demo. In a production environment, you may want to apply the update to specific nodes, reboot those, and then apply it to other nodes at other times, in specific maintenance windows. To Maria's point, the repository changes that happen at a later point affect only the OpenStack repositories, which will be switched from 10 to 11 to 12 to 13. The operating system repositories stay in place, which is why all of the operating system bits get updated as part of the very first step, the 10.z update. Sometimes you may want to reboot anyway, because there's probably a new kernel out there, or because you want the updated bits for your OS to take effect. As you can see, lots of useful messages are displayed while the update is in progress. One thing to note here: the hash listed for the node is actually the internal hash of the process. It actually maps to the last node listed there.
I believe there's a BZ (Bugzilla entry) open to get those matched up. At the completion of this video, after everything is updated, I'm going to skip through all the excitement here. What I'm showing here is using Ansible to go out to all of the overcloud nodes and display the version of Open vSwitch on each of them. As you can see, it's still 2.6.1; nothing has changed. In a real-world environment, you would schedule times and reboot your controller nodes in sequential order to preserve high availability, and use live migration to move your workloads from one compute node to another to maintain consistency. So this is going to go through and reboot the nodes, and there I show that Open vSwitch has been updated to 2.9.0 across the environment. Okay, so the second set of questions we typically get from customers when we run these workshops: how can I handle the workloads during the reboot required to enable the new version of OVS, or, more generally, how do I handle the workloads
whenever services require a restart? I would say there are two types of workloads: the ones that can be migrated and the ones that cannot. A pretty binary answer. If you think about it, the easy answer is that if you have typical VMs that are not taking direct advantage of the underlying hardware, you use the traditional procedure and upgrade in a rolling fashion, either one compute node at a time or several at a time, and before each reboot you live migrate or evacuate those hosts, so there's no disruption to the services. Unfortunately, VMs or services that are tied to particular compute hosts, such as SR-IOV-enabled VMs, don't allow live migration today. So you need to plan an outage for those particular workloads. But again, the procedure gives you the option to be very prescriptive about which compute host goes down, so you can plan ahead and be proactive about it. Next: how long does it take to upgrade from OSP 10 to OSP 13? When we recorded this video, the fast-forward upgrade was at a pretty early stage, just before GA, and for our six servers it took roughly six hours. That's a pretty long time. With the updates that came out after that, we recently tested 45 servers in about four hours, so the time went down tremendously. And that timing also included a rolling update to keep the VMs up; when we did the test, we tried to make it as close to a real-world test as possible. That brings me to the next question: can you break the whole, still lengthy, process into phases?
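The workload split described above (migratable VMs handled with rolling reboots, SR-IOV-style workloads needing a planned outage) can be sketched as a small planning function. This is a hypothetical Python illustration, not part of the Red Hat tooling: the `has_pinned_workloads` flag and the batch size are assumptions, and it presumes the rest of the cloud has enough spare capacity to absorb each batch's live-migrated instances.

```python
from dataclasses import dataclass


@dataclass
class ComputeHost:
    name: str
    # Hypothetical flag: True when at least one instance on the host cannot be
    # live migrated (the talk's example is SR-IOV-attached VMs).
    has_pinned_workloads: bool = False


def plan_compute_reboots(
    hosts: list[ComputeHost], batch_size: int = 1
) -> tuple[list[list[str]], list[str]]:
    """Split compute hosts into rolling reboot batches and planned outages.

    Movable hosts are rebooted batch by batch after their instances are live
    migrated away. Hosts with non-migratable workloads are returned separately
    so each one can get its own announced outage window.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    movable = [h.name for h in hosts if not h.has_pinned_workloads]
    pinned = [h.name for h in hosts if h.has_pinned_workloads]
    batches = [movable[i:i + batch_size] for i in range(0, len(movable), batch_size)]
    return batches, pinned


hosts = [
    ComputeHost("compute-0"),
    ComputeHost("compute-1", has_pinned_workloads=True),
    ComputeHost("compute-2"),
    ComputeHost("compute-3"),
]
batches, outages = plan_compute_reboots(hosts, batch_size=2)
print("rolling batches:", batches)
print("planned outages:", outages)
```

A batch size of 1 matches the most conservative "one compute at a time" approach; larger batches shorten the window at the cost of needing more spare capacity.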
Maybe you don't want to take one big outage of six hours, or even four hours. You definitely can break the process into phases, and then you may not affect your production environment as much as you would if you had to do everything from A to Z in one swing. We're going to talk about that a little later, too. Sorry, and one phase could be the entire update procedure, which is what we just showed: do just that, then wait until your next maintenance window to do the fast-forward upgrade. Up to that point your environment is still up and stable, and you can still make any kind of change to it, such as scaling. Once you start the actual upgrade procedure, the environment goes into a maintenance state where you can't perform cloud scaling functions and things like that. All right, the next video shows this next phase, if you will; we tried to break the demo into phases you could use in your own environments as well. Next, we're going to take Red Hat OpenStack director and move it through each of the releases: from 10 to 11, from 11 to 12, and from 12 to 13. This is still the same as a normal upgrade from N to N+1; however, the undercloud upgrades very quickly. Yes, so the general process for the undercloud upgrade is essentially: change your repositories to point to the new OpenStack bits, stop all of the services, run the upgrade from one version to the next, and then typically run about five validation commands to make sure all the services are up and running. As you can see here, I've changed my repository from 10 to 11, I'm stopping the services, and then I'm going to run the upgrade command. You first update specific packages, the TripleO packages,
and then, when you actually run the upgrade command, that's when it performs the upgrade of the packages and any configuration changes that are required. It's exactly the same process to go from 10 to 11 to 12 to 13 for the director. The only minor change is that, going from 11 to 12 and from 12 to 13, we incorporated stopping the services into the process. In the early days, if you didn't stop the services when you performed the upgrade, you could cause yourself some headaches, with duplicate processes running and so on. These are the commands I typically run after the upgrade process. One note that we'll make again later: in 12, one of the Nova services, the nova-cert daemon, is deprecated. So when you do that upgrade later in the video, you'll see that it's still listed, but it's not functional, because it has been deprecated in OpenStack. Anyone who has ever patched anything knows it's basically boring. Yeah, hopefully it's boring. Let me jump over here. So again, we did the reboot going from 10 to 10.z, and that one is required for the Open vSwitch daemon. Best practice is to reboot the undercloud as well. It's not mandatory, but we did it anyway because we wanted to account for the timing for customers who decide to do that reboot. Again, there are no operating system changes from one version to the next here; those were all done in the 10.z update. Here you can see the top line, where nova-cert is shown as enabled and down, but that's because it has been deprecated. All right, that brings us to the third set of questions we typically get from clients. Are there any alternatives to the fast-forward upgrade from the Red Hat perspective?
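The validation step after each undercloud hop, and the nova-cert caveat, can be sketched as a filter over a service listing. This is a minimal Python sketch under stated assumptions: each row mirrors the Binary, Host, Status, and State columns of `openstack compute service list`, the parsing of that output is left out, and the sample rows are illustrative rather than captured output.

```python
# Services known to be deprecated: still listed after the upgrade, but down.
DEPRECATED_SERVICES = {"nova-cert"}


def unhealthy_services(rows: list[dict[str, str]]) -> list[str]:
    """Return enabled services that are not up, ignoring deprecated ones."""
    problems = []
    for row in rows:
        if row["binary"] in DEPRECATED_SERVICES:
            # Expected to show as enabled and down; not a real failure.
            continue
        if row["status"] == "enabled" and row["state"] != "up":
            problems.append(f"{row['binary']}@{row['host']}")
    return problems


# Illustrative rows, shaped like parsed service-list output.
rows = [
    {"binary": "nova-cert", "host": "undercloud", "status": "enabled", "state": "down"},
    {"binary": "nova-scheduler", "host": "undercloud", "status": "enabled", "state": "up"},
    {"binary": "nova-conductor", "host": "undercloud", "status": "enabled", "state": "up"},
    {"binary": "nova-compute", "host": "undercloud", "status": "enabled", "state": "up"},
]
print(unhealthy_services(rows))
```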
I can think of two off the top of my head. The first: whatever version of OSP you're on, we've supported N to N+1 upgrades since at least OSP 7, at least from the director perspective, so that option has always been there. One thing to keep in mind, though: if you're on OSP 10 and try to move to OSP 11 that way, step by step, OSP 11 is out of support right now, so that's an obstacle. And OSP 12 will be out of support in December. Yeah. Another alternative some of our clients ask about is creating a parallel environment with a vanilla installation of OSP 13 and then migrating the workloads from the old version to the new one. It's definitely a viable option. We don't have as much tooling for that type of migration as we have for the fast-forward upgrade, but we actually had a session earlier today, which is recorded, so you can go watch it, where we partner with other companies that do this migration for you. We just don't have that fully out of the box in the Red Hat OpenStack product. That was Trilio, earlier today. And another alternative: some of the workloads our customers run need to be fully certified, meaning a number of partners' software running on top of Red Hat OpenStack has to be certified across the whole stack, and those partners have not done that for 13 yet. So some workloads can migrate and others cannot. Those customers choose to reuse some hardware to install a new cloud and, rather than migrating everything at once, start moving in that direction over time: new workloads go to the new cloud, and old workloads slowly transition. Obviously, there's nothing slow about the fast-forward upgrade itself; with it, the entire cloud ends up on 13.
So, thank you. All right, the next question: what are the main concerns while the FFU steps are happening? Probably the biggest concern is that the upgrade requires at least one reboot. That's something to keep in mind; you will have to plan for it. And if there's a workload you won't be able to move around, you have to plan for the outage, notify users, and just be aware of it. Just to be clear, the part that requires the reboot is the update that happens before the whole fast-forward procedure. That update procedure has been available since OpenStack 7, so it has been maturing over time; some of our customers already use it, and it should be done as part of your regular operations and maintenance. Okay, and the last question: is there a point of no return? We mentioned a little earlier that it's absolutely crucial to take backups before you get into this. And there is actually a point in the upgrade where we recommend moving forward all the way rather than trying to recover from an error state and rolling back to OSP 10. We're going to show where that point of no return is in the next set of slides. So this sets us up for the next phase of the upgrade process video. What we're going to go over is preparing the overcloud containers. Again, one of the major changes between 10 and 13 is that we containerized all of the services, so we're going to walk you through how this step is done
within the fast-forward upgrade. Then, of course, there will be some changes to the YAML files and configuration files you have in place, so we're going to cover that as well. In the first phase of this part of the demo, we move one Red Hat OpenStack controller to OSP 12: its database and packages get upgraded to that level. The rest of the controllers stay at OSP 10, but all of the controllers get access to the latest OSP 13 repositories. The final step in this section is to bring the single controller that was on OSP 12 to OSP 13, and the remaining two controllers to OSP 13 as well, so we finish with the whole controller layer upgraded. The reason we step one controller through 10, 11, and 12, and ultimately 13, is that during that process the databases and everything else get upgraded. I just want to pause here for one second, because as you can see, we copied templates over. One of the key things we mentioned earlier was doing a basic OpenStack 13 deployment: by going through that process, you can generate templates that you know are in a working state. That's critical.
It's a critical point I want to call out, because everyone who has been successful in the least amount of time deployed a vanilla 13 environment first and used those templates, and what they learned there, as part of the fast-forward upgrade process. As we mentioned, going from 10 to 13 we move to containerized services. So here we run a command that generates two files: the overcloud images file and the local registry images file. In our case, the local registry images file is used to create a registry on the undercloud; you could also use Satellite Server, or point directly to the Red Hat registry. As you can see, it lists all the images on access.redhat.com. You use that file to load those images into the director; we're using the director as the registry in our demo environment. The other file, the overcloud images file, is consumed by Heat during the deployment to know where to pull the images from. The push destination there is the IP address and port of the registry on our director node in this demo. The command being run here is what pulls the images down onto the director node. The other main part of this is that the overcloud now connects back to the undercloud to access certain services, so you need to add the undercloud certificate to your overcloud nodes so that they accept the handshake. Most companies use a signing CA for both environments, so they'd be able to skip this step, but there are playbooks in our documentation on exactly how to do this. We're just looking at the time here; like I said, this is a long process. The second line of this command shows the fast-forward repos file.
This is a YAML file that points to all the different repositories for 11, 12, and 13, which are consumed during the process to point the overcloud nodes at the right repositories as they step through the upgrade. I'm going to jump ahead here, because I just realized we're running short on time and we definitely want to get through everything. Up here is where we ran the fast-forward upgrade prepare step, which prepared the environment. The fast-forward upgrade run is where it actually goes through and performs the upgrade of the single controller. Sorry: it stages all the packages, upgrades the single controller all the way to OpenStack 12, and then sets the repositories for that controller and all of the other controllers to OpenStack 13. Here is where we ran the command that upgrades all of the controllers to OpenStack 13. So, as Chris mentioned, by the time this is done, the entire control plane has been upgraded to 13. All right, I know we're running out of time, so I'm going to try to get through this; I know you might have some questions at the end.
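The controller phase just described can be modelled as a sequence of state snapshots. The Python sketch below is hypothetical: the repository identifiers are placeholders standing in for the entries of the fast-forward repos YAML file, and the function only tracks which release each controller is on; it does not run any upgrade.

```python
# Placeholder repository identifiers; real repo IDs come from your fast-forward
# repos file and subscription, and are not reproduced here.
FFU_REPOS = {
    11: ["hypothetical-openstack-11-repo"],
    12: ["hypothetical-openstack-12-repo"],
    13: ["hypothetical-openstack-13-repo"],
}


def controller_phase(controllers: list[str]) -> list[tuple[str, dict[str, int]]]:
    """Return (step description, release per controller) snapshots.

    The first controller is stepped 10 -> 11 -> 12, which is where the
    databases get upgraded; the others stay on 10 until every controller is
    pointed at the 13 repositories and upgraded together.
    """
    if not controllers:
        raise ValueError("need at least one controller")
    bootstrap = controllers[0]
    state = {name: 10 for name in controllers}
    history = [("start", dict(state))]
    for release in (11, 12):
        state[bootstrap] = release
        history.append((f"{bootstrap} upgraded using {FFU_REPOS[release]}", dict(state)))
    for name in controllers:
        state[name] = 13
    history.append((f"all controllers upgraded using {FFU_REPOS[13]}", dict(state)))
    return history


for step, snapshot in controller_phase(["controller-0", "controller-1", "controller-2"]):
    print(step, snapshot)
```

Keeping a copy of the state at each step makes it easy to see that, until the last step, only one controller has moved off OSP 10.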
The last phase is upgrading all of the remaining nodes from OSP 10 to OSP 13, which includes all of your compute nodes and all of your Ceph storage nodes, if you use them. At the very end, we do an extra step called converge, where we make sure our templates are aligned with what is actually deployed on the overcloud. We definitely want to leave time for Q&A; at the end of this slide deck we have links to all the videos, so you can go and watch them. I'm not going to play this video, because it takes a couple of minutes, so we'll move on, but I did want to mention the converge step. As you can see, we use Ansible a lot in the fast-forward upgrade. What the converge step does is bring your deployment stack, which lives on the undercloud, back into a usable state, so that you can eventually use the standard upgrade process to go from 13 to 14. We make a lot of updates and changes outside of TripleO's use of Heat, so the converge step gets your deployment plan back in line with what actually happened. From that point on, it's just as if you had installed a vanilla OpenStack 13 environment. All right, just to summarize before we finish with Q&A, let us give you some best practices we learned while doing fast-forward upgrades in the wild. First and most important: get familiar with the OSP 13 deployment. Take the environment you deploy today on OSP 10, with all of its variations and customizations.
Make sure you can do the same with OSP 13. I'd say that if you do that, you're 50% of the way there. Practice the fast-forward upgrade in your lab and pre-production environments first; that's pretty much common sense. Back up the undercloud and overcloud, and don't just back them up: make sure you know how to restore them. Get help from Red Hat; we're here for you. We've done this before, so we'll be more than happy to guide you through the process. And again, plan for it: if your workloads cannot move from one place to another, there are going to be some outages, but again, you can do it in phases. That brings us almost to the end. We're going to have the videos posted here, and we're going to share this presentation with you. We can still take a question or two if you have any. Thank you very much. Any questions? Yes, one question I had: it looks like all the controllers are upgraded to OSP 13 while the computes are still running OSP 10. Are there compatibility issues with nova-compute on OSP 10 talking to the scheduler on 13, or anything like that? During that time, as we mentioned, there is no scaling, and yes, there's degradation of the cloud in general. But one thing we did test is that the data plane stays up. Networking routes that were already established continue to work, and workloads that needed to reach the internet, for example, were able to keep doing so during that time. If you check the video of the talk we gave in Vancouver, we specify exactly what the data plane outages are, where they happen, and when we have those breaks.
But we do recommend that there be no control plane activity, such as spawning new instances; that is what's expected during the fast-forward piece. So the data plane is operational, but the control plane will take a hit. The workloads running on the compute nodes keep working. Any other questions? All right, we're going to be outside. Thank you so much for your time, and hopefully we didn't run too far over. Thank you.