Good afternoon, everyone. Thank you for joining us. Before I hand off to my colleagues, I wanted to say a few words and make some introductions, to set the context for all this. At Mirantis, we believe that the disruption of cloud, propelled by the public clouds, is really happening in two key areas. The first area largely revolves around a much more efficient way to actually run and operate infrastructure software. The gist of the presentation we'll be doing today, as well as our roadmap going forward, really revolves around making OpenStack very easy to operate. And by operating, I mean things like patching, upgrades, et cetera, optimized for the scarcest resource that all companies have, which is operations people. The second, arguably just as important, disruption that public cloud providers have led is around operating not just the software, but also the actual physical infrastructure, the data center, the connectivity: running data centers super efficiently. And before we go into the software part of the story, which my colleague will be demonstrating (and good luck to him, because he's going to be doing some crazy stuff which might break), I wanted to also introduce a new partner of ours, NTT Communications, which is the largest operator of data centers worldwide. We are partnering with them to be able to deliver managed Mirantis OpenStack in the NTT data centers. So please.

Thank you, Boris. It's big stuff. Again, my name is Hideki Ikuri, from NTT Communications, and I'm very excited to get to know Mirantis, to get to know how they are working in the OpenStack area, and to have a partnership with Mirantis to serve global enterprises in any location. Just so you know, NTT is the largest data center group, and we have a network reaching 200 locations worldwide.
As of today, we have infrastructure services in 11 countries. What I like about this partnership is that we are focusing on operationalizing OpenStack as well as the infrastructure. In my terms, we've been working on the data center, the network, and the related services, and the idea is to have those three layers integrated operationally, so that we can provide customers additional services. And then Mirantis comes in with their great toolset, which I'd like to see. Actually, I should stop speaking, perhaps, so that I can see it as soon as possible. But again, this partnership will enable us to operationalize all those layers and provide them, with a service level, to large enterprises and global companies. So I'm very excited. All right, thank you.

So with that, let the show begin. OK, good morning. Can you hear me? Yes. OK, so welcome to our session, which is named Evolve or Die: Enterprise-Ready OpenStack Upgrades with Kubernetes. My name is Jakub Pavlik. I am a former CTO of TCP Cloud, now Director of Product Engineering at Mirantis. Today I've divided my talk into three parts. First, I would like to take you through the steps of the evolution of every OpenStack deployment, to explain the reasons and requirements behind the new Fuel architecture, the new approach, which will be introduced in detail in the second part. And then, hopefully, I will show you a live demo, three demo scenarios, where I will demonstrate how easily you can launch OpenStack, upgrade OpenStack, and make changes. So stay tuned until the end of my presentation.

So, OpenStack evolution. In almost every deployment, people usually think the biggest issue is setting up OpenStack. Everyone who starts with OpenStack thinks the deployment itself is the most difficult part, because it's complex. So setup is usually the goal.
You try to find an easy tool that helps you set up your cluster in 30 or 45 minutes, or an hour. You get it up and running, you are happy, and you think your mission was successful. But the second day, you come to your office and receive an email from the security department, or from any other department, that you need to patch libvirt, or make changes in nova-scheduler. So you need to operate your cloud. You need automated monitoring, logging, backups, documentation, and all the tooling around it that helps you with operations. That's the reason why the latest releases of Fuel started to be more of a lifecycle management tool than a deployment tool. And when you have set all this up, the last step is the upgrade. Anybody who has ever tried an upgrade knows that it's very difficult: the tooling, the manual steps, and the procedure are hard, and every release is different. So you need an appropriate approach and lifecycle management for this. You realize that deploying OpenStack is the easy part, but operating, scaling, and upgrading is the difficult part.

If you look at this picture, it's not important what's written here. What's important is the number of boxes and the colors. It represents the complexity of the OpenStack control plane when you start to build large-scale clouds. You realize that you will end up with something between six and 23 virtual or physical controllers, each with a host operating system. You need to patch all those controllers, you need to manage them, and it's not easy to replicate or reproduce them. So we looked at this approach and realized that we could modify the solution a bit: why can't we treat OpenStack as a set of applications? We can split it from the infrastructure and run it as a set of databases, APIs, and memcaches.
So instead of monolithic VMs, we create a flexible containerized microservices environment, which enables you to run different releases of components in parallel on the cluster and to scale easily. OpenStack is just another application workload.

Now let's clear up the naming confusion. You have probably noticed that around Mirantis there are names like MOS, TCP Cloud, MCP, Fuel, MK20, MK22. So: we introduced the Mirantis Cloud Platform, which is the next release of the Mirantis distribution. It's designed to make operations simple, and it's not just about OpenStack: it also covers Kubernetes, Calico, Ceph, and other components. Fuel, in this context, is the lifecycle management part, and I will show you how Fuel looks today, because this is important: Fuel in MCP is not the same as Fuel in MOS. We did significant architectural refactoring to support day-two and day-three operations. If you look at the picture of the Mirantis Cloud Platform, there are three outputs we want to support, the green boxes. We want to support containers via native Kubernetes for workloads. We want to support VMs for our customers, so we are running OpenStack on top of Kubernetes. And we want to offer bare metal provisioning as well, through Ironic as part of OpenStack. What I will demonstrate today in the demo is the lifecycle management tool, Fuel, and how we can spin up and manage those clusters.

So let's jump to the architecture. Fuel was designed as a DevOps toolchain. It consists of five components. We have Artifactory, which enables us to store any kind of artifact: packages, PyPI modules, Docker containers, whatever. We have Gerrit as the system for reviews, so every change you make in the infrastructure goes through Git repositories, and you can fork upstream repositories and build on them. We have Jenkins as an interface for delivering OpenStack, upgrading OpenStack, and also for building specific components. And we are using SaltStack as the orchestration solution.
So Jenkins calls SaltStack, and SaltStack does the orchestration on top of Kubernetes. What's important, as I already mentioned, is that MCP is not just about OpenStack: it should let you deploy, locally or remotely, whatever your data center needs. We can provide Kubernetes for workloads and OpenStack, but also Cassandra clusters, or Ceph clusters, or anything else you might need. You should be able to run, iterate, and test through this DevOps toolchain.

This is how OpenStack on Kubernetes looks in the small-site reference architecture deployment. For a small site we usually take three servers running the Kubernetes controllers, and we launch OpenStack and the supporting services as pods on the same nodes. Then we have three nodes here as compute nodes, where we deliver libvirt and nova-compute. I have details of these two parts. If you check how the OpenStack controller looks: OpenStack runs in Kubernetes pods powered by Docker, and networking is provided through Calico, used as a plug-in. We also have OpenContrail here as the reference SDN solution for OpenStack. If you look at the compute node side, you can see on the right that we deliver libvirt and nova-compute as containers on Kubernetes, and then we launch virtual machines. There is also the vRouter, the data plane module of OpenContrail, which runs in the host operating system.

Almost the last thing before the live demo is the metadata model, which is the most important piece. Several times I've heard arguments like "it's Ansible versus Puppet versus Salt or whatever," but the key point is the metadata model: a single source of truth, a YAML-based structured model where you can define multiple sites, multiple deployments, multiple clouds, and it's stored as a Git repository. So you make changes in Git and manage your infrastructure as code.
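To make the idea of the metadata model concrete, here is a minimal, hypothetical sketch of what such a YAML model could look like. The class path, key names, and values are all illustrative, not the actual MCP schema:

```yaml
# Illustrative sketch only: a simplified, reclass-style cluster model.
# All keys and values here are hypothetical, not the real MCP schema.
classes:
  - system.openstack.control.cluster
parameters:
  _param:
    openstack_version: mitaka        # which OpenStack release to deploy
    openstack_control_replicas: 1    # pod replicas per control service
    cinder_version: mitaka           # individual services can be pinned
```

Because a file like this lives in Git, every change to the cloud is a reviewable commit rather than an ad-hoc command on a server.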
And you have a complete audit trail of your infrastructure: you can see who made which change. How does the lifecycle look? On the left side, we have an administrator. The administrator wants, in this case, to migrate from Liberty to Mitaka. So he changes a line in the YAML metadata model and pushes it to Gerrit. Someone approves the review, Jenkins automatically updates the model on the Salt master, and then Jenkins triggers the deploy job, which calls a Kubernetes rolling update. That replaces the given component one by one, which includes downloading the new images from the private Docker registry running in Artifactory. Then you get the deploy reports back in Jenkins. We will see all of this in action.

So let's move on to the live demos. First scenario: I am an infrastructure administrator and I need to provision a small-site OpenStack deployment. What I will do here is deploy OpenStack on a pre-installed Kubernetes cluster, which covers four hardware nodes. Three nodes will be HA Kubernetes master nodes with the OpenStack services; one node will be an OpenStack compute node. And we will launch it from the Jenkins deploy pipeline. If I show you a preview of the portal and how it looks: we have Jenkins, Gerrit, and Artifactory here; it's just an iframe in this case. When we check Jenkins, which is the most important part right now, you can see that we have a couple of tabs. In this deployment, we are building OpenStack Newton Docker images, we are building Mitaka OpenStack, and we are maintaining Git mirrors, because everything must run locally: everything is mirrored in the local infrastructure to be able to reproduce every build. We have other Docker components too: we're building Calico, Galera, libvirt, and others. We also have an OpenContrail build for Debian Jessie, because, interestingly, our OpenStack components now run on Alpine Linux; only OpenContrail still runs on Debian.
Then we have the deploy pipeline, and what we want to do is deploy OpenStack. So I have a pipeline for deploying OpenStack, and I will click on "Build with Parameters". Before that, so you can see something, I have a terminal here with my infrastructure: three controllers, and we will deploy just one compute node. The cluster is completely empty; it's a Kubernetes cluster with nothing on it. So I will launch a watch command, and we will be watching what happens, because right now it's empty. In parallel, I will launch the pipeline, and we can go into the pipeline and watch what's happening. First, it generates the manifests and YAML files for OpenStack: the deployments and ConfigMaps. When that's done, it sets up the components, which means it calls and starts the services. You can see that my services are now starting. When that's done, it starts enforcing the MySQL state: you can see it waiting until at least one Galera node is up and running, and then it will enforce the databases. After it enforces the databases, it will wait for Keystone and enforce the Keystone endpoints. If you look here, you can see that Galera is running, one node is up, so it should very soon see this and enforce the MySQL state. Galera is up, so it deploys, and now it's waiting for Keystone. What does this mean? We have init containers, which check whether the components a service needs are ready. In the case of Keystone: Keystone needs at least one Galera node, and it needs memcached, so Keystone has an init container checking whether memcached and the other components are ready. When they are, it starts bootstrapping the whole node. So my pipeline is done, and now it will take about two minutes before all the containers are up and running. Keystone is running; Glance is still checking.
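The init-container pattern described here can be sketched roughly as follows. This is an illustrative example, not the actual generated manifest; image names, service names, and ports are assumptions:

```yaml
# Sketch of the dependency-check init container for Keystone.
# Image, registry, service names, and ports are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: keystone
spec:
  initContainers:
    - name: wait-for-dependencies
      image: busybox
      # Block pod startup until at least one Galera node and
      # memcached answer on their service ports.
      command: ["sh", "-c",
        "until nc -z galera 3306 && nc -z memcached 11211; do sleep 2; done"]
  containers:
    - name: keystone-api
      image: registry.local/openstack/keystone:mitaka
```

Kubernetes only starts the main containers once every init container has exited successfully, which is what makes the ordered bootstrap shown in the demo possible.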
We can leave it for a while and continue with the demo; in two or three minutes everything should be up and running. So that is how the bootstrapping of OpenStack looks.

Second scenario: I am an infrastructure operator and I need to scale my cloud, because right now I'm running a single replica of each OpenStack component, and I want to scale to three replicas because my load is heavy. We will go the same way an operator would: change the model, push it to Gerrit for review, approve the review, and launch exactly the same pipeline. There is no difference between a deploy pipeline and an upgrade pipeline; it's the same. So let me go into my local mirror of the repository. Here I have the parameter for OpenStack replicas, and I will change it from one to three, because we want three. Meanwhile, you can see that my OpenStack is almost up and running. Here I am on my local computer in the repo, and we can check the diff: you can see I am replacing this value. So git add, git commit, git push; I am pushing to Gerrit. Done. I have a new review, so let's go into the browser, and here I should have my change: yes, replicas three. Let's jump here; we can check the difference, see that we are changing the replica count, and merge it into the main model. Done, merged, and we are ready to scale up. Let's go back here: our OpenStack is up and running, and we have just one replica of Keystone right now, which consists of two containers, the Keystone public and Keystone admin endpoints; that's the reason why there are two containers. So let's go to my deploy pipeline again and build with parameters. I have a small option here: I can tick a checkbox that skips enforcing the MySQL and Keystone endpoints, because I'm not changing the endpoints or the database, and just launch the build. And we can open the console again and watch what happens.
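The Git side of this workflow can be recreated in miniature. This is a hypothetical sketch in a throwaway repository; the file name, parameter name, and commit messages are illustrative, and the real push would go to Gerrit as shown in the final comment:

```shell
# Hypothetical recreation of the operator workflow: edit the model,
# inspect the diff, commit. Paths and key names are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q .
cat > openstack.yml <<'EOF'
parameters:
  _param:
    openstack_control_replicas: 1
EOF
git add openstack.yml
git -c user.email=op@example.com -c user.name=op commit -qm "initial model"
# The actual change: one replica becomes three.
sed -i 's/openstack_control_replicas: 1/openstack_control_replicas: 3/' openstack.yml
git diff                       # review the change before sending it out
git add openstack.yml
git -c user.email=op@example.com -c user.name=op commit -qm "scale control plane to 3 replicas"
# In the real workflow this commit would now go out for review, e.g.:
#   git push origin HEAD:refs/for/master
```

The point is that scaling the cloud is just another commit: the same review, merge, and pipeline run as any other change.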
So now it's enforcing: it generates the change in the deployment definition from 1 to 3, and then it will apply the Kubernetes deployment and it should scale up. Here is the change, and see this? It's updating. Now I have my two extra Cinder controllers starting. A question here might be: what if it fails? If I poll the API, it should not fail, because it keeps working, working, and there is the last one. Yes, so my Contrail and everything is up and running, and you can see that I have three up. That is how easy life with scaling OpenStack looks.

So let's jump to the third scenario. I am an infrastructure operator and I need to modify my Nova scheduler: I want to change the Nova CPU allocation ratio. Again, I would change the model, push the review, and then launch the pipeline, but because you want to see what's behind it, I will do it manually. I will skip the job and show you the exact steps in the job pipeline, because you probably want to know how it works, and I will not do the Gerrit review, because we don't have that much time. So let me jump here and check the ConfigMaps, because we are using ConfigMaps. Here are my ConfigMaps, and if I print the nova-controller YAML and grep for CPU, you can see that my CPU allocation ratio is 16 right now, and I want to change it to 8. So let's copy this. Normally I would do it locally and push to Gerrit like last time, but now I will jump directly onto the Salt master and update the model locally. Yeah, it's exactly the same repository: this is where I changed the replicas, and now I will just change the controller definition of the containers.
So I will find the place in this structure where the Nova controller is, and I will set the CPU allocation ratio to 8. And I have to do one more thing: every time I change a config file, I have to increase the Nova config version, here from one to two, because in Kubernetes, if you want a rolling update, you need to change the deployment, so you need to change the version of the config. So we have to make these two changes. If we check the diff, you can see I edited it: I want 8, and I want this config version. Now, normally I would trigger the job, but right now I will go directly onto the node and call the Salt state, which will just generate a new YAML definition for Kubernetes. Let's see the output; this is what the job normally does. Two changes: if we scroll up, you can see that my config version was changed to two, and inside nova.conf I made the change from 16 to 8. There is also one more change, which shows the definition of the Nova deployment, the definition for Kubernetes, where I am changing from config version one to version two. Now let's apply: kubectl apply -f on the Kubernetes ConfigMap. This will just run through the newly generated ConfigMap and make the change. And now I will just apply the deployment; that's what the job does. Done. If we check "get pods", we can see that it's now doing a rolling update of the Nova controller. So it's making the change: it has to restart the Nova controller, and if we hold on for a few seconds, it's running. So let's jump inside so you can believe that I am not lying. I am inside the Nova controller, and if we check nova.conf, we can see that the CPU allocation ratio is 8. Yeah, so that's working. That's another scenario from your daily operations, and now let's move on to the final step, which is how to upgrade.
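The two changes described here, the new config value and the bumped config version, could look roughly like this in the generated manifests. This is an illustrative sketch with hypothetical names; the real generated files and versioning convention may differ, and Deployment fields are abbreviated:

```yaml
# Sketch of the two generated changes. Names are hypothetical,
# and required Deployment fields (selector, labels) are omitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-controller-2            # version bumped from -1 to -2
data:
  nova.conf: |
    [DEFAULT]
    cpu_allocation_ratio = 8.0       # changed from 16.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nova-controller
spec:
  template:
    spec:
      volumes:
        - name: nova-config
          configMap:
            name: nova-controller-2  # new name changes the pod template,
                                     # which triggers the rolling update
```

Referencing a freshly named ConfigMap changes the pod template, and a pod template change is exactly what makes a Deployment roll its pods one by one.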
What we will do here: I will not upgrade the whole OpenStack, because we don't have time, but I want to upgrade Cinder, and I have seen many customer cases where someone needed the latest version of Cinder because of the drivers for their storage. So we will run Cinder Newton with a Mitaka cloud. Again: change the model, push it to Gerrit for review, and run the Salt state manually or launch the job. I will do it manually again, so you understand what happens behind the scenes. So again, in the model, I will open the file and find the line with the Cinder version, sorry, here. And now I want Newton. And again, since I am changing the config, I also need to increase the version of the Cinder config file to number two. Done. If we check the diff: see, we are changing the Cinder config version, and we are changing the release. Now let's run the state again, which will generate a new deployment. It will change the config and the container: I don't want to use the Mitaka container now, I want Newton. And I also need to update my config, because the config between Mitaka and Newton is different, and you can see it here: this is the Cinder api-paste file. You can see the differences between the releases, because it is now changing my configuration file to work with Newton. So you just pick; we usually support three or four releases, and you just choose between the components. Now let's look at the deployment: we are no longer pulling the Mitaka container; we will be pulling the Newton container, with the new config. So what I will do now: I will apply the ConfigMaps again to generate the new config, Cinder controller version two, you can see it here, and then I will trigger the deployment, which will do a rolling update of my Cinder. Hold on, I will show you for a moment. See this, one is starting; these are again init containers, and we should have at least one running. This, of course, involves an outage, because you need to run the db sync against the database.
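Pinning a single service to a newer release could look like the following in the model and the resulting deployment. Everything here, the key names, registry, and image tags, is a hypothetical illustration of the change being demonstrated:

```yaml
# Illustrative model diff: pin Cinder to Newton on a Mitaka cloud.
# Key names are hypothetical.
parameters:
  _param:
    openstack_version: mitaka
    cinder_version: newton           # was: mitaka
    cinder_config_version: 2         # bumped to force a rolling update
---
# Hypothetical effect in the generated Deployment: only the image tag
# (and the versioned ConfigMap reference) differ between releases.
spec:
  template:
    spec:
      containers:
        - name: cinder-api
          image: registry.local/openstack/cinder:newton   # was :mitaka
```

Because each release is a separately built container plus a templated config, mixing one Newton service into an otherwise Mitaka cloud is a one-line model change.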
So there is a short outage. But we can try: see, it's working. Now, so that you trust that I really made the change, we can jump onto one of the controllers, like here, and check the version of the container. See, here is Newton. That's how I did the upgrade.

So in these 20 minutes, you saw how easily I bootstrapped a single cluster, which can be a test cluster just to verify that my config works. Then you saw how I made a daily operational change in the config, you saw how I scaled up, and you saw how I did a rolling update. And in the same way I can do the opposite of an upgrade: go back. So that's it. How simple is it? I don't know how much time I have, so if anybody has questions, yes.

Yes, how is it done: you have an init container, or build container, or entry point, where the db sync happens. When it starts, it runs the db sync. If we check, for example, the Nova deployment, see this: these are the init containers, and here I have nova-manage db sync. So when the API starts, it will run the db sync against the database, and if you are changing the version, it will upgrade your database schema. That's how it's done.

Yes, the difficult part: there is still an outage of the APIs, a small outage. And there are two points. It depends on the network plugin you have. If, as in this particular case, you have OpenContrail, OpenContrail version three can support, I think, Kilo, Liberty, and Mitaka. So you can easily upgrade between those three versions of OpenStack without touching the workload, because the network is completely independent. The only issue is that the API is unavailable for, I don't know, it depends, maybe an hour. You cannot provision new workloads, but all your current workloads and applications keep running without any issues.

Yes, the question is whether I am running libvirt in a container as well. Yes, let me show you.
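The db-sync step described in that answer could be sketched like this; image names and the surrounding pod spec are illustrative assumptions, but `nova-manage db sync` is the standard OpenStack schema-migration command:

```yaml
# Sketch of the db-sync init container. Images and registry are
# hypothetical; the nova-manage command is the standard one.
spec:
  initContainers:
    - name: nova-db-sync
      image: registry.local/openstack/nova:newton
      # Upgrades the database schema before the API container starts,
      # which is why a brief API outage is unavoidable here.
      command: ["nova-manage", "db", "sync"]
  containers:
    - name: nova-api
      image: registry.local/openstack/nova:newton
```

Running the migration in an init container ties the schema upgrade to the rollout of the new image, so the new API only starts against a schema it understands.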
I wanted to boot an instance, but I didn't have enough time. Here I have a compute node, and you can see we are running libvirt here: nova-compute and libvirt are in containers, in the same Kubernetes pod, as two containers. And yes, nova-compute uses a TCP connection to libvirt.

Do we support updates across multiple versions and platforms? Yes; the design is that, as I said, everything runs on Alpine, so we can run our OpenStack on Red Hat, Canonical, CoreOS, whatever. The only exception is libvirt: we build libvirt per type of host operating system.

Yes, right now this is Mitaka, because with MCP we want to build on the latest and not support all the old releases; we want to migrate customers to the new stuff. So this supports Mitaka and Newton. Yes, that's the standard OpenStack feature, db sync and upgrading.

How big? What you see here is a preview; we are still testing. We have tested with 100 nodes, but it's not as if I will deploy it to production tomorrow; it's still in the testing procedure. But 100 compute nodes work fine. It depends mostly on Kubernetes scaling and Docker; Docker is sometimes not stable in such cases, so each version must be tested before it's pushed into production.

Rolling back: this is more a question for the OpenStack developers, because of syncing the database back. I'm not sure, really. I can go back within the same release: you can launch a container with some fix, and if it causes a failure on, let's say, the same Mitaka, you can go back to the previous one; that's easy. But downgrading the database schema must be done manually, I think, or I'd have to check. Yes?

It doesn't apply so much to the example with Cinder, but for something that has RPC pinning, like Nova, when you upgrade the Nova service... Yes.
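The compute pod layout described here, nova-compute and libvirt as sibling containers sharing a pod, could be sketched as follows. Image names, the libvirt port, and the exact security settings are assumptions for illustration:

```yaml
# Sketch of the compute pod: two containers, one pod. Images,
# ports, and security settings are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: compute
spec:
  hostNetwork: true                  # compute pods use the host's NICs
  containers:
    - name: libvirt
      image: registry.local/openstack/libvirt:ubuntu   # built per host OS
      securityContext:
        privileged: true             # needs access to /dev/kvm on the host
    - name: nova-compute
      image: registry.local/openstack/nova-compute:mitaka
      # nova-compute reaches libvirt over TCP inside the shared
      # network namespace (e.g. localhost:16509).
```

Containers in one pod share a network namespace, which is what makes the plain TCP connection from nova-compute to libvirt work without any extra service discovery.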
At what point during this workflow do you do the unpinning, for example? You pin it when you upgrade, and then what happens? Yes, you usually have to do that in the config file, so you can change the ConfigMap, but the container is separate: it uses oslo.messaging from Newton, because it's built with that component. We are building containers from source, not from packages: they are built with pip from our downstream branches, so we have downstream branches from which we build the container with the exact version of oslo.messaging. And then, of course, you need to test things like pinning the version in the config file to work with Nova. This was a really simple example; I know that in a real production environment there must be testing. But we had a customer where we ran, for example, Juno Cinder with an Icehouse OpenStack without any issues, without any modifications.

Yes? Good question. The question is how we handle parameters between different releases to keep them supported. I have formulas; let me check. Each formula, see this, supports Juno, Kilo, Liberty, Mitaka. So a customer can actually also choose whether to run something in virtual machines or on physical servers, and the rest in containers. That's the benefit of the architecture, because of course there will be cases where we cannot put everything on Kubernetes tomorrow; we think about that. And if we check, for example, the difference between Liberty and Mitaka: we have parametrized config files with the same values, so you have one for each version, because the most common thing in OpenStack is that they rename a parameter. In one version it was an auth host and URL; now it's a domain name and domain. Those are the changes, but we put the same variables in and generate the config files. So it's difficult to explain in one minute, yeah?
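For context on the RPC pinning being asked about: Nova supports capping RPC message versions via its config file during a rolling upgrade, and that cap is what gets removed ("unpinned") once every service runs the new release. A minimal nova.conf fragment, with an illustrative release value, could look like:

```ini
# Hypothetical nova.conf fragment. While old and new nova-compute
# services coexist, RPC messages are capped at the older version:
[upgrade_levels]
compute = mitaka
# After all services run the new release, this line is removed (or set
# to the new release) and the services are restarted to unpin.
```

In the workflow shown in the talk, that edit would be just another ConfigMap change with a bumped config version, rolled out the same way as the cpu_allocation_ratio change earlier.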
But you can stop by and I will explain in more detail. Yes? The containers you are building, are those open source? How you built them and what you put in the Alpine containers? It's a good question. I'm not sure I'm the right person to answer it; let's discuss it later, yeah? I don't know what the official statement is. But they are built the same way as the MOS packages. We don't have publicly available Docker images right now, because it's not finished yet, but everything we use is open source. All containers are built exactly the same way as the Mirantis MOS 8 and 9 packages, but now we use branches and go directly into containers. The containers are not public right now because the work isn't finished, and it's not as if we want to deploy it everywhere tomorrow. Thanks.

Yes? Yes, there is a migration team working on how we will migrate from previous Fuel versions to the new MCP. There will be a general approach, and then, because a lot of customers have customizations, we will solve the issue case by case. But that's definitely the plan: we want to upgrade customers from Fuel 9 and 8 to MCP, to this model.

I'm not sure I got the point; the question is how we manage what? Yes, understood: how we manage different networks, like a network for the API, a network for the other things, for storage. We have Calico, and Calico is fully L3. On the control plane side we use L3, but on the compute node side we use host networking: we map the same NICs that are on the compute node. So nothing changes; it's the same as today, your containers listen on all interfaces, so you can set it up as you want. The control plane is not host networking; it's L3 Calico with service endpoints, but compute nodes are the same. We are trying to keep it as simple as possible, production-ready and good for operations, not over-engineered.
Yes, the performance is better. For example, you don't have to run three memcaches anymore: you can run a single memcached, because you have a replication controller. I don't know if you remember, but my operational experience is that when I had three memcaches, especially before Fernet tokens, and I powered off one of the nodes, I had to wait five minutes while each service realized that my memcache was down. In this case, you have one memcached, and when it crashes it is immediately restarted somewhere else in the cluster. So it's no longer the case that losing one server slows down your whole deployment, because you can have just a single instance. From this perspective it's much better. We also saw several deployments with virtual machines, where people ran, I don't know, Galera in virtual machines, the OpenStack APIs in virtual machines, to separate services for better operations; and then you have the overhead of the VMs themselves. Containers are much faster, much faster than a normal deployment, and easier to build and manage. Okay, I don't know what the time is; I probably have to go. You can ask me after the session. So thank you very much.
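The single-memcached argument made here rests on Kubernetes rescheduling a failed pod. A minimal, illustrative sketch of that setup (image and labels are assumptions, and the Deployment is abbreviated):

```yaml
# Sketch of the "one memcached is enough" point. Names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached
spec:
  replicas: 1                        # a single instance suffices...
  selector:
    matchLabels:
      app: memcached
  template:
    metadata:
      labels:
        app: memcached
    spec:
      containers:
        - name: memcached
          image: memcached:alpine
          # ...because the controller restarts the pod, on another
          # node if necessary, as soon as the current one dies.
```

The trade-off is a brief cache-miss window during rescheduling instead of the multi-minute client-side timeout dance the speaker describes with three static memcached servers.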