All right. Welcome, everyone, to the presentation about OpenStack-Helm: managing the lifecycle of OpenStack on top of Kubernetes. My name is Alan Meadows; I'm a cloud architect at AT&T. And my name is Brandon Joseph; I'm a community team lead at AT&T. I'd like to ask a quick question: how many people in the audience are running Kubernetes in some way? And how many people are leveraging Helm in some way? OK. And one more question I'm just curious about: how many people are leveraging OpenStack on top of Kubernetes? All right, cool. We'll see if we can increase that number.

Before we start talking about OpenStack-Helm, it's probably important to briefly touch on what Helm is. Simply put, Helm is a package manager for Kubernetes. It helps you define and install even the most complex Kubernetes applications, and you do so using Helm charts. Charts are collections of files that describe a related set of Kubernetes artifacts. A single chart might be used to deploy something simple, like Memcached or Apache, but it might also be used to deploy something complex, like a full stack with servers, databases, caches, and so on. Helm understands inter-chart dependencies, so it knows when one chart requires another chart to build a full deployment. At its core, one of the things Helm brings to the table for us is that it lets us templatize our experience with Kubernetes: we can introduce overrides as we install these applications that are either operator-specific or environment-specific. It also wraps up each chart installation into an independent release inside Helm, so that the resulting collection of Kubernetes artifacts can be managed as one unit. We can delete it as one unit, we can upgrade it as one unit, and we can install it as one unit.

So that brings us to: what is OpenStack-Helm? Essentially, it's a marriage of Helm, Kubernetes, and OpenStack. The project, which is an OpenStack project, seeks to create Helm charts for each OpenStack service, and those charts provide complete lifecycle management of the services. But beyond just creating the charts and being able to instantiate the services, the goal is full lifecycle management for OpenStack: a framework that lets you deploy, update, maintain, and basically operate a fully functioning OpenStack environment, for both simple and complex deployments. Users of OpenStack-Helm can deploy individual OpenStack components or the full stack, and the project heavily borrows concepts from Stackanetes and other complex Helm application deployments.

To set the stage, we should talk about the history of OpenStack-Helm, because it has a nice history. It was just a short time ago that we started, in Barcelona, as a collection of many companies and ideas. It was actually born from an idea at SAP, whom you probably all know, that there was an easier way to deploy OpenStack on top of Kubernetes. And as many of you are familiar, Stackanetes was born in Austin and made quite a buzz throughout the community. So what we ended up doing was working with the Kolla team to talk about this concept. There were a couple of things on the table. There was Helm.
There was KPM, the Kubernetes package manager that CoreOS was backing. We needed to determine a path, and since Helm was in the Kubernetes incubator, it was apparent that there was a lot of force behind it, so we rowed in behind that. The project also takes some concepts from Intel, who were collaborators on Stackanetes. They had one project in particular that was interesting to Kolla, to OpenStack-Helm, and to the folks at AT&T: the entrypoint project, which handles dependency logic. We'll talk about that a little more later. It was also born out of our communication with the Helm and Kubernetes side. We had to reach across the aisle to the folks in the Kubernetes and Helm communities and ask: what's your vision for this? We're going to put an entire OpenStack deployment on this, and so far we're seeing mostly small apps, web apps and things like that. OpenStack is definitely not that. So we wanted to work with them more closely too, and we now have several contributors.

From that point on, we were dedicated to providing a POC for the community and demonstrating that we could do this, because our leadership was asking: OK, what is this going to look like, and can we really move forward with it? So we did a POC, presented it back, and we were able to prove it out. We ended up continuing with the project, with the architectural vision that we're about to go over here. That's the history.

We're going to do two live demos today. One of them we'll kick off now, briefly watch, and come back to at the very end. We are going to instantiate the Mitaka version of OpenStack across three bare metal hosts. These hosts have already been provisioned with Kubernetes, but that's all that's been done. It'll take about six minutes, and one thing that's probably important to mention is that we're going to do this with what is essentially one command. We have two configuration files, appropriately named mitaka.yaml and newton.yaml. We'll start with mitaka.yaml and feed it into a tool that will instantiate all of the OpenStack-Helm core services and ensure that all the overrides appropriate for this environment are taken into account. At the end we should have a fully functioning OpenStack environment; our scripts will also create a test VM with a floating IP, so it will be a fully functioning cloud at the end of the day.

So let's see if this switches over. OK. These are our three bare metal hosts, CLCP demo 1, 2, and 3. Two of them are control plane hosts, labeled appropriately that way, and one of them is a tenant compute host. In the window on the right-hand side, we're just using kubectl to watch what's happening on the cluster. Now we'll run our script, which is essentially consuming our mitaka.yaml file. We're also using a utility called Weave Scope to give us some visualization of what the containers are doing. And with that, we've fired off the full stack of OpenStack services that OpenStack-Helm supports at this point. We'll let this run in the background and continue with the presentation; we'll come back to it at the end and do the second part of the demo, which is an in-place upgrade of this environment with a live resource in it.
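For context, the configuration file we just fed in is essentially a declarative description of which charts to install, into which namespaces, and with what overrides. A heavily simplified, hypothetical sketch of that kind of manifest might look like this; the structure and names here are illustrative, not the actual file from the demo:

    # Hypothetical sketch only; the real manifest is more involved.
    release_prefix: osh
    charts:
      - name: mariadb
        namespace: openstack
      - name: keystone
        namespace: openstack
        values:
          replicas:
            api: 2          # example of an environment-specific override
      - name: glance
        namespace: openstack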
We're going to go into detail on each of these bullets, but let's briefly summarize what each of them means in terms of the vision for OpenStack-Helm.

The most important thing is being agnostic; in particular, it's a goal to be image agnostic. At AT&T we have a customized OpenStack deployment, and I can't imagine that everybody else is running with no customizations. So whether it's Kolla images, or LOCI, or we need to build our own images with the tools the Kolla community provides, we want to remain image agnostic.

We also want to provide a framework of independent OpenStack service charts. If you're building a platform that just wants to leverage Keystone by itself, that should be supported. If you want to leverage just Keystone, Cinder, and Neutron, that's okay too. You don't need to pull in the full stack; you can pick and choose which components you want to deploy.

If you've been working with OpenStack for a while, which I assume all of us have, you know each OpenStack service follows a set of patterns: we need a user, we need to create a database, we need to init or sync that database. So we want clearly defined patterns that can be repeated in every single OpenStack chart. This is partially what makes the project fun for developers to work with: they can essentially copy these patterns and then concentrate on things like the deployments themselves.

We also want to push control completely to the edge. This means all the deployment logic, the dependencies, the deployment graphs, the brain that controls how things get installed, should live in the charts themselves. When someone introduces a new OpenStack-Helm chart, they have complete control over how it operates.

A really big item for us is flexible configuration overrides. Going back to being able to deploy a completely customized solution with a single command: to do that, you need some way to override things. So we provide flexible configuration overrides across all of the service charts, and we'll go into further detail on that in a moment.

And finally, we want this whole system to work for very small and very large installations. We believe you should be able to install it on your laptop, and it should also be able to support 4,000 compute hosts.

When we talk about immutable containers and image agnosticity, we're really just talking about code buckets: I just need my OpenStack core services and the things to get them running, but I don't want anything in the container that's not necessary, like orchestration logic. Anything extra increases the attack surface of the container and is more to maintain as far as CVEs go, so we want to reduce that. We also want to make our architectural decisions so that, if we're removing these things from the container, we have some way to put the behavior back. If we need to create users and so forth, we want an image-agnostic pattern for doing that too. We do that with Kubernetes jobs, and that's clearly defined in the project. It also gives you a single source of truth.
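To make that concrete, one of those granular jobs, say a database initialization step, might look roughly like the sketch below. The image, command, and names here are placeholders rather than what the project actually ships:

    # Illustrative Kubernetes Job for a database init step; names and image
    # are placeholders, not the project's actual manifests.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: glance-db-init
    spec:
      template:
        spec:
          restartPolicy: OnFailure          # rerun until it succeeds; the work is idempotent
          containers:
            - name: db-init
              image: docker.io/library/mariadb:10.1   # placeholder client image
              command: ["/bin/sh", "-c"]
              args:
                - >
                  mysql -h mariadb -u root -p"$ROOT_PASSWORD"
                  -e "CREATE DATABASE IF NOT EXISTS glance;"
              env:
                - name: ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mariadb-root    # illustrative secret holding the root password
                      key: password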
You're looking at the chart itself, so you can tell exactly what the container is doing: no magic in the container, just code.

To the same point, we want you to be able to use any type of base image you want, whether it's Ubuntu or CentOS. Careful consideration needs to go into creating the jobs and so forth, because there are some differences between those two flavors, and we want to make sure the jobs make sense and will work across any of them. The same goes for troubleshooting a chart: if a particular job has failed, we don't want you to have to go into the container; we want you to use native Kubernetes commands to figure out what's going on. And we want you to be able to turn the knobs in the chart itself, as opposed to going into the container and doing something different. That's a really important thing for us.

One of the early architectural decisions we had to make with OpenStack-Helm was how we wanted to structure the charts. We've already discussed that Helm has the concept of charts depending upon other charts. Leveraging that, one common pattern is to wrap a complex application up into a larger meta, or parent, chart, with all the services deployed automatically as child charts simply by describing them as dependencies of that parent. This provides a simple way to do something like "helm install openstack" and have everything come up, and from a developer perspective it's also an easier way to share information between charts, which is especially relevant for OpenStack. But it comes at a cost. First, it assumes you want to deploy the entire stack; if you just want a few elements out of the larger OpenStack meta chart, you're out of luck. Second, while Helm supports installing different charts into different namespaces, the parent chart concept forces them all into the namespace of the parent chart. And finally, what was most important to us and would be difficult to correct in this model: from a production operator perspective, updating any of the child charts requires going back and updating the parent chart. While most of the time a change to one child chart should result in a change only there, there's a risk that you inadvertently affect something else, and we needed a guaranteed level of impact. In other words, when you update the Glance Helm release, we wanted to be very sure it was not impacting any of the other releases.

The path we chose for OpenStack-Helm was an independent chart approach, which essentially means every chart is self-contained and independent, and its installation results in a single Helm release for each component that can be independently maintained. The immediate disadvantage we realized is that, without the mega chart orchestrating all of this, the operator, or some set of automation, needs to manage the installation of all these charts and all the overrides and other inputs that need to be fed to them. A lot of that is likely the same information shared between charts, and we developed an approach for that which we'll talk about shortly. One of the immediate benefits, though, is that we can control the placement and duplication of each of these services across different namespaces with extreme granularity.
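To picture the difference between the two models: in Helm v2 terms, the parent-chart approach is just a meta chart whose requirements.yaml pulls everything in as children, roughly like the hypothetical sketch below. The chart names, versions, and repository URL are illustrative:

    # requirements.yaml of a hypothetical "openstack" meta chart (the model we
    # decided against); every service becomes a child of one parent release.
    dependencies:
      - name: keystone
        version: 0.1.0
        repository: http://localhost:8879/charts
      - name: glance
        version: 0.1.0
        repository: http://localhost:8879/charts
      - name: nova
        version: 0.1.0
        repository: http://localhost:8879/charts

With the independent-chart approach we chose, each of those charts is instead installed as its own Helm release, one release per chart, so each can land in its own namespace and be upgraded in isolation.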
So this opens up all sorts of hyperscale patterns, where we can have a MariaDB cluster per OpenStack service and each service can have its own RabbitMQ cluster. It also means that platforms looking for just one or two OpenStack components can obviously leverage this approach. And again, most importantly for us from an operator perspective, it means we can guarantee that an update to one particular component goes only to that component.

This gets into reusable patterns, and this is, I think, one of the coolest things. There are a lot of things I like about the project, obviously, but the feedback we get most often when working with developers is that the charts are fun to work with, and part of that is because of the reusable patterns. To start with, Helm itself has a reusable pattern: there's a specific set of files and directory structures that you start from. Beyond that, for the jobs I spoke about earlier, the common things like db-init or db-sync, we have a helm-toolkit, a set of functions that lets you repeat those jobs for each service and then override them in the values file. So if I want a particular user for Glance or a particular user for Horizon, I can use the same jobs and just override them in the values override file, and they're deployed exactly the same way. It's reliable and reusable.

The project was created by developers, obviously, but we care about the developer experience, and this reduces the amount of technical debt a developer takes on when getting started. That said, it is a deployment project, so it's not just for developers; if you have a DevOps mindset, it's very easy to get started and start contributing, for example if there's a particular chart you want included.

We also have a common bootstrap model. We all know that when you instantiate an OpenStack cluster, you need things like flavors and images to get started. Going back to the problems AT&T has experienced as a large operator, we want to be able to inject these custom bootstrap items at initial deployment time, as things are being deployed. So we've created a bootstrap model that lets you define your own custom flavors and so forth right in the values file. That's really nice for operators, because they have to look no further than the values file to get started. There are also clear patterns within the values file itself: resource limits for Kubernetes are in the same place, users are defined in the same place, config is in the same place. So there's a consistent layout, and you don't have to hunt for overrides scattered across different sections; they all look the same.

Another important one, again going back to scale, is how the Kubernetes services get exposed. We know we're going to have hybrid cloud situations; we may have an existing OpenStack service that's hard to roll off of and that we still need to communicate with. So we let you expose services either through an Ingress controller, which many of you who have used Kubernetes will be familiar with, or through a NodePort. There are cases where you may want to use a NodePort and your own load balancers, and that's completely fine. These, again, are things that can be overridden in the values file, making it a lot easier for operators.
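As a rough illustration of both ideas, the bootstrap items and the service exposure choice, a values excerpt might look something like this; the key names paraphrase the pattern rather than the project's exact schema:

    # Illustrative values excerpt; key names are examples, not the exact schema.
    bootstrap:
      enabled: true
      flavors:
        - name: m1.small
          vcpus: 1
          ram: 2048
          disk: 20
    network:
      api:
        ingress:
          public: true        # expose the API through the Ingress controller
        node_port:
          enabled: false      # or enable this to expose a NodePort instead
          port: 30080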
We mentioned a couple of slides ago that an important philosophy for us is control at the edge. The primary motivation is that we want to avoid introducing a centralized controller that has to be updated any time an additional OpenStack service is included or a brand new service is introduced. One example of how this control is defined at the edge is the upgrade philosophy, in other words how rolling updates work in Kubernetes: that behavior is defined within each of the charts themselves. The dependencies a particular chart needs are also all defined in the chart, so it knows how to wait for its external dependencies before the real Kubernetes work launches. And finally, given the distributed nature of OpenStack, the endpoint lookups also need to be under the control of the charts themselves. That allows each chart to point at the unique, individual infrastructure services it needs. We mentioned that hyperscale model before; it's necessary that the charts are in control of which endpoints they point at in terms of message queues, other OpenStack services, and so on. So we have full flexibility in creating any kind of OpenStack installation.

There's a common theme here: we need to provide flexibility and agnosticity to give you, the operators and developers, options. That gets into flexible configuration overrides. Our values can be overridden; we can define resource limits and so on, and there's a nice pattern established for that. But what about the configs themselves? Say we have a highly customized nova.conf, or some specific Cinder backends we're trying to configure. These can get unruly. Well, we've allowed you to override these as well. There's a tool in the tools directory of the project, an Oslo gen-config utility, that generates value overrides for the configs: your .ini files, your policy.json files, the things that typically need to be highly customized, where you need all these knobs to turn to create your own deployment of each individual service. We run through those and map them to values, and you can then override them in the main values manifest. That's extremely powerful. It puts all the power in your hands to do whatever you need to without actually touching the chart, and that's the main concept: we want you to have every knob to turn without touching anything except the values. If you're an operator who just wants to use the project, it's easy to get started this way, and that's a very powerful thing.
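To give a flavor of what that looks like, overriding a couple of nova.conf options through values might look roughly like this; the layout is a paraphrase of the idea rather than the precise schema:

    # Illustrative override; section and option names are just examples.
    conf:
      nova:
        DEFAULT:
          cpu_allocation_ratio: 4.0   # rendered into the [DEFAULT] section of nova.conf
        libvirt:
          virt_type: qemu             # rendered into the [libvirt] section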
Additionally, there's the concept of partials: you can prepend or append if you want to use that. Or if you want to ingest an entire config, no problem. If the values override isn't flexible enough for you and you already have your configs and just want to carry those over, that's fine too, you can do that. We wanted to make this as flexible as possible.

The next item, and I was passionate about this from the very beginning, is working in both small and extremely large environments. We've done this several different ways; we used Minikube for a while, and then we took it on ourselves a bit, but the message is the same: we want the deployment on a developer's laptop to be exactly the same as it is in production, barring things like backends. That's extremely powerful too, because now the developer can turn all the knobs and see everything. We've got a guy whose laptop boots up and the whole stack is just instantly there; that's how much he works with it. It's also powerful if you're a developer starting to work with the project and going to your leadership saying, hey, this is something we really need to consider: you can take POCs and demonstrate them elsewhere, it's easy for your developers, and you're guaranteed that if it worked on your laptop, it's the same thing, just replicated in production at hyperscale.

With all this flexibility, all these independent charts, overrides, and possible customizations, we believed early on that there should be a solution for managing a collection of charts and all the parameters they need to be fed. Helm thus far has primarily positioned itself as a command line tool operating on one chart at a time, and we really wanted a way to declaratively define what we wanted to achieve, source control it, point a tool at that configuration, and just have it drive. So we developed Armada, an open source tool that does this for us. At its core, it's simply a replacement for the Helm client; it still works with the Helm engine, Tiller, which runs as a pod inside Kubernetes. We simply wanted more programmatic control over Helm. A typical Armada YAML configuration file defines several charts, their dependencies, and all the overrides that need to happen for each of the Helm releases. We also expanded the flexibility to support things like reaching out to git to fetch the chart source, and included other options like deleting Kubernetes resources that need to be removed during an upgrade and other workflow-type steps. Eventually we'll expand it to include things like potentially backing up databases prior to upgrades, so that you can really outline the whole procedure and flow for how you want your upgrade or change to take place.

It's worth touching a little on how OpenStack-Helm handles the complex initialization process that OpenStack services require; this is something the demo just went through. Many things need to be managed and done in the right order. We need to create databases for services, grant access to specific users on those databases, and set their passwords. We need to create service users inside Keystone and set their passwords and roles.
We need to create Keystone services and endpoints, and different types of endpoints at that: internal, admin, and public. All of this needs to be done in the right order, and it needs to be done idempotently so that if a step fails it can simply rerun, which is the native Kubernetes way. It also needs to support being run once during installation and then again when we upgrade that environment. We do all of this using Kubernetes jobs. These are very granular jobs that do one specific thing, for example create a database, grant a role on that database, or initialize a new user inside the backend message queue that a particular service is supposed to use.

So what's next? If you saw David Aronchick's talk from Google: we need people. We need people to use it, to develop around it, and to give feedback. We're working closely with all sorts of groups: Intel, SAP, the Kolla folks. We started this as a POC in November of 2016, and here we are with charts for many services, up through Magnum. We're at a 0.5 release, so we're still very young; that said, we're very confident in the project. To give you a sense of timeline: this is an OpenStack project, and we wanted to show that this isn't an AT&T project, it's your project, an OpenStack community project. That said, internally our timeframe for using this is 1710, meaning October 2017 is when we want to be using it fully. That should give you an idea of where we want to be and where we want to take the project, but it says nothing about you bringing yourself to the project and giving us feedback.

Just quickly, a little on how AT&T will use this when it eventually rolls out: we have 80-plus zone deployments, and each of those zones actually includes two OpenStack regions, so we're really talking about 160 OpenStack installations. We're targeting 100-plus zones, which translates to 200-plus OpenStack production installations. So that's a lot of installations potentially receiving OpenStack-Helm. Each of these installations is complex; they run carrier-grade workloads, and they span from small sites all the way to very large sites. Our availability today is fairly high, and we intend to keep it that way even after the transition to OpenStack-Helm.

Do you want to show the demo while I talk about open source? Yeah. Okay, we want to show you the demo because we've got more than just the start of it now. On AT&T and open source: we've been at various stages with open source, and our approach now is that everything goes into open source and is made available to everyone. And not just nominally available; you can actually take it and use it. Armada is a POC to demonstrate to the Helm community the features we would like to see in Helm core. It's not necessarily something we want to maintain forever; it's something we want to use to start the conversation: this is what we think the Helm client could use, this is our vision and our use cases. And they've actually been really responsive. Our goal is to POC it, so that there's something concrete to talk to rather than just talking. Sometimes it's hard to get those concepts across the aisle, but if we POC it, it becomes much easier to see the vision.

If I could type the IP correctly, we could just verify.
So what Alan's doing right now is establishing a simple ICMP ping; some of you might have seen this before. We're also going to do a Keystone token request and leave that going in the background. We've got the Horizon dashboard up and ready, and the deployment was successful. While we've been talking up here, that one command you saw ran through and installed the entire stack, and this is what you see now. We can launch VMs; what Alan's showing is that a VM has actually launched. Going a little further: in the bottom left-hand corner, we've SSHed into the VM that we just provisioned. It has a floating IP address, a public IP; we're connected to it, and inside that VM we're just looking at its network interfaces. We are now proceeding with the upgrade from Mitaka to Newton. In the bottom right-hand corner, we're showing a simplistic example of how we have a path towards keeping the control plane operational while we're doing this upgrade. Obviously we're showing a very happy path, doing something very simple like fetching a Keystone token, and obviously there's a lot more work to be done to ensure we could do something highly complex while this rolling update is occurring. But I think it shows that a lot of the foundations are already in place in this project.

A few of you said you've used Helm. If you have, have you ever had a problem where a job completed but just never got cleaned up? That's kind of a Helm thing. Armada will actually clean up jobs before it reruns them, and that's extremely valuable. It's one of the features we wanted to present to the Helm developers as a POC, again, rather than just talking about it, because it shows the use case for why we would actually want this.

I think this also demonstrates what's occurring under the hood in some ways. As we're doing the rolling update, whose behavior is itself defined in the charts in terms of how it should proceed, we also see components in an init state. They're doing dependency checking to make sure that the things they need are actually there. In other words, if a service requires Keystone, or requires a certain database synchronization job to have run to bring the schema up to the version that service needs, it will sit there and wait until that eventually completes; we'll sketch what those declarations look like in a moment.

You might have seen other demos like this; we've been running variations of it. As this upgrade is going on, we're using a Ceph backend for Kubernetes, and we actually show a video playing from a tenant workload, and the video doesn't drop as we're doing the upgrade. That's extremely useful as well; that's what we showed our leadership, and our leadership was happy, which is what allows us to continue working on the project. One other thing to point out about this environment, and what comes out of the box in OpenStack-Helm, is that everything runs as a container. MariaDB is clustered and running as a container. Ceph, providing PVC backends for Kubernetes, runs as a container. There is nothing in this environment that's not containerized. And the nice thing about that is it just runs, right?
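Here's that rough sketch. Each chart carries dependency declarations along these lines in its values file; this is a paraphrase of the pattern rather than the exact keys the project uses:

    # Illustrative paraphrase of the per-chart dependency pattern, not the exact schema.
    dependencies:
      api:
        jobs:
          - glance-db-sync     # wait for these jobs to complete
          - glance-ks-user
        services:
          - mariadb            # wait until these services are reachable
          - keystone-api
    # An init container reads these declarations and holds the main container
    # back until every listed job has finished and every listed service resolves.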
Going forward, we want to improve our development process and our gates to make sure we're always working off of master. We've learned a lot about managing cloud at scale, some things to do and some things not to do, and we're bringing that to the community and to the project, trying to work with other companies and individuals who want to contribute and make this better. So our call to action is, again: try the project, give us feedback. Also, the story doesn't end here; it goes into Helm itself. What we're expecting from the community around OpenStack-Helm is to reach across the aisle to the Kubernetes group quite a bit. This is essentially OpenStack on Kubernetes, so we want to engage with that community, and they're going to give us a lot of advice. We feel this is a perfect marriage, a perfect demonstration of the two communities collaborating for the common good, and the common good is infrastructure provided in a resilient fashion.

And with that, we have completed the in-place upgrade. We're now logged into the Newton version of Horizon, with its beautiful blue left-hand navigation, and there is our same instance still running; we've stayed connected to it down below. So we've completed an in-place upgrade, and we're now running Newton. Good.