Okay, good morning everyone, let me first introduce us. My name is Sergey Lukyanov, I work at Mirantis. I'm a senior development manager and the head of the fuel-ccp project at Mirantis. With me today are Piotr Prokop from Intel, who is a cloud software engineer, and Quentin from CoreOS, who is a software engineer as well. The main topic for us today is to talk a bit about OpenStack on Kubernetes, what we learned while running it at scale, and the different options.

So we have the following plan for the day. First of all, we'll talk about how we take benefits from Kubernetes to run OpenStack, and it's very interesting because Kubernetes provides a lot of benefits that we can just reuse instead of developing our own stuff. The next thing is, obviously, what's missing in Kubernetes and what we need to cover ourselves to run OpenStack, and that's mostly about the stateful services. Then we'd like to talk a bit about how it works at scale, and we have a short lightning demo at the very end if we have time for it. I think we will.

So let's proceed. Today we're going to talk about two projects that were started in parallel: fuel-ccp and Stackanetes. Both of these projects are targeted at running OpenStack on top of Kubernetes, and the main goal is to make it a Kubernetes-native application: not to run external orchestration or some black magic outside of Kubernetes, but really to make a Kubernetes application out of it. On the next slide you can see the commands to deploy OpenStack. Both projects use a metadata-driven approach where you specify the topology, configs, and so on in config files, which are then used in an infrastructure-as-code manner. All operations are done based on the metadata in both projects. That's probably why we are on stage together today, sharing information about these two projects: they are very similar but still have lots of interesting differences.

Okay, so for those who don't know what Kubernetes is: what I can say for sure is that it's a container management platform with a very large set of pluggable pieces, such as different networking backends and different container engines like Docker and rkt. It supports different objects and types for running your workloads in containers. In general, you can think of it as something like the OpenStack of the container world. Kubernetes is an open source project started by Google and still actively driven by Google, with lots of companies joining it. It's showing a very good growth curve, more or less the same story OpenStack was showing probably four years ago.

So I'm passing the ball to Quentin. He will talk about how Kubernetes actually makes our life easier.

Hi. If there is one thing that I learned since I started deploying OpenStack on Kubernetes, it is that Kubernetes really makes our life easier. Through a little animation, I will show you and describe some of the fundamental aspects of a Kubernetes infrastructure. First of all, we will deploy a Keystone Deployment, a declared Keystone Deployment here in this example, which is roughly 100 lines of YAML, so it's pretty simple. As we defined a desired count of two, Kubernetes will automatically start and make sure that two Keystone instances are running across our cluster at any given time.
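To make this concrete, here is a minimal sketch of what such a declared Keystone Deployment could look like; the image name, labels, and port are illustrative assumptions rather than the actual manifests shipped by fuel-ccp or Stackanetes, whose real files are closer to the 100 lines mentioned above.

```yaml
# Hypothetical, trimmed-down Keystone Deployment (Kubernetes 1.4-era API).
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: keystone
spec:
  replicas: 2                     # desired count: two Keystone pods at all times
  template:
    metadata:
      labels:
        application: keystone     # labels matched later by the Service selector
        version: "1"
    spec:
      containers:
      - name: keystone
        image: registry.example.com/keystone:newton   # placeholder image
        ports:
        - containerPort: 5000                         # Keystone public API
```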
So now let's consider that we have a Cinder container running somewhere. How can this Cinder container talk to Keystone? Wherever we have a Kubernetes cluster, we also have a network overlay. This can be powered by Canal, Flannel, Calico, Weave Net, or even more advanced stuff like Open vSwitch. So now technically we understand that Cinder can talk to Keystone, because everyone has an IP address. Well, but that's not great, because we do not want to deal with these IP addresses, since they may be dynamic and so on.

So instead we should write 10 or 15 more lines of YAML to get ourselves a Keystone Service, which is another Kubernetes object. It has a service IP, and it will load-balance the traffic that is directed to that service IP to any of the containers that carry the same labels. Here we have application equals keystone and version equals one. So that's pretty cool: Cinder could communicate using this virtual IP, but we don't really want to deal with any of these IP addresses, which again can be dynamically assigned. Furthermore, Kubernetes will periodically probe any of the running containers to make sure they're healthy, and if they are not healthy, they won't receive any traffic; they will be killed automatically by Kubernetes and rescheduled automatically. The health checks can be done by either a simple HTTP probe or more advanced scripts, for testing MariaDB and such, for instance.

So, well, how can Cinder actually communicate with that virtual IP? Because it's a virtual IP that is not routable: it's in the 10.3 range, whereas the network is in the 10.2 range. That's where we have kube-proxy, which is a Kubernetes infrastructure container running everywhere. It fetches basically every service IP and the list of healthy containers for each service, and then it creates iptables rules with probabilities on each node. Therefore, when Cinder tries to communicate with 10.3, the packets are going to be rewritten to talk to one of the healthy Keystone containers in the 10.2 range.

So like I said, it's pretty cool, right? But we do not want to deal with any IP addresses at all. So instead, we're going to consider another Kubernetes infrastructure container, which is called kube-dns, and this thing simply publishes the service name and the service IP of every service that is declared. Now that's pretty cool, because Cinder can talk to Keystone by just querying "keystone". It's the same thing with any of the OpenStack services, of course.

We could go a bit further and say that this internal load balancing should also be external load balancing. To do this, we declare an Ingress resource, which is again roughly 10 to 15 lines, with a host name, a service name, a path, and optionally TLS certificates. This works with the help of a third Kubernetes application called an ingress controller, which is essentially a black box. It can run HAProxy, Nginx, Traefik, et cetera internally, but you don't really need to know what it is. You just have to know that it's going to use all the Ingress resources to expose your service externally, on an external network, a public network, whatever you have, on a specific port. Therefore, you will be able to query Keystone from anywhere.

So far we declared three objects: a Deployment, a Service, and an Ingress.
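As a rough illustration of the Service and Ingress objects just described, here is a hedged sketch; the host name, port, and label values are assumptions for the example, not the projects' real definitions.

```yaml
# Hypothetical Service: load-balances traffic sent to its service IP
# across all healthy pods labeled application=keystone, version=1.
apiVersion: v1
kind: Service
metadata:
  name: keystone        # kube-dns publishes this name, so clients can just query "keystone"
spec:
  selector:
    application: keystone
    version: "1"
  ports:
  - port: 5000          # virtual port on the service IP
---
# Hypothetical Ingress: asks the ingress controller (HAProxy, Nginx,
# Traefik, ...) to expose the Service on an external host name.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: keystone
spec:
  rules:
  - host: keystone.example.com    # placeholder public host
    http:
      paths:
      - path: /
        backend:
          serviceName: keystone
          servicePort: 5000
```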
All of that is roughly 200 lines of YAML, so it's pretty short, and we've got a scalable service with self-healing capabilities, internal and external load balancing, and also service discovery. That's very little effort.

But we will go a bit further. Kubernetes is able to perform updates, and basically there are two ways to do it. The more standard way, I guess, that everyone would know, is a traffic-shifting method, where we deploy another Keystone deployment. Here I said v2, but it can be just a configuration change. Kubernetes will again create containers for it automatically, and then we simply change the label selector on the service to say: well, now you have to serve the v2 pods instead of the v1 ones. But more interestingly, you can do rolling updates. So we again deploy a Keystone v2, but this time Kubernetes will automatically reduce the number of v1 containers and start v2 containers instead, to eventually have only our new containers running, therefore being entirely on the new stack.

So I've presented some of the most basic bricks of Kubernetes. Kubernetes has way more features than that, way more advanced features than we can describe here in such a short amount of time. But recently, over the summer, we got some nice features as well, such as PetSets, which are very useful for clustered stateful applications such as Galera. We've got node affinity and anti-affinity, which goes in the same direction. We have init containers as well, for all the initialization tasks that we might have to do, ScheduledJobs for all the maintenance that we may have to do, dynamic persistent volume provisioning, which is quite cool, and rkt support. rkt is an alternative container engine that has been built from the ground up for composability, security, and building on standards. You can start using it on Kubernetes with a single flag, so it's very easy. It has various advantages, one of them being the fact that there is no daemon that runs the containers, so there is no single point of failure there, and you can live-update the engine. It also has some very advanced security features, including the fact that you can run some containers with a simple chroot isolation, but you might also run containers with a more standard cgroups-based isolation, or even go all the way with full virtualization.

I will now hand over to Piotr, who will talk about dependency management and related things.

Yeah, so we talked a lot about Kubernetes, but OpenStack itself, the installation of it, is not a trivial task. One has to follow certain rules and a certain sequence of deploying services, and run some kinds of batch jobs, such as registering endpoints, creating databases, et cetera. On Kubernetes it looks even more challenging, because you have dynamic host names and IPs, and Kubernetes was not designed for stateful applications and has no native support for inter-pod dependencies. So we tried to make OpenStack services more self-aware of their dependencies, and that's why we developed applications for this, which we divided into two groups. The first is container-level dependency management. There is the fuel-ccp entrypoint, which we keep as the Docker entrypoint, or rather the container entrypoint: it works on rkt too. The state of the deployment is kept in etcd, and each container, before launching the actual service, checks its dependencies via a key-value store such as etcd.
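As a hedged sketch of this first, etcd-based approach, a container spec might look roughly like the following; the wrapper name, flags, and etcd address are invented for illustration and are not fuel-ccp's actual interface.

```yaml
# Illustrative only: the real fuel-ccp entrypoint and its configuration differ.
spec:
  containers:
  - name: nova-api
    image: registry.example.com/nova-api:newton      # placeholder image
    # Hypothetical wrapper: blocks until the listed dependencies are
    # marked ready in etcd, then execs the real service.
    command: ["dependency-entrypoint",
              "--etcd", "http://etcd.ccp.svc.cluster.local:2379",
              "--wait-for", "mariadb,keystone",
              "--", "nova-api"]
```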
We also have a second approach, kubernetes-entrypoint. There we are not using any key-value store; instead the application queries the Kubernetes API directly from inside the container and checks its dependencies through Kubernetes. And there is another approach, at the cluster level: dependency management via another orchestration layer. An example is the Kubernetes AppController: the user specifies a deployment graph, and the AppController follows it while deploying the application.

We also had problems with stateful applications on top of Kubernetes. One of them is clustering MySQL with Galera. To do it, we create a Kubernetes Job as the primary component of the Galera cluster, and then use the peer-finder script to find the other members of the Galera cluster. When the Kubernetes Job, which is running the primary component, sees that the cluster is formed, it kills itself, so that under no circumstances will the cluster be bootstrapped a second time. And we use local storage for performance.

Another big thing is RabbitMQ. We used the RabbitMQ autocluster plugin with the etcd backend. RabbitMQ keeps its state in etcd with a TTL, and after a split brain, the RabbitMQ node with the highest TTL becomes the master, to easily recover from the split brain. And thanks to new features in RabbitMQ, you can use IP addresses instead of host names in the configuration files, which is cool.

Another thing is the Nova Kubernetes drain. We try to make standard Kubernetes operations aware of the OpenStack deployment without treating it specially. For example, when someone runs the kubectl drain command, which basically puts a Kubernetes node into maintenance mode, we catch this event through the Kubernetes event stream and trigger auto-evacuation: we disable the node in OpenStack and then live-migrate all of the VMs off the node. Yeah, I will pass to Sergey.

Actually, I want to continue the idea of this slide. The overall idea behind the approach fuel-ccp has taken, I think, is that it was very important to separate the layers: bare metal, then Kubernetes provisioning, then OpenStack provisioning, and to make each of them self-contained and self-sufficient without mixing them with each other. That's why we were making a Kubernetes-native application out of OpenStack, not just running containers on Kubernetes. That's possible as well: you can run containers on Kubernetes and orchestrate them with Ansible, but then it's almost the same as what you would be doing on plain Docker, and it's not using the benefits of Kubernetes. And this concrete feature about draining, evacuating the OpenStack workload upon a Kubernetes operation, is actually a very essential part of this, because the operator of Kubernetes doesn't need to know about the workload that is running on OpenStack. You can just run the usual operations on the Kubernetes side, and it will automatically deal with the workload. The same approach, I think, is available and needed for any workload running on Kubernetes, and the separation of layers gives you the ability to not worry about the upper layers; you only need to know the layer underneath you.

Okay, so about scale: this is specifically about fuel-ccp. That's what we were able to run so far, and it's probably not a very, very huge scale: we were running on up to 350 machines, with something like 340 computes out of them.
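Going back to the Galera pattern Piotr described, a hedged sketch of that bootstrap Job might look like this; the image and command are placeholders, and in the described pattern the container wraps mysqld, watches the cluster state, and exits once the other members have joined, so the Job completes and the primary component is never re-run.

```yaml
# Hedged sketch of the Galera bootstrap Job; not the real fuel-ccp manifest.
apiVersion: batch/v1
kind: Job
metadata:
  name: galera-bootstrap
spec:
  template:
    metadata:
      name: galera-bootstrap
    spec:
      restartPolicy: OnFailure      # retried until the cluster forms once
      containers:
      - name: galera-primary
        image: registry.example.com/mariadb-galera:10.1   # placeholder
        command: ["galera-bootstrap.sh"]   # hypothetical wrapper around
                                           # mysqld --wsrep-new-cluster
```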
So, about the scale results: all the test plans, the scenarios used, and the results are published under the Scaling Performance Working Group, so you can find fancy charts and more detailed data there. I just want to go through the highlights and some issues that we faced when running OpenStack on Kubernetes at at least some small scale.

On the highlights side, we were running various combinations: boot-and-list and boot-and-create scenarios for Nova virtual machines, with and without network interfaces. For Keystone, we were running the authentication scenario to test the requests per second, because we know from our experience with MOS that it's one of the main things that will block everything at some significant scale. We intentionally did not do any tuning of the services; we just wanted to see what the initial results would be, and compared to the usual deployments it wasn't bad. It was something like 10 to 20% worse than the usual deployments, and one more time, I want to explicitly repeat that this is without any tuning done. As we enabled different tuning for the OpenStack components and Kubernetes itself, we were seeing a significant increase in performance, and we're still not done, so we will probably see even better performance than the usual bare-metal deployments, due to the ability to distribute services across the nodes. With fuel-ccp and Stackanetes you can very easily distribute your services: you don't need to run the whole control plane on a single node. I honestly don't know how it's done in Stackanetes, but in fuel-ccp you define your roles on the go, and you can say, okay, we will run Keystone on these three machines and on one of the computes. For example, at scale you need to run additional Nova schedulers and especially Nova conductors. For the scale testing we even tried a scenario where we distributed all of the control plane services across all of the computes, just one control plane service per compute. It's not something for production, but it's very interesting that it's so easily possible through such a framework.

Another thing is that fuel-ccp builds the containers from sources. You can build from just a source folder on your local computer and update the cloud from it, so you can tune both the configuration and the code of an OpenStack component on the go. I was personally trying to tune Keystone at scale, and it took something like two or three minutes between attempts, between the changes. That's much, much faster than what we were using before: it's only the time to rebuild a single container and push it to Kubernetes to run the rolling upgrade that Quentin was covering.

Now a bit about the issues that we faced. I want to say that none of them were real blockers; some of them are fixed already, and some have good workarounds. I will probably start from the end. When we started thinking, half a year ago, about testing fuel-ccp at scale, we faced the first issue: there was nothing on the market to install Kubernetes at scale. We evaluated different solutions and ended up with Kargo, and we wrote a small wrapper around it named fuel-ccp-installer, to install Kubernetes with just the stuff needed to run fuel-ccp.
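As a flavor of what driving such an installer looks like, here is a hedged sketch of Kargo-style deployment variables; the variable names and values are approximations for illustration and may differ from the real fuel-ccp-installer defaults.

```yaml
# Illustrative Kargo-style inventory variables (group_vars); approximate only.
kube_version: v1.4.0
kube_network_plugin: calico      # Calico or Flannel, as mentioned below
etcd_deployment_type: docker     # run etcd in containers or from binaries
cluster_name: ccp.example.local  # placeholder cluster domain
```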
In addition, we were contributing a lot to Kargo, and the latest release, 2.0, contains lots of features to run Kubernetes in different ways: in containers or from binaries, with Calico or Flannel. So right now it's a very capable installer for Kubernetes that supports different operating systems. We tested it at this 400-node scale, virtually pre-tested it at 1,000 nodes, and did some optimizations for running at 1,000 nodes. Our next step will be to try all of this at 1,000 nodes, and right now we're thinking about first trying it on emulated environments, probably with computes in virtual machines, and then switching to bare metal to get more realistic results, as our experience shows that the results are very, very different when you're emulating scale versus running at real scale.

The next issue, obviously, was how to distribute all these Docker images when you're building them on a single node and pushing to a registry, and all 400 nodes don't have these images. For the running services it's, I think, four or six containers on a compute node. They share some common parts, but for example the Neutron agent and OVS containers share only the first layer, the guest operating system, so they have a pretty huge diff, and the Docker registry itself showed that it's not enough to just run a single Docker registry. We did some optimizations of the container image size, and another thing is that we started deploying the registry inside Kubernetes and exposing it to all hosts through a service, and that actually improved a lot how the load is distributed and how it works in general. We also mount a host directory into the registry to increase its performance. At 500 nodes it works well; at larger scale we believe it will stop working, and we are looking at two options. The first is to scale the Docker registry itself: run a few instances, probably try other backends like Artifactory or something else. The other approach we're looking at is torrent-based distribution of Docker images, using the import/export or save/load features of Docker, to just distribute the image binaries across all nodes and load them locally into Docker.

The next issue we faced, starting from testing at probably 50 nodes, was the slowness of kube-dns and its very low reliability. It was failing, returning stale data, and really not working even at 50 nodes under load. But I think that was Kubernetes 1.2, half a year ago; with the new versions it has improved dramatically and now works like a thousand times better. In addition to these improvements done upstream, some of them by Mirantis people working on the Kubernetes team, we run dnsmasq on each node to cache DNS requests to kube-dns locally, and it's configured to be used by the Docker containers that we're running, so it's an additional layer decreasing the load on kube-dns.

And the last thing that we have been facing the whole time is the Docker daemon freezing. At scale, after some containers restart and some, I don't know, bad moon phase, the Docker daemon can freeze, and it just stops answering any API calls. Lots of improvements were done here in the last minor releases of Docker, but it still sometimes happens.
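Going back to the DNS caching layer for a second: a hedged sketch of such a per-node dnsmasq cache as a DaemonSet could look like the following; the image, kube-dns service IP, and cache size are illustrative assumptions, not the exact setup used at Mirantis.

```yaml
# Illustrative per-node DNS cache in front of kube-dns.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: dnsmasq-cache
spec:
  template:
    metadata:
      labels:
        app: dnsmasq-cache
    spec:
      hostNetwork: true            # serve DNS on each node's own address
      containers:
      - name: dnsmasq
        image: registry.example.com/dnsmasq:latest   # placeholder image
        args:
        - --cache-size=1000
        - --server=/cluster.local/10.233.0.3   # forward cluster names to kube-dns (assumed IP)
```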
So we're actually using the Docker daemon freeze right now as a marker of how well Kubernetes survives it. Right now it means that usually only one node will have a frozen Docker, so it's not affecting workloads; it's like a usual hardware outage or anything else. We're working on this issue and trying to find the root cause, and so far we've found that there are different reasons for it happening. It's just interesting that the same outcome, the daemon freezing, is what people see, and this bug, if you search for it on the Docker bug tracker, was fixed, I don't know, probably a dozen times, and it still happens sometimes.

Okay, I think that's all from my side about the scale testing, and I want to switch to the short demo recordings. It's not really a live demo; it's recordings highlighting how it works. The first part just shows what the deployment looks like, and then some simple operations: scaling up, changing configs, and changing images, which is obviously what rolling upgrades look like. We have a recording for the drain feature as well.

So let me switch. For both fuel-ccp and Stackanetes, we use a separate namespace to deploy OpenStack, and for this recording I was using a 30-node lab, with the control plane on the first five nodes, a bit distributed between them, and all the other nodes used for computes. Here I'm showing that there is nothing deployed yet. For both tools there is a command line interface that you work with, and you need to define the metadata in a config file. For fuel-ccp it's just a simple YAML file that has information about where to find the Docker registries, tags for images, the namespace for Kubernetes, some configuration like which interface to use for Neutron, and how many replicas of each service to run. The replica count is decoupled from the topology, because sometimes you want to run, say, two Keystones on the same node just to increase performance, since a single instance doesn't always scale well. A sketch of such a config file follows below.

After that you can just run the deploy commands: I think for Stackanetes it's kpm deploy, and for fuel-ccp it's ccp deploy. After running it, the tool takes the DSLs and this config file, compiles them into native Kubernetes objects, and just pushes them to Kubernetes itself, and at that point the CLI's work is done; it's the same for Stackanetes. The CLI is technically just a compiler from a DSL that describes how services should run on Kubernetes into a native Kubernetes application, together with all the configuration and everything else needed to run.

I think both projects use Kubernetes Jobs for the bootstrapping, and here you can see these one-time operations: creating databases, synchronizing them, creating the users and endpoints. All of them are described as Jobs in Kubernetes, running natively on Kubernetes, and Kubernetes will ensure that these jobs succeed. If they fail, it will restart them on other nodes, so we don't need to implement our own retries for creating endpoints or populating the database; Kubernetes does it for us. We use Deployments to run the usual services and DaemonSets to run the services that we need on each node. I will probably speed up, because we're almost out of time.
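Here is a hedged sketch of the kind of metadata file just described for fuel-ccp; the key names are approximations for illustration, not the project's exact schema.

```yaml
# Approximate fuel-ccp style configuration; key names are illustrative.
registry:
  address: "127.0.0.1:31500"     # where built images are pushed and pulled from
images:
  tag: newton                    # image tag to deploy
kubernetes:
  namespace: ccp                 # separate namespace for OpenStack
configs:
  private_interface: eth1        # interface Neutron binds to
replicas:
  keystone: 1                    # per-service replica count, decoupled from topology
```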
So here is a view of the Jobs that are running as one-time operations. For both projects, we just need to wait for all jobs to complete to have OpenStack up and running. In this case you see that all desired statuses are achieved, so we can try to access OpenStack: here I'm using the OpenStack CLI to run some commands to verify that OpenStack works. In general, in both projects we have readiness probes configured in Kubernetes for each service, so they expose the real health of the application behind each component to Kubernetes. When you get the list of pods and see the readiness of the pods, it means it's not just the container running, but something inside the container is really working.

The next thing that I want to show in this video is how to do some simple operations. Let's scale up Keystone. By default we have one replica of everything, and it's currently running from a Newton tag. We can override the specific image spec for Keystone, for example, to run it from an Ocata Docker image, run three instances, and enable debug in Keystone. So we're doing three operations at once: we scale up Keystone, change its setting in the Keystone oslo.config file, and switch the image from the old one to the new one. Here I'm switching to the node that's running Keystone to show that it's running the old version, and after that I'll run the command to apply. And here I'm showing that there are no debug logs, only info logs, in the Keystone log, so a real upgrade will happen. One more time: we're changing the tag, changing the value for debug, and changing the replicas. After that, for both Stackanetes and fuel-ccp, we just need to re-run the deploy command, and it will regenerate all the needed state and push it to Kubernetes. Then Kubernetes natively resolves what needs to be restarted, and here Kubernetes runs the rolling upgrade for Keystone. As you can see, there is one container with the old version running for nine minutes and three new ones running for 20 seconds, and when they start showing as healthy in Kubernetes, it will just kill the old one. After that we can check that Keystone still works by running the project list command, and then I show that the logs start showing debug: I've just run kubectl logs for one of the running containers, and it now has lots of debug lines.

And that's about everything that I wanted to show on that side. I think we have only five minutes till the end, so I will probably not show the drain stuff. It just works, very simply: by running kubectl drain on a node, the workload is automatically migrated from the node using the live migration feature of OpenStack. Another thing that we need to show is the legal notices and disclaimer. Hello, Intel. And that's the last slide we have, with some useful links and our names one more time. So thanks for your attention. Any questions?

[Audience question] So we were running it on our private labs. Do you mean a hosted Quay? On our private labs there was no access to the internet, so a hosted one wouldn't work. We tried Artifactory, but Artifactory showed not very good performance for Docker specifically.
It wasn't supporting diffs between the common layers. That's fixed now, but half a year ago it wasn't supported.

[Audience question] So fuel-ccp currently requires Kubernetes 1.4; it depends on some features introduced in 1.4. I think it's the same for Stackanetes. For such projects it's very important to follow the latest release and leverage the latest features. Yep.

[Audience question] In fuel-ccp we're not using PetSets right now. We almost have support for them, and we are thinking about running memcached as a PetSet, to natively use the direct names of each of the instances: a PetSet provides the ability to have stable names for all instances, so you can just use these names in the OpenStack services directly, without a load balancer. Right, Stackanetes doesn't use PetSets just yet either; we are using some different ways, as Piotr described, with scripts and such, but we do plan to integrate PetSets in the coming weeks, as they just got released. Yeah, and actually, in 1.5 the updates for PetSets will most probably be merged; without those updates it's not very interesting to use them, as they limit the functionality around upgrades. The same actually goes for DaemonSets: if the updates for DaemonSets are not merged in 1.5, we'll just migrate everything to Deployments and use only them in fuel-ccp.

[Audience question] In fuel-ccp we're using a third-party installation of Ceph. If you want to have a Cinder volume with a Ceph backend, or Glance with a Ceph backend, or the RADOS Gateway, we run all of that on Kubernetes, but you need to provide access to an externally deployed Ceph. With Stackanetes it's essentially the same: you can use Ceph or not, you decide with a single parameter, and then you may deploy Ceph as part of Kubernetes or not, depending on your existing infrastructure. Any other questions?

[Audience question] Right now in Stackanetes we only support Ceph. Yeah, same for fuel-ccp; we just know Ceph better. You can always make a pull request for it, and we'll happily update it; we don't know of any technical reason not to do it.

[Audience question] As I was saying at the beginning of the talk, both projects have the goal of separating the layers, so we just don't care about what happens at the bare metal level. We just expect that there is a Kubernetes available, with some list of prerequisites installed on the host operating system needed to run OpenStack. How the layer below is managed is handled separately. Right, so we do assume that you already have a Kubernetes cluster running, with all the nodes available and whatnot. That was for the Stackanetes and fuel-ccp part of it. But of course, Kubernetes does have discovery for bare metal nodes, et cetera. At CoreOS we mainly use iPXE to bring up nodes, and they automatically register to the Kubernetes cluster.

Cool, awesome. Thank you.