OK, that's pretty bright. Should we start? OK, so let's start. Hello, everyone. Thanks for coming so late. I was expecting even fewer people. It's nice to see you. We'll talk to you about container clusters on OpenStack with OpenStack Magnum. I'm Spiros Trigazis, and this is Feilong Wang, who is also a core reviewer of Magnum. I've been the PTL of Magnum for the past three releases, the past two and the current one. I'm also a computing engineer at CERN and a member of the cloud team, and Feilong is a recent addition to the Magnum core reviewer group. This presentation will first be about Magnum in general. Then we will briefly discuss some issues we had when deploying Magnum in production. And at the end, we'll have a demo each, one from the CERN cloud and one from the Catalyst cloud.

So what is Magnum? Some of you might not know. Magnum is an OpenStack API service. It offers a simple API to users, and with it they can create container clusters with one click. These container clusters are single tenant: all the compute nodes hosting the cluster and running Docker Swarm, Kubernetes, or Mesos are owned by the same tenant, so isolation is ensured either by virtual machines or by different physical hosts. The added value of Magnum is managing credentials for the clusters, so that users can access them remotely through the APIs the container engines offer. Since this is an OpenStack project, a very important aspect of it is the integration with other OpenStack services. Kubernetes has the cloud provider, which I will explain later and which most people in the OpenStack world know about. There are also other components, like RexRay, for example, which manages Cinder volumes directly with Docker, using Docker Swarm or just the Docker daemon. The main focus of Magnum is lifecycle operations on the clusters. We don't interact directly with the COE API; that interaction is done by the end users, who run their applications on the clusters. We just create the clusters, delete them, scale them up and down, and monitor their health. We are also trying to implement upgrades and more advanced features, like having clusters span different availability zones and managing the lifecycle of the parts of a cluster in each availability zone separately.

Before going to the cluster template: a term we have in Magnum is container orchestration engine, COE, which is also the identifier in the OpenStack client. So if you use the common OpenStack client to talk to Magnum, you must specify the COE, and COE stands for container orchestration engine. In Magnum, that means Kubernetes, Docker Swarm, or Mesos/DC/OS. Some terminology of the Magnum service: we have cluster templates and clusters, which are the main objects in the service. The cluster template is basically an entry in the database that describes the different configuration parameters that users can pass to clusters. For example, you specify flavors, you specify the image with the operating system that is going to be deployed on the compute nodes, and options like volume drivers, network drivers, and Docker storage drivers, plus other gated features that we add as labels and that later graduate to normal fields. Then there is the cluster. A cluster is created from a cluster template and inherits all its options, plus you can pass some extra ones that we will describe later.
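To make the template/cluster split concrete, here is a minimal sketch with the OpenStack client; the names, image, keypair, and flavors are illustrative, not values from the talk:

    # Create a reusable cluster template (the database entry described above).
    openstack coe cluster template create kubernetes-template \
      --coe kubernetes \
      --image fedora-atomic-latest \
      --external-network public \
      --keypair mykey \
      --flavor m1.small \
      --master-flavor m1.small \
      --network-driver flannel \
      --docker-volume-size 10

    # Create a cluster from the template; it inherits all template options.
    openstack coe cluster create my-cluster \
      --cluster-template kubernetes-template \
      --master-count 1 \
      --node-count 2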
So the cluster has a number of master nodes and worker nodes, and in the OpenStack cloud this is represented by a Heat stack. All the orchestration is managed by Heat; it's delegated directly from Magnum. The Heat stacks contain load balancers, private networks, virtual machines, and volumes. To integrate with OpenStack, the clusters need to talk to the OpenStack APIs, and to do this without exposing the user's credentials inside the nodes, in case the nodes get hacked, the powers of the user are delegated to a trust user. In any case, users must not put their OpenStack credentials anywhere in the cluster. The trust user is a made-up user in the Magnum domain, and the trust gives that user the capabilities that the normal user has for that project. Then, to talk to the COE API, we have a certificate authority, stored either in Barbican for the more secure solution, or in the Magnum database. Magnum can also accept certificate signing requests, so that after retrieving a certificate, the user can talk to the API of the container orchestration engine securely with TLS.

The main orchestration engines, as I mentioned, are Docker Swarm, Kubernetes, and Mesos/DC/OS. DC/OS is just an API over Mesos. By far the most popular one is Kubernetes. We try to support different operating systems, but the most prevalent is Fedora Atomic, mainly because we use it at CERN: we validate it and use it heavily, so other users and other clouds use the same one. That allows us, as a community, to track issues together and find the same bugs, instead of spreading bugs across distributions. We also have implementations for CoreOS, and for DC/OS and Mesos we use Ubuntu or CentOS. Apart from spawning clusters in VMs, which is the standard way, as for example in Google Container Engine, with Magnum users can also deploy clusters on physical machines with Ironic. Actually, Magnum doesn't know anything about Ironic, because it interacts with the Nova API. So if the cloud offers physical machines through Nova with the Ironic driver, this is transparent to the user, and the only difference is specifying a different flavor, one that corresponds to a physical machine. The last operation mentioned here, in the last bullet, is cluster scaling, which scales clusters up or down.

Now, features of Magnum. I mentioned already that Magnum's added value is managing the credentials. Each COE, for example Docker Swarm, is a combination of many Docker engines, and the Docker engine exposes an HTTP API. We protect this API with TLS, using a self-signed certificate authority. Similarly, Kubernetes has its own API server, which we protect with the same mechanism, and we distribute the certificates across all nodes. Actually, the nodes are the ones that talk to Magnum to retrieve the certificates, instead of us passing sensitive data like the certificate key in user data or in any other way. Other features include multi-master clusters for HA. At the moment we don't have a very clean way to support HA across availability zones, but with recent patches from Catalyst, users can specify the affinity between the nodes, so if the cloud is configured in such a way, they can achieve availability using affinity between the nodes. There is work in progress to support this more explicitly for the user, as in the sketch below.
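A hedged sketch of what that looks like today: Magnum documents an availability_zone label for placing cluster nodes (the label name is from the Magnum user guide; the zone, cluster, and template names are illustrative):

    # Place the cluster's nodes in a specific availability zone via a label.
    openstack coe cluster create ha-cluster \
      --cluster-template kubernetes-template \
      --master-count 3 \
      --node-count 3 \
      --labels availability_zone=az-1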
Recently, we started to simplify cluster creation by allowing the user to pass the flavors and the Docker volume size, which is the block storage volume where container images retrieved on the nodes are stored. Since those fields describe the size of the cluster, we decided to make them fields of the cluster object instead of the cluster template, and as you can see in the bottom right corner, you can create a cluster with one command that specifies the size of the cluster you need (see the sketch after this section).

So what does a cluster look like? In a standard Magnum deployment, the most minimal cluster template and the smallest cluster creation a user can issue will have this result: some worker nodes (it doesn't make sense to have only one, you should have more) and, to start with, one master. All these nodes are in a tenant network, so this is typically backed by an SDN deployment of the cloud, and all nodes will have a floating IP. This was the default when the project started, and the most common way to expose applications in those clusters is to expose the application on the host port directly, but this is not optimal. The full feature set of a cluster looks like this, and then I will move to the optimal options. This again has a public network where all nodes get floating IPs, plus a private network used for inter-node communication, and it also has load balancers: for etcd in the case of Kubernetes, and for the API in all cases, like in Docker Swarm or Mesos, you just have an Octavia or Neutron load balancer in front of it. And for extra storage: in some clouds VMs get a very small disk, while container images can be very big, or many containers can be created with many different images, so we attach a volume to each node to extend the storage available for Docker or other container runtimes to store the images.

This is the most isolated cluster that someone can create. Actually, we're very proud of it, because although it happened almost by accident while designing the smallest possible cluster, it was one of the latest additions to Google Kubernetes Engine: all nodes are only on a private network, completely isolated. For example, someone who wants to do some number crunching inside the cluster doesn't want to expose a service. And in public clouds this is very useful, because it means you pay less, since you don't burn floating IPs if you don't need them. This is the optimal configuration with a single master: the compute nodes, the workers, are isolated in a private network, the master is on the private network too, for communication between workers and masters, and only the master node has a floating IP. Then, when a user runs an application inside the cluster, he or she can create a service in Kubernetes with type LoadBalancer, and OpenStack will create a load balancer with Octavia, give it a floating IP on the public network, and create listeners on the private network. So this is the most isolated option, because from the internet, users or attackers can talk to the cluster only on the port exposed for the service. And this is the last variation, for production: an application should run with at least three masters for the control plane of Kubernetes or Docker Swarm, a load-balanced API in front of the masters, the worker nodes isolated in the private network, and all services exposed with the Kubernetes LoadBalancer service type to the internet.
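A hedged sketch of creating such a production-shaped cluster with the one-command size options mentioned above (flavor names and counts are illustrative; these options moved from the template to cluster create around the Rocky release):

    # Size the cluster at creation time instead of in the template.
    openstack coe cluster create prod-cluster \
      --cluster-template kubernetes-template \
      --master-count 3 \
      --node-count 5 \
      --master-flavor m1.medium \
      --flavor m1.large \
      --docker-volume-size 50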
I will hand it over to Feilong now to talk about some features in Kubernetes and what we are working on. So, up to Rocky, there are some features we would like to highlight. The first one is Calico support. Currently, when a user creates a cluster template, they can specify either Flannel or Calico as the network driver. With Calico, you get network policy support in the Kubernetes cluster. We also support the autoscaler for CoreDNS. Before, we just hardcoded a single CoreDNS pod, but now, when you scale your Kubernetes cluster, the autoscaler scales your CoreDNS pods along with the size of the cluster, to make sure you won't lose service discovery. We now also support role-based access control (RBAC) for your Kubernetes cluster and the Kubernetes dashboard. And we support Heapster with InfluxDB and Grafana, but it's not enabled by default; you have to enable it with a label. CERN also added support for the Traefik ingress controller. Users can also use the Octavia ingress controller; it hasn't been fully integrated in Magnum yet, but the Octavia ingress controller already exists. As for Kubernetes versions: for Queens, Magnum supports v1.9, and for Rocky we support v1.11.x. Actually, we also support v1.12, but it's not the default; we will probably put it into the Stein release. For usage, a user can use the command line to create the cluster and then access the cluster with kubectl. The dashboard is also there out of the box; when you create the cluster, it's enabled by default.

And here is a list of the work we would like to do in Stein; most of it is ongoing and just needs some polish and more review. For rolling upgrades, the patch is generally in good shape, and we are just doing some testing and code review; it should be merged in this release, in Stein. For auto-healing, we are changing the design, but it should be in Stein as well; currently there is no patch, it's just at the design stage. For node groups, we have a patch ready for code review, and it's also in good shape. The next one is the Kubernetes Keystone auth integration. With that feature, a user can use roles already created in Keystone and reuse the same roles for authentication and authorization in the Kubernetes cluster. You can see the same feature in GKE, with typical roles like cluster administrator and cluster developer, something like that. We would also like to support the Prometheus operator and a logging solution; currently, the design uses Fluentd, Elasticsearch, and Kibana. Another piece of work is adding the heat container agent to all worker nodes, so that we can improve the performance of bootstrapping the whole Kubernetes cluster. And we would like to add stricter security group rules for the worker nodes. Currently, we are essentially opening all the ports on the worker nodes; at Catalyst, we have got feedback from our users complaining about that, but it should be changed in the Stein release. We are also working on self-hosted Flannel, which means running Flannel as pods on top of the Kubernetes cluster, deployed as a daemon set.
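As a hedged illustration of how these features are switched on through labels (the label names follow the Magnum user guide; the image and version values are examples, not recommendations):

    # Calico as the network driver, plus labels for the Kubernetes version,
    # the Traefik ingress controller, and the InfluxDB/Grafana dashboard.
    openstack coe cluster template create calico-template \
      --coe kubernetes \
      --image fedora-atomic-latest \
      --external-network public \
      --network-driver calico \
      --labels kube_tag=v1.11.6,ingress_controller=traefik,influx_grafana_dashboard_enabled=true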
Another piece of work we would like to do is to release all the Kubernetes Docker images from CI automatically. Currently, we have to build all the Kubernetes images manually and release them on Docker Hub.

So here are some experiences; I mentioned them in the talk on Tuesday about the journey at Catalyst to deploy Magnum in our cloud. The first one, at least for Kubernetes v1.11: don't use the overlay or overlay2 storage driver combined with a Docker volume size. If you are using both, sometimes you will see that you can't create any pod, any container, on the worker nodes. I don't know the reason yet, but that's the workaround: if you are using overlay or overlay2, just leave the Docker volume size parameter empty. That means the Docker daemon running on the worker node will share the same root disk with the operating system, so you may need a larger flavor for your worker nodes (see the sketch after this paragraph). Another one is the heat container agent multi-region bug. That is an embarrassing bug, I think. We fixed it in different components related to Heat. If you're running Magnum with Heat before Queens, I think, you will run into this bug: you can't deploy a cluster successfully in a multi-region environment, because the heat container agent will try to talk to the wrong Heat in a different region. For example, you are creating a cluster in region A, and the heat container agent may try to talk to the Heat in region B; it gets a 404 error, it can't find the stack, so it just fails. So as long as you have multiple regions, you'll probably run into this problem. And there is another bug in Kubernetes, in v1.11: sometimes the kubelet randomly loses the internal IP and the external IP of the node, for whatever reason. The bug has been fixed in v1.12 and has been cherry-picked to v1.11, and it could be released in v1.11.5, but it's still under review, and it seems the reviewers don't think it's a critical bug, when actually it is. That brings up another topic: in some cases, you probably need to build your own Kubernetes images. As a public cloud, or even a private cloud, you want to avoid doing that, but just in case. So that's the experience we have gained from our customers' feedback and from running OpenStack Magnum in our cloud. Do you want to cover the next part?

So, some of the issues that we identified running Magnum at CERN were mostly coming from the networking part, I would say. The first one: this year we had to reboot the whole cloud two times, once for Spectre/Meltdown and once for L1TF, and that meant rebooting the hypervisors and also the VMs that were running the Kubernetes clusters, which revealed a couple of configuration issues we had with Flannel. By default, Flannel is configured to accept all forwarded requests with iptables, but when we were restarting, the routes were being unset, and this was causing some nodes to lose connectivity for inter-pod communication. So pods on different nodes could not communicate with each other. In the CERN cloud, we don't run the cloud provider, because we have some restrictions from the networking infrastructure and we cannot provide load balancing as a service; we have other means to provide load balancing, outside of OpenStack. And at the same time, we have different storage solutions that we offer to users besides Cinder.
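A hedged sketch of the overlay2 workaround described above (template name and image are illustrative; note the deliberate absence of --docker-volume-size, so Docker uses the node's root disk):

    # With overlay/overlay2, omit the Docker volume size entirely.
    openstack coe cluster template create k8s-overlay2 \
      --coe kubernetes \
      --image fedora-atomic-latest \
      --external-network public \
      --docker-storage-driver overlay2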
But the most critical issue for us was that, with clusters created with the cloud provider enabled, every 10 seconds the controller manager of each Kubernetes cluster would talk to Nova and Neutron to check the status of the VMs, their ports, and their IPs. And if any of these calls failed, it would start doing the request every five seconds. So if you had, for example, a cluster with 100 nodes, and let's say 30 nodes were down, you had all these requests happening every five seconds just to verify whether each node was up. A node might be down because it was overloaded, or even because the user stopped it to change or debug something. And this was creating something like 40% of the load on our OpenStack APIs. That's the reason we decided to disable it. Another issue we had was that we were seeing a lot of clusters being created and a lot of activity in the Magnum database, but we could not identify the actual status of the clusters. That's why, upstream, we are introducing centralized monitoring. Before that, at CERN we had implemented our own solution, which was polling all the APIs to verify the status of all the nodes. If you're an operator, you might want to monitor the actual status, not only whether the VMs are up and running.

The biggest issue we had, when we started to reach some critical mass in the usage of the project, was configuring the Heat service correctly. As I said, the Heat service creates all the VMs, but not only that: it passes all the configuration and applies it on the nodes. Each container cluster node has a heat agent inside that polls the Heat API, asking whether there is something available to apply on the node. That meant we needed to scale the Heat deployment to four VMs of 32 cores each across all our availability zones and bump the RAM of the VMs to 16 gigabytes. But the trickiest part to configure was the database: Heat was connecting to a database instance that was explicitly set up for Heat, configured to accept 1,000 concurrent connections, but under high activity, say around 10 o'clock in the morning when people are in their offices and start doing some work, or even earlier, they would hit the DB very hard, and the heat engine was timing out because it couldn't get a connection from the database. We have posted the configuration instructions on the mailing list and in our documentation (a sketch of the kind of tuning involved follows). Another issue was that some users wanted the latest and greatest features of Kubernetes, so they wanted the latest stable or sometimes beta release, while others were more conservative and wanted to run their services on a more validated configuration. So we invested a lot of development time into being able to select exactly which Kubernetes version to run in the cluster, and we made this configurable by running all Kubernetes components as containers. And to simplify our lives as operators, we started using only the stock images provided by the Fedora and Ubuntu projects, so we don't have to run our own CI to build, and actually release, operating systems. That was our experience from running Magnum for a couple of years now, actually two and a half, and now I will show you a demo to prove that this actually works. Okay, sorry, I just rebooted and lost my setup. I'll do this one more time.
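For orientation, here is a hedged sketch of the kind of Heat tuning described above; the option names are real Heat/oslo.db options, but the values are illustrative, not the exact CERN settings:

    # Illustrative heat.conf tuning for many polling heat agents.
    cat >> /etc/heat/heat.conf <<'EOF'
    [DEFAULT]
    # More engine workers to absorb agent polling and stack operations.
    num_engine_workers = 32

    [database]
    # Allow more concurrent DB connections before the engine times out.
    max_pool_size = 100
    max_overflow = 200
    EOF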
So, earlier this morning I created a cluster with 16 nodes, and now I will create one more, so we have one building as part of the demo, and I can show what works in my existing one. Okay, the cluster is building, and you can see here that it's in progress. As I said before, each cluster is a Heat stack, so you can also see the stacks that are in progress, including my colleagues' clusters. Now, the most important part I mentioned before is that you can retrieve the cluster configuration, which is the TLS certificates for talking to the cluster securely. This is done with a single command. I'm passing --force because I retrieved the certificates before, and with --force it will override them. If I export my kubeconfig, I will be able to talk to the Kubernetes cluster. I can do kubectl version first: you can see here that the server is running 1.12 and the client I have locally on my machine, actually in my VM, is 1.10. If I list the pods, I won't have any pods in the default namespace, but if we move to kube-system, you will see many more, the components we deploy with Magnum. This is the internal DNS of the cluster, so you get DNS name resolution for the services that you create. This is the Heapster component, the dashboard, and all these pods you can see here are the CSI plugins for CephFS, which was demoed yesterday by my colleagues, and also for CVMFS, which is a dedicated storage system we have at CERN. At the bottom are the Grafana and InfluxDB containers. What I can also do is run a pod, and now I have a deployment. We can also check the ingress controller that we have in kube-system. It's this Traefik ingress, but we don't have any pods for it yet, because it has a dedicated node selector, the ingress role. So to start using ingress, we need to label some of our nodes. If I show my nodes, I can label one with this label, and now, if I list my pods in kube-system, we can see that I have a Traefik pod, and if we also list the services, we can see that ingress is exposed with a node port. That means that if we create ingresses for the services we run in the cluster, we can talk to them on those node ports through the Traefik ingress controller. I think I exceeded my time; I should have dedicated more time to the demo. Do you want to do a demo of your cloud now?

So, this is the dashboard of Catalyst Cloud. Currently, because we are using Horizon Pike, and there is a conflict between Horizon Pike and the latest version of Magnum UI, we can't use the Magnum UI Rocky release; we have to use Magnum UI Pike. So it's not really the latest version, but it gives you an idea of how this looks. We have got some feedback from our customers about the dashboard: they don't want too many options when creating a cluster, but they do want to see more information about the template and the cluster. For example, when clicking on a cluster, they would like to see as much information as possible about it, but since we are running an old version, some information is missing from this dashboard. Generally, for creating a cluster in our cloud, as we mentioned in our Tuesday session, in Catalyst Cloud we currently use the images from Docker Hub, and we are based in New Zealand, so when pulling an image, the request has to go out of New Zealand and come back, which takes longer.
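To make the demo reproducible, here is a hedged recap of the commands narrated above (cluster and node names are illustrative; the role=ingress label matches the default node selector of Magnum's Traefik ingress controller):

    # Retrieve the TLS certificates and kubeconfig for the cluster.
    openstack coe cluster config my-cluster --dir . --force
    export KUBECONFIG=$(pwd)/config

    # Inspect the cluster and the components Magnum deployed.
    kubectl version
    kubectl -n kube-system get pods

    # Label a node so the Traefik ingress controller gets scheduled on it,
    # then confirm the pod and the NodePort service appear.
    kubectl label node my-cluster-node-0 role=ingress
    kubectl -n kube-system get pods
    kubectl -n kube-system get services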
Generally, one cluster takes around 15 minutes; in the CERN environment, it takes about 10 minutes, I think. So that's the idea. I won't create another cluster because it takes long; I just wanted to show you the dashboard. I think that's it; we don't have anything else. So thank you for your time, and I think we have some time for questions.

[Audience question] To use what? Contrail? Yes, we are deploying and evaluating Contrail, and we have also started looking at the CNI plugin of Contrail, to have one big unified networking plane between physical machines, VMs, and containers. So we are evaluating it. [Audience question about versions] Most services are on Rocky. Actually, Magnum, for which I'm the PTL, is not on Rocky, but apart from Magnum, Barbican, and Heat, everything else is on Rocky; the rest is on Queens. So it's Queens and Rocky. What was yours? At Catalyst we are using Magnum Rocky and Heat Queens, because of the multi-region fix I mentioned earlier. Okay, so thank you. Thank you.