So, hi everyone. I'm Spiros. I'm a software engineer at CERN, and I'm a core developer in OpenStack Magnum. We'll talk about how we use OpenStack Magnum at CERN, the container use cases we have, and some scalability tests we did with our service and with Kubernetes.

So, as I said, I'm a core developer in Magnum; we are the OpenStack Containers team. We offer an API service that provides Kubernetes, Docker Swarm, Mesos, and, experimentally, DC/OS as a service. With two clicks you can have a cluster running, and you can talk directly to the API of whichever one you chose: Swarm, Kubernetes, or Mesos.

So, what Magnum does is orchestrate compute instances, which can be either VMs or bare metal. It creates networks, like tenant networks or public networks, and load balancers. It also configures storage, for container storage or for persistent storage. We also deploy the certificates you need to have a secure service, like TLS credentials for etcd and TLS for the Kubernetes API server. And, of course, you get the container-native API: if you use Docker, you do docker run or docker ps or whatever; if you use Kubernetes, you use kubectl; with Mesos, you use Marathon; and DC/OS has its own UI and its own API.

Magnum mostly focuses on lifecycle operations. Currently available are create, delete, and scaling the cluster up and down, and we have more in progress.

So, this is the architecture of the service. On the right side of the screen, we have the Magnum user who creates a cluster, which has a specific cluster driver. If you are an operator, you can customize your driver and modify how Kubernetes or Docker is deployed. The orchestration service, which is Heat in OpenStack, creates the cluster. Then we pass our scripts with cloud-init inside the nodes, master nodes and worker nodes, and we deploy the services.
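To make the "container-native API" point concrete, here is a hedged sketch of the standard client commands you would run against a Magnum-provisioned cluster; the endpoint address is purely illustrative, and in practice Magnum's credential step sets these variables for you.

```shell
# Against a Swarm cluster: the plain Docker CLI, pointed at the cluster
# endpoint (the address below is a made-up example).
export DOCKER_HOST=tcp://172.16.0.10:2376
docker run -d nginx
docker ps

# Against a Kubernetes cluster: plain kubectl, no Magnum-specific tooling.
kubectl get nodes
```

The point is that Magnum stays out of the data path: once the cluster exists, you use each orchestrator's own tools unchanged.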
And then, on the left side of the screen, we have the native API of Kubernetes or Docker, so you can use the native tools or the REST API directly.

So, OpenStack has two releases per year, and the next release is in two weeks. These are the plans we have for Pike, the release that will come in August. We want to manage upgrades of clusters: either upgrade Kubernetes only, or upgrade Docker or the operating system underneath. We plan to do it as rolling upgrades with node replacement. We also want to support heterogeneous clusters, so creating clusters across different availability zones, or with different hardware or different flavors — for example, a bunch of big nodes and a bunch of small nodes. Also, very soon we will release support for Docker swarm mode, which is not available yet; we use legacy Swarm for now. We're also working on providing a solution for container monitoring at deployment time with Prometheus. I just saw the talk about operators, so we might do it with an operator, maybe; but it's to monitor Kubernetes itself, not just something running on it. And we're going to improve the support for cluster drivers, so we can allow different companies with different use cases to modify their drivers and customize them to their needs, and to extend our bare metal support a bit, which is limited to Kubernetes for now.

So, about our infrastructure. This is a screenshot taken this week. We run at the moment 60 Magnum clusters, as you can see, but we have a very big infrastructure, so we can create many more.

So, the use cases at CERN. For those of you who don't know what CERN is: we have a particle accelerator that accelerates particles to nearly the speed of light, and we smash them together, take pictures of them, and store them as events. The first use case is batch processing, a distributed system that does event reconstruction from the data that the sensors recorded.
We also have end-user analysis with Jupyter notebooks: physicists want to analyze the data, and to let them do it more easily, we have these notebooks so they can do their analysis in the browser. There are also use cases for machine learning with TensorFlow and deep learning; physicists are more into that, we just provide the infrastructure. We also have infrastructure services and infrastructure management, like moving data across the various data centers used by CERN users. And then web servers, platform as a service, continuous integration like GitLab CI, and many others.

So, this is the history of Magnum at CERN. We started prototyping and looking into it in 2015, in the beginning of 2016 we had the first pilot service, and later last year we opened it to all users. As I said, with cluster drivers we modify upstream Magnum a bit to support HEP services such as CVMFS and EOS, which mount data from the LHC, and we are investigating how to do that with system containers with Atomic, if you want to have a look.

So, this is how it looks for a CERN user to use Magnum. We have public cluster templates: Swarm, Swarm with high availability, and Kubernetes with high availability. And this is the workflow: you do cluster create and specify the node count; you wait a bit, depending on how many nodes you want; then you do cluster list and you see that it's CREATE_COMPLETE; you run one command, cluster config, which fetches all the TLS credentials you need; and then you talk to Docker or Kubernetes like you would in any normal deployment.

So, how good is this service that we offer? We did two benchmarks. The first one benchmarks the service itself: is Magnum able to serve many users, how does it scale, can it create all these clusters? And the second one is: how good are these resources once you have them? Is the performance good or is it low?
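The workflow just described can be sketched with the Magnum command-line client; this is a hedged example, with the cluster and template names purely illustrative and the exact flags varying between Magnum releases.

```shell
# Create a two-worker cluster from a public template
# (names "mycluster" and "kubernetes-ha" are examples).
magnum cluster-create --name mycluster \
    --cluster-template kubernetes-ha \
    --node-count 2

# Poll until the status column shows CREATE_COMPLETE.
magnum cluster-list

# Fetch the TLS credentials and client configuration for the cluster;
# the command prints environment exports for the native client.
eval $(magnum cluster-config mycluster)

# From here on, it is a normal Kubernetes deployment.
kubectl get nodes
```

The same flow applies to a Swarm template, except the last step sets `DOCKER_HOST` and friends and you use the `docker` CLI instead.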
So, we used the Kubernetes benchmark that the Google Cloud team released, which creates some loadbots and some HTTP servers that serve a static file, and you scale them up and down. We ran the test in two data centers: one at CERN and one on a CNCF cluster in Las Vegas. Our deployment has 240 hypervisors with 13 cores, and there were 100 hypervisors at CNCF. We used a similar configuration for Magnum and Heat — Heat is the orchestration service. At CERN we used our production service, so we have more controllers for RabbitMQ, which is heavily used when you create clusters; at CNCF we used the upstream Ansible scripts, so you can replicate what we did at CNCF using those scripts. And there is a small difference between CERN and CNCF: at CERN we have a flat network, so all the VMs are in the same network, but at the CNCF cluster we had tenant networks.

So, these are the results at CERN for both tests. On the left is the benchmark of the service: how fast you can create clusters. For two-node clusters, which are essentially three VMs — one master and two workers — you can have one in 2.5 minutes, and we created 50 clusters at the same time. As you can see, the time stays fairly stable up to 100 nodes, which takes about five minutes; beyond that the scaling is linear, and for 1,000 nodes we needed about 25 minutes, which is still pretty good, but as we noticed, there is room for improvement.

And on the right is the Kubernetes benchmark. In this example, we managed to serve seven million requests per second with 500 NGINX servers serving a static file and 9,500 loadbots hammering those servers. And we had reasonable latency, about 15 milliseconds.

So, about the Kubernetes test we did at CNCF: we managed to get similar numbers, but we didn't scale as much; we just reached one million requests per second with 100 HTTP servers and 1,000 loadbots. And the deployment of clusters was very similar.
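For context, the scale-up and scale-down step of that benchmark is driven with ordinary kubectl scaling commands; the sketch below is hedged — the resource names follow the benchmark's nginx-servers-plus-loadbots idea, but the exact names and resource kinds in the released benchmark may differ.

```shell
# Scale the static-file HTTP servers and the load generators up for a run
# (replica counts here mirror the numbers quoted in the talk).
kubectl scale rc nginx --replicas=500
kubectl scale rc loadbots --replicas=9500

# Check that everything scheduled before reading the throughput numbers.
kubectl get pods

# Scale back down after the run.
kubectl scale rc nginx --replicas=0
kubectl scale rc loadbots --replicas=0
```

The benchmark then aggregates the requests-per-second and latency figures reported by the loadbots, which is where the seven-million-requests-per-second number comes from.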
For a small cluster we needed three minutes there, while at CERN we need 2.5, but we must refine our Ansible scripts to do a better deployment of RabbitMQ. And we didn't get exact measurements of how it performed when we created many, many clusters; we were only able to measure that we created 200 clusters successfully before our Rally benchmark broke. So, that's it. I hope you liked the presentation, and thank you.