So thank you for coming. I'm Spiros Trigazis from CERN, where I'm a software engineer. I'm going to talk to you about our container service, which we base on Fedora Atomic Host, and the upstream project that we use for it, which is an OpenStack service: OpenStack Magnum. I'm also the project team lead, so I must talk about it, because if I don't, who else is going to?

Magnum is a community project from OpenStack. If you're not familiar with OpenStack, I will explain the random words you'll see that are OpenStack project names. Magnum uses Keystone credentials, meaning it uses the centralized authentication method of OpenStack, which on the back end can be Kerberos, or you can use FreeIPA or Active Directory. This is the entry point for applications developed on top of OpenStack.

Magnum offers different cluster types: Kubernetes, which is by far the most popular one, Docker Swarm, and Mesos plus DC/OS, which are the least popular, at least from our perspective. The way Magnum offers multi-tenancy is actually the opposite: it's single-tenancy. Every user has their own cluster and is responsible for the applications running there, and it's guaranteed that the machines running their applications run only their code. These can be virtual machines or physical servers that they own.

These are some cool logos from the container orchestrators that we support; around 70% of our users are on Kubernetes, another 20% on Docker Swarm, and there are some outliers, I would say, using Mesos or DC/OS. In our organization at CERN we have some Mesos users, but they are not using OpenStack; they deploy their own infrastructure on physical servers.

Some terminology about the project. A cluster is made up of compute instances, virtual or physical; networks, managed by the Neutron service of OpenStack; security groups, which are also part of Neutron; block storage, based on the Cinder project; and other resources like load balancers, based on Octavia, whose back end can be whatever solution you use, maybe OpenDaylight, Tungsten Fabric, or a solution based on Open vSwitch. One I haven't mentioned, which is a new addition for us, is shared file systems: we will have integration with CephFS through the Manila project of OpenStack, which is basically a client for CephFS. The cluster is where your containers run. You see all these infrastructure entities as a single thing, and you talk to the cluster through its API, whether that is the Kubernetes API, the Docker Swarm API, or the Marathon API used by Mesos and DC/OS.

The project focuses on lifecycle operations: scaling up and down (not auto-scaling yet, which I hope we will add), upgrading clusters, and healing or replacing nodes. Usually, if a node doesn't work and you have 500 nodes, it's not worth fixing that node, at least in our case. It's much easier to replace it: drain it from Kubernetes or Docker and just create another physical instance or VM. Also, in each cluster we try to provide a self-contained monitoring solution, so every cluster has its own dashboards that monitor the applications and the underlying infrastructure, with operating system metrics and metrics from the applications that the users define.
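To illustrate that replace-rather-than-repair flow, here is a minimal sketch with the native Kubernetes tooling; the node name is hypothetical:

    $ kubectl drain minion-17 --ignore-daemonsets --delete-local-data
    $ kubectl delete node minion-17
    # then boot a replacement VM or physical instance that joins the cluster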
The strongest selling point of the project is that it doesn't wrap the upstream APIs of Docker and Kubernetes. You can use Magnum to bootstrap the cluster and then do lifecycle operations on it, but after that you're using Docker and kubectl as you're used to, with the native APIs. Magnum does all the PKI infrastructure for all of these projects. Kubernetes has the most complicated one, but you can also run Mesos and the Docker daemon behind TLS. Magnum creates those credentials, and then with a single command, which I will demonstrate later, you can retrieve all the certificates required to access the cluster securely.

This is the architecture, but we won't go through it because it's not for this audience. Very briefly: on the right is the Magnum client and its interaction with OpenStack, the middle box holds all the components used inside OpenStack, which I'm not going to cover here, and on the left is the cluster, with Docker, the operating system, and the applications. When the bootstrap of the cluster is done, the user goes to the left side and uses the native REST APIs that are common across public clouds, private clouds, and solutions you deploy on your own.

I will mention a few of the features briefly. What we did about a year ago, I think, is running everything in containers, and this is why I'm giving this presentation: we started using the stock Fedora Atomic project. We don't modify it whatsoever; we take the qcow2 or raw images from getfedora.org, and whatever we need to do on top, any customization or additional features we want to offer to the users, runs in containers. When I say we, I mean the operators of the cloud who offer the service; the users then deploy the application containers, which are their web applications, analytics, or whatever they want.

Also, we added the full Prometheus stack. We did that before the Prometheus Operator existed, so it's something we did on our own, and it's basically what the Prometheus Operator does; we are planning to move to it. We added the upstream Kubernetes dashboard, and CoreDNS even before it became generally available. CoreDNS is the replacement for kube-dns that you can use for DNS resolution inside Kubernetes clusters. Another important feature we are adding, though it's not there yet, is cluster federation, which means you can join different Kubernetes clusters within a single data center, or Kubernetes clusters in different data centers; we'll come back to it in the use cases. We are also working on adding different container runtimes; at the moment everything is based on the Docker that is included in the Fedora Atomic host.

I'm finishing with Magnum, so: why we use it and why someone else might want to. If you haven't understood by now what it does, but you have experience with GKE or the AWS Kubernetes service, it's a very similar product, or project in our case, that you can use to create a product or offer as a service to your users. We are a public organization, so we use open source solutions for this kind of thing.
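The single command mentioned above for retrieving credentials looks roughly like this; the cluster name and paths are hypothetical, and the exact output varies by release:

    $ openstack coe cluster config my-cluster --dir ~/clusters/my-cluster
    # writes the certificates and prints the export line needed to use them, e.g.
    $ export KUBECONFIG=~/clusters/my-cluster/config
    $ kubectl get nodes
    # for a Swarm cluster it prints DOCKER_HOST, DOCKER_CERT_PATH and
    # DOCKER_TLS_VERIFY instead, after which the plain docker CLI talks TLS:
    $ docker info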
It makes sense for organizations that have at least five users, and when I say users I mean different applications, each possibly managed by several people, and where you may need more than ten clusters. If you can still name all your clusters and you have some experience, it's better to manage them by hand, with Ansible, or with something else. If you have an OpenStack cloud and you want to add Magnum, accounting comes for free: whatever accounting someone has already set up for VMs extends to container clusters, because each cluster belongs to a project or a user, so accounting doesn't break. Also, we have noticed that it's a very easy entry point for new users. When users start to experiment with containers, they probably install Docker on their laptop and try some applications there; then it's very easy for them to switch to at least Docker Swarm, and once they have more than three containers in their application they immediately go to Kubernetes and stay there forever. What's also very important to us, and why I'm here as I said, is the offering: all the deployments we have for running Kubernetes and the other engines are based on Fedora Atomic, so we control very strictly which operating system we allow into our data center.

Now some notes from an operator's perspective on what someone should pay attention to when running Magnum, or any other container service offered to other users rather than consumed by the operators themselves. The network design needs a lot of attention. To simplify things, assuming users have a full OpenStack cloud with all the features, where private tenant networks are cheap and users create them as needed, you have a private network per cluster and, optionally, a floating IP on all nodes. We also have clusters that are not reachable from the outside, which is perfect for compute tasks and avoids burning public IPs. Also very important for a complete offering, with Kubernetes at least, is load balancing as a service, including load balancing for the masters. One case is having bigger clusters where you scale the master nodes and have HA; the other is wanting to expose services on the internet without using the nodes' ports.

Another crucial part, which comes on day two, is the container registry. At CERN we're a heavy CentOS shop, not Fedora, but we have started pushing the team that manages the operating systems in our organization to accept Fedora as well. We have our own Koji and our own packages, and we recommend that users build containers based on CentOS or Fedora and host them in our own registry, not docker.io or other outside registries. This way we can track everything that enters, and when people leave the organization we still have access to their images, instead of poking around docker.io or even Quay.io and asking for credentials. This also improves latency, although in some organizations that is not an issue.

Also, one must realize that when you provide a self-contained service like this, with Magnum and container clusters, in a way you are providing software. So we must test and verify that the operating system works for us, and since we're using Atomic we have to test only once, because we use it for all use cases; we also need to test the Kubernetes or Docker containers that run as part of the cluster.
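Going back to the network and load balancer choices, a sketch of how they surface in a Magnum cluster template; the flags follow the OpenStack client's coe commands, but the names, image, and flavor here are illustrative and the available options vary by release:

    $ openstack coe cluster template create k8s-ha-internal \
        --coe kubernetes \
        --image fedora-atomic-27 \
        --flavor m1.medium \
        --external-network public \
        --fixed-network physics-private \
        --network-driver flannel \
        --master-lb-enabled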
What we also need to do is upgrade regularly, because all the configuration parameters and the way clusters are deployed change rapidly, every couple of months. Essentially the project has a very fast pace, following the Docker and Kubernetes pace.

I have a couple of slides on the work we're doing with the Atomic Working Group and how we ended up there. The reason we chose it: before going to Project Atomic we were building Fedora images on our own. OpenStack has a project called diskimage-builder that builds qcow images designed to run on OpenStack. That meant that every time the kernel changed, every time there was a security fix, you had to rebuild and test again whether the build procedure was still correct. Obviously, all this work is already done by the Fedora project, and by the CentOS project if you use CentOS. So we tried to remove all the specificities we had in the base image and went for Fedora Atomic, which is like the minimal layer. When we started, Kubernetes was also included in the image, and that was convenient; but when you wanted to bump versions it wasn't, because packages had to be built, then included in a Fedora Atomic release, then published on getfedora.org, and then we had to test again, et cetera. And when Project Atomic decided to minimize the base operating system even further, we forced ourselves to use only read-only containers to extend the host. This was a very good exercise because it imposed a lot of discipline on us: since we use system containers, which I'll mention in a couple of slides, and those containers are read-only, you must design very carefully what you want to do.

Another great advantage is that if something breaks in the operating system, we ask the Fedora project; users don't ask us, since we don't build the operating system image. And of course we no longer have to maintain our CI, which was very painful, because we had to store artifacts, rotate them, and make sure we didn't point users to old operating system images. But when you start using a stock project a lot, you eventually become a contributor. Personally I like it very much, and I ended up co-maintaining the Kubernetes package for Fedora and CentOS. For a couple of months now I've been using the same dist-git, as mentioned two days ago: the dist-git for CentOS and Fedora is exactly the same, and it works pretty well so far. Whatever changes I need to make in the Fedora package go straight into CentOS, and it hasn't broken at any point so far. What we are doing now that we've moved to system containers: we run, for example, Fedora 28 at the moment, but the Kubernetes that comes from Koji and the Fedora repos is from Rawhide. So for use cases like this we use the stable branch of Fedora, but Rawhide for the latest packages that we want; when those packages graduate to the stable release we switch to Fedora 28 or whatever the stable one is, but with the Kubernetes release pace sometimes this doesn't happen, and it holds only for a month or so. We are also early testers of skopeo and the atomic utilities, because with system containers we rely on them a lot, and we contribute a lot to the system-containers repo of Project Atomic on GitHub.
I haven't seen any talks about system containers here, and this will probably change slightly or heavily with Fedora CoreOS, which we're looking forward to, but at least this is what we have now, and we are investigating how we will replace it when we move to Fedora CoreOS and when and if the atomic utilities are removed.

The first example is how to install the kubelet. The kubelet is the core component of Kubernetes that runs on every worker node, and usually on the masters too, because the control plane of Kubernetes also runs in containers managed by Kubernetes. This pulls the kubelet from the Fedora registry and uses OSTree as a backend instead of storing the image in the Docker store. The --system flag means it's a system container, which is like a super-privileged container: in the spec of the OCI image configuration we have kept only the mount namespace, and we have given the kubelet almost all the capabilities we could give it. Then you just install it, and from the user's perspective, or better, from the configuration perspective, it's like having the package installed: you go to /etc/kubernetes, modify the parameters you want, and start the service with systemctl.

For Docker Swarm clusters we wanted a faster pace for Docker. For Kubernetes the stock Docker offered in Atomic is fine, but for Docker Swarm, and for people who wanted things like multi-stage builds with Docker rather than podman, we wanted the newer Docker version. So we used the Fedora Docker system container, pointed it at the repos from Docker Inc., and installed the version we want; at the moment I think we're around 17.09, we haven't moved to 18 yet. Then you just start it as a systemd service.

The bottom box is a snapshot from a Kubernetes master node: all the components, Kubernetes and friends, run in system containers, namely the API server, controller manager, and scheduler, plus etcd and Flannel for the overlay network. As you can see, we even have 1.11 for Kubernetes; 1.11 was released just a couple of days ago. This way it's very easy to upgrade: we just rebase the container to the version we want and restart only the containers, and it works fine. The last container is one we wrote on our own, a service process that runs on all nodes, on OpenStack machines that you want to configure with containers; it's there just to configure the nodes.

Now, about the CERN container service. Everything is based on OpenStack in our cloud, and I have a typo here: we have 100,000 cores more, as you can see in the box, not in the text. The important thing for this talk is that we have 1,500 Fedora Atomic 27 VMs, with occasional bursts to 3,000; this is the more conservative layout of our service, and only for experiments, or before a conference when physicists want to do extra analysis, do we deploy more nodes. In the middle of the screenshot here you can see that we have 450 Magnum clusters.

For the Magnum deployment, the first work we needed to do was to integrate the containers with the CERN cloud, meaning the changes required by our software for physics analysis. Specifically, we have an in-house file system called CVMFS that is used to distribute software and run ROOT analysis (ROOT is a framework for doing physics analysis), and we also added, as system containers, all the CERN-specific services we require, such as installing specific certificates on all the hosts. This layout is a bit old; I think I need to update it.
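Going back to the kubelet installation, a minimal sketch of the system-container flow with the atomic CLI; the image name follows the Fedora registry layout of the time but should be treated as illustrative:

    # pull into OSTree-backed storage rather than the Docker store
    $ atomic pull --storage ostree registry.fedoraproject.org/f27/kubernetes-kubelet
    # install and run it as a super-privileged system container managed by systemd
    $ atomic install --system --name kubelet \
        registry.fedoraproject.org/f27/kubernetes-kubelet
    # configure it as if the package were installed, then start the unit
    $ vi /etc/kubernetes/kubelet
    $ systemctl start kubelet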
In any case, we have run the service in production since 2016, with only a couple of upgrades to add new features, and the layout of the service hasn't changed since then.

So, how do users interact with our container service? We have cluster templates, which describe what clusters look like apart from the number of nodes and the size of the VMs. These are the public cluster templates we offer to users: an HA and a non-HA option, so if someone has five nodes they don't have to burn extra quota and cores on HA master nodes. And with three commands, that's the important part, with three commands we give users kubectl or Docker. One command is cluster create; then they can monitor the status with cluster list, and when it reaches CREATE_COMPLETE they just run cluster config and retrieve all the credentials. Then they can talk to the native API and deploy their applications.

Now some use cases, with some very nice pictures my colleague created, on how we are using containers at CERN. Before, I was talking about OpenStack and what many people, including us, are doing; but why do we need it? We need containers to deploy batch farms easily. We have a lot of physics data to process and a lot of simulations to run, and for this kind of system we use HTCondor, and before that LSF. With Kubernetes it's much easier to scale out, and even scale to public clouds, as I will mention later. Also Jupyter, with Python and R notebooks: user analysis is done interactively in the web browser in most of our cases now, and the Jupyter project has done a lot of work on deploying on top of Kubernetes, so we have a group managing a centralized service that offers Jupyter notebooks. Another use case that is ramping up is machine learning, where we are trying to add GPUs. And of course we, the infrastructure folks, use it a lot for running simple web applications, for continuous integration and deployment, and for running OpenStack itself: we start from a few physical machines and deploy OpenStack, then we add more compute nodes which run VMs, those VMs run OpenStack again, those VMs create clusters, and those clusters running on OpenStack may run OpenStack again; I think we're in a three-layer sandwich there.

Finally, I'll describe three use cases where we are using Kubernetes, and everything runs on Fedora Atomic. The first one, which we did recently, is running Spark on Kubernetes. Spark uses resource providers as backends: the first one introduced was YARN, from the Apache ecosystem, but when Kubernetes became popular a driver was added for Spark to talk directly to Kubernetes and submit jobs to it. The data analytics working group, from which I borrowed this slide, creates Magnum clusters; then, with the Spark Operator that the Spark community created, which is an easier way to manage Spark, they submit jobs to Kubernetes, and they have integrated shared file systems to share the artifacts of the analysis later on. Recently, and I mentioned that we usually have bursts, this group created a 1,000-node cluster: we booted a 1,000-node cluster in 20 minutes, with around 4,000, no, not 4,000, 2,000 cores in total, a big Kubernetes cluster that was used at full capacity for analysis with Spark. As I said, again, everything on Fedora Atomic.
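For context, submitting to Kubernetes directly looks roughly like this with the native driver Spark gained in 2.3; the API endpoint and container image are placeholders:

    $ spark-submit \
        --master k8s://https://my-cluster-api:6443 \
        --deploy-mode cluster \
        --name spark-pi \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.executor.instances=5 \
        --conf spark.kubernetes.container.image=my-registry/spark:2.3.0 \
        local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar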
Another use case is reusable analysis. At the beginning of this century storage was not a problem, so we could store data, and when users wanted to do some computations they could run them as many times as they wanted; analysis was done on demand. But now, with the higher data rates, what CERN wants to do is reuse analyses that have been done before. The group that manages the REANA and RECAST projects (REANA is an acronym for reusable analysis platform) uses container images as artifacts: when someone has done an analysis, together with the results and the data used for it, they create a Docker image and push it to a registry. The whole workflow is managed by Kubernetes jobs, and each step is a self-contained job that they submit to Kubernetes, running on our service. I think this is as far as I can go in explaining it, but if you're interested we can sync later. What it demonstrates is that the solution we have developed, the modular way Kubernetes is built, and the compatibility we have with the latest kernel in Fedora have allowed us to use all the CERN file systems we have: one is called EOS, which is CERN-specific and hosts the largest amount of data we have, around 200 petabytes; we also have CephFS, and CVMFS to distribute software. All of these are integrated and run easily with Kubernetes on Fedora Atomic Host.

The third use case, which we did recently, is federated Kubernetes clusters. In the picture on the right you can see that we have a host cluster that runs at CERN and runs the control plane of the batch system. The two Kubernetes clusters at the top run again on the CERN cloud, but in a different part of the data center, and the third one runs on the T-Systems cloud, which happens to be OpenStack as well, but that is just a coincidence: the team running the batch farm deployed the Kubernetes cluster there by hand. Then all these clusters joined the host cluster at CERN. So when we have a compute deficit, let's say, and we want to add more compute capacity, we just go to any public cloud that can offer Kubernetes to us, and in the same way we have deployed the batch system at CERN we can extend it to the public cloud, join it, submit jobs, and leverage the compute capacity. This is ideal for compute-intensive jobs; when you want to transfer a lot of data it's another story, because then you have to pay for all the data transfer, but for simulations, and we run a lot of Monte Carlo simulations, it's perfect.
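A sketch of joining an external cluster with the kubefed tool from the federation v1 tooling of that era; the cluster and context names are hypothetical:

    # join the externally deployed cluster to the federation control plane
    # running on the host cluster at CERN
    $ kubefed join t-systems-batch \
        --host-cluster-context=cern-host \
        --cluster-context=t-systems-batch
    # the new member shows up in the federation API
    $ kubectl --context=federation get clusters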
In conclusion, this talk was from the user's perspective, and what I want to highlight is that Fedora Atomic, and Fedora, just work for us. It's not about the bleeding edge; it's a minimal distro that has worked very well for us. And when I say us, I don't mean only CERN but also the OpenStack community that uses our project: everyone is using the same images, we're happy with them, and we just benefit from all the work done upstream in the project. The immutable state allows us to stay in sync even between operators at different sites: we can share exactly the same operating system, and tests done at one site will just work at another, where otherwise things might be very different. A closing note, and the reason we moved this talk here, is that we are looking forward to Fedora CoreOS. So far we had a lot of users saying: you are mostly deploying on Fedora Atomic, but we need to use CoreOS, because our organization uses it and because it has a bigger community in the container ecosystem. Now with this convergence we are very happy, and we won't have to hear those complaints again. I will just leave the slide up here in case you have questions about Magnum, and to promote my project. So that's it, thank you. Do you have any questions?

We do upgrades, but when we upgrade a cluster we don't upgrade the nodes in place; we delete the nodes. In some cases where we didn't have a clear path for doing that, because of an OpenStack-specific issue we had, we just migrated the application to another cluster. But when we do upgrades, we try to delete the node and avoid any in-place upgrades.

It's our own hardware and we have deployed OpenStack on it, so it's our own cloud, our own private cloud.

Sorry, I didn't get the question. The question was whether we have seen any issues with Kubernetes on Fedora, because to build Kubernetes we use a more recent Go version than the upstream project uses. No, we haven't seen any. The only one was a minor issue specific to the spec file, which was doing a version check: when the version was bumped from 1.9 to 1.10 the comparison stopped working, because it wasn't using semver. Other than that we didn't have an issue. The only other issue, and it's not Fedora-specific, is that by default in Project Atomic, Docker uses systemd as the cgroup driver, and we have noticed that deletion of containers can be a little slower with systemd than with cgroupfs, and some features, like monitoring the nodes and having the nice graphs in the Kubernetes dashboard, don't work with systemd as the cgroup driver. So we changed to cgroupfs.
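A sketch of that cgroup driver switch; on Atomic hosts the Docker daemon options live in /etc/sysconfig/docker, and the kubelet flag has to agree with the daemon (the exact option lines here are illustrative):

    # /etc/sysconfig/docker: swap the default systemd cgroup driver
    OPTIONS='--selinux-enabled --log-driver=journald --exec-opt native.cgroupdriver=cgroupfs'
    # the kubelet must be told the same driver, e.g. among its arguments:
    #   --cgroup-driver=cgroupfs
    # then restart both services
    $ systemctl restart docker kubelet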