Hello everyone, good morning. Let's start today's talk. My name is Belmiro, I'm a cloud architect at CERN, and with me I have... okay, it works. I'm Spyros, I'm also a cloud engineer at CERN. I'm a core member of the Magnum project, I used to be the PTL, and I contribute mostly to our cloud for service management in general, specifically for Kubernetes and Magnum.

What we would like to talk about today is our control plane. It's a control plane that we have been presenting during past summits, and it has some particularities that we are going to explain. But the main topic today is our attempt to move our control plane from running in VMs to Kubernetes. We are still in the initial, evaluation phase, so what we're going to show you is basically what we are thinking about and why we think it makes sense.

But first, this would not be a CERN talk without a little introduction to what we do, and since we are on a different continent, let's introduce the organization. CERN is the European Organization for Nuclear Research. It's the biggest international scientific collaboration in the world: more than 10,000 scientists work with the organization. The main mission of CERN is fundamental research in particle physics, and for that CERN offers facilities to scientists from all around the world. One of the main things CERN offers is a very complex set of particle accelerators.

This picture is taken from the French side: CERN sits on the border between France and Switzerland, very close to Geneva. You can see the Geneva airport, the lake and also the Alps. These little rings are the accelerators that we provide to physicists. Our main accelerator is the LHC. It's a 27-kilometer ring, it sits 100 meters underground, and it accelerates beams of particles in two opposite directions. These beams of particles collide in four big detectors, which we call experiments: CMS, LHCb, ATLAS and ALICE. If you look at this picture, the CERN headquarters is basically that little thing there, so you get an idea of the scale.

This is inside the tunnel of the LHC, the Large Hadron Collider. What you see there, those blue pipes, are the magnets. In the LHC we have more than 10,000 of these. There are different kinds: some bend the beam, others focus the beam of particles. They are between 5 and 15 meters long, and each one can weigh more than 35 tons, so you can imagine the effort of mounting this infrastructure 100 meters underground in a tunnel. And these are not ordinary magnets, they are superconducting magnets, meaning they conduct electricity without any resistance; to achieve that, they need to be cooled down to minus 271 degrees Celsius.
That is colder than outer space. It takes a lot of helium to achieve this, so imagine the operations required to keep this infrastructure running. So we accelerate two beams of particles and they collide in the detectors. We have four of these detectors; each looks a little different, and these machines are huge: up to 45 meters long, 25 meters in diameter, and they can weigh more than 12,000 tons each. All of this, of course, is 100 meters underground.

Basically, a detector is a digital camera. Not an ordinary camera like the one in your phone: they can take up to 40 million pictures per second, so we can get more than one petabyte of data per second. Of course, we cannot store all that data. What the experiments have are triggers, hardware filters, and they filter out most of the information, the part we think we already know or that is not interesting for science. So, keeping the analogy, we reduce those 40 million pictures to around 1,000 pictures per second. That is only a few gigabytes per second sent to our data center. It's still a huge challenge to store 10 gigabytes per second continuously, but it's much better than one petabyte per second.

And to support all of this, we are building OpenStack clouds that help us process all this data; we have had OpenStack in production since July 2013. This is one of our monitoring dashboards. You can see that we have about 300,000 cores; that number went down a little because we needed to disable SMT due to all the security vulnerabilities during the last year. We have more than 30,000 VMs and more than 4,000 projects. Magnum clusters are becoming very popular in our organization: we now have more than 500. And in terms of compute nodes, we are making some changes; we have around 9,000 of them.

Okay, so that was a little overview of our infrastructure. Now let's talk about the architecture of our control plane. Something we have done from the beginning is to isolate each component. When we were thinking about the architecture, it was not at all trivial how to deploy the control plane, because we didn't want to have all the components together on two or three physical nodes; we wanted to spread our control plane as much as possible. To do that we could have deployed another cloud just to run our control plane, or dedicated a lot of physical nodes to it, but we decided to run our control plane on the infrastructure itself. Each box that you see here is basically a set of virtual machines running one component in an isolated way. So we have ten virtual machines for Keystone, then a set of virtual machines for Glance, and so on; ten is the number we run at our current scale, and it's the same for every OpenStack component. Even the databases are isolated: each database lives on its own virtual machine or physical node. We don't have one huge MySQL instance holding all the databases. We try to isolate as much as possible.

In terms of the Nova architecture, because we run cells, we have around 20 API nodes, then the top cell controller and Placement, around 15 and 30 nodes respectively, and then we have the cells, around 80 of them. For each cell we run only one control plane node, which includes RabbitMQ, the Nova API and the conductor, and unfortunately we are still running nova-network in some of them. Then we have the compute nodes, on average around 200 per cell. One thing you can see here is that RabbitMQ runs in the same VM and is dedicated per cell, so we have a lot of different, independent RabbitMQ instances. We only run one RabbitMQ cluster, the one for the top cell controller; each cell has its own RabbitMQ.
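To make the per-cell isolation concrete, this is roughly how a cell with its own message queue and database is registered in Nova cells v2. This is a sketch, not CERN's actual setup: the hostnames, cell name and credentials are placeholders.

```sh
# Register a cell with a dedicated RabbitMQ and database (placeholder values).
# Losing this cell's control plane node only affects operations in this cell.
nova-manage cell_v2 create_cell --name cell042 \
  --transport-url "rabbit://nova:secret@rabbit-cell042.example.org:5672/" \
  --database_connection "mysql+pymysql://nova:secret@db-cell042.example.org/nova_cell042"

# List the cells known to the top-level control plane
nova-manage cell_v2 list_cells
```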
Why do we have this kind of architecture? Because we can afford to lose a cell. If one of the VMs running a cell control plane goes down, it means users will not be able to create instances in that cell; however, the VMs that were already created will continue to run. So this is almost transparent for the user, unless they try to delete an instance, and considering that our use case is batch processing, we can afford that.

To summarize what I said: it's a cloud inception. Everything you saw runs on the infrastructure itself. Of course, to bootstrap all of that we initially needed physical machines, but we have removed them by now; everything is running on top of the cloud. The advantage is that each OpenStack component runs in a different VM, so we have isolation, meaning we can upgrade each OpenStack component at a different time: we start with Keystone, two weeks later we upgrade maybe Glance, and so on. The disadvantage, of course, is that we have a lot of VMs to manage. We manage them with Foreman and Puppet, but it's still difficult to manage all these VMs. As you saw in the previous slide, we have 20 API nodes plus all the control plane for the cells; in the end it's almost 100 VMs for the whole OpenStack control plane. This also creates unused resources, because a Nova API will not consume all the resources of its VM, which causes inefficiencies. And that is exactly what we are trying to remove from our cloud, because for a cloud dedicated to scientific computing, any overhead, any inefficiency, means CPU cycles that are not used for research, which is the mission of the organization.

What we see here is exactly the same diagram: everything that has color is what we virtualize. You can see that almost everything runs in a VM, but not quite everything: even the databases, which are managed by our database team, run on top of our cloud, which can cause some issues, so not all databases are virtualized; some of them, the ones that are critical to bootstrap the cloud, run on physical nodes. And this is another view of what I just said: we have different availability zones, each availability zone contains different cells, and the control plane runs side by side with the user VMs.

Okay, so we have been running this for a few years now. What do we want to change? Why do we want to move to Kubernetes, or at least try it and see if we can get some benefit from it? The way we want to move to Kubernetes is to run it in the same way we run our VMs, meaning the inception continues, with one more layer. What we want to do is create Kubernetes clusters using Magnum, with VMs on top of the cloud, and then run the control plane as pods inside those VMs.
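For illustration, creating such a cluster with Magnum looks roughly like this. The template name, keypair and node counts here are hypothetical, not the ones used at CERN.

```sh
# Create a Kubernetes cluster on top of the cloud with Magnum
openstack coe cluster create control-plane-k8s \
  --cluster-template kubernetes-ha \
  --master-count 3 \
  --node-count 6 \
  --keypair mykey

# Fetch the kubeconfig once the cluster is ready
openstack coe cluster config control-plane-k8s
```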
So you see, with all this inception, the advantage is a strong consolidation, because we will use all the resources more efficiently. Our Kubernetes cluster can be much smaller and run many more components, because it will no longer be one VM for a Nova API, another VM for the Nova scheduler, and so on; we will be able to consolidate much more inside that Kubernetes cluster, which for us is a big advantage. It will also be much faster to iterate: right now, creating a VM is fast, but all the configuration with our configuration management tools takes some time, so this means faster iterations. And there is cloud-native auto-scaling: we will be able to scale the cluster up and down, and then the applications inside it. If we see high demand, it will be easy to increase the number of APIs much faster than today. Today we keep some standby VMs, and if we get a lot of requests on the Nova API side, we configure them to be Nova APIs and they join the load balancer and start acting as Nova APIs. With Kubernetes this will be much easier, or at least we expect it to be much easier and faster.

The disadvantages: one more inception layer, with the implications that brings. And the support of the infrastructure: this is a big change for us, because over the years we have been using that model, meaning that all the log collection and monitoring is integrated with our configuration management system. Removing that and moving to Kubernetes means redoing that work from scratch. Also, not all the staff are used to Kubernetes yet, so the operations team will need to get used to it.

The final result we expect is this: instead of having all those VMs running the services, we will have the Kubernetes cluster, and all the services will run inside it, for example like this. And this is basically to illustrate the inception that we have: imagine these VMs were created using Magnum and orchestrated to be a Kubernetes cluster, and we have all these pods that form our control plane. Then, to create a user VM, these are the components that intervene, and that user VM ends up side by side with the other VMs. This is a little bit confusing, and these are the easy examples. For example, this is for Magnum: to deploy another cluster, Magnum will use the components that were already deployed by Magnum itself. It gets really confusing with the Ironic resources, because we also use Ironic heavily: basically we create Ironic resources using the cloud, those machines act as our hypervisors, and then we put all of OpenStack on top.

So let's talk now about Helm. We tried several things to run our control plane on Kubernetes, and one of them is Helm: not just plain manifests, but packaged charts. So what is Helm? Helm is the package manager of Kubernetes, the pip of Kubernetes. Its strongest point is that it has a very large selection of community-managed charts, packages that are distributed as tarballs, and all these charts encode the business logic of the applications.
So, for example, for a very big component like the NGINX ingress, which is specific to Kubernetes, or Drupal, all the configuration is factored out into a single file. This provides some benefits: simplicity in configuration, if you are not a very corner-case user. You can use the charts that are provided by the community and hosted on public clouds like Google Cloud or AWS, but you can also host the charts that you build on your own cloud. If you have S3 or Swift you can use ChartMuseum, which is just a server component for Helm, to pull and push charts from there like you do with a container registry.

I will describe briefly the usage of Helm. This is for version 2; I will mention only version 2 and not version 3, because version 3 is only just out, and OpenStack-Helm, which I will mention later, uses v2. The typical starting point is that you already have a Kubernetes cluster and you need to configure the client. I have also posted a very important link on how to do this securely, without just exposing the server-side component of Helm, which is one of the biggest security holes you can create in your cluster. Then you add some chart repositories, for example your ChartMuseum deployment. The workflow is that you inspect the charts first; you never just pull a tarball and install it as admin, or as a user that has access to the credentials of your cluster, and only then do you install some charts. In commands it looks like this: you do helm init with TLS, as I mentioned; you add your repo, which is example.org/charts or something; then you update all the dependencies; you run template first, to see what you're doing and to be sure you're not installing something crazy; and then you do helm install and install your app with the values file that contains all of your configuration.
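Written out, that Helm v2 workflow looks roughly like this. The repository URL and chart name are placeholders, and the TLS flags follow what a Tiller-hardening guide like the one linked recommends; treat this as a sketch rather than the exact setup.

```sh
# Install Tiller with TLS enabled (never expose it unauthenticated)
helm init --tiller-tls --tiller-tls-verify \
  --tiller-tls-cert tiller.crt --tiller-tls-key tiller.key \
  --tls-ca-cert ca.crt

# Add a chart repository, e.g. a ChartMuseum deployment
helm repo add example https://example.org/charts
helm repo update

# Update chart dependencies, then render locally to inspect exactly
# what would be installed before touching the cluster
helm dependency update ./myapp
helm template ./myapp -f values.yaml

# Install the application with the values file holding all configuration
helm install --tls example/myapp --name myapp -f values.yaml
```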
I won't describe OpenStack-Helm in detail; it's a very big project, and these two numbers show it clearly: OpenStack-Helm itself, which is only the OpenStack components, is 20 repos, but all the additional requirements add up to 46 repos. This is not specific to Kubernetes or Helm: if you run a cloud, you need all these 46 components. You need a load balancer, monitoring, a network plugin, and so on.

One of the most crucial parts of managing services, not only with Kubernetes, is secret management, and one very important tool, at least for us, is a plugin for Barbican that we created, which I will describe shortly. But first, what are the requirements for secret management? Most teams, to be efficient and have standard procedures, want a GitOps workflow: everything, secrets and configuration, is managed with git, and all changes are managed with pull requests, ideally with at least two approvers, or one. Specifically for Helm, another requirement is not to change the upstream charts: for example, we don't want to patch anything in OpenStack-Helm. We also want our secret management to integrate with the standard Helm commands, not to introduce something new, another layer or another component. And we want to leverage the existing infrastructure that we have for authentication. In our case that's Keystone, backed by Active Directory; it would be ideal to manage authentication and authorization with Keystone and use the existing infrastructure and the integration we have done with Active Directory.

So now I'll talk about the Barbican plugin that we implemented, which uses the OpenStack API services. This is an illustration of how to use it. There is an encryption key stored in Barbican; if this encryption key doesn't exist the first time you try to use the plugin, it's generated. First you export your OpenStack token: the plugin uses token authentication only, no password authentication, because at CERN we use Kerberos and we don't want to encourage users to use passwords. What it does is either fetch the key from Barbican or generate it, encrypt the passwords or other secrets that you might have, encode them in base64, and store them on the file system. Then you push to git, your colleagues can review, they can pull the branch, or you can have a CI/CD pipeline that uses the secrets. And then, instead of helm install, you do helm secrets install. The secrets binary is a Helm plugin that just wraps the install command: it detects whether a file is encrypted, tries to decrypt it with the key in Barbican, and then passes it to the helm binary to eventually install, as is usual in the Kubernetes and Helm ecosystem. The plugin is written in Go, and one of the latest important additions is that all editing of secrets, while they are in plain text, is done in memory, so the passwords never touch the file system.

This is how it looks; this is the help output. We have some decrypt and encrypt commands that you don't necessarily need. We have a view command, so you can take the encrypted file and just see it. And then we have install, upgrade and lint, which are the wrapper commands for Helm: they take the encrypted files, decrypt them, and pass them to the helm binary. This is how the workflow looks. We have a service, let's say in this case it's Glance, and we do helm secrets view on the encrypted file, and we can see that we have some passwords there. Then we do helm secrets install, which will take the encrypted file, decrypt it, and pass it to helm. Then we can do helm secrets edit: it will spawn vim, you can edit your secrets in memory, save them, and they are encrypted again; then you upgrade, or push to git.
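As a sketch of that workflow, with hypothetical file and chart names: the subcommands are the ones from the help output above, but the exact flags of the real plugin may differ.

```sh
# Token-only authentication against Keystone: no passwords on disk
export OS_TOKEN=$(openstack token issue -f value -c id)

# View an encrypted values file: the key is fetched from Barbican
# (or generated on first use) and decryption happens in memory
helm secrets view glance/secrets.yaml

# Edit the plaintext in memory; it is re-encrypted on save
helm secrets edit glance/secrets.yaml

# Wrapper around the helm binary: detects encrypted files, decrypts
# them with the Barbican key, and passes everything on to helm
helm secrets install --tls example/glance --name glance \
  -f glance/values.yaml -f glance/secrets.yaml
```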
Now I will pass to Belmiro again, for LOCI.

Okay. So, another thing that we need, if we have a containerized infrastructure, is images. We started looking at what exists upstream, what we could benefit from, and one of the things is OpenStack LOCI. This project is designed to quickly build OCI-compatible images for basically all the OpenStack services, and it's great: if you haven't looked into this project, have a look if you are planning to deploy containers, because it's very flexible for creating the images for your projects exactly the way you want them. We really recommend it. There is a base image that is supported upstream, and then you can customize it however you want. When we saw this, it was a great fit for us: we have some dependencies, our internal patches, and LOCI helps us a lot to build our images. So have a look at the LOCI project.

And now we're going to start with the use cases. It may look clear now that Helm is a good solution to deploy a control plane, but when we started this it was not at all clear that we wanted to use Helm. We actually wanted to move faster, and the best way to move fast is to have something running in production: if you only have your development cloud, it will always be your development cloud, and you'll be scared to move it into production. So we wanted to have a part of the control plane running on Kubernetes very fast, also to get a real feeling of what it is like when we expose it to users. And looking at OpenStack-Helm, as Spyros said, it's a big, complex project; at least the initial understanding of it takes a lot of time, and for us, not knowing Kubernetes well, that was not really the fastest way to move forward. We wanted a more controlled experience; we actually wanted to know what we were doing. The point is that with Helm alone we would not learn a lot; we would need to go through the code to understand what Helm is doing behind the scenes to deploy the service.

So the first thing we did was look at what was already done: pick a simple service, like Glance, and see what Helm does behind the scenes. You can do helm fetch to get the chart and the templates, and you will see all the steps, all the manifests, all the YAML that is generated by Helm. So what is needed, what are the basic components to deploy an OpenStack component using Kubernetes? We need an image; we need a ConfigMap for the configuration of the service; then we need a Deployment, basically to run a glance-api; and a Service, basically to expose it to the outside. That's basically it for a simple service like Glance. If you look into Helm there is a lot more stuff there that we don't really need for the simplest use case.

So let's start simple. We took all that YAML from Helm, removed everything that we didn't understand at the time or thought was not interesting for us, and tried to make it look as much as possible like the production service. We took the configuration file that we run in production for Glance and put it in a ConfigMap, and the Deployment only runs glance-api. That's basically what we did. The problem was what to do about the secrets, because the secrets are embedded in the configuration file. Fortunately, OpenStack supports several configuration files, so what we did was have one configuration file for Glance, and another configuration file with only the secrets, the database credentials plus all the service accounts; that one is a Kubernetes Secret that we configured manually. For ingress we use NGINX, following the recipe that Helm uses, so nothing fancy.

So what is the difference compared to using Helm? Basically, we understand the whole process from the beginning. It's very easy to understand and to deploy: it's only a few components, this Deployment, the ConfigMap and the configuration. We store our configuration on git, as we always do, and it took very little time to get this configuration working; then we deployed it into production.
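A minimal sketch of what that deployment looks like, assuming a LOCI-built image; the image name, file names and port are hypothetical, and the real configuration files are the ones from production.

```sh
# The secrets (database URL, service credentials) live in a separate
# config file, stored as a Kubernetes Secret created manually
kubectl create secret generic glance-secrets \
  --from-file=glance-secrets.conf

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: glance-config
data:
  glance-api.conf: |
    [DEFAULT]
    # ... the same glance-api.conf we run in production, minus secrets ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: glance-api
spec:
  replicas: 1
  selector:
    matchLabels: {app: glance-api}
  template:
    metadata:
      labels: {app: glance-api}
    spec:
      containers:
      - name: glance-api
        image: registry.example.org/loci/glance:latest  # LOCI-built image
        # oslo.config reads every *.conf in the directory, so the
        # secrets file simply complements the main configuration
        command: ["glance-api", "--config-dir", "/etc/glance"]
        ports:
        - containerPort: 9292
        volumeMounts:
        - name: config
          mountPath: /etc/glance/glance-api.conf
          subPath: glance-api.conf
        - name: secrets
          mountPath: /etc/glance/glance-secrets.conf
          subPath: glance-secrets.conf
      volumes:
      - name: config
        configMap: {name: glance-config}
      - name: secrets
        secret: {secretName: glance-secrets}
---
apiVersion: v1
kind: Service
metadata:
  name: glance-api
spec:
  selector: {app: glance-api}
  ports:
  - port: 9292
    targetPort: 9292
EOF
```

An NGINX ingress object pointing at this Service then exposes it outside the cluster.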
Then we pointed the endpoint in our HAProxy at it, and from that moment any API call that uses Glance can go through the setup that we have in Kubernetes. So we run the VMs and the Kubernetes deployment in parallel for Glance; yes, in parallel. This is how it looks if we do a get pods inside that cluster: it is very small, we only have one pod for the glance-api and the other one for the ingress. It's very simple; we don't need anything complex to have a Kubernetes deployment running the control plane. After this was running we started on more complex things, and now Spyros will talk about them.

So, after the initial investigations of OpenStack-Helm and the deployment of Glance, and after a lot of frustration with all the dependencies that we had to manage, we started to figure it out. And this is great: you can do almost a bit of GitOps without automation, just with the git workflow. But then you cannot iterate fast and take upstream changes easily. So the second use case is a service that I know very well and that is also very simple: the Heat service, which has the API and the engine, plus RabbitMQ and a database. Okay, now that I say it out loud it doesn't sound very simple, but for an OpenStack service it's pretty standard. We deployed it in an extra region that we have for QA, not for development, and we plan to move this to production. It looks very similar to what Belmiro described, but everything is managed with OpenStack-Helm: it's the stock Helm chart and the stock image built by the OpenStack-Helm community, and it's an addition to the Puppet-managed machines. So we have the central HAProxy in front of all services, we have some VMs, and we have the Heat service deployed; RabbitMQ and the Heat database are external for now. We are starting to get some experience with it, and starting to add more services and more components. The only different thing we do is store everything in our own Docker registry, and in our ChartMuseum for the charts.
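For the Heat case, the deployment is roughly the stock chart plus site-specific overrides. The value keys below follow OpenStack-Helm's usual endpoints layout, but the exact keys depend on the chart version, and the hostnames are placeholders.

```sh
# Point the stock heat chart at the external RabbitMQ and database
cat > heat-values.yaml <<'EOF'
endpoints:
  oslo_db:
    hosts:
      default: heat-db.example.org       # external, DBA-managed database
  oslo_messaging:
    hosts:
      default: rabbitmq.example.org      # external RabbitMQ, not chart-managed
EOF

# Charts and images are mirrored to our ChartMuseum and Docker registry
helm repo add osh https://example.org/charts
helm template osh/heat -f heat-values.yaml     # render and inspect first
helm install --tls osh/heat --name heat --namespace openstack -f heat-values.yaml
```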
The other use case we added: we had requests from users of one of the experiments that runs the accelerator, a requirement for a new region. The accelerator runs in a dedicated network, and CERN is a very old laboratory, so the data center is old and the network infrastructure is old. It works very well, but the design was done in another decade; it's not very cloud-friendly, and there are strong requirements for security, also for historical reasons. So what we needed was a completely isolated environment: no access to a container registry, no access to any kind of storage that we have (we have a very big Ceph deployment at CERN, and we didn't have access to that), and no access to Puppet. We had to build something from scratch. We thought that, since the scale is small and we can easily manage it, we could try Helm. And the final requirement was that the users wanted Kubernetes, on-demand clusters: they wanted to iterate fast, but in this isolated environment, because they have their production services there, and also their development environment, in the same network.

So we picked this architecture for the infrastructure. We will start with less than 10 compute nodes, and the minimum that we require is one database, which our DBA team will provide to us, plus Keystone, Glance, Nova, Neutron, Heat and Magnum. The only component that stays outside this deployment is Keystone, because we have, so far, four regions, and one Keystone that rules all the regions. What we wanted was self-contained storage and a self-contained container registry. And since this was completely isolated, we didn't have Magnum there, so we had to deploy the initial cluster by hand; later we can move to a managed cluster. This is at the development stage for us, but we have committed to our colleagues that we will soon give them access so they can start using it.

So, some concluding notes from me, and then the next steps from Belmiro. So far, OpenStack-Helm has had a bit of a steep learning curve, similar to Kubernetes itself, so it needed a couple of iterations: first go simple, without it, to be able to see its benefits. Those benefits are the simplified and compact configuration, with a few YAML files; and as you start to add more services, you can commonize all the standard requirements, like the configuration for logging and so on, and keep only the secrets and the few service-specific settings in dedicated files for every OpenStack component. With OpenStack-Helm, if you start a new cloud it's very easy, because it provides all the dependencies, and most common projects are already packaged in charts, so you can start building a cloud from scratch with it. For large and very opinionated deployments it's challenging; that's why we did the first initial step without it.

Some of the drawbacks, none very big at the moment: there is no external secret management in OpenStack-Helm, so we either have to use plain secrets, just encrypted on the client side, or a CRD instead of a plugin; we cannot use sealed secrets, for example. There are some dependencies on the infra charts, like RabbitMQ having to come from the OpenStack-Helm repo. And Helm 3 is not supported yet, which would be a massive advantage, because it removes Tiller. I will pass to Belmiro for our next steps.

So, how do we see this going? As a summary, we have a few OpenStack projects running side by side with our control plane, also on Kubernetes, and we believe we will continue to grow this, to move the control plane to Kubernetes; we think that will be a great approach for us. However, there are many other tools, and it doesn't mean that this really needs to be Helm. Helm 3 is coming; Kustomize may be simpler, we haven't played with it a lot yet. What we really need is flexibility, a tool that provides flexibility, and also GitOps, which is starting to be a big thing now.
We have some colleagues looking at Flux to provide that, and maybe we will follow that. Then, integrating logging and metrics: we want to deploy everything with Kubernetes, so we really need to go further with these steps. And service mesh: if we are moving our control plane to Kubernetes, having a service mesh underneath would be a big, big plus for us. It will be a huge challenge considering our network infrastructure, but the way we move things forward is to take on a big challenge and move everything with it. So, thank you; we are happy to answer your questions now. And one of the things we would like is also to hear your experience, if you are trying to deploy these services using Kubernetes; one of the goals of this presentation is to generate discussion about this. Thank you. Questions?

Question: Yes, I have two questions. The first one: in your cloud architecture chart I didn't see Neutron, the networking component, right? The second question is that you said you are currently running Kubernetes and the VMs in parallel; what is the technology that you use for the VMs?

Okay, so Neutron is not there just because we didn't put it in the slides, but yes, we have Neutron. We have many other components that were not on that slide; it was just an overview. Barbican was not there, Heat I think was not there. So yes, Neutron was missing. If we move Neutron, it will be one of the lightest components to move into Kubernetes. The other question was about the technology for the VMs: we run KVM for the virtualization. So the VMs are created with KVM, and one of them runs the glance-api, for example; in the Kubernetes case the VMs are deployed in exactly the same way, but they run Kubernetes, and then the pods run the glance-api, or Heat, for example.

Question: No, it's just about the VM technology, whether it's KVM or Xen.

We use KVM, yes. Any other questions? Shall I repeat? Okay, so I think the question was whether we plan to move from the testing environment to production. Yes, the ultimate goal is to move all of the production environment, at least for the control plane, to Kubernetes, and then maybe the compute nodes as a second step. We are already running some of the components in production.
As soon as we feel confident enough, we move them into production, and then there is no way back once it's in production; that's why we do it this way.

Question: So, can all this inception in the control plane cause problems when VMs are live-migrated, or when something else happens?

The answer is yes: you really need to know the architecture of your control plane. One of the things that happened a few months ago: for one of the security vulnerabilities we needed to reboot basically the entire cloud, and we do this per availability zone. We informed the users that availability zone A would be down. However, as you saw, the control plane also runs on this infrastructure. We had thought about the control plane, but we forgot about the databases, which the DB team also runs on our cloud. So when we rebooted that availability zone, we noticed that the nodes came up and the VMs were started, but without network. What had happened? The Neutron database was hosted there: the host running the Neutron database VM was rebooted, and the VM came up but without network, because the Neutron agents were trying to configure the taps for the network and could not connect to the database, since it was running in a VM that had just been rebooted and had no network itself, because it needed Neutron. So you see, you need to understand all this inception. After that, of course, we moved that database to a physical node; it's now one of the databases that run on physical nodes.

Comment: More of an observation than a question. We have experienced this, and I think the inception is great, because you have a very large number of VMs, which gives you a lot of high availability: if some of the VMs go down, you still have a lot of resources to keep your control plane up. However, moving to Kubernetes means that you're going to aggregate far more resources in concentrated VMs, so the risk is going to be higher. So the observation would be: why don't you take this opportunity to rebuild everything from scratch, building a new cloud only with Kubernetes for your control plane, and migrate everything over there?

We think that would be a big overhead for us, managing two clouds. The way we see it, with the latest developments in Magnum, with node groups, we can basically span the cluster across all the different availability zones that we have, meaning that we will have one cluster that runs in different availability zones. So we can consolidate but still keep all the high availability that we have today. In terms of the resources, the VMs and the pods inside: when we consolidate, for every service we have at least three replicas, so for every availability zone this will be compacted into one or a couple of VMs, rather than having one VM per component per availability zone. That would be the consolidation, the main benefit.

I think we need to finish. Thank you so much.