Good morning. So today we'll cover a presentation about how we looked at containers at CERN and in the CERN cloud. I'm Ricardo, from the CERN cloud team.

And hi, I'm Adrian Otto. I'm the team lead for the Magnum project.

So we'll go through this. This was a collaboration between CERN openlab, which is a group we have at CERN that collaborates with external entities, and also Rackspace. I'll give a brief introduction to CERN just to explain why we are looking at this for our use cases. CERN is the European Organization for Nuclear Research; it was founded in 1954. It has 22 member states, plus a series of other associate members that also contribute to the different experiments we run. We're basically a particle physics laboratory: we do fundamental research in the area of particle physics. The main things we have right now are accelerators. You can see a picture on the bottom left there of our main one, which is called the Large Hadron Collider. It's about 100 meters underground, and we accelerate protons to nearly the speed of light and make them collide at specific points. The picture in the middle shows how this complex works: we have multiple accelerators, some of them start the proton acceleration, and then we go to the main one. The big accelerator is 27 kilometers in circumference, 100 meters underground, and at certain points we have dedicated experiments looking at different parts of physics. The picture on the bottom right is from the ATLAS experiment, one of the biggest ones. I don't know if you can actually see it, but there's a human-sized person there, so you can see the scale of it. It's a cavern 60 meters high, 100 meters underground. So we accelerate these particles, we make them collide against each other, and we try to analyze what comes out of these collisions. What comes out, in general, is a lot of data that we have to store and analyze. For this we use a large OpenStack cloud. This is a screenshot I took yesterday of our current state: we have something like 250,000 cores available, almost 8,000 hypervisors, and we use OpenStack Nova cells to split the load. And the interesting bits here: around 26,000 VMs, more or less, but the interesting bit for this presentation is the Magnum clusters. You can see that even though it's an early project, we already have quite a bit of deployment; we have around 80 clusters available right now. So with this we'll start with a presentation of the Magnum project, just as context for this talk, and I'll pass it to Adrian.

Thanks, Ricardo. So Magnum is an OpenStack service that allows you, as a cloud user, to produce a cluster that runs a container orchestration engine, and it allows you to use your existing cloud credentials to produce those clusters. So if you're already an OpenStack user creating VMs or volumes or other cloud resources, you can use the same account you use to produce those to produce these clusters. You get to choose which kind of cluster you create, because the actual back end for this is modular: there's a driver for Kubernetes, there's one for DC/OS, and we'll talk about these in a minute. There's also a multi-tenancy solution, so you can have clusters side by side, of the same type or of a different type, that are guaranteed never to share the same kernel with each other, which is important for security isolation reasons.
So because of the way that Magnum works, you get multi-tenancy not just at the control plane, which you might be accustomed to with your favorite container orchestration system, but all the way down through the entire cloud. It's using all of OpenStack's features for multi-tenancy, and it implements its own as well. I find it convenient to use Magnum to create clusters really fast. Any of you who have ever tried to stand up a multi-master Kubernetes cluster on your own using scripts might recognize that that is not an easy process, so being able to just ask an API service for a cluster right away is extremely compelling.

When we talk about Magnum and CERN's use of it, we're going to use some terminology that you need to know. The first is a COE, a container orchestration engine. This is what I mentioned before: it's modular, so we support Kubernetes, Docker Swarm, Mesos, and DC/OS today, and because it's modular, we can support others in the future. The reason we need a term for this when other container orchestration solutions don't use it is that OpenStack is the only cloud environment that gives you a choice.

We also have the concept of a Magnum cluster. A Magnum cluster is represented in the Magnum service as an API resource. Just like when you create a server in Nova and can represent and query it through the API, when you create a cluster, it is also an entity that's an API resource. And that resource is backed by a Heat stack that contains all of the cloud resources necessary to bring that cluster up: all of the Nova instances, the Neutron networks, the security groups, the software configuration components. Everything necessary to produce and manage that cluster is contained in a Heat stack. Now, once you have the Magnum cluster, you can act on it to manage its life cycle. Of course, you can create clusters, and you can scale them. We're working on in-place upgrades, which is a very handy thing to have if you've got a giant cluster and you want to be able to upgrade it in place rather than make a whole new one and redeploy your workload onto a new cluster. And the Magnum cluster is the place where the COE software actually runs. Pictured at the bottom, you see the Magnum clusters in the middle; the COEs are running on those clusters, and they are composed of an elastic composition of cloud resources underneath. You can determine how many Nova servers are in a cluster, so you can determine its scale in that way.

There's also the concept of a cluster template, which is a little bit different from a Heat template. If you understand how a Heat template works, it's a file artifact, right, an HOT file artifact that you present to the orchestration service in order to produce a stack. The disadvantage of that model is that it's not represented in a way that's reusable by all of the users of your cloud: every user needs to have his or her own file artifact in order to produce the stack. In this case, it's actually represented as a resource in the API. So the administrator of the cloud can produce one of these things and have it saved as a public resource, and then all of the users of that cloud can use it to produce clusters that match that form.
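As a rough sketch of how that administrator workflow looks (flag names varied between Magnum releases, so treat this as illustrative rather than exact syntax):

```bash
# An administrator publishes a template once, as a public API resource:
magnum cluster-template-create --name kubernetes \
    --coe kubernetes \
    --image fedora-atomic-latest \
    --keypair mykey \
    --external-network public \
    --flavor m1.medium \
    --master-flavor m1.medium \
    --network-driver flannel \
    --public

# Any user of the cloud can now see and reuse it:
magnum cluster-template-list
```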
And the last bit of terminology is native tools. Magnum is not designed to be a container orchestrator; it is designed to be a service that gives you an environment of your choice, and you're going to use the native tools that belong to that orchestration tool. So we defined what a COE is; each COE has clients that are designed to work with it. If you're using Docker Swarm, the client is Docker. If you're using Kubernetes, Kubernetes has its own client, kubectl. You don't have to use an OpenStack client in order to interact with these environments; you're going to use the native client. And that presents an interesting problem, because when you're running a native client, you need to use whatever its native authorization and authentication mechanisms are. They're not integrated with OpenStack Keystone, for example. So how do you manage that from a user perspective and from a security perspective? It's a different mode. Magnum bridges that gap by managing all of the TLS certificates that are necessary for doing that.

So Magnum is different from running a system like Kubernetes by itself in a few ways. First, I talked about multi-tenancy. Most people don't realize that today's container orchestration systems do not have multi-tenancy in their network; it just does not exist and probably will not exist for some time. When you use Magnum to deploy your container orchestration environments, you're getting multi-tenancy because Neutron is multi-tenant, and so you get that isolation between deployments that you wouldn't get otherwise. You get to choose whether you want to run, say, Swarm for some workloads and Kubernetes for others, and Ricardo is going to talk about CERN's interest in having this as a choice for their scientists. You also get to choose what kind of server gets put underneath this. If you want this on top of virtualization, great; if you want it on top of bare metal, that's fine too. And it's also integrated with OpenStack in the way that I described before, where you use the same identity in order to interact with the cloud. Thank you.

So I'll build on this. When we started looking at this, it was mostly because we saw an opportunity to use containers to simplify our procedures and to help our users have more flexible ways of doing their data analysis. In the end, what we are producing, as I mentioned before, is a lot of data. We produce some tens of petabytes a year, we have a couple of hundred petabytes available that we need to process from time to time, and there's always new data coming. So we try to improve our procedures constantly. Containers give us a better way of doing that, but they also give us other things: the sharing of the experience of running an analysis, the ability to reproduce that analysis. This is what I'll try to cover here. If we look at it, a container gives you isolation, with kernel namespaces and cgroups; it gives you the possibility of improved performance, because you're sharing the same kernel instead of using traditional VMs; and then there's ease of use. These things will become clear from our use cases.

Now, a bit of the timeline of the process that got us to where we are today. We started some container investigations at the end of 2015, looking at what was available at the time, which tools were there. Magnum was already there, so we started a set of early tests at the beginning of 2016, and we saw the potential to offer what we needed. We already have a big OpenStack cloud, so building on top of an OpenStack service makes our life much easier.
Magnum also had the possibility of choosing the container engine. This was very important for us, because we had groups of people pushing for Kubernetes, we had groups of people already using Mesos, and others who were just using plain Docker and wanted to rely on the Docker API, and there Swarm has great potential. We also wanted this to be easy to use, so that people don't have to understand complicated templates to configure their clusters. So in 2016 we worked a lot with the Magnum team upstream, and we worked on integrating the missing bits of the whole setup that relate to the specifics of our infrastructure. This took a couple of months, and by the end of last year, in October actually, we opened the service to all users. It's been quite popular since.

So, an example usage. Adrian described really well the concepts in Magnum, and this is how we use it at CERN. This is what a user will see: they run magnum cluster-template-list and see the possibilities of which systems they can deploy. For the three systems we described, we have what we call production templates and what we call preview templates. This is a really good thing, because it allows us to expose to end users the next version of the configuration. If they want to try it, say we integrate with a new storage system and want to expose it to users, they can just try a cluster with the preview template, get an early view of what's coming, and give us feedback. So this is really good.

Then, inside the template, you can see what is there. You have the COE, which is the main part, but you also have specific configuration: how large your master should be, how many masters, how large the nodes where the containers run should be, what kind of server, VM or bare metal. The image ID is actually controlled by us; we only support Fedora Atomic for now. The network driver is also controlled by us. But this is important: if you consider a small cluster of, say, five nodes, then a master of medium size is probably enough, but if you start scaling to a couple of hundred or thousands of nodes, you will probably need much larger masters. This is what we experienced. So users have these default templates, but they can actually customize them to their needs.

And this is all they need to do to create a cluster. When I said fast and easy to use, this is what we were looking for: instead of understanding complicated setups and templates, all they have to do is run a command they are very familiar with, because it's configured using the usual OpenStack credentials. They say cluster create, give it a name, select the template they want to use and the size of the cluster, and after a couple of minutes they will have the cluster available. And then we have cluster config, which is the one that does all the fiddling on the client side so that you can use the native client. If you're using Kubernetes, this magnum cluster config will actually set up the Kubernetes configuration file so that you can just use kubectl and it just works. If you're using Swarm, it will configure the DOCKER_HOST environment variable and things like that. So this is very simple, and some of our users who have tried Google Container Engine, for example, feel the same kind of experience here, which is what we were looking for.
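Putting the whole user workflow together, it looks roughly like this (illustrative of the CLI of that era, not exact syntax):

```bash
# Pick a template, create a cluster, and wire up the native client:
magnum cluster-template-list

magnum cluster-create --name mycluster \
    --cluster-template kubernetes \
    --node-count 5

# Once the cluster is ready, fetch the credentials and configuration
# for the native tooling (a kubeconfig plus TLS certs for Kubernetes,
# or DOCKER_HOST and client certs for Swarm):
eval $(magnum cluster-config mycluster)

# From here on, it's plain Kubernetes:
kubectl get nodes
```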
So the second step, before we opened this to users, was to make sure that we could scale to our size and to the use cases we have. Some of these use cases can be quite large: if you have to process a couple of petabytes of data, you might want a very large cluster. So we started stress testing, building on some stress tests the Google team had developed for Kubernetes, which we tried to reproduce last year. For this purpose we deployed a cluster of 1,000 nodes, and we managed to scale their test service to 7 million requests a second. The initial goal was 10 million; we couldn't quite reach it because of some networking issues we had internally, but we are pretty confident that we could have reached it without those.

The second thing we looked at is how well the creation of the cluster scales for different sizes. Here you have an example in the table on the left. You can see that we tried pretty small clusters of two nodes, where it takes about two and a half minutes, and then we went to 28 nodes and saw that the time doesn't actually increase that much; it's still around five minutes, which is acceptable. Then we went further up and tried 512 and 1,000 nodes, so pretty large clusters, and we started seeing roughly linear scaling. We are working on improving this, because there's no obvious reason why it should happen. Still, for most of our users, actually all of them, if they really want to deploy something like this, they can, and 23 minutes is something they are willing to wait to get a large cluster like this. So in the end this proved that with this kind of technology we can really scale: in the Kubernetes example, 7 million requests a second, with 10,000 pods generating load against 500 serving replicas.

At the same time we were doing some integration work. If you deploy this kind of cluster in a large organization like ours, there are a lot of legacy systems. The data comes from the detectors, but it goes to our storage systems. These storage systems have been there for a long time, there are hundreds of petabytes of data available, and that's what people need to access. So we had to make sure that when users deploy their clusters and run their workloads in containers, they actually have access to all of this.

In our case there are two systems, briefly mentioned here. The first one is called CVMFS, which stands for CernVM File System. This is a mostly read-only system that we use for distributing software to many distributed sites around the world, and it's basically web caching of this data. It's accessible read-only using a FUSE plugin, so we had to make sure this is well integrated into Magnum and into our configuration. You can find the links below on how we did this: we wrote a Docker volume plugin that exposes this file system to the container using the Docker interface. I'll give an example. The second one is EOS, which is where all our physics data lives. This is a bit more complicated to configure, because it actually needs user credentials; it supports Kerberos and X.509 certificates. It's also a FUSE mount, but slightly more complicated. So for Swarm and Mesos we just used the plain Docker volume plugin, and for Kubernetes we wrote a FlexVolume wrapper that uses the same code behind the scenes.
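A sketch of how these two integration paths are consumed; the driver names here are illustrative stand-ins for the CERN-written plugins linked on the slide:

```bash
# Docker / Swarm / Mesos side: the volume plugin exposes a CVMFS
# repository as a named volume (driver name is illustrative):
docker volume create -d cvmfs --name atlas.cern.ch

# Kubernetes side: a FlexVolume wrapper around the same code lets a
# pod reference the repository in its volume spec, roughly:
#
#   volumes:
#   - name: cvmfs-atlas
#     flexVolume:
#       driver: "cern/cvmfs"      # illustrative driver name
#       options:
#         repository: "atlas.cern.ch"
```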
So the way this looks: if you're used to Docker, you usually create a local volume and you have storage on the host. In this case you just say docker volume create, you give it the repository of the CVMFS file system, and this does all the fiddling for you on the host configuration. Then when you deploy, in this case I'm just deploying a normal shell in an interactive container, you just reference that repository and it will be available for you inside the container. Then you have the same for Kubernetes: a different manifest, but the configuration is exactly the same, and internally it works the same way. This abstraction is really good, because we have a lot of users using Docker who need access to these systems, and the configuration is not always trivial. The fact that we could centralize all of this in one service, within OpenStack, in Magnum, is really a very good feature, and we have some users who have just a single-node or two-node cluster, because they get all this configuration for free and don't have to do anything.

So that was more the internals. Now I'll cover a bit the use cases where we are currently using containers extensively, starting with two that are more infrastructure oriented. We use GitLab for most of our code repositories, and we use GitLab continuous integration very extensively. Some of the builds in continuous integration are just building software; some are really validating physics software, which can be several gigabytes and take a long time. So the runners behind this infrastructure are heavily loaded, and we have to be able to scale very easily. We have some predefined shared runners, which is the standard configuration you can use in GitLab CI. These also run Docker, but there's no specific configuration, so if you need something special for your runners, you won't get it there. What we provide instead is the ability for people to define their own runners, which comes with GitLab CI, but it might be complicated to set them up. So what we've done is integrate this with our Magnum service: you just deploy, in this case, a Docker Swarm cluster, and you start running your GitLab CI jobs within your tenant, in your own Swarm cluster. This works very well, because GitLab CI has very good integration with Docker and uses the Docker API: if, instead of pointing at a single instance of Docker on a host, you point at a Swarm cluster, it just works. And if you have peak periods, you can scale the cluster; this is another piece of functionality Magnum gives you. You can just change the number of nodes in your cluster from, say, 5 to 10 for a while, and then shrink it again when you don't need it, playing with your quota as you need.
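A hedged sketch of those two pieces, pointing CI at the Swarm cluster through the Docker API and resizing it through Magnum (exact update syntax varied by release):

```bash
# Point a GitLab runner at the Swarm cluster instead of a local
# Docker daemon: cluster-config sets DOCKER_HOST and TLS variables.
eval $(magnum cluster-config ci-runners)
echo $DOCKER_HOST    # e.g. tcp://<swarm-master>:2376

# Before a busy period, grow the cluster; shrink it again afterwards:
magnum cluster-update ci-runners replace node_count=10
# ... later ...
magnum cluster-update ci-runners replace node_count=5
```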
So the second one is Horizon. We use OpenStack everywhere, so we wanted to see how far we were from being able to run actual services in containers, and in this case we took Horizon as an example. Traditionally we deploy everything managed by Puppet and centrally managed: we have Foreman and a database that keep track of all the hosts, and we have profiles for how the hosts should look. This works very well, but it sometimes doesn't give us the flexibility we would like, so we started looking at moving all of this to containers. There were a lot of talks during the summit about how to do this and the different options. In our case, because we had Magnum, we deployed a Kubernetes cluster and did all the configuration for Horizon to be deployed on Kubernetes. We built some Docker images and all of this, and it works really well. What it gives us is that when we have periods where we need to scale the service, it's much easier to scale pods in Kubernetes than it is to scale VMs, because it's just faster and takes fewer resources.

We did discover a couple of things that we are working on. With the current OpenStack service it's not completely trivial to get configuration other than through local files, so we actually use a distributed file system to share the configuration between all the instances. The second one is how to get the secrets in. Both Swarm and Kubernetes have the possibility of storing secrets for your service, and the service can use them, but how you get those secrets there in the first place is not clear. We have a secret storage service at CERN where we put all the passwords and all these things, and how to do this integration and plug them into the engines is not completely trivial. This is something we are working on, and there's work upstream in those projects to help as well.
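One possible pattern for the secrets question (a sketch only, assuming a hypothetical get-secret client for our secret store; this is not our actual integration):

```bash
# Pull the password out of the external secret store and create a
# Kubernetes Secret from it, which the Horizon pods can then mount.
# "get-secret" is a hypothetical CLI for the secret storage service.
kubectl create secret generic horizon-db \
    --from-literal=password="$(get-secret horizon/db-password)"
```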
And then I go to the coolest one. In the end, we are a physics laboratory: what we want is to do physics analysis and get some results. From all this infrastructure we have, from all these complicated instruments, from the cloud, what we get in the end is plots. This is what physicists are working for, and I give two examples here. The one below on the left is from the ATLAS experiment; it's a plot from a couple of years ago, when we discovered a new particle called the Higgs boson, which had been predicted many years before. You can see the plots where they finally found the particle: it's just a histogram like this, you see a peak, and that's the particle. This is the result of analyzing petabytes of data for many, many years. And then there's future physics that might come, and those are the other plots on the right. But this is what people actually do when they're working with this data.

Now, one of the problems we have is that sometimes it's not very trivial to share this analysis. If you're doing this work and you want to share it with a colleague, the setup of the analysis can be very complex: there's a lot of software, a lot of dependencies, where to fetch the data, what to run, which version of the software to use. This is where containers also help a lot, because physicists can build this environment in one single unit and then tell people to just reuse it, and this is a really, really big benefit of using containers. So, as I mentioned, an analysis has multiple pieces. Even the data can be of multiple types: we have the raw data coming from the detector, but also different steps of reconstruction of this data, and we do a lot of simulation. Then there are the frameworks and the software, which have different releases and versions, and you have to make sure the analysis is run with that version of the software; otherwise it just doesn't work. So being able to define one single unit where all of this is contained, the software, the dependencies, everything, is a very big benefit. The other thing is that computing at CERN is massively distributed. You may manage to get something running in your corner, but if you need to scale, it's not trivial to suddenly run it on a thousand nodes. If you define it in a container, in one unit, then you can just say run this container and scale it to 10,000 instances. So this is also very important.

So in the end, having this single unit of deployment is very important, for sharing and also for preservation. Another issue we have is that if you run an analysis today, the infrastructure and the software releases will change so much that it's not guaranteed you can reproduce this analysis in three years. This is quite important for physicists, because if you publish a paper and someone sees that paper in five years, they might want to redo the analysis with some different parameters, and if you have a way to also publish the recipe for reproducing the analysis, that makes it very powerful. This is why we have a portal called CERN Analysis Preservation. It's an effort that builds a lot on container technologies to achieve these goals. What we call a reusable analysis has three pieces: the workflows you want to use, meaning the different steps of the analysis; the software you need, and this is where the Docker images are really key; and the data you want to access. In this portal you just publish all these pieces, and then anyone with access to the portal can go there and say, I want to redo this analysis, and they plug into these clusters that we are providing and can get the results without having to do any custom setup. This is very powerful.

These engines are a work in progress, but in this case it would be a workflow engine, and for this case we are using a Kubernetes cluster; I think the size of the cluster is something like a couple hundred, less than a thousand cores, but it already shows the potential of the system quite well. In the engine you will have multiple levels and different steps on each level, and these can execute in parallel; we've tried running up to 500 jobs in parallel at the same time, and each of these steps runs in a Kubernetes container, or pod. For this we exploit a feature in Kubernetes that is less known than most of the others, which is the Job. A Job is a very useful abstraction if you want to define a very specific task and you don't really care about all the details of pods and the underlying infrastructure: you just abstract what you want to do in a Job, you say run this Job, and that's it. So what we do is break the workflow into multiple Jobs and then throw them at Kubernetes, and we expect Kubernetes to handle them; we don't have to care about retries and things like that. Then, because these Jobs are independent in themselves, but they have input data and produce output data that is used by other Jobs, we need a distributed file system. Again, because we have OpenStack, it was very easy to get Manila working; there's a talk tomorrow about the details of how we did this, and we use CephFS as a backend. So this is the system we are using to store the intermediate data. We are scaling this too, and I think the cluster right now has something like 200 nodes, so it's quite large already.
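To make the Job idea concrete, here is a minimal sketch of such a manifest; the image and names are hypothetical, not the portal's actual ones:

```bash
# A minimal Kubernetes Job: run one analysis step to completion and
# let Kubernetes handle scheduling and retries.
kubectl create -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: analysis-step-1
spec:
  template:
    spec:
      containers:
      - name: step
        image: myexperiment/analysis:v1   # hypothetical image
        command: ["./run-step", "--input", "/data/in", "--output", "/data/out"]
      restartPolicy: OnFailure
EOF
```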
Now, this is a visualization of a workflow; it's a very small workflow, but you can see there are different steps. The reason I wanted to show this picture is that another benefit you get is tracking the execution of these workflows. For many years we developed our own software for this. With Kubernetes, all you have to do is attach to the running jobs and aggregate the logs, and you get the visualization at the bottom of the page of everything that is happening. The person developing this portal is a physicist, and the change from the previous systems he had worked on was quite dramatic.

There are a couple of things missing in the whole setup. One is prioritization: right now in Kubernetes you have the notion of a Job, but I can't really say that one Job is more important than another, and this is quite relevant for us because you will have different types of jobs; a reconstruction job should run before an analysis job, for example. So this is something we'll be trying to work on with upstream Kubernetes. Then there's an improved Job abstraction: the Job abstraction has some characteristics we don't always want. It will always retry until it succeeds, and sometimes we just want it to give up. But Kubernetes has quite a lot of flexibility in implementing your own controllers, so that's what we did for now: we built on the existing Job abstraction and wrote our own controller to deal with the specifics we need. I'm pretty sure things like this will just start appearing upstream in the future.

And the last one I mentioned is interactive analysis. Jupyter notebooks are another super popular service at CERN, because you don't need any setup at all: you just need a browser, and then you do your analysis, usually in either C++ or Python, against the data. Again, if you back the Jupyter instance with a container cluster that has all the integration already done, because we deployed it with Magnum, it makes your life really, really easy. In this case we actually do more: in addition to running the Jupyter notebook itself in a container, which is specific to that user, we also make some of the steps in the analysis run in a remote cluster. So if you have one step of your analysis that needs more than one container, you can send it to a Spark cluster, and again that Spark cluster can be deployed using Magnum and DC/OS, for example. All this integration we are still working on, but we already have users trying it.

I think that's it for use cases, so I'll just finish with an overview of where we are. This Magnum service has been running at CERN for a few months now, and in a couple of months we got 80 clusters deployed, some of them with more than 100 nodes already. It's interesting to see the distribution of usage of COEs; this is a question we get a lot, which one is the most popular. Our goal is really not to choose, it's to give users the option to choose themselves. But I checked the numbers yesterday, and we have something like 40 Kubernetes clusters and 20 Swarm clusters, where the main use case for the Swarm clusters is the continuous integration with GitLab, and then we have 5 Mesos and DC/OS clusters. The Mesos driver in Magnum had a lot of features missing, so that's why it wasn't very popular, but recently we added DC/OS; I think it's in review, but we've actually already deployed it at CERN, and being better integrated should make it more popular.

One missing feature that we really need is upgrades. This is coming in Pike. It wasn't an issue until now, because most of these clusters are quite new, but it became an issue when people started saying, I need Kubernetes 1.6 and I need it tomorrow. So this will come very soon. And then we have a few new use cases that we're sure will come. We've been following the activities at CERN regarding machine learning, and there they need massively parallel execution with clusters and GPUs, so this is something we'll have to make sure we can offer in Magnum too.
And the second one is federated batch clusters. We have a very large batch cluster running at CERN, and we have other batch clusters at other sites, but one use case we have is spikes: before conferences people get very excited and run a lot of analyses, so we need to be able to grow the size of these clusters very quickly. If we could do something like have a batch cluster running at CERN, then temporarily deploy a similar cluster in a public cloud and just federate them, that would be amazing. There's work going on in Kubernetes to allow this, so this is something we'll be looking at as well. And I think that's it. Thanks for listening, and we'll take some questions now.

Ricardo, while the audience is finding the microphones, I have a question. You mentioned the absence of priority queuing in Kubernetes as a potential drawback for that application. Did you consider using a separate Magnum cluster, also running Kubernetes, for those jobs? Or why did you want them all to be in the same cluster together?

Because people are used to this. All our batch clusters have these queuing mechanisms, and we could separate them, but you would still need to do the scheduling at some point in the application layer. If you split them into multiple clusters, you still have to have someone deciding, okay, this fast one should be redirected there, this slow one over there, and it probably doesn't give optimal usage of the clusters either, because if you have only low-priority jobs, the other clusters might not be so busy. The ideal thing would be just to merge everything.

Hi gentlemen, thanks for being here. Scott Fulton, from The New Stack. Ricardo, please, I want to start by asking you to pass on to your colleagues at CERN my congratulations, and those of my colleagues, for what I believe will be confirmed as the first great subatomic discovery of the 21st century, the confirmation of the Higgs boson. I think the meaning of that is not quite understood yet, but I believe it's brilliant. I followed along watching online, watching the celebration, and I think what CERN has done is magnificent. It's my understanding that the data centers CERN uses, which it has outsourced, are distributed all around Europe. I don't know that there are really any in Switzerland, but I know there are several in France, some in Germany, and I think some in Poland. Do you have an understanding of the physical locations of those data centers, and whether any of that plays a factor in terms of latency or performance in how you are able to create OpenStack clusters?

So, I'll try to answer. We actually have a large infrastructure: we started developing a technology called grid computing a couple of decades ago, I guess at the end of the 90s, and this is still the main way we do physics analysis. It's a large distributed computing infrastructure with a couple hundred sites all over the world, not only in Europe, and we have very fast links between them. Traditionally this is the way we've been doing things. When a physicist submits a job, he doesn't really care where it ends up, but we developed systems that know where the data is and what the availability of resources is at all the appropriate sites. They are not all OpenStack, of course; they've been deployed over many, many years. But in the end we do have knowledge of the physical location of all the sites, and especially of the usage of all these resources, so we can schedule the execution of the jobs to the appropriate sites.
What we do is one of two things, similar to what we would do in a local cluster: we either ship the job to where the data is, or, if that site is very busy, we create a new replica of the data at another site and ship the execution of the job there. So it's a trade-off between time spent moving data around and executing the job faster.

And if I may follow up: you mentioned that CERN has had to do a few things on its own, because of the way Kubernetes handles jobs, for instance. What does CERN then contribute back to the community that we might see further upstream?

Right. So regarding Magnum, we've been working a lot with upstream. In a collaboration with Adrian and Rackspace and CERN openlab, we started this work upstream in Magnum, and we actually have a person at CERN, from this collaboration, who became a core developer in Magnum and has been contributing all the features that we require internally. Then there are also side projects that we've been involved in because we require them. For integration with OpenStack Cinder, we've contributed to libStorage, which is the library Swarm is using to provide Cinder support, and the same for Kubernetes, where we've done a couple of patches on the OpenStack driver for Kubernetes. So we've been working there, and we are reaching out to these communities to work further with them. That worked really well for OpenStack, and we count on doing the same for other projects.

I think we have time for just one more question.

Yep, I have a question about storage. You mentioned EOS is your main storage, and you also mentioned Ceph, and I understand you have a big Ceph cluster as well. Would you mind explaining how those two storage systems are used together or differently, and also the deployment and management, is it done somewhat together or very separately?

Right. So the usage right now is that EOS is our main storage system for physics data: everything that comes from the detectors, raw data or reconstructed data, we store in EOS. This is a massively parallel storage system, but it has very specific characteristics that allow us to scale to hundreds of petabytes. Ceph we use for Cinder, for block storage. It has something like a couple of petabytes, I believe 5 or 10 petabytes, and we use it for the volumes of the VMs, to back the VMs' block storage. And then, because we have Ceph already and it's fully integrated with OpenStack via Cinder, we started looking at CephFS and the Manila integration, because we have some use cases for distributed file systems as filers. So those are the use cases for CephFS. All these systems are within the same group at CERN, and they are managed in the same way, in the sense that they are managed the same way any other service at CERN is managed: deployed with Puppet, with a common monitoring system, and all the tools we have are built using the same stack, so that any service manager can move around between services.

I find it amusing that you just casually mentioned 10 petabytes. So I think we wrap it up. Thank you very much.