Thanks so much for coming today to our talk. I know there are a lot of good talks competing with us, some of them just across the hall, that are also related to Kubernetes. Today we're going to talk about a native Kubernetes operator tailored for Cloud Foundry. So let's get started. I'm Troy Topnik, from SUSE, where I'm the product manager of SUSE Cloud Application Platform. And this is...

I'm Enrique Encalada, a software developer at IBM Germany, and also part of the CF Containerization team. And I'll say a few things later about how this all got started.

This is what we're going to talk about today: how we got to where we are with this team and this project; a little bit about the CF Containerization team, who they are, and who contributes to the project; CF and Kube, how they work together right now in existing distributions, how we want them to work in the future, and how we want that to end up in a really good state upstream. Then Enrique is going to take over and talk in depth about the CF operator and how it actually works, along with some of the good things we've found about this approach and some of the difficulties we've hit.

I've been working in this space for a long time, even with some of the people in this room, who got started trying to containerize Cloud Foundry a number of years ago, back in 2015. We always thought this would be a good idea, and it was always a question of which underlying technology was ready to receive a complex workload like Cloud Foundry. I was with a team at HPE that developed a tool called Fissile, and its original mandate was to create container images out of BOSH releases and stemcells, container images for all of the roles in Cloud Foundry, and run them on some container scheduler, any container scheduler. That project resulted in a distribution that ran on an abstraction layer on top of Kubernetes. The original intent was to develop something that would run on Apache Mesos, on Docker Swarm, on any container scheduler. We only ever got as far as Kubernetes, and when some of that team moved to SUSE, the open source company, the mandate was to always use open APIs wherever they're available. Kubernetes by this time was the clear leader in container scheduling, so we went with the Kubernetes API, and of course all the work was open source from then on.

SCF is the repo where a lot of this action has been happening until now, and that's where we met with IBM, who started taking a look at it. It's short for SUSE Cloud Foundry, though that's not a term we use that often; usually just SCF. SCF is the repo that contains the results of things that go through Fissile and configgin: Fissile creates the Docker images, configgin creates the config for Kube, and SCF is the repository that ties all that stuff together. And there are a couple of production releases using this now. As we started to develop it, IBM was working with us on it as well. So now we have Cloud Application Platform, which is my product, and Cloud Foundry Enterprise Environment from IBM. Theirs is a hosted Cloud Foundry environment, and ours is a software distribution. They basically do the same thing, except one is managed for you.
They deploy containers via Helm charts, which are templatized Kube YAML, to Kubernetes, and they run the containers for the applications and for the control plane of Cloud Foundry itself in Kubernetes. With the current releases this is done with the Diego scheduler; with future releases, or current releases in tech preview, it's done with Eirini. So it's not only the application workloads running on Kubernetes, which is Eirini, but also the control plane. That's where we are right now.

The upstream project we're talking about today resulted from the desire of a few of the Cloud Foundry partners to make this an official upstream project. The work had gone on internally at SUSE, and previously at HPE, along this approach, and it was a little bit siloed. When we had discussions at CF Summit in Basel two years ago with people from SAP and IBM, we decided: no, let's do this right, and let's do it in a way where people in the wider Cloud Foundry community can get involved and take interest, and all of the partners can see what we're doing and have input into the design. So a gentleman in here named Bernd, who did the tough work of drafting the first incubation proposal with help from a few of us, put together a proposal that was going to start a brand new project. Start over from scratch, do it right, build a fantastic containerized Kubernetes distribution, and tie it in with BOSH in a beautiful way. We noticed at SUSE that the direction this was going matched our roadmap of things we had to fix to satisfy customer needs; the design we were creating in the incubation proposal was very much our end state. So we shifted our own roadmap, revised the incubation proposal, and submitted it. It was accepted by the BOSH PMC, and that's where the project lives right now. I was not at the team meeting in Nuremberg. Do you want to take this one?

The team meeting in Nuremberg, Germany took place last year, and it was basically a meeting where the developers from SUSE and IBM defined the next evolution of the project. That's how we'll be doing this transition to the CF operator, which is what today's talk is about: the implementation of the CF operator.

Cool. As I mentioned, it's a BOSH incubation project, with SUSE and IBM developers right now. I've had good information that others are going to join, and I'm very, very happy about that. It is a very remotely distributed team, across a lot of different time zones, and that's one of the reasons it has a looser pairing model. It's not like most of the other BOSH projects or a lot of the other CF projects: we have a contribution model that allows for individual contributions, so we have slightly more flexibility there. We've got people all over the world, and I have to remember all these flags now. Can you help me? Sure: Romania, Germany, China, Canada, India, the United Kingdom, and Holland. Yes, and all good Go users.

So what we want to deliver for people is the Cloud Foundry promise on Kubernetes. In my role as product manager I've been talking more and more to a community who think they already know all of the lessons that Cloud Foundry has painfully learned over the last several years. This is what we want to deliver to our users: here is my source code, run it for me in the cloud, I do not care how. Kubernetes doesn't really have this yet.
It's aimed at operators, and the mindset for deploying things is that you've already done the packaging, you've already done the build, you've already created your artifact, and you probably have some knowledge of exactly how you want to run it. I have a corresponding haiku for this: here are containers, run them exactly like this, because I care how. That's an operator's perspective on it. And we need to find a way for these two groups of people to be satisfied with the new technology we have, this fantastic scheduler that's come onto the marketplace, while still exposing it to users.

So what we have are our two distributions, and this is how it works right now. We put them through our own separate build processes. We take the BOSH releases from upstream, we create container images for each of those components, and using configgin we create the Kube YAML config for it, which is templated into Helm charts. Then we fire it off to Kubernetes, where Kubernetes basically takes over the task of managing the control plane workload for Cloud Foundry and the applications. I call this fire and forget. There's a little bit more to it than that; we have some special pods that do a little bit of extra management. But we're leaning on Helm, the sort-of package manager for Kubernetes, quite a lot, maybe a little bit beyond what Helm was meant to do.

And when we look at what's in BOSH, we see some things that can happen at operational time that we would like to adopt into the containerized version. This transition to the CF operator is intended to leverage all of the lessons we've already learned with the current offerings, CAP and CFEE. One of the two main things we want to achieve with the CF operator is full BOSH compatibility. If you look at the technology now, everything happens at build time; modifying things at runtime, the way you modify a BOSH manifest, is a little more complicated. So when we say we want full BOSH compatibility, we mean allowing the operators of the Kube cluster to modify things on the fly and persist those changes at runtime. Also, if you look at the flow of the current offerings, what we do is build and deploy, and if you want to apply some sort of change, you call Helm and then you build and deploy again. We want to add a third step to that flow, which is lifecycle management of the deployment, and the way to achieve that is through the CF operator. The point here is that an operator is a Kubernetes concept that lets us add whatever logic we need to the cluster, based on our requirements. In this case the requirement is to be more BOSH compatible, and that is why we're developing the CF operator.

Before we jump into the CF operator implementation, I think it's important to lay out some of the basic concepts of Kubernetes. In a nutshell, Kubernetes has an API server that writes the desired state of resources into storage; in this case the storage is etcd. You can think of a resource, or object, as, for example, the StatefulSets and Deployments you have in Kubernetes; each of those is a definition of a resource.
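To make that concrete, here is a minimal sketch of a StatefulSet; the names and image are illustrative (echoing the Doppler demo coming up), and the spec block is the desired state the API server records in etcd:

```yaml
# A minimal StatefulSet: the spec is the desired state that Kubernetes
# stores in etcd and that a controller continuously tries to realize.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: doppler            # illustrative name, echoing the demo below
  namespace: cf
spec:
  replicas: 2              # desired number of pods
  serviceName: doppler
  selector:
    matchLabels:
      app: doppler
  template:
    metadata:
      labels:
        app: doppler
    spec:
      containers:
      - name: doppler
        image: example/doppler:latest   # illustrative image
```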
Then you have the controllers. The controllers are components that run asynchronously in the cluster, and they try to match the desired state of an object or resource in storage with the actual state of the cluster. For example, if you have a StatefulSet where you define that you want two pods, two replicas of the object, and you modify that StatefulSet object, then the controller reconciles the desired state: if you go from two replicas to one, the cluster ends up going from two pods to one.

I can show a small demo of this, so you can keep in mind the idea of the controller and the resources. Here we have a namespace, cf, with a specific set of pods for Doppler. If I want to modify the StatefulSet definition of those pods, I can just go to the replicas and change it from four to two. What I'm trying to showcase is how the StatefulSet controller reconciles the desired state I've declared for my cluster, and you'll see in the terminal that it deletes pods, moving from four to two. That's the idea of controllers, operators, and resources in the cluster.

Going back to the presentation: the controller itself is the brain behind the resources. Controllers follow the pattern of: I will read an object, I will do some things, and then I will update that object. The nice thing about controllers is that they are declarative. That means that you, as an end user or as an operator of the cluster, define how you want the resource to look. For example, you can run kubectl apply -f with a file in which you define the whole object, and every time that object changes, the controller reconciles the desired state.

Resources are kind of the secret sauce of the CF operator, because through resources we define the logic we want to have in our cluster. By default, Kubernetes ships with standard resources like StatefulSets, Deployments, and Jobs, but for the CF operator we use customizations of those resources to add the functionality we need. For example, once we have a BOSH manifest, we want to convert parts of that manifest, specific blocks like the instance groups, into a set of specific custom resource definitions. I will explain that in the next slides.

At this point, automation is key. As Troy mentioned, with the CF operator we're getting rid of the need to call Helm every time we want to do an update. The advantage is that the CF operator triggers those updates for us automatically; this is why automation is key, and it's one of the main features we're bringing to the table with the CF operator.

So this is the CF operator logo, and this is the main picture illustrating the whole flow of the operator. As we mentioned, the idea is for the CF operator to be fully BOSH compatible, giving operator users the same way of defining how they want their Cloud Foundry to look in the cluster. The flow is: you define a deployment manifest, as you know how to do. You can even define ops files that are applied on top of that BOSH deployment, and at the end you get your desired manifest.
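As a rough illustration of that mechanism, a BOSH ops file is just a list of patch operations applied to the manifest before the desired manifest is computed. This minimal sketch is the same shape the demo uses later to scale NATS:

```yaml
# An illustrative BOSH ops file: each entry patches one path in the
# deployment manifest. This one raises the "nats" instance group from
# its manifest value to two instances.
- type: replace
  path: /instance_groups/name=nats/instances
  value: 2
```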
Then the operator reads that object, which is the BOSH manifest, and converts pieces of it into the specific set of custom resource definitions we have. We have the extended stateful set, which is very similar to the Kube-native StatefulSet, with specific differences. The extended job is the same situation: really similar to the Job, with specific differences. And the same goes for the extended secret. I'll explain each of them in detail now. This conversion, this evolution of the manifest into Kube resources, ends with your whole set of Cloud Foundry components running as containers, as pods, in your Kube namespace, for example.

So it's important to understand the three main custom resource definitions we're using for the CF operator.

The ExtendedStatefulSet is basically a StatefulSet in Kube, with the difference that it reacts to changes applied to ConfigMaps or Secrets. The idea is that when you define an instance group with two instances in your manifest, that generates a StatefulSet with two pods, and those pods have Secrets and ConfigMaps mounted. You end up with a Secret, for example, if you define properties in that instance group that reference variables in the BOSH manifest. So whenever someone modifies one of those Secrets, the ExtendedStatefulSet reacts to the change and generates a new version of itself. And that's just one feature; we have several. ExtendedStatefulSets support versioning, which allows Cloud Foundry components to be versioned, and that opens up the possibility of doing canary or blue-green upgrades, something that is not so easy to do with plain Kubernetes. In the future we will have AZ support. The main point is that the ExtendedStatefulSet always reacts to changes.

The second custom resource definition we manage is the ExtendedJob, and as I said, it's pretty similar to the native Job in Kubernetes. The difference is that this job persists its output. For example, we trigger this type of job whenever we want to render the templates of a release's jobs in an instance group. We gather the data, render the templates, and process the BPM files, and at the end the final YAML or JSON is written to standard output and made available to the pod running that instance group, so that the pod has its standard configuration files; the files under /var/vcap/jobs are the kind of files I'm talking about. These ExtendedJobs can also behave as errands, the same concept as in BOSH, including auto-errands.

The last one is the ExtendedSecret. This is also a really nice custom resource definition: it allows us to generate passwords, certificates, or SSH keys, depending on what type of secrets you define in your BOSH manifest.

This diagram illustrates what happens, in terms of the CF operator's logic, when you define a manifest and ops files. In a nutshell, the operator applies the ops files on top of the manifest, which is stored in a Secret, and at the same time the operator gathers all the variables and stores them in the ExtendedSecret resources that we define.
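As a sketch of what one of those generated-variable resources might look like (the API group and field names here are assumptions for illustration, not the project's published schema), an ExtendedSecret asking the operator to generate a password could be written roughly like this:

```yaml
# A rough sketch of an ExtendedSecret: it asks the operator to generate
# a password and store it in an ordinary Kubernetes Secret.
# The apiVersion and field names are assumptions, not a published schema.
apiVersion: fissile.cloudfoundry.org/v1alpha1
kind: ExtendedSecret
metadata:
  name: nats-password
  namespace: cfsummit
spec:
  type: password            # per the talk, certificate and SSH key types also exist
  secretName: nats-password # assumed field: the name of the Secret to create
```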
Then it takes that earlier Secret, where it stored the rendered manifest, and applies on top of it all the variables held by those ExtendedSecrets, so that at the end you finish with the desired manifest. This illustrates the flow of the first job the operator does with the BOSH manifest that you, as an operator, provide.

This slide is just to illustrate the complexity of rendering BOSH templates. Of course, this is something BOSH already does for us for free, but for the CF operator we need to implement it ourselves. It's not easy: working with BPM files and processing spec files is a little tricky, especially because of the need for Ruby when we're building a binary that is mainly, or rather purely, implemented in Golang.

This slide is for you to really grasp the idea of the conversion from a BOSH manifest into Kube resources. The idea is to draw squares around the blocks of a BOSH manifest so you can see what type of Kube resources they are going to be turned into. For example, the releases block is used to work out what Docker image my pods need to pull. The instance groups are converted into ExtendedStatefulSets. The properties inside the jobs are used to figure out how to render the spec files of the release. And the whole variables block is turned into ad hoc ExtendedSecrets.

So I will now show an end-to-end demo in which I'd like to showcase all of the concepts and implementations I just mentioned. The first thing to notice is that my CF operator, in the left screen, is already running. The second thing is that I'd like to show what type of files I will apply to my cluster, to define the state of the objects that I want my CF operator to reconcile any time I make changes. You can see that this is the goal we are aiming for: you, as an operator, provide this manifest in a ConfigMap, in the same way you do with BOSH, because this is basically a BOSH manifest. You'll see we have the same blocks we use in a BOSH manifest: releases, and instance groups, here with a single instance group with an instance count of one, and under it a single job, NATS, from a NATS release. Pay attention to the properties, where we reference the password for NATS through one of the variables. The second thing is that, because we are fully BOSH compatible, we can apply an ops file on top of that BOSH manifest, where we say that I don't want just one NATS instance, I actually want two. And the last thing is that we tell the operator that we have these ConfigMaps somewhere in our cluster, and it will process them to generate the desired manifest; that part is the diagram I explained before, about how the desired manifest gets rendered.

So the first thing to do here is to watch for resources in my cfsummit namespace; at the moment there is nothing, because I haven't applied any of the changes. Next I apply this deployment manifest, which is basically the BOSH manifest, into my cfsummit namespace, as the ConfigMap I want the operator to read it from, right?
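Reconstructed from that description, the demo's inputs look roughly like this: a small BOSH manifest with a single NATS instance group (the release version and property names are illustrative), plus the custom resource pointing the operator at the ConfigMaps. The BOSHDeployment API group, field names, and ConfigMap names are assumptions for illustration; the ops file is the one sketched earlier:

```yaml
# File 1 (stored in a ConfigMap): a BOSH manifest with one NATS
# instance group; values are illustrative.
name: nats-deployment
releases:
- name: nats
  version: "26"                       # assumed release version
instance_groups:
- name: nats
  instances: 1                        # the ops file raises this to 2
  jobs:
  - name: nats
    release: nats
    properties:
      nats:
        user: admin                   # illustrative property
        password: ((nats_password))   # resolved via an ExtendedSecret
variables:
- name: nats_password
  type: password
---
# File 2: the custom resource telling the operator which ConfigMaps
# hold the manifest and the ops file (apiVersion and field names assumed).
apiVersion: fissile.cloudfoundry.org/v1alpha1
kind: BOSHDeployment
metadata:
  name: nats-deployment
  namespace: cfsummit
spec:
  manifest:
    type: configmap
    ref: nats-manifest                # assumed ConfigMap name
  ops:
  - type: configmap
    ref: nats-ops                     # assumed ConfigMap name
```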
The second thing is that I apply, in the same way, another ConfigMap, but this time generated from the ops file, the one I have here, where I'd like to have two instances of NATS instead of one, right? So I trigger that. And the last thing I want to do is apply the BOSHDeployment resource, which is this command here. What happens now is that the operator picks up the desired state I want the cluster to have, renders the manifest, and realizes that it now needs to generate two pods of NATS, because I'm telling it I don't want one, I want two. It also triggers a set of jobs to process the information needed to bring up those NATS pods. For example, you'll see here, for a specific period of time, a job that gathers all the template information, and also a job to generate the secrets, because we defined a reference to a variable for the NATS password. So now I run that, and you can see we get variable interpolation, we have a job gathering the data of the manifest, and now we have a pod initializing. And if you recall, we applied the ops file to have two instances rather than one, so we finish here with those two instances.

Now you can imagine that you have, in theory, the whole BOSH manifest of a couple of hundred lines, where you define all the CF components, and the flow would be basically the same: the operator generates the desired state of the Cloud Foundry deployment in the cluster, and every time you make a modification, the operator tries to reconcile that change and apply it to your cluster. You can see here, for example, that I can go into that NATS pod, and if you're familiar with BOSH and Cloud Foundry, you'll see a very similar setup to the one we're used to for all the components we run: the same type of directory layout, where I can cat a config file for NATS, and you'll see the same rendered template that the NATS component needs in order to run.

All right, that was the end-to-end demo, and I'll move back to the presentation. We found some problems as we worked our way through this. We're having trouble dealing with availability zones right now. We need more feedback and more people using it; the stuff we've just shown is very, very new, and it landed in a working state, I believe, last week, just in time for the show. We also need to think about integrating with technologies like Istio: we have this nice setup for doing different kinds of upgrades, but we know the market also has Istio, which could leverage that setup even further.

On the highlights side, there is a lot we could still do with the CF operator. We could use the concept of mutating webhooks in Kubernetes to leverage Eirini, for example. In a nutshell, we could add some sort of add-ons where we label Eirini application pods as accessible via SSH: with the webhooks you can know whether an application wants to be accessible via SSH, and then you can put a sidecar in the same pod, something like that. Also, through Fissile we gathered a good set of lessons learned that we're now applying to the CF operator implementation. It would also be nice to migrate some of the features we're implementing back to the Kubernetes community. And one thing is that the speed of the project has been pretty good: we have been working together for less than a
year, and I think we have achieved a really good milestone so far. And this is how to find us: that's where we are on the Cloud Foundry Slack, and you can talk to any of us there. We didn't really leave time for a lot of questions, but we'll try to get a couple in. You can also see Bernd's original proposal, our revised proposal, and how to get involved with the project. Thanks very much.

I have a question... We did look at this. The question is, did we look at the CPI route. Sandy Cash and I did a presentation in Basel about why that really didn't do what we needed to do, and he did one even the year before that. This approach is sort of meeting halfway. The IBM team had explored that route, had then come around to using Fissile, and had seen from that perspective, along with us, that this was a good way to go. It also seems that the whole Kubernetes community is moving to operators for really complex workloads, so this seemed to be the best way to bring BOSH directly into an operator. Yes, Dan?

So far, in the way we deploy the components, for example the router, there is already a concept that when you want to roll out an upgrade, it upgrades one by one. I think we use webhooks for things like that, and also for volumes: when you want to do an upgrade, webhooks allow us to make sure that the volume is handed over to the next component or pod. I assume it would work the same way for a rotation of secrets or something like that. The advantage is that we keep all the secrets in Kube, and they can be accessed by different pods at the same time. More questions? Yes?

The question was, roughly, that we also need Eirini in here to run the applications. We don't actually need that right now; we can use either Eirini or Diego, as the IBM and SUSE distros figured out how to run Diego in a pod, so Diego cells can run that way. The goal is to eventually pull in an Eirini BOSH release, include it with the rest, and then application scheduling happens right alongside the control plane running in Kube. I think, given the maturity of Eirini, we could offer users the option now to do either: with Diego, or straight with Eirini, configured in the BOSH manifest.

Yes, I think we're at our end. Thanks so much for coming today.