Hi, I'm Devdatta Kulkarni, founder of CloudArk. Today I'm going to share with you how to build Kubernetes operators that are good citizens of the increasingly multi-operator world. A little bit about me: I've been working in the systems space for a long time. These days, apart from CloudArk, I also teach courses in cloud computing and modern web application development in the computer science department at the University of Texas at Austin.

The Kubernetes operator pattern has become widely popular for running various kinds of applications on Kubernetes. Operators are being built for stateful applications, complex services, internal IT workflows, and so on. A Kubernetes operator adds custom resources to a cluster, so the set of resources on any cluster depends on which operators have been installed on it. For instance, here we have a MySQL operator and a custom application operator installed on the cluster, and they add the MySQL cluster and AppCR custom resources to it. The ensemble of Kubernetes built-in and custom resources that are deployed together to serve a specific workload can be thought of as an application stack.

Increasingly, we are seeing a lot of DevOps teams use multiple Kubernetes operators to build their custom PaaSes. This makes perfect sense, as operator technology provides the right building blocks, through custom resource instances, to deliver the multi-tenancy that any PaaS requires. Here are two examples of such PaaSes that we have encountered in our work with different customers. On the left-hand side is a Moodle PaaS built to deliver an e-learning solution on Kubernetes. It uses two operators: Moodle and MySQL. The Moodle operator is developed in-house to meet Moodle's specific workload requirements. Moodle instances depend on MySQL instances, which are delivered by the MySQL community operator. In this way, the PaaS serves multiple Moodle application stacks to different customers.
On the right-hand side is a PaaS developed to deliver browser instances to different application developers for their testing. It uses a browser-as-a-service operator developed by an internal DevOps team, and for monitoring purposes it uses the Prometheus operator.

Custom PaaSes built on Kubernetes are multi-tenant and multi-operator environments. So, given that an operator may need to run alongside other operators in a cluster, the important question operator developers need to ask today is whether their operator is ready for this multi-tenant, multi-operator world. To help operator developers answer this question, we have developed the Kubernetes operator maturity model. It consists of four categories: consumability, security, robustness, and portability, with guidelines in each category. In today's talk, I am going to focus on four specific problems and the guidelines that address them: how to enable atomic deployment of application stacks, how to enable co-location of stack components, how to make application stacks robust against pod restarts, and how to perform accurate chargebacks per application stack. Check out the comprehensive list of guidelines on our GitHub page to learn more about these categories.

To discuss these questions, I am going to use the example of the Moodle PaaS, which I referred to earlier. Moodle is e-learning software. The stack contains several custom resources, but for the sake of discussion I am going to focus on just two of them: Moodle and MySQL cluster. The Moodle operator handles the Moodle custom resource and creates a pod running the Moodle software. The MySQL operator handles the MySQL cluster custom resource and creates a pod for MySQL. The Moodle pod then uses the MySQL pod for its database needs.
The first problem we want to discuss is atomic deployment. As a PaaS provider, we need to ensure that an application stack is created atomically. What do we mean by atomicity in the context of application stacks, especially stacks that consist of an ensemble of Kubernetes resources, some of which may be custom resources coming from different operators? Atomicity here means avoiding situations where only some pods of an application stack are deployed and not others. For example, we want to avoid the situation where the Moodle pod has been deployed but the MySQL pod has failed to get scheduled, or vice versa, where the MySQL pod gets scheduled but the Moodle pod fails to get scheduled on any worker node.

So the question is: how can such atomicity be guaranteed, especially in a multi-operator environment where some operators come from the community and some you might have developed on your own? And the second question is: what can operator developers do to ensure that the pods deployed as part of their custom resources get guaranteed scheduling from the Kubernetes scheduler? One way to achieve guaranteed scheduling in Kubernetes is to define resource requests and resource limits for all the containers defined in the pod spec. But when we are working with custom resources, the pod spec is not directly visible: the operator handles the custom resource and, as part of doing so, creates the pods, and it may not even create a pod directly but rather a deployment or a stateful set.
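To make the built-in mechanism concrete, here is a minimal pod spec sketch (the names and values are illustrative, not from the Moodle stack) in which every container declares both requests and limits. When requests equal limits for both CPU and memory, the pod falls into Kubernetes' Guaranteed QoS class, and the scheduler will only place it on a node with sufficient allocatable capacity:

```yaml
# Illustrative pod spec: every container declares resource requests and limits.
# Setting requests equal to limits gives the pod the "Guaranteed" QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: moodle-example        # hypothetical name
spec:
  containers:
  - name: moodle
    image: moodle:latest      # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```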
So what can the operator developer do to leverage the built-in Kubernetes mechanisms of resource requests and resource limits? If all the containers in a pod spec are given resource requests and limits, the Kubernetes scheduler guarantees that the pod will be scheduled on some worker node with adequate capacity. Our guideline for operator developers is: provide spec properties in your custom resource through which resource requests and limits can be specified, and implement your custom controller, which is part of your operator, to pass these scheduling hints down to the pods it creates when instantiating a custom resource instance. For example, in the case of the Moodle custom resource, the resource requests and limits from the Moodle custom resource instance would be passed down to the pod that the Moodle operator creates for that Moodle instance. Once you do that, Kubernetes will provide the guarantee of scheduling that pod on some worker node with capacity.

Now, it may happen that you are not developing the operator yourself but using one from the community, and you do not have its source code to modify the custom controller. What can you do in that situation? Our guideline is that one option you can explore is an admission controller, maybe something like OPA, to intercept such pods; at that point you can change the spec to add these scheduling hints to the containers defined in that pod. The second problem we want to look at is colocation.
So what do we mean by colocation in this context? Colocation means that we would like to run all the pods of an application stack on the same worker node. You may need this for various reasons. For example, you may want to provide differentiated service to different customers of your PaaS, and for that reason dedicate certain worker nodes to running complete stacks, with all the pods of a stack on one node. Another reason could be that when you are rotating certain worker nodes out of your cluster, you want to know which customers will be affected; if all the pods of a stack run on one worker node, it is easy to tell which customers that node's removal affects. In the Moodle example, what we want to avoid is the split situation where the Moodle pod is running on worker node 1 and the MySQL pod is running on worker node 2.

So how can you, as an operator developer, enable colocation of the pods your operator creates for your custom resources with, say, pods that some other operator in your cluster might be creating for its custom resources? One way is to enable specification of node selectors as part of your custom resource spec. Kubernetes has the notion of affinity: pod-to-node affinity as well as affinity between pods. With node selection, a node may carry some labels, and if your pod spec specifies a node selector matching one of those labels, the Kubernetes scheduler will schedule that pod on that particular node. That capability is available directly in Kubernetes.
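A minimal sketch of that built-in mechanism, with hypothetical label and resource names: label a node, then reference the label from the pod spec.

```yaml
# Assuming the node was labeled beforehand, e.g.:
#   kubectl label node worker-1 stack=customer-a
# A pod with a matching nodeSelector will be scheduled onto that node:
apiVersion: v1
kind: Pod
metadata:
  name: moodle-example          # hypothetical name
spec:
  nodeSelector:
    stack: customer-a           # must match a label on the target node
  containers:
  - name: moodle
    image: moodle:latest        # placeholder image
```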
So, as an operator developer, to enable colocation of your custom resource's pods with other pods, you need to expose that node selection capability through your custom resource spec. If you design your custom resource spec to allow specifying a node selector, and implement your custom controller to pass that node selector down to the pods it creates, then it becomes possible to co-locate your pods with other pods, whether those come from another custom resource or from a directly created deployment. The ability to specify a node selector, combined with passing that label down to the pod, is what enables colocation.

The third problem we are going to look at is pod restarts, and how to make an application stack robust against them. An example of a pod restart is when a sidecar container is injected into a pod spec: any time you modify a pod spec, the pod gets restarted. Where do we actually see such sidecar injections happening? There are community operators that modify pod specs; an example is the volume backup operators. Some of the ones we have looked at, in order to take a backup of a pod's mounted volume, add a sidecar container to the pod spec; that sidecar container has access to the volume and can take the backup that way. To do this, once a pod is handed to such an operator, the operator modifies the pod spec and injects the sidecar. In our example, we wanted to back up the volume of the pod that the Moodle operator had created for the Moodle custom resource. When the volume backup operator injected its sidecar into the Moodle pod, it caused the Moodle pod to restart.
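To make the sidecar injection concrete, here is a hypothetical sketch of what the pod spec might look like after injection: the backup sidecar mounts the same volume as the application container (all names and images here are illustrative, not taken from a specific backup operator):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: moodle-example             # hypothetical name
spec:
  containers:
  - name: moodle                   # the original application container
    image: moodle:latest           # placeholder image
    volumeMounts:
    - name: moodle-data
      mountPath: /var/moodledata
  - name: backup-sidecar           # injected by the volume backup operator
    image: backup-agent:latest     # placeholder image
    volumeMounts:
    - name: moodle-data            # mounts the same volume as the main container
      mountPath: /data
  volumes:
  - name: moodle-data
    persistentVolumeClaim:
      claimName: moodle-pvc        # hypothetical claim name
```

Because this changes the pod spec, applying it causes the pod to be recreated, which is exactly the restart scenario described above.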
So, as an operator developer, you need to be cognizant of such pod restarts, because after a pod restarts you still have to go back and verify that the application-level invariants that were supposed to hold are in fact still maintained. The way to handle pod restarts is for your operator to subscribe to all events related to any pods it creates, and then verify whether the invariants of the underlying software still hold. For example, in the case of the Moodle stack, the requirement was that certain Moodle plugins had to be installed on a particular stack. Once the Moodle pod restarted, those plugins had to be reinstalled, because after a restart there were situations where they were no longer present. So the operator had to subscribe to these pod restart events and ensure that the plugins were installed again when the pod restarted.

Finally, as a PaaS provider, we are also interested in accurate chargebacks at different levels. We may want to find out the CPU and memory consumption of a particular custom resource, say the MySQL cluster custom resource, or the CPU and memory consumption at the level of an entire application stack, which in this case consists of a Moodle custom resource and a MySQL cluster custom resource, so that we can charge it back accurately to the customer for whom that stack was created. To answer these questions, we need to be able to discover all the resources that are part of an application stack.
For example, as you can see in this picture, the Moodle application stack actually consists of several built-in resources, such as an ingress, a stateful set, a service, a config map, persistent volumes, and persistent volume claims, along with the custom resources cluster issuer, Moodle, and MySQL cluster. To do accurate chargebacks we need this entire graph of resources in a stack. Without the entire graph, and without the ability to find all the pods that belong to a stack, we cannot accurately compute the stack's CPU consumption: ultimately we need to find all the pods that are part of an application stack and then the CPU consumption of the containers in those pods, and the only way to find all those pods is to follow the connections between the different resources that are part of the stack.

As you may be aware, Kubernetes has different mechanisms for establishing inter-resource connections: there are owner references, there are labels and annotations, and there are spec properties. So four different ways exist. What this picture shows is that the application stack not only consists of all these different resources, but that the resources are related to one another through these various relationships, whether owner references, labels, annotations, or spec properties. To discover the entire resource relationship graph, we need to be able to follow these relationships within a stack; that is how we can find all the pods and all the persistent volume claims that are part of a particular stack.
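To illustrate two of those connection mechanisms with hypothetical names: a pod managed by a StatefulSet carries an owner reference pointing at it, and labels that other resources (such as a Service) select on:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-0                    # hypothetical name
  labels:
    app: mysql                     # label-based relationship: selected by a Service
  ownerReferences:                 # owner-based relationship: managed by a StatefulSet
  - apiVersion: apps/v1
    kind: StatefulSet
    name: mysql
    uid: "00000000-0000-0000-0000-000000000000"   # placeholder UID
    controller: true
spec:
  containers:
  - name: mysql
    image: mysql:8.0               # placeholder image
```

Following such edges from resource to resource is what lets a tool reconstruct the full graph of a stack.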
To discover these resource relationships and build the resource relationship graphs, we have developed KubePlus, an open source tool that can discover all the Kubernetes resources on a running cluster, including custom resources and their sub-resources. KubePlus discovers the connectivity and topology of all these resources, and we provide several kubectl plugins that leverage the relationship graph to retrieve the graph and get stack-level aggregated metrics and logs.

How does KubePlus discover these relationships and build the relationship graph? For discovering custom resources and their dependencies in particular, KubePlus needs to be aware of the operator developer's assumptions about custom resources and their dependencies on other resources. KubePlus is a generic tool, so it works with any operator; to capture these operator developer assumptions, we have developed a simple mechanism. It consists of five annotations that need to be put on the custom resource definitions, or CRDs, that are packaged as part of an operator. These annotations offer a simple, declarative way to capture, for example, which sub-resources an operator will create when handling a custom resource, or what kind of label-, annotation-, or spec-property-based relationships can exist between a custom resource and other resources for the operator to do its work. Here I have three examples of using these CRD annotations with the custom resources we have seen previously. The first example shows how to define which sub-resources the MySQL operator will create when an instance of the MySQL cluster custom resource is created.
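As a sketch, such a CRD annotation might look like the following. The annotation key (`resource/composition`) follows the convention used in the KubePlus repository, but the CRD name, group, and sub-resource list here are illustrative; check the project documentation for the exact names your operator should use:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: mysqlclusters.example.com       # hypothetical CRD name and group
  annotations:
    # Declares which sub-resources the operator creates for each
    # MysqlCluster instance, so KubePlus can follow the resource graph.
    resource/composition: "StatefulSet, Service, ConfigMap, Secret"
spec:
  group: example.com
  names:
    kind: MysqlCluster
    plural: mysqlclusters
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
```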
The second example shows which sub-resources the Moodle operator will create when instantiating a Moodle custom resource instance, while the third example shows that the cluster issuer custom resource depends on a specific annotation on the ingress resource to do its work. By the way, the cluster issuer custom resource comes from the cert-manager operator, which helps generate SSL certificates from authorities like Let's Encrypt; the way it works is that you need to specify a particular annotation on the ingress resource, and that dependency is captured through this CRD annotation. There is no other way to capture such operator developer assumptions today, and these annotations enable defining and capturing these assumptions as part of your CRDs. So my appeal to all operator developers is to look at these annotations and add them to your operators. We are also maintaining, on our GitHub page, a list of operators and CRDs to which we have added these annotations. You can take a look at that, and we will be happy to add your operators to the list as well.

Once these resource relationship graphs are discovered using KubePlus, our kubectl plugins enable application stack level chargeback. For example, with the kubectl metrics plugin it is possible to track CPU, memory, and storage usage at the custom resource level or at the application stack level. In this picture, on the left you can see the metrics output for the MySQL cluster custom resource, which consists of two pods, and on the right the output for the entire Moodle application stack, which consists of three pods: one from the Moodle custom resource and two from the MySQL cluster custom resource. These metrics can then be surfaced in Prometheus to get a view into resource consumption at different levels. The screenshot here shows CPU consumption for a MySQL cluster custom resource instance; this can be used to build chargebacks for your PaaS users.
To summarize, we have developed the Platform-as-Code practice, which simplifies building PaaSes using Kubernetes operators. It consists of the operator maturity model, which offers comprehensive guidelines on operator readiness for multi-tenant and multi-operator environments, and the KubePlus tooling, which enables inventory and chargeback for application stacks built using operators. Check out our GitHub page to learn more about these. Thank you. I will take questions now.