We have Yashvardhan Kukreja with us. He'll be talking about writing your own Kubernetes operators in Python with Kopf to enhance and harden your Kubernetes clusters. So, Yashvardhan, the stage is yours.

Sure, thanks a lot. Hi everyone, I'm Yashvardhan, and I hope you've had a great time with all the talks so far. Today I'll be talking about writing your own Kubernetes operators in Python with this framework called Kopf, to enhance and harden our clusters. First, let me introduce myself. I'm currently working as a Site Reliability Engineer at Red Hat. In my free time I love to do open source, working on my favorite cloud native tools. I also love to read about distributed systems and astrophysics, and just to break the technological monotony, I love long-distance cycle rides as well.

Let's quickly talk about what Kubernetes is, because it's the most fundamental piece of tech in this talk. Kubernetes is basically a container orchestrator, and assuming you have a basic idea of Docker containers, I'll keep the lingo in terms of containers only. Kubernetes manages, schedules, and automates a lot of the manual, managerial work around containerized deployments: spinning up your containers reliably, deploying your applications on containers, managing and debugging them, and architecting them in a mature manner, alongside countless other things. From a system administrator's standpoint, it saves a lot of time and effort. It's safe to say that Kubernetes is the most widely adopted container orchestration platform out there, which means it serves a huge number of functionalities and use cases.
But as a business, you might at times have custom requirements which are not natively present in Kubernetes. One of the best things about Kubernetes is that it's very pluggable in nature: you can develop your own add-ons and plugins programmed to perform some custom functionality, and they can be very easily integrated as part of the Kubernetes stack itself. Another great part is that you can even use a third-party plugin or add-on to serve that functionality. These add-ons and plugins are called operators. An operator resembles a human operator in the sense that it keeps watching the cluster for certain events, for example some resource getting created, deleted, or updated, and performs tasks in response. Basically, if you look at a Kubernetes cluster and see that, as a system administrator, you're performing one manual, repetitive task again and again and again, that's an opportunity to write an operator. If that seems slightly complex, don't worry, I'll exemplify it further.

But before moving on, let me quickly talk about custom resource definitions. If you have even a slight idea of Kubernetes, you know that it has a lot of resources, right? Like Pod, Deployment, StatefulSet, ReplicationController, etc. Similarly, you can create your own custom resources as well, which means that if you have a custom resource named MyCustomResource, you can write a YAML with the kind MyCustomResource, and you can actually kubectl get it. But to make your Kubernetes cluster aware of this custom resource, you have to define it first, right? And how do you define it? Well, with a file called a custom resource definition.
So, basically, the custom resource definition resembles the schema of your custom resource, because of course, when you write a YAML for your custom resource, it will have a structure, right? The custom resource definition encapsulates all the information about your custom resource: its name, its structure, and so on. Then your operator, behind the scenes, watches these custom resources, and whenever they are created, updated, or deleted, the logic of what happens is written inside the operator. So the operator is kind of like the back-end side of custom resources, whereas the custom resources are the front-end side facing the user.

Let me give you a very simple example, which is actually what we're going to implement today. On the screen you can see a YAML defined, and if you look carefully, it's a custom resource, not a native Kubernetes resource: the API group is demo.sh.com and the kind is PostgresWriter. What this custom resource is meant to do is that whenever it gets applied, the table field inside it, along with the other values like name, age, and country, are fetched and then written to a remote PostgreSQL database in the provided table. The logic to do so, to parse this custom resource, extract the table, name, age, and country values, create an INSERT query, and write to PostgreSQL, is programmed in the operator. So the operator is the back-end in that sense.

So, how can we write our own operators? Well, in Golang there are very sophisticated and great libraries such as client-go and controller-runtime which can be used to write operators. But they're pretty complicated.
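For reference, the PostgresWriter custom resource described above might look roughly like this. This is a sketch reconstructed from the talk's description; the row values (name, age, country) are illustrative assumptions, since the exact demo values aren't shown:

```yaml
apiVersion: demo.sh.com/v1
kind: PostgresWriter
metadata:
  name: sample-student
  namespace: default
spec:
  table: students        # target table in the remote PostgreSQL database
  name: "John Doe"       # illustrative row values; the talk does not
  age: 23                # spell out the exact ones used in the demo
  country: "India"
```

Applying this YAML is all the user ever does; everything else happens inside the operator.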
To help us even further, there are really amazing projects like Kubebuilder and Operator SDK, which in just one command establish the whole boilerplate code for you. But the problem is that although they bootstrap the project for you, they're still complicated and hectic to deal with, because there are a lot of Go files, packages, and whatnot. So, as usual, Python comes to the rescue. In Python there is a very awesome framework called Kopf, the Kubernetes Operator Pythonic Framework, with which you can write Kubernetes operators from scratch using Python with utter ease. And I mean, there are no extra files: to write a basic operator, you literally need one Python file and one Dockerfile, that's it. That's the beauty. And all of you know how straightforward Python is as a language, so the convenience is top notch.

So, I hope you folks are hyped enough. Let's dive in and code our own operator. As I exemplified with the Postgres writer, that's the one we're going to implement. First of all, let me present how our custom resource is going to look in action. After everything is ready, as a user of this operator and this custom resource, I'll simply write this custom YAML, apply it to my cluster, and behind the scenes some magic will happen: in my Postgres database, this row will be created. I will also program the logic to ensure that whenever this custom resource is deleted from my cluster, the deletion is automatically synchronized to the Postgres database by deleting this row.

So, as a first step, we have to make Kubernetes aware of what this kind PostgresWriter and this group demo.sh.com are. For that, we'll define the CRD. Now, don't get freaked out. It's a very simple CRD, and I'll quickly go over it.
So, first of all, everything has a name, and even a CRD has a name with a format: it has to be your resource name in plural form, dot, the group. Now, what is a group? Let me go back quickly and point out the apiVersion. It is in the format group/version, so the group is demo.sh.com, the version in my case is v1, and the kind is PostgresWriter. Together, group, version, and kind (GVK) act like a unique ID of a resource type in our cluster. In my case, the GVK is demo.sh.com/v1 PostgresWriter. So that's what I'm doing here: the metadata name of the CRD is the plural form of my resource, dot, the group, so postgreswriters.demo.sh.com. Inside the spec, I've defined the group, demo.sh.com, which I just explained, and the version, v1. Your custom resource can have multiple versions depending on the functionality.

Now, inside this version v1, I'm defining the structure of the YAML to expect. If you look carefully, the structure expects a spec section, and inside it there are four fields: table, name, age, and country, each with a data type, right? table has a string data type, age has an integer data type, and so on. All of this is defined carefully inside the schema: you can see that there is a required field spec, and inside it required fields table, name, age, and country with their respective data types. Then, at the custom resource level, I've defined the scope as Namespaced. Basically, this means I can apply this resource in different namespaces. You can also have use cases where you want your custom resource to operate at the cluster-wide level, not at the namespace level.
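Putting the pieces just described together, the CRD might look roughly like this. This is a sketch assembled from the talk's walkthrough, not the speaker's exact file:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgreswriters.demo.sh.com   # <plural>.<group>
spec:
  group: demo.sh.com
  scope: Namespaced
  names:
    plural: postgreswriters
    singular: postgreswriter
    kind: PostgresWriter
    shortNames: ["psw"]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["table", "name", "age", "country"]
              properties:
                table: { type: string }
                name: { type: string }
                age: { type: integer }
                country: { type: string }
```

Once this is applied, the API server will accept and validate PostgresWriter objects against this schema.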
If you are aware of cluster roles and cluster role bindings, they are similar in that sense. But don't worry; in my case, the scope is Namespaced. And finally, I've defined the names of my custom resource: the plural is going to be postgreswriters, the singular is postgreswriter, the kind is PostgresWriter in camel case, and the short name is psw. All of these names are for doing kubectl get postgreswriters, kubectl get psw, etc. I promise this was the complicated part. Now begins the fun part.

Once we define the CRD, we are done telling Kubernetes how our custom resource will look. But what happens behind the scenes still has to be coded, in the operator, and this is where we begin. First of all, we import the relevant libraries. The first one, psycopg2, is for talking to the Postgres database, inserting and deleting rows. Kopf is the framework for writing operators. The kubernetes library is used to make our operator communicate with our cluster. And finally, the os library is there to read environment variables.

The first function I've written initializes the database. Of course, first of all, you have to establish a connection with the database, right? So I've already set up a Postgres database on a specific IP. This is just a toy setup; in a more mature, productionized setting you should have a dedicated setup where you inject the hostname and sensitive fields like passwords through a secure mechanism. But in this case, this function returns me the connection to the database. The second function initializes the Kubernetes client, and it does one thing.
It establishes the connection between my operator and the Kubernetes cluster, so that my operator can talk to the cluster, watch for the custom resource, and take actions. Now, you might notice there is an if and an else present here. The thing is that you can run your operator in two ways. The first, obviously, is to just package it as a Docker image and deploy it on your cluster, which is the normal productionized way. But while developing an operator as a developer, it's hectic to package it as a Docker image and deploy it every time the code changes. So, for the sake of convenience, you can literally run your operator locally from the command line, and it will talk to your Kubernetes cluster via your kubeconfig. Whenever you do kubectl get something, that request is routed through your kubeconfig, right? Similarly, this operator can work at that level as well. So I use a DEV environment variable to tell my code how to talk to my Kubernetes cluster, depending on whether I'm developing locally or I've deployed it.

Moving on, this is the main helper function, which will insert a row into the table. It is very straightforward: it performs an INSERT query against my database, taking my database connection as a variable, feeding in the table name and the name, age, and country values, and writing that row into my table. And as I told you, we're going to handle deletes gracefully as well, so I've defined a function to delete data too. Again, it's a super simple DELETE query, right?

Now, this is the belle of the ball. This is the function which will watch for the creation of my custom resource. Don't worry, I'll explain it. If you look at the first line, it's very self-explanatory, right?
I'm just writing kopf.on.create, and inside it I'm defining the GVK. Again, remember what GVK was? Group, version, kind. So whenever Kopf, basically my operator, notices the creation of a resource of this GVK, it's going to execute the following piece of code. And that code does nothing except extract the table, name, age, and country fields and write them into the table of the Postgres database my operator is aware of. By the way, behind the scenes, while inserting the data, I'm also defining a primary key called id, and that is nothing but the applied resource's namespace/name, just for the sake of establishing uniqueness. If I go back super quickly, you can see in this record which got created how the id looks: it is the namespace, slash, the resource name, the one defined here. That's just for the sake of keeping things unique and avoiding any conflicts.

So again, whenever this operator captures the creation event of this GVK, my PostgresWriter, it will extract the table, name, age, and country fields and write them into my database. Similarly, whenever it catches the deletion of a PostgresWriter resource, it will capture the spec of the resource which got deleted, look at the namespace and name, and delete the corresponding row from the database, just to synchronize things and ensure the database contains only those records which are actually present in the Kubernetes cluster in the form of PostgresWriter resources.
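The operator described over the last few paragraphs can be sketched in a single Python file roughly like this. The helper names, environment variable names, and connection defaults are illustrative assumptions, not the speaker's exact code; the kopf handlers need a cluster and a database, so they are only registered when the third-party libraries are importable:

```python
# Sketch of the PostgresWriter operator described in the talk.
import os


def make_row_id(namespace: str, name: str) -> str:
    # The talk keys each row by "<namespace>/<name>" to keep IDs unique.
    return f"{namespace}/{name}"


def build_insert_query(table: str) -> str:
    # Parameterized query; the values are supplied separately by the driver.
    return f"INSERT INTO {table} (id, name, age, country) VALUES (%s, %s, %s, %s)"


def build_delete_query(table: str) -> str:
    return f"DELETE FROM {table} WHERE id = %s"


try:
    import kopf
    import psycopg2

    def connect_db():
        # In production, inject these via Secrets/ConfigMaps, never plain text.
        return psycopg2.connect(
            host=os.environ.get("POSTGRES_HOST", "localhost"),
            user=os.environ.get("POSTGRES_USER", "postgres"),
            password=os.environ.get("POSTGRES_PASSWORD", ""),
            dbname=os.environ.get("POSTGRES_DB", "postgres"),
        )

    @kopf.on.create("demo.sh.com", "v1", "postgreswriters")
    def on_create(spec, name, namespace, logger, **kwargs):
        # Fired whenever a resource of this GVK is created in the cluster.
        conn = connect_db()
        with conn, conn.cursor() as cur:
            cur.execute(
                build_insert_query(spec["table"]),
                (make_row_id(namespace, name), spec["name"], spec["age"], spec["country"]),
            )
        logger.info(f"Inserted row {make_row_id(namespace, name)}")

    @kopf.on.delete("demo.sh.com", "v1", "postgreswriters")
    def on_delete(spec, name, namespace, logger, **kwargs):
        # Fired on deletion; removes the corresponding row to stay in sync.
        conn = connect_db()
        with conn, conn.cursor() as cur:
            cur.execute(build_delete_query(spec["table"]), (make_row_id(namespace, name),))
        logger.info(f"Deleted row {make_row_id(namespace, name)}")
except ImportError:
    pass  # kopf/psycopg2 not installed; the pure helpers above still work
```

A real operator would also handle exceptions (kopf supports retries via kopf.TemporaryError), which the speaker acknowledges later is missing from the demo code.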
And to run the operator locally as a developer, you can use the command kopf run, followed by your Python file, and it's going to run beautifully. So let's quickly hop back to our terminal. Let me actually increase the font size, because my display is too small; I'm afraid it's not going to be visible to you folks otherwise. I hope the display is fine now. Perfect, okay, cool.

Right now I'm inside my project folder, and first of all, let me get client-level access to my Postgres database. This Postgres database has a table called students already created. Just a second. Sure, but I think the font size is too small, could you please increase it? Oh sure, I'll increase it further. How is it now? A little better. It should be readable now, I guess, or has the change not synchronized on your end yet? It is better now, it is better now. Okay, I guess it was delayed a little bit. Anyway, never mind.

So, back to the talk. If you look at the left terminal, you can see the students table is created and doesn't have any rows yet. Now I'll run my operator locally, but before running the operator, let's go step by step and first create the CRD, right? The CRD file looks exactly like the one I demonstrated in the slides, so I'll just kubectl apply the CRD, and done. Now if I do kubectl get crd, my CRD exists, right? Now, before applying my custom resource, let's first start the operator, because the operator has to be watching to know what's going to happen when the resource is created.
So in the lower terminal, I've set the DEV=true environment variable, and now I'll quickly run my operator. Perfect, this is the sign of the operator running successfully. This sample.yaml is the exact PostgresWriter custom resource file which I'll apply now. Carefully notice the values: the table is students, the name is sample-student, and the namespace is default. So let me keep it here, and if I apply sample.yaml, yep, it got applied, and you can see some logs captured here. Now if I go back to my left terminal and check whether my database got updated: yep, see, the data got reflected. So behind the scenes, it worked exactly the way it was expected to. Similarly, if I do kubectl get postgreswriters, you can see that this resource got created in the default namespace. Now if I delete this PostgresWriter, sample-student: yep, it got deleted, the operator was able to catch the deletion, and hopefully it deleted the row in the database. And if I go into the database and do SELECT * FROM students: perfect. So the creations were synchronized well with the database, and the deletions were synchronized well with the database. This proves the operator is working perfectly fine.

But still, the way I executed the operator was the very local, developmental way, right? So we have to deploy it the right way. In a productionized manner, your operator has to first be packaged into a Docker image and deployed onto a cluster, where it will run as a pod and serve the functionality it's supposed to. So let me create a namespace called postgres, where I'll keep all the resources associated with the operator.
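As mentioned earlier, a basic Kopf operator needs only one Python file and one Dockerfile, so the image being built here might come from something as small as this sketch (the file name and base image are illustrative assumptions):

```dockerfile
FROM python:3.11-slim
RUN pip install kopf psycopg2-binary kubernetes
COPY operator.py /operator.py
# kopf discovers the decorated handlers in the file and runs the event loop
CMD ["kopf", "run", "/operator.py", "--verbose"]
```

Building it with `docker build -t postgres-writer:latest .` produces the image the deployment below refers to.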
And if you notice, I have a deployment defined which encapsulates the logic to run my operator, and my operator's image is going to be postgres-writer. So first of all, let's quickly build the image of my operator. I'm building the image, and yep, it's built. You can ignore this last command which I executed; I'm just making use of this tool called kind for local Kubernetes testing, loading the image into the cluster. But yeah, I'm packaging the Docker image with the name postgres-writer:latest, and once I apply this deployment, my operator will be spun up as a pod, and this pod will serve the whole core functionality of the operator.

But before that, we also have to give our operator enough permissions to watch over this custom resource and take the necessary actions, because Kubernetes does not trust a new incoming piece of code, a new incoming operator, by default. So I'm creating a service account with the name postgres-writer. The next thing I'll do is, with the help of a cluster role binding, associate admin-level permissions with this service account. Never do this in production; I'm just doing it for the sake of convenience. In reality, you should grant only the minimum permissions your operator needs. But in my case, for convenience, I'm binding the cluster-admin role to this service account via a cluster role binding. You can see that the name of the cluster role binding is admin-access, the service account it points to is postgres-writer, and the role it binds is cluster-admin.
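The manifests just described might be sketched like this. The names follow the talk, and as the speaker warns, the cluster-admin binding is for demo convenience only; a real deployment should use a minimal role:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: postgres-writer
  namespace: postgres
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin          # demo only; never grant this in production
subjects:
  - kind: ServiceAccount
    name: postgres-writer
    namespace: postgres
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-writer
  namespace: postgres
spec:
  replicas: 1
  selector:
    matchLabels: { app: postgres-writer }
  template:
    metadata:
      labels: { app: postgres-writer }
    spec:
      serviceAccountName: postgres-writer   # the pod inherits these permissions
      containers:
        - name: operator
          image: postgres-writer:latest
```

The serviceAccountName field is what connects the operator pod to the permissions granted above.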
So once I apply it, my Kubernetes cluster is aware that whenever the postgres-writer service account is used, it carries admin-level permissions. And if you look carefully at the deployment file of my operator, you'll notice I've defined the service account name, which means the pod running my operator will have the access associated with this service account called postgres-writer, which indirectly has admin-level access.

So now I'll apply the deployment, and hopefully it runs. It got created; let me get the pods in the postgres namespace. It's running fine; the Running status is an awesome green signal. Now let me stream the logs of this deployment, and you can see the logs are here as well. So again, we're going to do the same thing. We have already defined the CRD, so we won't reapply it. We will again apply the sample YAML of the student which I applied previously. And before that, just to confirm again, the students table does not have any rows. Once I apply this sample custom resource, the student got created, you can see the logs were captured, and if I hop back to my Postgres terminal, the row was created. So this is a perfectly working operator, deployed in a productionized manner. And if I delete this resource, again logs are captured by my operator's pod, and if I look back at my Postgres terminal, the row was deleted. Everything is working perfectly fine.

Just a few things I would like to mention. The piece of code I wrote is actually not a very mature piece of code, because I just wanted to demonstrate how an operator works. In reality, you would have to handle exceptions gracefully, deal with conflicts, and whatnot. So just a heads up regarding that. Now, about the deployment file which I wrote; I'm already inside the folder.
So if you notice (okay, my terminal is messing up), I'm providing the password, the username, and all this config directly as plain text. Never do that. Please always inject your environment variables in the form of ConfigMap references and Secrets. Although Secrets don't exactly encrypt your data, they only encode it, you should still use them. I just did not do that here to keep things convenient and straightforward for you, but I wanted to put it out there.

So that was my presentation. The key takeaways are: we saw what Kubernetes operators are and why they are so cool; we also learned about CRDs; we dove deeper into coding our own Postgres writer operator and deploying it; and we practically saw in the terminal how things worked out. You can check out this slide deck and the code for the above operator on my GitHub profile, yashvardhan-kukreja.

And one more thing I forgot to mention: you might have noticed that deploying our operator was pretty involved, installing the service account, the cluster role binding, the deployment, right? All of this can be automated as well, and there are a lot of tools out there for it. One of the most commonly used is called Helm; it's spelled H-E-L-M. It basically abstracts away, from the user's standpoint, all this boilerplate like service accounts, deployments, role bindings, et cetera. As a user, all I have to do is provide my basic variables, for example my Postgres host, username, and password, and that's it; the rest Helm will handle for me.
It will automatically feed my variables into this deployment, apply everything, set up the namespaces, et cetera. So do check out the Helm project as well.

I guess we still have around three to four minutes left for taking questions, so feel free to shoot them. I'm also reachable on my Twitter, my GitHub, and my LinkedIn, although I haven't put them up here. Girish, I guess I can take up some questions, if there are any.

Yeah, there are questions coming in. The first question is: can I use a Kubernetes operator instead of a Job when I'm deploying a backend application and I want to run Postgres operations like migrations, et cetera?

Sorry, can you repeat the question? Your voice broke a little bit on my end.

Okay. The question was: can I use Kubernetes operators instead of a Job when deploying a backend application, to run Postgres operations like migrations, et cetera?

Yeah. See, I guess this is a very simple use case, and if I were in that position, I would have written a simple CronJob only. The thing is that in your use case, you do not need to establish a custom resource or make use of the reconciliation logic of Kubernetes, right? So let me give you a perspective on where you can actually use operators. Don't look at operators as something your application developers would use; rather, look at them as something cluster administrators would use. For example, as a cluster administrator, if you are manually performing some administrative task on a cluster, repetitive, toil-heavy work, then you can write an operator to automate and manage that.
But for application-level stuff, like a basic script which runs periodically, a Kubernetes CronJob will do the job very easily.

Okay, the second question: what are the implications of granting all permissions in a cluster role binding when deploying an application in a production scenario? Oh my God, it gets very messy, actually. Because see, most of the time you're going to be using third-party operators, right? In that scenario, if someone is malicious, they can inject a piece of code into their operator that inspects all the resources in your cluster, inspects the secrets, and dumps them somewhere out there. That's risky. And even if you're not using someone else's operator but your own, and you're sure you're not going to do something malicious, the fact that you are granting admin-level privileges to your operator still means your blast radius for making a mistake increases. If by mistake you coded something messy in your operator which performs some deletion in your cluster, there wouldn't be any gateway to stop it from doing that. So, to avoid the risk of your own human mistakes, or of someone else gaining malicious, privileged access to your cluster, please do ensure that the service account and the roles assigned to the operator are minimal.

Right, there is one more question: what are the performance implications of using Python to write an operator, as opposed to writing one in, say, Golang? Honestly, I haven't benchmarked both of them, but in my previous organizations we heavily used Python-based operators and Golang-based operators as well. See, there are fundamental performance benefits in Golang, right?
Because it's directly compiled into machine code, it has great performance benefits. But I guess at the framework level there is not much difference. And honestly, in my previous company, Grofers, we heavily used operators we had written in Python, and there were never any alarming or critically slow issues with Python that caused us problems. So the differences are very minimal. But from my experience dealing with Golang, I would still prefer Golang for coding operators, and the performance benefits over Python are there for obvious reasons.

Okay, I think there is one more: can you describe some use cases for Python operators that you've used personally? Yeah, so for example, this Postgres writer one; something very similar to this we used in a previous organization. There, we used this software called Consul KV for configuration management, and we wanted to establish a practice where, instead of people directly writing the key-value pairs in the Consul web portal, they applied them on Kubernetes, and an operator fetched them and wrote them into Consul. For this interfacing, we created our own operator. And apart from that, there are countless operators, right? For example, there is an operator called Starboard, provided by the company Aqua Security, which constantly watches your incoming deployments and pods and inspects them to see whether they are vulnerable, for example whether they're running with privileged set to true or with host network mode on. So those are the kinds of use cases. I think we are almost out of time.
We can take one more question, and that is: can an operator interact with other resources and CRDs? Yes, you can do that. See, again, operator is just a fancy term; behind the scenes, this thing we wrote is a controller, right? An operator is basically a controller which is watching over CRDs. So what you're asking is whether you can write a controller which watches over other custom resources, and of course you can. The way I defined it in the slides, if I go back super quickly: here I defined the GVK of my resource, and I could have defined other custom resources as well, and then my operator would be watching those other resources and taking actions upon their creation, update, or deletion. So yeah, of course you can do that.

So yeah, we are out of time, and I think there are no more questions. I think we'll have to take the rest offline.