Hello. Can you hear me? Good. Hello, everybody. Welcome to "Implementing anti-patterns", aka implementing cross-namespace resource ownership in Kubernetes. I hope I won't disappoint you with this talk. I know I will disappoint you with it being an anti-pattern and with implementing it in Kubernetes; I hope you're not too disappointed by that. What we're going to do today: I'm going to talk a bit about some Kubernetes concepts that we usually use and usually see, like ownership, dependents, namespaces, garbage collection, and the rules that go with those concepts. Then we're going to try to exploit them and implement some forbidden stuff that Kubernetes disallows, both in the documentation and in the implementation. And we're going to try to do so without smashing through the walls of the Kubernetes implementation. We're going to bend the rules so they fit our very anti-patternish, wrong use case, which I'm going to try to justify to you along the way, so you see that we really needed to do something like this.

First things first, who am I? My name is Tom Coufal. I work as a software engineer in the office of the CTO, in emerging technologies. What do we do? We research various topics, we experiment with managed services, open services, and things like that. One of our key areas of focus is analytics and data-science-oriented workloads. That's where this use case originated, and I'm going to get to that a bit later.

About what I'm going to be talking about, there's a very nice coincidence. Yesterday there was a book signing event at the Red Hat booth. I didn't know about it beforehand, so I added a slide at the last minute about the Kubernetes Patterns book by colleagues of mine at Red Hat. Honestly, I haven't read this book. If the authors are here, I'm sorry; I'm sure it's a very nice book. But since I haven't read it, I'm going to be doing anti-patterns. We're going to take the deck and we're going to smash through it. No! What just happened? Good.

How are we going to do it? This is the agenda. As usual, we're going to start by introducing some concepts: the most basic stuff for most of you, maybe, but also some more advanced things. Hopefully it's not just a recap, and we'll make some new connections between those concepts. Then I'm going to provoke you with the use case. Then we're going to explore only one solution, because we don't have time for more. Really, we came up with this one solution, so we implemented it and we use it; we can save for the later Q&A, or other discussions over a beer, why this implementation may be wrong or how to do it properly. Then, as a fourth point, we're going to implement it and see how much effort it takes to implement something like this. Last but not least, if time permits, we're going to do a live demo with our project, which is running somewhere in the cloud. Hopefully the network will be kind to us and it will work.

The first concept I would like to introduce is, hopefully, not very new to you. How many of you here intentionally use Kubernetes namespaces on your clusters? Great. Everybody knows what a namespace is. Awesome. Just to recap what the Kubernetes documentation says about namespaces: they serve for isolating groups of resources, and then the documentation talks about namespace-based scoping and cluster-wide resources.
That gives us some information: there are two distinct types of resources in Kubernetes. We have namespace-scoped resources, and then we have resources that are shared across the whole cluster. This is a very important thing to remember, that we can use two types of resources, and a lot of things come with those different types. There are security and RBAC constraints that come with cluster-wide resources, because out of the blue you have access to all namespaces in the cluster. On the other hand, if I have a namespace-scoped resource, I can have a resource with the same name in a different namespace. Now I can have two people talking about the same resource name in two different namespaces, and they're talking about two different things. That's also an interesting point to remember.

When to use namespaces? Well, usually in environments with many users, or with distinct workloads that I want to separate and isolate. Common knowledge. So far, so good. Another point here is that each Kubernetes resource, if it's namespaced, can live in only one namespace. We can't have a resource spanning multiple namespaces; it's either cluster-scoped or in a single namespace. Good.

And last but not least, what the Kubernetes documentation says about namespaces brings us to something called resource quota. This is a very interesting and important concept that comes with namespaces. Again, I'm quoting the documentation. This is something I've learned from implementing this thing: read the documentation. That's what we usually don't do when we're hacking stuff and putting things together, so that's why I'm quoting the Kubernetes documentation on a second slide. So, resource quota. What purpose does it serve? Well, we want to ensure that each distinct user group, each distinct namespace in the cluster, gets its fair share of resources, so that a single namespace can't consume all the resources in the cluster, eating it all, with nothing left for another team. We can set upper boundaries on the maximum amount of resources one team, one namespace, can consume.

That brings us to multi-tenancy. I don't know about your use cases for Kubernetes; our use case is mainly running long-lasting clusters with a huge footprint, tens of nodes or even more, hosting multiple teams that we want to keep isolated on those clusters. So we have a team of users A, a team of users B, and many others, and each of them has access to multiple namespaces: it can be one, it can be more, it can be tens, based on their needs. And we want to isolate those. We do that through namespaces, we do that through resource quota, and we do that through some RBAC on top of it.
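Just to make the quota idea concrete, here's a minimal sketch of creating one with the Go client; the namespace name and the limits are made up for illustration:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative setup only).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Cap what the hypothetical "team-a" namespace may consume in total.
	quota := &corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "team-a-quota", Namespace: "team-a"},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourceRequestsCPU:    resource.MustParse("10"),
				corev1.ResourceRequestsMemory: resource.MustParse("32Gi"),
				corev1.ResourcePods:           resource.MustParse("50"),
			},
		},
	}
	if _, err := clientset.CoreV1().ResourceQuotas("team-a").
		Create(context.TODO(), quota, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```

Once a quota like this is in place, the quota admission controller rejects anything in team-a that would push the namespace past those totals.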
Switching gears: owners and dependents, one last topic to introduce before we jump into the real stuff. Owners and dependents are also a very well-known Kubernetes concept, right? The most obvious chain of owners and dependents, the one everybody uses, is Deployment, ReplicaSet, Pod. The Deployment owns the ReplicaSet, and the ReplicaSet owns the Pod. If I delete the Deployment, the ReplicaSet gets deleted and the Pods get deleted. A very common workflow, and something that is used across Kubernetes in many formats and ways.

This is how it looks on the API side of things, on the manifest side of things. You usually don't define a manifest for a ReplicaSet, and you usually don't define a manifest for a Pod. These are usually hidden from the end-user perspective: users create a Deployment manifest and are happy with it, and everything else is orchestrated by Kubernetes for them. But these owner references are still present in the YAML or JSON manifests in the cluster once deployed. So I can take a look at a ReplicaSet and see there's a metadata field called ownerReferences that references the Deployment. It contains identifying information, the unique ID, kind, name, and API version, but also two other flags used for additional control on top of that. There's a flag marking the controller, marking which resource actually controls the dependent, because you can have multiple owners for a single resource and you want to avoid them fighting over it; this flag can be set in only a single owner reference. Then there's a flag for blocking owner deletion, which tells the garbage collector how to behave when the owner is being deleted and how to handle those situations.

Then we have another concept that works hand in hand with owners and dependents: finalizers. You don't have to specify an owner or a dependent for a finalizer, but the finalizer's controller needs to be clever enough to determine these relations. You usually see this with persistent volumes and persistent volume claims, where you can start deleting a persistent volume claim, or a persistent volume, even while it's mounted in a pod. The persistent volume or persistent volume claim will stay in the Terminating state until the pod gets deleted, and this is ensured by the finalizer. As you can see in the manifest for the persistent volume claim, there's no owner reference saying that this pod uses this persistent volume claim, but the relation is still maintained.

So when it comes to regular owner references and ownership of resources in Kubernetes, we get this matrix. Going back to namespaces and those two distinct resource types I talked about before, I can have a cluster-scoped owner owning a cluster-scoped dependent, comparing apples to apples. Good, that's a normal thing to do. From a cluster-scoped resource I can also own a namespace-scoped resource, which we can see as more narrowly scoped than the cluster-wide one. Okay, good. What I can't have is a namespace-scoped owner owning something at the cluster scope; that would break through the isolation, right? That's also fairly obvious. And then, in the top-right corner, I can have both of them namespace-scoped. Well, can I? You can, but they need to live in the same namespace. Kubernetes will still let you define an owner reference to a resource living in a different namespace. It will allow it: the API and the admission control will accept it, and then the Kubernetes controllers will come in and start working with it.
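Spelled out with the Go API types, an owner reference looks roughly like this; the names and UID here are invented for illustration:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

func main() {
	controller := true
	blockOwnerDeletion := true

	// The owner reference a ReplicaSet carries back to its Deployment.
	ref := metav1.OwnerReference{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Name:       "my-deployment",
		UID:        types.UID("d9607e19-f88f-11e6-a518-42010a800195"),
		// Only one owner reference may have Controller set, so multiple
		// owners don't fight over the same dependent.
		Controller: &controller,
		// With foreground deletion, this keeps the owner around until
		// this dependent is gone.
		BlockOwnerDeletion: &blockOwnerDeletion,
	}

	rs := appsv1.ReplicaSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:            "my-deployment-5d59d67564",
			Namespace:       "team-a",
			OwnerReferences: []metav1.OwnerReference{ref},
		},
	}
	fmt.Printf("%+v\n", rs.ObjectMeta.OwnerReferences)
}
```

Notice what's missing: metav1.OwnerReference has no namespace field at all, which is exactly where the trouble starts.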
So what will actually happen? You start seeing these issues: what's going on with my resources? What's happening? I have a resource that is owned by something, and out of the blue it gets deleted by the controller. What happened?

Well, what happened is this: the garbage collector goes to my resource, the one that is owned by something, looks through the owner references, and sees: I have this namespace-scoped owner reference, and the owner doesn't exist. All of a sudden there's no owner, and I'm a dependent resource, I'm generated from something, so I'm ready to be garbage collected, ready to be deleted. And it just deletes the resource. You define a cross-namespace ownership, and everything's great until it's time for garbage collection to kick in, and it deletes your resource without any warning, because it considers that a legitimate thing to do. If I jump really quickly back to those owner references, you see there's no namespace field in them. You see the name, kind, UID, and the flags, so the reference really doesn't convey that anything namespace-related is going on. So we get these issues of "what's happening, my resource keeps getting deleted", and also articles along the lines of "these are the lessons I've learned by skimming the Kubernetes documentation, and these are the things I really shouldn't have been doing". So beware.

Okay. Since version 1.20 of Kubernetes, we get a nice notice in the documentation telling us: don't do this. The owner must exist in the same namespace; otherwise the garbage collection will kick in. Beware.
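To picture what the garbage collector is deciding, it boils down to something like this sketch. It's heavily simplified and written against controller-runtime purely for readability; the real collector works off a dependency graph and informers, not ad-hoc GETs:

```go
package main

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// orphaned sketches the garbage collector's decision for one owner
// reference of a namespaced dependent.
func orphaned(ctx context.Context, c client.Client,
	dependent client.Object, ref metav1.OwnerReference) (bool, error) {

	owner := &unstructured.Unstructured{}
	owner.SetAPIVersion(ref.APIVersion)
	owner.SetKind(ref.Kind)

	// The crucial detail: a namespaced owner is resolved in the
	// *dependent's* namespace, because ownerReferences carry no
	// namespace field.
	err := c.Get(ctx, client.ObjectKey{
		Namespace: dependent.GetNamespace(),
		Name:      ref.Name,
	}, owner)
	if apierrors.IsNotFound(err) {
		// A cross-namespace owner looks exactly like a deleted owner
		// from here, so the dependent becomes garbage.
		return true, nil
	}
	if err != nil {
		return false, err
	}
	// Same name but a different UID also counts as "owner is gone".
	return owner.GetUID() != ref.UID, nil
}
```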
So this is what we're going to implement. It's disallowed by design, and we're going to try to bend Kubernetes. Why? Well, okay, on this diagram there's some extra stuff on the right that we want to have there. We have this operator, that's what you see at the top: a cluster-scoped operator, a controller working at the cluster-scope level. Normal thing, everybody does that, right? And we have namespaces, with users in different namespaces. So we deploy a custom resource into various namespaces; in our project this is called a Meteor Shower. This resource holds the generic configuration for that particular instance of the Meteor Shower deployment. Imagine it like a Kafka cluster custom resource or something like that. We can generate additional resources from this Meteor Shower within the same namespace: there's a Meteor resource which is part of the Meteor Shower, owned by the Meteor Shower, and this is a very normal relationship. Again, Deployment, ReplicaSet, Pod; a similar relationship here. I have a Meteor Shower, a Meteor derived from it, and some workload pods generated from it, all in the same namespace. That is fine. That is a normal thing to do.

Well, then I have a problem I want to solve, because for some reason I want to generate resources in different namespaces. I want this operator to be able to integrate with external services in different namespaces. Why? In our scenario, this is a data science project that needs to integrate with additional legacy applications deployed in different namespaces. And I want those namespaces to have their own resource quota; I want it self-contained and isolated. So we have one namespace where we run build pipelines for data science repositories, and this should have a different quota than the deployment of some data science workload; it should live in a different namespace. And I also want to support multi-tenancy, multiple instances, because I have multiple teams on my cluster. A cluster-scoped Shower would solve the issue, right? I would have a cluster-scoped owner for all the resources. Fine. But I can't have that, because I want more instances of it.

Well, what do I do now? I can't just set the owner reference to that other namespace; that would just get my resources deleted. And why would I want to set owner references at all? That's simple: we don't want to leave garbage, we don't want to leave a mess behind us. If a user leaves and deletes their workload, deletes their Meteor, I want to be able to garbage collect from the other namespaces. I don't want to leave images and ImageStreams, which we know from OpenShift, lying around in those other namespaces. I want to be a good steward, a good guy, and clean the cluster after usage.

So what do I do? The solution we came up with is a resource called Coma, which is a mirror, a shadow, of my Meteor in those other namespaces. Then I can define the ownership as a normal thing within a single namespace: the Coma in namespace F owns all the pods in namespace F, and that's a normal relation. Now, instead of syncing owner references across multiple namespaces, I just need to ensure that the Meteor and its Comas exist side by side at the same time, and I need to synchronize their creation, their deletion, and the passing of the owner references. A fairly straightforward solution, but it's an anti-pattern.

So these are the API resource definitions we came up with: custom resources Meteor and Coma, with a couple of fields. The first thing is an owner reference in the Meteor. That's the standard thing, right? This one is owned by a Shower called perseids in the same namespace. It's a regular ownership reference, so if I delete my Meteor Shower, I can be sure this Meteor gets deleted as well and isn't left behind. The other thing here is a finalizer, and this finalizer ensures that if I'm deleting the Meteor, I will also schedule all the Comas for deletion. Then, in the status, I track all my Comas in all the external namespaces as something like an owner reference, understood only by our custom controller, with an additional field for the namespace. This way I know there's this additional resource in the other namespace: it points to the Coma, it has the same UID, and it has the namespace, name, kind, and so on. In my Coma, I also need to set a finalizer, to ensure the Coma is not deleted until somebody deletes the Meteor, because I don't want all the resources in the other namespace to just disappear while somebody's doing stuff in that namespace; I want to wait until they intentionally delete the Meteor. And just for good measure, I also track the owner here in the Coma, for redundancy and to know where to look if I'm doing something with the Coma. So we have this pattern of owner references everywhere, and so far it seems good.
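In Go types, the shape of the idea is roughly this; a condensed sketch, not our literal API, with field names invented for illustration:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NamespacedOwnerReference is an owner reference that, unlike the
// built-in metav1.OwnerReference, also records a namespace. Only our
// own controller understands it; the garbage collector never sees it.
type NamespacedOwnerReference struct {
	metav1.OwnerReference `json:",inline"`
	Namespace             string `json:"namespace"`
}

// Meteor lives in the user's namespace, owned by a Shower through a
// regular ownerReference in its metadata.
// (DeepCopy methods, normally generated by controller-gen, are omitted.)
type Meteor struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              MeteorSpec   `json:"spec,omitempty"`
	Status            MeteorStatus `json:"status,omitempty"`
}

type MeteorSpec struct {
	URL string `json:"url,omitempty"` // e.g. the Git repository to build
}

type MeteorStatus struct {
	// Comas tracks the shadow resources created in external namespaces,
	// one per integrated service, so deletion can be propagated by our
	// controller rather than by the garbage collector.
	Comas []NamespacedOwnerReference `json:"comas,omitempty"`
}

// Coma is the empty shadow of a Meteor in an external namespace. It
// exists only so that generated resources there have a local owner.
type Coma struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Status            ComaStatus `json:"status,omitempty"`
}

type ComaStatus struct {
	// Owner points back at the Meteor, again with a namespace.
	Owner NamespacedOwnerReference `json:"owner,omitempty"`
}
```

The key design choice is that the namespace-aware reference lives in the status and is interpreted only by our controller, so the built-in garbage collector never acts on it.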
But how do I implement it? What do I do with it? Well, if you think about it, it's not that easy a problem to solve. If you think of creating resources through kubectl apply or something like that, manually creating all the references and getting those UIDs and everything right, it's not an easy thing to do. And if you do it wrong, the resources get deleted because their owners don't exist; the garbage collector is very strict about that. What we can do, what we can use, are the Go bindings. Let's go for the Go SDK and have fun with it, and implement it through Kubebuilder and the Operator SDK. This was our solution, this is how we approached it, and it turned out fairly simple for such a complex-sounding problem.

So first, the finalizer. What do we do in the finalizer? Well, it turns out it's just a few lines of code. If the resource doesn't have a deletion timestamp, it's not being deleted, so I need to ensure the finalizer is properly set: the standard thing, I add it to the manifest and update the resource. Good. And if the resource is being deleted, I just schedule all the Comas to be deleted as well, and once they're deleted successfully, I remove the finalizer so the resource itself can be deleted.

Well, what do I do when I want to delete all those Comas? Again, very simple, a single for loop: I just walk all the Comas I have in my status and schedule each of them for deletion. Simple.

Where it becomes more complex is actually synchronizing the creation. Our operator watches for the CR being created and acts on it, watches a Meteor being created. When a Meteor is created, we list all the external services we need to integrate with and loop through them, and then it's always the same: do I have a namespace for that external service? I do, okay. Do I have access to that namespace? If I do, I can create a Coma in there. So I create one. And on the right, I do a bit of extra magic with owner references: setting up the names, generating the owner references, and recording them in the status field for my tracking. Which is, again, fairly simple.

And that's all the implementation I had to do. I've implemented the creation of the Comas across all the namespaces, and I've handled all the deletion phases. This was all the code; this was all the implementation.
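Put together, the skeleton of that reconcile loop looks something like this. It's a minimal sketch against controller-runtime, reusing the hypothetical types from the previous sketch, with the Coma-creation half stubbed out; the real controller naturally has more error handling:

```go
package controllers

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer name; anything under your own domain works.
const meteorFinalizer = "meteor.example.com/finalizer"

type MeteorReconciler struct {
	client.Client
}

func (r *MeteorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	meteor := &Meteor{}
	if err := r.Get(ctx, req.NamespacedName, meteor); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !meteor.GetDeletionTimestamp().IsZero() {
		// The Meteor is being deleted: schedule every tracked Coma for
		// deletion first...
		for _, ref := range meteor.Status.Comas {
			coma := &Coma{}
			coma.Name = ref.Name
			coma.Namespace = ref.Namespace
			if err := r.Delete(ctx, coma); err != nil && !apierrors.IsNotFound(err) {
				return ctrl.Result{}, err
			}
		}
		// ...then drop our finalizer so the Meteor itself can go away.
		controllerutil.RemoveFinalizer(meteor, meteorFinalizer)
		return ctrl.Result{}, r.Update(ctx, meteor)
	}

	// Not being deleted: make sure the finalizer is in place so we get
	// a chance to clean up the Comas later.
	if !controllerutil.ContainsFinalizer(meteor, meteorFinalizer) {
		controllerutil.AddFinalizer(meteor, meteorFinalizer)
		if err := r.Update(ctx, meteor); err != nil {
			return ctrl.Result{}, err
		}
	}

	return ctrl.Result{}, r.ensureComas(ctx, meteor)
}

// ensureComas is elided here: for each external service, it would check
// that the target namespace exists and is accessible, create a Coma
// there, and record a NamespacedOwnerReference for it in the status.
func (r *MeteorReconciler) ensureComas(ctx context.Context, m *Meteor) error {
	return nil // sketch only
}
```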
Well, now I need to deploy it, test it, and run it. We use the Operate First community cloud, where community workloads can get free infrastructure: you deploy your workload there, operate it, and collaborate on operations and SRE-related stuff. We use it as our go-to community for Kubernetes and OpenShift things, and we deployed the project there. So let's take a look at it.

This is our Meteor instance. The dog is unrelated, but this is the use case being explained: we automate the way from a data-science-style Git repository to images deployed to JupyterHub and elsewhere. So we get some Meteors in here that are being created. If you fill in this form and paste in a URL, what happens is that a Meteor resource gets created on the cluster. So let's take a look at what actually happened in the cluster. I have a Meteor resource, as you can see, managed by perseids; it's owned by a Shower resource. I can go into the manifest and see there's an owner reference to the perseids Shower, it has some spec, and it has some Comas in its status, so it's integrating with some other namespaces. I can query for Meteors and I can query for Comas, and I can see I have quite a few Meteors in this namespace; this is the Meteor we've just been looking at. And I can take a look at a different namespace, the other namespace I'm integrating with as the external service.

Now, I don't have any Meteors here, but I have Comas, and, just for consistency, they are named the same. The Coma really does nothing as a resource. It's a blank shadow of the Meteor that we can use to own the additionally generated resources and garbage collect them afterwards. So if I take a look now at the builds and image streams in this namespace and query for this name, there's an ImageStream owned by this Coma. An ImageStream is an OpenShift resource that holds images in the internal OpenShift image registry, internal to the cluster. It has some tags, it has everything, but what's important here is the owner reference we've added, pointing to the local Coma in that local namespace. We also have, for this Meteor in the other namespace, build pipelines that were run against this Meteor's spec. So again, they're owned and managed for this Meteor.

So if I delete my Meteor, which we can do... I'm going to duplicate this view so we can see it. If I delete my Meteor, what should happen is that all these pipeline runs and this ImageStream get deleted. And as you can see, they live in a different namespace. So if I delete this Meteor, the controller... yes, the controller should delete those pipelines and delete the ImageStream. And just to be sure, we can take a look at this namespace and get the Comas. I shouldn't have it here anymore. Yeah, it's not there; the Coma gets deleted as well.

So it worked. It did its job, and it was a few lines of code. So if you have a complex use case, something that is hard to solve, well, if you think about it and go to Kubebuilder and the Operator SDK and really try to implement it through the Go SDK, it's really easy to bend the Kubernetes rules and implement something that is really, really disallowed and strongly advised against, in about 20 lines of code. Thank you very much. If you have any questions, now's the time.