My name is Arun Krishnakumar, and I am from VMware. I want to give some background about my team so the problem can be set in context. I'm an engineer on a product called VMware Cloud Director, a multi-tenant cloud platform: people buy resources from VMware Cloud Director, create their own clouds, and sell them to other tenants. My team builds Kubernetes clusters on top of that.

One requirement I have gotten from customers on this team, and also on earlier teams, is: how do we let a user leave the company, or move to another team, while keeping their workloads active and all of their jobs intact? That is what this talk is about. It is not an easy problem to solve in general, but we can look at some aspects of it. So that's the talk: when your sysadmin has pretty much quit, how do you protect your Kubernetes cluster? It's interesting, because you would think it's a solved problem, but it is not.

Usually humans and operators are the owners of clusters, or of applications in clusters, and these humans change teams or companies. We will focus on one particular aspect: when a cluster owner leaves a team, the product needs a clear way of transferring ownership from that cluster owner to another one. And there are multiple possible owners, because it's a multi-tenant cloud. We are discussing this in the context of Kubernetes clusters created using kubeadm, which is a very common cluster creation mechanism nowadays. Unlike the other talks we have seen here, this is not a broad, industry-generic problem; it's a specific problem, and solutions are at least achievable. So this is more focused, and we will actually see how to solve it.

As a solution, the simple thing is: transfer cluster ownership to another user and revoke access from the old user. That is the most obvious thing to do. But is it really feasible? If you look at the products that build this, it is an afterthought. I've been in multiple startups and also semi-non-startup companies, and usually, whenever somebody has to leave, there is no UI-based way, no simple click-through that says: take all of the resources this person owns and transfer them to another person, while keeping all of the workloads active and running. Whenever it happens, developers scramble; they go to a database, run queries, and update all of these users by hand. So the solution is simple to state, but not easy to achieve.

If you look at a Kubernetes cluster, you have the control plane, typically fronted by a load balancer (not sure if my cursor is visible, but yes, there's a load balancer fronting the whole cluster). In a Kubernetes cluster, the IP address and port are pretty much the stable aspect. So there's a load balancer, and you have a series of nodes.
Nodes can be connected to disks or volumes; those are the PVs, the PersistentVolumes, as they're called in Kubernetes. You could have GPUs attached to nodes, and each node has its compute and storage. In a multi-tenant cloud, every single bit of this is owned by a user; each object refers to a user ID at the back. When that user is deactivated, all of these have to move to another user.

Then there are logical aspects beyond the infrastructure: metadata and some set of secrets, plus root access to the nodes, which you will have to revoke. Beyond all of that, there are application-related secrets. If there is a Postgres database, there may be a Postgres admin password; you will have to rotate or change it. If there is a series of certificates in use, you will have to revoke or change those. It goes on in a fractal manner: if you have an Apache web server, what is the root and admin user behind that? You have to go and change those as well. So the scope is really very wide, and people don't have a clear system that says: these are the things, and this is how you transfer them.

To recap, at the cloud infra object level, you have nodes which have to be transferred, and networking components. The load balancer is one block, but in VMware Cloud Director, for example, you have a virtual service, a load balancer pool, and port-related details like the application port profile. You have certificates that decrypt traffic for some of these load balancers and for application-based ingresses. You have storage which has to be moved to the new user as well. VMware Cloud Director actually had a bug wherein it could not move storage, and we are looking at solving this. So even though we have done nearly everything, there is still one bit hanging, and this is a product that has been in the industry for a while. That is the sort of complexity you get in general.

There are other complexities too. You take something from one user and give it to another, and this has to be done by an administrator who can perform that operation. But does the destination user have enough quota to accept these VMs? Suppose their quota is five VMs and they already have five; they may not be able to take a cluster that has another 100 VMs. Those are considerations we need to look at. They also need permissions to perform all of the operations the previous user could perform. So every object of the user must be transferred to the new one, logical and physical, and the administrator needs access to both users' objects to be able to transfer them.

Among the infra objects, some are logical and some are physical. The logical ones are certificates, root access to the nodes, and the actual Kubernetes user accounts. User accounts cannot be transferred; you have to delete and pretty much recreate them. Then you have port profiles, and in VMware Cloud Director some other metadata like defined entities. The Kubernetes objects are more standard: user-created secrets, secrets on the cluster, RBAC for the user, the user accounts, and the kubeconfig.
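As a concrete illustration of that RBAC cleanup, here is a minimal sketch (assuming jq is available; "olduser" is a placeholder name, not from the talk) of how you might enumerate the bindings that reference a departing user, so they can be deleted and recreated for the new owner:

```bash
# Hypothetical sketch: list ClusterRoleBindings and RoleBindings that name a
# departing user as a subject. These must be recreated for the new owner,
# since Kubernetes user accounts themselves cannot be transferred.
kubectl get clusterrolebindings,rolebindings --all-namespaces -o json \
  | jq -r '.items[]
           | select(any(.subjects[]?; .kind == "User" and .name == "olduser"))
           | "\(.kind)\t\(.metadata.namespace // "-")\t\(.metadata.name)"'
```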
Now we come to the actual problem. All of these can be transferred; however, the admin kubeconfig cannot. The admin kubeconfig is a break-glass kubeconfig. Once a person has access to that kubeconfig, there is no Kubernetes command that says: go rotate the certificates, or revoke this. You cannot revoke a particular kubeconfig.

There are some mitigations. The admin kubeconfig lives on the control plane nodes, so you can say: I will revoke root access to the control plane nodes. But an admin could have made a copy and kept it in a cache somewhere, and that copy will still work if the admin is malicious and quitting the company; we won't be able to do anything about it. The other option is to say: okay, I'll block the network, all of my clusters are internal. But if the admin merely moves to another team, you will not be able to block them, because within a company all of the IP addresses are usually reachable through the VPN. So you cannot block network access unless the infra provides for it. And you cannot change the cluster IP easily, because the Kubernetes certificates are signed with it.

So what is the solution? Revoking the admin kubeconfig. There are open tickets in Kubernetes and in kubeadm to handle this, and there is no solution; it cannot be revoked easily. The purpose of this talk is to actually revoke it, and the way to do that is manual revocation. It is definitely not a simple process. There are some documents that hint at it, but they are at a very high level, and there are some resources available, but they are really advanced and at first I couldn't make head or tail of them. That is the whole point of this talk.

Some things helped me: in particular, Kelsey Hightower's Kubernetes the Hard Way, which shows how to create the certs and so on. Kubernetes also has some docs about manual rotation of certs, but those are limited; they just say where the certs live and what they must contain.

The overall procedure is: create a root CA of your own (it can be a self-signed root CA); then certs for etcd, kube-controller-manager, the API server, the kubelets, and kube-scheduler; copy the certs to all of the nodes; create new kubeconfig files using kubectl commands for each node (by node I mean pretty much the kubelet), for kube-proxy, for kube-controller-manager, and also for the admin; copy all of these kubeconfig files to the nodes; then update the static manifests. All of the manifests for kube-controller-manager and kube-apiserver refer to these certs, and they refer to multiple certs in a complex way, so you have to update them all. Then you create a new role to be able to access the kubelet, and update the kubelet service files. Once you do all of that, you get a new admin kubeconfig, and the old admin kubeconfig stops working. Just to be clear, normal rotation of the certs does not touch the admin kubeconfig; the old ones remain usable.

That is what I would like to demo. I have recorded a demo with kind itself; we could potentially have run it live here, it does work. Let me stop the slideshow for a second.
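Before the demo, here is a minimal sketch of that first step, creating the self-signed root CA, using CFSSL as in the demo (the CSR field values are illustrative, not from the talk):

```bash
# Hypothetical sketch: create a new self-signed root CA with CFSSL,
# following the Kubernetes the Hard Way approach.
cat > ca-csr.json <<'EOF'
{
  "CN": "kubernetes",
  "key": { "algo": "rsa", "size": 2048 },
  "names": [ { "O": "Kubernetes", "OU": "CA" } ]
}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Produces ca.pem (the certificate) and ca-key.pem (the private key).
```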
With the built-in tooling you can manage certificates, but you cannot manually revoke and rotate them. You can refresh a certificate after it expires (they expire after a year), but the old ones will still work during the overlap of the old and the new, and none of that handles the admin kubeconfig. There are some tools from Google which do handle it, but ultimately, behind the scenes, even those do it pretty much this same way.

So I have a kind-based cluster and a demo recorded on kind. This is the cluster: one control plane node and one worker. This was simpler than something on GCE or AWS, so we have no network issues and can look at everything locally. Creation of the cluster is pretty quick; I'll fast-forward a minute or so. Please ask questions as this goes on; I will pause at some points and show what the files look like.

Once the cluster is up, we copy the admin kubeconfig from /etc/kubernetes/admin.conf inside the node. After copying it, we can see that we have access to the pods and so on. This first attempt says "no host", because the API server port is actually port-mapped in the case of kind; kind is Kubernetes in Docker, so we just have to edit the kubeconfig to use the mapped port, which is quickly done. I should run this at 2x speed, but you can see everything now: it shows the kind control plane and the kind worker as nodes, and the various pods. Let me see if I can run it faster; I'll set the playback speed to 1.5.

So I created a directory called mypki, and I now begin to create certs there. This is the base CA: ca.pem and ca-key.pem. I used CFSSL, that is CloudFlare's SSL toolkit, to create it; it's just simpler, though OpenSSL can be used as well. Likewise, we need to create an admin cert: the cert request and ca.pem are shown here; the CA key is called ca-key.pem, which is why it's not shown there.

Then we need to create the kubelet certs. One thing to note here is the internal IP: it is obtained from the node's VM, and many of these values come directly from the node. It has to be in the list of internal IPs, so you add it; the external one is 127.0.0.1, which is where you actually connect from. We do this for both the kind control plane and the kind worker, so it's a loop. These are the kubelet-related certs. (If you notice a kubeconfig in the middle there, ignore it; we'll generate that again.)

This goes on for a bit: there's the controller manager cert, then the kube-proxy cert; there are a bunch of these certs which have to be created and used. Let me skip toward the end, because this part of the demo takes a while. Okay, there's only one left; let it continue and then we can skip.

This is the one that is interesting: these are the hostnames for the cert used by etcd and the API server. You need all of these SANs, kubernetes.default.svc, kubernetes.default.svc.cluster, and so on, so that in-cluster clients can use it. And this 10.96.0.1 is the cluster IP of the kubernetes service; that is what I wanted to point out.
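For reference, a minimal sketch of issuing that certificate with CFSSL, assuming the ca.pem/ca-key.pem from the earlier step and an illustrative kubernetes-csr.json; the SAN list mirrors what the demo shows:

```bash
# Hypothetical sketch: issue the API server certificate with every SAN a
# client may use. 10.96.0.1 is the in-cluster service IP of "kubernetes";
# 127.0.0.1 is the external endpoint in this kind setup; the DNS names
# cover in-cluster access.
cfssl gencert \
  -ca=ca.pem -ca-key=ca-key.pem \
  -hostname=10.96.0.1,127.0.0.1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster,kubernetes.default.svc.cluster.local \
  kubernetes-csr.json | cfssljson -bare kubernetes
# Produces kubernetes.pem and kubernetes-key.pem.
```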
At this point the old admin kubeconfig is still usable; as you can see, it's called .orig, and all those commands still work. Very soon it will become unusable, once we change things over. This is where I copy the certs. The Kubernetes certs actually live under /etc/kubernetes/pki; I've just put mine under mypki. The CA also has to be installed on the node, hence this update-ca-certificates call; the kind node image is basically Ubuntu-like, so that installs the CA. We have still not changed anything in the cluster itself.

The next step is the creation of kubeconfigs, and we can skim through this. We need one for each node; those are the kubelet kubeconfigs. Then the kube-proxy kubeconfig, the kube-controller-manager kubeconfig, the kube-scheduler kubeconfig, and ultimately the whole lot; the admin kubeconfig was also created along the way. At this point we copy all of those onto the nodes.

Now we have the set of static manifests, and these are the manifests we'll be replacing. I'll let it replace them and show crictl, but I also want to go through the manifests individually, because I want to show that everything gets restarted. If you log into the machine and run crictl ps, you see all the containers: CoreDNS and everything is there. In another window I do the copy, and as soon as the manifests are copied, the kubelet starts restarting them. If you run crictl ps -a, you can see that some containers have started to exit: etcd immediately dies and then starts to come back; until etcd is up, the API server will also be down; and the last to come up is the kube-controller-manager. This just takes some cycles; after a while everything comes back up. It retried three times, and you can look at the kube-controller-manager logs via crictl and see it working. crictl here shows the container logs directly on the node; it does not mean we can access the cluster from outside yet.

Now the interesting part: the new admin kubeconfig. I am going to use the newly created admin kubeconfig and install some roles with it. At first it could not find the right port, because we again have to switch the port number to the kind-mapped one; it's the same docker ps trick to find it. Once that is changed, it just works, and we can see the pods using the new kubeconfig. The old admin kubeconfig is unusable from this point on.

One thing to note before going further: although the nodes and other objects are accessible through the new kubeconfig, you won't be able to see pod logs yet, because the kubelet has to be restarted with the new certs for that; the kubelet also needs its certs put in place. So you can see all of the pods, but if you try to get the logs, you get access denied.
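Stepping back, those kubeconfig-creation steps from earlier in the demo look roughly like this; a minimal sketch assuming the new ca.pem and an admin cert pair, with the server address and file names illustrative for the kind setup:

```bash
# Hypothetical sketch: assemble the new admin kubeconfig from the new PKI,
# Kubernetes the Hard Way style. The same four steps are repeated for the
# kubelet, kube-proxy, kube-controller-manager, and kube-scheduler configs.
kubectl config set-cluster kind \
  --certificate-authority=ca.pem --embed-certs=true \
  --server=https://127.0.0.1:6443 \
  --kubeconfig=admin.kubeconfig
kubectl config set-credentials admin \
  --client-certificate=admin.pem --client-key=admin-key.pem \
  --embed-certs=true --kubeconfig=admin.kubeconfig
kubectl config set-context default \
  --cluster=kind --user=admin --kubeconfig=admin.kubeconfig
kubectl config use-context default --kubeconfig=admin.kubeconfig
```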
The error says that you need to be logged in. So the next part is to set up the kubelet service and restart it. There are two large files involved; I'll go through them now. If you look at lines 525 and 526 of this one, the TLS cert file, for example, is referring to mypki; these are the certs we created just now. So that is essentially the change.

One gotcha: you need to delete the existing kubelet PKI. A few lines earlier, I removed /var/lib/kubelet/pki (not /etc/kubernetes/pki); that directory gets reused, and if you don't delete it, the kubelet will use the old certs. If you look at the kubelet logs and go to the bottom, you can see that the certificates have been rotated: it shuts down the client connections and restarts with new credentials. It wipes everything out, because the PKI is new; it gets rid of all the old certs, cleans up, and fetches new ones.

You can see that some pods were created 29 seconds ago, like the local-path-provisioner and the kindnet pods. The ones from four minutes ago are still running; etcd and the rest don't get restarted because they already have the right PKI, so the kubelet only restarts the others. And now I'm showing that with the new kubeconfig, you can still access everything; the same command for looking at the logs now works.

The original kubeconfig is not usable anymore, which is what we set out to do: revoke it. If you take the old admin.conf and run kubectl get pods -A against it, you get "certificate signed by unknown authority". That is exactly what we intended. However, just to be clear, there is no automation of this from the Kubernetes end; we have to do the revocation manually. That is pretty much the end of the demo: the new kubeconfig works, and the old one is revoked.

One more thing: we didn't change the kind worker's kubelet; we didn't update the certs on the worker. So the worker still has to be changed; if you try to access the logs of pods on the worker, it will not show anything yet. That has to be done in a loop over all the nodes.
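To spell out that per-node kubelet step as I understand it from the demo, here is a minimal sketch; the paths are the standard ones, and this would run on each node in the loop:

```bash
# Hypothetical sketch: after pointing the kubelet service files at the new
# certs, wipe the old rotated certificates and restart, so the kubelet
# re-bootstraps against the new PKI instead of reusing the old one.
rm -rf /var/lib/kubelet/pki      # gotcha: if left in place, old certs are reused
systemctl daemon-reload          # pick up the edited kubelet service files
systemctl restart kubelet
journalctl -u kubelet --no-pager | tail -n 20   # look for the rotation messages
```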
Let's get back to the post-demo slides. How are we on time, 4:14? Okay, we still have quite a bit of time. So, the limitations: there is cluster downtime if you have a single-node cluster; with multi-node clusters there is no downtime, but all of the components are restarted, and the kubelet service has to be restarted on every node. There is also a general risk: once you start this process, you can retry and continue, but you have to complete it; you cannot stop in the middle. And currently this has to be managed by custom scripts; there is no operator that goes and does it. So, given that this is the state of things, what do we do going forward?

There are some best practices we can use. Never have your root certificate on any of the nodes. It is always good to keep your own root certificate somewhere safe and create a series of intermediate certs from it. What I demoed was actually like that: we had only ca.pem and ca-key.pem at the root, and we created a lot of certs from it, the etcd certs, the kube-proxy cert, the controller manager cert, and pushed those; those are what Kubernetes uses to sign things. If any intermediate CA is compromised, you just revoke that intermediate and manually rotate the certs it signed (a sketch of this setup follows at the end). There is at least one link I found where somebody gives some details on this, though not end to end.

The Kubernetes best practices also say: do not send any keys into the cluster; generate every single thing outside and push it in. However, that has to be scripted quite a bit. kubeadm sort of allows it: as long as you create the certs and push them yourself, you can skip that step in the init phase. But it's not very simple.

To tie it all together: this was the Kubernetes side. On the infra side, it is a large problem again; you have to be ready to move every single resource object to another user, and that has to be planned for in advance. Use an external CA, that is the simple part, and use an intermediate CA to keep the root secure.

The other thing we have not covered is that there can be a series of users created by the old administrator. Those have to be deleted and recreated, and that is also a pain, because we don't know how many other applications those users are linked to. So this has quite a large disruptive effect, and it has to be streamlined in some way, but all of it has to be done. And we still have not gotten into what happens if the Postgres administrator's password is lost, or all of the other application-related passwords; that is yet another set.

So, Q&A, or anything in particular? This talk is a bit unlike the others, mainly because it is not as global in scope; it's for one particular problem.
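As referenced above, a minimal sketch of the offline-root-plus-intermediate setup with CFSSL; the CSR and config file names are illustrative, and the ca-config.json is an assumption on my part (it must define an "intermediate" profile with "cert sign"/"crl sign" usages and a CA constraint):

```bash
# Hypothetical sketch: keep the root CA offline and sign an intermediate CA
# that is the only thing the cluster tooling ever sees.
cfssl gencert -initca root-csr.json | cfssljson -bare root

# ca-config.json is assumed to contain an "intermediate" profile with
# "usages": ["cert sign", "crl sign"] and "ca_constraint": {"is_ca": true}.
cfssl gencert -ca=root.pem -ca-key=root-key.pem \
  -config=ca-config.json -profile=intermediate \
  intermediate-csr.json | cfssljson -bare intermediate

# Only intermediate.pem / intermediate-key.pem go anywhere near the cluster;
# if the intermediate is compromised, revoke it and re-issue from the
# offline root, rotating only the certs it signed.
```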