Hello, everyone. Welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Annie Talvasto, a CNCF ambassador and a senior product marketing manager at Camunda, and I will be your host tonight. Every week, we bring a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer all of your questions. You can join us every Wednesday to watch live. There are also two microsurveys going on in the cloud native sphere that you can complete: one for contributors and one for WASM. This week, we have Christopher here with us to talk about creating a paved road for security in Kubernetes operators. As always, this is an official livestream of the CNCF and, as such, it is subject to the CNCF Code of Conduct. So please do not add anything to the chat or questions that would be in violation of that Code of Conduct. Basically, please be respectful of all of your fellow participants as well as presenters. Today, we'll be taking most of the questions at the end, but feel free to pop them in the chat throughout the session so we can get answers to all of them towards the end. I'll hand it over to Kristoff to kick off today's presentation.
Thank you. Hey everyone, my name is Kristoff Stiefel and I'm working at QBOB as a technical Kubernetes architect. Today I will talk about Kubernetes operators: how they work, what the risks are if you use them, and how you could mitigate those risks. Essentially, a Kubernetes operator is an extension of the Kubernetes API, provided by custom resource definitions, which extends the capabilities of Kubernetes beyond its initial primitives. There's a controller that reconciles those resources for you, utilizing all the magic that Kubernetes has.
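To make the reconcile idea a bit more concrete, here is a minimal Python sketch. It is not how any particular operator is implemented: the `DatabaseSpec` and `DatabaseStatus` types and the action strings are hypothetical names made up for illustration. It only shows the core pattern of comparing a desired state (from a custom resource) with an observed state and deriving the steps to converge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseSpec:
    """Desired state, as a user might declare it in a hypothetical custom resource."""
    replicas: int
    version: str


@dataclass
class DatabaseStatus:
    """Observed state of the workload the controller manages."""
    replicas: int
    version: str


def reconcile(spec: DatabaseSpec, status: DatabaseStatus) -> List[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    actions = []
    if status.version != spec.version:
        actions.append(f"upgrade {status.version} -> {spec.version}")
    if status.replicas < spec.replicas:
        actions.append(f"scale up by {spec.replicas - status.replicas}")
    elif status.replicas > spec.replicas:
        actions.append(f"scale down by {status.replicas - spec.replicas}")
    return actions


if __name__ == "__main__":
    # One pass of the loop: desired 3 replicas of 15.2, observed 1 replica of 14.9.
    for action in reconcile(DatabaseSpec(3, "15.2"), DatabaseStatus(1, "14.9")):
        print(action)
```

A real controller would run this comparison every time the resource or the cluster state changes, and an empty action list means the two states already agree.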
The common use case for it could be solutions or applications. So you might have a database that you wish to update. You may want to drain the data out of it. You may want to rehydrate it. This is usually a very good pattern to keep your application in sync with your custom resources. We've talked about custom resource definitions and what they do, but they can basically be anything that you want them to be, to run in your cluster and provide automated operational knowledge. Operators also introduce a custom controller, which normally gets deployed in a dedicated namespace with a dedicated service account. But this isn't mandatory, so you can leverage existing resources if you wish to, which is a bit scary, but we'll go through it a little bit later. Alongside all of that, depending on what you would like to do with the operator and how it's going to work, it could also introduce further Kubernetes resources and reconcile those, but also extend outside your cluster and actually start configuring cloud resources, which is quite scary stuff too. Last on the list, there are some logging and metrics as well, just to give you some visibility.
Now, what can go wrong? From a key threats perspective, you have service account permissions. Operators perform administrative tasks and so, as you probably know, they have highly permissive service accounts. If the operator gets compromised, an attacker can leverage those permissions to do quite a lot of stuff, getting close to cluster admin. As for the deployment of the actual controller itself, these can be privileged containers as well, just so they can access and reach the resources that they need to. And they're just as susceptible as any other container or image to vulnerabilities in the dependencies they bring along.
But I think the biggest thing from a workload perspective in Kubernetes is the scope of an operator. It could be namespace bound, limited to the specific namespace where you want to reconcile your resources or do your work, or it can go completely cluster-wide, or even multi-cluster-wide if you wanted to, which again extends those permissions. Even further, it can be externally bound, so it can start reconciling external kinds of resources. Any of those permissions, if the operator gets compromised, can really take an attacker a long way.
So let's have a look at what a common attack path may look like. An attacker could steal some credentials and gain cluster access, and from there enumerate the pods. Maybe they notice that there's an operator running in there, and from there they start enumerating what its service account can do and which namespace it is in. If it's in a dedicated one, maybe they go for the kube-system namespace instead. From there, they can leverage the cluster-wide bindings, if it has those, to deploy a malicious container, for instance into the kube-system namespace, and then, bit by bit, start taking over other resources with something like a privileged container. Granted, there are a lot of ifs and buts and other controls that you can apply. A more stealthy attack path is actually replacing some of the operator's code within the image or inside a registry. That could be an internal registry with an angry employee, for example, or a supply chain type of attack.
Now, the functionality of the operator may remain the same, but what you could do then is install a malicious sidecar. That sidecar will be deployed alongside other containers and intercept the requests that come through to get sensitive information, which you can transfer to an adversary-controlled cloud provider or account, for example. It's also worth mentioning that there are several CVEs you can find around operators, and they're not always just about the operator itself. There are things deployed alongside the operator that are affected as well. You can see that with, for instance, the Capsule proxy: essentially, the vulnerability is in there because the proxy is running as cluster admin, which gives you cluster access and, further on, access to the operator and so forth. It just gives you an idea that operators are also vulnerable and are not immune to this topic.
So, how bad is it? This is a review of the official OperatorHub, where all the operators were reviewed for key threats like: what service account permissions are deployed? What security contexts have they got applied? Are there any sensitive cluster role bindings? And are they deployed in a separate namespace? Now, this is for security context. Red is bad: red is basically saying that a specific security context setting is not being applied to an operator at all. But the purple ones are the opposite of what you want. That is, where you would want allowPrivilegeEscalation set to false, for example, it is set to true. That is a bit scary, but at the same time some operators do things like accessing or setting up a container storage interface, for instance, which could justify such settings. Next up are the cluster role permissions. There were a lot that had cluster role permissions. You can see that a few are admin or cluster admin equivalent.
That is, they have access to every single resource and every single Kubernetes API object. Alongside that, surprisingly, about 10% of the operators have bind, escalate, or impersonate permissions in their cluster roles. The breakdown would be that 90% of them have dedicated namespaces, which is really cool. The only issue is that maybe 4% had cluster roles, which almost defeats the point of having dedicated namespaces. A share of them had access to secrets, which we'll take a look at later. And 70% could execute into pods. 58% do not use a security context at all and only 10% drop capabilities. So as you can see, you have to be careful with operators from an external source, and you really have to review whether the settings and the resources match your environment's security guidelines.
So I guess that's the demo part. Let me just switch my windows. Right, so I have a basic Kubernetes cluster here with three master and three worker nodes, and for this example let us just deploy the community NGINX ingress controller. So let me just copy that here, like this. It's deployed in the default namespace and, as you might know, it basically creates a pod with a container, which we can see here. Alongside that, we have some custom resource definitions deployed here, and there's also a service account and a cluster role with a lot of stuff bound to it. Now, there's a useful tool called BadRobot from ControlPlane, which I have already installed on my local machine. It's basically a Kubernetes operator audit tool. It does static analysis: it analyzes a manifest for you, scans the configuration, and gives you some indication, in the form of a score, of how bad a specific configuration, like a cluster role or a certain security context permission, is. As an example, let's just scan the RBAC YAML file from the out-of-the-box NGINX ingress controller deployment, which I have copied here. It looks like this.
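The manifest shown on screen is not part of the transcript. As a rough illustration of the kind of static check described above, the following Python sketch flags a few of the risky RBAC patterns mentioned in the talk (wildcards, secret reads, bind/escalate/impersonate, exec into pods). The finding labels and the simplified matching are assumptions made for this example; they do not reproduce BadRobot's actual rules or scores.

```python
from typing import Dict, List

# Verbs the talk calls out as privilege-escalation risks.
RISKY_VERBS = {"bind", "escalate", "impersonate"}
READ_VERBS = {"get", "list", "watch"}


def audit_rules(rules: List[Dict[str, List[str]]]) -> List[str]:
    """Return hypothetical finding labels for a list of RBAC rule dicts.

    Simplified on purpose: apiGroups, resourceNames and nonResourceURLs are ignored.
    """
    findings = []
    for rule in rules:
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        any_resource = "*" in resources
        any_verb = "*" in verbs
        if any_resource and any_verb:
            findings.append("wildcard")
        if ("secrets" in resources or any_resource) and (any_verb or verbs & READ_VERBS):
            findings.append("reads-secrets")
        if verbs & RISKY_VERBS:
            findings.append("privilege-verbs")
        if ("pods/exec" in resources or any_resource) and (any_verb or "create" in verbs):
            findings.append("pod-exec")
    return findings


if __name__ == "__main__":
    # A rule shaped like the one discussed in the demo: read access to secrets.
    example = [{"apiGroups": [""], "resources": ["configmaps", "secrets"], "verbs": ["list", "watch"]}]
    print(audit_rules(example))
```

Whether such a rule sits in a ClusterRole or a namespaced Role matters a lot for its impact, which this sketch does not model.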
As you can see, there's a service account with a cluster role and all the permissions to the resources and so on. So we scan the YAML file. It basically says the operator service account, or rather the cluster role, has permissions to read secrets in all namespaces, with a score of minus 12, which is actually pretty bad. To be honest, I'm not quite sure if this is actually needed in order to let the ingress controller work as intended. But more on that later. Another way to check that would be with a kubectl command: just describe the cluster role and you can see, okay, we can get, list, and watch secrets. Or you could use the kubectl auth can-i command and basically ask, can I get secrets as the service account of my ingress pod, which should say yes, and we could do that in every namespace.
So now let's assume that an attacker is inside the NGINX ingress operator container and maybe his goal is to deploy a malicious pod or container from there. An option could be that he finds out that he can read secrets with the ingress service account and abuses that to obtain the credentials of another service account, which then has permissions to deploy pods. For example, there are some basic default Kubernetes service accounts, especially in the kube-system namespace, which have those rights. So let's check them real quick. For example, there's the replication controller, which is capable of deploying pods by default, as you might know. Again, you could check that with the kubectl auth command and say, can I create a pod with this service account? It should say yes, and also in every namespace. Now we have our target service account, so to say, to abuse as an attacker. As I said, for this scenario an attacker managed to get into the NGINX ingress operator pod. So let's execute into it like this, and let's see what we have here.
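The permission questions asked above ("can this service account get secrets?", "can it create pods?") boil down to matching a verb and a resource against RBAC rules. The following Python sketch shows that matching in a deliberately simplified form. It is only an illustration of the idea, not how the API server or `kubectl auth can-i` is implemented: it ignores resource names, non-resource URLs, namespaces, and aggregation, and the example rules are hypothetical.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]


def _matches(allowed: List[str], wanted: str) -> bool:
    """A list matches if it contains the wanted value or the '*' wildcard."""
    return "*" in allowed or wanted in allowed


def rule_allows(rule: Rule, verb: str, resource: str, api_group: str = "") -> bool:
    """True if a single RBAC rule permits the verb on the resource in the API group."""
    return (
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
    )


def can_i(rules: List[Rule], verb: str, resource: str, api_group: str = "") -> bool:
    """RBAC is purely additive: access is granted if any rule allows it."""
    return any(rule_allows(rule, verb, resource, api_group) for rule in rules)


if __name__ == "__main__":
    # Hypothetical rules resembling read access to secrets in the core API group.
    rules = [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"]}]
    print("get secrets:", can_i(rules, "get", "secrets"))
    print("create pods:", can_i(rules, "create", "pods"))
```

Because rules only ever add permissions, one overly broad rule is enough to answer "yes", which is why wildcard entries deserve extra scrutiny.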
Of course, there's no kubectl, but we do have curl, and with the curl command we are able to send requests to the Kubernetes API. For that, we will have to use the credentials from the ingress service account to authenticate against the API server. But all that stuff is already mounted inside the container. It should be under /var/run/secrets/kubernetes.io/serviceaccount. You could find that out just by describing the pod as well, but here we have all the stuff we need, like the certificate, our token, and the name of the namespace. For the curl command, we need some variables. Let me just set them: the API server, which is our basic Kubernetes default service, the name of the service account, the name of the namespace, and then our token and the certificate. Our curl command would basically look like this. We want to get all the secrets from the kube-system namespace and then just pipe the output into the /tmp directory, because we can write there. So let's see. Right, now we have all the secrets, and since we want to use the replication controller, let's just grep for it like this. Here we have our secret for the replication controller service account, with the credentials, and now we want to use this to deploy our own images. First of all, we take the certificate, which is base64 encoded, so we just decode it and save it. We do the same for the token of the replication controller service account, like this, so the token is in a new variable here. For our curl command, we want to use a JSON template. This is basically a bad pod: as it says, it wants to be in the kube-system namespace, and it has permissions like privileged set to true and access to the root file system of the host and also the host processes and so on. Now the curl command would look like this. We want to place this pod in the kube-system namespace with the help of the template file, and this should deploy the pod in the kube-system namespace.
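The exact curl commands from the demo are not captured in the transcript. As a hedged sketch of the mechanism described above, the following Python code builds (but does not send) the kind of authenticated request a process inside a pod can make with its mounted service account token. The mount path and the `/api/v1/namespaces/<namespace>/secrets` endpoint are standard Kubernetes conventions; the function name and the `https://kubernetes.default.svc` default are choices made for this example. It can be handy when auditing what a pod's own credentials are able to reach.

```python
import urllib.request
from pathlib import Path

# Standard location where Kubernetes mounts service account credentials in a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def build_list_secrets_request(
    namespace: str,
    token: str,
    api_server: str = "https://kubernetes.default.svc",
) -> urllib.request.Request:
    """Build a GET request that lists secrets in a namespace, using a bearer token."""
    url = f"{api_server}/api/v1/namespaces/{namespace}/secrets"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    token_file = SA_DIR / "token"
    if token_file.exists():
        # Inside a pod: read the mounted token and show which URL would be queried.
        request = build_list_secrets_request("kube-system", token_file.read_text().strip())
        print(request.get_method(), request.full_url)
    else:
        print("No service account token mounted; not running inside a pod.")
```

Actually sending the request would also need the mounted `ca.crt` to verify the API server's certificate; that part is left out here to keep the sketch free of network calls.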
So let's see, let me exit here. Here it is, up and running. Typically, an attacker would place some kind of backdoor into that container. So let's execute into that one in the kube-system namespace and see what we can do. Just a typo, sorry. Right, so now we're in the bad pod. Now we do have kubectl. Let's see, can we become sudo? We can, and we should also be able to access the root file system of the host with chroot, and now we are on the host machine, on some worker node. Let's assume that maybe a developer has a kubeconfig file in his home directory, like this. It looks like we are cluster admin here, and we have all the credentials we need and also the IP of the cluster. To verify that, you could say kubectl config get-contexts and just use this kubeconfig file we found, and indeed we are Kubernetes admin and would probably be able to delete a node, for example. You could also check that with the auth command, where instead of a service account we now use the kubeconfig file, which says yes, we can. So basically an attacker has now managed to take over the whole cluster, so to say.
Right, let me jump back. So how do we start to secure an operator? This is essentially what the CNCF says about operator security. The first and quite big driver here is transparency and documentation. It is essential that you define exactly what the thing is doing and what you are trying to achieve, so that you can then start breaking down what permissions you need, whether it needs to be externally scoped, and so forth. That's a very important thing for an operator to do. And finally, there is the scope: cluster-wide, external, or namespace-wide. Cluster-wide means you have access to the cluster resources that you need, external means cloud resources, and namespace-scoped operators try to respect RBAC permissions as much as possible. And you use a wider scope only if it's necessary.
As you saw from the OperatorHub review, which doesn't contain all the operators on the planet, that is not the case in practice, and quite a lot of them were set up with cluster-wide permissions. Bear in mind that if the operator does have access to external resources, you're going to need to review that as well. They also say that you should leverage AppArmor and seccomp profiles, which are well known across the industry, and, as always, scan for vulnerabilities and consider your supply chain security.
From a prevention strategies perspective, it is worth mentioning that as of Operator SDK version 1.18.1 or higher, there are two security context settings set by default: runAsNonRoot set to true and allowPrivilegeEscalation set to false. If a developer has removed those, you have to question why. It may be bound to the way the thing works, but I think you should really question that one. So my advice would be just to be explicit and careful about the resources and the APIs you want to access.
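As a small illustration of questioning those two settings, here is a Python sketch that checks a container spec (as a plain dictionary, for example parsed from a manifest) for the security context values mentioned above. The function name and the wording of the findings are made up for this example, and the check is simplified: it only looks at the container-level `securityContext` and does not consider pod-level defaults.

```python
from typing import Any, Dict, List


def check_security_context(container: Dict[str, Any]) -> List[str]:
    """Return findings for a container spec whose securityContext looks too permissive."""
    ctx = container.get("securityContext") or {}
    findings = []
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not set to false")
    if ctx.get("privileged") is True:
        findings.append("privileged is set to true")
    return findings


if __name__ == "__main__":
    # Hypothetical container specs for illustration.
    hardened = {
        "name": "manager",
        "securityContext": {"runAsNonRoot": True, "allowPrivilegeEscalation": False},
    }
    risky = {"name": "manager", "securityContext": {"privileged": True}}
    for spec in (hardened, risky):
        print(spec["name"], check_security_context(spec))
```

A missing setting is treated the same as an unsafe one here, which matches the advice to be explicit rather than rely on defaults.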
So try not to do star permissions, and try not to do core API star; that is just being quite lazy. The only issue is that an operator might actually require that in order to work, but you should question that somewhat, especially when you get around the RBAC API, because that's where, if you have a star, you start to be able to bind and escalate privileges. The last bit of advice is to work with the developers to scope the operator to the namespaces that it's supposed to have access to, and not only the namespaces it's supposed to act on, but also restrict what it can watch, which is slightly different. Review the cluster permissions and make sure they're not overly permissive, and do the same for cloud IAM. Also make sure that a Role and not a ClusterRole is defined if you only have a namespaced operator.
A restrictive way to secure operators in your cluster in general is shown here. Bear in mind that you have to check that the operator itself still works as intended afterwards, but in theory it hardens your cluster security when using operators. The idea is to separate the payload in your cluster from the operators. The payload, or applications, should be in a different namespace than the operator: there should be no payload in your operator namespace and vice versa. To further separate the payload from the operator, you can use network policies as well as pod security policies or, as of Kubernetes 1.24 I guess, the pod security admission feature. So, for example, the operator is not allowed to create or deploy other pod security policies, etc.,
or network policies, in its own namespace or in any other. As I have already mentioned, the service account for the operator should be as minimal as possible regarding its permissions as well, so if possible the operator should only be able to access the payload in its intended namespace.
Right, so let us come back to the console. Our NGINX ingress controller operator was deployed into the default namespace, as you might have noticed, and what we want is an operator namespace. So let's just create that one. Let me exit from here and create the namespace, and let's deploy the NGINX ingress controller again, the same thing as before but now in the operator namespace. The first step is to separate it from the workload, and basically what we want at first is to deny all traffic coming in and out, and afterwards allow only the necessary stuff. So for now we use a network policy like this: we say the policy denies all ingress and egress. Let me apply it quickly. Previously it was possible for an attacker to read the secrets with the service account from the ingress pod, as we've seen before, so let's execute that one again. We set our variables and try to read the secrets in the kube-system namespace and, as you can see, we don't get any response here, so the network policy seems to work. Another way to separate or restrict permissions would be to change the cluster roles and cluster role bindings to roles and role bindings bound to the operator namespace. So let's try that one. I already saved the cluster role binding and cluster role before and changed them to a role and role binding, so they are bound to the operator namespace. They should look like this: I just added the namespace in here for the role and role binding, and that's basically it. Let's delete the cluster role binding and the cluster role like this, and create the new role and role binding. Let me try the whole thing again with the curl command and, as you can see, it says
our service account is forbidden to read secrets in the kube-system namespace. Now we should only be able to read the secrets in the operator namespace, since the role and role binding are bound to this one. So this curl command with the operator namespace should now work, and yes, it does: we now see all the secrets from that namespace. The last option would be to deploy a pod security policy. Let's have a quick look at what I've already prepared. It looks like this, and this one restricts almost anything, like privileged set to false, host processes set to false, and so on. This is just for demo purposes, and the operator won't work with it, but let us deploy that one and activate the feature itself. For that we have to edit the configuration, so we just add the feature. Let's get back to our old cluster role and cluster role binding: we delete the role and the corresponding role binding and deploy the old ones again. And since we want to create the bad pod again, we have to delete the old one, as it's running already. That could take some time, but let us move on in a few seconds. So we execute into the NGINX pod again, like this. Now we should be able to read the secrets again and put them in the output file, so let's check that one real quick. Now we have the secrets again, so let's grep for the replication controller again and do the same stuff we did before, like decoding the certificate file, like this, and the same for the token: token equals that one, and cert equals the file we created. I guess we have to recreate our bad pod JSON template since it's missing. Right. And now, finally, we can try to create the bad pod again and, as you can see, it is denied by our policy because of the security context settings. As you can see, there are many different ways to secure operators. So, if you have further questions, please ask.
Perfect, thank you so much. Absolutely lovely demo as well. Let's see, there were a few audience questions and comments that we can take right now, but while we take those
questions, now is the time to ask more, as mentioned, so type those away and we'll get them answered. There was a question essentially on whether these YAML and JSON files will be shared on GitHub, for people who want to try things out themselves.
I guess it would be possible. I can check that and contact you. I don't know yet how to publish them.
Yeah, we have a cloud native Slack channel within the CNCF Slack, so if you decide to share them, you can post them there, for example, and the audience members can find them from there. That would be great. The other comment is that settings such as must run as non-root, privilege escalation false, etc. can be set in pod security policies from a Kubernetes standpoint.
I guess he answered his own question.
Yeah, he said "forgot to mention" afterwards, I think after he had asked that question. All good then, that went well on that front as well. Those were the two questions we got during the session, but let's see if we have any more, so please, audience members, type them away. Meanwhile, I have a question: are there any learn-more resources that you could recommend if anyone wants to dive even deeper into this topic?
Yeah, maybe the white paper on operators itself, if you have time for it. It's really big, but I can give you the link. It should be in some of the slides, this one I guess.
Perfect, we can find it there. Lovely. Let's see, no new questions so far. There's actually another question from me; I always love asking this question, by the way. Do you have any predictions on this space? What's going to happen in the future, and what will be the next steps?
Good question. I'm curious myself what happens next. There's a lot of movement, there's more education in general, and with more use in real production environments I think the security side will also grow. There will be some questions.
Perfect, sounds good. No audience questions so far, so we will start wrapping up. So, final call for those audience
questions. Do you have any final words or notes that you want to share?
Okay, perfect. Well, it's been absolutely lovely, thank you so much. And thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to have a session about creating a paved road for security in Kubernetes operators. I really liked the audience interaction; there were a lot of people saying hi in the chat, so hi to everyone as well. As always, we bring you the latest cloud native code every Wednesday, so tune in in the future as well. We have really great sessions coming up in the coming weeks. Thank you for joining us today, and we'll see you next time. Bye.