Welcome to my talk about hardening Kubernetes by securing pods. Can you hear me now? Okay. Hi everyone. I'm Suresh Deshmukh, and I work for a company called Kinvolk. We do a lot of Kubernetes work: consulting around Kubernetes, writing code for controllers, operators, and things like that. That's my Twitter handle and website; I write mainly about Kubernetes and things around it. A quick show of hands: how many of you have used Kubernetes or done something with kubectl? A lot of folks, okay. I'll quickly skim through the introduction of what Kubernetes is, just so that the people who are not on board get onboarded. It's a container orchestration system started by Google, based on their internal project called Borg, which they use for everything; all their VMs and applications run on Borg as containers. Based on that experience and knowledge, they started this open source project, around the time the Docker project was starting as well. It has a very robust API system, and everything is highly declarative: you define a YAML, push it to the cluster, and the cluster tries to converge to that desired state. Workloads run as pods, and we'll see what a pod is in a bit. All the configuration and state of the cluster is stored in etcd, which is a clustered key-value store. Here is a quick rundown of the distributed components that form a Kubernetes cluster; this diagram is taken from the Kubernetes docs. On the left you can see the things that form a Kubernetes master, because it's a master/worker architecture. The master has the API server, which is like a gateway: everything goes through it, all the components talk to the API server, and the API server delegates the work.
There is the kube-scheduler, which schedules workloads: when you create a pod or any similar configuration, it gets scheduled onto one of the nodes depending on various factors. There is the controller manager; Kubernetes comes with a default set of controllers depending on what kind of workload you're running. If it's batch processing, the controller runs the job to completion; if it's a server-type workload, it keeps it running all the time, and so on. You saw etcd, and then every node runs a daemon called the kubelet. The kubelet is responsible for starting workloads, talking to the API server, and things like that. This talk is very much centered on pods, so let me talk about what pods are. Pods are a set of containers that are always scheduled together on a single node; there will never be a case where the containers that form a pod run on different nodes. They share a network namespace, and they are also allowed to share volumes. In the image here on the right there is a web server serving web content continuously, and the other container is a file puller, and they share a common volume. The web server continuously serves content from the shared volume, while the file puller pulls the latest content from a git repo, so this pod is always serving the latest content. Now you might ask why you would use two different containers; they could be one container as well. But this way the lifecycle of one container doesn't depend on the other: you could have your own custom file puller application, while the web server is stock nginx.
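The sidecar pod just described could be sketched roughly like this; the container names, the image tags, and the nginx mount path are my assumptions, since the actual slide isn't reproduced in the transcript:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-puller
spec:
  volumes:
    # shared between both containers; lives and dies with the pod
    - name: shared-content
      emptyDir: {}
  containers:
    - name: web-server            # serves whatever is in the shared volume
      image: nginx
      volumeMounts:
        - name: shared-content
          mountPath: /usr/share/nginx/html
    - name: file-puller           # hypothetical sidecar pulling from a git repo
      image: example/file-puller:latest
      volumeMounts:
        - name: shared-content
          mountPath: /data
```

Because both containers mount the same volume, anything the puller writes under /data shows up under the web server's content directory immediately.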
Let's quickly see what a pod config looks like. Here you see a pod configuration; like everything in Kubernetes it's YAML, and you'll see a lot of YAML in this presentation. There are two containers: one is the nginx web server we saw before, and the other is the file puller app. The nginx container has the volume mounted at its static content path, and the file puller mounts the same volume at /data. So the file puller dumps data into /data, and the nginx container automatically sees that data at its own mount path. They are using a volume type called emptyDir; Kubernetes supports multiple volume types, and emptyDir is one of them. It's not persistent storage, its lifetime is tied to the pod: if the pod goes down, this volume is deleted. So you use this kind of volume only when you want to share content across multiple containers. Let's get back to the presentation and start with the threat models in Kubernetes. First there are external attacks. As you saw, there are multiple servers running: the API server, the kubelets listening on ports. They all form an attack surface that an attacker can try to exploit; a few releases back there were ports left open from which you could scrape cluster configuration, so that could be one attack vector. Then there are compromised containers or nodes, the attack from inside, which is what we'll be focusing on; I'll detail it in a bit. Then there are compromised credentials, where you push your configs to GitHub and everyone gets access to them.
And the last one is misuse of legitimate privileges, where the RBAC policies are defined so loosely that they are wide open and anybody can do anything on the cluster. So, like I said, let's talk about what the attack from inside is. You could say: we trust our developers and we allow them to do whatever they want, we don't want to restrict them. That's a fine attitude, but without checks in place a misconfiguration can be submitted to the cluster and something can fall through the cracks. For such cases you need checks, so that even if you trust your developers and allow them to do stuff, anything that raises a red flag is brought to notice. Or consider another situation: your developers are using some open source tool or library that hasn't been properly audited, a new vulnerability is found, and you are surprised by the news. Another form of attack from inside: say you are hosting a multi-tenant Kubernetes cluster where multiple clients share nodes. A legitimate-looking client could try to do something malicious, break out of the container, and gain access to the node. Once they are on the node, since multiple pods from multiple tenants share that node, other tenants' secrets can be exposed as well, and those secrets could be AWS credentials, for instance. So these are the various forms of internal attacks. Let's see what the current state of container and Kubernetes security looks like.
This is the definition of a secure default: the out-of-the-box experience of any configurable software should be secure by default, and users may then be allowed to opt out. Basically, they are allowed to screw things up themselves, but what you ship by default is secure. What happened over the lifecycle of Kubernetes development is this: Kubernetes is highly configurable, you define YAML and flags and so on, and the defaults defined early in the project were not secure enough. The secure options exist now, but not everyone updates to them, because people have built applications assuming those defaults won't change, and changing any default is a breaking change that affects everyone. So the secure settings are now not the default but opt-in for the user. That's the problem. Also, user namespaces are a very new thing and not everyone is using them, though they would help secure things. And a lot of people are running as root inside containers, which is a huge no-no. Things are simply not secure by default; people have to opt in. So let's dive deeper into what UID 0, root inside the container, means. What you see here is a Dockerfile; this is how you build a container image. You start from a base image, here fedora:30, and the process that starts in this container is sleep: it will just sleep and won't do anything.
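Reconstructed, the two Dockerfiles on the slide would look something like the following; the base image and process are as described in the talk, while the exact sleep argument and the UID are my guesses:

```dockerfile
# Left-hand image: no USER directive, so sleep runs as UID 0 (root)
FROM fedora:30
CMD ["sleep", "infinity"]
```

```dockerfile
# Right-hand image: USER switches to a non-root UID before CMD runs,
# so sleep starts as UID 1000 with fewer permissions
FROM fedora:30
USER 1000
CMD ["sleep", "infinity"]
```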
Now, the difference between the left and the right container image is that the sleep process on the left starts as UID 0, that is, as root, while the one on the right starts as UID 1000, non-root, with fewer permissions. As Dan Walsh, the SELinux guy, likes to say: containers don't contain. That's kind of true, because containers, Docker containers specifically, are just using a bunch of Linux kernel technologies to try to contain various aspects of a process: PID namespaces try to contain the processes, user namespaces try to contain the users, and so on. It's not like VMs. VMs are actual sandboxes because they come with their own kernel, so compromising a VM is hard: you have to find a vulnerability at the virtualization or hardware level. For containers you just need to find one kernel vulnerability, and the isolation goes for a toss. You might say that hardware vulnerabilities are more common now, because every few months we see a new one, but statistically speaking, kernel vulnerabilities are still more frequent. Talking about recent vulnerabilities in this space: how many of you know about runc? runc is a lightweight container runtime that came out of Docker. They abstracted out the code that is the container runtime behind Docker, donated it, and from that implementation the spec called the Open Container Initiative was created.
Now, anybody who adheres to the Open Container Initiative spec can plug in, and runc is one such runtime. It's used by Docker, it's used by CRI-O, so the majority of the container space runs on runc. The vulnerability was this: if you are root inside a runc-based container, you could overwrite the runc binary with whatever you like and thereby gain access to the host. This was a major vulnerability, and the basic mitigation was that if you are not root inside the container, you are safe. Speaking of root inside containers, I did a survey. If you go to the Docker Hub marketplace, you see a bunch of widely used applications: frameworks like Kafka, messaging services, web servers, and so on. Of the official images for these applications, roughly 80 of them, about 82% run as root and only about 17% define a non-root user. That means only about 17% of these widely used official images start as some user that is not root, like the example we saw earlier. It's a really bad situation, because if the official images don't give you secure-by-default behavior, users obviously won't make the effort themselves; they just want to consume the images and deliver applications within their deadlines. That's the situation right now. So what can you do about root inside the container? That's where pod security policies come in. A pod security policy is a configuration that you define in Kubernetes. There are two scopes of configuration in Kubernetes, namespace-level and cluster-level, and this one is cluster-wide. It enables fine-grained authorization of a pod at creation or update time.
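Pieced together from what is described in the demo that follows, the restricted policy would look roughly like this; the exact ranges and the volume list are assumptions on my part:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    # bind the runtime's default seccomp profile to every container
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities: [KILL, MKNOD, SETGID, SETUID]
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: MustRunAsNonRoot      # containers may not start as UID 0
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: MustRunAs
    ranges: [{min: 1, max: 65535}]
  supplementalGroups:
    rule: MustRunAs
    ranges: [{min: 1, max: 65535}]
  volumes:                      # note: hostPath is deliberately absent
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
```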
So basically you can define which UIDs are allowed inside a particular container, which filesystem groups are allowed, which capabilities you grant to that container, and so on. Let's see what a pod security policy looks like. What I have here are two clusters, both with three nodes. The upper one has pod security policy enabled, and the lower one has no pod security policy; you have to boot the cluster with pod security policy support enabled, which is why I have two clusters. Down here you see the second cluster has nothing, and up here you see a pod security policy defined called restricted. It's called restricted because it's a fairly strict one. Let's look at the YAML. This particular pod security policy has seccomp configured in its annotations. If you don't know what seccomp is: by binding a seccomp profile to a process, you define which syscalls it is or isn't allowed to make. Docker comes with a default seccomp profile that blocks a bunch of things that would otherwise let you gain access to the host and do nasty stuff on it. In this pod security policy I have said: use the default profile. You can have other profiles as well. Down here you can see the FS group information, and you see that allowPrivilegeEscalation is false.
What that means is that under this pod security policy, a user cannot start a pod with privileged access or share the host's namespaces. Normally, with docker run, you can pass --privileged and share the host's PID or network namespace; with this pod security policy in place, if a pod requests that, it won't be allowed. You also see the capabilities that are required to be dropped: if a container tries to gain one of these capabilities, it won't be allowed. The runAsUser configuration says that when a container starts inside the pod, it must be non-root; it cannot be UID 0. The policy also lists the allowed volume types. Like I said before, Kubernetes has many volume types, and only the ones listed here are allowed in pods started under this policy. So what we'll do is start a deployment here. First let me show what this command generates: it tries to start an nginx pod with the name web, and it's a dry run because I just want to show the specification. There is one container with one replica, the image is nginx, and it's called web. Now let's actually create it. In the cluster with no PSP, I run the command and the deployment is created; when the pod status says ContainerCreating, it means the pod request has been accepted and the kubelet is now going to create the pod, pull the image, and so on. In the cluster with PSP enabled, I ran the same command. It also says the deployment was created, but if you look at the pods, the status is CreateContainerConfigError, which means the pod request is not allowed, or is still waiting. Let's see what's wrong with this particular pod.
Like I said, every resource in Kubernetes has a YAML associated with it. This pod has a field called container statuses, and there it says the container failed the run-as-non-root check. What we did was start the default nginx image pulled from Docker Hub, and it doesn't have a non-root user defined in it, while in the policy we said it must run as non-root, meaning it should start as a UID that is not 0. This image clearly starts as UID 0, which is why it will not run: the pod security policy blocked it for us. And notice that all the constraints from the pod security policy we saw earlier have been added to this pod: in the dry run the spec was very small, but now the spec has grown, because all that configuration was augmented into the request that went to the API server. You can see the capabilities being dropped here, KILL, MKNOD, SETGID, and SETUID, and runAsNonRoot set to true; that is the conflicting constraint Kubernetes found, so it blocked the request. Also, since you can have multiple pod security policies enabled in a cluster, the one that accepted or rejected a request is shown in an annotation. So this request didn't go through, and the pod security policy helped us stop it. I have one more demo. This is something no one in their right mind would actually do: it's basically running rm -rf on the node from inside the pod you're running. Let's see what it is.
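The deployment used in this second demo mounts the node's root filesystem into the container. A sketch, with the names, image, and command being my assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: host-mounter
spec:
  replicas: 1
  selector:
    matchLabels: {app: host-mounter}
  template:
    metadata:
      labels: {app: host-mounter}
    spec:
      volumes:
        - name: host-root
          hostPath:
            path: /               # the node's entire root filesystem
      containers:
        - name: shell
          image: fedora:30
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: host-root
              mountPath: /host-path   # browse the node at /host-path
```

Under the restricted policy this is rejected, because hostPath is not in the policy's allowed volume list.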
Like before, I have two clusters, one with PSP and one without, and both have one master and two worker nodes. Here is a deployment config file which I'll run; it tries to mount the root of the node, wherever this pod is running, so the node's / will be mounted inside the pod's container at /host-path, and I can go inside that path and look around. I create the configuration in both clusters, and the deployment is created in both. If you look at the pod in the no-PSP cluster, it says ContainerCreating; like I said earlier, that means the pod request went through, the image will be pulled, and the pod will start. But in the cluster with PSP it didn't start, so let's see what's going on. Those familiar with Kubernetes know that when you create a deployment, it creates another object called a replica set, and the replica set takes care of starting the pods. Here the desired count is one and the current count is zero, meaning it didn't reach the state it wanted. Looking at the YAML again, the conditions show that hostPath volumes are not allowed to be used, while we were asking for a hostPath volume. None of the pod security policies in this cluster allows hostPath, hence this pod couldn't start. If you remember from before, we were only allowing a specific list of volume types, and hostPath is not in that list, so this request was denied by the Kubernetes cluster. We were trying to mount the root of the node, and the pod security policy saved us from doing it.
But in the cluster with no PSP, the pod is running fine. Now I'll try to do some evil stuff. Not everyone will do something this stupid, mounting the entire root of the node inside the container, but for the sake of the demo, why not. You can see the pod is running on node w0; like I said, there are two nodes, w0 and w1. I'll exec into this pod and go to /host-path, which, as I said before, is the root of the node mounted inside the container. Just to show that changes made from the container are visible on the node: I have a session on the node itself, I go to its root, and I can see that the file created from the container is available there. Going forward, we'll just delete this whole thing, basically making this node unavailable to the cluster. So, deleting everything, including all the files currently in use; where we saw a full root filesystem before, now only a few things are left. You can see I exited, and it still shows the pod as running, because Kubernetes runs a reconciliation loop: the API server keeps trying to talk to the kubelet, and only after the kubelet fails to respond for a certain period will it declare the node unavailable. Here I'm trying to exec into the pod again, but it says no such container; it tried to run the nsenter command to bring up a shell inside the container, but that didn't work. And on the host itself, ls doesn't work anymore, pwd doesn't work.
So basically what I did was make this node unavailable in the cluster; any workload that was on this node will have to be rescheduled onto another node. Since in the meantime the pod wasn't automatically rescheduled, I manually deleted it, and it started on the other node. I'll do the same stuff there as well and make that node unavailable too. If you've noticed, this was a two-node cluster and both nodes are now gone, so anyone sharing this cluster with me has lost their workloads. Of course, any attacker who is smart enough won't do such a stupid thing; they'll silently try to get at the secrets. For any pod scheduled on a particular node, the secrets mounted inside its containers are also available on that node. So an intelligent attacker would just grab the secrets, the AWS credentials or whatever other configs are in there, and walk away. So even though this was a stupid demo, you can see its implications could be really bad. The last thing you see here is that with both nodes gone, Kubernetes keeps trying to bring the pod up again, and you see a bunch of errors that it cannot; the API server still has to figure out that the nodes are unavailable. That's the kind of attack that is possible if you don't have a PSP. Now, to talk about where PSP sits in the whole process: PSP is what's called an admission controller. Any request that comes to the API server goes through three phases. The first phase is authentication: whether you are allowed to talk to this cluster at all.
Once you get past that phase, authorization kicks in, where RBAC and other policies decide whether you are allowed to create a pod or make whatever other request you're making. Then come the admission controllers. The pod security policy is a mutating admission controller, which means, as you saw in the request earlier, the YAML was very small but then it got expanded: this controller added the defaults from the policy, dropping capabilities, disallowing root, and so on. Once the request passes through these phases, it is persisted. So what's the state of PSP in the Kubernetes space right now? I picked the four most used public offerings. On Google Cloud you can enable it, but anything with PSP enabled is a beta feature, not GA, and any problems with it are not covered by support. On Azure's Kubernetes service it is not available yet. On Amazon, it was announced just two days ago for their 1.13 offering, so PSP enabled by default there is very new. And OpenShift Online has pod security policy protection enabled by default. Now, how many of you know about Helm or have used Helm? For those who don't: Helm is a packaging tool. Kubernetes involves a bunch of configuration, and you cannot just hand someone a huge YAML and say "deploy this"; it's neither configurable nor comprehensible for a normal user. Helm makes it easy to write a bunch of configuration, package it, distribute it, install it on your cluster, and upgrade applications.
There are Helm charts for all the usual applications in the Helm repo, and every chart goes through two stages: the incubator stage, and then graduation to stable. If you go through the Helm chart repo, of all the stable charts only 8% have pod security policies defined in them, and in the incubator repo only 3.6% do. So it's a very small number of charts that support this. What can we do to improve this state? I think talks like this, more information about the importance of this feature, and more demos are needed. Also, what I've seen in my experience is that since pod security policy is restrictive, default setups might stop working when you just enable it. You'll have to roll the feature out carefully, because there might be images running as root, or workloads doing hostPath mounts, and you'll have to write pod security policies specifically for those applications. So it's gradual work: Docker images need work, Helm charts need work. And since these are security practices, they should be incorporated from dev time onward. Generally security is just an afterthought; it isn't given an iterative approach. It's "we need to deploy, let's see if it works, we'll look at security later", and with that mentality security never comes in. As a previous speaker here said in the Kubernetes introduction, security should be given an iterative approach and not a waterfall one, and that really resonates with me. Pod security policy is just one aspect of the security features Kubernetes gives you. You should always do security in layers, so that even if one layer breaks, the others hold strong.
There are other features, like network policies. Any attacker who gains access to a container shell won't just stay there; they'll try to talk to other things, to databases, and see if they can pull out some data. Network policies help you restrict that. Then there are secure image building practices, like not allowing root inside the image from build time itself. Audit logging is a Kubernetes feature which, when enabled, records all the requests; you can configure how much information is logged, which requests, which responses, who did what, and you can do this audit trailing later if you hit a problem. Avoid mounting the service account token: a service account secret is mounted into every pod that starts, which lets the pod talk to the API server from within the cluster. If your application doesn't need to talk to the API server, you can say so and the token isn't mounted, and then an attacker who gains shell access can't do anything with it. RBAC should be heavily scrutinized: don't allow everything to everyone. It's a very powerful feature and should be used with caution. Another option is to use containers that actually contain, which are VMs running as pods; they don't share the host kernel, they bring their own. You could use Kata Containers, or gVisor, which isn't really a VM but a project from Google that adds its own layer: the containers don't talk to the kernel directly, they talk to an intermediate layer, and those requests then go to the kernel. And another one is DenyEscalatingExec; pod security policy is one admission controller,
and this is another admission controller you can enable. With it, if a container or infrastructure pod is running in privileged mode, a user cannot just go and exec into it to gain shell access. So, that's it: here are a bunch of references, and thanks. I'd like to take questions.

Q: Thanks for the awesome talk. Just a quick question: can PSPs be configured as cluster-wide policies and also namespace-wide?

A: A PSP itself is a cluster-wide policy. You grant access to it using role bindings and cluster roles.

Q: Hi, Ankit here. I wanted to ask about distroless images, which don't even have bash in them. Is that more secure than this approach?

A: The more security features you have, the better. Using scratch or distroless images, without bash, without anything, is always good, yes.

Q: Hi. You mentioned that on Docker Hub around 82.6% of images have root as their primary user and around 17.6% don't. I was curious how you arrived at that stat.

A: It's not a comprehensive number; I've put the numbers in a GitHub repo, it's in the references. I picked the most used frameworks on the Docker Hub marketplace. Going through the entire Docker Hub isn't possible, there are millions of images there, so it's just the most used frameworks on the marketplace.

Q: You mentioned you can have multiple PSPs. How do they interact with each other? If I try to create a pod, is it denied by default?
A: Basically, you create a cluster role which allows the use of a pod security policy, and then you use a role binding to bind it to particular service accounts. Pods running under those service accounts are then scrutinized under those pod security policies. Otherwise you'll get an error that no pod security policy allowed the pod, and the pod won't start. So before enabling it, you have to have those configurations in place, and only then start the cluster with that admission plugin.

Q: One last question. I've seen some containers that still need privileged mode to run, for example some TensorFlow containers. If I have a cluster-wide policy forbidding privileged mode, what's the workaround?

A: The workaround is that you write pod security policies specifically tailored for those applications, and don't make them usable by the normal containers that don't need that privilege. You can have multiple pod security policies, and you enable the privileged one only for that particular TensorFlow application.

Q: So it's bound to a particular namespace, via binding to a service account?

A: Yes. Okay, was that the last question? Oh, yeah. Thanks.
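The RBAC wiring described in the multiple-PSPs answer can be sketched as follows; the policy name, namespace, and binding names here are placeholders, not from the talk. A ClusterRole grants the use verb on a specific policy, and a RoleBinding attaches it to a service account:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-restricted-psp
rules:
  - apiGroups: [policy]
    resources: [podsecuritypolicies]
    resourceNames: [restricted]    # the PSP this role grants access to
    verbs: [use]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restricted-psp-binding
  namespace: my-namespace          # placeholder namespace
subjects:
  - kind: ServiceAccount
    name: default
    namespace: my-namespace
roleRef:
  kind: ClusterRole
  name: use-restricted-psp
  apiGroup: rbac.authorization.k8s.io
```

Pods created under that service account are validated against the restricted policy; pods whose service account has no usable policy are rejected with the "no pod security policy allowed" error mentioned above.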