All right, I think we're all set here. Hi everyone, thanks for joining us today. Welcome to today's CNCF webinar, Kubernetes Security Controls and Enforcement: Applying Lessons from the Kubernetes Security Audit. I'm Chris Short, Principal Technical Marketing Manager at Red Hat on the OpenShift team. I'm also a Cloud Native Ambassador. I'll be moderating today's webinar. A few housekeeping items before we begin. During the webinar, you're not able to talk. Sorry, but there is a Q&A box at the bottom of the screen. Please, please, please use that. Drop in questions. There is no such thing as a stupid question. There's no such thing as a stupid person. This is all hard stuff. Ask away. Also keep in mind this is an official CNCF webinar. As such, the CNCF code of conduct is in place. Please do not add anything to the chat or questions that would be in violation of the code of conduct. Basically, please be respectful of all your fellow participants and presenters. And with that, I would like to welcome Connor Gilbert, Senior Product Manager at StackRox. Take it away, Connor. Hi there. Welcome. As Chris mentioned, this talk is about Kubernetes security controls and enforcement, and especially about how you can apply lessons from the Kubernetes security audit. Now, who's this? Well, I'm Connor. I'm a Senior Product Manager at StackRox, and I used to be an engineer, so don't hold the product title against me. I happened to stumble into Kubernetes in 2015, back when it was a little bit different from how it is now, introduced it at one company while I still worked there as an engineer, and then moved on to start building container and Kubernetes security products after that, because I saw just how good this could be for security, the job that I was doing. Now, what do we cover today? Well, we'll start with what happened in the security audit: a little bit about the process, some key takeaways, and selected results. Then we'll cover some of the advice from the audit, including some native Kubernetes controls that you can leverage for better security. We'll talk about how security responsibility is shared across the stack, what's new in containers and Kubernetes and what's not, and default configs versus the ideals. And finally, once we've covered the controls, we'll talk about how to design a Kubernetes-native security strategy so that you can enforce security controls and practices without sacrificing the velocity that you've come to enjoy with Kubernetes, OpenShift, and related technologies. After each of these sections, we'll stop for questions, so if you do have a chance to put something into the Q&A, we'll try to handle some of those as we go. So let's get started. What happened in the security audit? It made a lot of headlines. There was a lot of news coverage about it. But what actually happened here? Well, maybe it's best to start with what came out of the audit. There were four main documents, and let's cover those in succession. First is the security report. This is the part that most people have maybe read, or read a story about, and it's the one that focuses on Kubernetes product security itself: security bugs inside of Kubernetes, problems in the code base, the documentation, things like that. This is where some of the headlines about Kubernetes vulnerabilities came from. But there were three other documents, and I think they're all quite interesting, and they have some unique value that we'll explore throughout this session.
The next is a threat model. That describes some key components of Kubernetes and how effectively they're secured, so it applies a structured approach to analyzing the system as a whole. It goes through, piece by piece, parts that I hadn't even heard of, because I don't take as much of an administrator role in managing Kubernetes; I'm more on the app dev and security side. There's also a white paper, which explains important aspects of how Kubernetes is designed, and it recommends some actions that you can take as an operator, as a developer, or as a security person responsible for some Kubernetes-based infrastructure. And then the last one is a threat guide, which goes over how to secure or attack a Kubernetes cluster, with a sort of penetration tester, red team, blue team kind of vibe. So four big documents, a total of 241 pages. And I expect you all to have read them before you joined. No, just kidding. Obviously, not everyone in the world has read all of these, although they are very rich, and I'd recommend that if you want to spend some time seeing what a good test result looks like, do take the time. So how did we get these documents? Well, logistically, there was a Security Audit Working Group. They solicited some proposals from the community. A number of vendors were invited to bid, and eventually the group selected a winning bid: a team of Trail of Bits and Atredis Partners. The main part of the assessment ran March through May, against Kubernetes 1.13, and the results came out in August. So that's when we got to start churning through those 241 pages of outputs. And not even just those, because this was an unusually open audit. Now, CNCF has sponsored audits of other projects, like CoreDNS, Envoy, and Prometheus, and I understand there are plans to sponsor more, which is a great trend. But when I said unusually open and lots of material: the team actually published a lot of the notes from the security audit, including some of the interview notes from when they talked to different parts of the community about how different components worked. Those are some really interesting details that you can find published on GitHub. And then also, unusually in some contexts, all security findings were released publicly. Issue 81146 on github.com/kubernetes/kubernetes is where you can find that; it's a master issue that links to all the issues being covered. But beyond the context and what PDFs came out of it, let's talk about the takeaways. As I mentioned, CNCF has sponsored this audit and some others. What this really shows is that the Kubernetes project and CNCF are investing in product security, that the community is getting serious about this. You've seen the processes and everything getting worked out through some of the recent vulnerability disclosures. But really, this is another way of investing in product security, because the community realizes how important Kubernetes is to the cloud native stack. Second, the security audit did identify a number of security issues in specific Kubernetes components. This is probably what you might have heard if you read any of the press about this. The severities range from informational up to high severity. There are improvements recommended in various areas of the product, which we'll cover. There are GitHub issues for each of those, and some of them have already been fixed.
In fact, some of them were fixed by the time people even read the report. Now, configuring Kubernetes can be complex as well. That's the third takeaway, and it's sort of the undercover part of this audit report. To be secure, you do need to take a lot of steps to protect your infrastructure and applications. But they're not as mysterious as we sometimes might think. There is a lot of really specific advice inside these audit reports that we can use to get more secure as a community. Let's go over the product security findings. First, we'll cover an overall assessment from the security report and then some of the details, before we move on to those native controls that we can use to secure our own workloads. There's one paragraph; I've broken it up into pieces so it's easier to digest. The penetration testers wrote that, overall, Kubernetes is a large system with significant operational complexity. And if you've administered a cluster, you already know that. The assessment team found configuration and deployment of Kubernetes to be non-trivial, with certain components having confusing default settings, missing operational controls, and implicitly defined security controls. And the state of the Kubernetes code base has significant room for improvement: the code base is large and complex, with large sections of code containing minimal documentation and numerous dependencies, including systems external to Kubernetes. Now, I think it's important to recognize that this is a measured, honest look at the state of the project, and that this isn't an adversarial thing. This is part of the community process. But it does highlight some of the topics that we'll go through. Now, let's go through some of the themes of the issues. One of them is parsing problems. Certain handlers do have to take user input; otherwise, Kubernetes is just sitting by itself doing nothing. But some of those handlers could overflow or run out of memory. There are three findings there. Those TOB-K8S-numbered findings are things that you can search for in the audit report to get all the details; you can also find them on GitHub, because the issues are tagged that way. This is also an area that's seen more security coverage lately with the billion laughs attack, a vulnerability where a specially crafted YAML document could exhaust memory on the API server. Another kind of change: documentation. There were some recommended edits, for example around encryption settings: deemphasize this one, suggest more secure settings, that kind of thing. As well as documenting that if you have a pod security policy with host path limits and you mount storage through a persistent volume claim, the limit is not going to be enforced, because pod security policies only look at the specific object. That was a known part of the system design, but it needed to be documented better, and that's what was done in response to that finding. Then there are some insecure deprecated features: things that came along early in Kubernetes' lifetime but still exist. The recommendations were to turn them down, deprecate them further, or just finally remove them, and there were some problems with how they worked: a non-constant-time password comparison, passwords being stored in clear text, that kind of thing. It's important to know that those features weren't things you were likely using, but they did exist, and the audit did find them.
There were some information leaks, some verbose logs: if you bumped the log verbosity up to, I think, seven out of ten, you could get secrets into logs in specific subsystems. Those were fixed pretty much right away. There was a bug in CoreDNS allowing a zone transfer, which could leak out your service information, and some sensitive environment variables were used on the host. Then some dangerous designs, where features are really working like they're supposed to, but the audit realized they could be abused. Maybe people knew this before, but it's important to note that things like readiness and liveness probes, because of the context in which they operate in the host networking stack, can be used to try to figure out more about the cluster. And then some feature requests: some longstanding ones like seccomp, where there's been some movement recently, and also some TLS certificate revocation issues. It's important to know that some of this stuff had come up before and some of it hadn't, and it's gone through the normal process with the community. You can see all of that on GitHub. So that's it for the section on the audit findings. Chris, I don't know if there are any accumulated questions about this section, or we can move on to the native Kubernetes controls. Keep on going. All right, let's keep going. So this is the part beyond the facts of the report: some really interesting analysis about how you, as an operator, as a developer, as a cluster administrator, can use Kubernetes to be more secure. We'll cover more about the overall context, but first we'll dive into security responsibilities and specific controls before we talk about how you can enforce them and get an overall better security posture. You know, it's almost cliche in a security talk to talk about shared responsibility, but I do think it's important to note this. The audit goes out of its way to highlight that there are different responsibilities involved in securing a Kubernetes cluster. That ranges from the application developers, the folks doing application-level operations, making sure the apps are up. It also includes the infrastructure: the nodes, if you're using a standard node-based Kubernetes cluster, or a managed service if you're asking someone else to manage the control plane for you. And then the security functions: your risk, security, and compliance teams. As a business or a large organization, those are real stakeholders and teams that will have to be involved to get your overall posture right. So that's the organizational structure, maybe, but we can also swap to the security surface. If those were the people and teams involved in security, we can swap over and look at the security surface that you need to pay attention to. On the left again, we've got the application, and that usually starts with application security. If you have an insecure application and you put it in a container, it is an insecure application in a container. The container does not solve that problem. If you have a vulnerability in your code, or a vulnerability in a library like Struts or something, the container doesn't help you. But once we get past that, there are some interesting patterns and things that we can do, especially because we're using Kubernetes. And that includes the deployment configs.
Kubernetes, as you know, has declarative specifications of how your application can operate, and we'll cover some of those in more detail, but they do let you lock down your application a lot more, in a much more standard and visible way than you might have had before. The other thing about cloud native design is microservices. Microservice interactions, depending on whether you were using microservices before, could be a new attack surface, or a new security surface to concentrate on, in this Kubernetes-based stack. Then again, in the middle, we've got the infrastructure. Kubernetes API access: the API is actually going to be a pretty significant security surface to look at. And the nodes still do matter. Finally, on the right side, with the security, governance, risk, and compliance area, you're going to have some security configuration up front as well as some reactive monitoring. So it's important to think about everyone who's involved. This isn't just a developer problem. This isn't just an ops problem. This isn't something you can make someone else do. The only way to do this right is to have everybody use Kubernetes as a common language, where you can meet and understand what's going on, and use the configs to your benefit for better security. With that in mind, let's cover what's different in Kubernetes and what's the same. Let's talk about threats. So, apps. On the internet, nobody knows you're a dog; sorry, nobody knows you're running in a container. That's something we've found in some research that we've done here at StackRox, but it also kind of makes sense. A container is a Linux process. From the outside, you might be hard-pressed to figure out that it is a container, and if you're poking around inside, well, you've already gotten in. So from the outside, the threat posture is actually going to be fairly similar. On the infrastructure side, you do have something new: a really, really powerful API server to protect. Depending on how you were doing your deployments before, this might be new, or it might just be replacing your old VM control plane or something. But it's important because it's new, and it's often exposed on the internet in a way that maybe wasn't done before. Let's talk about security workflows around those same two buckets, apps and infrastructure. For apps, the cool thing is, while on the outside nobody might know you're in a container, you do, because every app you've deployed has an immutable SHA-256 digest. So you know exactly what it is. And if you've ever tried to do inventory: if someone came to you 10 years ago and told you you'd know, down to the byte, exactly what was in every piece of software deployed in your cluster, you'd be pretty pleased. The apps are configured with declarative specs. So you've got YAML files; everyone in this community has learned to wrangle the YAML and figure out the spaces and tabs. But all that learning curve is worth it, because it gives us a really clear, declarative spec of how the app will work when it comes up. Typically you're not manually messing with your apps as much, and if you are, I'd suggest that you wean yourself from that practice. So there's much less of this "oh, I'll just SSH into the VM" business that probably messed things up.
And then apps are ideally built to be failure tolerant, in this microservices-oriented, cloud native design, which means that you can take different actions when you have a security problem. Let's talk about infrastructure. I mentioned a new, powerful API server to protect, but that API is also a central place where you can configure a lot of the important security settings that you need. I kind of call this "software-defined everything." And then I looked it up and found out that analysts have been using that term for, I don't know, five years. But the cool thing about Kubernetes is that a lot of this is software, so the configs are right there for you to look at. If you've got your entire networking fabric as a mesh, say it's Calico or something underneath, you're not beholden to exactly how your physical topology is set up. And the infrastructure is a critical foundation. It's really important to keep the infrastructure secure, but it's a little more strictly separated from the apps, because your apps are built to be deployed using the YAML files and to really not care where they are. They shouldn't be using host mounts in general, that kind of thing. So your infrastructure is a bit more cookie-cutter, a bit more strictly separated, and that can be a benefit for security as well. So let's talk about that API and infrastructure security segment of your security posture. Here's an excerpt from the security white paper from Trail of Bits. We started with the security report, which is mainly about the Kubernetes project itself; we've moved on to the security white paper now, because we're talking about security practices. So this is saying, well, Kubernetes facilitates high-availability workload deployments, all that stuff that you love. The underlying hosts, components, and environment of a Kubernetes cluster must be configured and managed, and this management has a direct impact on the capabilities of the cluster and affects the behavior of an operator's composed objects. Some dense stuff there, but the examples they go on to discuss are that your cluster has to be deployed in a way that, say, network policies will work: you need an underlying network provider that supports those policies. And if you configure different parts of the API server the wrong way, you might be bypassing entire mechanisms like RBAC. That's what they're getting at. And the composed objects, we'll see more of that in the RBAC discussion. In terms of basic API and infrastructure security, one of the most important things, or maybe the easiest to add, is just controlling network access to the API server. I mentioned earlier the billion laughs attack; I can never remember CVE numbers exactly, but it's the one you've heard about recently. Oh, and I even put it in the slides: CVE-2019-11253. You would have seen that recently if you've been keeping up to date. It affected more or less all Kubernetes API servers, many of which are exposed on the internet. So one of the easy ways to get more secure is to consider that network access a little more carefully. Another example of this is RBAC, which we'll cover in more detail later. That's role-based access control for the Kubernetes API itself: controlling who can do what in Kubernetes. Access control for nodes.
So one of the things highlighted in the report is that you do need to manage who can get onto your nodes, because if anyone can get there, then anyone is root, especially because Kubernetes has to trust certain components of the underlying infrastructure. And then some last bits here: just standard Linux node hardening. I'm not a Linux security expert, more of a container and Kubernetes guy, but there are plenty of practices, including the CIS benchmarks and things like that. Your advantage here is that you have these limited-purpose, cookie-cutter nodes that really just exist to run a container runtime, run the kubelet, and then run your apps. They typically aren't going to be doing lots of other stuff; maybe some infrastructure work like logging, but nothing too complicated. So if we talked about infrastructure and the API first, well, what's left? Everything else: pods. The attacking Kubernetes guide, which is the pentester-oriented guide, talks about how to break into Kubernetes clusters. They point out that compromising a Kubernetes cluster often doesn't actually start with an API problem or a vulnerability there. Because there's one API server and maybe a million apps in the cluster, it's often the app. So compromising a cluster often first begins with compromising a lower-privileged pod, and the secure configuration of pods is an often-overlooked aspect of the system. Basically, what they're saying is job number one is to keep the adversaries out of your pods. And then when they start giving some more security guidance, there's a second job: keep them in one place. So if you want to keep your cluster secure, yes, make sure you do the API and infrastructure security stuff; that's the basics. Make sure that nobody can log in and change what's running. But then, once your apps are running, they do have some interfaces to Kubernetes, and it's important to both defend each pod and put a sort of wall around each pod, so that attackers don't escape out of that pod into the rest of the infrastructure. And when you're doing this, defaults aren't enough. Here's another section from the security white paper: Kubernetes contains many default settings which negatively impact the security posture of a cluster. These settings also have conflicting usage semantics, where some use either opt-in or opt-out specifications. And they have a little empathy; they say the conflicting usage generally boils down to the preservation of backwards compatibility for both workloads and component configurations. One of the things the community has been good about is making sure that upgrades typically don't have problems, and that means you can't have too many surprises when you do an upgrade. You can't have a security feature magically turn on that you weren't expecting. And that does mean that some things are disabled by default; we'll get into specific examples later. So now back to the quote: ensuring appropriate configuration of all options requires significant attention by cluster administrators and operators. I want to make sure that we're not being too hard here. It's a complex system, and there's a ton of stuff getting deployed. So this is really about empowering you to know what kinds of things to look for and how you can use Kubernetes to your advantage, so that, say, with those default configurations, you can make an informed choice about whether to adopt them. So let's start going through the parts of a pod.
First, we'll go through image security. This is something the attacking Kubernetes guide mentions in detail, because they cover pods a little bit more. Quoting: file systems often contain bash or package managers that further enable an attacker to gain a shell and install additional tools. An ideal installation should remove all non-essential binaries and prevent modifications to the binaries that are required. For cluster administrators, care should be taken that vulnerable applications and pods are patched as soon as possible, so that internal attackers may not gain an initial foothold within the cluster. There they're referring to the personas they used throughout the assessment: internal attackers, external attackers, things like that. So basically: look, we want to make sure that our pods can't be abused to get into our cluster. Well, one great way is to make sure that even if there's a vulnerability in the application, there's nothing there for the attacker to use. If we can have fewer things inside the containers, then that's more work that we force every adversary who gets in to do. And if we have a package manager, well, that may encourage us to do bad things at runtime, like installing more software into the container, but it also makes it really easy for an adversary to do the same thing. If they get a command injection, say through the Struts vulnerability that everyone thinks about, and they can just apt-get install whatever they need, you've made life so easy for them. So it can be a lot better: if you don't need something, just remove it. One of the things we always suggest is, after you've finished installing stuff in your Dockerfile, or in whatever other way you build an image, just uninstall the package manager. The package manager sometimes needs an extra flag or two to convince it that you really do want it to remove itself, but it's possible to do. So that's the image security bit. And then, as that second quote gets at, vulnerabilities are sort of an own goal. Now, the way to fix those obviously isn't to just give a big list to some team and say, fix them all. But particularly if something is high severity, and particularly if there's a fixed version available, there's really not a great reason to keep running it like that. So the more you can get in the rhythm of having teams regularly understand the outputs of an image scanner, and there are plenty of open source and commercial options, the more you can get in that rhythm of understanding what's problematic and what can be fixed, and then rolling fixes out just like any other code change, the better place you'll be in.
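To make that package-manager advice concrete, here's a minimal Dockerfile sketch, assuming a Debian-based image. The base image, package name, and binary are illustrative, not from the talk, and the exact flags needed to make the package manager remove itself vary by distribution:

```dockerfile
FROM debian:buster-slim
# Install only what the app actually needs...
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/* \
 # ...then remove the package manager itself. dpkg wants force flags
 # before it will agree to remove apt.
 && dpkg --purge --force-remove-essential --force-depends apt
COPY ./server /server
ENTRYPOINT ["/server"]
```

Now a command injection can't just apt-get install a toolkit; the attacker has to bring everything with them.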
So that's the image, because the image is kind of the foundation for the security of the pod, but there's a ton more, because Kubernetes obviously doesn't just run containers alone. So let's talk about RBAC. This one again has a fairly large excerpt from the pentest guide, the attacking Kubernetes guide. All containers in a pod run with a service account, and attack scenarios have been documented against third-party services that orchestrate pod deployments using overly permissive service accounts. In those instances, a compromise of a pod container is catastrophic. But, they note, those compromises may not yield much access if you're using role-based access control. However, at the time of the report, Kubernetes mounted default credentials in every pod. So an internal attacker could use those to access other resources in the cluster, like the kubelet, and from there that internal attacker might be able to move laterally throughout the cluster to get wider access. Some of this is still true. It's that opt-in, opt-out thing again: you do get a service account in every pod. It doesn't have much access, but it does have some access, and if your application really doesn't use the Kubernetes API, it's a really low-cost thing to just remove it. We'll cover a summary of these steps at the end. But let's get into RBAC in a little more detail. We know that service accounts get mounted into pods and that that might be bad, but we also know from the guide that, hey, if we use RBAC, maybe we can mitigate that problem. So let's learn about RBAC a little bit more. Some of the difficulties of using RBAC: well, objects can be composed by referencing other objects that may not exist. If you've seen one of these YAMLs, it may have a reference section where it says: I'm going to reference another object; its API version is v1, its kind is ClusterRole, and here's the name. This is nice because you can create your YAMLs in any order. You don't have to do relational-database-style management, figuring out which one needs to come first, like foreign key constraints. But it does mean that you might make a mistake, because you can reference objects that don't exist. Also, the objects can be created even if the controller, the component that's supposed to use them, isn't enabled. So in this example, they're saying it can be really dangerous when you're making RBAC policies, because you actually have to test them: you might believe that your policies are in effect when they're not. If you haven't enabled the RBAC authorizer, or haven't disabled attribute-based access control, the old authorization mechanism, all the role bindings in the world won't do any good. This can also affect network policies, which we'll cover in a little more detail: if you don't have a network plugin that knows how to deal with policies, all the policies in the world won't restrict your network access at all. So that's one difficulty with RBAC: it's complicated to compose the objects, and the infrastructure might not even be listening to them. A little more background: there are roles and role bindings, more or less, and a role plus a role binding is what grants you access. In this example, we've got some cluster roles that allow you to use a pod security policy, which is something we'll cover later. You can see on the left-hand side that the RBAC role is telling you which API group, what kind of resource, and what I'm allowed to do, the verbs. And then there are some nouns, like which policy names I'm allowed to use. The role binding on the right side of the slide is one of those references where the object might not exist, but it points to a kind, ClusterRole, with a name that matches the left side, and an API group that matches. And then it grants that role to a set of subjects. Those subjects in this case would be a service account, but they could also be a user, a group, that sort of thing. So that's how you get access. And based on what we've covered in the last few minutes, you might be thinking of some of the problems that might ensue. So there are a couple of best practices; you can find these in a little more detail, I believe, on the CNCF blog.
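As a rough reconstruction of the kind of role-plus-binding pair described on that slide — a hedged sketch with illustrative names, not the audit's exact example:

```yaml
# A ClusterRole that allows "use" of one named PodSecurityPolicy...
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted-user
rules:
- apiGroups: ["policy"]                  # which API group
  resources: ["podsecuritypolicies"]     # what kind of resource
  resourceNames: ["restricted"]          # the nouns: which policy names
  verbs: ["use"]                         # what I'm allowed to do
---
# ...and the binding that actually grants it. Note that roleRef is just a
# reference: nothing checks that the ClusterRole exists when you apply this.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted-user-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted-user
subjects:
- kind: ServiceAccount
  name: default
  namespace: my-app                      # illustrative
```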
So, some RBAC best practices, with a nod to Dijkstra: cluster-admin considered harmful, just like goto. It's as bad as goto, I'm telling you. Cluster-admin basically gives you access to everything. It can be really easy to grant that to your initial couple of users, so they can always fix things when something goes wrong. But that doesn't work so well when one of their accounts gets compromised, or when they make a mistake, any kind of problem. And it lulls you into a sense of security without actually having any security, because all those people can just change anything in the cluster. Next, use role aggregation carefully. There's a role aggregation feature, and if you're going to use it, just be careful that you don't mix the meanings of the roles. If you have a view role, you can add more permissions to it using role aggregation; but if you add edit permissions, you've changed the meaning of the base role you're aggregating into, and that can be confusing and grant unintended access. Then, grant each role with just one binding. You don't want multiple paths granting the same access, because then if you remove one, you'll think you've revoked it, but you haven't. Paired with that: clean up unused roles and bindings. It's much harder to tell what's going on if you've got a bunch of trash lying around. Then, avoid dangling bindings. That's what the white paper was talking about with references to objects that don't exist. Why would that be a problem? Well, one thing that can happen is you've got a binding that points to a role that doesn't exist, so the subject doesn't have any access. Later, you add a new role with that name and bind it to the person you want, and all of a sudden your old subject also has access. So, just some problems that can happen with RBAC. And then a warning: none of this really matters if you still haven't disabled attribute-based access control. If you've installed since 1.8 or so, you're probably fine. But if you've upgraded an old cluster — sometimes we find clusters that still have this stuff enabled just by accident, because upgrades are nice, easy, one-touch operations, but you've got to actually go and change the API server flags to disable this old feature. So that's RBAC. It plays in with namespaces, which is the next thing we'll talk about. We're getting toward the end, so there will be time to pause for questions as well; I have seen some Q&A come in. So let's talk about namespaces. The white paper talks about how namespaces were developed as a method to help provide workload isolation, and how running multiple, potentially multi-tenant workloads in the same namespace sidesteps that protection, giving you a single flat namespace. That's the diagram on the right: you've got maybe 16 applications, and they're all in one namespace. We'll talk about some of the controls that apply, or don't apply, when they're all in one namespace. You might ask, how do I put myself in a namespace? Well, it's pretty easy. At the bottom of the screen, you'll see a deployment YAML. Inside the metadata, just where you'd name and label your stuff, you can also put a namespace, and that will locate the object inside a Kubernetes namespace. Many, if not most, objects in Kubernetes are namespaced; you just put the stuff that belongs together in the same namespace. And then when you use kubectl to get your pods or your deployments, you just add the namespace flag.
And that separates things out, often by team; it depends on the organization, but we'll often see people go either by service tier or by team, things like that, because there's a natural boundary where this team is responsible for the stuff in this namespace, and that team is responsible for that namespace. It just makes things a little more predictable, and it also minimizes conflicts, because when you namespace things, the names themselves are namespaced. You don't have collisions where everyone wants to call something "db" and they all fight over the name. So that's what namespaces are. Now let's talk about the security controls, because the white paper mentioned this sidestepping of the protection of namespaces. On the left, there's that bad, large, flat single namespace, and on the right, another way of breaking up your services. Each of those boundaries is a natural boundary for network policies, because network policies by default talk about pods talking to other pods within a namespace; to grant access across namespaces, you have to take an extra step. So you get a better natural default isolation if you use namespaces as the unit for network policies. For RBAC, all the role bindings are granted either across the entire cluster or within a specific namespace. So if you need to give someone access to edit their own services, you can really only scope that down if their stuff is in a separate namespace. That way they can't mess with other folks' things by accident. There's the ability to mount secrets: those are references within a namespace. You don't get to mount things across namespaces. So if you're worried about accidental secret exposure, just by the way the API works, there's no way to say "I want a secret from outside my namespace," as far as I know; it's called a local reference in the API types. So breaking up your namespaces like this means that somebody can't accidentally snag your secret value. The use of image pull secrets is also usually broken up by namespace, because the secret lives in a namespace. That's if you're pulling from a private registry; and if you have different registries for different teams or different service tiers, you can keep the secrets separate and have less of an "everything breaks at once when you change the password" kind of deal. And there are other boundaries as well that really lend themselves to this namespace idea. I mentioned a second ago that people might make mistakes and accidentally edit services they're not supposed to edit. I'll tell you about a friend; I won't name them, but they told me they were going to start using RBAC right away, because a batch process of theirs had an API call that didn't put a namespace selector on when it was deleting replica sets. Luckily, they were using deployments for their application, but every so often this batch process would delete all the replica sets, delete all the pods, and prod goes down. They're trying to figure out why it's going down, why it's coming back up, why it's fixing itself; a little bit of Kubernetes magic in there. Eventually they figured out that for the service account the process was running as, they weren't enforcing RBAC. So they weren't enforcing that control, and this was fairly recent. So again, if you're using RBAC, extra credit, good job. If you're not, really think about it, because the range of things you're defending against runs from mistakes to real security adversaries.
Mistakes are things you'll encounter way more often than adversaries, unless you're in a very particular line of work. So that's namespaces. Let's move on to one of my favorites. This is a pod-level control. The pentest guide talks about how, in many pods, the root filesystem is not read-only, which means an attacker can install additional tools. Like having an extra package manager in your image, this is a thing that really makes life easier for an adversary. A pentester friend was telling me, after I went off about read-only root filesystems: yeah, this would make my job a lot harder, and I hope people use it, but I also hope they don't, because it makes my life a lot easier. So this is similar to having a clean slate for your images. You can set readOnlyRootFilesystem; it's just one option, inside the security context for the container, and what it does is make the / filesystem read-only. So by default, nothing in the container will be writable. For many apps, that's actually okay: you don't need to write anything if you've got a stateless app going, or if all the persistence is in a database somewhere else, or if you're just handling API traffic. Some apps do need writable paths; a lot of apps use a temp path, something like that. You can add writable paths even with a read-only root filesystem, either with a VOLUME instruction in your Dockerfile, which will work if you're using Docker and would live in your image, or, if you're using CRI-O or other runtimes, by using your Kubernetes spec to add an emptyDir volume, which gives you scratch space at a specific path. That means an adversary won't just be able to run a script that writes into /tmp; that's what almost everyone does, because /tmp is always writable. If you can make it slightly harder for them by only making the specific paths you need writable, that helps you out too. This is the kind of thing where I think everyone can do it; I haven't heard anyone come back with a reason they actually can't. So I'd love to see this one rolled out everywhere. I'm going to keep saying it till I die or till Kubernetes goes away, and that's not going to happen. Read-only root filesystem also helps you with operational practices, because if an app accidentally starts writing to some new path, this can stop it, and operations is probably going to like that, because they've allocated storage based on what was declared. Having the root filesystem read-only encourages people to be very specific about what they need writable, and how. So, read-only root filesystem. And then finally, our last control will be network policies. The advice from the security white paper is to ensure that the container network interface is as restrictive as possible through the definition of cluster network policies. And that's about the extent of the paragraph shown there, because by default, every pod can talk to every other pod. It's great; we even have entire projects about making mesh networks between services. But one of the things you realize once you've got a bunch of microservices is that not all of them need to talk to each other, and the default can either encourage bad practices or make you an easier target. So that's the default. To change that, you just have to apply a network policy; I'll show quick sketches of both of these controls in a second.
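Here are those two sketches; the names and paths throughout are illustrative, not from the talk. First, a container with a read-only root filesystem plus one declared writable scratch path:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stateless-api
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0.0
    securityContext:
      readOnlyRootFilesystem: true    # nothing under / is writable...
    volumeMounts:
    - name: scratch
      mountPath: /var/cache/app       # ...except this one declared path
  volumes:
  - name: scratch
    emptyDir: {}                      # throwaway scratch space
```

And second, the kind of starter network policy just mentioned, which only admits traffic from pods in the same namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: team-payments    # policies live in, and apply to, one namespace
spec:
  podSelector: {}             # select every pod in this namespace...
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector: {}         # ...and allow ingress only from the same namespace
```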
And then network restrictions are going to be enforced for a given pod only once you apply a policy to it. So this is kind of an interesting rollout: by default, nothing is restricted, and restrictions come in pod by pod, with policies getting enforced as soon as a single policy targets a pod. One of the strategies we often use is to go namespace by namespace and apply a policy that just lets pods inside the namespace talk to each other within the namespace. That way you cover all your pods and they start getting enforcement, and then you can start whitelisting the external connections that have to happen. You can also work iteratively to narrow things down even within a namespace, if that's desirable. Here's an example of how network policies look. This is a graphic of active and allowed traffic, and on the right you'll see a network policy that's been generated. You can see that the spec says this is an ingress policy, and it's saying: pods that match the label with the app key and the tls-proxy value are allowed ingress to pods that match the label app: wordpress. So WordPress is allowed to get connections from the TLS proxy. And it's an ingress policy; there are also egress policies, but often those are a little more advanced in terms of your deployment model, just because ingress is a little easier to reason about. With egress, often we see people really just restricting internet egress, because otherwise you have to say, on service A, I want to be able to talk to service B, and on service B, I want service A to be able to talk to me. It's a little more difficult to manage. This is also an example of where labeling, which is part of standard Kubernetes metadata, is really helpful, because you can see all of this is based on pod labels. So having consistent labeling practices, and doing tier-based labeling or any other strategy that helps you avoid going container by container or pod by pod, is really helpful. So those are network policies, and that's the last of the specific Kubernetes-native controls from the security audit that we wanted to talk about. Chris, how are we doing? It looks like... I've got one question, if you want to take that. How do you manage restrictions between pods if an async, non-HTTP TCP integration is used, i.e. via a queue? Sorry, how do you... How do you manage restrictions between the pods? I'm guessing this is: you're using a message queue on a non-standard port. "What's the application used in slide 27?" is the question? Oh no, that's a separate question, never mind. Oh, okay. Okay, yeah. So that's the StackRox product. I'm mainly using it for the visualization, partly because this is a CNCF webinar; it's just to show you the network policies and how they apply, things like that. The example apps are just custom apps. So, network policies are typically going to be applied via pod labels, so those are your endpoints, and then ports. It works fine even if you're not doing HTTP. Some service mesh tools will be more oriented toward HTTP, HTTPS, gRPC, but network policies are Layer 3, Layer 4 controls that apply even if you're running non-HTTP workloads like message queues.
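For that message-queue case, a hedged sketch of what the answer looks like in practice; the labels and the AMQP port 5672 here are assumptions for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: queue-consumers-only
  namespace: messaging
spec:
  podSelector:
    matchLabels:
      app: rabbitmq            # the queue's pods
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: queue-consumer # only labeled consumers may connect
    ports:
    - protocol: TCP
      port: 5672               # AMQP: pure Layer 3/4, no HTTP involved
```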
Good to go, Chris? I think we're good. Okay, so let's get to the good part: how you can use all this to design a Kubernetes-native security strategy. And I'm mindful of time, so I'm going to go through some advice, talk about the opportunity we have, and then some enforcement options and maybe some ideas to get started. Remember that paragraph from the beginning, the one that seemed a little bit harsh, even as well-considered, friendly advice: the code base has problems, some configurations are confusing, that kind of thing. Well, there's another paragraph, I think on the next page or right after, highlighting the opportunity we have. We started off talking about problems, then we moved on to some of the controls we can use; but overall, let's talk about the opportunity we have with Kubernetes. So: despite the results of the assessment, and there they're talking about the vulnerabilities, things like that, and despite the operational complexity of underlying cluster components, Kubernetes streamlines difficult tasks related to maintaining and operating cluster workloads, such as deployments, replication, and storage management. Kubernetes is doing a ton of stuff for you. Additionally, Kubernetes takes steps to help cluster administrators harden and secure their clusters through features such as role-based access control, RBAC, and various policies that extend the RBAC controls; here I think they're talking about pod security policies, things like that. And continued development of those security features, and further refinement of best practices and sane defaults, will lead the Kubernetes project toward a secure-by-default configuration. So even in the midst of "we've got problems, we've got some configurations that are maybe hard to use," the assessment really does highlight that we have an opportunity to do security in a cooler way, based on the stuff Kubernetes does for us, streamlining those difficult tasks. And here's a graphic that shows that; this is how we think about the world at StackRox. A Kubernetes deployment is in the middle, and then there's all this stuff around it, and pretty much all of it is based on the Kubernetes specs or the image, an OCI-compliant image, stuff that's in Kubernetes objects. Again, if you showed this to somebody on another stack and said, hey, look, I get all this stuff basically for free, I don't need an agent, I don't need anything, I just make an API call, they'd say, whoa, I would love to be in your shoes. So let's start from the top. With a single deployment, you know what image is running, exactly which image, with a SHA. You know the vulns, if you've got a scanner going. You know what registry it came from. Again, if you're scanning, you'll know exactly what packages and components are inside. Based on which cluster it's in, you might know whether it's test or prod, which business unit, which part of your organization; that's implicit context. Based on the annotations and labels, you might know something about the criticality of the app, the dev team that's responsible, the app team, maybe. This is where having good annotations and labels can be really helpful. The privileges are specified in YAML: the security context is where a lot of them go, but there are other things like host mounts and the service account settings, which service account it is, and then all the RBAC YAMLs that say what that account gets. In terms of access, you might have secrets, like an API key or crypto material, and you might be mounting a cloud disk.
And that's important from a security perspective, because we're always looking for where people can establish persistence. On the network, you'll know something about the blast radius, because you'll see whether network policies are applied. You'll see, if someone gets inside this pod, can they talk to other pods? Can they talk to the API server? Can they talk to the internet? And the other way, on the ingress side, you'll be able to see services and ingresses and understand what's actually exposed, so that, for all the stuff around the rest of the heptagon, you know whether it's exposed to the internet, whether it matters on a burning-fire basis or a best-practice basis. And finally, the behavior. There are a bunch of tools you can use to do runtime monitoring, to see what kinds of processes launched and which connections are active, which can inform the other configs, feeding back into the rest of the context and the specs. It's a really cool opportunity, because every application we have carries all this data, which makes all the security workflows a lot easier: it makes diagnosis easier, makes fixing the configs easier, and makes response easier too. So let's talk through some enforcement options. First is the pod security policy. This is a native, up-front config that's built into Kubernetes. You can define policies, and then it uses RBAC identities to tie together who can use what; we actually saw that in the example I used earlier in the RBAC slides. It controls things like privileges, host mounts, other settings like that. You do have to enable pod security policy as a cluster feature once you're ready to use it, and you need to have policies in place before you do that, otherwise you will block everything. This is another example of defaults being different between features. The middle option here is a dynamic admission controller. This is something one of my colleagues has written about on the Kubernetes blog; it's a good example of getting started with admission control. It does require deployment: you need to run a service, ideally replicated, but it can consider any data; your code and the API timeout are really the only limits. You can do more or less anything. It's code, it's an API call. Basically, you register with Kubernetes: before you create objects of type X, give me a heads-up, and I'll say yes or no. And you get to do whatever you want once you get the API call. You can integrate with an external system, check whether the thing is in your service registry, anything like that. Just be mindful of the API timeout, because you don't want your kubectl commands hanging for more than five seconds, and you don't want to take down your cluster or your automation either. And then finally, ongoing monitoring and analysis. This is sometimes the neglected part of security, where people want to do a lot of up-front configs, but that sometimes has the effect of making the cluster unusable. If we do ongoing monitoring and analysis, we get the benefit of hindsight. We can see what's gone on, and we don't block progress. It is reactive, but it means we can work it into our regular rhythm rather than trying to block everything up front and handle exceptions.
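Backing up to that middle option for a second: registering a dynamic admission webhook looked roughly like this around the time of the talk (the v1beta1 API; it went to v1 in Kubernetes 1.16). This is a hedged sketch; the group, service name, namespace, and path are all illustrative:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: deployment-policy
webhooks:
- name: deployments.policy.example.com
  rules:                              # "before you create objects of type X..."
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  clientConfig:
    service:
      namespace: policy               # your replicated webhook service
      name: policy-webhook
      path: /validate
    caBundle: "<base64-encoded CA certificate>"   # placeholder
  timeoutSeconds: 5                   # mind the timeout warning above
  failurePolicy: Ignore               # fail open: a dead webhook won't block the cluster
```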
So overall, just remember the user experience when you're choosing what to enforce and where to enforce it. Being an unnecessary bottleneck is a great way to get managed out of your cluster, and don't cry wolf either. If something's not a huge critical issue, don't wake people up in the middle of the night. Using all that context can help you diagnose that, too. So, how to get started; we'll jump through this fast before handling any final questions. I always say, start with the easy stuff that helps everybody. Everybody is helped if you annotate and label deployments consistently; if everybody can look at a thing and know what it is, that's great. Use concrete image tags, not latest. And maybe start scanning images for low-hanging fruit; you can do that in the dev process, offline, at all points in the process. Then maybe you level up to other self-contained changes that don't have as many side effects. Maybe limit access to the Kubernetes API, because that can be an important security surface; as we saw with some recent vulnerabilities, like the resource exhaustion in the billion laughs attack, if you don't need it exposed, don't expose it. You can start disabling the automatic service account mount for service accounts you know aren't being used. And you can start replacing your cluster-admins with scoped access, person by person or service account by service account. Then cross-functional changes, maybe best to go app by app. Start trying read-only root filesystem app by app, for a stateless service first. Make sure resource requirements are specified, because that has an availability impact; that's part of the CIA triad, confidentiality, integrity, availability, and everyone always forgets about that last one. And then adding ingress network policies helps; if you start with the sensitive stuff and just limit who can talk to it, that gets you in the rhythm of network policies before you roll them out to all your apps. And finally, just keep going. There's so much that you can do, and it's not that you'll "get to security" one day; you just keep the process going. The sketch below pulls a few of these getting-started items together.
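Every name and value here is illustrative; it's a hedged sketch of those easy wins, not a template from the talk:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api
  namespace: team-billing
  labels:
    app: billing-api
    tier: backend
  annotations:
    owner: billing-team@example.com        # consistent ownership metadata
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        app: billing-api
        tier: backend
    spec:
      automountServiceAccountToken: false  # this app never calls the Kubernetes API
      containers:
      - name: billing-api
        image: registry.example.com/billing/api:2.3.1   # concrete tag, not :latest
        securityContext:
          readOnlyRootFilesystem: true     # the pod-level control from earlier
        resources:                         # availability: the A in the CIA triad
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
```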
So that's it. If you have questions, you can ask in Zoom now. You can also email me or get me on Twitter. I put my picture on the front slide so you can find me at KubeCon, too; I'd be happy to talk to you there. And there's a link to some other resources. Yeah, that's what I was going to ask you: are you going to be at KubeCon? The answer is yes. There's one question, or maybe you and I can tag-team it: is it recommended to use pod security policy, as it looks like it is never coming out of beta? I might sidestep that issue. I would say that beta stuff is typically deployed in most environments, so beta doesn't mean it's going away. Chris, maybe you can weigh in. Yeah, so the alpha stuff is what I would not recommend using in production when it comes to Kubernetes. Definitely use it in dev, definitely make sure you kick the tires on it, because it's coming soon. The beta things were actually put into OpenShift for production workloads long ago. So beta in Kubernetes is definitely usable and stable. The fact that it hasn't gone into the main v1 API pipeline is just a matter of bandwidth, I believe. There's another question as well. Well, go ahead. Oh, just one more thing on that: Chris mentioned if something is alpha, go kick the tires, do it, because alpha is the time when you can change the API. Beta is when things start to get solidified. So if there's a problem with a feature and you want to make it better, try out the alpha features and give some feedback. Exactly. All right, so the next question is a two-parter. Okay, PSPs: we find that there are app users binding to the most privileged PSP instead of the least privileged; they don't want to deal with figuring out what the least-privileged rights would be. I mean, this is a common problem that I deal with all the time, too. How can we prevent users from binding to certain PSPs, or only allow users to bind to some PSPs? So, like a policy on which namespaces can use which PSPs, maybe? Yeah; I'll confess I'm not a day-to-day user of PSPs, but the bindings should control who can use which one. And I will say that if you're trying to do least privilege, one thing I've found key is having a quick iteration cycle. So if you want to require readOnlyRootFilesystem or something, get minikube on your machine and try it there; don't push it through CI every time, right? Do the local stuff so you can see the logs, do a quick iteration, figure out what the app really needs. And also consider what kind of PSPs you're defining. If one sounds like it might be a little too restrictive, maybe there's some baseline that's more appropriate for everyone; that might be a... Yeah, like changing the threshold, maybe, to get people more in line first, and then restricting gradually from there. And you can always change the PSP, too, right? Right, yeah. The second part is network policy: is it possible to deploy a single policy that allows traffic within namespaces, globally? Oh, okay, I think I see where you're at. A single policy that allows traffic within namespaces, globally. If I understand the question, I don't think so, because the policy itself is namespaced, right? Right, so each policy is applied within a namespace. It's one policy per namespace right now. It exists in a namespace; in fact, a network policy is a namespaced object, right? Yeah, so you can't really have a cluster-wide one, to my knowledge, unless you have an operator that just instantiates a policy in every new namespace, maybe, I don't know. Yeah, in fact, we've gone through this, because we do some network policy generation based on active traffic at our company, and we kind of settled on: if a namespace isn't ready for network policies yet, start with the blanket same-namespace policy, and then that one network policy lets you opt in to segmentation app by app or team by team, which is actually kind of nice. It can be hard to keep track if you've got a lot of namespaces, but there's some benefit to it, too. Yes. Cool, all right. So I've got to go follow you on Twitter and find you at KubeCon; everybody else, do the same. If you have more questions, please email Connor, myself, or the CNCF staff, and I posted a link to the blog that has a lot of the references from this slide deck. We'll get the slides and video out as soon as we can on cncf.io/webinars. Thanks for joining us today. Thank you, Connor, appreciate everything. We look forward to seeing everybody at the next CNCF webinar. Have a good one, everybody.