So welcome everyone, thanks for joining. Today's CNCF webinar topic is Kubernetes Security Best Practices for DevOps. Looks like we have a good crowd joining for that, and security is always a popular topic. My name is Phil Estes. I'm a Distinguished Engineer and CTO for Container and Linux Architecture Strategy at IBM Cloud. I'm also a Cloud Native Ambassador with the CNCF, and I'm your moderator for today's webinar. We're happy to have Connor Gorman here, a Principal Engineer at StackRox, who's going to take us through the topic. But before I turn it over to Connor, a few very simple housekeeping items. This is not a free-form open discussion, so you won't be able to talk during the webinar as an attendee. There is a Q&A box, and we'd love for you to use that to get your questions to Connor throughout the talk. He's selected a few spots along the way where we'll stop and see if there's anything pressing he can answer then; otherwise, we'll deal with questions at the end. So anytime during the talk, feel free to open that box and type your question, and we'll get to it throughout the presentation. This is an official webinar of the CNCF, and as such it's subject to the CNCF code of conduct. Hopefully this is common sense: don't do anything in the chat or in the questions that would violate that code of conduct. We'd love for everyone to be respectful of all the participants and presenters today. You will be able to find the recording and slides posted later today on CNCF's webinar page, cncf.io/webinars. And with that, that's all the housekeeping items. I'm going to hand it over to Connor, who's going to kick off today's presentation. Awesome, thank you. Thanks everyone for joining. As Phil said, I'm Connor, a principal engineer here at StackRox. StackRox does Kubernetes security, so right on topic, we'll be talking about Kubernetes security best practices. Before I worked in security, I worked in infrastructure.
So I've been on both sides of the same coin: working on pushing containers through to production, and also securing containers in production. So let's just jump into it. What we'll cover today is some basic Kubernetes hygiene, some workload best practices, and then a demo on some of those workload best practices, which I think are really interesting. Then I'll wrap up with questions at the end, and I'll stop periodically to answer any questions you have, so make sure to put them in the box. So first and foremost, what are we doing here? Kubernetes is coming. Adoption is growing and growing. People are at different stages of the lifecycle, in dev or in prod, serving production workloads with Kubernetes. And you might be somewhere along that spectrum: you might be someone who's pushing Kubernetes at your organization, you might be having Kubernetes pushed at you from a security perspective, or you might just be starting out and trying to figure out, hey, everyone's doing Kubernetes, I need to figure out what to do, without really knowing where to start. Hopefully this presentation gives everyone a little bit of information about where you can start when you're just starting with Kubernetes. So I said Kubernetes is coming; realistically, it's pretty much here. What I find really interesting about all of these trends is that some other container orchestrators aren't doing as well, and Kubernetes is coming out on top. KubeCon is becoming a huge conference. And then there's the growth of contributors, which really shows that it's a very widespread group of contributors, not just a couple of companies driving the conversation. It's really a whole community coming together to build and support Kubernetes. So first and foremost, Kubernetes hygiene. How do I run the orchestrator?
How do I make sure that I'm securing it and doing what we would hopefully consider the no-brainer stuff? In a new world, it's always hard to figure out exactly what you should be doing. So: always upgrade to the current version. The reason is that only the last three minor versions are officially supported, especially when it comes to vulnerabilities, upgrades, and patches. You don't want to be in a situation where you're behind the curve. The last thing you want to be doing from an infrastructure perspective is trying to patch a critical vulnerability by rolling forward. You can run into a lot of situations there; I don't know if many of you went through the etcd2-to-etcd3 migration, but basically, you don't want to rush that kind of thing. You don't want to rush migrating your data store. So always upgrading is just really good hygiene: you pick up all the deprecations, and you stay within the support window for all the CVE fixes. I will note that a lot of the cloud providers will backport things. I know GKE defaults to 1.14, which technically falls outside the window, but Google is pretty good about supporting that, and other cloud providers are the same. So your mileage may vary, but always try to roll up to the latest supported version on your cloud provider or mainline Kubernetes; it's just really good practice. It's also been shown, in terms of CVEs, that being on the most recent or newer versions actually improves your odds of not being exposed to a CVE. The question I usually get after that is: how do I know what versions are out? How do I track this? I personally love this Google group. It's just a live feed; you don't have to be too involved with it, but it'll keep giving you updates about what versions are out and what patch releases are out.
And so you can plan from there, basically just looking at: okay, there's a new patch version, that should be a pretty easy migration, why don't I just migrate to it and get the fixes, the vulnerability updates, and all of the above. The one I like is kubernetes-announce; you just follow it, it's really easy. So now down into the nuts and bolts: what do I do to secure Kubernetes itself? This goes back to traditional infrastructure approaches. Make sure you control network access to sensitive ports. In particular, restrict the ports used by the kubelet; there have been a couple of cases where the kubelet was exposed externally via the node, and people exploited that. Then make sure you limit access to the Kubernetes API server itself. You can restrict it to IP blocks, restrict it to a VPN, or have it be internal to your cloud provider. There are a lot of different options there, but make sure you limit access to the API server. There was a recent vulnerability where an anonymous user could effectively DDoS your API server, so having it be internal knocks out a lot of the potential perpetrators of that, which is definitely beneficial. Next, on hardening node security: make sure people don't have access to the nodes. A lot of exploits depend on having root access to the node, so in general, restrict overall access to the nodes. This is general infrastructure best practice, and a lot of debugging and other tasks can be done without direct access to the nodes. It's super important: if you have root access to a node, you can access the Docker socket and you have access to Kubernetes. There are a lot of angles there, so restricting access to critical personnel is definitely the way to go. Hey, Phil, do we have any questions coming in?
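For reference, the kubelet hardening just described usually comes down to a couple of settings. Here's a minimal sketch of a KubeletConfiguration fragment; exact defaults vary by distribution and cloud provider, so treat this as illustrative rather than a drop-in config:

```yaml
# Illustrative KubeletConfiguration fragment; verify against your distro's defaults
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false      # reject unauthenticated requests to the kubelet API
  webhook:
    enabled: true       # delegate authentication to the API server
authorization:
  mode: Webhook         # delegate authorization to the API server (not AlwaysAllow)
readOnlyPort: 0         # disable the unauthenticated read-only port (10255)
```

Managed services generally set these for you, but it's worth verifying, especially on self-managed clusters.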
Yeah, we actually have a few questions about your recommendations on Kubernetes versioning. Okay. The first is maybe tricky for you to answer because it's about a specific managed service: regarding your advice to be on the latest version, does that mean we shouldn't use EKS, given its highest version is 1.14 while 1.17 has been out for some months? Yeah, so for EKS specifically, I think they just announced support for 1.15, so they should be rolling up into that window. But no, I don't think you should avoid managed services. The cloud providers do a pretty good job of patching them; GKE, for example, has a lot of releases where they're bringing in Docker patches. They're really responsible, and they'll make sure that you're not exposed to vulnerabilities. So I definitely wouldn't shy away from managed services or managed versions; I would just make sure you validate that you are getting the vulnerability updates. Yeah, kind of a related question: how long should we wait before switching to the latest version? Is bleeding edge the safest? Is there a rule of thumb? Yeah, I think it largely depends on your organization and how quickly you want to move. Frankly, I never like to move directly to the latest version; sometimes there are bugs to be hammered out, and there are upgrade docs and paths to consider. And hopefully you have staging, lab, and production environments, so you're not going to go straight to production. I will say that within the last three releases, you are covered in terms of vulnerabilities; they will backport the patches, so there's not a ton of criticality there. But if you're not within the window, definitely try to get within the window. It's a three-month release cadence, so by the time you're on 1.14 and 1.17 is coming out, 1.15 has been out for months.
And so at that point, you can probably safely move forward. You stage it through your pipeline and make sure it's working for you and all your configurations are correct across your different clusters. But yeah, there's no tried-and-true solution to that problem. All right, great. That's all we have for right now. Cool, thanks. Awesome. So we were just talking about hardening node security: physical access, making sure your network is blocked off. Now we've narrowed access to our API server down to a select group of people, and we need to chop that up into different sections to make sure each individual has the access they need, and only the access they need. At a high level, the way Kubernetes RBAC works is: you have subjects, which are users or groups, and service accounts, which processes and pods can use. You have API resources, which span everything in Kubernetes: pods, deployments, secrets, daemon sets, nodes, jobs, persistent volumes; there are so many resources. And then you have what you can do with them: you can list them, create them, watch them (which is just a nice form of listing), patch them, update them. What you really want to do is audit this. One thing to do, especially if you started a Kubernetes cluster to play around with and it ended up becoming a production cluster, is to look back and ask: hey, did I grant someone way too many privileges in the beginning, because we were just trying to figure out Kubernetes, and then over time they kept those permissions?
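As a concrete sketch, scoping that early cluster-admin grant back down to a single team's namespace might look like the following; the namespace `team-a`, the user `jane`, and the verb/resource lists are all hypothetical and should be tailored to what the team actually needs:

```yaml
# Namespace-scoped role instead of cluster-admin (all names are hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a
  name: team-a-developer
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
# Bind the role to a specific user within that namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-a
  name: jane-team-a-developer
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Because this is a Role and RoleBinding rather than their Cluster-scoped counterparts, the permissions stop at the namespace boundary.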
That's something we see a lot: someone has cluster-admin, which means they can do anything in the cluster, because they were one of the first people to build the cluster and figure out what to do with it, and they kept those permissions all the way through, when really they should now be scoped back down to the specific team or namespaces they're working on; we'll get into that a little bit more too. So, in general: make sure you're utilizing RBAC, make sure you're auditing it, and look at what people are doing and what operations they can perform. There's a cool project that recently came out from Jordan Liggitt: a tool that will look at your audit logs and try to build RBAC rules for you. I haven't used it, and I can't endorse it necessarily, but it looks really cool, so you can check that out. Awesome. Now we can jump into workload best practices. I think this is where the bulk of Kubernetes security really comes in. Securing the orchestrator is number one: if people have access to the orchestrator, they can change your services or run things in your cluster. But the workloads are where you're really exposed. This is where you're running your applications and exposing them on load balancers, and it really gets into the nuts and bolts of application security. The first thing to think about in a Kubernetes environment, which I think is significantly different from traditional infrastructure, is how we think about risk: what goes into Kubernetes, how can we figure out what our risk points are, and how do we address them? Starting with the image: you're looking at the vulnerabilities in the image, what registry it came from, what packages are installed. You're looking at where this thing is running: is it in a production workload?
Is it in a test cluster with no customer data? You're looking at how the service is configured as a whole: what secrets does it mount, what storage does it have, what's its configuration? In containers you can run things as privileged, you can add host mounts, and you can have service accounts which have RBAC access and can reach Kubernetes. And finally, where does it sit within the cluster? What I like to say is: take a Kubernetes deployment that has a really critical network vulnerability. If that vulnerability is exposed via a load balancer over the internet, that risk has now been amplified. All of these things come together: take that workload and also make it privileged, and now someone can exploit your application and potentially get root on the host. All of those factors amplify each other and combine into what I call a Kubernetes risk assessment. Then when you get into behavior, you're looking at what the pod is doing. Is it deviating from the norm? What active connections is it making? Is it exfiltrating data, or are random connections coming in? So it's definitely a super interesting environment, and that's how we like to think about it. The approach we've taken at StackRox is to look at all of these factors in order to build a risk assessment for you. You're looking at policy violations, which are really about your organization and what it cares about: hey, we're looking for these things. Are you running processes we haven't seen before? Do you have image vulnerabilities? Are you on a load balancer or just on a cluster IP?
What's the service's reachability from the internet? There's stuff in the image that's interesting too: what components are useful for attackers? Do you have nmap in your container? Do you have curl or wget? Those can be used to pull resources from the internet and run them in your container. How new is your image? Just as we talked about being on a current Kubernetes version: are you on a current distribution version? Old ones can be exploitable and carry a lot of CVEs, and many of those CVEs are fixable if you just upgrade. And finally, what's your RBAC configuration? Does your workload have access to the Kubernetes API? So that's how I like to think about it and break it down: there are a lot of different categories in Kubernetes, and the new challenge is, how do we take all of this risk, quantify it, and decide what to care about first from a security perspective? So I'll stop there for a second. Phil, do we have any questions coming in? Yeah, we actually have quite a few. Several folks were curious about that tool that can create RBAC rules from audit logs. Yeah, what we can do is, in the slides we'll send out, I'll go ahead and add the link to the GitHub repo. I don't have it on me right now, but we'll definitely make sure to send it out with everything. Okay, perfect. So that'll come with the follow-up. Given the rate of change in DevOps, what are best practices for continuous auditing of RBAC? I don't know if that's related to that tool as well. Yeah, so in general, what we need to make sure we're doing is having some process in place. That's the first step: making sure you are actually looking at the rules, auditing them, and trying to understand what access people have. It can be challenging in Kubernetes.
I mean, to be quite honest, something we've tackled at StackRox is basically: how can we render and show you what access people actually have? Because with RBAC you're kind of looking at a tangle of bindings. But there are tools and ways you can go about checking and making sure your RBAC is finely tuned. In general, I would say start with the low-hanging fruit: look for the really highly privileged things, audit those aggressively, and then work your way down from there. I'll also get into, on this slide, how you can leverage some of this to make your RBAC more finely tuned, so I'll go into that in a second. Okay, perfect. Someone asked what would be a good practice for pod security policies, and an approach for preventing running privileged pods. Of course, that's part of a pod security policy, so I'm not sure what other tools could be used to prevent that. Yeah, I think you hit the nail on the head there: you can use a pod security policy to prevent that. And you could also use other admission controllers to do something similar. If you don't really want to use pod security policies because you feel they're too restrictive, you could create a basic admission controller that denies privileged pods, so that's an option as well. All right, great. There are a lot of tools out there that do something similar. Yeah. One more; I think this was a follow-up to that discussion about versions we had a few minutes ago, and I'm not sure it has a very simple answer, but someone asked: does an upgrade impact the existing working environment, and how can we manage not to impact our current environment? Yeah, that can be challenging. You should be able to do rolling upgrades.
I haven't looked too deeply into Kubernetes' upgrade path in terms of how it works and manifests itself against your cluster. But if you have multiple replicas of the masters, you should be able to keep availability high; you shouldn't have to go down whenever you want to upgrade. You might need to roll up version by version, but there are upgrade paths where you shouldn't necessarily have to take downtime. All right, great. I guess we can do a couple more; there are definitely a lot of questions coming in. Let's do another couple. I don't know your awareness of spaces like high-performance computing and AI, but someone asked: do these special workload types need additional security support? So as long as you're running on top of Kubernetes, that's your application plane, and the configurations still apply. There are going to be challenges if you're running on GPUs and the like, where you might not get the same level of granularity, but for a Kubernetes application you can still look at the specs of the applications and evaluate a lot of this. So I don't think there are a lot of different security constraints there. Based on your organization, you might have some compliance requirements to deal with, but in general I feel like this security guidance applies to pretty much all kinds of applications. Yep, good. Let's do one more, then maybe you should move on, because some of these may be answered as you continue. Sure. Are there any best practices for securing Kubernetes clusters where Istio can help with security? Sure, absolutely. It can provide mutual TLS out of the box, so your developers don't have to worry about it, plus certificate rotation, and those are obviously great wins from a security perspective.
I think people have found Istio kind of hard to operationalize sometimes, so you go back and forth between the cost of operationalizing it and the security benefits you get. You can get a very granular pod-to-pod firewall; I'll talk about network policies later, and Istio goes even deeper than that, but the cost is operationalizing it and making sure all your applications can run with it. Kubernetes is doing work to make Istio easier to run, to be quite honest: they're improving sidecar support in pods in upcoming versions, so it'll work better with Istio, and I think that will help. But yes, Istio can definitely provide a lot more granularity in terms of pod-to-pod network connectivity. All right, great. Why don't you go on, and we'll come back to more questions when you're ready. Sure, thanks. Cool. So, more on RBAC and trying to keep it finely tuned: the number one thing to do, and this is pretty straightforward, is just leverage namespaces. Don't put a thousand applications in the default namespace. What you can do then is break down RBAC by teams: you could have someone be admin for a namespace, but that namespace may not have many applications in it, so they can only affect things on their own team. I wouldn't say it gets you out of doing finely tuned RBAC, but it's a nice, easy segmentation for it. Namespaces also let you do resource usage tracking, so you can see how many resources each team is using. They allow for generic network policies and network segmentation, which I'll get into. And, just for my own benefit, they make kubectl results more sane. I remember listing something like 5,000 pods in a single namespace: you can't read them, and then you're trying to filter.
So that's just a fun one, but it definitely helps when you're trying to debug your cluster. Next, let's talk about leveraging network policies. Maybe another time I can talk more about Istio, but basically what network policies give you is pod-centric firewalling: you can restrict or allow whether pod A can talk to pod B. This gives you pod-to-pod firewalls on ingress and egress, and you can ensure fine-grained connections. Namespace isolation really helps ensure compliance if you're running a SaaS or cloud-native model with multi-tenancy, because many times you need to make sure that customer A's and customer B's data are completely segmented, and creating a network policy between those namespaces ensures you're not accidentally talking to the wrong database, or that two services aren't talking to each other. This doesn't come without challenges, though. What if your environment already exists? How do you implement network policies on top of that? How can you scale them at your organization? And how do you make sure developers are enabled to build their own network policies? They have the most knowledge of their own applications and dependencies, so they're the natural point for network policies to be built. Kind of a funny story: two years ago I was trying to build network policies for StackRox services, and I couldn't do it. I couldn't figure out how to create them correctly; there are a lot of YAML issues, and empty brackets are different from no brackets, and it's really hard to understand. So one way you can really utilize network policies and create a lot of value in the organization is to look at the live traffic that you have, analyze it, and build network policies based on that live traffic.
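To make the pod-centric firewalling concrete, here's a hand-written policy pair: a default-deny for a namespace plus one explicit allow. The namespace, labels, and port are hypothetical; note the `podSelector: {}` line, which is exactly the empty-brackets-versus-no-brackets distinction mentioned above (an empty selector matches all pods):

```yaml
# Deny all ingress to pods in team-a by default (names are hypothetical)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes: ["Ingress"]
---
# Then explicitly allow the frontend pods to reach the backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

Policies are additive, so the allow rule punches a hole in the default deny rather than replacing it.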
And so, in this case, if you already have an existing cluster, you can feel a lot more comfortable that you're not going to break things, because you're building from the traffic and access patterns you actually observe, and you can build really fine-grained network policies from that. This is just an example of something we've done: basically generating network policies for you based on live traffic, because it can be really hard to operationalize otherwise. One more thing on operationalizing and reducing your risk: make sure to slim down your images. If you can, go distroless or use lightweight base images. A lot of those avoid vulnerabilities right off the bat, and they don't have a lot of network utilities or other tools that just bloat containers and can be used by attackers. And finally, make sure to scan your images and enforce on the results, so that once you've found a bad image in your environment, it doesn't get in again. The way I like to go through it: you look at a cluster, you have a certain set of vulnerabilities, you analyze those vulnerabilities and try to reduce them over time, doing this continuously to reduce your risk. Then you can say: okay, I know this image is bad, I've marked it bad, and basically don't let people deploy it into your environment again until they fix their vulnerabilities. Give developers that instant feedback loop: hey, this image will fail validation into our cluster. They'll go rebuild it, fix the vulnerabilities, and come back. That's a nice flow for developers, because once they get to deploy time, when they say they want to deploy something, they don't want to get rejected.
It's better to reject them in CI/CD and give that kind of instant feedback, as opposed to them saying, "security won't let me deploy my application," because then you're butting heads; you're not really working together. One of the main challenges with slimming down your images is: how do you debug? You don't have curl or wget, you've removed OpenSSL; how can you figure out why application A can't talk to application B? So, looking ahead (this is not something to do today): ephemeral containers. They're alpha as of 1.16, so don't use them in production; use them with caution. But the general concept is that you can take a container and bind it to an existing pod so you can run debugging commands and network utilities there. What this means is that your main applications, the ones exposed on the internet, don't have to carry all these network and debugging utilities. In my opinion this is a huge win, because people can try to exploit your container, find that you don't have any of the packages or utilities they expect, and then move on. A lot of attacks are just scripts run against public endpoints to see if they can exploit a specific vulnerability; once they can't, they move on, because you're no longer that high-value a target. So look ahead to this and keep track of it; it seems really interesting, and I can't wait for it to go to beta and then v1. Awesome. Maybe I can take some questions now before jumping into the demo. Sure, yeah, we've got several queued up. I'm going to grab one that ended up in chat that I thought was interesting: what's your take on the CIS benchmark for Kubernetes versus the CIS benchmarks for a host OS, like the Ubuntu or RHEL benchmarks? Yeah, so I think they're both perfectly valid.
I think they definitely have different use cases. A lot of cloud providers now offer container-optimized OSes: you look at Google's Container-Optimized OS, and I think AWS may have recently announced one as well. If you're running an optimized container OS, the traditional Ubuntu benchmarks don't necessarily apply. But if you are running an Ubuntu or RHEL distribution, do follow those best practices; the people who write these benchmarks are really sharp and know a lot about the subject matter, so they're always good to follow. And if you can, do both: the Kubernetes-specific ones covering Kubernetes, file permissions, and so on on the host, and the Ubuntu ones covering your general host security. I think both are necessary. Great, great. Someone asked: can we apply the same configurations to OpenShift as well? I assume they mean some of these tools and capabilities. Yeah, so OpenShift in recent versions is very close to parity with Kubernetes; there are obviously some differences in the distribution, but all of these things can be applied on OpenShift. I know personally because we run StackRox on OpenShift, so all of these things still apply. There are some minor differences: for what's exposed on the internet, you potentially need to look for routes and not load balancers, for example. But I would say they're generally very similar, and since they share the same Kubernetes base, I don't really differentiate between them. Yeah, okay, good.
Someone asked: as you know, there are many options for securing passwords and connection strings (I assume they mean any kind of secret) in Kubernetes. How can we rate or evaluate these tools from a security perspective? That's a tough question. Honestly, it's hard to evaluate these solutions. I tend to lean toward a cloud-provider-hosted solution: if you can use something like a KMS offered by your cloud provider, it's usually pretty secure. Well, usually; I should be careful saying that. I also know a lot of people use Vault; HashiCorp has put a lot of work into it, and we've found a lot of people have been successful with Vault. So talk to your peers and see what they're using and what works for them, especially from an operational and security perspective. But I don't have a great answer for that, sorry. Yeah, no, it's an interesting area. Like you said, depending on your cloud provider or vendor tools, there's a mix of options. Right, and you could be running entirely on-prem, where you don't have any of those options; a lot of Kubernetes deployments are on-prem, so you may need to go to a third party for a solution. The one thing I will say is: don't roll your own. When it comes to secrets management, you typically don't want to build your own. Yeah, that's a good suggestion. Someone has a question: you showed a view of the network including the API server, and in a cloud managed service (GKE, EKS, AKS) a lot of times you don't have access to the master nodes. Are your suggestions different for that scenario, and how do you interact when you're not controlling those nodes?
Right, so — I'm trying to figure out the context here — is it about the network connections? Yeah, I think they're referencing your slide about the network, maybe the routes and blocking the API server from external traffic. Oh, okay. I assume that was the context. Gotcha, yeah. I mean, there are cloud provider specifics for each of those, right? If GKE puts the API server on the internet, then there should be some access controls that you can use there from a cloud provider perspective. If there's not, that's definitely a challenge, and you kind of have to work with what you have. But you should always try to inspect things, file tickets, or try to get updates for those things, and make sure that people are considering restricting access to the API server. Yeah, and just because I work for a cloud provider, I would say go search for the hardening or security guidelines for your cloud provider. They usually want to provide you help and documentation on what ports they're using and what traffic goes where, so your specific provider should be able to give you guidance on that. Yeah, exactly. Someone has a question: you announced or revealed this new ephemeral containers feature, which I've seen a lot of people chatting about even in the last few days. I don't know how much you know about the specifics, but they're asking, is this deployed as a sidecar? When is it going to be available in Kubernetes distributions? I don't know if those answers are available, but if you know, someone's asking. Yeah, so my understanding of how they work is that they basically just add another container to the pod.
And then it binds into all the same namespaces as the rest of the pod, so you're in the same network namespace, and you get the same network profile when you're trying to curl or wget or debug. I think it landed in 1.16 — you might have to use feature flags if you're running your own cluster, and I'm not sure what the cloud providers will do in terms of supporting this. But in general, I think it's just something to track and follow. And if you're interested, contribute, right? That's kind of the beauty of Kubernetes — you can always contribute if you're interested and help push something forward. So yeah, it's something to track, something to watch, something that I got really excited about, because I've tried to debug containers before and it's such a pain. You can't figure out if your network policy is wrong or the node network doesn't work, and just being able to bind a new container with a bunch of network utilities will be super useful. But I don't necessarily have a timeline on when that will be a GA feature. Yep. Yeah, that's definitely an interesting area to watch. Let's do one more and then let you do the demo and make sure we have plenty of time for that, and then we'll come back to see what's left. So, last one for now: have you evaluated the DoD — and I assume they mean the US Department of Defense — container base images? Do you feel it would be preferable to use these versus Alpine or other hardened distros? You had mentioned minimal bases. So, I haven't looked into it too extensively. I think they use the RHEL Universal Base Images as their base, is my understanding — don't quote me on that, I might be wrong. And so, yeah, I don't think you can really go wrong with any slimmed-down version. Obviously the DoD is very focused on security, so it's a good bet that that is a good base image to use, right?
And if it works for your organization, that's great, right? So that's kind of what I want to say: I think you should try to use slimmed-down images, but there are reasons why you can't sometimes. Alpine is great, but it doesn't use glibc, and sometimes you need that capability. There are reasons to use everything, right? There are even reasons to use privileged containers. And so I think you have to take it on a case-by-case basis and make sure it really works for your organization as a whole. I don't think there's really a "hey, you should use X, Y, and Z" — there's no cookie-cutter thing that you can do for your organization. It depends on what applications you're running, what features of these distributions you're using, and all of that. So I'm kind of wary of saying, yeah, you should definitely use X, Y, and Z. Yeah, that's good guidance. Yeah, so why don't you take it away with your demo, and then after that we can come back and see what's left question-wise that you can answer. Awesome. So I'll speed through this, and I know a lot of you have a lot of questions. What we're going to explore in this specific demo is one of my favorite configurations, which is the read-only root file system, which I think is a great way to stop attackers. There's also Linux capabilities — I might actually skip that one; it'll be in the slides, but it's a little harder — and then network policies, which I'd like to show, and which I think are a great way to start in terms of network segmentation. So what's a read-only root file system? It's exactly what it sounds like. Basically, you can make your entire pod file system read-only, which means that when an attacker comes and tries to drop a payload onto your file system, it will fail, right?
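Concretely, the setting being described here is a single field in the container's security context. A minimal sketch (the pod name and image are placeholders, not from the demo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                         # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/demo-app:latest   # placeholder image
    securityContext:
      readOnlyRootFilesystem: true       # writes anywhere in the container FS now fail
```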
A lot of attackers — I don't know if people have ever used Metasploit — but a lot of times the payloads are dropped in /tmp, which has traditionally always been read-write on VMs and other servers. /tmp needs to be read-write; a lot of applications rely on that. The beauty of containers is that typically you're running one specific application, so hopefully you know exactly what that application is doing in terms of reading and writing files. And so you can make /tmp read-only, for example. If you do need read-write capabilities, you can mount volumes at a specific path, and that path will become read-write. So you can make the majority of your file system read-only, but a specific path read-write. And so when we jump into it here, I'm going to launch an Apache Struts pod. This is the version that was exploited at Equifax. And I can show you what the exploit will look like: you run a curl with a header, and basically it gives you remote code execution, and you can launch a shell. What we're going to do is download a minerd binary and run it. So if I go ahead and run the exploit, we port-forward to the Struts pod, we try to download a crypto miner, and we're successful. And so you can see that basically someone would be running a minerd crypto miner on top of your infrastructure by exploiting a vulnerability in your application. These kinds of cryptojacking exploits have been fairly popular — it happened at Tesla, which had a Kubernetes API server open, and someone was running Monero miners and just stealing their resources. It's not the worst thing that could happen, but it's still something that you definitely don't want happening in your organization. And so what I'll do here is go ahead and apply a read-only root file system. Now I'll show you the Dockerfile that I applied.
So what I did here is: the Apache server basically needs write access to /usr/local/tomcat, so I made that a volume in the Dockerfile. And obviously I'm running a very vulnerable image. And then the rest of the file system is read-only — you can see right here that I made the file system read-only. So at this point /usr/local/tomcat is read-write, but the rest of the pod is a read-only root file system. I'll go ahead and rerun the exploit. Right — they tried to write to the root path and it failed. So this is just a super simple way that you can take a pod and make it read-only, and then basically any attacker who tries to drop a payload will be denied. The next question I get after this is: okay, cool, that was great, but how do I make my application read-only? If you wrote your application, you probably know what files it writes. But if you're using an off-the-shelf application — and in this example I use nginx — how do you know where nginx writes to the file system? How do I know that it writes to /var/cache/nginx? And so I like this tool: you just use docker diff. You can point it at a specific container and it'll show you the changes in the file system. In this case, I would put a mount point at /var/cache/nginx, and I'd need to figure out whether nginx will run with /run/nginx read-write. The rest of these are secrets, which Kubernetes will mount, so you don't need to worry about those — those should be read-only anyway because they're secrets. So basically this is a simple tool you can use to see which files have changed in your file system and figure out which paths need to be read-write. That's how you can solve that problem. It's a little manual and a little intensive to do this.
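Putting the docker diff findings to work, a pod spec for off-the-shelf nginx might look like this sketch — the writable paths are the ones surfaced above, and the pod name and image tag are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-readonly              # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:1.17               # assumed version
    securityContext:
      readOnlyRootFilesystem: true  # everything not mounted below is read-only
    volumeMounts:
    - name: cache                   # path docker diff showed nginx writing to
      mountPath: /var/cache/nginx
    - name: run                     # covers the pid file under /run
      mountPath: /run
  volumes:
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}
```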
But I think it's definitely one of the biggest benefits to your security posture, making your file system read-only. Awesome. I will skip Linux capabilities and come back to it if we have time, but I will jump into network policies. This is an example of a network policy. You use these things called pod selectors to determine which pods the network policy applies to. In this case, we're going to match any pod in the default namespace — because I didn't give a namespace — that has the label app: web. So it matches any app: web pod. And then what I'm saying is, I'm allowing ingress from any namespace that has the label team: operations, and from anything labeled type: monitoring. Super common — imagine you're running Prometheus in your monitoring namespace, you've labeled that namespace with team: operations, and you're curling all of the other endpoints to scrape all the metrics. This, for example, would allow that: as long as your Prometheus pod has the label type: monitoring, it'll be able to pull those metrics. And I'll show you an example of what the network policy would do. So I go ahead and add this network policy. Basically what I'm saying here — and I can print this out — is that I'm going to deny egress. And this is where it gets challenging, because I have specified a policy type that I want to apply, except it doesn't have any selectors. If you have no network policies, everything is completely open — Kubernetes wants to be usable by default, right out of the box, so no network policy means no network restrictions. But if you add a policy type of egress and you don't specify any allowed connections, that means you're going to deny all.
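For reference, the two policies just described would look roughly like this — the label keys and values are the ones from the talk, while the policy names are hypothetical:

```yaml
# Allow ingress to app: web pods only from pods labeled type: monitoring
# running in namespaces labeled team: operations.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-scrape    # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
      podSelector:
        matchLabels:
          type: monitoring
---
# Deny-all egress: listing Egress in policyTypes with no egress rules
# blocks all outbound traffic from the selected pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress                # hypothetical name
  namespace: default
spec:
  podSelector: {}                  # empty selector = every pod in the namespace
  policyTypes:
  - Egress
```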
And so in this case, I know my Apache Struts pod should just be serving traffic and not sending traffic. So I'll go ahead and try to run the exploit again. And... either my policy didn't apply or I don't have network policies enabled. Hey, this has never happened to me before — classic demo. So basically what should happen is that we deny the network traffic. I will note that you have to create GKE clusters with network policies turned on, which is apparently what I did not do, because I didn't restrict any network access. So that's exciting — but yes, make sure that your cluster is enabled for network policies. And I will note that some CNIs support Kubernetes network policies and some don't. The default one on GKE, Kubenet, doesn't, and other providers do and don't. The main ones that do, like Calico or Cilium, will also have their own network policy implementations. Awesome. With that, I will say that security is hard, as you just saw, and that it's always a continuous process. You have to continually pick a couple of low-hanging fruit for your organization and work through those. Do you want to pick the read-only root file system? Do you want to implement network policies? Make sure you go through each of those — pick one, start with one — and anything you can do will start lowering your risk and increasing your security posture. Awesome. Phil, do we have any questions? Yes, we do. We saw some questions. First, some practical matters people asked about as you were doing the demo. One person asked if that exploit is available for demo purposes anywhere, and related to that, whether any of your scripts or demo content is available anywhere. Yeah, so I'll publish those on GitHub. They might already be up there.
But just so people can poke through and look and see what I did, I'll go ahead and add a GitHub link for that. Okay, great. And then, yeah, I'll tee up some questions. Michelle, one of Connor's colleagues, has been helping me filter those. If your question doesn't get answered, Michelle's offering to handle those offline. You can see Connor's email right there on the screen; Michelle is michelle@stackrox.com. So whatever we don't get to, feel free to follow up with either of them. But here are a few more that have come in. So, back to networking: what's your take on NodePort services, since the service is exposed via the host IP? Right. So, typically, try to avoid NodePorts. It depends on whether you're on-prem or in the cloud. Cloud providers will provide load balancers for that, and you can also make your node IPs not external. So if you are going to use a load balancer from, say, GKE, you could create a load balancer object in Kubernetes, and that might bind internally to a node port, but you shouldn't need to have your nodes exposed on the internet. Yeah, I typically would try to avoid having your nodes exposed externally. Yep. Good advice. Another interesting question: do you have an opinion on good break-glass solutions for admin access? Taking the case where authentication or the control plane itself breaks, or, I assume, is maliciously interrupted. Right. So, typically, yes — I've seen a lot of implementations of this. I don't have a specific solution, but there are a lot of implementations in terms of process or tools. I recently saw one where someone needed to use root access and had to fill in why, and then basically they would be granted that access through internal systems, whether it's LDAP or other authentication systems.
And so I think that's a really good process: making sure, one, that operations can do the things they need to do — at the end of the day, what you're trying to do is run applications and serve traffic, so you've got to make sure you're enabling people to do that — but also making sure there are proper steps to document it and that people aren't abusing that system. Yep. All right. So, a couple more questions on roles and separation of duties. Who typically performs this kind of policy generation and management? Would you see this as the security team or DevOps engineering? And again, obviously that depends on whether an organization is still operating in traditional silos or has really done that transformation to a more DevOps approach. What's your view? Yeah, I think in a Kubernetes environment, security and DevOps need to work really closely together. If you want to scale security, you can push it onto developers and DevOps — you need to enable people to do the right thing, and I think everyone wants to do the right thing. We're talking about vulnerabilities, for example: something that I really like is that you can see whether a vulnerability is fixable. And so if you tell a developer that a vulnerability is fixable by upgrading a package, I think the developer will go and upgrade the package, because typically that might not have any impact on their application, but it improves the security posture. And developers are configuring their own services, or using Kubernetes to do that, so security needs to be a part of that — to understand, hey, I'm putting in these controls, here's how you can write your applications to obey these controls, for example, or here's a process for you to run privileged.
And so I think you really need to work hand in hand in a Kubernetes environment, because a lot of the constructs Kubernetes gives you are available to both security and DevOps. So we get the DevSecOps type thing, right — I think people are seeing those merge together. But obviously, you can just work together, right? Yeah. Great. Any best practice advice for setting up and managing TLS certificates in Kubernetes? So, something I've seen recently — I don't have too much experience with this, honestly; I typically use the managed providers — but a lot of people have started using cert-manager. It's just something to look at. I wouldn't endorse it necessarily; it's a tool that I've seen used, and so I think it warrants a look. That would be my take there. Yeah, I know there are some good blog posts I read recently on Istio and certificate management as well — I know Istio came up earlier. So let's ask this one: are public Helm charts secure, especially when Tiller has full access? Of course, Helm v3 has dropped Tiller, so if you're in a cluster using Helm 3, that shouldn't be an issue anymore. But I guess the question is about the use of Helm and its effect on security — is there something inherently wrong with Helm? I think you hit the nail on the head there with the 2-to-3 upgrade. Just make sure you move to 3, since they got rid of Tiller — and security was one of the main reasons they got rid of it. But Helm in general is just creating Kubernetes objects, right? And so the same security applies. Be careful about secrets — I've seen a lot of different solutions for Helm secrets — but I think that's the one gotcha: look out for your secrets and make sure you're not putting them in source control. But in general, yeah, I think Helm is a good tool. All right.
We're winding down — just a couple of minutes left. Here's an interesting one about admission controllers and validating webhooks: how do you protect them? The assumption is that someone with appropriate access can create a webhook that's actually doing something malicious, like exfiltrating data. Is there any guidance in this space? Yeah, so this is a great example of RBAC, right — make sure that people can't create validating webhooks, or if they can, that they're going through a specific process to do so. I think there's definitely a security concern with someone potentially exfiltrating data — though I think there are other ways people might do that — but you also need to be very careful about validating webhooks because they're going to be in the critical path of your API server. Sometimes they can take down your entire API server, so you've got to be very careful about the ones you're using and make sure you trust them. So definitely fine-grained RBAC privileges there, and auditing of that, are important. Absolutely. Any good ways to do network tracing when you're not using a mesh like Istio? There are a lot of different tools — StackRox is one, for example — but there are other sidecars that you can use. I think it's hard to do that without distributed tracing, and that's typically done via a sidecar — you don't have to use full-on Istio, you could use other sidecars. Right, right. Someone's asking a question, I believe about a specific tool that's new to me: Kured, K-U-R-E-D. What's your take on Kured? Do you prefer to use an automated solution for patch management, or manually patching the nodes? I would have to look into Kured — I've never looked at it, so I don't really have an opinion on that.
But, you know, if it helps you patch things, it's a nice tool — anything that helps you patch and upgrade and stay up to date is always worth looking at, but I don't have any specific opinions on it. Yep. All right, no problem. I'm actually interested to go look at what that is myself; I haven't heard of it. We are right at the top of the hour, and to protect people's time and investment, we're going to hold it there. Like we said, there's contact information on the screen for Connor, so feel free to follow up as necessary. You'll be able to find the slides, the updates Connor mentioned, and the replay available soon. So thanks, everyone, for joining. Thanks, Connor, for a great presentation and for answering quite a host of questions. Thanks for joining us, and we look forward to seeing all of you at a future CNCF webinar. Have a great day. Awesome. Thanks, guys.