Yes, hello and welcome. Thanks everybody for coming over from Christian's stream, the GitOps Guide to the Galaxy. Great job. Now we're gonna be talking security on StackRox Office Hours. I'm joined by an engineer at StackRox, Connor Gorman. Connor, you wanna introduce yourself before I get through the agenda for the day? Sure, yeah. Nice to meet everyone. I'm Connor. I've worked at StackRox, and now Red Hat, for a little over four years, so pretty much since the beginning of our current product cycle. I'm a principal engineer at Red Hat, primarily working on the backend. So I've been around for a while, been through the evolution of Kubernetes as well, and I'm excited to talk about anything related to security, containers, and Kubernetes.

Awesome. Yeah, for those who don't know, Chris Short is out — his last day was earlier this week — so we're having a little bit of Chris Short woes. That being said, I'm Mike Foster, standing in for him for StackRox Office Hours. A couple of announcements: for the following months we're not gonna be doing the show on Thursday, it's gonna be on Tuesday, same time. And today we're gonna be discussing Kubernetes security, our top 10. Connor and I put together a list of our top 10 things that we think people should be aware of: some mitigations you can do in open source Kubernetes, and then some more advanced features and more advanced solutions that you can look at as well. So with that being said, should we kick it off? Let's do it.

All right. The first one I think is a little bit of a layup, but we have disable public access as our number one, sort of an intro. I think it's kind of general — what do you think when you say disable public access? It's kind of private Kubernetes clusters, securing the API. Yeah, exactly. I think one of the main things from the API server perspective is that there's surface area there, right? I mean, it's the way you control your cluster. And so there's been a series of attacks, or denial-of-service attacks, that happen against public API servers. I mean, famously people have run crypto miners through exposed kube API servers — that's probably the least malicious way they could use it. And then there was the billion laughs YAML denial-of-service that could occur. So those things happen, those things exist in the wild, and the Kubernetes security team does a really good job of triaging them and fixing them as fast as possible. But just take your API server off the internet and a whole variety of attacks are basically mitigated by that, and that's a little bit of peace of mind. And likewise with nodes and things like that, you can stick them in private clusters as well — for example, GKE has private clusters and you can do stuff like that. So that's always a good place to start: air gap your environments a little bit. And some of the defaults that are installed in provider clusters, if you're unaware, might open you up to things like default nginx backends or stuff like that hanging around. If you're not sure, it's always good to keep it private before you really understand what's going on. Yeah, exactly. A lot of times you just click a button — they're made to be super easy to use, right?
Like, hey, give me a cluster — perfect — and some of the security defaults are lost in that. And so you gotta be really careful when you're actually building a production cluster, because what you see a lot is that a cluster you were playing around with, doing a POC on or something like that, starts running more and more critical workloads. And then over time it's become a production cluster or a semi-production cluster, and now you gotta take a step back and make sure you're applying all the proper security controls. Yeah, those baselines certainly aren't properly there. It might work, but you haven't really operationalized it.

That actually leads into the next point, which is implementing the least permissive role-based access control. And I think we can kind of assume that everybody has RBAC set up in their clusters now. This wasn't really an assumption we could make three years ago, but it's safe to say that now, right? Any thoughts on how to go about that — best practices, service accounts, things like that? Yeah, I think, like we just mentioned, sometimes your initial cluster makes it three years, right? So make sure you do an audit of all your roles and role bindings and who's accessing your cluster. There are a couple of ways to do this. Within our product, we can show you basically all the access that you have, so you can look at a particular role and say, hey, are you a cluster owner? Do you have full access there? There's also a tool that Jordan Liggitt wrote which will parse audit logs and tell you what access people have and what they've actually been using. So there's some cool stuff out there around parsing audit logs and breaking that down. So yeah, just make sure you're always auditing that. And then one of the best practices would be: start with minimal access and, as people need it, slowly add it. And then of course, just the realities of running a production cluster are that sometimes you need break-glass permissions. So build that process in where you have an audit cycle of, okay, I need cluster admin for X, Y, Z; we get an approval, or I write a reason why, and then that can give me the access I need, right? Because there are those scenarios where DevOps folks or other engineers do need more privileged access, so always account for that. Otherwise there's going to be some service account sitting there that will grant that no matter what. Yeah. And reality is always somewhere in the middle. So instead of being most permissive and working down: start least permissive and work up, and sort of make that the challenge. I think it's a general theme with Kubernetes and security, right? There's always this balance: you have folks who are managing and administrating the cluster, and then you have people running applications, and there's going to be some level of debugging that you need to do on the cluster, right? So how do you grant that access, and how do you make sure you're still providing the workflows for actually running the applications versus having a really hardened cluster? Completely agree, makes sense.

And speaking of applications: container images. Now, not necessarily Kubernetes specific as the orchestration tool, but obviously our main workloads are running in containers. So, managing vulnerabilities and providing a safe image.
I assume when we talk about number three, we mean the base image as well as your own application. So, just thoughts on that. Yeah, so there's been this concept for a while, even with VMs, of reverse uptime, right? Which is: you have an image and you just leave it running for a really long time. Something I've always liked to say is that the number of vulnerabilities in an image only ever increases, because the image is immutable. In your base distribution, there will only be more things found over time, right? So your image naturally drifts towards less secure just based on the number of vulnerabilities found. So, constantly rebuild your base images — all the distributions do a really nice job of constantly updating and fixing critical vulnerabilities. By constantly rebuilding your images, even if your code hasn't changed, you get a new base image, you get up-to-date packages that have fixed vulnerabilities, and that can lead you to a much better place in terms of the vulnerability management within a particular image.

You said something there that I think a lot of management teams don't like, but it's that the vulnerabilities are never really going to go away, right? You're going to mitigate them, and then over time they're going to creep back up, and then you're going to mitigate them again. The challenge I think with vulnerabilities is finding, you know, what are the things we want to mitigate with containers? What are the most impactful things? How do security teams make that decision? Yeah, I think you have to look at it this way: Kubernetes gives you a lot of context about where things are running and how things are running, right? And so when you're looking at a vulnerability — many times a lot of them are not fixable, and they can vary in criticality — I always like to look at things that you can actually do. What can you actually fix? So if you've got a lot of fixable vulnerabilities with new patch versions, those are the no-brainers in my opinion: let's go rebuild those images, let's go redeploy them. Hopefully your services are really robust and you can constantly roll them out, do CD. I know that's not always the case, so you have to be cognizant of that, and then constantly update those. And then also, if you can reduce the number of packages you have, that just reduces your surface area. So if you have a bunch of extraneous packages in your images, try to reduce those; that just provides a slimmer footprint for vulnerability management.

Yeah, and once those containers get into the cluster, that kind of moves into the fourth point: managing secrets, environment variables, config maps — injecting various secrets and variables into your containers. What are your thoughts on securing that, and the process for that? Because I always had an issue, especially once you get into teams: how do you manage secrets? How did you do it when there really was no encryption in Kubernetes three or four years ago, at the beginning? Was it just base64 there for a while? Yeah, exactly. So, going back to our RBAC point, that's the first step. If you're using Kubernetes secrets, the first and foremost thing to audit is who has access to actually reading or listing the secrets — they're effectively the same thing.
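To make the least-permissive RBAC point above concrete, here is a minimal sketch. The namespace, role, and group names (team-a, app-team-deployer, team-a-developers) are hypothetical, not from the discussion; the key idea is that the role manages workloads but deliberately has no rule granting get or list on secrets.

```yaml
# Hypothetical example: a namespace-scoped role for an app team that can
# manage workloads but cannot read Secrets. All names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-team-deployer
  namespace: team-a
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  # Note: no rule for "secrets" -- reading or listing them stays restricted.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-team-deployer
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-team-deployer
  apiGroup: rbac.authorization.k8s.io
```

A quick way to spot-check what a given subject can actually do is something like `kubectl auth can-i --list --namespace team-a --as system:serviceaccount:team-a:default`.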
And when you do that, you get them back as base64 — they're not encrypted by default. You can encrypt them at rest through KMS and different methods. But when I get the YAML of a secret, I'll be able to read the secret, right? And so, who has the ability to do that? Some folks don't even let reads of secrets ever happen — you can only write secrets, and if you want to roll them, you just have to write over the top. That's a good workflow for GitOps or different kinds of CD systems with a secret store, for example. But yeah, secrets are one of the biggest challenges that still exists within the ecosystem, and there hasn't been a perfect solution for any of that. So it's an interesting area for sure.

Yeah, and I think config maps became immutable as well — there's the option for that — which some people don't understand: why would config maps be immutable? But if you're injecting environment variables into your container, you kind of want, hey, we're gonna roll out the next version of this container, so we should probably make sure the config map that's going to be used to set up this container is also versioned with it. Little things like that I found really interesting. Yeah, one thing I've seen is taking some aspect of the config map, hashing it, and putting it as a label on the pod. Then if the config map changes, you can see which version of the config map should be running, because they do hot swap it underneath the mount, right? So you can be running a container and the config map can change, and you're kind of in this weird state — it takes about a minute to propagate and you're not sure which config map is there, or whether your service actually hot-reloads the configs. In some ways, just doing a rollout of the pods is an easier way to guarantee that, okay, this config map is gonna be mounted, because when you roll out the new pod you will always get the most recent config map attached. And so that's a way some people mitigate the uncertainty of: I modified a config map, did my change actually propagate? Yeah, and especially if you have any sort of CD process or anything like that with testing set up, you can always roll back, right? If your config map screws up, you're not sitting there in limbo. Yeah, exactly. That's one of the benefits of something like Helm, for example: everything is in lockstep. Our operators in the OpenShift ecosystem are a similar concept, right? You know exactly what you're getting, all together, at one time. So that's a big benefit of the bundled approach to deploying your service.

Now, on to number six — and by the way, for people who are watching, feel free to put in your questions, comments, things that are bugging you about security. It can even be, oh, security sucks. I like to be wide open, that's fine too. But I'd love to hear your feedback as we get into number six, which I've found is a sneaky security aspect that is often overlooked: resource limits. Want to get your thoughts on that? So yeah, this is actually a little bit contentious too. Resource limits, depending on what service you're running, really do matter.
One aspect is that they help you provide the segmentation that you want between different services, right? You don't want one service to basically be poorly written, or have a fork bomb or whatever, and suddenly fill up a node and then start having things be killed. I mean, the OOM process is fairly good — it will try to first kill the things that are over-utilizing their resources and evict them — but it's not a foolproof process by any means. And so I always think that setting resource limits is good, especially in a multi-team environment, where a lot of times people will look at resource limits and say, this is your quota, this is how much of the cluster you're using, and even use cost-center billing to say, here's how much you're costing us, because these are the resources you're requiring. That puts some pressure back on the DevOps team and the team itself to optimize their code. Sometimes, without that pressure of needing to set resource limits and constraints, there isn't as much reason to optimize your code, right? Because there's no direct impact on you — it's kind of left up to the infrastructure team, and that's a challenge for any infrastructure team. And another thing with resource limits — and I think security and availability go hand in hand a lot — is that having them, and having them be as low as possible, helps you schedule things in your cluster, right? Exactly. You can schedule things more readily if you have proper resource limits and constraints set. So those are some of the main benefits I see from a security perspective.

Yeah, I think especially for stateful workloads: if you're setting very strict limits — and especially if you have a limit on your stateful workloads but not on your other ones — the others get the default for the namespace, but they also get evicted first, because they don't have something that's supplied by the developer or the administrator there, right? So there's a little bit of assumption there. And there are some pretty cool features in Kubernetes where you can set it by namespace too. So as an administrator you can say, hey, in this namespace you're only allowed this much to work with. Yeah, definitely, especially in multi-team clusters.

I'm curious what your thoughts are — I've seen a lot of, let's say, micro-sharding of clusters. I don't know if that's the correct term, but a lot of people using K3s or smaller clusters, and using a cluster per team. Any thoughts on how you would set up resource limits for that, or on the scaling of clusters? Just kind of curious what your thoughts are. Yeah, so for resource limits there, there's a concept like LimitRange, which will just automatically give a pod that has no resources set the basic defaults. But actually, from a security perspective, I do like the smaller clusters, though you have to mitigate the challenge of, where are my clusters, right? Which is something that happens: you have a hundred teams, now you have a hundred clusters. So you're always trying to find that balance as an organization of how many clusters you have versus running one large cluster. There's always gonna be operational overhead for running a cluster, and some cost there, but you can really get the natural segmentation that you're looking for between teams by just having separate clusters, right?
You know, team A isn't bleeding their service into team B, because they're literally segmented, right? Yeah. And the challenge is setting that policy so the cluster comes preconfigured for that team, right, as they scale. Right, exactly. And you can allow each team to spin up their clusters, but you want to make sure that you have unified tooling deployed in those clusters, right? And so on the infrastructure side, if you want to go that way, you have to be able to say — speaking from personal experience with ACS — you want to make sure you have our secured cluster components in every single cluster that you deploy. And we've had a bunch of customers who have built that into their ops cycle, so when they launch a new cluster — up to 300 or so clusters — every single cluster gets these components, registers itself, and is actively being secured through security policy. Now you can really scale that out broadly as your organization grows.

Yeah, so clusters that get set up don't fall through the cracks, right? You're using something like ACM for overall policy. I thought one of the coolest things I had seen was, through ACM, let's say you want to set up a developer-specific cluster: you can have the sensor set up for ACS, but with no admission controller. So it's like, hey, a developer started up a cluster — okay, there it is, I can see what they're doing, but there's no enforcement. Things like that let you mitigate stuff before it gets too far, let's say. Yeah, exactly. And in that way too, you can see, oh, hey, this person created a load balancer in this cluster, right? And so we have a lot of insight into the overall context of what's going on in each cluster. So you can see, oh, hey, they exposed this service over the internet — that's not within our security policy, or we haven't verified that, it hasn't gone through a security review, right? And there are a lot of aspects like that where the platforms make it really easy to use this stuff, expose it over the internet, make it publicly available, and maybe that's not exactly what you want. Maybe they didn't realize the risk in doing that — hey, I was just testing this from my local laptop — so make sure the proper security controls are there for each individual cluster and the services that are being publicly exposed. The worst situation is to have a prod service that you don't know about sitting out there on the internet.

And good thing we got into clusters there. Number seven: we have segregate sensitive workloads. So there are a couple of points here, I think. We already talked about secrets a little bit, but also implementing and monitoring traffic and setting baselines, and separating through Kubernetes-native tools like namespaces, right? I guess we'll start with that: namespaces, properly setting them up, proper defaults, thoughts on how to operationalize that, specifically with teams. Step one: don't use the default namespace. There are actually some admission controllers and things you can use that will say, don't let anything be deployed into the default namespace. Usually what happens is just that someone ran kubectl and something showed up there. So you really wanna make sure everyone's in a proper namespace.
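As a rough sketch of the per-namespace guardrails mentioned above — a quota capping a team's total consumption, plus a LimitRange supplying defaults for pods that set nothing themselves — something like the following could sit in each team namespace. The team-a namespace and all of the numbers are placeholders, not recommendations from the conversation.

```yaml
# Hypothetical per-namespace guardrails: the quota caps total consumption,
# the LimitRange fills in defaults for containers that declare nothing.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:            # applied when a container sets no limits
        cpu: 500m
        memory: 256Mi
      defaultRequest:     # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
```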
I mean, namespaces are super useful for some of the reasons we already outlined around resourcing, for example — seeing the resource consumption of an individual namespace — but they're also a really easy barrier to say, this team should not be able to talk to that team. Or, if you have a multi-tenant environment with different customers: okay, customer A should not be able to talk to customer B, and how do I go verify that? And actually, I might jump over to the demo real quick, Foster. Sure, let's bring it in, cool.

Sure, so this is our network graph. We're showing live traffic between different namespaces, right? And this is what's pretty interesting: you wanna create network policies that deny traffic between namespaces, but there might be holes that you wanna poke for things like monitoring, right? If you have a Prometheus operator running in your cluster, you want the monitoring to happen against your services. And so what you really wanna do is break down and actually verify, even from an audit perspective, that customer A is not talking to customer B. So StackRox should not be talking to any of these other backends that have nothing to do with it, right? And so this is at least a visual verification that you can do that. And then there are a lot of capabilities within the product, which I probably won't go into right now, around simulating network policies. The YAMLs are pretty difficult to write sometimes, so you really wanna look at them and say, okay, is this doing what I think it's doing? Is this actually gonna block an active flow? Is this something that's actually being used? For example, this sensor is in a remote cluster, so there is no centralized component here, and you can see it reaching out to Google here at the bottom. That's because the central is in a different GKE cluster — it's in a different cluster — so that's where that traffic is flowing to, and you can see the egress for it is going to Google. So sometimes there are things that you miss or things you don't think about — you don't think about sensor reaching out externally if it's in a remote cluster — and this is just an easy way to visually check that this is occurring. And then another really cool feature is being able to look at the allowed connections, which breaks down what things can talk to each other based on network policies, right? And so this will help show you —

Because most of the time you're configuring network policies because you're trying to get your app up, and so you're playing around with the network policies just to get your app to work, and you have no clue you just exposed everything else. Right, exactly. Like maybe you just said, oh, everything can talk to this. And depending on the service, you have different levels of connectivity and ingress or egress, right? So for example, this is our collector. This is an agent that runs on every node. It doesn't run a web server; it needs absolutely no ingress, right? So this is saying we'll allow zero ingress flows based on the network policy we wrote, and based on the overall environment, we have 16 egress flows that will be allowed. So a lot of these other namespaces have no network policies set up, and so they can speak to all of these different pods in our cluster. Very cool. Yeah, the visualization aspect of network policies — I don't think that can be overstated. That's a huge time saver, not working through all those YAML files.
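For reference, a minimal version of the default-deny-plus-monitoring-exception pattern being described might look like the following. The customer-a and monitoring namespace names are assumptions, not taken from the demo, and the namespace label used here is only auto-applied on reasonably recent Kubernetes versions; otherwise you'd label the monitoring namespace yourself.

```yaml
# Illustrative only: default-deny for a tenant namespace, plus a narrow
# exception so a Prometheus-style monitoring namespace can still scrape.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: customer-a
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-scrape
  namespace: customer-a
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
```

One practical caveat: denying all egress also blocks DNS, so in practice you'd usually poke an additional egress hole toward the cluster DNS service before rolling this out broadly.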
And one of my favorite aspects of moving development to test clusters was being able to just flick network policies on and off. You have some sort of network test, put it in the test cluster, put a default deny on and see what happens, right? If you don't have anything in the dev cluster. All right, yeah. So it's awesome. Yeah, the flow is cool.

Yeah, I think some of the things I would do here are: maybe you have a default template for a namespace, and you say, okay, connections from the Prometheus operator namespace are allowed, but deny all the other ones, right? And then every namespace that you stamp out will then have that default deny, and if you need to poke holes later, or someone's got a real use case, you can evaluate those at the time. If you can start this way, this is definitely the way to go. Now, it's very challenging if you've been using Kubernetes for two years and you're running a bunch of production services: how do you start applying this to a cluster that's already running without breaking a lot of things? And that's where looking at the active traffic and crafting network policies that align with that active traffic is really useful, for folks who already have clusters up and running but haven't gotten to the network segmentation portion.

Yeah, the observability without enforcement, especially in a security tool, is extremely useful, right? You don't want to be shutting down anything that you haven't verified as normal traffic. So, very cool. Exactly, yeah. And when it comes to security in configuration especially, I think it's always: first turn on violations, right? So first you get visibility into something like network traffic, then you can start sending violations about things that deviate from that traffic, and that happens over time. We don't have a perfect view of the world: you could run Postgres and we could look at all the processes that run, but then someone runs a backup, and maybe the backup works by exec'ing in and running some command, right? We don't want to kill that — we don't want to kill Postgres because of that. So you want to see it, and say, okay, highlight anything that's outside of that band. Oh, actually, yeah, this is the backup — good thing we didn't turn on enforcement right away. And then once you're very certain that you've seen everything you're looking for, feel free to turn on enforcement, or violations that go to an even higher-level notifier, right? Getting emailed about something is not like being paged at 2 AM. And so you have all of these different tiers of how you work through a security program.

Now, when you're scheduling all these notifiers, how would you group them? Does it get shot out by team, by administrator? Let's say there's a network flow that violates some policy — you obviously don't want to blast it to everybody on Slack for something like that. Right, exactly. So, on the platform, how would you manage something like that? Yeah, so you can annotate your pods with specific attributes that you'd like to route through. So you can say, okay, send this to the Slack webhook — and that would be done on a pod basis or a deployment basis. Same thing with email: a lot of times people will have an owner field saying, here's the team you should email about some of these issues.
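As an illustration of that ownership-metadata idea, a deployment might carry labels and annotations like the ones below. The exact keys a notifier integration looks for depend entirely on how it's configured, so treat owner, email, and every name here as placeholders rather than anything product-specific.

```yaml
# Hypothetical: ownership metadata on a deployment so alert routing can
# reach the right team. Annotation keys and all names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments
  labels:
    team: payments
  annotations:
    owner: payments-team
    email: payments-oncall@example.com
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
        team: payments
    spec:
      containers:
        - name: payments-api
          image: registry.example.com/payments/payments-api:1.4.2
```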
And then of course you have levels for when you'd want to send these notifications, and the relative severity of them, right? And so you want to separate from there. So, someone using the latest tag — okay, maybe that's a very low severity. And then someone with a really critical vulnerability that's fixable — hey, someone needs to jump on this; this goes to PagerDuty, for example. Pretty cool. That's awesome. Yeah, honestly, I feel like Kubernetes also favors the meticulous in that aspect, right? You have some applications in a namespace; everything's going to revolve around those labels and annotations and how you separate your workloads. So I always used to tell people: getting that first namespace setup right for your application is the core base that you're going to build all your policies off of, right? Yeah, exactly. Some namespaces are just naturally more sensitive than others, right? Like, you have a namespace for the payments team, right? Yeah. That's going to be much more stringent from a security perspective than, say, a machine-learning namespace that's never on the internet — it never exposes anything, has no load balancer. And so there's a large variety of different constraints that you have, and namespaces allow you — also within our platform — to say, oh yeah, these namespaces have these rules; or even, this cluster and these namespaces have these rules and these other namespaces don't; or, here's what we care about here and here's what we care about there. So you can separate that by team and namespace. Yeah. And if you do it right, you don't have the teams bickering with each other over resources. I think that's why we got to the sharding of clusters — it's just, hey, here's your cluster, here's your cluster. But then you get cluster sprawl. So bringing that all together is interesting.

Speaking of which, audit logging is coming up next: enabling audit logging. I think by default most clusters nowadays, and most cloud services, have some sort of audit logging functionality. The biggest issue is: what are you looking for? How do you monitor it? And what are the things to look for, right? Do you look for a bunch of API requests? Are you looking for kubectl execs? What are the main things we need to look for? Right. Yeah, I think anything that's directly interacting with pods — especially those execs or port-forwards, right? — those are the things I would be concerned about. And then also, typically, if you have an RBAC setup, you don't want a lot of people running kubectl commands against your cluster. When you're really at the production level, you don't want systems or people directly interacting with your cluster unless you have to, right? And so that's the stuff you want to log and that's the stuff you want to look out for. And we recognize this for sure, because with port-forwards you can have a direct localhost connection to a server, potentially bypassing different things like firewalls or load balancers. And then likewise with execs — you're in the pod.
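A sketch of what an API server audit policy focused on those interactions could look like is below. How the policy file gets wired up (for example the --audit-policy-file and --audit-log-path flags) varies by distribution and managed service, and the rule set here is just one reasonable starting point, not a recommendation from the conversation.

```yaml
# Sketch of an audit policy: record exec/port-forward against pods in full,
# keep noisier traffic at metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Full request/response for the interactive pod subresources called out above.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward", "pods/attach"]
  # Record who touched secrets and config maps, without logging the payloads.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Everything else at metadata level.
  - level: Metadata
```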
Like, you could have as much security as you want, but if someone's inside your container and can cat your database files, or look at your database files, there's a limited number of things you can do from that perspective. And so actually, if you want to flip back real quick — I actually really love this feature that we ended up building, and we pipe it through the admission controller, so you can kind of default-deny these as well. I'll go to the violations here. I'm in my terminal, and I'm just port-forwarding to — or sorry, exec'ing into — one of the StackRox pods on this cluster. Nice. Can you zoom in a little bit, do you mind? Oh yeah, sure. On the dashboard. Perfect. It didn't really do too much, kind of automatically shifted there, but right. So I'll exec into the sensor pod — like all live demos, slightly stressful. Oh cool.

So yeah, right, it just popped right up so we can see that. Okay, we had a Kubernetes action: we exec'd into a pod. And actually, if you break it down, you can see that it was me that did this and what group I'm a part of. So, okay, an authenticated user did an exec into this pod. What you can do is trace these breadcrumbs: if we go to the risk page and look at, for example, the deployment with the sensor — in this case there are two of them — we can see that this one actually found an anomalous process, and I've got a process discovery here, and it's sh, right? I had just opened /bin/sh there. Let's see how it's doing here — I'll reload this real quick. Yep, it picked up all the other stuff that I was doing. So I ran ls, right? So you can kind of follow the breadcrumbs here in terms of what processes were run. And this red indicates that we found it anomalous: we've basically been running this container — or this pod — for quite a while, and we've only ever seen this Kubernetes sensor. It's a Go binary; it's very easy to tell that this is the only thing that should be running. And then these are the other processes that came after we'd baselined it — we're like, hey, these are new, these are strange. Now, sometimes this happens, or things run periodically, and you can just add them to the baseline and they won't show up as anomalous anymore. In this case, I'll leave them, because they are anomalous — I exec'd in. So you can kind of immediately see that. And then we also have the ability, for everything that you can highlight through alerts, to also reject it through the admission controller. So if I change that policy, I could make it reject my exec.

Now, when it rejects it, does it delete the pod? So from an admission controller perspective, you're just blocked — you just get a report back to kubectl that your action got blocked. But if you have other runtime data, around processes for example — the pod may already be running and then you see some anomalous processes — there is an option on the enforcement to kill the pod, and we'll just kill that through Kubernetes. We'll say, hey, we saw something strange, let's kill that pod, and you can see if it happens again. Again, always be very careful with this, right? You don't want someone to exec into a container to do some debugging or something and we immediately kill that pod. Some deployments are going to be a little bit more sensitive to that than others.
And so those are some of the concerns with that more stringent enforcement, but we'll always show you the violation as well, so you can pipe those into something more like incident response. Yeah, and having the context of who's doing it is obviously a huge factor in that decision. But I mean, even just blocking the action alone — and then if you are able to delete a pod: if you do have a highly available deployment, deleting a specific pod or enforcing that rule should be okay, right? You obviously don't want to set that as the standard, and you hope that people have set up their deployment correctly, but deleting one pod has no massive impact, right? It's a little bit different from wiping a whole workload over a security issue. Yeah, exactly, exactly — those are kind of the main issues there. Yeah, and that's what I actually really like about the admission controller on creates, just in general: you're usually not changing the state of the world, right? Someone's trying to create a new deployment, so you can be a little bit more stringent about what hits your cluster at that point, because it's not currently running and the cluster just stays in the same state it's in. When you run an admission controller on updates, for example, that one's a little bit scarier, because you could be rolling out critical updates or an image update, but now you're violating some policy and that could get rejected at that point. So you're still staying in the same state, but it's a little bit more risky from that perspective, in my opinion. Definitely.

Yeah — and Kevin, I'll post all 10 in the chat at the end; we're just getting on to nine, so two minutes and I'll copy and paste the whole list that we made. You kind of mentioned it, touched on it a little bit, but number nine is providing a secure software development process. Now, this might be the most vague out of all of them, I think, because everybody has a different software development process. Real quick: top things from a container build to deployment in Kubernetes — what are your thoughts?

Yeah, so let's start with the source code, right? You always have to have some sort of source code scanning, linters, just general hygiene around your code. The earlier you can find something, the better. Then you take that code and you put it in an image, and now you want the same sort of hygiene around that image: make sure you're scanning your base image, things are up to date. The number of times I've seen some really old base image version in someone's environment, and you're like, man, I have to go back and look at how old that is — those are things you want to address quickly. Make sure you're scanning your actual code too — your applications and everything. And then, when you're crafting your deployment YAMLs, it's: what does my service actually need, and how do I validate that? So, does this thing need to run as privileged? Do I need these capabilities? Do I need NET_RAW, for example, which is a Linux capability that's available by default but that almost no one needs — it's for transparent proxying, I think is what it allows you to do.
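A minimal sketch of asking for less in the pod spec — dropping all capabilities (which takes NET_RAW with it) and refusing to run as root. The pod name and image are placeholders, not from the discussion.

```yaml
# Hypothetical example: drop every Linux capability, including the
# default-granted NET_RAW, and disallow running as root.
apiVersion: v1
kind: Pod
metadata:
  name: sample-app
spec:
  containers:
    - name: app
      image: registry.example.com/team-a/app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]   # NET_RAW goes away along with the other defaults
```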
So there are very few situations where it's actually necessary. And then setting your users and UIDs and GIDs and all of that fun stuff — that's the nuts and bolts of how you actually run this service, and making sure that you're not running everything as root is a really good place to start. Yeah, I'd almost recommend to security teams, if there's some sort of template — you can take YAML files, right? Just take one that's been applied to a cluster, pull it out, set some defaults, and give it to the development team and say, hey, fill it out. Let me know what's going on in your process, and if you don't know, you have a month or two to get back to me on it. Just for teams that are getting started, I think there needs to be a little bit of communication there about what's expected.

Thanks for bringing that up, because I actually skipped number three — I started talking about secrets when we should have been talking about security contexts. I think that was probably the biggest thing, especially with pod security policies getting deprecated. You touched on it: Linux capabilities, UIDs, GIDs, dropping privileges — what else should we cover for security contexts? Yeah, I think those are some of the main ones. I mean, if you can use some of the more advanced options there, like seccomp or SELinux, that's great — I know those can provide their own challenges as well — but those are the places to start from the security context perspective. I think the user one is just big; that just moves you out of root immediately.

One that's maybe not in the security context itself, but that I actually really like, is the read-only root filesystem. A lot of generic attacks — if you look at Metasploit, for example — drop things into /tmp, because that has traditionally been writable on every filesystem; you dump everything in there. And there are a lot of applications that assume /tmp is writable. But, for example, we're running a sensor — it's a Go binary, it doesn't write anything, right? So, okay, how can I use that fact to reduce the attack surface for anyone running an attack against our container? Make the entire filesystem read-only. Then if you try to drop a payload, you just get rejected, right? And so that is a huge way you can reduce your attack surface. And it can be complicated to run — sometimes apps write lock files or things out — but if you can analyze that, you can just put a volume at those locations through Kubernetes. Docker volumes don't work exactly the same way. Yeah, a config map or something like that? Right, yeah — you can create a volume, just a temp volume in Kubernetes, and put it at that spot, and the rest of your filesystem can be read-only. Even that provides a lot of protection from that standpoint.

Nice, yeah. And you mentioned SELinux. I found one of the better workflows tends to be that the security team handles SELinux. If you can hand off the deployment file with the list of permissions, user IDs, and capabilities you need, you can let the security team figure out a lot of those SELinux aspects on the host, without you even having to get into that side of it in your Kubernetes clusters.
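And a sketch of the read-only root filesystem pattern from a moment ago, with a writable emptyDir mounted only at /tmp. Names, the UID, and the image are illustrative assumptions.

```yaml
# Hypothetical example: lock the root filesystem and give the app a
# writable emptyDir just at /tmp.
apiVersion: v1
kind: Pod
metadata:
  name: sample-app-readonly
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/team-a/app:1.0.0
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```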
So yeah, it's just sizing up which responsibilities sit where, and having that communication is, I think, one of the biggest challenges. All right. So this kind of bleeds into four and ten, right? Four is pod security policies and ten is configuring admission controllers. As pod security policies get deprecated, I think they are basically going to be replaced with admission controllers, right? Yep — the replacement is an admission controller, I believe, and I think there are three modes — warn, audit, and enforce — with levels ranging from fairly permissive to very strict. Right. So admission controllers are the things you can use as pod security policies go away. Our platform has an admission controller, and there are things like OPA Gatekeeper in the community, Kyverno, different things like that, that provide some of this context around security in a generic way. So I think that's one of the main ways pod security policies will, I guess, be replaced. What's also nice is that they're a little bit more generic as well: pod security policies are pretty tightly confined to very specific aspects of your pod, and admission controllers can check a large variety of things around configuration.

Yeah — and sorry to anybody watching, the copy-paste into the Restream chat is not great, so that's just a big word soup right there, but I'll try to format it a little. One thing — I understand why PSP deprecation is a conversation, but we're really still talking about security contexts. It's just, how do we wrap a policy around them so we can actually operationalize that, right? So it starts with making sure that the security contexts are working for each application and that teams care about that, and then the application of it — whether it be PSPs, an admission controller, OPA, Kyverno — becomes so much easier, right? They're just layers of abstraction built on top to help you scale. Yeah, exactly. It's just a nice method of enforcing that, hey, you're using the right things, you're not running these things as root — or at least you have an exception — or we can audit what's going into the cluster. Because I think one thing that's happening with Kubernetes in general is that there are a lot of deployment responsibilities now on developers and developer teams. In a lot of ways I view it as: application teams' responsibility is to write and ship applications, and of course those should be secure, but that might not be their forte or their area of expertise. And so you kind of need these guardrails to say, hey, this is how you should deploy this, or, you probably shouldn't run this as root — is there a good reason? And a lot of times it'll be, oh no, we can run it as any other user, just let me know how to do it and we'll run it as some other UID. Yeah. Yeah, and that conversation works a lot better than: hey, we've had no policy, let's implement these policies across all our clusters really quickly without any information about your deployments. That's a lot of friction you're creating with something like that. So yeah, 100%.

That's our top 10. I'll try to post a better comment in Restream for anybody watching. Kevin, hopefully that helped you with the notes — probably made more work for you — but there's the top 10.
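For context on the built-in replacement path mentioned above: in newer Kubernetes releases the PodSecurity admission controller is driven by namespace labels. The mode/level combination below is just one example of how that can be applied — enforce the more permissive profile while warning and auditing against the stricter one; the namespace name is a placeholder.

```yaml
# Example only: Pod Security Admission levels applied as namespace labels.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```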
We did have a couple of hot topics. What's next? KubeCon was last week, for those who missed it. All the videos are gonna be available online in the coming weeks, I think. They did a good job with the hybrid setup — definitely extremely challenging. I liked it at home because, instead of setting your schedule where you have to go to all the events, you can just binge-watch a bunch. But I want to talk about the hot topics. Supply chain security was a huge conversation, and specifically container image signing and verification — cosign, building and signing images with Podman. Wanted to get your thoughts on proper implementation. I do think it's a great tool — it's another tool in the security arsenal — but I don't think it's the silver bullet that people think it is. What are your thoughts? Yeah, I think that's super true. I think it's a really cool tool and it definitely has good security implications: verifying things that are going into your registry and being able to attest where they came from. Now, this sort of thing happened with SolarWinds, right? Which is, I think, a big reason why a lot of these conversations have come to the forefront. But if the supply chain itself is compromised, then you could still be signing binaries and things that have issues, right? So it doesn't fix all of the issues, but it definitely addresses malicious images in your registry that didn't come from where you think they came from. And so I think that's...

Like injection through init containers or something like that, right? If your workload is not supposed to have anything added to it, and something happened during the init process, you'd be able to see that, right? Yeah, exactly. But those can also be caught with existing runtime setups, depending on what you're doing, right? Yeah, absolutely. I mean, the more mitigating controls you have, the better, and a combination of those two things is probably best. I think a lot of times we like to look at it as: security is a multi-layer process, right? And there are a bunch of good tools and products out there that help with all of that. Sometimes no one product is your silver bullet, and so, like you said, it's an arsenal — you have a bunch of these things layered together and you can utilize the best of each. So I'm a pretty big proponent of that.

Yeah, and in terms of operationalizing it, one of the best things is that the security team can just implement it. Most likely your developers have set up a container build process, and you basically just need to add onto that to sign and verify, and you can build it into your CI process without a lot of, let's say, friction between teams, because that existing CI system is already there, right? Right, yeah, I agree with that too. It should be fairly easy to go implement these things. And the final part is the actual verification process, which is interesting as well, right? The last part is, of course, when you go deploy these things: let's verify that this is what you think it is — that the image you're actually pulling is in fact signed. Yeah, exactly.
That moves us a little bit onto the next hot topic. We talked a little bit about ephemeral containers — in Kubernetes 1.22 they're in the alpha stage. Is this going to change some security workflows? Like ACS, which has kubectl exec notifications — I think that still remains there. Do we need extra security around ephemeral containers too? I mean, I think it's another thing to watch, right? If someone's launching an ephemeral container, that's something that's kind of out of the norm — I expect these not to be launched all the time. From a security perspective, I actually think it's a plus, because you can build images that are way slimmer now. A lot of times you'd build an image with maybe some network tooling in it, just in case you needed to do DNS lookups to make sure something is working, right? Or, why can't this namespace talk to this namespace? I think debugging network policies is a really hot topic there, where these two things can't talk to each other, you're trying to figure out why, and you really want to run a curl between these two services and see what's going on. But you may not want curl in your main image — the main application that you're running — because that could be used by an attacker, for example, to download a payload. So with an ephemeral container, you have all your debugging tools set up there, and the whole point of something that's ephemeral is that it doesn't run that long; you can run your debugging, tear it down, and move on.

I think they're missing the tear-down part right now — I think it stays in the pod for the lifetime of the pod, last I looked — but from the development standpoint it seems the most applicable. You can tell all the developers on your team, hey, here's an image with every single tool that all of these teams need; you can all use it to check your application, just run this one thing. But take everything else out of your image that you would normally use, and we're alerting on this in production, right? Sort of thing? Yeah, exactly. It gives them a way to respond to the question of, oh, how do I debug my container in prod, though? Right — here's the formula for how you do it, and please remove everything else from your image, because we're literally giving you an image that can do this, right? And so it's kind of a bargaining chip, I would say, because debugging things in production is always difficult, and this at least gives application teams a way to do it. Very true.

All right, last topic: validating true security threats and metrics — touching on this a little bit with vulnerabilities. Vulnerabilities never really go away; how do you triage them? Another aspect is, how do you validate that the security choices you're making are helpful to your overall posture? What does that look like, and where are security tools going in terms of measuring that? Measuring risk is always a tough conversation, right? Yeah, no, I think that's a really good question. I don't have all the answers, unfortunately. Yeah, I don't think anybody does. But I think a lot of the stuff around configuration, and really trying to scope things down as much as possible, is really useful. I mean, there are always some edge cases here and there where things need special privileges.
I think when you think about measuring risk in a Kubernetes environment, you have a lot more data, right? And you can start utilizing some of that data to really influence your decisions. So you have things like: okay, you have a vulnerability and you have the vectors for it. If it has a network attack vector, for example — is it exposed on a load balancer? Do you have network policies set up in that namespace to deny access to it? It's: how do I quantify what really carries a lot of risk? Something that's a severe issue and exposed over the internet — yeah, I wanna go look at that, and I wanna go try to address that issue as fast as possible. Another aspect — something we just shipped, it's not in this demo, unfortunately — is the concept of active vulnerabilities. How can you see what is actually running? Is this vulnerability potentially exploitable because there's a process that's using this library, for example? And so that helps you provide even more context around the vulnerability and its potential to be exploited. Is it in a process that's running? In that case, hey, wow, we should go look at this immediately. And probably how many containers too, right? If it's a base image issue and that base image is being used across the cluster, that's high impact, right? It's in however many different running containers. Right, exactly. And sometimes you wonder, because there are different things in the base image that you never use — that you may not know you're using, or honestly just never use. Sometimes they come with Python, for example. We write Go applications; we never call anything in Python. So how risky is this Python vulnerability if we never call Python, ever? And so you can kind of drop the risk of some vulnerabilities and raise up others based on how active they are.

Very, very true. And to some extent that requires a little bit of user input too, right? And people being aware of the applications that they're running. Right — so this is kind of the value of having a whole runtime component in our product as well, where you can look at what you're actually doing. If you think about network flows, for example: okay, how much network flow traffic have I actually stopped? How many violations around networking have I actually created that are of value, right? One thing we could look at in the future is around file writes — whether or not you could make things read-only-root-filesystem, or influence configuration based on what's actually running. Because I think the concept of knowing everything before you deploy an application is just a pretty challenging situation, especially if you're deploying something off the shelf. I mean, nginx is always the container that I use — I'll kubectl run an nginx — and I still can't tell you all the processes that run as part of that, or what files are written, or what network activity is used, or what port is opened by default.

I think the fact that in Rancher's new release they dropped the nginx default backend — I was like, yeah, it's the defaults. I think that as Kubernetes gets more mature, we're starting to realize which defaults are useful. But that's a little funny about nginx — it's always the default test container. Yeah, it is. And for good reason — it seems to always work, right?
Yeah. That was always the huge benefit when starting with Kubernetes: if I take this Helm chart and it works the first time, I'm gonna stick with it for a while until somebody tells me not to. But yeah, that's awesome. Anything else that you're looking forward to — the security space, changes in Kubernetes? Not too much. I mean, I think there's always a lot of interesting movement in the community, and I'm curious to see what the next sort of — they're not RFEs, they're the KEPs, the Kubernetes enhancement proposals, sorry — really curious to see what people have there and what people wanna move forward, especially as Kubernetes scales and needs to get more scalable and there are larger clusters being created. So there are always gonna be interesting use cases and stuff that pop out of that.

Nice. Yeah, if anybody is checking us out and wants to talk about a use case, or wants to tell us to test something out, you can ping us at stackrox.io — go to the stackrox.io community, there are some emails, a bunch of docs, and some communication stuff. Shoot me an email and I'd be happy to chat about it on future streams. Connor, thanks for coming on and walking us through the top 10. Hopefully we'll have you on again soon, possibly. Yeah, sure, anytime. It was super fun. Nice. Everybody else watching, have a good — I guess I can say weekend, right? It's Thursday, just say good weekend; hopefully people get a day off. Yeah, awesome. Thanks for joining, thanks for listening to the top 10. Stay safe, stay secure, and have fun out there. Bye.