Hey, and thank you very much for the intro. Welcome to our session today about Kubernetes configuration. My name is Or Kamara, I'm a senior development team lead at Snyk, and I'm happy to talk with you today together with Scott McCarty. Scott, do you want to say hi? Oh, Scott, I think you're on mute. Oops, my apologies, I started talking without unmuting. Thanks, Or. Thanks, everyone, thanks for having me. I'm a principal product manager at Red Hat for all the low-level components: container engines, runtimes, and container images. I've been at Red Hat almost 10 years, and I'm looking forward to talking about security and bringing my curmudgeonly sysadmin view to the world. Amazing. So, let's start. What are we going to cover today? We'll start with some background and explanation about cloud security — attack vectors in cloud security. We'll give some examples and demos for specific vectors that can be a problem as part of Kubernetes configuration. Then we'll see some solutions for those problems, and we'll finish with some conclusions. Scott, do you want to take it over? Yeah, I'll kick it off. Would you mind going to the next slide? So, this is actually a pretty good one: configuration and vulnerabilities. We just want to highlight here that configuration is part of the security risk. Companies like Capital One and other big banks see this especially, because they end up with hundreds of thousands of developers, hundreds of thousands of nodes, and hundreds of thousands of container images to configure. At that scale, the risk from configuration vulnerabilities becomes much higher.
Yeah, and I wanted to give a little background for those who have maybe never heard of CIA, because in the cloud-native world there are a lot of people new to software who started in cloud-native and may not know some of these traditional constructs around confidentiality, integrity, and availability. This is a very old-school model of the world — I learned it, I don't know, 20-ish years ago — but it's still very apropos even in this world. It's just the concept that there are three main types of risks or attack vectors. One is around confidentiality — obviously, leaking data. If you have a MySQL database in a container and the container is highly locked down, that container will still have access to the data it should have access to. So if somebody hacks that MySQL, they could leak the user data. Even a container that is functioning correctly, with access only to the data it should have, can still leak that data to the world. That's one way people end up with a security breach. Another way is around integrity: somebody could tamper with a container image and sneak a misconfiguration in. They could even sneak a configuration into your GitHub repository and mess up the integrity of your cluster with a bad config file, or something along those lines. And finally, availability is like the noisy neighbor problem taken to its extreme. If somebody could, for example, do a man-in-the-middle attack and sneak a Bitcoin miner into one of your container images, and you end up running thousands of those, you create the noisy neighbor problem within your own cluster — you basically create your own denial-of-service attack on yourself by pulling in images you shouldn't have.
Another version of this, which is a little nastier, is that in the cloud world it'll just keep scaling up and cost you a million dollars. So it could also go the other way: your availability doesn't quite go down, you just end up paying a lot of money to keep the same availability, and that's a new thing that didn't really exist in the traditional world. I just wanted to highlight those three so that, for the rest of the talk, each of the misconfigurations we discuss can be placed into these three categories. And then I also wanted to highlight, as background, the new primitives that you need to worry about. Take a traditional IT environment: people think about things like firewalls and routers and servers. Those are the primitives of traditional IT infrastructure. But when you move into Kubernetes, every Kubernetes environment has these four primitives: container images, container hosts, container registries, and then the platform itself — Kubernetes, the kubelet, the configuration around that, etcd, and all the components that are part of Kubernetes. And we wanted to highlight that configuration comes into all of these. There are config files embedded in the container image. There are config files embedded in the host. There's configuration in the registry — again tied to availability, or even data breaches: if you embed something in a registry, it's misconfigured, and it's public to the world, somebody can steal it. There's essentially configuration embedded in all of these. And then, of course, mostly what we'll talk about is what's embedded in the Kubernetes configuration itself.
But we just want to highlight that, in addition to the traditional IT primitives, these are the new ones you need to think about with Kubernetes. And now, putting it in context, you can see it stacked up. Historically, I joke, the responsibility around the CIA model and the old IT primitives was mostly controlled by operations. But when you move into the Kubernetes world, Kubernetes itself will often be run by some kind of SRE team, which you can think of as a modern operations team. The host — the trusted host, the way it's configured, and the default configurations there — is again managed by the SRE or operations team. But it gets a little hairier with the container image. That's where there's really a shared responsibility over the configuration and how it works, and that's where you need to pay particular attention. So we'll highlight some of that here. I actually want to go a hair deeper on the next slide — I'm going to hand it off to Or, and he's going to go deeper into some of the things that the developer actually has to worry about in this container. Yeah, exactly. Let's take a step backward and try to understand what the ownership of developers looks like these days. We'll start with an example. This is my Python application, so I need to make sure that the source code — the code I write myself — is secure enough. Then we start using third-party dependencies, packages that will be part of the requirements.txt, so we need to make sure no security issues come in through them. The next step is to wrap this application in a container, right — to build an image for this application.
So we'll write a Dockerfile, and we'll use the python:3 base image as our base image, probably with other packages that we install as part of the Dockerfile. There are lots of OS dependencies we need to be aware of, with security issues of their own. Now we want to deploy everything to AWS, for example, and we want to use Terraform for those configurations, as infrastructure-as-code files. There are security issues we need to watch for there too — in this case, making sure that our ports are not open to everyone. And last but not least, our topic for today: the Kubernetes files. Kubernetes configuration is definitely a major part of this, and the example we started with is a great illustration of how configuration files, specifically with Kubernetes, can do a lot of harm. Lots of things to make sure we cover. Now let's dive into the security context of Kubernetes. For those of you who are not familiar with it, this is a security context. A security context basically lets you define privilege and access control settings for your pod or for your container. In this example, we have a pod configuration, and as part of that we also have the security context. Today we're going to cover two things: we're going to demo what can go wrong with a privileged pod, and we'll also demo root containers. So let's start with privileged pods. What exactly are privileged pods? Think about cases where you develop something and you need to access the host's resources — accessing the network stack, or accessing the GPU, for example, or simple cases like running Docker inside Docker. For all of those, you actually need to run in privileged mode. And the security risk is very, very simple.
It means that processes in privileged pods are exactly the same as processes running on the host — like root processes running on the host. It basically means that an attacker can do anything they want if they have access to the pod. So the solution is simple as well: don't use it if you don't need it. And now let's see a demo. In the next demo, we're going to have two different applications. The first one is supposed to be a secure payment application — a kind of isolated application; we'll see in a second why it's not really isolated. The only thing it does is write into a secret file named dvcards.json. In addition, we're going to have a vulnerable application. This application has two different issues. The first is an RCE — a remote code execution vulnerability that is just part of this application. That's not the interesting part for us. The interesting part is the second point: the fact that this application runs in privileged mode. We'll see how exactly we can use the RCE and the fact that this is a privileged pod in order to access content and data from the secure payment application. So this is our node. As we said, we have the privileged pod, and we also have the secure payment application. Both of them run on the same node. They use the same Docker engine, which also means they use the same local storage. Just imagine that our attacker managed to use the RCE vulnerability. Now they have access to the pod, and because this is a privileged pod, they can basically access content and data from the payment application as well. And now let's see the demo. Let's look at the application first. This is our simple application — a simple gallery. I'm going to upload a picture into this gallery, and that's it, simple as that. And let's take a look at our payment application. So this is the payment application.
As you can see, I'm going to enter my credit card and donate $1. And that's it — I just entered my credit card. Let's take a look at what's going on behind the scenes in Kubernetes. Let's look at the pods. We see that we have those two pods: one for the payment application, one for the regular application. This is the regular secure payment application — nothing special. And this is our vulnerable application, and we can see it's a privileged one. Now let's see what we can use in this application in order to attack it. We're going to do two simple things. First, we're going to upload a PHP shell — this basically stands in for the remote code execution vulnerability in the demo. We'll access this PHP file from the outside using a curl command, and we're going to run commands on the machine. So this is our shell script; as you can see, we can just run commands on the system. Now let's upload the shell script — and again, the shell script is just an example of an RCE; this is not the interesting part here. I'm going to take the name of the PHP file and use it. The first thing we're going to do is run a curl command that accesses this PHP file, and then we'll be able to run simple commands on the machine. The first thing I'm going to do is just make a new directory, so I'm going to use the mkdir command. The upper terminal will show us what exactly is going on inside the pod — it's just there so we understand what's happening. The lower one is for us as the attacker. Let's look at the pod: we see there is nothing under the /tmp directory. Now we run the mkdir command, and we can see that by using the PHP file, we managed to create a new directory. And now this is the interesting part: we're going to run a mount command, which basically lets us mount the host's file system into our pod. We can see there is nothing under /tmp/os yet.
And after we run the mount command, we see the actual file system of the OS. This is the problem here, and we'll see in a second how we can fix it. Now let's assume that I, the attacker, know that the name of the secret file is dvcards.json. So I'm going to look for all the files named dvcards.json, and then I just want to print the content of the file. And that's it. Again, this file is located inside the other pod, but because resources are shared between those two pods, we managed to get from one pod to the other — because it is a privileged pod. Now let's see what we can do to fix this issue. We're just going to change privileged to false. I'm going to rebuild my whole environment — I have a cleanup script — and now I'm going to rebuild it again and reopen the application. We're going to follow exactly the same flow: upload our PHP script and try to do exactly the same thing. Now we're going to access this PHP script from the outside. Let's look at our pods first and make sure they are live — yeah, they started 28 seconds ago. Now let's run kubectl exec into one of them and try to do exactly the same thing. We just validated that there is nothing under the /tmp directory, and we're going to run the mkdir command again. Now I'm going to check the content of the /tmp directory after running the mkdir command, and as you can see, everything is working. And now, this is the difference: we're going to run the mount command, and because we're not running in a privileged pod, the mount command fails. Let's run exactly the same command inside the pod so we understand what the issue is — and we see that we got permission denied. The reason we got permission denied is simply that we didn't use a privileged pod. Good.
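To make the fix concrete, here's a minimal sketch of the relevant part of a pod spec with privileged mode switched off — the names and image are illustrative, not the exact manifests from the demo:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vulnerable-app              # illustrative name, not the demo's actual manifest
spec:
  containers:
    - name: web
      image: example/gallery:latest # hypothetical image
      securityContext:
        privileged: false           # the fix: with `true`, root in the pod is effectively
                                    # root on the host and can mount the host filesystem
```

With `privileged: false` (which is also the default), the mount call inside the container fails with permission denied, exactly as in the demo.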
So that was privileged pods. The next demo is about root containers. When do we actually need root containers — when are they useful? Think about simple cases where you need to manage your image: installing system packages, editing simple configuration in the image, or even simple network operations like ping. For all of those, you actually need to run as root inside the container. The security risk is much lower than running in a privileged pod, but it's still too risky, because an attacker can use those privileges to do harm: they can access files, they can explore the network. Another thing to mention here is that in lots of cases, someone using a simple image that was just pulled from Docker Hub, for example, will be surprised to find out that they are running a root container as well. In this example, in the picture, you can see that I just used a PHP base image and did nothing in the Dockerfile that changed anything related to the user — and when I run this image, you can see that I'm still running as root. Lots of images come with root as the default user. Now let's see how we can solve this problem, and we have two different solutions today. The first one is Linux capabilities — probably most of you are already familiar with this long-standing Linux mechanism. It's an option to grant very specific permissions to your application. So instead of just saying, okay, I'm going to give my application everything, we can pick very specific permissions. The basic recommendation is to follow the principle of least privilege.
It means that we start by dropping all of our capabilities, and then gradually add back the capabilities that we actually need in order to run our application. Let's try it out. In this case, we're going to have a slightly different environment. This is our configuration: we see that we are not running as a privileged pod. And this is our container, so we'll run a kubectl exec to get inside the pod. Let's check the IP of our machine, and now we'll use a tool called nmap. nmap is basically a free network scanner, very common among attackers. As you can see, I just managed to scan my cluster, and I found the IP of the payment service. Just imagine that there is a vulnerability in this payment service — now, because I have access to this service, I can just reach it and run the exploit against it. In this example, we can also see that we ran an nmap command to see the open and listening ports on the machine as well. Now let's see how we can solve this issue. Imagine that I just drop all of those capabilities. Let's restart our environment — I'm going to clean up everything and rebuild again. And again, the assumption is that there is some kind of vulnerability, like an RCE, in this application, so I can get access into it, and the problem is that after I have access, I can cause damage. So let's try to eliminate those problems as much as possible. I'm going to run exactly the same flow and run nmap again. This time you will see that there is a failure, and the reason is that nmap actually requires networking capabilities, like NET_RAW for raw-socket scans. When we dropped all of the capabilities, we basically eliminated the ability for nmap to run. Amazing. So that was Linux capabilities.
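The least-privilege pattern from the demo boils down to a few lines in the container's security context. A hedged sketch — the capability added back is just an example of the pattern; the demo itself only dropped everything:

```yaml
securityContext:
  capabilities:
    drop:
      - ALL                # start from zero, per the least-privilege principle
    add:
      - NET_BIND_SERVICE   # example only: add back just what the app truly needs,
                           # e.g. binding to ports below 1024
```

Dropping ALL removes NET_RAW among others, which is why nmap's raw-socket scans fail after this change.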
Let's continue to the next one. The next option is runAsNonRoot. This Kubernetes option basically lets Kubernetes block containers that want to run as root. You can tell Kubernetes: please stop all the containers that want to run as root. And the recommendation is: if you know that none of your images, none of your containers, needs to run as root, please just turn it on. Now let's see that in the example. I'm going to remove the drop-all-capabilities setting and turn on runAsNonRoot. Then I clean up the environment and build everything, and you can see that we immediately get a CreateContainerConfigError. Let's try to understand what exactly is going on. I'm going to run the describe command on the pod and look for the actual error. Yeah, there it is: you can see that we failed because the container has runAsNonRoot set and the image wants to run as root. So we basically managed to block this container from running in our environment. Amazing. So, we talked about privileged pods and we talked about root containers. The next thing for us is resource limitation. This is a somewhat different topic, because it's not an immediate security risk, but we'll see in a second what can go wrong when we don't have proper limits. Let's first describe what kinds of resources we have: CPU and memory. The basic problem is that, by default, pods run with unbounded limits. A single pod can basically take all the resources — all the CPU and memory available on the node. And the simple case is that Kubernetes might kill the application, or even nearby applications on the same node. As you all probably know, defaults are never good.
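Since the default is unbounded, the fix is to state requests and limits explicitly in the pod spec. A sketch with placeholder values — you'd tune these per application, they're not taken from the demo:

```yaml
resources:
  requests:
    cpu: "250m"        # what the scheduler reserves on the node
    memory: "64Mi"
  limits:
    cpu: "500m"        # beyond this, the container is throttled, not killed
    memory: "100Mi"    # beyond this, the container is OOM-killed
```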
So the basic recommendation is to manually assign those limits for each and every application. You need to make sure you know what type of resources, and how much, your applications need, and then properly set those limits. Let's talk about CPU first. For CPU, there is a throttling mechanism: Kubernetes doesn't really terminate the application, it just causes slowness in performance. That's the worst-case scenario. But with memory, it's a different story. First of all, memory is not compressible, and pods will be terminated once they reach the memory limit. So the simple bad case is that with attacks like a DoS attack, an attacker can block legitimate users from using our app, right? They just take all the memory, and that's it — no other users will be able to use it. That's the simple case. The worse case is that someone runs a DoS attack on our application, and it blocks legitimate users from using a different application that is running on the same node. Now let's see an example of that. In the next demo, we're going to have two different applications, both of them, again, running on the same node. We'll have an innocent application that shouldn't be affected by any other application — but we'll see in a second that it will be affected. And we also have a vulnerable application. At the beginning, we'll run this application without any resource limitation. One important note: we assume there is a vulnerability in the vulnerable application that basically lets the attacker take more and more resources from the pod. What you will see is that an attacker uses this vulnerability to take more resources from the vulnerable app, and it also affects the innocent application as well.
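To give a feel for what such a resource-exhaustion vulnerability looks like, here is my own minimal Python reconstruction of the idea behind the vulnerable app's "allocate" API — it is not the demo's actual code:

```python
# Minimal reconstruction of a memory-exhaustion endpoint (not the demo's code):
# every call allocates a chunk and keeps a reference, so nothing is ever freed.

leaked = []  # module-level list, so allocations survive across calls

def allocate(megabytes: int) -> int:
    """Allocate `megabytes` MB, retain it, and return total MB held so far."""
    leaked.append(bytearray(megabytes * 1024 * 1024))
    return sum(len(chunk) for chunk in leaked) // (1024 * 1024)
```

An attacker hitting an endpoint like this repeatedly drives the pod toward the node's total memory; a `resources.limits.memory` bound turns that into an OOM kill of one pod instead of starving its neighbors.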
Good. So let's look at those two applications. This is the regular one, without any resource limitation, and this is the innocent application — just a regular application. For each of these applications, we can see the amount of total available memory. And this is our vulnerable application; as you can see, there is an API for taking more and more resources. Again, this stands in for the vulnerability. As you can see, when we allocate 100 megabytes, we actually affect the other, innocent app, and when we try 200 megabytes, we see an immediate drop in the amount of available memory in the innocent app. Now let's turn on the resource limitation. I'm going to limit the pod to 100 megabytes, clean up my whole environment, rebuild it again, and wait for the application to start. Amazing. Good. Let's try a simple case first: we just allocated 10 megabytes, and we see that it's working. But this time we'll try to allocate more than that, and we see that the 100-megabyte allocation immediately fails. So just these few lines we added prevented a potential attacker — one who might use a vulnerability in our app — from taking more and more resources. Yeah, so those were the three demos, and now let's go over some conclusions. What did we talk about today? We talked about the ownership of developers in the cloud environment. We talked about the security context: we saw an example of privileged pods, and we said it's really important not to use privileged pods — there are very few scenarios where you need them; if you don't need them, please don't use them. We also talked about root containers, we mentioned how they differ from privileged pods, and how we can eliminate those security risks. And last but not least, we talked about resource limitation. So let's talk about some conclusions.
Kubernetes security is definitely hard, but it's also doable. I think that as developers become more and more familiar with the risks in a Kubernetes environment, things will get much better. We basically need to understand that it's an inseparable part of our application. It's not like we can just write our own source code and forget about everything else — it's part of our application, and we need to make sure we're familiar with these risks as well. And of course, it's all about education, so we need to make sure that everyone is familiar with these risks. Scott, do you want to take it? Yeah. As I mentioned at the beginning, when you look at the different primitives, start with a secure base — the wiggly pieces of your environment are probably where the biggest risk is. Once you start sharing responsibility with the developers, make sure that you shift left not only the configuration that you manually choose, but also the configuration that you don't necessarily realize you're pulling in with the container image. I always say start with provenance: start with a trusted thing from a trusted place. Start with Linux base images where you trust the people who put the configuration files in those base pieces, especially around critical components like OpenSSL and glibc. Then also think about it in the context of configuration sprawl. It's not just the quality of the configuration but also the quantity, because with quantity you're going to have a bigger risk. So we recommend standardizing on a single base image. No matter what it is, it's still better than letting people pull in any base image they want. So shift left that standard base image, and then make sure that standard thing is high quality.
Those two things are both necessary to reach a sufficient level of risk — so you can guarantee people aren't pulling in configuration from all over the place, or low-quality content and other things. And, Or, I think I'll hand it back to you. Amazing. So yeah, continuing the shift-left message: try to understand and automatically catch those issues as soon as possible, ideally during the development process. It's much better than finding the issue in production, and you'll have more time to focus on your own code. I think this is the most important part: let's try to find those issues as soon as possible. Now, we covered only a very, very small part of Kubernetes security, and we skipped lots of known issues. So if you're interested in this content, if you think this is also an inseparable part of your application, please make sure you're aware of all of the security risks in this list as well: make sure you're familiar with Linux capabilities, with pod security policies and how to use them, and with the other options in the security context, like AppArmor or allowPrivilegeEscalation. I also want to demo a product that we launched lately at Snyk, called Snyk Infrastructure as Code. Snyk Infrastructure as Code helps you find and fix these kinds of issues in configuration files for Kubernetes and Terraform. You can use it with a direct integration to your Git repository, or as part of the CLI. There are also options to filter policies and to change severities for those policies.
If we take a look at this example, I just scanned one of my repos: I can see my file and the fact that this Kubernetes configuration actually uses a privileged pod, and I get an alert on that. Same thing in the CLI: I can scan a specific file and get the list of issues in it, and I can also filter by severity. Of course, you can also introduce this as part of your CI/CD as well. That's it — thanks for listening. I'm not sure if we have questions; let's see. Let's look at the questions for a second. It looks like we have about 15 questions. Let me know if you'd like me to read them off to you, or you're welcome to. Yeah, you want to read them? Sure. Okay. The first one: they're asking, if possible, please provide a comparison with OKD and whether there are differences related to security. I guess I can grab that one. OKD, for those of you who don't know, is the upstream project that sits — we call it a midstream in Red Hat parlance and open source parlance — between upstream Kubernetes and downstream OpenShift as a product. OKD is to OpenShift kind of like RDO is to OpenStack, or like Fedora is to RHEL: this midstream, but still upstream, project. Materially, just like with Fedora and RHEL, it's not that we're doing experiments in OKD to see what works and what doesn't; we're updating it quickly. I'd say that's the biggest difference. From a configuration perspective, it would be really hard for me to nail down what specific things are different. As of right now, I don't know of anything off the top of my head that's specifically different, because the vast majority of the configuration is probably the same. But you'd probably see some small changes here and there, and OpenShift as a whole is moving very quickly, as is Kubernetes, obviously.
So you'll see — call Kubernetes the fastest, OKD closer to OpenShift's speed, and then OpenShift has LTS releases which go even slower, and that gives you some stability to analyze it and run it within a life cycle. But that's about the best I can do for that question in a short amount of time. Okay, great. Next question: privileged mode is false by default, no? Yes, it's false by default, but the point is that you need to be aware that if you turn it on, there is a huge risk. That's the concept. And I'd add: people turn it on and then they share the files, and the next thing you know your effective default happens to be on, because you didn't realize it — you had 200 people sharing a config file that were all building off each other. Like any lazy sysadmin or developer, I copy things from other people that I trust, and without that provenance, the next thing you know, this setting is rampant in your environment. We've seen that happen with OpenShift customers, where we tell them not to do something, and then they turn it on, it gets shared, and the next thing you know it's everywhere. Exactly. So, next question: it looks like this exploit relies on code injection via an HTTP request. Yeah — the exploits are the less interesting part here. The RCE we demonstrated was only there to show that if you have a vulnerability in the application — it doesn't matter what type of vulnerability — and someone can get access through it, then if something is wrong with your configuration, lots of trouble can happen. That's the concept here, not the actual RCE. Yeah, I agree.
I'd add to that too: think about, you know, the old saying that more than 50% of exploits come from internal users. A malicious contractor, a malicious user, a disgruntled employee. These configuration files can have an effect on either an external exploit, like the one he showed, or an internal person who just decided they're going to break out of the container that they have access to. Exactly. Great. Next: please provide some more info about the differences between privileged: false and, I apologize, runAsNonRoot. Yeah, sorry about that. So I think, just be aware not to confuse privileged and root. These are two different things. Privileged means that from the container you can access the host's resources, while a root container means that the default user, the user that your container starts with, is root. But if you're running as root without running as privileged, so not a privileged pod, you cannot access the host's resources. Yeah. And I'll add one more: if you Google search for an article called "root inside and outside of a container," I explain this pretty well, what it means. There's root inside the container, root outside the container, and they're two separate things, and that's separate from privileged, because privileged with root, privileged without root, there are all kinds of permutations you can go through. But yeah, just know that, like Or said, they're two different things and they have profound impacts. Great. Next we have: can you share the code used in the demo? Sure, of course we can. Next: it can be challenging to get good values for memory limits, given there can be average use and then spikes. Any recommendations? It is a challenge. Scott, do you have any good answer here?
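To make the privileged-versus-root distinction concrete, here are the two separate knobs in a pod spec (an illustrative snippet; the image name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: context-demo
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      securityContext:
        privileged: false   # "privileged" controls access to host resources (devices, etc.)
        runAsUser: 1000     # a non-zero UID: the process is not root *inside* the container
        runAsNonRoot: true  # kubelet refuses to start the container if it would run as UID 0
```

These are independent: a container can run as root (`runAsUser: 0`) without being privileged, and as the speakers note, it still cannot reach host resources in that case; the two settings can be combined in any permutation.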
This is a tough one, I'll admit, a really tough one. I joke that basically all the software we use was written before containers. You know, JVMs and Python and Ruby and even Node.js were all written before containers, so there's not really a concept of limiting the memory easily, and you end up in the OOM-killer problem, basically, and in a container it's not easy to solve. I mean, you have to be able to scale out your application so that it doesn't need to use up that much RAM. It's an art, in a nutshell, because you've got to be able to scale out the containers so that you don't overrun the memory, you know, hit the limit on one and then end up with a bunch of them getting killed, and things like that. It's an art, I'll warn you, but it is a really important art. So my recommendation is just make sure you're familiar with the application. It's okay to play with it as long as you have proper monitoring, so even if you get a lot of out-of-memory kills because you hit the memory limit, with proper monitoring you can experiment, and when you hit the limit you can just raise it a little bit. So it's a combination of getting to know your application and making sure that you have proper monitoring. Great. Are the sample apps used for the demos something you can share? Same answer. We are building a repo right now at Snyk for those examples and more examples like that, and we want to open it up to the community. I'm not sure if there is a specific date for us to make it public, but probably soon, so I'll make sure we post about it when it becomes public. Any cool resources you recommend for secure and scalable systems architecture?
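For reference, the knob under discussion looks like this in a pod spec. The numbers are a hedged starting point, not a recommendation for any particular app; the advice above is to size them from monitoring data and adjust gradually:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves; size this to typical observed usage
        limits:
          memory: "512Mi"   # hard ceiling; exceeding it gets the container OOM-killed
```

Leaving headroom between the request and the limit absorbs spikes, and raising the limit in small steps when monitoring shows OOM kills is the "play with it" approach the speakers describe.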
That was a tough one for me, Or. I don't have any architectural guides; I work more in concepts. I try to arm people with the concepts that will let them come up with the architecture that makes sense for them, because it's so hard to provide exactly what they need; there's just so much variety. But I can share a couple of articles in the chat, things that I think give good guidelines that will help you come up with your own architecture, or maybe we can share them afterwards. Nice and good. And I guess it's also hard to learn because it's very different between environments, right? One core architecture that is really suitable for one company may not be so appropriate for another. Just imagine the case of the privileged pod: maybe you actually need to run a privileged pod because you need to access the GPU. That is possible. So it doesn't mean that you should never use a privileged pod, but you need to think about the proper architecture to prevent security issues like that.
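One hedged way to express that kind of fencing inside a single cluster is to confine privileged GPU workloads to dedicated, tainted nodes, so ordinary workloads never land beside them. The node labels, taint key, and image below are illustrative assumptions, and the GPU resource requires a device plugin (e.g. NVIDIA's) to be installed:

```yaml
# First taint the dedicated nodes so normal workloads stay off them, e.g.:
#   kubectl taint nodes gpu-node-1 workload=privileged-gpu:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "privileged-gpu"
      effect: "NoSchedule"     # only pods carrying this toleration can schedule there
  nodeSelector:
    hardware: gpu              # label assumed to be applied to the dedicated nodes
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      securityContext:
        privileged: true       # allowed only inside this fenced-off node pool
      resources:
        limits:
          nvidia.com/gpu: 1    # assumes the NVIDIA device plugin is deployed
```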
I agree. And if you Google search for "container defense in depth," I have a talk where I cover this, everything from process-level isolation all the way to data-center isolation; you have to have things in different resource zones in Amazon. And like Or said, if you have to provide privileged, say you have GPUs, then you just need a different level of isolation. Maybe you need a separate Kubernetes cluster where you allow that privileged stuff to happen for those GPUs, and you have a different one for your applications that are external-facing. And this is really no different from what we've always had, right? If you look at SAP and Oracle in a traditional environment, they ran on an internal network, and then your web and DNS ran in a DMZ facing the internet, right? You separated those from a network, storage, and data standpoint, essentially rack-level isolation. It's not really any different with cloud; you just have different use cases. So it's not necessarily bad that you need privileged here and there; it's just that if you're going to need it, know that that whole cluster is now basically dedicated to that use case. Yeah, that's why I say I try to talk in concepts, so people can come up with their own architectures that make sense. Yeah. Next: what about distroless images and root access for containers? I can go back if you want. Yeah, so the first thing I'll say is I don't think those things really have anything to do with each other; it's equal risk. Just think of distroless as not having rpm or apt in the container, so it is not connected to a dependency tree where it pulls in more packages. For example, we're building distroless images for RHEL 8, for UBI, and all that means is we're making them smaller and not having rpm and yum in them, basically. And so you're still
relying on a distro somewhere behind the scenes. You might not understand it or see it locally in the container image, but it still exists out there, because somebody is tracking CVEs and rebuilding that software and tracking all that stuff. There's no such thing as distroless; there is only somebody else's distro that you borrow and then use, and that could be somebody else compiling it. But that is really very independent of whether there's root access in the container, or whether you're running --privileged, for example. Those are very distinct problems. You could have a privileged distroless container that hacks you just as easily. Distroless is not going to stop somebody from curling something into the container, into memory, and then executing it. Even if you make the container read-only and distroless, that's not going to stop or mitigate root access or privileged, because they'll just copy it into RAM and run it; you always have access to RAM, even if you don't have access to disk with read-only. Yeah, that's my best shot at that. Great. Next: isn't Pod Security Policy being deprecated? So we haven't actually touched on Pod Security Policies; we talked about the pod security context, just as a concept to understand what privileged pods are. For those of you who are not familiar, Pod Security Policy is a cluster-level resource that controls security-sensitive aspects of pod specs. And I'm not aware that it's being deprecated. Scott, anything on your side? I'm not aware of it either; on that one I'm a little less up to date. Yeah. But again, the concept of the privileged pod is not deprecated, right? It's still there. Yeah, same with all the risks that we demonstrated. Yeah. We've even been talking, there was a little chat in the Kubernetes community maybe three or four weeks ago, right around the turn of the year, I don't remember exactly when, but I
remember there were some Google folks and some Red Hat folks, I was involved in it, there were a bunch of people, and we were talking about how we get people to run as non-root in the container. It's really, really hard, because Docker had the concept of letting people run anything as root, and, you know, internally at Google I don't think things run as root in Borg, but in Kubernetes it's really common for people to run as root. There are movements afoot, if you will, to try to educate the world on how not to do this, but we're kind of down the rabbit hole, because of the way the image format works we all just kind of adopted root. So that's the best I can give you on that: there's definitely a movement afoot, all the smart intelligentsia know we shouldn't be running things as root, but yet we still are. Right, great. We still have about 10 questions left and about 10 minutes left. This next question has about three questions in it, so bear with me; it's a long one. Does shifting security responsibility to the left potentially have greater implications for a supply chain attack? How should final approval of the containerized service product be handled to avoid such attacks? Do stakeholders with higher authority need to be aware of and understand potential risk when reviewing final configurations? I feel like somebody set this question up for me. I don't know. Yeah, go ahead, enjoy. All right, yeah, this is what I've talked about for many years, in all kinds of talks and panels. In a nutshell: yes. When you shift left, this is just a basic engineering thing: more moving parts always equals higher risk, right? If you can push 100 pounds with a single gear, versus needing to push 100 pounds with 22 gears, it's more likely to fail with 22 gears. And so as you add developers, you
absolutely increase the risk to not only the container images but all the configuration in them. Yes, I think a supply chain attack in the image is probably the biggest scare for me, and then there's also the supply chain attack in the configuration files. Like Or and I mentioned, if somebody turns on root in a Kubernetes YAML file, that thing gets committed to GitHub and everybody starts forking it and using it, and the next thing you know you have root, you know, privileged equals true, everywhere. That's a problem. I would say it starts with provenance; start with trust, right? Maybe always go back to a golden template for the Kubernetes YAML file that is approved by security, and say everybody should develop off of this, and any time you change one of these default configurations, we should know why you did that. When you commit it in Git, somebody needs to explain why they did that. That's a good way to do it; it's trust but verify, right? And then you could scan it later with some linter or something that maybe has all the rules in it, to see what's changed and whatnot. But I think people rely too much on the scanning thing. They're like, ah, we'll just scan it, it'll be fine. You can never scan your way out of bad security. The known things that you have in your tests are always fewer than the reality of the universe, and there'll always be things you didn't catch; they'll always be out of sync. It's like CI/CD testing: it gives you a bit of a warm and fuzzy feeling, but we all know that just because something passes CI/CD doesn't mean it's secure, doesn't mean it performs right, doesn't mean it's actually even doing the right thing it's supposed to do. It just means that it passes all the things we tested for, so it gives you a level of confidence, but that's not everything. You've got to start with a secure base image, a
secure configuration, and then work forward from there, and then scan, maybe, to verify that you didn't drift too far from those things. That's my rant on that, sorry. Yeah, so my answer is much shorter. I'll answer only the last part: yeah, definitely, I think that stakeholders must understand what's going on. You can't just introduce new changes to your environment without understanding what's going on. Take the cases that Scott just mentioned, like just downloading a configuration file: who knows what someone put in there. Just think about a simple installation from a template. Do you really understand what's going on? Do you really read each and every line of it? So the answer is, you need to be aware of it, for sure. And yeah, try to use more tools to help you as part of this process, but the answer is definitely yes. Next: is it enough if we only set privileged to false for pod security? So again, for the privileged pod it depends on the scenario. If you don't need it, then it's enough. But in cases where you actually need the privileged pod, I guess you will need to think about other safeguards to make sure that nothing else is accessible in that cluster. But yeah, there is a risk in that. Yeah, I might add: removing privileged is a good first step. The next thing I would do is not run as root; run it as a regular user. If you're going to run a web server, make sure it runs as the httpd user, or whatever. Make sure those things drop privileges and don't need root. Also look at the capabilities. The one example Or gave, which is really good: drop all capabilities and then turn on just a few until your app works. Start with nothing and then work your way backwards. I think capabilities is a good place, I think seccomp is a good place, and I think SELinux. The way OpenShift does it is we automatically
dynamically generate an SELinux label, and then all the different containers can't talk to each other or see each other, just by default. And then not privileged, you definitely want that, and then not running as root. Those are probably the ones off the top of my head. Yeah. By the way, I think it's a really nice exercise, just to make sure that you understand what's going on inside your application: next time, try to drop all of the capabilities and understand gradually what's going on. Okay, we need network access, we need some access to the disk, or something like that. It's really useful just to understand what's going on, so it's a huge recommendation. All right, we have just about four more questions and just a few minutes left, so let's see. Does Snyk Container scanning gather data from the images it scans? So we haven't talked about Snyk Container scanning; we just talked about the Snyk infrastructure-as-code scan. For the Snyk Container scan, we just work out what dependencies are part of your Docker image, and we check what security issues are part of that list, and that's it. We don't gather any information like the contents of files, stuff like that. All right. What Linux WM/DE was used in the demos? i3. How do you validate Kubernetes YAML? That's a tough one to answer, because you have to have some expertise and know what you're looking at. Or gave some good examples, but I'm not aware of any tools that simplify it or kind of do it for you, off the top of my head; I'm sure stuff exists, though. All right, and the last question: setting a memory limit led me to design the application to scale out with multiple replicas instead of using more memory per pod; is that a proper approach? Yeah, that's what I was hinting at when I said you've got to do a balancing act with scaling out. So if you know what the load on a
particular service generates, and I'll just go one layer deeper, Or, if you don't mind: if you know what load generates 100 megabytes of memory usage, you can then scale out horizontally and only load up each web server, each database server, whatever, so that each uses up a certain amount of memory. That's part of the art of all of this. I do this in one of the labs I run, where I show you can scale something out and it actually doesn't perform any better; it sometimes performs worse when you scale out. There's a point where, if you go too far, it doesn't perform well, and if you scale it back in, it actually performs a little bit better. It's an art, it's something you learn by load testing; you've always got to load test things and then see how they work. Yeah, but by the way, we didn't cover that at all as part of this lecture; we just talked about the security risks that might be part of it. So it's definitely an expertise that someone needs to learn, for sure. I think this one is an answer to the previous question, right? Yes, yeah, they're just sharing kubeyaml.com. Perfect. All right, well, I think that's it. I just want to thank Or and Scott so much for their time today, and thank you to all the participants who joined us. As a reminder, this recording will be on the Linux Foundation YouTube page later today. We hope you are able to join us for future webinars. Have a wonderful day, everyone. Thank you.