That was an awesome last talk. I was really happy to see OpenShift 4 again today. We've been working really hard these past, I don't know how many months. Ever since CoreOS came on board, the synergy between CoreOS and OpenShift has been amazing, and we're all so proud to finally have OpenShift 4 out soon. Very soon, not officially, oh, hold on. So I'm Sally, and we're here to talk about container security. I'm on OpenShift, on the auth team now, but when I started at Red Hat a few years ago, I was on the containers team. We were submitting pull requests to a little-known project upstream. Maybe you've heard of it: Docker. But a lot has changed in the past few years, and that's what we're here to talk about today.

Hello, everyone. Oh, yeah, it's on. OK. Hello, everyone. My name is Urvashi Mohnani, and I'm a software engineer at Red Hat on the OpenShift Runtimes team. I work on the lower-level container tools that Kubernetes and OpenShift use. Today we're going to talk about container security, how you can make your workloads more secure, and the huge role these tools play in that.

Yeah, so like I mentioned, a few years ago we were submitting pull requests to Docker. Other companies were getting involved in containers, too. CoreOS was developing rkt. There were runtimes being used other than runc, like Kata Containers and gVisor. We all realized very quickly that we were going to need a set of open industry standards to propel things forward. Otherwise we were going to end up with rkt images versus Red Hat images versus Docker images, and nobody wanted that. So Google, Docker, Red Hat, Microsoft, all the big players got together and came up with the Open Container Initiative (OCI). It put standards in place for what a container image format is and what a container runtime is. So now any OCI image can run with any OCI runtime, and we were free to develop new tools that suit our specific needs.

Before we dive in, though, real quickly: what are containers? They're just Linux processes. I'm sure you've heard this. Like any process in Linux, they're secured by things like SELinux, AppArmor, seccomp for syscall filtering, and Linux capabilities. They're also constrained in the amount of resources they can take up on your system, like CPU and memory, and that's done through Linux cgroups. And finally, they're isolated through the use of Linux namespaces. If you're in a container, you're in a PID namespace, and if you run ps from inside that container, you'll only see the processes running inside your container. You won't have any access or view to the processes running on the host outside of that PID namespace. So namespaces give you an isolated view of a resource.

So what's a container image? Again, nothing special. It's a tarred-up set of layers, with a JSON file describing them. There's usually a base layer, which is an operating-system userspace layer, and then additional layers are packages, binaries, dependencies, anything you need on top of that operating system. That's it. Tar it all up, and that's a container image. If it's an OCI image, it follows the specs defined by the OCI.

What do container engines do? They're programs that know how to take that container image, extract it, and explode it onto your local disk, hopefully using a copy-on-write filesystem. Otherwise, if you had 10 Fedora images, you'd need 10 copies of that base layer; with copy-on-write, the layers are shared.
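For reference, the "tarred-up layers plus a JSON file" claim is easy to verify yourself. A minimal sketch, using Skopeo (which comes up later in this talk) and alpine purely as an example image:

    # Copy a remote image into an OCI layout on local disk
    skopeo copy docker://docker.io/library/alpine:latest oci:alpine-oci:latest

    # The layout is nothing but JSON descriptors plus layer tarballs
    cat alpine-oci/index.json          # points at the image manifest
    ls alpine-oci/blobs/sha256/        # config JSON and the layer blobs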
A container engine also creates a runtime config, another JSON file. It creates that from user input, like any flags you pass to your container run command, such as --privileged or -v for volumes. It combines the user input with the system's container runtime configs and, from there, knows how to launch the runtime. So that's a container engine. And that's it for my overview of containers.

All right, so now we know what containers are. Working with containers can actually be broken down into four different sets of actions. First, building your container images. Second, running and testing those images locally. Then you want to be able to share the images, probably moving them from local storage to a remote registry. And finally, you want to run the containers in a production cluster, such as Kubernetes or OpenShift.

Now, what would happen if we had all of this functionality in one monolithic tool? Think about it in terms of security: we would obviously end up with the least common denominator. For example, you don't need as many privileges to run a container in production as you need to build container images, so why bunch them all up in one tool? So we decided to follow the Unix philosophy, which says you should design programs to do one thing, do it well, and work well with other programs. As you can see here, the Unix founders are pretty happy that we're trying to follow their ideology. And we decided to make four tools, one for each set of actions I mentioned: Buildah, obviously, for building your container images; Podman, an all-in-one CLI tool you can use to run and test your containers locally; Skopeo for sharing containers and moving them from one environment to another; and CRI-O for running your containers in a production cluster. And now we'll go in depth into all of these.

Oh, yeah, I think we should be brave and do a live demo. Why not? We like living life on the edge. So we have a script to show off some security features of these four projects that Red Hat has been working on for the past few years.

First one is Buildah. It's the tool for building container images. When you're building a secure image, one thing you should think about first is making your image as minimal as possible: only put in your container exactly what you need to run. Buildah makes this really easy to do. If you run the command buildah from scratch, it sets up the scaffolding of a container and puts nothing in it. It's an empty container with just the namespaces and cgroups set up. Oh, sorry, forgot to do this before. You can see we're running these commands with sudo, but the really cool thing about Buildah is that you don't need sudo to run it. When you're building from a Dockerfile, especially, you don't need to run it as sudo. But when you're setting up a mount point like this, you do, because mount requires real root. So once I have set up my mount point to my working container, I can use my host system's package manager, DNF, to install exactly the package I need, and nothing else, into the mount point. And here I pre-downloaded the RPM for busybox because I didn't want to chance the demo that much.
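Put together, the whole minimal-image flow is just a handful of commands. A sketch, with busybox as the example package (exact package names and DNF flags can vary by distro; the release version is needed because the install root starts out empty):

    ctr=$(sudo buildah from scratch)      # empty working container, nothing inside
    mnt=$(sudo buildah mount "$ctr")      # mounting requires real root

    # Install only what you need into the mount point
    sudo dnf install -y --installroot "$mnt" \
        --releasever "$(rpm -E %fedora)" busybox

    sudo buildah umount "$ctr"
    sudo buildah commit "$ctr" minimal-busybox    # commit the container to an image
    sudo podman run --rm minimal-busybox busybox  # run it with any OCI engine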
And DNF installed it into the mount point, using the little-known DNF flag --installroot. Once we're done with that, we can simply unmount the container and commit it to an image. Now I have a container image that I can run with any run command, so we'll use podman run. And what's interesting is what's not in the image. There's no ping; you can see that errored out. Usually, with any FROM line in a Dockerfile, you'll end up with ping in your container. Same with Python: there's no Python. The only thing in here is busybox, and there's the busybox help menu. The more you put in an image, the more that can go wrong. So shrink your attack surface.

OK. The next thing that comes to mind when thinking about building images securely is: how do I do the build process itself securely? For this, you can run Buildah inside a container. By doing so, you're adding an extra layer of isolation between your build process and your host system. Inside your container, you can give your build process all the elevated privileges you want, and if the process happens to break out of the container and tries to attack the host, it won't be able to, because it won't have the same elevated privileges on the host. So this is a Dockerfile I used to create an image that already has Buildah installed inside it. This is another Dockerfile that I want to build inside my container; a pretty simple Dockerfile, not doing much. And using Podman, I'm going to run that image and build that Dockerfile inside it. The command looks a bit long because I'm volume-mounting in paths so I can access the image that gets built, and to get my Dockerfile from the host into the container. This should take about a few seconds. Yeah. And in OpenShift 4, we actually use Buildah now to build all of our images in OpenShift, so our image builds are truly containerized. There's no daemon with Buildah, so there's no leaking of information from inside the container to a daemon running on the host. And it should finish up. Come on, internet.

Ooh, funny story. Buildah was named Buildah because Urvashi's team lead, Dan Walsh, has a really prominent Boston accent. So when everyone was asking what we should call this new tool, he said, "you might as well call it builder, I don't care," and he pronounced it "buildah." So we named it Buildah.

OK, so now I'm going to run buildah images in there to see the image. And as you can see, the image at the bottom there, called myimage, is the one I was able to build. From here, I can move it to a registry, run it, do whatever I want with it. And that's what we have for Buildah.

Yeah, so what's next? We've built our container image. What do we do now? So once I build my container images, I like to test them locally, just to ensure I have everything I want in there. And for that, we have the tool called Podman. With Podman, you can do everything; it's an all-in-one CLI tool, from building container images to running containers and even pods. Podman stands for "pod manager." And to tie back to the Unix philosophy, Podman actually uses Buildah under the hood for its build process. Now, one great feature of Podman, and the same story with all our tools here, is that you can run it without root privileges. This way, admins don't have to give root access to their developers, and there's less chance of someone breaking the system.
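A minimal sketch of what rootless Podman looks like in practice (alpine and the container name are just examples; podman top's user/huser descriptors show the in-container user versus the host user):

    # Rootless: images and containers live under your own user
    podman images                      # your per-user store
    sudo podman images                 # root's store: a different list entirely

    # Root inside the container, but an ordinary UID on the host
    podman run -d --name demo alpine sleep 1000
    podman top demo user huser         # e.g. USER=root, HUSER=1000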
So, as you can see, the podman command now doesn't have sudo in front of it. I'm just going to list the images I've pulled as a regular user, to show you that it's actually rootless, and then run the same thing with sudo, and you can see it's a different list. When you're running Podman without root, the storage is created under your user, so it's tied to your specific user. This is great because you can have multiple users on the same machine, and they won't even know about each other's containers or images. So there's isolation there. And I'm just going to run this to show you what the UID is inside the container versus on the host. Even though I'm invoking Podman without root, inside the container I am root, while on the host I am 1000, which is the user I logged in as. So there you have it, rootless Podman.

One thing we've noticed when we mention that you can run Podman without root is that people think, oh, this is great, I can run it without root, and inside the container I can do everything I could do with root on the system. But if your user can't access certain files or processes on the host, you won't be able to access them in the container either, even if you mount them in. The rules still apply.

To explain this a bit further: every modern system, RHEL 7 and above, has a file called /etc/subuid. It shows the mapping of UIDs that your user has been assigned. What this says is that my user, somalley, can access any files or processes owned by UIDs in the range 100000 to 165536. Now let's look at what we have in this directory right now: a bunch of files owned by my user, and one file owned by root.

Now let's go into a user namespace and do the same thing, with buildah unshare. Let me just point out that it drops you into a user namespace, so you can play around and see what's going on with Buildah when you're in one. Don't forget, though, when you're running buildah unshare, that you're inside it. I do that all the time, things start to look wonky, and I'm like, why can't I run sudo here? And then I remember, oh, I'm in a user namespace. To get out, you type exit. So now, as you can see, all the files that were owned by somalley are owned by root in this user namespace, and the one that was owned by root is owned by nobody. Well, that's because on the host my user doesn't have access to root's files, and as you saw from the mapping, I can only access UIDs from 100000 to 165000-something. Because I couldn't access it on the host as my user, I can't access it in the user namespace either. So you're not truly root, even though you are root, if that makes sense. There are certain root things you can do in a user namespace, like setuid and setgid, but there are a whole lot of things, common sense will tell you, that you can't do just because you're in a user namespace. You can't change the system time, for example.

And there's a way to see what your UID mapping looks like: in /proc/self there's a file called uid_map, and it shows your mapping. As you can see here, user 1000 is mapped to 0 in this user namespace. That's why all the files that were owned by somalley on the host are owned by root in the user namespace. And then everything from 100000 onward is mapped in starting at UID 1. So that's how the mapping works, and how you end up getting the permissions you see in the container.
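Concretely, this is the kind of thing you'd see. A sketch, with the username and ranges as examples; the uid_map output shown is typical for a rootless user namespace but depends on your /etc/subuid entries:

    grep somalley /etc/subuid            # e.g. somalley:100000:65536

    buildah unshare cat /proc/self/uid_map
    #        0       1000          1     (container root = your host UID)
    #        1     100000      65536     (the rest comes from /etc/subuid)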
So yeah, that's it for how rootless Podman actually works.

All right, so this user namespace stuff is pretty cool. It can add a layer of isolation between your host and your containers. With Podman, you can easily do this by setting the --uidmap flag and defining the range of UIDs you want to map. So I'm going to run this pretty quickly, and as you can see, just as I showed you before, in the container it's root, but on the host it's 100000, because that's the mapping I did: I mapped 100000 on the host to 0 in the container. The podman top command is a cool command that lets you look at various attributes of your container, like its security settings, user, PID, host PID, and so on. And the --latest flag is a very user-friendly feature, because it picks up the most recent container you created or ran, so you don't have to go back and look up your container ID. So now, when we look at ps on the host, you can see that the container is running as 100000 on the host, even though inside the container it's root.

Now, with Podman you can actually define different UID ranges for different containers. This way, you're adding another layer of isolation between your containers, not only between your containers and your host. Same thing as before: root in the container, 200000 on the host. And now think about it. Say a process from container A breaks out and tries to attack a process in container B. It won't be able to, because container A is running as 100000 while B is running as 200000. Different permissions, and a more secure container workload.

And what's next? Oh, Podman. As we mentioned, there's no daemon with Podman. It runs with a true fork-exec model. That means the login UID is inherited from the parent process by the child process. I can show you this in a few different ways. On every system there's a file, /proc/self/loginuid, that tells you who is currently logged into the system. On this machine, it's me, UID 1000. Whether or not I gain root access, that login UID stays the same, and as I mentioned, it's inherited by every child process. So if I run a Podman container and, inside it, cat /proc/self/loginuid, you can see it's 1000, as you'd expect. Docker is a client-server model, so if I run the same thing with Docker, you can see that the login UID inside the container is the maximum unsigned 32-bit integer (4294967295). That means a user who has never logged into the system; the system doesn't know who it is.

Now let me show you another way. I set up an audit rule to watch some sensitive data, the /etc/shadow file. If I run Podman, as a user who's been given the privilege to run privileged containers with sudo, mount in the root filesystem, and make a change to /etc/shadow just to be mischievous, you can see clearly in the audit logs that somalley has been messing around with /etc/shadow, and a sysadmin will come to my cubicle and ask me some questions. But if you do the same with Docker, as the same user, given access to the Docker daemon, the audit log shows that the "unset" user has been changing /etc/shadow.
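The audit rule itself is a one-liner. A sketch, assuming auditd is running; the key name shadow-watch is arbitrary:

    # Watch /etc/shadow for writes and attribute changes
    sudo auditctl -w /etc/shadow -p wa -k shadow-watch

    # ...modify the file from inside a container, then search the logs:
    sudo ausearch -k shadow-watch -i
    # with Podman the record shows the real login UID (e.g. auid=somalley);
    # with Docker it shows auid=unset (4294967295)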
So that's another way to show that the fork-exec model has benefits when you're auditing who's running what on a system.

And as Urvashi mentioned, the podman top command gives you some useful information about what's currently configured in your running containers. So if we just start a container in the background (and again, that --latest flag is super convenient; you don't have to remember the container ID or cut and paste it), you can see the PID inside the container versus the host PID. You can see which SELinux labels are currently set. Make note of that SELinux label, container_t, because we're going to talk about it at the end of our demo. It can show that seccomp syscall filtering is currently turned on. And there's a nice, pretty list of all the Linux capabilities currently enabled in that running container. It's a nice feature. Just remember how long this list of capabilities is; we'll get back to it when we get to CRI-O. Yeah, we'll talk about Linux capabilities a little later, too.

So, OK, cool. We've talked about Buildah. We've talked about Podman. I guess Skopeo is next. Once we have our images the way we want them, we want to manage them: push them to public and private registries, inspect images on remote registries. The tool to do that is Skopeo. Again, Skopeo does not require root. Here's the help menu for Skopeo. It started out as a command to inspect the JSON description of an image on a remote registry without having to pull it down to your system. It's crazy, but before Skopeo, you actually had to pull the image down to your system to see that information: the tags, who owns it, when it was built. So, in the spirit of "don't run random crap on your system," you can know exactly what you're going to download before you download it. And just so you know, "skopeo" comes from the Greek for remote viewing.

All right, so another feature of Skopeo is that you can move your images from one environment to another. And a pretty cool thing about this: say you have an image in your private registry and you want to move it to your public registry, but you don't actually have it locally on your machine. With Skopeo, you don't have to download the image locally. You can literally say "copy from registry A to registry B," and it copies it over. You can do something similar locally, too. Say you're interested in trying out Podman or Buildah, and you've been using Docker up to now, so you have an image in Docker's local storage that you built or downloaded with Docker. With Skopeo, you can easily move it from the storage Docker uses to the storage our tools use, with a simple skopeo copy command. Here I'm going to move the Ubuntu image into my Podman storage and call it ubuntu with the demo tag. And now, when we list the images, as you can see right here, it was copied over. Pretty simple.
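The commands behind those demos look roughly like this. A sketch; the registry names are placeholders, and the docker-daemon transport assumes a running Docker daemon:

    # Inspect a remote image without pulling it
    skopeo inspect docker://docker.io/library/ubuntu:latest

    # Copy registry-to-registry; nothing ever lands on your machine
    skopeo copy docker://registry-a.example.com/app:1.0 \
                docker://registry-b.example.com/app:1.0

    # Copy from Docker's local storage into Podman/Buildah storage
    sudo skopeo copy docker-daemon:ubuntu:latest containers-storage:ubuntu:demo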
All right, so now we have built a container image, we've tested it locally, we're happy with it, and we've pushed it up to our public registry. The next thing to do is actually run it in a production cluster, and that's where CRI-O comes in. CRI-O is a lightweight container engine that's used to run your container deployments in a Kubernetes or OpenShift cluster. CRI-O is OCI-compatible: it supports all OCI-compatible images and all OCI runtimes, such as runc, Kata Containers, and gVisor. And CRI-O actually does have a daemon, a lightweight one, just because it needs to be able to talk to the CRI API that Kubernetes provides.

So when we run containers in production, we firmly believe you should run them in read-only mode. What this means is that the processes running inside your container cannot write to any path in the container. They can only write to volumes you've mounted in, or to the three tmpfs paths we've made writable. This way, if your container does get hacked into, the first thing a bad actor would probably want to do is place a back door in your container, so the next time it restarts, they'll have access to it. But if you're in read-only mode, they won't be able to do that. You can easily set read-only mode as a system-wide default: in the crio.conf file there's an option called read_only, and I've set it to true here. I'm going to restart the CRI-O daemon. And since CRI-O was created for Kubernetes and OpenShift, running pods and containers locally without a cluster is a bit more complicated than with Podman or Buildah; you need a whole JSON config and everything, and for that we have a separate tool called crictl. But that's just a way of running it locally. It's not really meant to run locally; it's meant to run in production under Kubernetes. So I'm going to create my pod, create my container here, and start the container. Now, say I want to install Buildah in this container, just because I want to use it to build images. Guess what? That fails, because I'm running in read-only mode. When you try to install a package, it expects to write to paths like /var and /var/log, and because read-only mode restricts that, it fails. So, a more secure workload. An added advantage: say your container is storing some data inside itself. When your container disappears, that data is gone forever, too. Read-only mode stops you from doing that, since you'll instead be writing to a bind-mounted volume, which stays on the host even after your container is gone.

With CRI-O, it's also very easy to modify which Linux capabilities are enabled system-wide throughout your cluster, in every container and every pod. Linux capabilities are just parcels of root power: root privileges divided up by function. The idea is that if you disable all of the Linux capabilities and then run as root, you get no increase in privilege. So here, we'll just remove DAC_OVERRIDE. And notice that in CRI-O, the list of default capabilities is much shorter than what you'd see in, say, Podman's or Docker's defaults. That's because we believe you should run with as few capabilities as you absolutely need in production. So if we remove DAC_OVERRIDE, restart CRI-O, and again start a pod, you can see (it's not as pretty as podman top, but that's how you list the capabilities) that DAC_OVERRIDE is missing. The cool thing is that this gets carried through to every container in the pod as well: none of them have DAC_OVERRIDE now. So again, run with as few capabilities as you need. Be conscious of what your production containers are doing, so you know exactly what shouldn't be in there.
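For reference, the relevant knobs live in crio.conf. A sketch of the two settings from this demo; the capability list here is an illustrative subset, not CRI-O's exact defaults:

    # /etc/crio/crio.conf
    [crio.runtime]
    read_only = true                  # containers get read-only root filesystems
    default_capabilities = [          # note: no DAC_OVERRIDE in this list
        "CHOWN",
        "FOWNER",
        "SETGID",
        "SETUID",
        "KILL",
    ]
    # then: sudo systemctl restart crio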
And that is the end of the demos. Awesome. I think we gave a lot of information. We should probably recap. Yeah.

So, to recap. We talked about Buildah for building securely: shrink the attack surface with minimal images; run your builds isolated in a container, like we're doing in OpenShift 4; and run without root whenever possible. For Podman, we spoke about running without root again. We also spoke about how you can get isolation with user namespaces, not only between your containers and the host, but also between containers. We spoke about how the fork-exec model keeps track of who's doing what on your system, so you can easily find out who's trying to be shady. And, as Dan Walsh would say, hashtag no big fat daemons: these tools don't have daemons. Skopeo's remote inspect lets you look at images on remote registries, and you can move images between registries without ever having them on your system; and it doesn't require root. Yeah, and don't download random crap off the internet. For CRI-O, we said to run your containers in read-only mode in production as much as possible. We also showed how CRI-O has fewer capabilities enabled by default, and that you can reduce the list even further, depending on what your containers need. CRI-O has the same user namespace support we saw in Podman; it's just a work in progress in Kubernetes right now, so we're waiting for Kubernetes to catch up, and then we'll take full advantage of it. And, my favorite, FIPS mode support. If you have to run your system in FIPS mode, maybe you work for the government or something, I don't know, CRI-O is your only option for a container engine. CRI-O is the only engine that can recognize that you're running in FIPS mode, and if you're running a FIPS-compliant image, it enforces FIPS mode in your containers. So if you try to use a weak crypto algorithm, it errors out just as it would on your host. So please use all of these. We showed you some cool stuff. We've been working on it for a few years, and we're really proud of our work. Proud of our favorite person, Dan Walsh. Yeah, so use all these security features and make him happy.

And we have one more thing to tell you. A few months ago, a CVE was announced (this was CVE-2019-5736), out of nowhere, where something like 90% of all containers running in production were affected by an exploit that let you take over the runc binary, rewrite it, and get full access to the host. And we were like, oh my gosh, what are we going to do? People were walking around Red Hat with their heads hung low: might as well just pack up and quit, because we're done. It wasn't really that bad, though. If you were following some of the security practices we spoke about today, you were much less likely to be affected. First of all, don't run random images off the internet. Second, if you were already running without root, it would have been much harder for that exploit to actually overwrite your runc binary. But the main thing that stopped it was having SELinux enabled. The way SELinux works is that it has labels for each file and process, and based on your label, it either gives you access or it doesn't. The runc binary has the container_runtime_exec_t label, while container processes usually run as container_t and can only access files labeled container_file_t. And as you can see, container_file_t is not the same as container_runtime_exec_t. So if you had SELinux enabled, SELinux would have said: no, no, no, you can't access this, it's not allowed. That was completely blocking the exploit. And a lot of exploits are filesystem exploits; SELinux is pretty good at blocking those.
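You can see these labels for yourself on any SELinux-enabled host. A quick sketch (the runc path is typical for Fedora/RHEL):

    ls -Z /usr/bin/runc          # shows container_runtime_exec_t on the runc binary
    ps -eZ | grep container_t    # running container processes are labeled container_t

    # container_t can only write files labeled container_file_t, so a process
    # inside a container is denied when it tries to overwrite the runc binary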
And of course, in OpenShift 4, we run with SELinux enabled. Yeah.

And one more thing: in OpenShift 4, CRI-O is the only container engine being used. Buildah is used for your S2I builds, and on all the nodes you'll have Podman and Skopeo as well. So all these tools are being used in production from now on.

So thanks a lot. We'll be around; we're giving a few demos at the booth, so please stop by, ask us questions, and talk to us about whatever. And we have a cool coloring book that you can download at that link. It talks about all our products and how they work together. Yeah. Thank you. Thanks, Diane.