I'm a software engineer at Red Hat on the OpenShift team, and I absolutely love my job. I wasn't planning to say all this, sorry. I started out as an intern on the containers team, and that's how I got really interested in Linux containers. I work on the OpenShift installer now, but as an intern I worked with Dan Walsh, and I won the internship lottery, because he's awesome.

Yeah, I'm Urvashi. I'm also a software engineer at Red Hat, on the OpenShift runtimes team. I also started as an intern under Dan and I'm still working with him; he really is a great guy.

So today we're going to talk about container security and the tools you can use. There are so many options out there, and so much innovation going on in the container space.

All right, before we start, how many of you were at Dan's talk this morning? OK, quite a few. Cool. So many of you know what a Linux container is, and you've maybe been to a few talks today where it's been discussed. When we talk about Linux containers, a container is a normal process running on a Linux host, with three things going for it: it's constrained, it's isolated, and it has some extra Linux security features added.

So how is it constrained? Linux has cgroups, or control groups, a mechanism that can limit the amount of resources, like CPU or memory, that a process in your container can use. And the isolation comes from Linux namespaces. There are a handful of Linux namespaces, and they're what give a containerized process its virtualized feel. If I'm a process running in a PID namespace, I think I'm the only process running on that host, and I can't see any processes outside my namespace. Same thing with the mount namespace: you can mount a whole root filesystem inside a mount namespace.
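You can actually see these namespaces from any Linux shell; every process carries a set of namespace handles under /proc, and a container runtime just creates fresh ones for a containerized process:

```shell
# Each entry here is a namespace this shell belongs to; a container
# runtime gives a containerized process new instances of these.
ls /proc/self/ns
# Typical entries include: pid, mnt, net, uts, ipc, user, cgroup
```

Two processes are isolated from each other exactly when these handles differ.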
And in that way, you can have the whole Ubuntu userspace inside your container and feel like you're running an Ubuntu system, when really you're just a process running on a Fedora host. So that's the idea of namespaces. And then Linux also has seccomp for syscall filtering, and there are Linux capabilities and SELinux to add to the isolation of a Linux container.

Linux containers have become super popular over just the past few years, and there's been a ton of innovation and development surrounding them, really in two different areas: the container image format and the container runtime. Those are the two pieces you need. So the industry got together and formed open standards around those two areas. Now we have OCI, Open Container Initiative, images, and that has enabled all sorts of development. We now have non-traditional, sandboxed Linux containers that don't share the host kernel; they wrap each containerized process in its own virtual machine. That's Kata Containers, gVisor, or Nabla. And we're also free to develop all sorts of tools around Linux containers, tools that we know will work: if we follow the OCI specs, we know they'll work with all container runtimes, and we can run any OCI image with any OCI container runtime. So standards are very important in moving containers forward.

OK. So now that we know what containers are, we can break the container space into four different sets of actions. These are: one, building your container images; two, running and developing those containers locally; three, sharing your container images, pushing them to remote registries and moving them from one environment to another; and four, finally running them in a production cluster. So what would happen if all of these functions lived in one monolithic tool?
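To give a feel for what the OCI image spec actually standardizes: an image is described by a small JSON manifest pointing at a config blob and content-addressed layers. This is a rough sketch; the digests and sizes below are made-up placeholders:

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:aaaa...",
    "size": 1470
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:bbbb...",
      "size": 2811234
    }
  ]
}
```

Because every tool agrees on this format, an image built by one OCI tool can be pushed, inspected, and run by any other.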
Of course, every action would end up running with the union of all the permissions the most demanding action needs, which affects the security of the whole system. For example, you don't need the same privileges to run a container that you need to build container images, so why bundle them all together? Hence we decided to break these actions into four different tools, following the UNIX philosophy: design programs that do a single thing, so they do it well and work well together. And obviously all the UNIX founders are very happy that we followed that.

The four tools I'm talking about are Buildah, and the name says it all, for building container images; Podman for running and developing containers locally; Skopeo for moving container images around and sharing them on container registries; and CRI-O for running your containers in production in Kubernetes or OpenShift. So let's go through these and talk about how we can add security all the way through when working with containers.

The first one we're going to talk about is Buildah. Buildah is a tool Red Hat has been working on for the past few years to build containers. What do I think about when I want to build securely and build secure containers? One thing that comes to mind is minimal images. You want to create images that have as little in them as possible, to minimize the attack surface; the more you have in an image, the more that can go wrong. And another security feature Buildah offers is that you can run Buildah itself in a container. This way you're adding an extra layer of isolation between your host and your build process, so even if you give the build elevated privileges and it ends up breaking out of the container, it would not be able to affect your host. Wouldn't it be cool if we could show this live? Yeah, we have live demos for you. Oh, good, all right.
And we've already sacrificed to the demo gods, so they should all go well. OK. Here we are, live-ish, we're pretty live. I'm going to show how easy it is with Buildah to create a minimal image. The command buildah from scratch starts a container with absolutely nothing inside. It's completely empty; you're literally starting from scratch, and this will still spit out a working container. Of course with Buildah you can use Dockerfiles, but I'm showing that without a Dockerfile you can just start a working container, put stuff in it, and then commit it. That's what we're showing.

Say I want to install a package, one thing I need in my container. First, I want to mount the container so I get a path where I can install something from my host into the container, without having to have DNF inside the container; again, there's nothing in there. So I'm going to use my host's DNF to install this small package that my friend Nalin told me has no dependencies, because we're doing a live demo and I didn't want the install to take forever. And I should have hit Enter while I was saying that. Anyway, we're going to get all the metadata from the Fedora repo, and voila.

I can tell you a story while we're waiting. Actually, it's pretty fast, so I don't think I'll have time. Darn, I'll tell you later. Last week I created a minimal image at work. I'm working on the OpenShift installer, and it has a bunch of Terraform files, so with every pull request we want to make sure we run terraform fmt. That requires the Terraform binary, which not everybody has on their system, and our CI certainly didn't have it. So instead of installing it CI-wide, and we're running in Prow, so everything runs in a container, I created a minimal container with just Terraform in it. And to run it in Prow, I'm just going to finish my story.
To run it in Prow, you have to volume-mount in the source code you want to check, like you would for golint or go vet, and you also have to mount the temp directory read-write, because Terraform has to write to the temp directory to do its thing. Then we run that in Prow, and since it's a minimal image, it's not going to blow up the CI infrastructure if something goes wrong with it. There's very little that can go wrong, because there's only that one thing in there.

All right, so now that I've installed my small package, what does it say? I think we're good. So now I can commit my image. You noticed I unmounted that directory first, and now I can commit it. What did I call it? Call it minimal-image, OK. There. Now that my image is committed, I can run it with any container runtime tool, such as Podman, and what's interesting is what's not in the image. If I try to ping, that's not there. A lot of packages will pull in Python; I didn't need it and I don't want it, so that's not there. All this image can do is run BusyBox, and that's the BusyBox help menu. Pretty awesome. So again, I used the host's DNF to install packages, or, with Terraform, I downloaded the zip file and unzipped it into that mount directory, and there it was in my container.

OK, so now I'm going to demo running Buildah inside a container. I have this Dockerfile. Before starting this demo, I already built an image, based on Fedora, that has Buildah installed in it, so we don't have to sit through that. The entrypoint is set to buildah bud. I can use Podman to run this image and do things like build my image, telling it where my Dockerfile is, which is in the volume I'm mounting in.
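Putting the from-scratch demo together, the whole flow is only a handful of commands. This is a sketch from memory, run as root (or under buildah unshare for rootless), and the releasever is just illustrative:

```shell
ctr=$(buildah from scratch)            # start an empty working container
mnt=$(buildah mount "$ctr")            # mount its rootfs on the host

# Use the HOST's dnf to install into the container's rootfs;
# there is no package manager inside the container at all.
dnf install -y --installroot "$mnt" --releasever 30 busybox

buildah umount "$ctr"                  # unmount before committing
buildah commit "$ctr" minimal-image    # commit the rootfs as an image

podman run minimal-image busybox       # the only thing the image can do
```

Anything else, like a Terraform binary, can be dropped into "$mnt" the same way before committing.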
So once I do this, it's a small, simple Dockerfile that builds from Alpine, just sets an environment variable and a label, and then commits. And I can actually go in and look at what was built by doing buildah images, and as you can see, the last image, the most recent one, is the image that was just built inside the container. Yeah, so this is all inside the container. And now that you have that image, you could push it up to a registry, or push it over to your host and play around with it. Is that it for Buildah? Yeah, that's it for Buildah. All right, let's go back to the slides.

So the next thing we want to do is run and develop containers locally. For that we have this tool called Podman, where you can manage, develop, and test your containers locally. It's an all-in-one tool, more of an entry-level tool. We've basically covered everything the Docker CLI has to offer, and much more; we also have the podman pod commands, where you can create pods.

One of the cool security features that comes with Podman is that you don't need root privileges to run it. That's great: admins can get away with not giving developers any root privileges. And a really good added consequence is compartmentalization, in that multiple users can work simultaneously on the same host machine and not be able to access each other's work. So for example, Sally and I can both be working on a host machine, but I won't be able to see, or even know, that she has any containers or images on the host. Everything is in its own compartment. Adding to this compartmentalization, Podman has a feature called user namespaces that adds even more isolation.
So what user namespaces mean is that you can map a certain range of user IDs in the container to a different range of user IDs on the host. I can map, for example, UID 0 in the container to UID 100000 on the host. My processes will have root privileges inside the container, but on the host they'll be running as UID 100000. If something breaks out, it can't cause any damage; it won't have the privileges to do so. That's pretty cool. And adding to that, we can run each container in its own separate user namespace. What does this mean? It means that if a process breaks out of the first container, it still won't be able to access the second container, because it won't have the same privileges. I'll delve into this a bit further when we get to the demos; it's easier to see.

Yeah, and also with Podman, if you went to Dan's talk: there's no daemon. No big fat daemon. Podman runs in a true fork-exec model rather than the client-server model we're used to. What that means is that the child processes started by Podman inherit the parent's login UID, and you can easily trace, through Podman, who on the host system has been running things. I can show that in the demo too. I think we can go to the demos now. Yeah. Demo time.

So you do the rootless one first. Yeah. As you can see here, there's no sudo in front of podman, so I'm pulling an image by running Podman in rootless mode. When I list the images, you can see Alpine. And just for comparison's sake, I'm going to list the images using root privileges. You can see that as root I have way more images than just Alpine, which emphasizes the compartmentalization I was talking about earlier. Just to quickly show that it actually works, I can run the Alpine container and list what's in my home directory. Yeah. So now, back to the user namespace stuff.
So using podman run, you can pass a UID map, which basically tells Podman: map UID 0 in the container to UID 100000 on the host, and do that for the next 5000 UIDs. So that's the range. I'm going to run this detached in the background, and we can use the podman top command to look at what the user ID is in the container and on the host. As you can see, it's root in the container and 100000 on the host. The latest flag is just a really cool feature in Podman that tells it to use the most recent container you've created, so you don't have to go back and get the container ID and all. When I do a ps and grep for sleep on the host, you can see the process is running with user ID 100000.

To show you what I meant by each container having its own user namespace, I'm going to create another one, but map it to 200000 instead. Same thing as before. And as you can see, of the two processes, one is running as 100000 and the other as 200000. So if any process from the container with UID 100000 breaks out and tries to talk to the one with 200000, it won't be able to; they're completely different user namespaces.

Now, the fork-exec model I wanted to show is pretty easy to show. On my host system, to see who I am, I can cat /proc/self/loginuid, and you'll see that it's 1000. That's the user currently logged in on the host. So now I'm going to run a container, just a Fedora container, and cat /proc/self/loginuid from inside the container. As you'd expect, since it's a fork-exec model, the user logged in there is me, login UID 1000. Now, the interesting thing is that with another container runtime, if I run that same exact command, I get a huge number. What does that mean? Well, that's the number that equates to an unsigned 32-bit minus one, I think I said that right, yeah. And it means I have no idea who that is; that user has never logged into the system. So I hope you see the problem here.
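The user-namespace and loginuid demos above boil down to a few commands; this is a sketch using the UID ranges from the talk:

```shell
# Map container UIDs 0-4999 onto host UIDs 100000-104999
podman run --uidmap 0:100000:5000 -d fedora sleep 1000

# Compare the UID inside the container vs. on the host
podman top --latest user huser   # root in the container, 100000 on the host
ps -ef | grep sleep              # the host sees the process as UID 100000

# fork-exec means the audit login UID survives into the container
cat /proc/self/loginuid               # the logged-in user, e.g. 1000
podman run fedora cat /proc/self/loginuid   # same login UID inside
```

A second container with --uidmap 0:200000:5000 lands in a completely disjoint host UID range, which is the per-container isolation described above.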
The problem shows up if I try to do something tricky, like touch the /etc/shadow file. Back on the host, the sysadmin can use the ausearch audit tool to see exactly who that was: user 1000, who is cloud-user, did that. The interesting thing is that I can run a Docker command, volume-mount in the root directory, and touch /etc/shadow. Now I see someone did it again, and who was it? Well, that user is unset, which means I have no idea who it was. So you can see the problem here, and the benefit of the fork-exec model in being able to audit who's doing what on your system.

Now we want to show a couple of the neat features of the podman top command. Podman top prints things out in a nice, pretty way, and you can use it to see what security features are enabled for your container. Here I'm just going to run a Fedora container, and if I pass label to podman top, you can see the SELinux label that's currently there. I can make sure that my container is running with seccomp filtering turned on. You can also check which capabilities are currently effective inside the container. I'll talk a little more about Linux capabilities when we get to CRI-O next, but podman top is a really useful command. Back to the slides.

So, Skopeo. We know what a container is, we've created the image, we've played around with it on our local system with Podman, and now we're ready to manage and share the image. When you add security to a system, the first thing you want is to not have to run as root, and there's no reason to run as root when you're managing your images. Skopeo was originally designed so that we could inspect an image on a remote registry. Before Skopeo, in order to check out an image, you had to download it to your system and then run inspect on it. Now you can run skopeo inspect against a remote registry and get useful information: the JSON that describes the image, its layers, and who owns it.
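A sketch of the Skopeo commands being described; the image and registry names are just examples:

```shell
# Inspect an image on a remote registry without pulling it down:
# prints the JSON describing the image, its layers, and its tags
skopeo inspect docker://docker.io/library/fedora:latest

# Copy an image between registries without ever storing it locally
skopeo copy docker://docker.io/library/alpine:latest \
            docker://registry.example.com/mirror/alpine:latest
```

All of this runs as a normal unprivileged user; there is no daemon involved.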
But the important thing is that you don't need root to run Skopeo. There's no daemon, so there's no reason for root. And since the original use case was so great, we've added some other things: you can copy an image from one registry to another without ever having to have that image on your host system, and you can delete images too. And I don't know if we mentioned this before, but you can also run Buildah without root. So for all three of these tools, Skopeo, Podman, and Buildah, you do not need root privileges to run them. You have the option, but you can also work without root.

So here's just an example of some information you can pull down. We pulled this from Docker Hub: you can see all the tags available and the images in there. That's just the Fedora image on Docker Hub. And in the spirit of "don't download and run random crap off of the internet", that's the problem Skopeo solves.

OK, we're moving right along. So now we have a tool to build our container images, we've tested and run them locally using Podman, and we've put them on registries and moved them around using Skopeo. The final thing we want to do is run these containers in production, right? We want to run them in Kubernetes, for example. That's what CRI-O does. CRI-O is an implementation of the Kubernetes Container Runtime Interface that Kubernetes uses to launch containers in production.

We firmly believe that when you're running containers in production, you should run them in read-only mode. What does this mean? It means that processes running inside your container should not be able to write to any part of the container that came from the image, making almost every path in your container immutable.
Now you're wondering: what if I need to run processes that write stuff out to a path in the container? What if I need to save vital information? Guess what, you should actually be glad we have read-only mode, because if you were writing inside the container, you would lose that information when the container is destroyed. The way to handle this is to mount volumes into your container and write to those paths; their contents will persist even after your containers are destroyed, because they're backed by a path on your host. So that's what read-only mode does in CRI-O.

And then we'll show you that with CRI-O it's really convenient to set which Linux capabilities are enabled, system-wide, for all of your containers. Linux capabilities divvy up the superuser privilege on a Linux system: there's the CHOWN capability, the NET_RAW capability, a list of about 40 of them, but with CRI-O, by default, we only enable a very small subset. I'll show you that list in a minute. The idea is to run with as few capabilities enabled as you can, only the ones you need. This, again, minimizes the attack surface, minimizes the chance that something can break free and wreak havoc on your host.

CRI-O also has the same user namespace support as Podman. The only thing is that the support is still being worked on in Kubernetes, so we're waiting on Kubernetes before we can really take advantage of this feature in CRI-O. Oh, and if anybody here works for the federal government, you might be interested to know that CRI-O is your only option for running things FIPS-compliant. FIPS is a list of encryption algorithms that are permitted to be used, and the federal government pretty much makes its employees run their systems in FIPS mode.
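The two settings discussed here live in CRI-O's config file, typically /etc/crio/crio.conf. Roughly, and note the exact key names and the default capability list can vary between CRI-O versions:

```toml
[crio.runtime]
# Run every container with a read-only root filesystem
read_only = true

# Only this small subset of the ~40 capabilities is granted by
# default; trim it further to just what your workloads need
default_capabilities = [
    "CHOWN",
    "DAC_OVERRIDE",
    "FSETID",
    "FOWNER",
    "NET_RAW",
    "SETGID",
    "SETUID",
    "SETPCAP",
    "NET_BIND_SERVICE",
    "SYS_CHROOT",
    "KILL",
]
```

Deleting an entry from the list and restarting CRI-O is all it takes to drop that capability from every container on the node.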
So CRI-O is the only container runtime that knows what FIPS mode is and can carry that information into the containers and enforce it. Back to demos.

The first demo is read-only mode. We have a config file for CRI-O, and I'm just going to show you that I've set the read-only flag to true, telling CRI-O to run all containers in read-only mode, and then restart the CRI-O daemon. Oh, and crictl is a CLI tool that you can use to debug and run containers against CRI-O, since CRI-O was actually made to be driven by Kubernetes; this is just a way to do it locally, and it uses JSON files. So I'm using runp to create a pod, then with that pod I'm creating a container and starting it. Now I'm going to exec into the container and try to, for example, dnf install buildah. As you can see, that failed, saying it's a read-only file system, because DNF expects to write logs to /var/log, which is a restricted path. The great thing about this is that if a container gets hacked into, your hacker would want to put a backdoor in place, right? So that the next time you start up your container, they'd have easy access to it. This stops that. So: run your containers in read-only mode in production.

Also run with as few capabilities as possible. I want to show you which capabilities are enabled by default with CRI-O; it's just a small subset, and it's super easy to go in and delete a couple of them. All you have to do then is restart CRI-O, and I'm going to start a pod. Again, starting a pod with CRI-O directly is a little cumbersome. And here, if we print out the capabilities, it's not as pretty, but you can see which capabilities are enabled. And if you're in a cluster, this information also carries through to the pod, not just the container.
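The crictl flow walked through above, sketched from the talk; pod.json and container.json stand in for the sandbox and container config files used in the demo:

```shell
# Create and start a pod sandbox from a JSON config
pod_id=$(sudo crictl runp pod.json)

# Create a container inside that pod, then start it
ctr_id=$(sudo crictl create "$pod_id" container.json pod.json)
sudo crictl start "$ctr_id"

# Exec into the running container and try to install something
sudo crictl exec -it "$ctr_id" dnf install -y buildah
# With read_only enabled this fails with "Read-only file system",
# because dnf needs to write its logs under /var/log
```

The same crictl exec can print the effective capabilities from inside the container, which is how the trimmed default set was shown.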
So this is information about a pod in your cluster, and you can see it has CHOWN and whichever ones are left. Which one did I delete? I can't remember. DAC_OVERRIDE, I think. DAC_OVERRIDE, yeah. And I think that's it. Oh my god, we finished the demos. That wasn't so scary.

OK, so now that we've told you about all these security features that come with our tools, please, please try to use them, so that someone like Dan Walsh doesn't get upset; he's too good for that. And remember the UNIX founders. Yeah.

So these are the resources for you, the GitHub links to our tools, and the demo script is right up there if you want to grab it and play around with it. And we actually have this coloring book, so if you didn't go to Dan's talk or Scott's talk in the morning and get one, we have them here for you to take. It's just a book that highlights our tools at a high level, and you can learn as you color. Thank you.

Awesome. Questions? I'll come by and give you the mic. Did we explain everything? Perfectly. No questions. Just a few closing things: don't forget, tomorrow at 9:30, the keynote speech by Chris Wright, Red Hat's CTO. And tonight, I think they're still available, so if you don't have one, you can walk up to registration and they should be able to help. Thank you so much.