So we're here to talk about some more container security. My name is Sally. I'm a software engineer at Red Hat. I work on OpenShift, and I've been on a few different OpenShift teams: OpenShift Online, our hosted platform, the installer team, and now I'm learning all things auth. I like being on different teams and learning a little bit about everything. When I started at Red Hat, though, I was an intern on the containers team, and that's where I became interested in containers. A lot has changed since then. We were submitting pull requests to upstream Docker, and we were carrying a bunch of patches for things that Docker didn't accept. But over the past few years, Red Hat's been working on some new tools, and that's what we're here to talk to you about today. And I'm Matt Heon. I've been working at Red Hat for about five years now and spent all that time on Dan's containers team. Initially I worked a lot on Docker; I got some work on seccomp merged into runc. Aside from the technical difficulties, right now I've been working on Podman for about two years, and I'm very excited to be here. As am I. I love Brno, by the way; I've never been here before. So let's just start off with a brief overview for those of you who are maybe new to containers. This will be a review for the rest. There's nothing magic about containers: they're normal processes running in Linux. They're constrained, they're isolated, and they're secured. Constraint is done with cgroups. The cgroup hierarchy is what limits the amount of system resources, like CPU and memory, that a process or a group of processes can consume on a system. Isolation is done through Linux namespaces. There are six namespaces in Linux: IPC, UTS, PID, mount, network, and user. Namespaces give you an isolated view of the system.
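Both of those kernel features are visible from ordinary files under /proc, so you can poke at them on any Linux box without any container tooling at all. A quick sketch (plain Linux, no Podman required):

```shell
# Each namespace a process belongs to shows up as a symlink under
# /proc/<pid>/ns; a containerized process points at different
# namespace objects than the host's PID 1 does.
ls -l /proc/self/ns

# cgroup membership for the current process; a container's
# processes sit in their own branch of this hierarchy.
cat /proc/self/cgroup
```

Comparing `ls -l /proc/self/ns` from inside and outside a container is a simple way to see the isolation for yourself.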
The idea is, if you're in a container in a PID namespace and you have a shell and run ps, you'll only see the processes running in that PID namespace, in that container. You won't have access to, or even be able to see, any processes running outside the PID namespace. The mount namespace is similar: if you're in an Ubuntu container, you're running in a mount namespace with the Ubuntu root filesystem, so you have the whole Ubuntu user space in that container and it feels like you're on an Ubuntu system, but you could be on a Fedora or RHEL host. On top of that, there's extra security from Linux features like SELinux, AppArmor, seccomp filtering, and Linux capabilities. We'll talk about these as we go. Why are containers so popular? Well, over the past few years it's caught on that containers are a great way of deploying things. They're lightweight, they're portable, they're scalable. By portable, I mean everything you need to run your program is packaged in the container image, so things run exactly the same on your local system as they do in your test environment, and then in production. That's great for deploying and iterating on things. And Docker came along and made containers really accessible for the average user. Docker provided a really friendly CLI for working with containers, the Dockerfile for building container images easily, and also container registries: Docker Hub gave people a way to share and store their images. Then the Linux Foundation stepped in, along with companies like Red Hat, CoreOS, and Google, and set up open industry standards around the two main aspects of containers: the container image format and the container runtime. This propelled things forward. It allowed us to build new tools and look at new ways of using containers; we weren't locked into any one thing.
And so the OCI, the Open Container Initiative, was formed, and now we have the OCI image format and the OCI runtime specification. Any OCI image will run nicely on any OCI runtime. Kubernetes, the most widely used orchestration platform, leaves the choice of container runtime up to you; as long as it's an OCI runtime, it will plug nicely into Kubernetes. So Red Hat's been working for a few years on some new tools, and that's what we're talking about. When we decided we needed to build new container tools, we identified four big areas you need to cover when you're working with containers. First, you need to build images; you can't start containers without images, so obviously you're going to need to build those images first. Second, you need to iterate on and develop those images locally: you want to run them on your developers' machines, and you want to rapidly rebuild as you identify changes you need to make, and you keep going through this until you get an image you're satisfied with, one that's ready to push to production. At that point, you're going to need to push it up to a registry, because it's no use to anyone just sitting on some developer's machine. Once we've pushed it up, we have our final step: we need to run it in a production cluster, ideally under Kubernetes. And we decided to build four separate tools for these four jobs, because if you have one big monolith, you usually end up taking a least-common-denominator approach to security. Container builds, for example, need a lot of privileges: you want to install things in the container, and you want to do debugging tasks when you run containers locally. Those require a lot of permissions that you really don't want to have when you're running in production. By building four separate tools, we can take a more tailored security approach: each tool has only the capabilities and permissions it actually needs.
Furthermore, this is very close to the Unix philosophy, where you build a tool that does one thing, does it very well, and does nothing else. By composing these tools together, you can build powerful applications that suit your needs. The tools we developed are: for building images, Buildah; for running and developing images locally, Podman; for storing and sharing images via a registry, Skopeo; and for running production Kubernetes clusters, CRI-O. So the first step when working with containers is to build a container image, and Buildah is our tool for that. I wasn't sure if I'd tell this, but I'm going to, because some of you might not know how we got the name Buildah. If you've ever heard Dan Walsh talk, he has a very prominent Boston accent, so when he says the word "builder," it comes out as "buildah." Somebody joked that we should call it Buildah, the name stuck, and there it is. And that's, I guess, why the Boston Terrier logo; it's a little nod. Anyway, when you're building container images, what do you think of for building them securely, and for building secure images? You want to shrink your attack surface. You can do that with minimal images: only put what you need to run your program in your image, and nothing else. Buildah makes that very easy, and we have a demo in a minute where we'll show you. Another thing you want to do is secure how your images are being built, by running the builds themselves inside containers. As I just said, builds require a lot of privileges, so a malicious build could potentially use those privileges to escalate and escape onto the host system. And if you have one system doing builds for multiple tenants, you could compromise a lot by compromising that box. So if you run Buildah inside a container, you can limit the potential for a breakout.
Even if they get outside of the build environment, they're still stuck inside another container, and we can expect that they're reasonably contained by that. And Buildah doesn't require root; you can run it as root, but it doesn't require it. So, I've done this before, and I always like to pay my respects to the demo gods before I go into a live demo. There we go. All right, let's show Buildah in action. This command, buildah from scratch, sets up a working container with nothing in it; it just sets up the namespaces and cgroups. Then, if you mount the container, it creates a mount point here under /var/lib/containers, because we're running buildah with sudo; if it weren't sudo, it wouldn't be under /var/lib. At that point you can copy, move, or install anything from your host into that mount, and it will be in your container; then you can commit it. That's the idea. Originally I was doing a dnf install, but with the uncertain Wi-Fi I changed it. I wanted to keep this here, though, because a lot of people aren't familiar with DNF's --installroot flag: you can set the install root to the container's mount point, install whatever you want, and then dnf clean to get rid of the cache. But instead, earlier I built the OpenShift 4.0 installer binary on the side, I just ran hack/build, and I'm going to just copy that in here. Once we have what we need in the container, we can unmount it and commit it. This might take a minute; it doesn't at home. While this finishes, I want to show you another way to create a minimal image, and that's with a multi-stage Dockerfile. I have one handy, so I'll point that out too. But what's interesting in this container is what's not in it.
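The from-scratch workflow just described can be sketched as a short terminal session; the package name, release version, and image tag here are only illustrative, not the exact ones from the demo:

```console
$ ctr=$(sudo buildah from scratch)
$ mnt=$(sudo buildah mount "$ctr")
$ sudo dnf install -y --installroot "$mnt" --releasever 29 coreutils
$ sudo dnf clean all --installroot "$mnt"
$ sudo buildah umount "$ctr"
$ sudo buildah commit "$ctr" my-minimal-image
```

Because you install directly into the mount point, only the packages you ask for (and their dependencies) end up in the committed image.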
Usually, if you start with any base image, you might have ping; here there's no ping, there's no Python. You might not even know you have ping and Python and almost everything else your base install pulled in. All that's in this image is the openshift-install binary, and here's its help menu, if anybody's interested in launching a 4.0 cluster. I said I wanted to show you the multi-stage build. So there are two FROMs. The first stage has all of the source code; this is the repo for the OpenShift installer, if you want to check it out. There we run just the build command, so we have the binary. Then here's a very minimal base image, and we just copy the binary in as /bin/openshift-install in our minimal container image, and that's it: commit it and set the entrypoint to openshift-install. We use this image in our CI. Since the next talk is about CI, I'm just going to give a little introduction. Our CI for OpenShift is amazing; it's crazy. There are about 50 GitHub repos, or more, that contribute to what is OpenShift 4.0, and every one of those repos has a gating job. For every PR in any of those 50 repos, our CI runs in an OpenShift cluster: an OpenShift cluster is launched in AWS, and that's why this installer is containerized, because we need it containerized to launch the cluster in AWS. After that cluster is launched, another container in OpenShift runs Origin and Kubernetes conformance tests against it, and only if those all pass is a PR in any one of those 50 repos eligible to merge. You'll hear more about that in the next talk; I just wanted to give a little shout-out. So, getting back to minimal images and Buildah. Matt? All right. Next we're going to talk about builds inside a container. We'll start off by showing a Dockerfile; this is the Dockerfile we're using for the build container.
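The multi-stage pattern described above can be sketched like this; the base images, paths, and build script name are illustrative placeholders, not the exact contents of the installer repo's Dockerfile:

```dockerfile
# Stage 1: full build environment, with the whole source tree
FROM registry.fedoraproject.org/fedora:29 AS builder
WORKDIR /src
COPY . .
RUN ./hack/build.sh

# Stage 2: minimal runtime image; only the binary is copied in,
# so none of the build toolchain ends up in the final image
FROM scratch
COPY --from=builder /src/bin/openshift-install /bin/openshift-install
ENTRYPOINT ["/bin/openshift-install"]
```

Everything in the first stage is discarded after the build; only what you explicitly COPY into the second stage ships.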
And you can see it's really simple: we install Buildah into a Fedora image. That's all you really need. Now let's look at the Dockerfile we're going to build inside the container. It can be basically anything, but here's a really simple one for demonstration. And now we have our podman command that runs Buildah. We have a couple of magic volume invocations here, and what they're basically doing is putting parts of the image storage onto the host, so if you run multiple containers, they can share images between them. Then we have one other flag: as we invoke Buildah, we're using the --isolation chroot flag. We're already running in a container here, so we don't really need Buildah's full isolation; we can use the more relaxed chroot isolation, which runs more easily inside a container. So let's let that run, and hopefully it won't take too long. Let's see. Oh, there we go. And we've stored our image. And there you go: we've run Buildah inside a container, easy as that, and you can push that image up to a registry, or use it, whatever. I think we're back to the slides now, yeah? Yeah. Now we're going to talk to you about Podman. Podman is my baby, so I'm a bit excited about this. Podman is basically Docker, but without a daemon. It builds containers using Buildah: remember we said we were going to build simple tools, so Podman does not re-implement building; we just vendor in Buildah to do your builds. We run containers. We run pods. Generally Podman is used for local development work, but you can also use it to run single-node production workloads. And the first big thing you're going to want to do with Podman relates to user namespaces; our first two demos relate to user namespaces. Anyone here see the user namespaces talk earlier today? Any fans? Oh, at least a few people. So I won't go too in depth, but suffice it to say, user namespaces are a way of mapping users inside a container onto users on the host.
And they let us run Podman as non-root. You can create a user namespace where root inside the namespace is some non-root UID on the host, and that gives us enough privileges to run Podman as non-root. We can also do some cool isolation things with user namespaces as root: we can create separate user namespaces for different containers. These user namespaces all map root to different, non-overlapping UID ranges on the host, which means that if you were to somehow get out of a container on that host, not only can you not act as root, because the user namespace doesn't map you to root, you also can't access any other container on the system, because they all have different UIDs than you. If you heard any of the other talks today about Podman, you know there's no big fat daemon. That's because Podman runs with a true fork-exec model, so the login UID of the child process is inherited from the parent. That has some security consequences, benefits I should say, that we'll show in the demo. All right, our first demo is that Podman is rootless, and it's really simple: you run podman with no sudo in front of it. So let's pull an image here, and let's run podman images now that it's done, and you can see that we have an image in our local repository. Now let's run sudo podman images, and you can see the list is entirely different. Non-root Podman uses completely separate storage from root, so you're not interacting with containers running as root at all. Furthermore, it's completely separate from every other user on the system, so you can have multiple people on the same system using rootless Podman with no interaction between them. The non-root storage is in the user's home directory. Now let's just run a simple container here. We ran id twice.
The first time, it ran inside the container, and you can see we are UID 0, GID 0, or at least we appear to be. Meanwhile, running it on the host, you can see we're UID 1000, GID 1000. And if you ran ps on the host, you'd also see the process running as 1000:1000, not 0:0. And pay no attention to the fact that I've added myself to the docker group, because that's a security no-no. Now let's do something with buildah unshare, if you've never heard of it, as a way of entering the user namespace that rootless Podman creates. It's basically a neat tool for interacting with files created by rootless Podman, because if you do anything as a non-root user inside a rootless container, the files end up owned by UIDs you don't normally have access to. buildah unshare gives you that access. So let's show /etc/subuid; these are the user ID mappings that will be used in that rootless container. Then let's do an ls on the host, and you can see we've got one file owned by root and a bunch owned by Sally. Now let's enter the buildah unshare user namespace and do another ls. The file that was owned by root is now owned by nobody, because root is not mapped into this user namespace; we can't even see that user, and any attempt to access that file will be permission denied. But all the files that were owned by Sally we now see as owned by root, because from the kernel's perspective, we are root in this user namespace. And if I try to cat that file: permission denied. But if I step outside of my script, here on my host, you can see I can read the file. So let's also show the user namespace mappings in effect inside buildah unshare. You can see these even from within the user namespace with cat /proc/self/uid_map.
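The uid_map file itself is easy to read: each line is three numbers, the first ID inside the namespace, the first ID it maps to in the parent namespace, and the length of the range. Outside any user namespace you just see the identity mapping:

```shell
# Format per line: <start-in-namespace> <start-in-parent> <range-length>
cat /proc/self/uid_map

# Inside a rootless Podman/buildah-unshare namespace, per the demo,
# this would read roughly:
#   0       1000   1       <- your own UID becomes root
#   1     100000   65536   <- the /etc/subuid range fills in the rest
```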
And you can see that UID 1000, which is Sally's user, is mapped to root, and then we have another 65,536 UIDs mapped in, starting at UID 100,000 on the host, mapped starting at UID 1 in the namespace. I think that's all we need to show; I'm going to exit out of this user namespace. Now we're also going to talk about user namespaces as root, and using them to isolate different containers. Let's start off by running a single container, this one with host UID 100,000 mapped in as root and 5,000 UIDs mapped. And let's do a quick podman top; this shows user and huser, which is the user on the host, and you can see they're different here: the user in the container shows as root, but the user on the host shows as 100,000. And ps confirms that the 100,000 number is accurate on the host. Let's run another one. Now we've mapped a different user range in here, so they have different roots. Let's look again: we have 200,000 instead of 100,000, like we passed in, and ps shows that again we're UID 200,000. This will take a few extra seconds because we were mucking with the user namespace; hopefully it'll clear up. It was either a choice of waiting a few seconds or showing you guys an error that says "device busy," so I chose to wait. We might have a bit of a bug there; I didn't want any errors in my script. There you go, it's done. So now, we were talking about fork-exec in Podman versus the client-server model in Docker, and it's easy to show the consequences. If you want to know who is currently logged in on a Linux system, you do cat /proc/self/loginuid; that's me. And as you'd expect with fork-exec, the login UID is inherited by the child process, so when I run a Podman container with a shell and cat /proc/self/loginuid from inside it, it's the same thing: it's me. And whether or not you use sudo, your login UID stays the same. If you do the same with Docker, this will always be the result: a user who has never logged into the system, an anonymous user.
It's the Docker daemon. Now say I have access to a shared system, and I run a privileged container with Podman, mount the host in, and try to make a change to /etc/shadow. An admin can go look through the audit logs and clearly see that sallyom has been doing something shady. And that's great, because with auditing you want to know who's doing what on your system. However, if you do the same thing with a privileged Docker container, mount the host and do something to /etc/shadow, the admin will go into the system and only see the "unset" user; it can't be traced back to me. Podman also has some convenient features to show the security settings currently configured in running containers. I'll start a container in the background. It's called podman top: I can get information about the PID and the host PID; I can pass "label" to podman top to get the SELinux label; I can pass "seccomp" to see that filtering is currently turned on; and for capabilities, effective capabilities, you get a list of which ones are currently enabled. I also want to point out this convenient flag, --latest: with Podman you don't have to go copy and paste the container ID. If for no other reason, you should use Podman for that convenience. There are other podman top features; when you go home tonight and install Podman, you can check them out yourself. Oh, we're back to the slides. Well, we have built a secure image, we've run it, it's working great, and now we want to share it with the world. The tool for that is Skopeo. The name comes from the Greek for "to look at" or "inspect," and remote inspection was the original purpose of this tool. It started as a patch, which I think we were carrying against Docker, to get information about images from a remote registry without having to pull them to your local system. It's crazy that that didn't exist, and it's crazy that they didn't accept it, but it became the tool Skopeo, and it has evolved since then.
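That original remote-inspect use still looks like this; the alpine image is just an example, and the JSON output is abbreviated:

```console
$ skopeo inspect docker://docker.io/library/alpine:latest
{
    "Name": "docker.io/library/alpine",
    "Digest": "sha256:...",
    "RepoTags": [ ... ],
    ...
}
```

Nothing is pulled to local storage; Skopeo talks to the registry API directly.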
It doesn't only give you information about images on remote registries; now it can move images between environments and between registries, and it doesn't require root. So here's the help menu for Skopeo, just to give you an idea of what you can do with it. Like I said, it's not just inspect anymore. Also, if you're working with a private registry, Skopeo can accept credentials, so private registries work too. You can get useful information about any random image on Docker Hub without having to pull that random image onto your system. For example, you might not remember the exact spelling of the tag you need; this is a convenient way to get it. You get the idea. Another convenient feature of Podman, and really not just Podman, but Skopeo and Buildah and CRI-O too, is that they all share the same storage and image libraries: containers/storage and containers/image. A neat feature of containers/image is that we can pull images from places that aren't actually registries. Here we're going to pull an image from a Docker daemon on the same system. Let's look at podman images, and here we can see we don't have the image in question. That's going to change in a bit. Now let's look at docker images, and there's that image right there. So let's use Skopeo to copy it into the image store that Podman, CRI-O, and Buildah share. We put this in the script because it's a little tricky; we'll give you a link to the script later so you can copy this command. After you install Podman, you might want to move all of your local Docker images over to containers/storage, and you can set up a little script to do that. Let's show podman images again, and voilà, we have the new image. We're getting there. All right, so now let's talk about CRI-O.
CRI-O is our solution for running containers in a production Kubernetes cluster. It's developed exclusively for Kubernetes and really only intended for running Kubernetes pods, and because of that, we can run it with some very tight security defaults that normally wouldn't fly in a general-purpose tool. The first of those big security features is a read-only filesystem. We can default every container running under CRI-O to a read-only root filesystem, which means you're only able to write to a few tmpfs mounts in the image, like /tmp, and to any volumes you've mounted into the container. Because of the volumes, you can still run things like a database; you still have persistent storage. But if someone breaks into that container, they're going to have a much harder time doing any damage, for example modifying configs or installing backdoors, and any changes they do make outside of volumes will be wiped out on restart. Conveniently, Kubernetes pods are sort of disposable anyway: if a pod goes away, everything in its containers, aside from what's stored in volumes, just goes away. So this forces you to put everything on persistent storage, and you won't have to worry about losing anything. And if you want to run a container that isn't read-only, you have to make that conscious, per-container decision, if you've set CRI-O to be read-only. CRI-O also has a convenient way of disabling Linux capabilities system-wide, for all of your containers and pods. Linux capabilities are parcels of root's power that can be individually enabled. The idea is you could be UID 0, you could be root, but if you have no Linux capabilities, you have none of root's power. We recommend that in production you run with as few capabilities as you need. You should know exactly what you need enabled, and disable everything else. Shrinking your attack surface again.
CRI-O also has the same user namespace support that Podman does. Unfortunately, this isn't quite supported in upstream Kubernetes yet, but once it is, we'll be ready on day one. And if you run your systems in FIPS mode, or if you work for the government, you might be interested to know that CRI-O is your only option for a container runtime: CRI-O can recognize if a host system is running in FIPS mode, and if you're running a FIPS-compliant container image, it will enforce FIPS mode in the container. So if you try to use a weak crypto algorithm, your container will error out the same way it would on your host. Back to the demo. Our first demo is going to be CRI-O in read-only mode. First we're going to show you how to set it: it's a simple flag in your config file, and you can see here, read_only is true. If that's set, every container made has a read-only filesystem by default. Let's restart CRI-O to make sure that takes effect. Now, the difficulty with interacting with CRI-O in a demo like this is that it's only really meant to run underneath Kubernetes; it doesn't present a nice, neat command line interface for us to work with. So instead we have to work with something called crictl. The Kubernetes interface to container engines is called CRI, the Container Runtime Interface, and crictl is a tool that mimics what Kubernetes sends down this interface, allowing us to interact with CRI-O without having a full Kubernetes cluster stood up. So first we're going to make ourselves a pod, then we're going to make a container in that pod and start it. Now we're going to try to install something in that container. And we didn't even get to try installing things: when DNF tried to open its log file, it failed. So you can see that container is definitely read-only. Yeah, CRI-O isn't really meant to be run locally, just for fun; that's what you'd use Podman for.
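For reference, in a current crio.conf both of the settings discussed here live in the [crio.runtime] table; this is a sketch of a fragment, and the capability list is illustrative, so check your version's defaults:

```toml
# /etc/crio/crio.conf (fragment)
[crio.runtime]
# Default every container to a read-only root filesystem
read_only = true

# Containers get only the capabilities listed here;
# everything not listed is dropped.
default_capabilities = [
    "CHOWN",
    "NET_BIND_SERVICE",
]
```

Restart CRI-O after editing for the new defaults to take effect.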
So you kind of have to trick it into thinking it's in a Kubernetes cluster. So again, here's an easy way of enabling or disabling Linux capabilities: everything is done through crio.conf. Let me just open it up. You can see the cgroup manager here; we're using systemd, which you should be by default. All right. Let's remove DAC_OVERRIDE, which lets root ignore file permissions, I believe. Is that right? And we'll remove a couple of others too, since I'm not really doing anything in this container; we can just pretend we don't need them. Okay, let's restart CRI-O, and now any container or pod that I start will not have those capabilities enabled. Here it's not as pretty as podman top, but you can see the capabilities currently enabled don't include DAC_OVERRIDE or the others we removed. And that carries through to pods: the first output was from a container, and this is a pod. After the demo, we can show you the JSON file we used to start this pod. But that's the idea: run with as few capabilities as you need. So, the JSON we're going to show you here, oh yeah, I almost forgot: it's very similar to the Kubernetes API structs, but it's actually purely internal. It's meant only for being transmitted to runtimes like CRI-O, so it's a bit more esoteric than the actual API. That's the pod config right there, and the other one is a sample container config. Again, we'll give you a link to the demo, so if you want to check it out, these files are all in the GitHub repo we'll share. And then here's the pod, I mean, the container. It looks like Kubernetes stuff. So, back to the slides, I think. We've shown you a lot of what Red Hat's been working on for the past few years, always with a focus on security, because that's what Red Hat does. And Dan would really like you to use these tools. Here's your knife. Did you know we kept that in? Oh, one more.
And before we do questions, I'd like to give a shout-out to Urvashi, on my team at Red Hat. She was supposed to be doing this talk with Sally, but she had visa issues, so I'm filling in for her. We miss you, Urvashi, if you're watching this. Yes, and Matt, thank you for reminding me: I want to thank Matt for stepping in at the last minute and helping me with this talk. No problem, it's been a joy. Here we were just advertising this cool Container Commandos coloring book, which gives you some accessible information about these tools. And the demo script is in GitHub, under containers demos, in the security directory. I think we're good for questions. I'm going to have to ask you guys questions if you don't have any. Did anyone see Dan's talk this morning, at nine? Yeah, you'll notice we overlap a little bit in our examples. [Audience question about Docker compatibility.] Well, I can speak from the Podman perspective: we're targeting compatibility with the Docker 1.13 API and command line. That's not exactly the latest, but we feel it's a stable target that a lot of people are still using, despite it being pretty old. For new features, we'll take a look; if you want, open a specific feature request for anything you find that's missing, or a compatibility request. If you find anything major that's breaking, we would definitely take a look at that. But for now, our solid goal, at least on the Podman side, is 1.13. Yeah, please submit issues or pull requests or whatever to our open source repos. Any other questions? I'm trying to think if I have any good stories to tell. Oh yeah, hi. [Audience question about seccomp profiles.] Oh, that's a good question. So the question is about seccomp profiles and how much they're actually being used. I can say that all our tools ship a default seccomp profile, which is a fairly extensive whitelist. It's not really tailored to any one specific application, and if you want to use a more specific seccomp profile for any of your applications, you can definitely do that.
There are ways to do it in Podman and CRI-O. But everything runs with the default profile; it's just not very restrictive, because it's not tailored to any one specific application. Any more questions? [Audience question about Podman versus CRI-O and future-proofing.] So Podman and CRI-O, we don't really view as competitors: Podman is for running on your local system, CRI-O is for running in a production Kubernetes cluster. I'd say that CRI-O is definitely future-proof in the sense that we are constantly keeping it updated against Kubernetes master, so there will never be a version of Kubernetes that CRI-O doesn't have support for; the moment Kubernetes branches a new version, support for it starts to grow in CRI-O. Podman, we'd also definitely say is future-proof; we're putting in a lot of effort to keep up with demands in the container space. So I'd say all our tools are future-proof: we have a mind towards what's next, and we know where we want to take them. And CRI-O was only developed to run under Kubernetes; it has no other interest, so it keeps pace with Kubernetes and is only for Kubernetes. Podman is the more all-purpose tool. [Audience question about Docker Compose.] So the question is about Docker Compose and whether we have an alternative planned. Our current plan, if you saw Dan's demo, is something we call podman generate kube, and we have a counterpart to that called podman play kube. The idea is that instead of Compose, where you define your files beforehand, you first define your containers in Podman, define the relationships between them in a pod, and then export that as Kubernetes YAML using podman generate kube. Then, if you want to run that on another system, you replay it using podman play kube. So it's not quite the same thing, but we'd like to think it offers roughly the same functionality. And you can pass parameters, like environment variables, when you run the command, and you get YAML out. Sure.
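That round trip can be sketched as a short session; the pod name and image are just placeholders:

```console
$ podman pod create --name demo-pod -p 8080:80
$ podman run -d --pod demo-pod registry.fedoraproject.org/fedora sleep 1000
$ podman generate kube demo-pod > demo-pod.yaml
$ podman play kube demo-pod.yaml    # replay on another machine
```

The generated YAML is ordinary Kubernetes pod syntax, so the same file can also be fed to a real cluster.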
I'm not sure how much of that is implemented right now, because this is very early, but it's definitely on the roadmap. Networks are definitely in there now: if you have any ports forwarded into your pod, that will persist after you export and re-import it. I'm not so sure about volumes, but that's definitely on the roadmap if it's not done already. And just so everyone knows, in case it's not common knowledge: in OpenShift 4.0, what we're working on releasing now, CRI-O is the runtime, and we use Podman for everything.