Hey, hi, so we can begin now. It's one p.m. sharp. So hi, everyone. This is a session in the application development and containerization track. It will be a talk by Daniel Walsh and Urvashi Mohnani about running podman within a container. So all the best, guys, take it away.

One second, I am not prepared for this. Okay, so my name is Daniel Walsh and I'm basically the lead engineer, the technical architect of container technologies at Red Hat: everything that runs on the Linux platform, everything that runs underneath Kubernetes. So Urvashi, who are you?

Hello, everyone. My name is Urvashi Mohnani and I'm the team lead for the OpenShift Containers team as well as an engineer on it, and a co-organizer for DevConf, if you didn't know that from this morning's keynotes.

So this talk came about because probably one of the questions we get asked most often in the podman world is: how can I run podman inside of a container? The use cases are usually around the ability to run containers within a CI/CD system. Lots of people might have native support for something like Docker, but they want to test out podman commands, so they want to run podman inside of Docker. A lot of people want to do things like building container images inside of the CI/CD system or inside of Kubernetes. So we're constantly getting asked, people are trying to run podman inside of a container, and there are lots and lots of repeat errors. So a couple of months ago, Urvashi and I started working on a couple of papers together to lay out what we thought were the best practices for running podman containers inside of a container.
And then during that time, we also developed an actual container image that we now recommend as the best way we have figured out to run containers in containers, to get everybody focused so we could all work together on improving the ability to run containers within containers. The interesting thing is that a lot of people had used Docker inside of a container, but Docker really runs in one way, right? It always requires privileges. With podman, we had lots of different ways of running podman, and we also had different security constraints. So while we were going through this exercise, I think we came out with, I don't know, about 20 different mechanisms for running podman in a container. We're going to go through some of them today, and I'm going to describe the image that actually runs podman and how we configure it to make this easier. So go back one, sorry, you jumped ahead.

So to give you some of the scenarios that we investigated in the papers that we wrote: on the host, you could use rootful podman, or you might want to use the Docker daemon to run the podman container. You could also use rootless podman on the host to run podman. And you might want to run it with a tool like Kubernetes. But inside of the container you have two options, right? You're either going to be running rootful podman or rootless podman. So what we looked at is all the different potential combinations of these, and the best ways to do each.

So during this process, we came up with the quay.io/podman/stable image, and that's what we currently recommend. If you want to run podman inside of a container, this is our best practice.
This is the image that we defined as being the best way to run podman inside of a container. There are other images under quay.io/podman, such as one called upstream, but stable is really the latest version of what we release in Fedora. Most of the core engineers work inside of Fedora, so it would be a Fedora 34 image at this point: the latest stable version of Fedora, as well as the latest stable version of podman that's been released to Fedora. Testing is a similar image, except that it uses a podman that might not have been fully released to stable but might be in the updates-testing part of Fedora. And then there's upstream, which basically takes the daily main branch from GitHub and creates a build of podman based on that. So if you want the latest and greatest, that one is a way to experiment with it. For the most part, we tell everybody to use stable.

So now let's go look at how we actually built the container image. And of course, I'm having a hard time seeing that. So let's go down and click into the next section. So first thing, as I said, this is basically the Dockerfile that we use. The reason we use a Dockerfile rather than a Containerfile, which are really the same thing, is that a Dockerfile is the only thing that quay.io supports for automatic builds. So we had to use a Dockerfile. The base image that we use for building this is, again, the latest Fedora image, which right now is Fedora 34. And then we start to install software on top of that. So now we go down, and basically what we're installing is the latest version of podman. We're fully updating the system, then we're installing podman. There's a bug in the upstream Fedora images with shadow-utils, so we do a quick reinstall of that. But really what we're installing is podman. We also install fuse-overlayfs.
And then we actually exclude things like container-selinux to try to keep the size of the image down. Since we're not really running SELinux inside of the container, we don't need to have it installed.

The next section goes out and basically creates a podman user. Since we're going to use the same image for running either rootful or rootless containers, we wanted to create a user inside of the container image that we would use for running rootless containers, and we call that user podman. We configure the /etc/subuid and /etc/subgid files inside of the container so that user can run with a range of UIDs. Now, when you're running a rootless container inside of a rootless container, you usually only have about 65,000 UIDs available. We can't use the full 65,000 inside of the container if we only have 65,000 outside of the container. So because of that, we just allocate 5,000 UIDs to run containers inside of containers. This might be something that you would want to change if you built your own image off of this, but right now the stable podman image only allows you to use 5,000 UIDs for rootless containers.

The next section is kind of interesting. One of the issues people tend to hit when they run containers within containers is that overlay file systems and fuse-overlayfs file systems don't allow you to mount fuse-overlayfs on top of fuse-overlayfs, or overlay on top of overlay. So if you run, say, a rootful container inside of a container, and the container image that you're running podman inside of is on overlay, and you try to create more overlay mounts on top of that, the kernel rejects that. Podman creates its overlay mounts in /var/lib/containers/storage and in the home directory's .local/share/containers/storage. So in order to stop users from accidentally using overlay on top of overlay, we created built-in volumes for them.
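The UID arithmetic above can be sketched in a few lines of Python. This is only an illustration, not code from the image: the helper names are hypothetical, the outer range of 65,536 is an assumption (the usual default, which the talk rounds to "65,000"), and the starting ID of 10000 in the sample entry is illustrative. Only the count of 5,000 comes from the talk.

```python
# Hypothetical helpers: check that a nested sub-UID range fits inside the
# range of IDs that the outer (rootless) container has available.
OUTER_RANGE_SIZE = 65536  # assumption: usual default, "65,000" in the talk


def parse_subid_line(line):
    """Parse an /etc/subuid-style entry of the form name:start:count."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def fits_in_outer_range(line, outer_size=OUTER_RANGE_SIZE):
    """True if every ID in the entry lies below the outer range size."""
    _, start, count = parse_subid_line(line)
    return start >= 0 and count > 0 and start + count <= outer_size


# 5,000 IDs as described in the talk; the start value is an assumption.
entry = "podman:10000:5000"
print(parse_subid_line(entry))
print(fits_in_outer_range(entry))             # 10000 + 5000 <= 65536
print(fits_in_outer_range("podman:1:65536"))  # would need 65537 IDs
```

The second check shows why the full range cannot be reused inside the container: an entry that asks for as many IDs as exist outside has nowhere to start.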
These volumes tell podman or Docker on the outside to create volumes that aren't overlay to put the storage on. Next section.

Podman also has this concept of what's called containers.conf, and containers.conf allows us to change the default way that containers run in the environment. So let's go look at the containers.conf that we're creating. Here, we actually ship with two containers.conf files: the system-wide defaults, which we put in /etc/containers/containers.conf, and modifications for the rootless environment. In the system-wide one, it's interesting that we basically set up all the namespaces to use the host namespace. We feel that since you're already inside of a container, we don't need the complexity of having two network namespaces, two user namespaces, and two IPC namespaces set up. So we're telling podman inside of the container to just use the container's namespaces that were already set up. Similarly, we also disable cgroups, because we're using the cgroups of the outside container. And there are certain things that we also want to force, because you're not going to be running systemd inside of a podman container, so you want to force it to use cgroupfs and file-based logging. And we ship with crun instead of runc, mainly because crun is a lot smaller, so the image stays smaller. Inside of rootless containers, there is an issue setting up /proc inside of the container. So we're using a hack to basically, again, use the parent container's /proc rather than running a more locked-down /proc inside of the container. So next slide.

So now we've set up all these configuration files. In the next group, we want to make sure that the podman user owns all the content in its home directory. So we added all these files, and then we make sure that they're all owned correctly.
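The two containers.conf files described above can be sketched as data and rendered to text. This is a paraphrase of what the talk describes, not a copy of the files shipped in the image: the key names are my reading of the containers.conf format and should be checked against its man page, and the small renderer is a hypothetical helper that only handles strings and lists of strings.

```python
# Settings paraphrased from the talk. Key names are assumptions to verify
# against containers.conf(5); the shipped files may contain more options.
SYSTEM_WIDE = {
    "containers": {
        "netns": "host",        # reuse the outer container's network namespace
        "userns": "host",       # ... and its user namespace
        "ipcns": "host",        # ... and its IPC namespace
        "cgroups": "disabled",  # the outer container already has cgroups
    },
    "engine": {
        "cgroup_manager": "cgroupfs",  # no systemd inside the container
        "events_logger": "file",
        "runtime": "crun",             # smaller than runc
    },
}

ROOTLESS_OVERRIDES = {
    "containers": {
        "volumes": ["/proc:/proc"],  # reuse the parent container's /proc
    },
}


def render_value(value):
    """Render a string or a list of strings in TOML syntax."""
    if isinstance(value, list):
        return "[" + ", ".join(render_value(v) for v in value) + "]"
    return '"' + str(value) + '"'


def render_conf(settings):
    """Render a two-level dict as TOML tables, one option per line."""
    lines = []
    for table, options in settings.items():
        lines.append("[" + table + "]")
        for key, value in options.items():
            lines.append(key + " = " + render_value(value))
        lines.append("")
    return "\n".join(lines)


print(render_conf(SYSTEM_WIDE))
print(render_conf(ROOTLESS_OVERRIDES))
```

Keeping the rootless changes in a separate table mirrors the split described in the talk: system-wide defaults in one file, and a smaller set of modifications that only applies to the rootless user.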
And the rest of the code at the bottom of the Containerfile or Dockerfile is all about potentially allowing users to use the host's storage. There's a way to leak the images that you have on the host into the container, but we're not going to cover that in the rest of this talk. If you go and read the papers, you'll be able to find out more information about it. Next. So I think this is you then.

Yeah, go ahead.

All right. So it takes quite a lot to run podman inside a container. Container engines require a fair amount of privileges. They need to be able to mount file systems and use the clone system call to create user namespaces, and containers usually require multiple UIDs to run. So we have put together a demo showing how podman works within a podman container and within a Kubernetes container. So let me share my demo. All right, hopefully you can see my screen.

The easiest way to run podman within a container is to use the --privileged flag. The --privileged flag basically gives the container processes elevated capabilities similar to those of the host, so container engines can run inside it without any issues. The first scenario here is running rootful podman in rootful podman with the --privileged flag. As you can see here, I'm invoking podman with sudo on my host, and I'm running a simple container, UBI 8 minimal, inside my container that I'm running on the host, and just echoing hello. And as we can see, it first pulls the image and the container ran successfully. All right. The next scenario is running rootless podman in rootful podman with the --privileged flag. I'm making the inner podman rootless by setting the user flag to podman, which is UID 1000 inside the container. And as you can see here, the container ran successfully. I'm going to say "the container ran successfully" a lot of times, because it does run successfully every time. And then the next scenario is running rootful podman in rootless podman with the --privileged flag.
So as you can see here, I'm no longer using sudo to invoke podman on my host. I'm running a simple UBI 8 minimal container and I get my echo hello output. And then the last scenario is rootless podman inside rootless podman with the --privileged flag. Similarly, I set the user to podman inside the container with the user flag, and the container runs as expected within the container. So since podman can be run in both rootful and rootless modes, we have these various combinations for you to choose from, depending on how you want to run podman on the host and within the container.

So yeah, using --privileged is the easiest way, but we want to be able to run podman in an unprivileged container to make sure that we're more secure. And we can do this with podman by making a few adjustments. So over here, I'm running rootful podman in rootful podman without the --privileged flag. As you can see, I'm using sudo to invoke podman. The first thing we need to do is give it two additional capabilities. CAP_SYS_ADMIN is required by the podman running in your container as root so that it's able to mount the file systems it needs. And CAP_MKNOD is required by the podman running in your container as root to be able to create devices under /dev. We need to mount the /dev/fuse device because we need to use the fuse-overlayfs file system within a non-privileged container. And finally, we need to disable SELinux separation, because SELinux doesn't allow containerized processes to mount all the required file systems underneath. So that's why we needed to disable SELinux here. As you can see, the container ran successfully within my container in unprivileged mode and we got the expected hello output.

All right, so we can also do rootless podman in rootful podman without the --privileged flag. Same thing here: we need to disable SELinux separation and we need to mount the /dev/fuse device.
But we no longer need to give it the two additional capabilities that we had to give it when we were running rootful podman inside the container. That is because rootless podman creates its containers within a user namespace, so the container doesn't need those elevated capabilities. And similarly, we can run rootless podman in rootless podman without the --privileged flag. As you can see here, I'm not using rootful podman on the host. The flags are the same: disable SELinux separation and mount /dev/fuse. All right, so that's running podman inside podman. Those are a few use cases. Another way you can do it is to run podman on your host and leak the socket inside, but I'm not demoing that today.

Urvashi, did you want to make a couple of comments? Urvashi, can you hear me?

Yeah, go ahead. You're very choppy right now. I don't know why.

One of the features that's coming in the latest kernel is the ability to run native overlay. Excuse me. Urvashi, can you hear yourself? An echo is coming through your system. So one of the features that just got released in the 5.11 kernel, if you weren't attending the last session, was basically native overlay support. In a lot of the demonstrations there, she had to add /dev/fuse into the containers in order to make this work. But once everybody starts using the 5.11 kernel, needing /dev/fuse and fuse-overlayfs inside of these containers will no longer be necessary. I'm not sure if Sally... Urvashi, did you drop?

I think she's having some technical difficulties. Let's just wait a couple of minutes for her to join back. Okay, she's back.

Yeah, sorry about that. I don't know what happened to my system.

Urvashi, did you hear what I just said? I was just saying that /dev/fuse will not be necessary once everybody gets to the 5.11 kernel.

Yeah, yeah. All right. Shall I continue?

Yep.

All right. Hopefully my system is okay.
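The combinations shown in the podman-in-podman demo can be summarized with a small Python helper that assembles the outer command line for each scenario. This is a sketch of what was described, not the exact commands typed in the demo: the function name is hypothetical, the inner image name `ubi8-minimal` is an assumed short name for "UBI 8 minimal", and the flags are the standard `podman run` options that correspond to the steps in the talk (`--privileged`, `--user`, `--cap-add`, `--device`, `--security-opt label=disable`).

```python
import shlex

IMAGE = "quay.io/podman/stable"
# Assumption: a short name for the UBI 8 minimal image used in the demo.
INNER_CMD = ("podman", "run", "ubi8-minimal", "echo", "hello")


def podman_in_podman(outer_rootful, inner_rootful, privileged):
    """Build the outer 'podman run' argv for one demo scenario."""
    argv = ["sudo", "podman", "run"] if outer_rootful else ["podman", "run"]
    if privileged:
        argv.append("--privileged")
    else:
        # Unprivileged: disable SELinux separation and pass in /dev/fuse
        # so fuse-overlayfs can be used inside the container.
        argv += ["--security-opt", "label=disable", "--device", "/dev/fuse"]
        if inner_rootful:
            # Only root inside the container needs the extra capabilities.
            argv += ["--cap-add", "sys_admin,mknod"]
    if not inner_rootful:
        # The image ships a 'podman' user for rootless use.
        argv += ["--user", "podman"]
    return argv + [IMAGE] + list(INNER_CMD)


# The three unprivileged scenarios from the demo: (outer, inner) rootful?
for outer, inner in [(True, True), (True, False), (False, False)]:
    print(shlex.join(podman_in_podman(outer, inner, privileged=False)))
```

Building the argument list instead of a single string keeps each flag tied to the reason it is needed, which is the part of the demo that is easiest to get wrong when copying commands by hand.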
So the next thing I'm going to show is actually running podman inside of a container in a Kubernetes cluster. Here I have a simple Kubernetes cluster running on my host. My node is up and ready. The first case we're going to look at is running with the privileged option set to true, because that's the easiest way to do it. It's a very simple thing in your YAML: you just need to set the privileged option to true under the security context. And I'm setting the user to 1000, so I'll be running podman as rootless within the container. As you can see here, my pod is up and running. We exec into the pod, I print out the ID, and we are running as user 1000, or user podman. Let's run a quick container. I'm just going to skip quickly to that. And the container runs successfully. We see the hello output.

One of the biggest use cases for running podman inside a container is for build purposes. So I'm going to build a very simple Dockerfile. As you saw, there were just three lines, just to show that builds also work within a container. And as you can see here, my build was successful and my image exists inside my container.

All right. So now, as we mentioned earlier, we don't want to run podman in a privileged container. We want to do it in an unprivileged container, and we can do that similarly within a container in Kubernetes. Similar idea here. The first thing you need to do is disable SELinux on the host that is running your Kubernetes cluster. As you can see here, I've already disabled it: my SELinux is in permissive mode. The next thing you need to do is mount the fuse device. The way to do that in Kubernetes is to create a device plugin and then link the device plugin in your pod YAML. So I just created my device plugin over here. Now, let's take a look at what the pod YAML looks like for an unprivileged container. So, very similar.
The limits part is where I have set my /dev/fuse device, and I'm just setting the user to 1000, so I will be rootless within the container. Let's quickly create that. We can see my pod is up and running. And when we exec into the container, the UID is 1000, as expected. Let's run a simple container. We're going to wait for the pod. And yep, the container successfully ran. We can also run a build over here. One thing to note here, though, for running a build is that we need to set isolation to chroot. That's because we're already in a confined container, and when we're running builds, we're trying to create more containers within other containers. Since we're not in privileged mode, podman won't have all the permissions it needs to mount all the various file systems. So setting isolation to chroot will make that work, because we can bypass some of the permissions we'd otherwise need. And it's completely fine, because we're already confined in the container. So that was running rootless podman within an unprivileged container.

Now let's take a look at running rootful podman within an unprivileged container in Kubernetes. As you can see here, the additional things are four additional capabilities. SYS_ADMIN and MKNOD are needed, as mentioned earlier, so that rootful podman in the container can mount the file systems it needs to and create devices under /dev. We need SYS_CHROOT and SETFCAP here because these two capabilities are actually part of the default list of capabilities podman runs with. But since CRI-O is created for running containers in production environments, it has a much shorter list of default capabilities, so it's more locked down. My Kubernetes cluster here is using CRI-O. And when you run podman, it tries to load all the capabilities it needs. So if I run my container in a Kubernetes cluster that is using CRI-O without adding these two capabilities, podman will fail to run within the container.
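The two unprivileged pod definitions described above can be sketched as a Python function that builds the manifest as a dictionary and prints it as JSON, which kubectl accepts as well as YAML. This is not the YAML from the demo: the function and pod names are hypothetical, the `sleep` arguments are an assumed way to keep the pod alive for `exec`, and `example.com/fuse` is a placeholder, since the actual resource name depends on the device plugin that exposes /dev/fuse in your cluster.

```python
import json


def podman_pod(name, rootful, fuse_resource="example.com/fuse"):
    """Build an unprivileged pod manifest for running podman inside it.

    fuse_resource is a placeholder: the real name comes from whichever
    device plugin exposes /dev/fuse in the cluster.
    """
    security_context = {}
    if rootful:
        # SYS_ADMIN and MKNOD: mount file systems, create devices in /dev.
        # SYS_CHROOT and SETFCAP: in podman's default set but, per the
        # talk, not in CRI-O's shorter default list.
        security_context["capabilities"] = {
            "add": ["SYS_ADMIN", "MKNOD", "SYS_CHROOT", "SETFCAP"]
        }
    else:
        # UID 1000 is the 'podman' user inside the image.
        security_context["runAsUser"] = 1000
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": "quay.io/podman/stable",
                # Assumption: keep the pod alive so we can exec into it.
                "args": ["sleep", "1000000"],
                "securityContext": security_context,
                # The limits section is where the fuse device is requested.
                "resources": {"limits": {fuse_resource: 1}},
            }]
        },
    }


print(json.dumps(podman_pod("podman-rootful", rootful=True), indent=2))
```

Once inside either pod, builds would still need the chroot isolation mentioned in the talk, for example `podman build --isolation chroot`.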
So that's why I need those two additional capabilities when it comes to using a Kubernetes cluster. All right, so the pod is up and running. Let's exec into the pod, and we are running as root inside the pod. A simple container, and there you go: we have the hello output. We can also do builds here. Similarly, we have to set isolation to chroot. And we can see the build was successful and my image is there. There you go. So that is all I have demo-wise.

One second, let me go back to the slides. There are a few other ways that you can run podman within a container. As I mentioned earlier, you can run podman as a daemon on your host and leak the socket into your container. You can do this both within podman and in a Kubernetes cluster. You can also run your container with user namespaces and still give podman the elevated capabilities it needs, so that if a process breaks out, it won't really affect your host. We have highlighted all of these in the two blogs that Dan and I have written, which you can check out if you want more examples, more details, and some steps on how to get started with this. Anything else, Dan?

No, that's about as quick as we could possibly do this. So if anybody has any questions, we can probably take about two minutes' worth of questions.

Okay, thanks, guys. That was an amazing presentation. Thank you for sharing that with us. Looking at the Q&A section, there are no questions as of yet. So we can wait a couple of minutes in case anyone wants to ask anything.

Yeah, and obviously we went through this very quickly. One of my goals is to eventually make the two reports into living guides, because as people experiment with running podman inside a container, or Buildah inside a container, and as the operating system and the kernel evolve, we'll be able to tighten these up and make them even more secure.
But SELinux being disabled for containers, as much as I hate to do that, is really fundamentally because SELinux is trying to block the type of activity that's required to run a container inside of a container. So we have to disable it for those use cases. But we tend not to disable it on the host, just for the particular containers where you want to run a container within a container. Okay, I guess we're done. I explained everything perfectly.

Okay, we do have one question. Would this work on another OS running Docker, like macOS?

So yeah, it would work. Basically, in macOS's case, we're using a remote podman at that point. We're talking to a podman service running inside of a VM that's running Linux, and that podman service can launch podman inside of podman. So yeah, it would work on any Linux operating system, or any operating system that can communicate with a podman service running on a Linux system. Similarly, if you're running a Docker daemon inside of Linux, say Docker for Mac, you could also talk to the Docker daemon and have it launch podman inside of a container.

Okay, thank you, Dan. I don't see any other questions. Okay, there is one, sorry. Are the YAML files used in the demo available in a GitHub repository or on another site for those who want to replicate it?

So the YAML files are not available right now, but we do have a containers/demos repo on GitHub. I will put them over there by tonight so you can take a look. We do have a podman-in-podman directory over there. Let me see if I can find the link. And a lot of these YAML files are available in the two linked blogs.

Oh, yeah, they're all in the blog.

Yeah, that's right.

And we would love to have feedback. So when you try to do this and you get it working, give us feedback on how well it worked or on any problems.

Yeah, open issues, and then open PRs to fix those issues for us as well. There is a lot.
I mean, the problem with doing this talk right now is that we had 25 minutes and there are a lot of deep concepts going on here. But basically they all do work, and hopefully we've optimized the images you need to be able to do it. Our goal with those optimized images and the optimized Dockerfile is that people can take them and modify them to fit their own use cases. It's not that this is the only way to do it. We believe that these concepts make it easier to do, and people can further expand on them as we go forward.

Yeah, great. Thank you so much, guys. Urvashi has put the blogs in the chat. I will send a link for the breakout rooms in case anyone wants to interact with Dan and Urvashi. Feel free to go there next. Thank you so much.