OK. So this talk is called Containers Without Daemons: Internals of Podman. I don't have the presentation. I don't have anything. Matt called in sick, he's got the Brno flu, and so they asked me to come, and I've been asking. I should put up the chat where I've been desperately saying, can you send me the slides? He supposedly has extensive notes, and I know nothing about what he was going to talk about. So I'm going to wing it. If you want to get on your phone right now and rate this the worst talk of the week, that's fine with me, but make sure Matt's name is associated with it. So hopefully I've browbeaten some members of my team to come down. We're going to be talking, I guess, about, well, everybody here has seen previous talks on Podman. How many people have played with Podman? How many people think Podman is the coolest thing in the world? All right. So I'll give you a little history of Podman. Podman was originally called kpod, for Kubernetes pods. We first started working on CRI-O two and a half years ago, and with CRI-O we felt that we were replacing the Docker daemon for running Kubernetes containers. And we felt that a lot of people SSH into a node in the Kubernetes world, and with Docker as the back end they have the ability to run Docker commands. You know, docker ps, show me the containers that are running, things like that. So what we wanted to do was build a tool, originally called kpod, that would act like Docker but be able to see the containers that CRI-O was running. And over time, well, first of all, we figured out that kpod was a bad name; in the US a K-Cup is a coffee pod thing that you stick in a machine. So we eventually renamed it Podman, because it was managing pods, and at that point we split it apart from CRI-O. And over time, it's evolved into what it is now. At the time, I actually had, I don't know, about six or eight interns.
And I just put them on it and said, here's the man page from Docker, let's make that command in Podman. Here's the next man page from Docker, let's make that one. And so that's how it sort of evolved. Eventually, we decided we needed a better database for running it. So one of the things the Docker daemon does that's pretty cool, anyways, is that it keeps all its locks in memory. If I want to remove a container and someone else wants to exec into it, there has to be locking to control that flow. If I want to remove an image and someone else wants to create a container off that image, that has to be controlled. Well, all that stuff was locked up inside the Docker daemon. So the first thing we did when we started to build these tools was move that state out of the daemon and into the file system. One of the first things we had to build was a library called container/storage, and we had to build the locking into container/storage. So Nalin is getting nervous in the back of the room, because I'm about to call him down and have him talk about how locking works in container/storage. Nalin, come on down. Here we go. Can people hear me? OK, well, as Dan explained, the main issue we ran into trying to do things on the same host that a Docker daemon was running on, and this goes back to even before we had started to work on CRI-O, was that essentially if you wanted to do anything that manipulated the actual storage that the Docker daemon was managing, there was no way to do it without surprising the daemon. All the knowledge of the state, and the locking of that state, and the things that you'd normally want to do to make sure you weren't trying to remove a container at the same time you were trying to launch a command running inside of it, all of that was internal to the daemon, because it was stored in memory.
And that's a really good idea for performance reasons, but it made things very inconvenient for us. So when we sat down and needed to be able to do this from multiple processes, the first thing we had to do was take all the locking and move it into a place that multiple processes could access. The most obvious solution for that is file-based locking, which is what we do. I don't really know if there's more to say about that. Well, tell them what's in container/storage. Well, container/storage contains, well, that's a really bad choice of phrasing. Container/storage includes storage drivers, many of the ones that you're familiar with from the Docker daemon. In fact, many of them started off as the exact same code, because it was forked from that code base. If you go back far enough in the GitHub history, you're going to find Docker, because that's where we started from; we didn't delete any of that history. So it includes the drivers: the overlay driver, which we ditched in favor of the overlay2 driver and just renamed to overlay; the devicemapper driver; btrfs and ZFS; and of course the VFS driver, which doesn't do anything clever kernel-side but is very good for testing. On top of that, we ended up putting in a new set of management functionality, so everything above the drivers is brand new code, which manages layers, containers, and images in a fairly straightforward way, because we didn't have a lot of plans for complexity at the time. So far it mostly hasn't required a lot of weird stuff to retrofit additional things into it, but every now and again it does. But that's really all it exposes. At that point, we have the ability to create layered file system storage, use copy-on-write semantics, and do some deduplication, with a good amount of help from the image library, because the image library really has to know how to drive the storage library when you're downloading an image.
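The file-based locking just described can be sketched in miniature. The real library is Go (container/storage's lockfile package); this is only a conceptual Python illustration of advisory lock files coordinating multiple processes, and the class name and lock path are made up for the example.

```python
import fcntl
import os

class FileLock:
    """Advisory lock backed by a lock file on disk, so separate processes
    (think podman, buildah, CRI-O, skopeo) can coordinate storage access
    without a daemon holding the locks in memory."""

    def __init__(self, path):
        # O_CREAT: first process to arrive creates the lock file
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

    def acquire(self):
        # Blocks until we hold the exclusive lock
        fcntl.flock(self.fd, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.fd, fcntl.LOCK_UN)

lock = FileLock("/tmp/storage-demo.lock")
lock.acquire()
# ... safely mutate layer/container metadata on disk here ...
lock.release()
```

Any process that opens the same lock file and calls `flock` will wait its turn, which is exactly the property the daemon's in-memory locks could not give to outside tools.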
And those things put together, along with something that you can use to launch a container, are the easy bits of writing something like CRI-O, which is probably how we decided, yeah, we could actually do that. Okay, I'll let you off the hook now. You don't actually need the mic. I don't need the mic, that's right. Oh, watch your step. So actually, the funny thing is, this past week I wrote a blog that delved deeply into container/storage, all the dirty facts and things like that, and it covers a lot of stuff, like where the content is stored. Interesting facts, like: if I'm running containers as root, by default the storage is in /var/lib/containers, and if I'm running rootless, it ends up underneath ~/.local/share/containers in my home directory. All that's covered in this blog up here, if you want to take pictures. But basically, that digs deeply into everything, all the deep, dark secrets that you really don't want to know about container/storage. Because of that, container/storage is shared between all the tools that we're building: Buildah uses it, and libpod, CRI-O, and Skopeo can all write to it. So container/storage is a key part of the internals of Podman. Another thing that Podman uses underneath the covers is a library called containers/image. If you saw my talk yesterday, I covered it a little bit. It's a library that allows you to pull and push images from container registries. And it has a whole bunch of really cool features, because traditionally the implementation was just: take an OCI image or a Docker image that's sitting at, say, docker.io or quay.io, and pull it onto your local machine. Well, containers/image has developed a whole bunch of other protocols. They're basically translation layers. How many people saw Urvashi's talk yesterday on container security?
So at the end of her talk, she actually pulled an image directly out of the Docker daemon and pushed it into container storage. I think she was using Skopeo to copy it out. But you can actually do things like podman run docker-daemon: and then specify the image, and what will happen is, underneath the covers, containers/image actually knows how to talk to the Docker daemon. There are seats in the middle up here; people want to move in a little bit or let them by. So containers/image allows you to have these translations. It's basically doing pretty much the same protocol over the wire, but it's able to pull from different types of container sources. You can pull from the Docker daemon. You can pull from local container storage. You can pull from a local directory. You can translate traditional Docker images into OCI images. That's all based on containers/image. At this point, I'm going to make Valentin come up and talk a little bit about it. Valentin has added some really nice features to make pulling and pushing images faster. And I signed him up for lightning talks, and I'll go to his lightning talks later. So this is Valentin Rothberg. So I guess I can practice the lightning talk now. Yeah, this is the lightning talk. I have two lightning talks, so it's somehow a double one now. I didn't practice it, but that's somehow part of it. I guess Dan wants me to talk a little bit about how we made pulling faster. Just to say, I'm pretty new to the game. Basically most of, or all of, the work on containers/image has been done by Miloslav, who is sitting there, and Antonio, who gave a talk about CRI-O yesterday. I don't see him here; he's also sick. Yeah, everybody has the Brno flu.
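The transport idea just described boils down to a prefix on the image reference. Here's a tiny, purely illustrative Python sketch of how such transport strings split apart; the parsing function is hypothetical, not the containers/image API, though the transport names shown (docker, docker-daemon, containers-storage, dir) are real ones.

```python
def parse_image_reference(ref):
    """Split a 'transport:name' reference into its transport and the
    transport-specific remainder, the way containers/image-style tools
    decide where to pull from or push to."""
    transport, sep, rest = ref.partition(":")
    if not sep:
        # No transport prefix: treat it as a plain registry image,
        # the common default (docker:// transport).
        return ("docker", ref)
    return (transport, rest)

# Copying out of the Docker daemon, as in the demo above:
print(parse_image_reference("docker-daemon:nginx:latest"))  # ('docker-daemon', 'nginx:latest')
# Pulling from a local directory:
print(parse_image_reference("dir:/tmp/myimage"))            # ('dir', '/tmp/myimage')
```

The point is that every source or destination, registry, daemon, local storage, or directory, looks the same to the copy logic once the transport has been resolved.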
So we've been looking at the tools and did some profiling, checking where we could tweak a little bit to make things faster, because we all want to execute containers, but it's nice when it happens as fast as possible. One of the bottlenecks that we saw was image copying. As Dan explained, there are many different so-called transports in the containers/image library, which allow you to copy from a directory, to a registry, from the Docker daemon into container storage, or also between registries. So an image doesn't have to be pulled locally first; we can copy from registry A, maybe Quay, to docker.io, for instance. Initially the containers/image library wasn't implemented in a way that was meant to be used concurrently. But if we want to pull in parallel, well, we have to make sure that we don't corrupt our data, basic parallelization problems. And this is something we did last December, so it's rather fresh. So whenever we do a podman pull... Dan, could you, let's do a live demo. Live demo? Yeah. Dan is very brave, he always does demos live. Yeah, they usually blow up in my face. And I'm in a comfortable position to blame him if it doesn't work, because I'm not using the keyboard, right? So when we do a podman pull of, for instance, let's do nginx. No, it's n, g, i, n, x, yes. Yeah. Right, so now we're trying to pull it, and then you should hopefully see, yeah, here you can see it, the progress bars indicate, hopefully it was a short one here, that the pulling happens in parallel. One other cool thing that we found, or basically that Giuseppe found (you can say hello, he gave a very cool talk yesterday with Akihiro about rootless containers): Giuseppe found that compressing the layers that a container image consists of, so basically all the data, takes a lot of time.
So he ported the tools to use a new compression library, pgzip, for parallel gzip, and in combination with the parallel pulling, pulling is now up to 50% faster, which is pretty impressive. For sure it depends on the number of layers and on the size of the individual layers, but this was a pretty, pretty cool improvement. One thing we're also looking at at the moment: now that we can pull in parallel, it would be nice, especially when we build containers, to push them in parallel too. The containers/image library now allows it, but we're still serialized by the locks in container/storage that Nalin was talking about. And now we're wrapping our heads around a pretty neat problem, which is: how can we transition the locks of container/storage in a way that lets us read in parallel? We need to transform it into a read-write lock, but, well, we have the problem that many tools on the system are using the container/storage library. If we update tool A with the read-write lock semantics, how can we gracefully avoid corrupting the data of tools that are still using the old version? I think there's a talk tomorrow about file locking, and I hope to talk to the presenter a bit about that. That's pretty much it. Good, thank you. We have a question, yes. You can interrupt at any time; I've gone through 20 minutes now, I have another 40 to go. So if you're copying an image that has 10 layers and you only make a change to one of the layers, only that layer is going to get copied. And actually, if Vincent Batts were here, he could talk about some ideas he has.
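The parallel-pull idea above is simple to picture: each layer's download, decompression, and digest check is independent, so a worker pool can overlap them. The real implementation is Go in containers/image; this is only a toy Python model with made-up layer contents, showing why independent layers parallelize so naturally.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Stand-ins for the compressed layer blobs of an image
layers = [b"layer-one", b"layer-two", b"layer-three"]

def fetch_and_verify(blob):
    """Stand-in for one layer's download + decompress + digest
    verification. Each call touches only its own layer, so nothing
    here needs a shared lock."""
    return hashlib.sha256(blob).hexdigest()

# Overlap the per-layer work; results still come back in layer order,
# which matters because layers must be applied bottom-up.
with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(fetch_and_verify, layers))

print(len(digests))  # 3
```

The serialization the speakers mention only reappears at the end, when the verified layers are committed into container/storage under its (currently exclusive) file locks.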
So right now images are based on tarballs, and so you have to copy the entire tarball, and there's some thought about changing that format, basically moving to an OCI image format version two, and looking at whether we can do things at a lower level, at a block level. There's been some talk about casync and these different protocols for potentially storing your images in a much more efficient format. Right now, if you create a layer that's a gigabyte in size, we end up moving a gigabyte across, because of the tarball format. So we've talked a little bit about the origins of Podman. And by the way, Podman went to 1.0 this week, so pretty cool, huh? We released it as 1.0, and it's going to be in RHEL 7.5. Yesterday we had a talk on user namespaces. How many people saw the user namespace talk yesterday? Okay, so Giuseppe and Akihiro worked a lot on this. The real reason people are excited about Podman, in my opinion, is mainly the rootless stuff, though I also like the fact that there's no big fat daemon. And they've done a lot of work outside of Podman to get the ability to take advantage of the user namespace. So for the people that didn't see that yesterday, I'm going to make Giuseppe come down and talk a little bit about user namespaces and how, internally, we're setting them up, so containers will run inside of a user namespace on the host, and that's how you can run containers as non-root. Yeah, so we started using user namespaces first in the root version of Podman. A user namespace basically allows the process running inside of a container to believe it's using different IDs than it is in reality running with.
So from the host, you can see the process running with one UID and group ID, but inside the namespace, the process believes itself to be running as, maybe, root. So once we had the support for user namespaces as root, the next step was to allow running Podman as an unprivileged user. An unprivileged user is allowed by Linux to create a namespace where there is just one mapping: you can map your UID to any ID you wish, and the application believes it's running with that ID, but for the system, it's still running as the original user. So with the help of other tools, newuidmap and newgidmap, it's possible to add multiple IDs inside of the user namespace. So we can really run any kind of application as an unprivileged user. Well, is that a question? So, Flatpak. My understanding of Flatpak is that it uses bubblewrap underneath as its container runtime, and because user namespaces weren't fully supported on top of RHEL, bubblewrap is actually a setuid application that does the configuration so that Flatpak will work. I believe bubblewrap also has smarts in it to know if user namespaces will work, and it'll use those instead of the setuid path, but I might be totally lying to you now. Yeah, the new versions of Flatpak are using user namespaces, and they use the simple version where just one ID is mapped. So, yes, about the limitations compared to Flatpak: well, we added support for rootless networking, so you can have a network namespace inside of the container, whereas with a Flatpak app, you run with the network namespace of the host. Yeah, but the biggest difference is that in Flatpak you have only one ID available. With Podman, you have multiple IDs, so you can run any application. So let me show you an example of that. There's a tool called buildah unshare, and what buildah unshare allows you to do is basically just enter a user namespace. I haven't launched a container, right?
I'm just the regular user dwalsh, and I'm going to do a buildah unshare, and suddenly I'm inside of a user namespace. So again, I haven't run Podman or anything else. buildah unshare just sets up the user namespace, and this is the same way Podman does it. As Giuseppe was explaining, we have the /etc/subuid file. This file is on every single Ubuntu and Fedora box now; shadow-utils populates it, and what it does is allocate 65,536 UIDs to dwalsh. So every user, and you see I added another user there, test, starts at the first UID available, so 165,536, and gets allocated 65,536 UIDs. So as you add users, you allocate more UID ranges. What shadow-utils also provides is two setuid applications that basically allow you to tell the kernel to put the next process, the child process, into a user namespace. And the user namespace mapping for these containers looks like this. If you look at that uid_map there, it says that my UID is 3267. So when I launched buildah unshare, it used those setuid applications from shadow-utils to configure, first, my UID as being root inside of the container, for a range of one, so that's only mapping one ID. Then it says: starting at UID 1, map UID 1 to 100,000, and so on sequentially for the next 65,536. So what I can do that's interesting inside of the container now is create content. So now I've created a directory and a file inside the directory that's owned by bin, inside of the container, inside of the user namespace. Now if I go outside the user namespace to my home directory, it got created as UID 100,000. So as you see, UID 100,000 was written; it was the first UID in the range, mapped to UID 1. If I created a file owned by UID 2, it would be 100,001 in my directory.
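The translation just demonstrated is pure arithmetic over the uid_map rows. Here is a small illustrative Python sketch of that lookup, using the numbers from the demo (host UID 3267 mapped to container root, and a /etc/subuid range starting at 100,000); the function itself is hypothetical, but the triples have the same shape as rows in /proc/PID/uid_map.

```python
def host_uid(container_uid, mappings):
    """Translate a container UID to the host UID, given mappings as
    (container_start, host_start, length) triples, the same shape as
    the rows of /proc/<pid>/uid_map."""
    for c_start, h_start, length in mappings:
        if c_start <= container_uid < c_start + length:
            return h_start + (container_uid - c_start)
    # Unmapped IDs show up as the overflow "nobody" user
    return 65534

# The mapping from the buildah unshare demo:
#   container 0 -> host 3267 (dwalsh), container 1.. -> host 100000..
demo_map = [(0, 3267, 1), (1, 100000, 65536)]

print(host_uid(0, demo_map))  # 3267: root in the namespace is really dwalsh
print(host_uid(1, demo_map))  # 100000: files owned by bin land here on disk
```

This is the "kernel lying to you" from the talk: the same inode has one owner on disk and a different apparent owner inside the namespace, computed exactly this way.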
So this allows users to use a far greater range of UIDs on the system. But the interesting thing here is, I can't remove it. If I try to remove it, I get permission denied, because I'm not in the user namespace, so I no longer have that mapping of the extra UIDs. But if I go back... and the other thing is, if you look at my home directory now, everything's owned by dwalsh, but if I look at the same home directory while I'm inside the container, everything's owned by root. Exact same files. So what's happening is the kernel is actually lying to me. The kernel is telling me, oh yeah, that's root, but really, outside the container, it's dwalsh; just inside my user namespace it looks like root. In Sally and Matt's talk yesterday, they showed a file owned by real root, and if there's a file owned by real root inside of my container, I see it as nobody:nobody. That's the nobody user, the overflow ID, which stands in for any UID that's not mapped. So if I try to interact with any files that aren't owned by me, or any processes that aren't mapped into my user namespace, including root's, the kernel treats them as if they belong to nobody I can be. Thanks for helping out. Okay, so that's some of the way we do the magic. There are really cool features that have also been added to Podman for running containers: you can basically select different user namespaces, and we've added some optimizations to that over the years. I've been badmouthing user namespaces probably for the last five years. Not because user namespaces are bad, but because the file system doesn't understand user namespaces. What we really want is the ability to take an image and mount it into... if I want to run, say, the same image, and we talked about this one-gigabyte image earlier: if I have a gigabyte image on disk and I want to run it in two different containers, I really want to use user namespaces to separate those two containers.
The problem is: I need it to look like root inside of a container that's running as 100,000, and I also want it to look like root inside of a container that's running in a user namespace at 200,000. I have a problem, right? And the problem is I have to chown the files. There's no way to tell the operating system: inside this user namespace, map that file that's owned by real root as if it's owned by 100,000, and in a different container map the same root-owned file as if it's owned by 200,000. There's no way to do that. It's been tried. I've been working with the kernel team for many years, saying this has to be fixed in order for us to use user namespaces. Well, there are lots of reasons it hasn't been fixed, but there have been attempts. The first attempt was actually by one of the guys that works with me, Vivek Goyal; he works on most of the overlay file system, and I wanted him to just do it in overlay. To me it made sense to do it in overlay. He actually presented a patch to do it in overlay to the upstream kernel, and they said, no, we don't want it in overlay, throw it away. There's another guy, James Bottomley, who works for IBM. He's a major kernel contributor, like one of the top kernel contributors in the world. He built a file system called shiftfs that basically did the same thing, and this guy pretty much has, you know, Lennart's ear... I mean, Linus's ear. Lennart would like that. He has Linus's ear, and he got shot down. The reason he got shot down is that you'd have a file system like overlay, and now I've got to stack another, shifting file system on top of it. We'd end up using a different inode for each one of these, and we'd have an explosion in the number of inodes, and they said that's stupid; what we need to do is just do this at the VFS layer. So the kernel guys are now working at the VFS layer to do the shifting.
So eventually what you're going to have is a mount command, basically like a bind mount, where when you bind mount, you specify: I want this inside of this user namespace. So it'll be a mount command that does that, but there's really tricky security here, and actually I don't like what the kernel guys want. The kernel guys want that shift to work both ways. What I would like is just a read-only shift, because in the containers world we don't write to the lower layer. So if I shift it to the upper layer, I can write to the upper layer, but the kernel guys want to be able to shift and actually write to the lower layer. So if I am in a user namespace running as 100,000 and I write a file, they want to put it on disk as UID 0. They want to shift it back, and there are reasons for that. What they sort of want is that an admin could specify certain parts of the operating system where certain processes are allowed to do that. My opinion: for containers, we're not interested in that. We just want the shift at the higher level. So I'm really rambling on a lot about this, but that's still out there, and again, you can imagine the problem: if I am a process running as, you know, dwalsh, and I write a file and make it setuid, and it ends up on the operating system owned by root, that's a problem. So the kernel guys continue to work on it, but the kernel works at a colossally slow pace for this stuff, and for good reason: these could be major security issues. So Vivek Goyal, the guy that did the overlay file system work, well, one of the ideas we had is that in user space we can handle this. We could actually chown all the files.
So we can go through, if we're running as UID 100,000, and take the image and say all root-owned files are now going to be owned by 100,000, and all UID 1 files are going to be owned by 100,001, and we can actually do that chown, and Nalin has that built into container/storage. Well, if you run it on a VM with, say, a Fedora image, it takes about 30 seconds to chown the entire container. What's happening in overlay is that as I chown each one of those files, they get copied up. So if I had a gigabyte file system, I'd be copying up a gigabyte file system. What Vivek has done is add a new feature to the kernel where the chown just creates an inode on the overlay. The overlay creates an inode that points back to the data in the lower level. That means if I'm chowning a thousand files, I'm not actually copying them up, I'm just creating inodes referring to those thousand files. So because of that, we can now create user-namespace-separated containers in about one second instead of 30. So we're going to be announcing... I could demonstrate it, but I'm worried it's not going to work, so we'll hold off on that. Actually, what the hell. Okay, so we'll see if this works. So I'm fooling myself. Let's get to real root. So what I'm doing is a podman run -ti with a UID mapping, and we'll find out in a second. Okay, we're going to do --uidmap 0:110000:1000 fedora. So theoretically, I believe I don't have a container that already has this mapping. This is basically saying: map UID 0 to, what's that, 110,000, and starting at zero we're going to map a thousand UIDs, and let's see how long this takes. That's pretty good. Anybody know why that went so fast? Because it was a bug. It actually didn't run the container. So now... now it's right. So this is why it takes... look at that, it's taking forever. That took 15 seconds. Not impressive. That's why you don't do a lot of demos.
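The "chown the whole image" fallback described above is conceptually just a walk over the image's files, shifting every owner by the namespace offset. Container/storage does this in Go (and with the overlay metacopy feature only inode metadata gets copied up); the sketch below is a hypothetical Python stand-in for the single-contiguous-mapping case, to show why it costs one syscall per file.

```python
import os

def remap_tree(root, offset):
    """Re-own every file and directory under root, shifting uids/gids by
    a fixed offset: the single contiguous mapping that podman's
    --uidmap 0:OFFSET:N case produces. With offset=100000, a root-owned
    file (uid 0) becomes uid 100000, uid 1 becomes 100001, and so on."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # lchown so symlinks themselves are re-owned, not their targets
            os.lchown(path, st.st_uid + offset, st.st_gid + offset)
```

One lstat plus one lchown per file is exactly why the naive version took ~30 seconds on a full Fedora image, and why caching the chowned layer (and metacopy) matters so much.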
Okay, so what happened here is I ran the same user namespace twice. What we're doing now in container/storage is caching the user namespace: we went through, we chowned everything in the container, and it created a new layer. So every time we run a container in that same user namespace from now on, it's as fast as running any normal container. We only do the chown the first time. Now, what's weird here is it didn't go as fast as I thought it would, and that's the reason. Look at that, this is a teaching example. So one of the interesting things we can do in container/storage, inside of Podman or anything that uses libpod, or anything that uses container/storage, sorry, is that you can actually set the flags on the mount point that's going to be created inside the container. So when I mount up the image... again, all this stuff is buried; if you're using Docker, you can't do any of this stuff. But in container/storage, because we're trying to expose the different parts, you can actually set the mount options that it's going to use. We set by default, I believe, that the mount option on any storage layer mounted into your container should be nodev: you shouldn't be able to use device nodes that happen to be on a storage layer. So by default we're mounting containers nodev. That's a security feature that we haven't really talked about much. But in order to turn on that fast new user namespace thing, I can also turn on metacopy. So what I'm going to do: somewhere along the line I commented that out, and I'm going to uncomment this one. Now, if the demo gods don't let me down, I'm going to change one little number here so we run in a different user namespace, and let's see... seven seconds. Something went wrong, I'm not sure. That's why you don't do live demos. Anyway, it should have been really fast. I'm not sure; you'd have to dig into it. Am I using an older version of Podman?
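The file being edited in the demo is the container/storage configuration. As a rough illustration, the nodev default and the metacopy switch live in the storage configuration's mount options; this is a sketch of such an excerpt, with the caveat that the exact section layout and defaults vary by container/storage version.

```toml
# /etc/containers/storage.conf (excerpt, illustrative)
[storage]
driver = "overlay"

[storage.options]
# nodev: device nodes sitting on image layers are not usable in the
# container (the security default mentioned in the talk).
# metacopy=on: overlay copies up only inode metadata on chown, which is
# what makes the user-namespace chown near-instant.
mountopt = "nodev,metacopy=on"
```

Flipping metacopy on or off here changes the behavior for every tool sharing container/storage on the host, which is why editing one config file affected the live demo.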
Well, 1.0, yeah. Coding on the fly while you're demoing. Everybody watching me type my password? The tension is just immense in here right now. It's tough to tap dance when you have this thing on your leg. So anyway, it's kind of a new feature, right? It's kind of cool. It pretty much felt like the normal speed of running a container. And if I ran it again, obviously it's going to be faster, because it went through. But it actually went through the entire Fedora base image, chowned every single inode, and created it on disk. Now, that chowned layer disappears, right? We don't commit that as an image; we're never saving that. It's just for when I'm running in that user namespace. So in a future version, hopefully a very soon future version of Podman, I'd like to have that happen automatically if you're running as root. So all of a sudden there'll be something like a --userns=random, and we'll pick out user namespaces and record in a storage layer which user namespace we're going to run with. So it's kind of a cool feature. It's still a little hacky, right? We're still taking almost a full second just to chown the file system. At some point, hopefully before we all die, we will get a shifting file system. And as I said, I've been talking about user namespaces since sort of the Docker revolution, which was 2013. So for five, six years I've said we needed a shifting file system, and now we have a user-space sort of shifting file system. Giuseppe has also built fuse-overlayfs. He took overlay, put it into FUSE, and he actually put shifting into his version of overlay. So we can use a FUSE file system for doing this. And as a matter of fact, if you did similar things inside of a rootless container, you would be taking advantage of fuse-overlayfs, so you wouldn't have to do the chown stuff that we're doing as root. So that's another thing. Let's look at another thing internal to Podman.
Did the varlink guys check in? Those bums. Okay, so there's a file here. So this is libpod; Podman is basically a subproject of libpod. We created libpod, so if you want to find the source, if you want to contribute to Podman, it's actually on GitHub at containers/libpod. It's a library for creating pods; that was the goal. And Podman was the initial implementation that uses that library. The reason we did that is that we were talking to different people that might have been using distributed systems other than Kubernetes, and they were interested in the concept of pods, and we wanted to build a library that people could use to start experimenting with pods. For those that didn't see any of the talks: Podman has the ability to create pods. Now, a pod is basically one or more containers running in the same namespaces. Not all namespaces, but basically cgroups, the PID namespace, the network namespace, and the IPC namespace. And it runs with the same SELinux label. So it's really a way of grouping multiple containers together, and those containers always run together. That's how Kubernetes runs pods; Kubernetes sort of developed the concept of pods. And a lot of people like to build pods where the secondary container is called the sidecar container. You could have 50 containers in there, but usually people have maybe two containers inside of a pod, and that secondary container is usually watching the primary, making sure it's working right. You can do health checks, things like that, inside of it. Actually, if anybody here uses Istio: in Kubernetes, Istio is actually using a sidecar container that gets launched with each one of your containers and sets up a side network for the container processes. So that's how they do whatever Istio does; it's done with sidecar containers. I can only know so much.
So anyways, we originally designed libpod to be able to do that, and then Podman was sort of the tool that we experimented with. If you came to other talks, my talk yesterday, I showed you running containers inside of a pod and stuff like that. But we also, underneath the covers, actually have tools that get launched. And because distributions ship these executables in different paths, these are the search routines for figuring out which path. So we look for runc in a whole bunch of different places. We look for conmon in a whole bunch of different places. So when we launch a container, the first thing we do underneath the covers is launch this program called conmon. So, you know, if I go and run a container on the system, I'm gonna run a detached Fedora container and just do sleep 1000, and podman exits. The container is running on the system, right? So in the Docker world, the Docker daemon theoretically is watching that container. Well, there is no more podman there. So we need something to watch the container. So if I go onto the system, if I can do this correctly, you'll see conmon running, and, remember I talked about pods, you'll see a conmon running there and it's running a pause container. So it runs a pause container and a conmon, and that conmon is monitoring the pause container. Then down below there are some Kubernetes containers running on the machine, and you can see them running sidecars. And let's see if we can find the sleep that I just launched. Okay, so right here we have sleep running in the container, so its conmon is actually running. Now if I want to go into that container with a podman exec, you see basically three processes running. One of those is running sleep, and there's the conmon surrounding the sleep 1000. So what happened when I ran podman there: podman actually connected to conmon.
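The live demo above boils down to something like the following. This is a sketch assuming podman and the fedora image are available; the ps invocation just shows what to look for in the process tree, not a promised output:

```shell
# Run a detached container; the podman command itself exits immediately.
podman run -d fedora sleep 1000

# No daemon is left behind. Instead there is one small conmon process
# per container, sitting as the parent of the container's main process.
ps -ef | grep [c]onmon
```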
So inside of the podman database, there is a link that says: this container is running underneath conmon, and it's running with this PID. And so we connected to conmon, and conmon then injected us into the container. So we exec'd into the container. That's why we're daemon-less, right? We don't need a big fat daemon sitting out there watching it. We just have this little tiny program. And that program is just a little C program that really just keeps its TTYs open and watches for the container to exit. And when the container exits, it captures its exit code and then writes it to a directory which podman can then go look at. So if I want to know why this container exited, or whether the exit code was successful or not, I can go and look at that. So that's all the stuff happening under the covers. This is the same exact tool that CRI-O uses. Anybody that's played with Docker over the years knew that you could never restart the Docker daemon. If you restarted the Docker daemon, all the containers would go down, okay? Later Docker versions actually use containerd as a separate process, so now you could cycle Docker, but you could never cycle containerd. I don't know if that's been fixed at this point. But basically, anytime you want to upgrade CRI-O, you can just upgrade it, because these conmons keep running. And obviously you can't swap out the conmons that are already running, but the next time a conmon launches, it'll be the new one. So anything else of interest in here? We'll show you a couple of other things. Ah, a question. I love questions; they help me tap dance. Yes, ask it slowly, yes. So the question is: basically a lot of people wrap Docker containers inside a systemd unit file, so at boot time it'd communicate with the Docker daemon and say start nginx, all right, start some container.
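The exit-code handoff he describes can be sketched in plain shell. This is a stand-in for conmon's behavior, not its real code or file paths: be the parent of the container's main process, wait for it, and leave the exit status on disk where podman can find it later.

```shell
# Minimal sketch of the conmon pattern (paths are made up for the demo).
dir=$(mktemp -d)

sh -c 'exit 3' &          # stands in for the container's main process
pid=$!

wait "$pid"               # "conmon" blocks here until the container exits
echo "$?" > "$dir/exit"   # record the exit code in an exit file

cat "$dir/exit"           # "podman" reads it later, long after the fact
```

The point of the pattern: because the status lives in the file system rather than in a daemon's memory, any podman invocation, minutes or days later, can ask why the container exited.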
So in the Docker world, they have things like restart policies, auto-restart for a container, different ways of starting containers. And in the podman world, we say: put podman directly into the systemd unit file. Matter of fact, we don't support auto-restart. Auto-restart in our mind makes no sense at all, because we don't have a daemon. Well, we do have a daemon; it's called systemd. It's in charge of starting processes on the system. So we would say, if I want to run a container at boot-up time, I would put podman into the unit file to start it automatically. There's a thing here, I was gonna try to get people to talk about Varlink, but this is actually a decent example. Okay, so one of the things we have is the concept of podman remote. Again, we don't have a daemon; we don't have the Docker daemon sitting out there listening on a socket. So what we added is basically a mechanism for socket-activated podman. So I need power. Ooh, that's good; I'd get out of the rest of the talk if I ran out of power. So anyways, this is an example of running podman in a unit file. And usually we tell you to create the container and then just do starts and stops. So on the ExecStart you do a start of the container, and when you're done with the container, you do a stop of the container in the unit file. But that leads me into the Varlink stuff. So one of the things we wanted: there's a couple advantages to what Docker did, in that they had a daemon listening on a protocol that you could talk to. A lot of people have probably played with the Python bindings for Docker, and what the Python bindings do is talk to the Docker daemon. And what we needed is a way for Python users to be able to use podman. We also wanted Node.js users to use it, so we could plug it into Cockpit.
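The start/stop unit file he's describing looks roughly like this. A minimal sketch: the container name is made up, and it assumes you've already done a podman create --name mycontainer beforehand; real deployments would add more hardening and dependencies.

```
[Unit]
Description=Example container managed by Podman (illustrative)

[Service]
# -a attaches so the unit tracks the container's lifetime
ExecStart=/usr/bin/podman start -a mycontainer
ExecStop=/usr/bin/podman stop -t 10 mycontainer

[Install]
WantedBy=multi-user.target
```

With this, systemd is the "daemon": it starts the container at boot and can restart it on failure, so podman itself never needs an auto-restart feature.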
And we wanted Mac users to be able to use it remotely. So we actually created, since we don't have a daemon, a socket-activated podman: when you connect to the socket, it will talk a protocol to you that will launch a container. But basically every time you launch a container, it launches a different podman. Yes, it's infinitely possible. Yes, maybe not a good idea. But now, if you had to run containers in a container: a locked-down, unprivileged container would not allow you to set up user namespaces and things like that inside of it. But, well, we had meetings earlier today talking about this new thing called Toolbox. And Toolbox is actually a new desktop item where you'll be launching terminals, and the terminals will be running podman under the covers and actually sticking you into a privileged, rootless container. Which seems like an oxymoron. And then inside of that, if you wanted to run a rootless podman to create more containers, you could do that, okay? And matter of fact, one of the demos we did yesterday, Sally did it, is actually running Buildah inside of a podman container. So there are really good use cases for distributing out your builds inside of, say, Kubernetes or something. Imagine firing off 100 builders to build all your images. So if you have a CI/CD system, you could plug it into Kubernetes, have Kubernetes take all your Dockerfiles and put them into individual containers, and have Buildah actually process them. People do that now, except the way they do it is to take the Docker socket, probably the most dangerous thing in the universe, and stick it into the container. And they can do it in a locked-down container, right? But if I can talk to the Docker socket, I have more power than sudo without a password, okay? Because I, as the hacker, can wipe out any logs of me doing anything on the system.
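The Buildah-in-a-container pattern might look something like this. A hedged sketch only: the image name quay.io/buildah/stable is Buildah's published container image, but the exact flags and storage setup a real distributed build would need (volumes for layers, --device or privilege settings) depend on your environment:

```shell
# Run a build inside a container instead of mounting the Docker socket.
# Assumes a Dockerfile in the current directory; "myimage" is made up.
podman run --rm -v "$PWD":/src -w /src \
    quay.io/buildah/stable \
    buildah bud -t myimage .
```

The contrast with the socket-mounting approach is the point: the build runs with only the privileges of this one container, not with control over the host's container engine.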
So putting the Docker socket into a container just gives you sudo root. Matter of fact, anybody like Sally who showed that she's using Docker, there's a way to set up Docker so that the socket has the docker group. Anybody seen this? You do a chmod 660 on the Docker socket, put yourself in the docker group, and now all of a sudden you can run Docker as non-root. Ooh, great! And to those same people, I say, well, why don't you just run sudo docker and allow them to go with no password? And they say, no, that's dangerous. Well, there's a problem with that, okay? You're giving them full root to your machine, and they can wipe out anything they do on it, right? All I have to do is a docker run -ti with a volume of slash mounted on slash host. Might as well show it. So if I do this... slash host... I'm doing it the wrong way... -v /:/host --privileged fedora chroot /host. Guess where I am? I am root on that host, right? I have turned off all the security, because I can run privileged, right? There are no controls on what options you can hand to the Docker socket. I'm out of time, awesome! Okay, so anyways, I become root on the system. I go and do my evil, malicious things as a developer, which is what we're gonna do if you give me root to your system. And then, not only that, but after I'm done, I can do a podman or docker rm of the container. And if you didn't set up the container to log to the journal and just had log files, that will wipe out the log file, all right? Now how was that for made up on the fly, all right?
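The escape he typed out on stage, written cleanly (obviously, don't run this anywhere you care about; it requires access to the Docker socket, which is exactly the point):

```shell
# Anyone who can talk to the Docker socket can ask for a privileged
# container with the host's root filesystem mounted at /host, then
# chroot into it: a root shell on the host, with no audit trail left
# once the container is removed.
docker run -ti --privileged -v /:/host fedora chroot /host
```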