Hello, I'm Tomasz. I've been working for Red Hat for almost nine years, and for the past three years I've been developing containerized applications running in OpenShift. Before we actually start, I will ask those of you who would like to follow along to download the Fedora tar image we've prepared for you and extract it somewhere on your disk. If you don't feel like going along with us and doing all the typing, you can just watch, and I still believe you will benefit from that. But for those of you who want to get your hands dirty, please download the image. Sorry for interrupting. And if you do not remember the tar syntax, like me, I've also included a command to extract the tar archive.
So, first of all, we are not experts when it comes to containers. We are not developing the container technology itself; we are merely developing containerized applications. But at many points in my career I was interviewing people and asking what containers are, and sometimes I heard the answer that containers are lightweight VMs. They are not. The name comes from the word contain, and in fact containers are a means of encapsulating and isolating a running process. All this isolation and resource confinement is done using standard Linux means which are included in the kernel: for example kernel namespaces, capabilities, systems such as SELinux, and of course control groups, or cgroups for short. And at this moment, I would like to hand over to Jan.
Just a tiny bit of theory. Keep in mind that this workshop is not theory focused, so we will be very light on that. I want to talk for a while about what container tools actually are. You are probably familiar with tools like Podman or Docker, or even Kubernetes or Docker Swarm. All of these tools basically enable you to work with containers.
Can you please put into the chat which of those tools you have used, or which of them you are most familiar with, be it Docker, Podman, or some other things, LXC maybe. Okay, so there is definitely some Podman audience. Oh, there's a poll already. Nice. The poll is already running, sorry, I didn't notice. So we are getting some votes in the poll. So far Docker and Podman are the definite winners, so you have an idea what I'm talking about; you know these tools. There are some people who don't know them, but this is a minority. What those tools do is enable you to easily, and that is the keyword here, easily spin up containers, join them, work with them. The point of our workshop is to show you that under the hood they use already existing kernel features, and that you can actually create containers without ever touching those tools.
So, this is it for the theory. I will jump into my terminal. While I'm typing away I won't be able to see the chat, so Tomasz will be there in the chat to help you if you have some issues. Let's get straight to it. If you don't see my terminal, you can just double-click it in the Hopin platform and it should be maximized for you. What I recommend, if you want to follow along, is to open two terminals like I did and put them side by side with our presentation. We will have one top terminal and one bottom terminal, and I will refer to them as such. So, we are in the home directory of the devconf user. We created a directory called workshop, and in that directory we have fedora.tar, which is the same archive that you downloaded. Oh, by the way, Tomasz, if there is some problem or question in the chat, feel free to interrupt me if such a need arises; I don't see them. So, let's start with the most difficult thing, exploding the tar.
Yeah, remembering tar syntax is difficult, but I'll give you a hint: it's tar xf and the name of the tar archive. And now you can see we have a fedora directory. Let's go into the directory and see what's in there. Okay, this is some pretty standard Linux file system, you might think, and you are right. Now I will tell you one lie, but it's the kind of high school lie that is acceptable: container images are basically archives. They are archived Linux file systems, not full-fledged ones. Of course, some of the things that you need on an operating system are omitted because we want to minimize the size of our images, but the basic structure is there.
So, let me go one level up again. If I do cat /etc/redhat-release (this might not work if you are on Ubuntu, for example, but we are on a CentOS machine), this is a file from the host operating system, and we are on CentOS 8.2. I can do the same for our image: cat fedora/etc/redhat-release, where fedora is our local directory. And we can see that the image is actually Fedora 33.
Okay, so we have our image. Now we want to run a container from it that is isolated. The first thing we can use for that is chroot. I'm pretty sure that most of you are familiar with chroot. It is a way to take a process and change its view of the file system, so that the root of the file system is placed somewhere different for that one particular process. If I'm not making sense, it doesn't matter; we will see it soon enough. In the top terminal I again do pwd to make the point that we are in /home/devconf/workshop. What I'm going to do now, and you can follow me, is sudo, because that's needed for chroot. Then chroot, that's the command itself.
Then fedora, the directory where we want to place the new root of the file system hierarchy, so we are changing the root to our container image. And then what we want to run in there; let's simply run bash. So the command is sudo chroot fedora /bin/bash. Okay, we have a new shell. Let's see where we are now. We are at the top of the file system hierarchy, and we see the content of the image. Now, if we do cat /etc/redhat-release, what will this point to? You have guessed correctly: it's Fedora. We are referring to the root of the hierarchy, and on our host OS we would see CentOS there. But since we have changed the root to our container image, we see that our operating system is Fedora 33. Okay, that works.
Running /bin/bash in our chroot is nice, but we can definitely do more interesting things. Let's try the same, running chroot in the fedora directory, but let's say we want to run Python there. I created the image myself, so I included the Python binary; I am sure it is there. Let's start by taking a look at its version: we have Python 3.9 in the container image. So let's use it for some good. We will run a very simple HTTP server. We use -m for running a module, the module's name is http.server, and we will be listening on port 8000. So the whole syntax is python -m http.server 8000. And you can see in the logs that we are serving HTTP on port 8000.
Now I'm switching to my bottom terminal to verify it works. We can just curl localhost, port 8000. I injected a text file into the image, at the top of the hierarchy, called hello.txt. So: curl localhost:8000/hello.txt, and we are actually receiving its content. In the top terminal you can see that the server received the request as expected. So what we have here now is actually something pretty useful.
We can run an HTTP server using the Python version that we want in a seemingly isolated environment. But it has some problems. Let's examine them. I'm stopping the HTTP server in the top terminal, and thereby I exited the container, let's call it that. Let me clear the terminals. I'm going to the bottom terminal and running the top command there. You are surely familiar with top, which shows all the different processes running on the system. So let's just run it in the bottom terminal. In the top terminal, let's do the thing we are already very familiar with: chroot to the fedora directory and run bash in there.
Okay, let us list processes with ps aux. Wow, a problem: we cannot do this. There is a bit of Linux theory behind this. The reason is that processes are tracked via a virtual file system in Linux, so in order to see them we need access to this file system. So let's just do what the system tells us and mount it. The command is mount, the type of the file system is proc, the name is proc, and we want to mount it at /proc, so mount -t proc proc /proc. And keep in mind, this /proc is already inside the container. Okay, that went well. Let's try ps aux again. And here's the issue: we see all the system processes. If we do ps aux and grep for top, which is running on our host OS in the second terminal, we can see it. And not only that, we can do something nasty: pkill top, for process kill. As you can see in the bottom terminal, we actually killed it. So there's not much isolation, is there, if we can kill system processes from the container.
The bit that will help us with that is namespaces. Again, theory very quickly: namespaces enable you to isolate a process, or many processes, it doesn't matter, in a way that they have a different view of system resources. If this is again too theoretical, let's just dive in.
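Before moving on, the dependency of ps on /proc described above can be made concrete by reading that file system directly. This is a small sketch in Python; list_processes is a name invented here, and it relies only on the standard /proc layout, where every numeric directory is a PID and its comm file holds the command name.

```python
import os


def list_processes(proc_root="/proc"):
    """Return {pid: command name} by walking a proc file system.

    Hypothetical helper; a very reduced version of what `ps` reads.
    """
    processes = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory (e.g. "sys", "meminfo")
        try:
            with open(os.path.join(proc_root, entry, "comm")) as comm:
                processes[int(entry)] = comm.read().strip()
        except OSError:
            continue  # the process exited while we were looking
    return processes
```

Without a mounted /proc inside the chroot there is nothing to walk, which is why ps fails at first. Once proc is mounted in a plain chroot, a call like list_processes() would also return host processes such as top, which is the missing isolation shown in the demo.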
First, I would like to show you how namespaces are implemented. Let's clear our terminal; I'll exit from the container again. In the bottom terminal, which is our host operating system, we go to /proc/$$. The $$ is actually the PID of our current process; if we do echo $$, you can see it's this number. So let's go to /proc/$$/ns. In this directory you can see the namespaces this particular process is part of. There are namespaces for network, for processes, for users, for example. Let's do the same in the top terminal, which is also our host operating system: /proc/$$/ns. If we take a look at, for example, the PID namespace, the number ends in 836 for the top terminal and 836 for the bottom terminal. Your numbers will of course vary, but they should be the same in both terminals. This is expected, because all the processes start in the same namespaces.
But that's not what we want to achieve. In our top terminal, let's once again go to our working directory, workshop. Now we will use a new command, and we will need sudo again. The command is unshare; it creates a new process which is in a different namespace or namespaces. We will use -u, because we want to unshare the UTS namespace. You might be thinking, what is the UTS namespace? It's a very fancy name that for our purposes basically means hostname: we will just isolate the hostname of the machine. With this being unshared, we chroot, and you already know the rest: fedora /bin/bash. Okay, did something change? Let's see: hostname in our container is localhost. In the bottom terminal, the hostname of our OS is localhost as well. So let's try changing it in the container with the hostname command. We will change it to container. And now we can see that the hostname in the container is actually container, while the hostname of the host machine has not changed.
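The comparison of /proc/$$/ns between two terminals can also be done programmatically. The following Python sketch reads the same symbolic links; the function names namespace_ids and shared_namespaces are invented for this example, and the identifier shown in the docstring is only an illustration of the format.

```python
import os


def namespace_ids(pid="self", proc_root="/proc"):
    """Map namespace type to its identifier, e.g. {"uts": "uts:[4026531838]"}."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    # Every entry is a symbolic link whose target names the namespace.
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the namespace types in which both processes are together."""
    a = namespace_ids(pid_a, proc_root)
    b = namespace_ids(pid_b, proc_root)
    return sorted(name for name in a if a[name] == b.get(name))
```

For two ordinary shells on the host, shared_namespaces should list every namespace type. For the shell started with unshare -u, the uts entry should be missing from that list. Reading the links of another user's process generally requires root.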
To further prove my point, in the container I can cd to /proc/$$/ns and list its content. If we take a look at the UTS namespace, the number ends in 441 for the container and in 838 for the host operating system. So we actually managed to isolate that particular resource.
That was mainly for illustration. The actual thing we want to achieve here is to isolate processes; that's the problem statement we started with. So this will be my grand finale. We will use unshare again, the same command we used a moment ago, but now we provide the -p option for unsharing the PID namespace. We need to use -f so that a new process is forked. We also, and here be really attentive, need to use the --mount-proc option, for the same reason I already explained: if we want to have a new view of processes, we need to remount the proc file system. It wants to know where we want to mount it, and we need to provide an absolute path. So let's use the PWD environment variable, which points to the place where we are, then the directory fedora, and in that /proc. And again, with this being unshared, we go to the same old stuff we know: we chroot to the fedora directory and run bash.
Okay, let's see our PID, and it's one. So it obviously works, because on our host OS we couldn't have PID one; that's always systemd or init or whatever the first process in your OS is. So we have PID one, that's great. Let's see all the processes: we can really see only the processes running in the container. So we achieved process isolation as we wished.
I intended to also show you the nsenter utility, which allows you to enter already existing namespaces. That's kind of like attaching a shell to a container. But I fell behind schedule; I guess I was too verbose. It doesn't matter, though. We've seen everything important that has to do with namespaces and with chroot.
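Since the full command was only dictated, here is a sketch that assembles it in Python and hands it to subprocess. The helper names container_argv and run_container are made up for this example; the unshare options are the long forms of the ones named above (--pid for -p, --fork for -f, --uts for -u, plus --mount-proc).

```python
import os
import subprocess


def container_argv(rootfs, program="/bin/bash", isolate_hostname=False):
    """Build the unshare + chroot command line used in the demo."""
    rootfs = os.path.abspath(rootfs)  # --mount-proc needs an absolute path
    argv = ["sudo", "unshare", "--pid", "--fork",
            "--mount-proc=" + os.path.join(rootfs, "proc")]
    if isolate_hostname:
        argv.append("--uts")  # also give the container its own hostname
    return argv + ["chroot", rootfs, program]


def run_container(rootfs, program="/bin/bash"):
    """Start the container and return the exit code of the command."""
    return subprocess.call(container_argv(rootfs, program))
```

From the workshop directory, run_container("fedora") should correspond to sudo unshare -p -f --mount-proc=$PWD/fedora/proc chroot fedora /bin/bash. Note that, according to the unshare manual, --mount-proc also implies a new mount namespace.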
So, with this being said, I will hand it over to Tomasz. In case I'm quick, we can show nsenter at the end of the demo. And I would like to ask you to switch to the view of my terminal. This is the one with the cat in the bottom view. Just double-click it. I will basically do the same stuff as Jan did: I will create my container using the exact same command he used, sudo unshare -p -f and so on, to prove that I'm really isolated like Jan was. I did run ps in my container. So we've achieved isolation of PIDs: processes running in the container can no longer kill processes running in other containers or in the host operating system.
But that might not be enough, because there might be some malicious piece of code running in the container. For example this one, called eat-memory.py, which every half second allocates 10 megabytes by just creating a new byte array. Let me just run it. You can see that it prints that it's allocating memory, and in top you can see it appear from time to time. Right now it has 0.7, 0.9 percent of memory. In a while it will be more, yeah, over 1%. So a process running in a container can starve the host system of resources.
So let's use a tool called cgroups, or control groups, to really restrict the resources that a container and the processes running in it can use. I will switch to root, because for this you have to have elevated permissions all the time, and it's easier for me just to be root. The first thing I will do as root is cd to the /sys/fs/cgroup directory. This is again a virtual file system, exactly like /proc. In there we have a couple of directories which represent the types of resources that can be restricted by cgroups: for example CPU time, access to block devices, network. Right now we are most interested in the memory one, so let's go to the memory directory and explore it a bit.
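The memory-eating script itself is not shown in the recording, so the following is only a guess at what it might look like, based on the description above: 10 megabytes allocated every half second via a new bytearray. The function name eat_memory and the max_chunks parameter are additions for this sketch, the latter so that the loop can be stopped in a test.

```python
import time


def eat_memory(chunk_mb=10, interval_s=0.5, max_chunks=None):
    """Keep allocating byte arrays and hold on to them.

    With max_chunks=None this runs until the process is killed.
    Returns the number of bytes held when a limit was given.
    """
    hoard = []  # keeping references prevents the memory from being freed
    while max_chunks is None or len(hoard) < max_chunks:
        hoard.append(bytearray(chunk_mb * 1024 * 1024))
        print(f"allocated {len(hoard) * chunk_mb} MB so far", flush=True)
        time.sleep(interval_s)
    return sum(len(chunk) for chunk in hoard)
```

Calling eat_memory() with no arguments reproduces the endless behaviour described in the demo, so it is best tried only inside a memory-limited cgroup.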
You can see that the memory directory contains multiple files. The most important one is memory.limit_in_bytes. It holds a pretty huge number for what one process can use, and it applies to all the processes running in the default cgroup. So let's create a new cgroup using the mkdir command. Let's call our cgroup devconf. You can see that a new directory called devconf appeared. If we change our directory to devconf, we can see that exactly the same files are already pre-populated. Using the echo command, we can write the number 100 million. How many zeros is that? Yeah, it's one, two, three, four, five, six, seven, eight zeros. And we write it into memory.limit_in_bytes. In theory, this restricts all processes in the devconf cgroup to 100 million bytes in total, which is roughly 100 megabytes.
And to further keep containers from breaking my system, I will also echo zero into the memory.swappiness file, which will disable swap for the processes in the cgroup. As you can see, I'm not using swap on this machine, but without this zero in memory.swappiness it might not work on your machine.
The last thing I need is to actually assign the bash running in the top terminal to the devconf cgroup. To do that, I will find the process of the bash, and it's this one. A hint for determining which of the bashes running on your system it is: the PID of the bash follows the PID of the unshare command that spawned your chrooted bash. So let me echo the PID of the bash into the tasks file. The tasks file contains the PIDs of all processes associated with this cgroup, so at the moment it should contain only the PID of the bash in the top terminal. If I try to run the eat-memory.py script again, it should be killed. And it was killed, because the processes accumulated 100 megabytes of memory. What's worth mentioning here is that all processes inherit the cgroup setting of their parent.
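The cgroup steps above are plain file operations, so they can be sketched in a few lines of Python. This assumes the cgroup v1 layout used in the demo (/sys/fs/cgroup/memory with memory.limit_in_bytes, memory.swappiness and tasks); the function names are invented for the example, and on a real system the calls need root.

```python
import os

# cgroup v1 memory controller, as seen in the demo; an assumption for other hosts
CGROUP_V1_MEMORY = "/sys/fs/cgroup/memory"


def create_memory_cgroup(name, limit_bytes, root=CGROUP_V1_MEMORY):
    """Create a memory cgroup with a hard limit and swappiness set to 0."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)  # creating the directory creates the cgroup
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as limit:
        limit.write(str(limit_bytes))
    with open(os.path.join(path, "memory.swappiness"), "w") as swappiness:
        swappiness.write("0")
    return path


def add_process(cgroup_path, pid):
    """Move a process into the cgroup; its future children inherit it."""
    with open(os.path.join(cgroup_path, "tasks"), "w") as tasks:
        tasks.write(str(pid))
```

Something like add_process(create_memory_cgroup("devconf", 100_000_000), bash_pid) would mirror the demo, with bash_pid standing for the PID of the chrooted bash. On hosts that use cgroup v2 the file names differ (memory.max and cgroup.procs), so this sketch would not apply there unchanged.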
So let me exit this chroot. If I do cat tasks, it's empty, obviously, because the process which was associated with this cgroup has exited. So we can go and remove the devconf cgroup, right? Let's use rm -rf devconf, and it will fail, because for removing cgroups which are not associated with any running process you have to use the rmdir command. I will also cat the tasks file of the default cgroup, and you can see that the default cgroup contains pretty much everything running on my machine. So that was an illustration of what cgroups allow us to do. As I mentioned, we can also control access to the CPU, or how much CPU a process can use, devices, network, stuff like that.
So let me exit the root terminal and use the devconf user in the workshop directory for the next part of the workshop. And this is mounts. The purpose of mounts in containers is pretty much the same as the purpose of the mount command. In real use cases, most of the time you just want to give your machine or container a direct view of some directories or resources. This is especially useful with, for example, Kubernetes or OpenShift, where you want to have your application code in the container but the configuration is separate, so you want to give the application access to the configuration. Yeah. Thank you, Jan.
So I will create a new directory called ro-files, where ro stands for read only, in the workshop directory. And let's create a file there, writing into ro-files/hello.txt. And so that you believe me, it's there. So I will find the unshare command in my history and run it again. And, yeah, at the moment, the file is not there. So I need to somehow make the contents of ro-files available to the container. That's done by the mount command, which requires elevated permissions. I will use the --bind option, and as for mount options, I will use ro.
In case I wanted to mount it with write access as well, I would use rw, but for this example I will use just ro, and I will use absolute paths. In this case the mounted stuff goes first, so the ro-files directory. Then the last argument is where I want to mount it: I would like to mount it under $PWD/fedora. Even in the first part, there is nothing after the dollar sign. Yeah, thank you. And to do it, I actually have to have some mount point inside the container. So let's examine the contents of /mnt in the container: there's nothing there. So let's create a directory called files in /mnt in the container. Now the /mnt directory in the container contains a new directory called files. So I will mount the current directory's ro-files into the current directory's fedora, which is the file system for the container, at /mnt/files.
I can also run the mount command, and you can see in the last line here that /dev/sdc2 is mounted at /home/devconf/workshop/fedora/mnt/files. So let's explore /mnt/files in the container. It's not working. Yeah, it's not working because the mount is not applied to the currently running chroot, so I have to exit it and enter it again. Now if I do ls /mnt/files again, I can see that there's a file. I can cat it, and it says exactly the same as if I cat it on the host machine.
So let's try to change the file from the container. I will use double arrows, so I will just try to append to it. And yeah, it fails because it's mounted read only. And to further illustrate that mounts are not using some kind of snapshot magic or similar tricks, I will create another file on the host machine, because I do not have write access in the container, and call it just another. And you can see that it's there and I can cat it.
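The bind mount was also only dictated, so here is a sketch that assembles the command in Python. The helper names bind_mount_argv and bind_mount are made up, and the spelling of the source directory (ro-files) is an assumption based on how it was pronounced. The options used are --bind and -o ro or -o rw of the standard mount command.

```python
import os
import subprocess


def bind_mount_argv(source, target, read_only=True):
    """Build a `mount --bind` command line with absolute paths."""
    options = "ro" if read_only else "rw"
    return ["sudo", "mount", "--bind", "-o", options,
            os.path.abspath(source),   # what is being mounted goes first
            os.path.abspath(target)]   # where it should appear goes last


def bind_mount(source, target, read_only=True):
    """Run the mount command and return its exit code."""
    return subprocess.call(bind_mount_argv(source, target, read_only))
```

From the workshop directory, bind_mount("ro-files", "fedora/mnt/files") should correspond to the command typed in the demo. The already running container probably did not see the mount because unshare --mount-proc had placed it in its own mount namespace, which would explain why re-entering the container was needed.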